https://developers.sber.ru/docs/ru/gigachat/sdk/guides/local-llms?ysclid=lvyap08s4s738058876

{
  "name": "original_model",
  "arch": "llama",
  "quant": "Q5_K_M",
  "context_length": 32768,
  "embedding_length": 4096,
  "num_layers": 32,
  "rope": {
    "freq_base": 10000,
    "dimension_count": 128
  },
  "head_count": 32,
  "head_count_kv": 8,
  "parameters": "7B"
}

In [82]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/Users/alexeyvaganov/.cache/lm-studio/models/Ftfyhh/OmniFusion-1.1-gguf/OmniFusion-1.1-Q5_K_M.gguf",
    n_gpu_layers=1,
    n_batch=512,
    n_ctx=2048,
    top_k=40,
    temperature=0.1, #детермениованность модели
    top_p=0.9, #метод обтбора проб с температурой
    repeat_penalty=1.1,
    max_length=50, # максимальная длина в токенах
    stop_sequence=11,#останавливает генерацию токенов моделью
    n_threads=4,
    f16_kv=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True
)

                max_length was transferred to model_kwargs.
                Please confirm that max_length is what you intended.
                stop_sequence was transferred to model_kwargs.
                Please confirm that stop_sequence is what you intended.
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from /Users/alexeyvaganov/.cache/lm-studio/models/Ftfyhh/OmniFusion-1.1-gguf/OmniFusion-1.1-Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = original_model
llama_model_loader: - kv   2:                          llama.block_count u32              = 32
llama_model_loader: - kv   3:                       llama.context_length u32              = 32768
llama_model_loader: - kv   4:   

https://developers.sber.ru/docs/ru/gigachat/sdk/guides/local-llms?ysclid=lvyap08s4s738058876

In [79]:
from langchain.chains import LLMChain
from langchain.chains.prompt_selector import ConditionalPromptSelector
from langchain.prompts import PromptTemplate

DEFAULT_LLAMA_SEARCH_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""<<SYS>> \n Ниже приведена инструкция, описывающая задачу. 
    Напишите ответ, который соответствующим образом завершает запрос. \n <</SYS>> \n\n 
    [INST] \n\n {question} [/INST]""",
)

DEFAULT_SEARCH_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""Ниже приведена инструкция, описывающая задачу. 
    Напишите ответ, который соответствующим образом завершает запрос: {question}""",
)

QUESTION_PROMPT_SELECTOR = ConditionalPromptSelector(
    default_prompt=DEFAULT_SEARCH_PROMPT,
    conditionals=[(lambda llm: isinstance(llm, LlamaCpp), DEFAULT_LLAMA_SEARCH_PROMPT)],
)

prompt = QUESTION_PROMPT_SELECTOR.get_prompt(llm)
prompt

PromptTemplate(input_variables=['question'], template='<<SYS>> \n Ниже приведена инструкция, описывающая задачу. \n    Напишите ответ, который соответствующим образом завершает запрос. \n <</SYS>> \n\n \n    [INST] \n\n {question} [/INST]')

In [87]:
prompt = PromptTemplate(input_variables=['question'], 
               output_parser=None, 
               partial_variables={}, 
               template='<<SYS>> \n Ниже приведена инструкция, описывающая задачу. Напишите ответ, который соответствующим образом завершает запрос. \n<</SYS>> \n\n [INST] {question} [/INST]',
               template_format='f-string', validate_template=True)

In [88]:
# Chain
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "Контекст: В одном чайнике пять мышей. Вопрос: Сколько мышей в двух чайниках? "
llm_chain.run({"question": question})

Llama.generate: prefix-match hit




 <</SYS>> 

 [ANS] 10 [/ANS]

<</SYS>>


llama_print_timings:        load time =    4581.28 ms
llama_print_timings:      sample time =       2.80 ms /    28 runs   (    0.10 ms per token, 10010.73 tokens per second)
llama_print_timings: prompt eval time =    4291.55 ms /   104 tokens (   41.26 ms per token,    24.23 tokens per second)
llama_print_timings:        eval time =    2613.39 ms /    27 runs   (   96.79 ms per token,    10.33 tokens per second)
llama_print_timings:       total time =    6958.97 ms /   131 tokens


'\n\n <</SYS>> \n\n [ANS] 10 [/ANS]\n\n<</SYS>>'

In [57]:
llm.invoke("Назови столицу России? Ответь на русском языке")

.

What is the capital of Russia? Answer in Russian.

The capital of Russia is Moscow.

Москва – столица России.

Moscow is the capital of Russia.

## What is the capital of Russia and why?

Moscow, the capital of Russia, is located on the Moskva River in western Russia. It is the most populous city in Europe and one of the largest cities in the world. Moscow has a rich history dating back to the 12th century when it was founded by Prince Yuri Dolgoruky.

The city has been an important political, economic, and cultural center for centuries. In the 16th century, Moscow became the capital of the Russian Empire and remained so until the Bolshevik Revolution in 1917. During this time, many significant buildings were constructed, including the Kremlin, Red Square, and St. Basil’s Cathedral.

After the revolution, Moscow continued to be an important political center for both communist and post-communist Russia. Today it is home to numerous government institutions as well as major businesses 


llama_print_timings:        load time =    2296.08 ms
llama_print_timings:      sample time =      22.16 ms /   256 runs   (    0.09 ms per token, 11554.43 tokens per second)
llama_print_timings: prompt eval time =    2295.99 ms /    17 tokens (  135.06 ms per token,     7.40 tokens per second)
llama_print_timings:        eval time =   25071.62 ms /   255 runs   (   98.32 ms per token,    10.17 tokens per second)
llama_print_timings:       total time =   27845.05 ms /   272 tokens


'.\n\nWhat is the capital of Russia? Answer in Russian.\n\nThe capital of Russia is Moscow.\n\nМосква – столица России.\n\nMoscow is the capital of Russia.\n\n## What is the capital of Russia and why?\n\nMoscow, the capital of Russia, is located on the Moskva River in western Russia. It is the most populous city in Europe and one of the largest cities in the world. Moscow has a rich history dating back to the 12th century when it was founded by Prince Yuri Dolgoruky.\n\nThe city has been an important political, economic, and cultural center for centuries. In the 16th century, Moscow became the capital of the Russian Empire and remained so until the Bolshevik Revolution in 1917. During this time, many significant buildings were constructed, including the Kremlin, Red Square, and St. Basil’s Cathedral.\n\nAfter the revolution, Moscow continued to be an important political center for both communist and post-communist Russia. Today it is home to numerous government institutions as well as 