# Homework 5. LLM and RAG

In this homework you will build a doctor chatbot using RAG (Retrieval-Augmented Generation) and the Hugging Face and LangChain frameworks.

In [1]:
# !pip3 install datasets langchain_community langchain_chroma langchain langchain_core tiktoken sentence-transformers==2.2.2 lark InstructorEmbedding bitsandbytes accelerate >> /dev/null

### Task 1. Reading and indexing the data (2.5 points)

In [3]:
# import pandas as pd

# f = pd.read_csv('/Users/free001style/medal_train.csv')
# f[['TEXT', 'LABEL']].to_csv('medal1.csv')

In [36]:
from langchain_community.document_loaders.csv_loader import CSVLoader


FILE_PATH = '/Users/free001style/Library/Mobile Documents/com~apple~CloudDocs/HSE/HSE_/GANs/LLM/medal.csv'
docs = CSVLoader(FILE_PATH).load()
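
For orientation, here is a rough stdlib-only sketch of how a CSV row can become the text of one document: each row is rendered as `column: value` lines. It is an illustration under assumptions, not `CSVLoader` itself; `row_to_text` and the sample rows are hypothetical. It also suggests why the retrieved text further down starts with `: 5`: an index column written by pandas `to_csv` has an empty header.

```python
import csv
import io


def row_to_text(row):
    """Render one CSV row as newline-separated 'column: value' pairs."""
    return "\n".join(f"{key}: {value}" for key, value in row.items())


# Hypothetical two-row sample. The first header is empty, as pandas writes
# it for an unnamed index column.
sample_csv = ",TEXT,LABEL\n0,first abstract,label a\n1,second abstract,label b\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))
texts = [row_to_text(row) for row in rows]
print(texts[0])
```

Dropping the index column before indexing (for example with `index=False` in `to_csv`) would keep that stray `: N` line out of the embedded text.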

In [37]:
from langchain.embeddings import HuggingFaceInstructEmbeddings

emb_model = HuggingFaceInstructEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2",
    model_kwargs={'device': 'cpu'},
    encode_kwargs={'normalize_embeddings': False}
)

load INSTRUCTOR_Transformer
max_seq_length  512


In [38]:
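from langchain.vectorstores import Chroma

persist_directory = 'DB'
# Create an empty persistent store; the documents are added in the next cell.
# (Chroma.from_documents here would index every document a second time.)
vectordb = Chroma(embedding_function=emb_model, persist_directory=persist_directory)
vectordb.persist()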

In [39]:
from tqdm.auto import tqdm

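N_DOCS = 2000

# Add documents one at a time so tqdm can show indexing progress;
# docs[:N_DOCS] caps how many documents are indexed.
for doc in tqdm(docs[:N_DOCS], total=min(N_DOCS, len(docs))):
    vectordb.add_documents([doc])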

vectordb.persist()

100%|██████████| 2000/2000 [03:55<00:00,  8.50it/s]
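
The loop above sends one document per call. A possible variation is to pass several documents per `add_documents` call, which may reduce per-call overhead; whether it helps here has not been measured. The helper `batched`, the stand-in documents, and the batch size are assumptions made for this sketch.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in for the loaded documents.
fake_docs = [f"doc {i}" for i in range(10)]
batches = list(batched(fake_docs, 4))
print([len(batch) for batch in batches])
```

With the real index, each batch would be passed as `vectordb.add_documents(batch)` inside the progress loop.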


In [40]:
!ls -lht DB

total 953744
-rw-r--r--@ 1 free001style  staff   462M May  6 02:09 chroma.sqlite3
drwxr-xr-x  7 free001style  staff   224B May  6 02:08 53fdc3f6-c0a6-4d95-87b5-ebd2b5a6c11c




In [53]:
from langchain_core.callbacks import CallbackManager, StreamingStdOutCallbackHandler
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/Users/free001style/Library/Mobile Documents/com~apple~CloudDocs/HSE/HSE_/GANs/LLM/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_batch=512,
    n_ctx=4096,
    f16_kv=True,
    callback_manager=CallbackManager([StreamingStdOutCallbackHandler()]),
    verbose=True,
)

retriever = vectordb.as_retriever()

llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /Users/free001style/Library/Mobile Documents/com~apple~CloudDocs/HSE/HSE_/GANs/LLM/mistral-7b-instruct-v0.2.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama


llama_model_loader: - kv   1:                               general.name str              = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000

In [47]:
retriever.invoke("hepatitis g virus")  # check that index search works

[Document(page_content=': 5\nTEXT: gb virus c gbvc or HGV hgv is transmitted by the parenteral route but the importance of sexual transmission needs to be ascertained gbvchgv infections were investigated using rna and eantibody PCD methods in subjects INF by the HIV-1 hiv divided into groups of individuals each according to their main risk factor for hiv infection blood product recipients group intravenous drug users group homosexuals group or heterosexual SE group the overall prevalence of gbvchgv infection was no significant difference was observed in gbvc hgv prevalence among the four CG and in CG and respectively hepatitis c virus hcv antibodies used as a control for parenteral exposure were found in and of the subjects in groups and versus only and of the subjects in groups and respectively p similarly coinfections with gbvchgv and hcv were significantly associated with the parenteral route p these data emphasized the usefulness of combining the detection of rna and the e antibody
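
Conceptually, the retriever embeds the query and returns the stored documents whose vectors are closest to it. The sketch below shows that idea with cosine similarity over toy 3-dimensional vectors. The vectors, the names `cosine_similarity` and `top_k`, and the choice of cosine similarity are assumptions; the distance metric the vector store actually uses may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones have hundreds of dimensions.
toy_index = {
    "hepatitis": [0.9, 0.1, 0.0],
    "pneumonia": [0.1, 0.9, 0.1],
    "antibiotics": [0.2, 0.7, 0.6],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, toy_index, k=2)
print(results)
```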

### Task 2. Prompt engineering. Creating a prompt template (3 points)

In [57]:
from langchain.prompts import PromptTemplate


SYSTEM_PROMPT = """You are a doctor. You know everything about Medical Abbreviation Disambiguation."""


USE_HISTORY = False
if USE_HISTORY:
    instruction = """
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    """
    prompt_template = """
    {system}

    {instruction}

    CONTEXT:
    {context}

    CHAT HISTORY:
    {chat_history}

    Question: 
    {question}

    Answer:"""
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question", "chat_history"], partial_variables={'system': SYSTEM_PROMPT, 'instruction': instruction})
else:
    instruction = """
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    """
    prompt_template = """
    {system}

    {instruction}

    CONTEXT:
    {context}

    Question: 
    {question}

    Answer:"""
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"], partial_variables={'system': SYSTEM_PROMPT, 'instruction': instruction})
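
The split between `partial_variables` and `input_variables` can be pictured as fixing some template fields once and filling the rest on every call. The following stdlib-only sketch mimics that with `functools.partial`. It is not how `PromptTemplate` is implemented; `TEMPLATE`, `render`, and the shortened prompt texts are assumptions for illustration.

```python
from functools import partial

TEMPLATE = (
    "{system}\n\n{instruction}\n\nCONTEXT:\n{context}\n\n"
    "Question:\n{question}\n\nAnswer:"
)


def render(template, **variables):
    """Fill every placeholder of the template with the given variables."""
    return template.format(**variables)


# Fix the fields that never change, analogous to partial_variables.
render_prompt = partial(
    render,
    TEMPLATE,
    system="You are a doctor.",
    instruction="Use the context to answer. Say you don't know if unsure.",
)

# Only the per-call fields remain, analogous to input_variables.
text = render_prompt(context="(retrieved passages)", question="How to treat pneumonia?")
print(text)
```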

### Task 3. Creating a chain (LangChain pipeline) (2 points)

In [58]:
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers.string import StrOutputParser

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()} |  # feature engineering (retrieval augmentation)
    prompt |  # preprocessing (prompt engineering)
    llm |  # model
    StrOutputParser()  # postprocessing
)

ggml_metal_free: deallocating
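
The `|` operator in the chain composes steps left to right: the output of one step is the input of the next. A toy version of that composition is sketched below. The `Step` class and the four stand-in steps are hypothetical and only mirror the shape of the pipeline; they are not LangChain's runnable classes.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (self | other) runs self first, then feeds its result to other.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the retriever, prompt, model, and parser.
retrieve = Step(lambda q: {"context": f"passages about {q}", "question": q})
build_prompt = Step(lambda d: f"CONTEXT: {d['context']}\nQuestion: {d['question']}\nAnswer:")
fake_llm = Step(lambda p: {"text": f"[answer based on {len(p)} prompt characters]"})
parse = Step(lambda out: out["text"])

toy_chain = retrieve | build_prompt | fake_llm | parse
answer = toy_chain.invoke("pneumonia")
print(answer)
```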


In [59]:
chain.invoke('How to treat pneumonia?')


    
    The treatment for pneumonia depends on the cause, severity, and underlying conditions of the patient. In general, the mainstay of therapy is antibiotics to eradicate or suppress the infection. Other supportive measures may include oxygen therapy, fluid resuscitation, nutritional support, mechanical ventilation, and intensive care unit management. The specific antibiotic regimen and duration of therapy depend on the suspected causative pathogens and their antimicrobial susceptibility patterns. Commonly used antibiotics for community-acquired pneumonia include macrolides such as azithromycin or clarithromycin, fluoroquinolones such as levofloxacin or moxifloxacin, and beta-lactam antibiotics such as amoxicillin or ceftriaxone. The choice of antibiotic regimen and duration of therapy depend on the specific clinical scenario, local resistance patterns, and available guidelines. In addition to antibiotic therapy, other supportive measures may be necessary to optimize the overall m


llama_print_timings:        load time =    7886.99 ms
llama_print_timings:      sample time =      25.25 ms /   247 runs   (    0.10 ms per token,  9782.57 tokens per second)
llama_print_timings: prompt eval time =   13094.26 ms /  1706 tokens (    7.68 ms per token,   130.29 tokens per second)
llama_print_timings:        eval time =   12102.67 ms /   246 runs   (   49.20 ms per token,    20.33 tokens per second)
llama_print_timings:       total time =   25853.45 ms /  1952 tokens


'\n    \n    The treatment for pneumonia depends on the cause, severity, and underlying conditions of the patient. In general, the mainstay of therapy is antibiotics to eradicate or suppress the infection. Other supportive measures may include oxygen therapy, fluid resuscitation, nutritional support, mechanical ventilation, and intensive care unit management. The specific antibiotic regimen and duration of therapy depend on the suspected causative pathogens and their antimicrobial susceptibility patterns. Commonly used antibiotics for community-acquired pneumonia include macrolides such as azithromycin or clarithromycin, fluoroquinolones such as levofloxacin or moxifloxacin, and beta-lactam antibiotics such as amoxicillin or ceftriaxone. The choice of antibiotic regimen and duration of therapy depend on the specific clinical scenario, local resistance patterns, and available guidelines. In addition to antibiotic therapy, other supportive measures may be necessary to optimize the overal
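
The throughput figures in the timing log follow directly from the elapsed times and token counts. The small check below reproduces them; `tokens_per_second` is a helper introduced only for this illustration. Prompt processing takes about 13.1 s of the 25.9 s total, so the 1706-token prompt, which is mostly retrieved context, accounts for roughly half of the latency.

```python
def tokens_per_second(elapsed_ms, n_tokens):
    """Convert an elapsed time in milliseconds and a token count to tokens/s."""
    return n_tokens / (elapsed_ms / 1000.0)


# Values copied from the llama_print_timings lines above.
prompt_tps = tokens_per_second(13094.26, 1706)
eval_tps = tokens_per_second(12102.67, 246)
print(f"prompt eval: {prompt_tps:.2f} tokens/s")
print(f"generation:  {eval_tps:.2f} tokens/s")
```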

In [55]:
chain.invoke('Tell in details what is ceftobiprole bpr?')

Llama.generate: prefix-match hit


 Ceftobiprole BPR, also known as Zemdri, is a cephalosporin antibiotic that is used to treat various types of bacterial infections. It is specifically indicated for the treatment of acute bacterial skin and skin structure infections caused by susceptible strains of gram-positive and gram-negative bacteria.
    
    Ceftobiprole BPR works by inhibiting the synthesis of bacterial peptidoglycan, which is a critical component of the bacterial cell wall. By disrupting the bacterial cell wall, ceftobiprole BPR is able to kill the susceptible bacterial strains and help to clear the infection.
    
    Ceftobiprole BPR is usually administered intravenously over a period of several hours, depending on the severity of the infection and other individual patient factors. In some cases, ceftobiprole BPR may be administered via other routes, such as intramuscular injection or topical application, depending on the specific clinical situation and the preferences of the prescribing healthcare provider.


llama_print_timings:        load time =    9108.77 ms
llama_print_timings:      sample time =      27.79 ms /   256 runs   (    0.11 ms per token,  9213.60 tokens per second)
llama_print_timings: prompt eval time =    7871.92 ms /   668 tokens (   11.78 ms per token,    84.86 tokens per second)
llama_print_timings:        eval time =   11430.21 ms /   255 runs   (   44.82 ms per token,    22.31 tokens per second)
llama_print_timings:       total time =   19908.46 ms /   923 tokens


' Ceftobiprole BPR, also known as Zemdri, is a cephalosporin antibiotic that is used to treat various types of bacterial infections. It is specifically indicated for the treatment of acute bacterial skin and skin structure infections caused by susceptible strains of gram-positive and gram-negative bacteria.\n    \n    Ceftobiprole BPR works by inhibiting the synthesis of bacterial peptidoglycan, which is a critical component of the bacterial cell wall. By disrupting the bacterial cell wall, ceftobiprole BPR is able to kill the susceptible bacterial strains and help to clear the infection.\n    \n    Ceftobiprole BPR is usually administered intravenously over a period of several hours, depending on the severity of the infection and other individual patient factors. In some cases, ceftobiprole BPR may be administered via other routes, such as intramuscular injection or topical application, depending on the specific clinical situation and the preferences of the prescribing healthcare prov

### Bonus (2 points). Add the chat history with the bot to the pipeline
Hint: langchain.memory.ConversationBufferMemory

In [74]:
instruction = """
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    """
prompt_template = """
    {system}

    {instruction}

    CONTEXT:
    {context}

    CHAT HISTORY:
    {chat_history}

    Question: 
    {question}

    Answer:"""
prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question", "chat_history"], partial_variables={'system': SYSTEM_PROMPT, 'instruction': instruction})

In [76]:
from langchain.memory import ConversationBufferMemory
from langchain.chains.conversational_retrieval.base import ConversationalRetrievalChain

memory = ConversationBufferMemory(memory_key="chat_history", input_key="question", output_key='answer', return_messages=True)
chain = ConversationalRetrievalChain.from_llm(llm=llm, chain_type="stuff", retriever=retriever, verbose=True, combine_docs_chain_kwargs={'prompt': prompt}, memory=memory)
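
The idea behind a buffer memory is simple: keep every past turn and paste the whole history into the next prompt. The sketch below shows that mechanism with plain Python. `BufferMemory`, `ask`, and `fake_llm` are hypothetical names for this illustration and do not reproduce the `ConversationBufferMemory` implementation.

```python
class BufferMemory:
    """Keep every (question, answer) turn and render it for the prompt."""

    def __init__(self):
        self.turns = []

    def save(self, question, answer):
        self.turns.append((question, answer))

    def render(self):
        lines = []
        for question, answer in self.turns:
            lines.append(f"Human: {question}")
            lines.append(f"Assistant: {answer}")
        return "\n".join(lines)


def ask(question, memory, answer_fn):
    """Answer with the history in the prompt, then store the new turn."""
    prompt = f"CHAT HISTORY:\n{memory.render()}\n\nQuestion:\n{question}\n\nAnswer:"
    answer = answer_fn(prompt)
    memory.save(question, answer)
    return answer


def fake_llm(prompt):
    """Stand-in for the model: report how many earlier turns the prompt contains."""
    history = prompt.split("\n\nQuestion:")[0]
    return f"saw {history.count('Human:')} earlier turn(s)"


memory = BufferMemory()
first = ask("How to treat pneumonia?", memory, fake_llm)
second = ask("What does severity mean here?", memory, fake_llm)
print(first, "|", second)
```

Because the buffer grows with every turn, a long conversation will eventually compete with the retrieved context for the model's context window.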

In [77]:
chain.invoke('How to treat pneumonia?')



> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
    You are a doctor. You know everything about Medical Abbreviation Disambiguation.

    
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    

    CONTEXT:
    : 3456
TEXT: communityacquired pneumonia cap is a L1 cause of morbidity and mortality in elderly patients therefore efforts to optimize the healthcare process for patients with cap are warranted an organized RPA to management is likely to improve clinical results assessing the severity of cap is crucial to predicting outcome deciding the site of care and selecting AGA empirical therapy unfortunately current prognostic scoring systems for cap such as curb confusion uraemia respiratory rate low blood pressure and years of age or the PSI have not been validated specifically in ol

Llama.generate: prefix-match hit


 
    Pneumonia is an infection that affects the lungs. The treatment for pneumonia depends on the cause and severity of the infection.

    In general, the treatment for pneumonia involves rest, fluids, and antibiotics. Antibiotics are used to kill the bacteria causing the infection.

    Depending on the severity of the infection, the patient may need to be hospitalized for intravenous antibiotics and close monitoring of their condition.

    In some cases, the cause of pneumonia may be viral rather than bacterial. In these cases, antibiotics are not effective and supportive care such as rest, fluids, and oxygen therapy may be necessary to help the patient's body fight the virus.

    Overall, the treatment for pneumonia depends on the cause and severity of the infection. Antibiotics may be necessary to treat bacterial infections, while supportive care may be necessary to help the body fight viral infections. Close monitoring of the patient's condition is essential to ensure that the


llama_print_timings:        load time =    7886.99 ms
llama_print_timings:      sample time =      26.63 ms /   240 runs   (    0.11 ms per token,  9012.39 tokens per second)
llama_print_timings: prompt eval time =   11976.07 ms /  1636 tokens (    7.32 ms per token,   136.61 tokens per second)
llama_print_timings:        eval time =   11580.39 ms /   239 runs   (   48.45 ms per token,    20.64 tokens per second)
llama_print_timings:       total time =   24481.11 ms /  1875 tokens



> Finished chain.

> Finished chain.


{'question': 'How to treat pneumonia?',
 'chat_history': [HumanMessage(content='How to treat pneumonia?'),
  AIMessage(content=" \n    Pneumonia is an infection that affects the lungs. The treatment for pneumonia depends on the cause and severity of the infection.\n\n    In general, the treatment for pneumonia involves rest, fluids, and antibiotics. Antibiotics are used to kill the bacteria causing the infection.\n\n    Depending on the severity of the infection, the patient may need to be hospitalized for intravenous antibiotics and close monitoring of their condition.\n\n    In some cases, the cause of pneumonia may be viral rather than bacterial. In these cases, antibiotics are not effective and supportive care such as rest, fluids, and oxygen therapy may be necessary to help the patient's body fight the virus.\n\n    Overall, the treatment for pneumonia depends on the cause and severity of the infection. Antibiotics may be necessary to treat bacterial infections, while supportive c

In [78]:
chain.invoke("What do you mean saying Depending on the severity of the infection?")



> Entering new LLMChain chain...
Prompt after formatting:
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

Human: How to treat pneumonia?
Assistant:  
    Pneumonia is an infection that affects the lungs. The treatment for pneumonia depends on the cause and severity of the infection.

    In general, the treatment for pneumonia involves rest, fluids, and antibiotics. Antibiotics are used to kill the bacteria causing the infection.

    Depending on the severity of the infection, the patient may need to be hospitalized for intravenous antibiotics and close monitoring of their condition.

    In some cases, the cause of pneumonia may be viral rather than bacterial. In these cases, antibiotics are not effective and supportive care such as rest, fluids, and oxygen therapy may be necessary to help the patient's body fight the virus.

    Overall, the treatm

Llama.generate: prefix-match hit


 What factors determine the severity of pneumonia?


llama_print_timings:        load time =    7886.99 ms
llama_print_timings:      sample time =       2.17 ms /    12 runs   (    0.18 ms per token,  5524.86 tokens per second)
llama_print_timings: prompt eval time =    6750.75 ms /   313 tokens (   21.57 ms per token,    46.37 tokens per second)
llama_print_timings:        eval time =     494.03 ms /    11 runs   (   44.91 ms per token,    22.27 tokens per second)
llama_print_timings:       total time =    7306.76 ms /   324 tokens



> Finished chain.


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
    You are a doctor. You know everything about Medical Abbreviation Disambiguation.

    
    Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    

    CONTEXT:
    : 1952
TEXT: an epidemic outbreak of acute respiratory infection patients in an organized group of young people was observed in decembermay pneumococcal etiology was established by means of indirect immunofluorescence reaction in cases of outpatient pneumonia AB and acute respiratory diseases respiratory chlamydiosis caused by CT pneumoniae was detected in enzyme immunoassay with the use of immunocomb CT bivalent igg in patients with pneumonia acute bronchitis and acute respiratory diseases synergic relationship between pneumococcal and chlamydial in

Llama.generate: prefix-match hit


 
     The severity of pneumonia depends on several factors, including the cause of the infection, the patient's age and underlying health conditions, and the extent of lung involvement. Bacterial infections, such as those caused by Streptococcus pneumoniae or Haemophilus influenzae, tend to be more severe than viral infections. Additionally, older adults and those with chronic health conditions, such as heart disease, diabetes, or chronic obstructive pulmonary disease (COPD), are more susceptible to developing severe cases of pneumonia. Finally, the extent of lung involvement, which can vary from mild to life-threatening, also plays a role in determining the severity of pneumonia. Close monitoring of patients with pneumonia is essential to ensure that they receive appropriate and effective treatment for their condition.


llama_print_timings:        load time =    7886.99 ms
llama_print_timings:      sample time =      21.30 ms /   182 runs   (    0.12 ms per token,  8544.60 tokens per second)
llama_print_timings: prompt eval time =    5156.74 ms /  1198 tokens (    4.30 ms per token,   232.32 tokens per second)
llama_print_timings:        eval time =    8699.42 ms /   181 runs   (   48.06 ms per token,    20.81 tokens per second)
llama_print_timings:       total time =   14310.82 ms /  1379 tokens



> Finished chain.

> Finished chain.


{'question': 'What do you mean saying Depending on the severity of the infection?',
 'chat_history': [HumanMessage(content='How to treat pneumonia?'),
  AIMessage(content=" \n    Pneumonia is an infection that affects the lungs. The treatment for pneumonia depends on the cause and severity of the infection.\n\n    In general, the treatment for pneumonia involves rest, fluids, and antibiotics. Antibiotics are used to kill the bacteria causing the infection.\n\n    Depending on the severity of the infection, the patient may need to be hospitalized for intravenous antibiotics and close monitoring of their condition.\n\n    In some cases, the cause of pneumonia may be viral rather than bacterial. In these cases, antibiotics are not effective and supportive care such as rest, fluids, and oxygen therapy may be necessary to help the patient's body fight the virus.\n\n    Overall, the treatment for pneumonia depends on the cause and severity of the infection. Antibiotics may be necessary to tr