<a href="https://colab.research.google.com/github/peremartra/Large-Language-Model-Notebooks-Course/blob/main/3-LangChain/langchain_retrieval_agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<h1><a href="https://github.com/peremartra/Large-Language-Model-Notebooks-Course">LLM Hands On Course</a></h1>
    <h3>Understand And Apply Large Language Models</h3>
    <p>by <b>Pere Martra</b></p>
    <h2>Create a Medical Assistant RAG System chat with LangChain & ChromaDB</h2>
</div>


<div>
    &nbsp;
    <a target="_blank" href="https://www.linkedin.com/in/pere-martra/"><img src="https://img.shields.io/badge/style--5eba00.svg?label=LinkedIn&logo=linkedin&style=social"></a>
    
</div>


# Installing libraries & Loading Dataset

In [1]:
!pip install -q openai
!pip install -q langchain
!pip install -q langchain-openai
!pip install -q tiktoken==0.4.0
!pip install -q datasets==2.12.0
!pip install -q chromadb==0.4.22

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
langchain-openai 0.0.2 requires tiktoken<0.6.0,>=0.5.2, but you have tiktoken 0.4.0 which is incompatible.

We download the dataset from the Hugging Face Hub with the `datasets` library. It contains medical questions and answers about diseases.

In [2]:
from datasets import load_dataset

data = load_dataset("keivalya/MedQuad-MedicalQnADataset", split='train')


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [3]:
data = data.to_pandas()
data.head()

| | qtype | Question | Answer |
|---|---|---|---|
| 0 | susceptibility | Who is at risk for Lymphocytic Choriomeningiti... | LCMV infections can occur after exposure to fr... |
| 1 | symptoms | What are the symptoms of Lymphocytic Choriomen... | LCMV is most commonly recognized as causing ne... |
| 2 | susceptibility | Who is at risk for Lymphocytic Choriomeningiti... | Individuals of all ages who come into contact ... |
| 3 | exams and tests | How to diagnose Lymphocytic Choriomeningitis (... | During the first phase of the disease, the mos... |
| 4 | treatment | What are the treatments for Lymphocytic Chorio... | Aseptic meningitis, encephalitis, or meningoen... |


Import the LangChain classes used to load the documents and to store them in the vector database.

In [4]:
from langchain.document_loaders import DataFrameLoader
from langchain.vectorstores import Chroma

The document text is in the `Answer` column; the other columns become metadata.

In [5]:
df_loader = DataFrameLoader(data, page_content_column="Answer")
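
To make the mapping concrete, here is a minimal sketch of what happens to each row, written with a plain dataclass instead of LangChain's `Document`. The names `SimpleDocument` and `rows_to_documents` are hypothetical and exist only for this illustration; the sample row is made up.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def rows_to_documents(rows, page_content_column):
    """Turn each row (a dict) into a document: one column becomes the
    text, every other column becomes metadata."""
    documents = []
    for row in rows:
        metadata = {k: v for k, v in row.items() if k != page_content_column}
        documents.append(SimpleDocument(row[page_content_column], metadata))
    return documents


# Made-up row with the same column names as the dataset.
rows = [
    {
        "qtype": "symptoms",
        "Question": "What are the symptoms of X?",
        "Answer": "Fever and headache.",
    },
]
docs = rows_to_documents(rows, page_content_column="Answer")
print(docs[0].page_content)
print(docs[0].metadata)
```

Keeping the question and its type as metadata means they travel with the answer text and remain available after retrieval.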


In [6]:
df_document = df_loader.load()
display(df_document[:2])

[Document(page_content='LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmission from infected mother to fetus, and rarely, through organ transplantation.', metadata={'qtype': 'susceptibility', 'Question': 'Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?'}),
 Document(page_content='LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. \n                \nFor infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial 

Next we chunk the documents. The chunk size is a design decision: the larger the chunks, the larger the prompt and the slower the model's response.

We also need to keep the maximum prompt size in mind and make sure the retrieved chunks do not exceed it.
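
A rough back-of-the-envelope check can help when choosing the chunk size. The sketch below assumes that 4 chunks are pasted into each prompt and that one token corresponds to about 4 characters of English text. Both numbers are illustrative assumptions rather than measured values, and `estimate_prompt_tokens` is a hypothetical helper.

```python
def estimate_prompt_tokens(chunk_size, retrieved_chunks=4, question_chars=200,
                           chars_per_token=4):
    """Rough upper bound on prompt tokens when all retrieved chunks are
    pasted into a single prompt. All defaults are illustrative assumptions."""
    total_chars = chunk_size * retrieved_chunks + question_chars
    # Ceiling division so the estimate never rounds down.
    return -(-total_chars // chars_per_token)


for size in (500, 1250, 4000):
    print(size, estimate_prompt_tokens(size))
```

For an exact count you would use the model's tokenizer instead of the characters-per-token heuristic.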

In [7]:
from langchain.text_splitter import CharacterTextSplitter

In [8]:
text_splitter = CharacterTextSplitter(chunk_size=1250, chunk_overlap=100)
texts = text_splitter.split_documents(df_document)




The warnings appear because the splitter cannot always produce chunks of the requested size. `CharacterTextSplitter` only cuts at its separator (by default a blank line, `"\n\n"`), so a passage that contains no separator stays longer than `chunk_size`.
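
The behaviour can be reproduced with a simplified splitter. This is a sketch of the general idea (cut at a separator, then merge pieces up to the size limit), not LangChain's actual implementation; it ignores overlap, and `split_on_separator` is a hypothetical name.

```python
def split_on_separator(text, chunk_size, separator="\n\n"):
    """Greedy splitter: cut only at `separator`, then merge neighbouring
    pieces while the merged chunk stays within `chunk_size` characters."""
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = current + separator + piece if current else piece
        if current and len(candidate) > chunk_size:
            # Adding this piece would exceed the limit: close the chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# The middle paragraph has no separator inside it, so it cannot be cut.
text = "short one\n\n" + "x" * 30 + "\n\nshort two"
chunks = split_on_separator(text, chunk_size=20)
print([len(chunk) for chunk in chunks])
```

The 30-character paragraph ends up as a chunk of its own even though the limit is 20, which is the situation the warnings report.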

In [9]:
first_doc = texts[1]
print(first_doc.page_content)

LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. 
                
For infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial phase, which may last as long as a week, typically begins with any or all of the following symptoms: fever, malaise, lack of appetite, muscle aches, headache, nausea, and vomiting. Other symptoms appearing less frequently include sore throat, cough, joint pain, chest pain, testicular pain, and parotid (salivary gland) pain. 
                
Following a few days of recovery, a second phase of illness may occur. Symptoms may consist of meningitis (fever, headache, stiff neck, etc.), encephalitis (drowsiness, confusion, sensory disturbances, and/or motor abnormalities, such as paralysis), or meningoencephalitis (inflammation 

### Initialize the Embedding Model and Vector DB

We load the text-embedding-ada-002 model from OpenAI.

In [10]:
from getpass import getpass
from langchain_openai import OpenAIEmbeddings

OPENAI_API_KEY = getpass("OpenAI API Key: ")
model_name = 'text-embedding-ada-002'

embed = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=OPENAI_API_KEY
)

OpenAI API Key: ··········


The execution of this cell may take 3 to 5 minutes. If you want it to be faster, you can reduce the number of records in the dataset.

In [11]:
directory_cdb = '/content/drive/MyDrive/chromadb'
chroma_db = Chroma.from_documents(
    df_document, embed, persist_directory=directory_cdb
)

We are going to create three objects.

* The language model, which can be any OpenAI chat model; here we use gpt-3.5-turbo.
* The memory, responsible for keeping the recent conversation history that is added to the prompt.
* The retriever, used to fetch the information stored in ChromaDB.

In [25]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name='gpt-3.5-turbo',
    temperature=0.0
)

conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=4,  # Number of past exchanges (human + AI message pairs) kept in memory
    return_messages=True  # Return the history as message objects instead of one string
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=chroma_db.as_retriever()
)
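
Conceptually, a retrieval chain with the "stuff" strategy finds the documents most similar to the question and pastes all of them into a single prompt. The sketch below imitates that flow with a toy word-count embedding instead of an OpenAI model. The functions `embed`, `cosine`, `retrieve` and `build_stuff_prompt`, the prompt wording, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system uses a learned model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_stuff_prompt(question, documents, k=2):
    """'Stuff' strategy: paste every retrieved document into one prompt."""
    context = "\n\n".join(retrieve(question, documents, k))
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


documents = [
    "LCMV is carried by rodents such as house mice.",
    "Symptoms of influenza include fever and cough.",
    "Glaucoma damages the optic nerve.",
]
prompt = build_stuff_prompt("What are the symptoms of influenza?", documents, k=1)
print(prompt)
```

In the notebook, the vector store plays the role of `retrieve` and the chain builds the prompt before sending it to the language model.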

We can try the retrieval chain on its own to check that the information it returns is relevant.




In [13]:
qa.run("What is the main symptom of LCM?")


'The main symptom of Lethal Congenital Contracture Syndrome 1 (LCM) is skeletal muscle atrophy, which is seen in approximately 90% of patients with this condition.'

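The chain returns an answer built from the indexed documents, which is what we wanted to verify. Note, however, that it read "LCM" as Lethal Congenital Contracture Syndrome 1 rather than Lymphocytic Choriomeningitis; writing out the full disease name in the query should reduce this ambiguity.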

## Creating the Agent.

In [26]:
from langchain.agents import Tool

#Defining the list of tool objects to be used by LangChain.
tools = [
    Tool(
        name='Medical KB',
        func=qa.run,
        description=(
            'use this tool when answering medical knowledge queries to get '
            'more information about the topic'
        )
    )
]

In [15]:
from langchain.agents import initialize_agent

agent = initialize_agent(
    agent='chat-conversational-react-description',
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method='generate',
    memory=conversational_memory
)


### Using the Conversational Agent

To make queries we simply call the `agent` directly.

First I will try a request that is not related to the medical field.

In [16]:
agent("Give me the area of square of 2x2")

> Entering new AgentExecutor chain...
{
    "action": "Final Answer",
    "action_input": "The area of a square with side length 2 is 4 square units."
}

> Finished chain.


{'input': 'Give me the area of square of 2x2',
 'chat_history': [],
 'output': 'The area of a square with side length 2 is 4 square units.'}

Perfect, the model has responded without accessing the configured knowledge database.

Now I will try with a question that is also not related to health.

In [17]:
agent("Do you know the secret identity of SuperMan?")

> Entering new AgentExecutor chain...
{
    "action": "Final Answer",
    "action_input": "Superman's secret identity is Clark Kent."
}

> Finished chain.


{'input': 'Do you know the secret identity of SuperMan?',
 'chat_history': [HumanMessage(content='Give me the area of square of 2x2'),
  AIMessage(content='The area of a square with side length 2 is 4 square units.')],
 'output': "Superman's secret identity is Clark Kent."}

Again it has not accessed the knowledge base, because the model recognized that the question is unrelated to the medical database exposed through the tool.
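
The JSON blobs in the traces above are the agent's decisions: `Final Answer` ends the run, while any other action names a tool to call. The sketch below shows one way such a decision could be parsed and dispatched. It is a simplified illustration, not LangChain's internal code; `dispatch` and `fake_medical_kb` are hypothetical names.

```python
import json


def dispatch(llm_output, tools):
    """Parse the agent's JSON blob and either finish or call a tool.

    Returns a (kind, value) tuple where kind is 'final' or 'observation'.
    """
    decision = json.loads(llm_output)
    action, action_input = decision["action"], decision["action_input"]
    if action == "Final Answer":
        return "final", action_input
    if action not in tools:
        raise ValueError(f"Unknown tool: {action}")
    return "observation", tools[action](action_input)


# Hypothetical stand-in for the retrieval chain used as a tool.
def fake_medical_kb(query):
    return f"(knowledge base result for: {query})"


tools = {"Medical KB": fake_medical_kb}

print(dispatch('{"action": "Final Answer", "action_input": "4 square units."}', tools))
print(dispatch('{"action": "Medical KB", "action_input": "fever, dry cough"}', tools))
```

An observation is fed back to the model, which then decides again; the `max_iterations` setting caps how many times this loop may repeat.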

Now it's time to try with a question related to Medicine. Let's see if the model can understand that it should first look for information in the vector database at its disposal.

In [27]:
agent("It seems I have a bit of fever, dry cough, and mucus, what could it be?")

> Entering new AgentExecutor chain...
{
    "action": "Medical KB",
    "action_input": "fever, dry cough, mucus"
}
Observation: Fever, dry cough, and mucus can be symptoms of various conditions, including respiratory infections like the common cold or flu. In these cases, the fever is usually accompanied by other symptoms such as sore throat, body aches, and fatigue. The dry cough may be persistent and can be accompanied by clear or yellow/green mucus. It's important to note that if you are coughing up blood, you should contact your doctor immediately. If you are experiencing these symptoms, it is recommended to consult with a healthcare professional for an accurate diagnosis and appropriate treatment.
Thought: {
    "action": "Final Answer",
    "action_input": "Fever, dry cough, and mucus can be symptoms of various conditions, including respiratory infections like the common cold or flu. It is recommended to consult with a heal

{'input': 'It seems I have a bit of fever, dry cough, and mucus, what could it be?',
 'chat_history': [HumanMessage(content='Can it be prevented?'),
  AIMessage(content='To prevent Lymphocytic choriomeningitis (LCM), it is important to take precautions such as avoiding contact with rodents, practicing good hygiene, using insect repellent, wearing protective clothing, and practicing safe handling of rodents.'),
  HumanMessage(content='It seems I have a bit of fever, dry cough, and mucus, what could it be?'),
  AIMessage(content="Fever, dry cough, and mucus can be symptoms of various conditions, including respiratory infections like the common cold or flu. In these cases, the fever is usually accompanied by other symptoms such as sore throat, body aches, and fatigue. The dry cough may be persistent and can be accompanied by clear or yellow/green mucus. It's important to note that if you are coughing up blood, you should contact your doctor immediately. If you are experiencing these sympt

Perfect, the most important thing for us is that it has been able to identify that it should go to the medical database to search for information about the symptoms.

In [28]:
agent("Is this a serious illness?")

> Entering new AgentExecutor chain...
{
    "action": "Medical KB",
    "action_input": "Is fever, dry cough, and mucus a sign of a serious illness?"
}
Observation: Fever, dry cough, and mucus can be symptoms of a serious illness, but they can also be symptoms of more common conditions like a cold or flu. It's important to consider other factors such as the severity and duration of the symptoms, as well as any other accompanying symptoms, to determine the seriousness of the illness. If you are concerned about your symptoms, it is best to consult with a healthcare professional for an accurate diagnosis and appropriate treatment.
Thought: {
    "action": "Final Answer",
    "action_input": "Fever, dry cough, and mucus can be symptoms of a serious illness, but they can also be symptoms of more common conditions like a cold or flu. It's important to consider other factors such as the severity and duration of the symptoms, as well as a

{'input': 'Is this a serious illness?',
 'chat_history': [HumanMessage(content='It seems I have a bit of fever, dry cough, and mucus, what could it be?'),
  AIMessage(content="Fever, dry cough, and mucus can be symptoms of various conditions, including respiratory infections like the common cold or flu. In these cases, the fever is usually accompanied by other symptoms such as sore throat, body aches, and fatigue. The dry cough may be persistent and can be accompanied by clear or yellow/green mucus. It's important to note that if you are coughing up blood, you should contact your doctor immediately. If you are experiencing these symptoms, it is recommended to consult with a healthcare professional for an accurate diagnosis and appropriate treatment."),
  HumanMessage(content='The paciet is pregnat, is this a risk for the fetus?'),
  AIMessage(content='Pregnancy can carry certain risks for the fetus, depending on various factors. It is important for pregnant individuals to receive prope

The memory works as well: the agent resolved "this" in the follow-up question to the symptoms mentioned earlier. We can hold a conversation in which the model knows the previous questions and answers.
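
The windowed memory can be pictured as a fixed-size queue of exchanges. The sketch below is a minimal illustration of that idea using `collections.deque`; `WindowMemory` is a hypothetical class, not the LangChain implementation.

```python
from collections import deque


class WindowMemory:
    """Keeps only the last k (human, ai) exchanges."""

    def __init__(self, k):
        # A deque with maxlen drops the oldest item automatically.
        self.exchanges = deque(maxlen=k)

    def save(self, human, ai):
        self.exchanges.append((human, ai))

    def history(self):
        """Flatten the stored exchanges into an ordered message list."""
        messages = []
        for human, ai in self.exchanges:
            messages.append(("human", human))
            messages.append(("ai", ai))
        return messages


memory = WindowMemory(k=2)
memory.save("Q1", "A1")
memory.save("Q2", "A2")
memory.save("Q3", "A3")  # pushes the first exchange out of the window
print(memory.history())
```

A small window keeps the prompt short, at the cost of forgetting anything older than the last k exchanges.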

# Conclusions.
The experiment has been a small success. The vector database has been set up and filled with information from the dataset. A LangChain agent has been created, and it has been able to fetch information from the database only when it was necessary. And let's not forget that our chatbot has memory.

All this in just a few lines of code!


---