<a href="https://colab.research.google.com/github/mrhamedani/LLM-Agents/blob/main/9_Medical_Assistant_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating a Medical Assistant RAG Chat System with LangChain & ChromaDB


In [None]:
!pip install -q langchain==0.1.4
!pip install -q langchain-openai==0.0.5
!pip install -q langchainhub==0.1.14
!pip install -q datasets==2.16.1
!pip install -q chromadb==0.4.22
!pip install -q sentence-transformers  # required by HuggingFaceEmbeddings

In [None]:
from datasets import load_dataset
# Download the MedQuAD dataset (medical questions and answers about diseases) from the Hugging Face Hub.
data = load_dataset("keivalya/MedQuad-MedicalQnADataset", split='train')

In [3]:
data = data.to_pandas()
data

| | qtype | Question | Answer |
|---|---|---|---|
| 0 | susceptibility | Who is at risk for Lymphocytic Choriomeningiti... | LCMV infections can occur after exposure to fr... |
| 1 | symptoms | What are the symptoms of Lymphocytic Choriomen... | LCMV is most commonly recognized as causing ne... |
| 2 | susceptibility | Who is at risk for Lymphocytic Choriomeningiti... | Individuals of all ages who come into contact ... |
| 3 | exams and tests | How to diagnose Lymphocytic Choriomeningitis (... | During the first phase of the disease, the mos... |
| 4 | treatment | What are the treatments for Lymphocytic Chorio... | Aseptic meningitis, encephalitis, or meningoen... |
| ... | ... | ... | ... |
| 16402 | symptoms | What are the symptoms of Familial visceral myo... | What are the signs and symptoms of Familial vi... |
| 16403 | information | What is (are) Pseudopelade of Brocq ? | Pseudopelade of Brocq (PBB) is a slowly progre... |
| 16404 | symptoms | What are the symptoms of Pseudopelade of Brocq ? | What are the signs and symptoms of Pseudopelad... |
| 16405 | treatment | What are the treatments for Pseudopelade of Br... | Is there treatment or a cure for pseudopelade ... |


In [5]:
data = data[0:100]  # Keep only the first 100 rows to speed up embedding; comment this line out to use the full dataset.

In [6]:
from langchain.document_loaders import DataFrameLoader
from langchain.vectorstores import Chroma

In [7]:
df_loader = DataFrameLoader(data, page_content_column="Answer")  # The Answer column becomes each Document's text; the other columns are stored as metadata.
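Conceptually, the loader turns each DataFrame row into one document: the chosen column becomes the text and every other column becomes metadata. The sketch below mimics that mapping with plain dictionaries instead of LangChain `Document` objects; `rows_to_documents` is a hypothetical helper, not part of LangChain, and the sample rows are shortened versions of the first two dataset rows.

```python
def rows_to_documents(rows, page_content_column):
    """Mimic DataFrameLoader: one document per row, all other columns go to metadata."""
    documents = []
    for row in rows:
        metadata = {key: value for key, value in row.items() if key != page_content_column}
        documents.append({"page_content": row[page_content_column], "metadata": metadata})
    return documents

# Shortened sample rows (illustrative only).
rows = [
    {"qtype": "susceptibility", "Question": "Who is at risk for LCM?", "Answer": "LCMV infections can occur ..."},
    {"qtype": "symptoms", "Question": "What are the symptoms of LCM?", "Answer": "LCMV is most commonly ..."},
]
docs = rows_to_documents(rows, page_content_column="Answer")
print(docs[0]["metadata"])  # {'qtype': 'susceptibility', 'Question': 'Who is at risk for LCM?'}
```

This matches the metadata shown in the output below: `qtype` and `Question` are kept, while `Answer` only appears as the page content.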

In [8]:
df_document = df_loader.load()
display(df_document[:2])

[Document(page_content='LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmission from infected mother to fetus, and rarely, through organ transplantation.', metadata={'qtype': 'susceptibility', 'Question': 'Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?'}),
 Document(page_content='LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. \n                \nFor infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial 

Next, we can split the documents into chunks. The chunk size is a design decision: the larger the chunks, the larger the prompt sent to the model, and the slower its responses.

We also need to respect the model's maximum context size: the retrieved chunks, the prompt template, and the question must all fit within it.
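A rough back-of-the-envelope check can make this trade-off concrete. The sketch below assumes about 4 characters per token for English text, a fixed allowance for the prompt template, and a 4,096-token context window; these numbers are assumptions, not values taken from the notebook. It also assumes the retriever returns 4 chunks, which is the default for a Chroma retriever created with `as_retriever()`. The helper names are hypothetical.

```python
# Rough prompt-size estimate for a "stuff" chain (hypothetical helpers).
# Assumptions: ~4 characters per token, a 500-character template allowance,
# and a 4,096-token context window shared by prompt and answer.
CHARS_PER_TOKEN = 4

def estimate_prompt_tokens(chunk_size, k, question, template_chars=500):
    """Estimate tokens assuming each of the k stuffed chunks is at most chunk_size characters."""
    total_chars = chunk_size * k + len(question) + template_chars
    return total_chars // CHARS_PER_TOKEN

def fits_in_context(prompt_tokens, answer_tokens=256, context_window=4096):
    """The prompt and the generated answer must share the same context window."""
    return prompt_tokens + answer_tokens <= context_window

tokens = estimate_prompt_tokens(chunk_size=1250, k=4, question="What is the main symptom of LCM?")
print(tokens, fits_in_context(tokens))  # 1383 True
```

Quadrupling `chunk_size` to 5000 would push the same estimate past the assumed window, which is why the chunk size has to be chosen with the model's limit in mind.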

In [9]:
from langchain.text_splitter import CharacterTextSplitter

In [10]:
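# Split on newlines into chunks of about 1250 characters, with 100 characters of overlap between consecutive chunks.
text_splitter = CharacterTextSplitter(chunk_size=1250, separator="\n", chunk_overlap=100)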
texts = text_splitter.split_documents(df_document)

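The splitter only cuts text at the separator (here a newline, `"\n"`), never in the middle of a line. When a single line is longer than `chunk_size`, it cannot be divided further, so the splitter keeps it as an oversized chunk and may emit a warning that it created a chunk longer than the specified 1250 characters.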
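A simplified sketch of separator-based splitting shows why this happens. It greedily merges separator-delimited pieces until adding the next one would exceed `chunk_size`; a single piece that is already too long becomes its own oversized chunk. `split_on_separator` is a hypothetical helper, not LangChain's implementation, and it ignores `chunk_overlap` for brevity.

```python
def split_on_separator(text, separator, chunk_size):
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size (when possible)."""
    pieces = [p for p in text.split(separator) if p]
    chunks, current, current_len = [], [], 0
    for piece in pieces:
        # Joining to a non-empty chunk also costs one separator.
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        # An over-long piece is still appended: it cannot be cut further.
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks

text = "a" * 10 + "\n" + "b" * 10 + "\n" + "c" * 30
chunks = split_on_separator(text, separator="\n", chunk_size=25)
print([len(c) for c in chunks])  # [21, 30]
```

The second chunk is 30 characters even though `chunk_size` is 25, because the 30-character line contains no separator.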

In [11]:
sample_chunk = texts[1]  # the second chunk (index 1)
print(sample_chunk.page_content)

LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. 
                
For infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial phase, which may last as long as a week, typically begins with any or all of the following symptoms: fever, malaise, lack of appetite, muscle aches, headache, nausea, and vomiting. Other symptoms appearing less frequently include sore throat, cough, joint pain, chest pain, testicular pain, and parotid (salivary gland) pain.


### Initialize the Embedding Model and Vector DB

In [16]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

In [19]:
model_name = "sentence-transformers/all-MiniLM-L6-v2"
embed = HuggingFaceEmbeddings(model_name=model_name)

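# Persist the database to Google Drive. In Colab this requires
# `from google.colab import drive; drive.mount('/content/drive')` first; use a local path otherwise.
directory_cdb = "/content/drive/MyDrive/chromadb"

# Embed the chunks produced by the text splitter, not the unsplit documents.
chroma_db = Chroma.from_documents(texts, embed, persist_directory=directory_cdb)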
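Retrieval boils down to embedding the question and returning the stored chunks whose vectors are closest to it. The toy sketch below uses hand-made 3-dimensional vectors and exact cosine similarity to show the idea; the vectors are made up, real all-MiniLM-L6-v2 embeddings have 384 dimensions, and Chroma uses an approximate nearest-neighbour index and its own distance settings rather than this brute-force loop.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" for four stored chunks (made-up values).
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(top_k([1.0, 0.0, 0.0], doc_vecs, k=2))  # [0, 2]
```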

In [20]:
from getpass import getpass

from langchain_openai import OpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA

# Prompt for the key so it is not hard-coded in the notebook.
OPENAI_API_KEY = getpass("OpenAI API key: ")

llm = OpenAI(openai_api_key=OPENAI_API_KEY,
             temperature=0.0)

conversational_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=4,  # number of recent exchanges (human message + AI reply) kept in memory
    return_messages=True  # return the history as a list of message objects instead of a single string
)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=chroma_db.as_retriever()
)
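With `chain_type="stuff"`, the retrieved chunks are concatenated ("stuffed") into a single prompt together with the question, and the LLM answers in one call. A minimal sketch of that prompt assembly follows; `build_stuff_prompt` is hypothetical and its template is modelled loosely on LangChain's default QA prompt, so the exact wording is an assumption.

```python
def build_stuff_prompt(question, retrieved_chunks):
    """Join all retrieved chunks into one context block and append the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following pieces of context to answer the question. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nHelpful Answer:"
    )

prompt = build_stuff_prompt(
    "What is the main symptom of LCM?",
    ["LCMV is most commonly ...", "For infected persons who do become ill ..."],
)
print(prompt)
```

The "don't know" instruction is why the chain can reply "I don't know." when the retrieved context is unrelated to the question.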

Before building the agent, we can test the `RetrievalQA` chain on its own to check whether the answers it builds from the retrieved documents are relevant.

In [None]:
qa.run("What is the main symptom of LCM?")



' The main symptom of LCM is a biphasic febrile illness, which includes symptoms such as fever, malaise, lack of appetite, muscle aches, headache, nausea, and vomiting.'

The answer is built from the retrieved LCMV symptom description, so the retriever is returning relevant context.

## Creating the Agent

In [None]:
from langchain.agents import Tool

#Defining the list of tool objects to be used by LangChain.
tools = [
    Tool(
        name='Medical KB',
        func=qa.run,
        description=(
            """use this tool when answering medical knowledge queries to get
            more information about the topic"""
        )
    )
]

In [None]:
from langchain.agents import create_react_agent
from langchain import hub

prompt = hub.pull("hwchase17/react-chat")
agent = create_react_agent(
    tools=tools,
    llm=llm,
    prompt=prompt,
)
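The `hwchase17/react-chat` prompt makes the LLM reply in a fixed text format, visible in the logs below: either an `Action:` / `Action Input:` pair or a `Final Answer:`. A simplified parser for that format might look like this; `parse_react_step` is a hypothetical helper, not LangChain's own output parser, which handles more edge cases.

```python
import re

def parse_react_step(llm_output):
    """Return ("final", answer) or ("action", tool_name, tool_input) from one ReAct reply."""
    final = re.search(r"Final Answer:\s*(.*)", llm_output, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())
    action = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", llm_output, re.DOTALL)
    if action:
        return ("action", action.group(1).strip(), action.group(2).strip())
    raise ValueError(f"Could not parse LLM output: {llm_output!r}")

print(parse_react_step("Thought: Do I need to use a tool? Yes\nAction: Medical KB\nAction Input: Botulism"))
# ('action', 'Medical KB', 'Botulism')
```

When the reply matches neither pattern, this sketch raises an error; with `handle_parsing_errors=True`, LangChain's `AgentExecutor` instead sends the parsing error back to the LLM as an observation so it can retry.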

In [None]:
# Create an agent executor by passing in the agent and tools
from langchain.agents import AgentExecutor
agent_executor = AgentExecutor(agent=agent,
                               tools=tools,
                               verbose=True,
                               memory=conversational_memory,
                               max_iterations=30,
                               max_execution_time=600,
                               handle_parsing_errors=True
                               )
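`max_iterations` and `max_execution_time` bound the agent's think, act, observe loop so that a confused agent cannot keep calling tools forever. The stripped-down sketch below illustrates such a loop, driven by a scripted fake LLM that returns already-parsed steps; `run_agent`, `scripted_llm`, and the stop messages are hypothetical and this is not `AgentExecutor`'s code.

```python
import time

def run_agent(next_step, tools, question, max_iterations=30, max_execution_time=600.0):
    """Alternate LLM steps and tool calls until a final answer or a budget runs out."""
    start = time.monotonic()
    scratchpad = []  # (tool_name, tool_input, observation) history fed back to the LLM
    for _ in range(max_iterations):
        if time.monotonic() - start > max_execution_time:
            return "Agent stopped due to time limit."
        step = next_step(question, scratchpad)
        if step[0] == "final":
            return step[1]
        _, tool_name, tool_input = step
        observation = tools[tool_name](tool_input)  # dispatch to the tool by name
        scratchpad.append((tool_name, tool_input, observation))
    return "Agent stopped due to iteration limit."

def scripted_llm(question, scratchpad):
    """Fake LLM: query the knowledge base once, then answer from the observation."""
    if not scratchpad:
        return ("action", "Medical KB", question)
    return ("final", f"Based on the KB: {scratchpad[-1][2]}")

tools = {"Medical KB": lambda query: "I don't know."}
print(run_agent(scripted_llm, tools, "Area of square"))  # Based on the KB: I don't know.
```

An LLM that never produces a final answer would be stopped after `max_iterations` steps instead of looping indefinitely.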

### Using the Conversational Agent

To make queries, we call `agent_executor.invoke()` with a dictionary containing the `input` key.

First, I will try a request that is not related to the medical field.

In [None]:
agent_executor.invoke({"input": "Give me the area of square of 2x2"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Area of square[0m[36;1m[1;3m I don't know.[0m[32;1m[1;3mDo I need to use a tool? No
Final Answer: The area of a square with sides of 2 units is 4 square units.[0m

[1m> Finished chain.[0m


{'input': 'Give me the area of square of 2x2',
 'chat_history': [],
 'output': 'The area of a square with sides of 2 units is 4 square units.'}

The agent did call the Medical KB tool first, but the knowledge base answered "I don't know", so the agent fell back on the LLM's own knowledge to give the correct answer.

Now I will try another question that is not related to health.

In [None]:
agent_executor.invoke({"input": "Do you know who is Clark Kent?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? No
Final Answer: Clark Kent is the secret identity of the superhero Superman.[0m

[1m> Finished chain.[0m


{'input': 'Do you know who is Clark Kent?',
 'chat_history': [HumanMessage(content='Give me the area of square of 2x2'),
  AIMessage(content='The area of a square with sides of 2 units is 4 square units.')],
 'output': 'Clark Kent is the secret identity of the superhero Superman.'}

This time the agent did not call the tool at all: it recognized that the question has nothing to do with the medical knowledge base and answered directly. Notice also that `chat_history` now contains the previous exchange.

Now it's time to try a medical question and see whether the agent understands that it should first look for information in the vector database at its disposal. First, we clear the memory so the previous, unrelated exchanges are not carried into this conversation.

In [None]:
agent_executor.memory.clear()  # start the medical conversation with an empty history

In [None]:
agent_executor.invoke({"input": """I have a patient that can have Botulism,
how can I confirm the diagnosis?"""})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? Yes
Action: Medical KB
Action Input: Botulism[0m[36;1m[1;3m Botulism is a rare but serious paralytic illness caused by a nerve toxin produced by certain bacteria. It can be contracted through contaminated food, wounds, or ingestion of bacterial spores. Symptoms include muscle paralysis, difficulty swallowing, and respiratory failure. Treatment includes antitoxin, supportive care, and removal of the source of the toxin.[0m[32;1m[1;3mDo I need to use a tool? No
Final Answer: To confirm the diagnosis, you can perform a physical exam and order laboratory tests, such as a stool or blood test, to detect the presence of the bacteria or its toxin. It is important to act quickly, as botulism can be life-threatening if left untreated.[0m

[1m> Finished chain.[0m


{'input': 'I have a patient that can have Botulism,\nhow can I confirm the diagnosis?',
 'chat_history': [],
 'output': 'To confirm the diagnosis, you can perform a physical exam and order laboratory tests, such as a stool or blood test, to detect the presence of the bacteria or its toxin. It is important to act quickly, as botulism can be life-threatening if left untreated.'}

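The key point is that the agent recognized it should consult the medical knowledge base: it queried the Medical KB tool with "Botulism". Note, however, that the retrieved text describes the disease, its symptoms, and its treatment but not how to diagnose it, so the diagnostic steps in the final answer come from the LLM's own knowledge rather than from the dataset.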

In [None]:
agent_executor.invoke({"input": "Is this an important illness?"})



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m
Thought: Do I need to use a tool? No
Final Answer: Yes, botulism is a serious illness that can be life-threatening if left untreated. It is important to seek medical attention and confirm the diagnosis as soon as possible.[0m

[1m> Finished chain.[0m


{'input': 'Is this an important illness?',
 'chat_history': [HumanMessage(content='I have a patient that can have Botulism,\nhow can I confirm the diagnosis?'),
  AIMessage(content='To confirm the diagnosis, you can perform a physical exam and order laboratory tests, such as a stool or blood test, to detect the presence of the bacteria or its toxin. It is important to act quickly, as botulism can be life-threatening if left untreated.')],
 'output': 'Yes, botulism is a serious illness that can be life-threatening if left untreated. It is important to seek medical attention and confirm the diagnosis as soon as possible.'}

The memory works as intended: `chat_history` contains the previous question and answer, so the agent understood that "this" refers to botulism and answered without querying the database again.
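`ConversationBufferWindowMemory(k=4)` keeps only the last `k` exchanges (a human message plus the AI reply), so older turns eventually drop out of `chat_history`. A minimal sketch of that windowing with a `deque`; `WindowMemory` is a hypothetical class, not LangChain's implementation.

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (human, ai) exchanges."""

    def __init__(self, k=4):
        self.exchanges = deque(maxlen=k)  # the oldest exchange is dropped automatically

    def save(self, human, ai):
        self.exchanges.append((human, ai))

    def chat_history(self):
        """Flatten the window into alternating human/AI messages, oldest first."""
        messages = []
        for human, ai in self.exchanges:
            messages.append(("human", human))
            messages.append(("ai", ai))
        return messages

    def clear(self):
        self.exchanges.clear()

memory = WindowMemory(k=1)
memory.save("I have a patient that can have Botulism, how can I confirm the diagnosis?", "To confirm the diagnosis, ...")
memory.save("Is this an important illness?", "Yes, botulism is a serious illness ...")
print(memory.chat_history())  # only the second exchange survives with k=1
```

With the notebook's `k=4`, up to eight messages (four exchanges) would be passed to the agent on each turn.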

# Conclusions
The experiment has been a small success. The vector database was configured and populated with information from the dataset, and a LangChain agent was created that consults the database when a question looks medical, although, as the square-area example showed, it can still call the tool when it is not needed. The chatbot also keeps a short-term conversational memory.

All of this in just a few lines of code!


---