<a href="https://colab.research.google.com/github/ritwiks9635/Medical_FAQ_Chatbot/blob/main/Medical_FAQ_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import os
from google.colab import userdata


os.environ['KAGGLE_USERNAME'] = userdata.get('KAGGLE_USERNAME')
os.environ['KAGGLE_KEY'] = userdata.get("KAGGLE_API_KEY")
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
# `path` must be a local folder, not a URL; download the zip into /content.
api.dataset_download_files("thedevastator/comprehensive-medical-q-a-dataset", path="/content")

Dataset URL: https://www.kaggle.com/datasets/thedevastator/comprehensive-medical-q-a-dataset


In [2]:
!unzip /content/comprehensive-medical-q-a-dataset.zip

Archive:  /content/comprehensive-medical-q-a-dataset.zip
  inflating: train.csv               


In [15]:
!pip install -q -U langchain openai chromadb pandas langchain-google-genai langchain-community "google-generativeai>=0.8.3"


In [23]:
import os
import pandas as pd

from google.colab import userdata
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

CHROMA_DIR = "chroma_db"  # folder where the Chroma index is persisted
GOOGLE_API_KEY = userdata.get("GOOGLE_API_KEY")
os.environ["GOOGLE_API_KEY"] = GOOGLE_API_KEY

data = pd.read_csv("/content/train.csv")
data.head()

Unnamed: 0,qtype,Question,Answer
0,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,LCMV infections can occur after exposure to fr...
1,symptoms,What are the symptoms of Lymphocytic Choriomen...,LCMV is most commonly recognized as causing ne...
2,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...
3,exams and tests,How to diagnose Lymphocytic Choriomeningitis (...,"During the first phase of the disease, the mos..."
4,treatment,What are the treatments for Lymphocytic Chorio...,"Aseptic meningitis, encephalitis, or meningoen..."


In [5]:
data.shape

(16407, 3)

In [6]:
data.isnull().sum()

Unnamed: 0,0
qtype,0
Question,0
Answer,0


In [8]:
def build_documents(df: pd.DataFrame) -> list[Document]:
    """Turn each Q&A row into one Document, keeping question and qtype as metadata."""
    docs = []
    for _, row in df.iterrows():
        # Column names in train.csv are capitalized: "Question" and "Answer".
        q = str(row["Question"]).strip()
        a = str(row["Answer"]).strip()
        qtype = str(row["qtype"]).strip() if "qtype" in df.columns else ""
        text = f"Question: {q}\nAnswer: {a}"
        metadata = {"question": q, "qtype": qtype}
        docs.append(Document(page_content=text, metadata=metadata))
    return docs

raw_docs = build_documents(data)
print(f"Built {len(raw_docs)} base documents")
raw_docs[:2]

Built 16407 base documents


[Document(metadata={'question': 'Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?', 'qtype': 'susceptibility'}, page_content='Question: Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?\nAnswer: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Transmission may also occur when these materials are directly introduced into broken skin, the nose, the eyes, or the mouth, or presumably, via the bite of an infected rodent. Person-to-person transmission has not been reported, with the exception of vertical transmission from infected mother to fetus, and rarely, through organ transplantation.'),
 Document(metadata={'question': 'What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?', 'qtype': 'symptoms'}, page_content='Question: What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?\nAnswer: LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection w
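The sample documents above show stray spacing before the question mark and, in one case, a doubled `? ?`. A minimal sketch of a cleanup step follows. The helper name `normalize_question` is hypothetical and not part of the notebook, and it assumes that tidying the punctuation is desirable before the text is embedded.

```python
import re


def normalize_question(text: str) -> str:
    """Collapse runs of whitespace and reduce a trailing ' ?' or '? ?' to one '?'."""
    # Collapse any whitespace run (including newlines) into a single space.
    text = re.sub(r"\s+", " ", text).strip()
    # Replace one or more trailing question marks, with optional spaces, by a single '?'.
    return re.sub(r"(?:\s*\?)+$", "?", text)


print(normalize_question("Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?"))
print(normalize_question("What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?"))
```

If used, it would be applied to `q` inside `build_documents` before the `Document` is created; text without a trailing question mark passes through unchanged.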

In [9]:
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=200,
    length_function=len
)

docs = splitter.split_documents(raw_docs)
print(f"Split into {len(docs)} chunks")

for d in docs[:2]:
    print(d.page_content[:200], "\n---", d.metadata, "\n")

Split into 48798 chunks
Question: Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?
Answer: LCMV infections can occur after exposure to fresh urine, droppings, saliva, or nesting materials from infected rodents.  Tran 
--- {'question': 'Who is at risk for Lymphocytic Choriomeningitis (LCM)? ?', 'qtype': 'susceptibility'} 

Question: What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?
Answer: LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms 
--- {'question': 'What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?', 'qtype': 'symptoms'} 
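To make the effect of `chunk_size=800` and `chunk_overlap=200` concrete, here is a simplified sketch of fixed-width chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph and line breaks); the function `chunk_text` is a hypothetical stand-in that only illustrates the sliding-window idea.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(2000))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])
```

Because only the first chunk of a long answer begins with the `Question:` line, the `question` value copied into each chunk's metadata is what ties later chunks back to their source pair.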



In [18]:
embedding = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

In [19]:
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embedding,
    persist_directory=CHROMA_DIR
)

# No explicit persist() call: Chroma 0.4+ writes to persist_directory automatically.
print(f"Chroma DB built at: {CHROMA_DIR}, total docs stored: {len(docs)}")

Chroma DB built at: chroma_db, total docs stored: 48798
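Embedding all 48798 chunks in a single `from_documents` call can run into API rate limits or time out. One possible workaround, sketched here under that assumption, is to feed the store in batches. The helper `batched` and the batch size of 500 are hypothetical choices, not something the notebook uses.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive slices of items, each holding at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Stand-in for the notebook's `docs` list so the sketch runs on its own.
fake_docs = list(range(48798))
n_batches = sum(1 for _ in batched(fake_docs, 500))
print(n_batches)

# In the notebook, each batch could be passed to the store, for example:
#     for batch in batched(docs, 500):
#         vectorstore.add_documents(batch)
```

A pause between batches may also be needed, depending on the quota attached to the API key.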



In [22]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

query = "What are the symptoms of Lymphocytic Choriomeningitis?"
results = retriever.invoke(query)  # replaces the deprecated get_relevant_documents()

print("Query:", query)
print("\nTop retrieved docs:")
for r in results:
    print("-----")
    print(r.page_content[:500])
    print("Meta:", r.metadata)

Query: What are the symptoms of Lymphocytic Choriomeningitis?

Top retrieved docs:
-----
Question: What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?
Answer: LCMV is most commonly recognized as causing neurological disease, as its name implies, though infection without symptoms or mild febrile illnesses are more common clinical manifestations. 
                
For infected persons who do become ill, onset of symptoms usually occurs 8-13 days after exposure to the virus as part of a biphasic febrile illness. This initial phase, which may last as long as a week, typically 
Meta: {'question': 'What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?', 'qtype': 'symptoms'}
-----
Question: How to diagnose Lymphocytic Choriomeningitis (LCM) ?
Answer: During the first phase of the disease, the most common laboratory abnormalities are a low white blood cell count (leukopenia) and a low platelet count (thrombocytopenia). Liver enzymes in the serum may also be mildly elevated. A
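Conceptually, the retriever embeds the query and returns the `k` stored chunks whose vectors lie closest to it. The sketch below shows that ranking with cosine similarity on toy 2-D vectors. It is an illustration only: Chroma's actual index and its default distance metric may differ, and `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.1], toy_docs, k=2))
```

With real embeddings the vectors have hundreds of dimensions, but the ranking step is the same idea.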

In [28]:
gemini_model = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash-latest",
    temperature=0.4,
    google_api_key=GOOGLE_API_KEY
)

In [26]:
vectorstore = Chroma(
    persist_directory=CHROMA_DIR,
    embedding_function=embedding
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
print("Retriever ready. Connected to Chroma index.")

Retriever ready. Connected to Chroma index.



In [27]:
def build_prompt(question: str, docs: list[Document]) -> str:
    context = "\n\n".join([d.page_content for d in docs])
    prompt = f"""
You are a helpful, careful medical assistant chatbot.

Use ONLY the following context to answer the user question.
If the answer is not in the context, say:
"I don't have enough information from the knowledge base. Please consult a healthcare professional."

Context:
{context}

User Question: {question}

Answer clearly and concisely.
"""
    return prompt


In [38]:
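def generate_answer(question: str, docs: list[Document], model) -> str:
    # Temperature is fixed when the model is constructed (0.4 for gemini_model),
    # so the previously unused `temperature` argument has been dropped.
    prompt = build_prompt(question, docs)
    response = model.invoke(prompt)
    return response.content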

In [42]:
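def answer_query(question: str):
    docs = retriever.invoke(question)  # replaces the deprecated get_relevant_documents()
    answer = generate_answer(question, docs, gemini_model)

    print("\nUser:", question)
    print("\nBot:", answer)

    # List the source question of every retrieved chunk.
    print("\n--- Retrieved Context ---")
    for d in docs:
        print("-", d.metadata.get("question", ""))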

In [40]:
answer_query("What are the symptoms of Lymphocytic Choriomeningitis?")


User: What are the symptoms of Lymphocytic Choriomeningitis?

Bot: The most common symptoms of Lymphocytic Choriomeningitis (LCM) are fever, malaise, lack of appetite, muscle aches, headache, nausea, and vomiting.  Less frequent symptoms include sore throat, cough, joint pain, chest pain, testicular pain, and parotid (salivary gland) pain.  Symptoms usually begin 8-13 days after exposure and may last up to a week.

--- Retrieved Context ---
- What are the symptoms of Lymphocytic Choriomeningitis (LCM) ?
- How to diagnose Lymphocytic Choriomeningitis (LCM) ?
- What are the treatments for Lymphocytic Choriomeningitis (LCM) ?
- What are the symptoms of Langerhans cell histiocytosis ?


In [41]:
answer_query("What foods are good for heart health?")


User: What foods are good for heart health?

Bot: A heart-healthy diet includes whole grains, fruits (apples, bananas, oranges, pears, prunes), vegetables (broccoli, cabbage, carrots), legumes (kidney beans, lentils, chickpeas, black-eyed peas, lima beans), fat-free or low-fat dairy (skim milk), and fish high in omega-3 fatty acids (salmon, tuna, trout) about twice a week.

--- Retrieved Context ---
- What are the treatments for High Blood Pressure ?
- What are the treatments for Heart Attack ?
- What are the treatments for Heart Valve Disease ?
- What are the treatments for Coronary Heart Disease ?
