<a href="https://colab.research.google.com/github/parisa-kavian/MediChat-RAG/blob/main/MedicChat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install -q langchain langchain_openai langchain_community faiss-cpu tiktoken wikipedia arxiv

# HeartWise-ChatBot (No Memory)
A RAG-based medical chatbot using LangChain and Mayo Clinic data, powered by OpenAI's GPT-4o-mini. This version does not maintain conversation history, so each question is answered independently. It is suited to single-query lookups about heart disease and the related topics covered by the indexed pages (cancer, diabetes, healthy eating, and high blood pressure).
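The retrieve-then-generate flow that the next cell builds with LangChain can be sketched without any dependencies. This is an illustrative toy, not the notebook's code: the bag-of-words `embed` function and the four-sentence `corpus` are hypothetical stand-ins for `OpenAIEmbeddings` and the scraped Mayo Clinic chunks. Chunks are ranked by Euclidean distance between length-normalised vectors (LangChain's FAISS wrapper defaults to L2 distance). Because no history is kept, `build_messages` returns only a system message carrying the retrieved context plus the current question.

```python
import math
import re
from collections import Counter


def embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: length-normalised word counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks whose vectors are closest to the query (Euclidean distance)."""
    vocab = sorted({w for c in chunks for w in re.findall(r"[a-z]+", c.lower())})
    q = embed(query, vocab)
    return sorted(chunks, key=lambda c: math.dist(q, embed(c, vocab)))[:k]


def build_messages(question: str, context_chunks: list[str]) -> list[tuple[str, str]]:
    """Stateless prompt: a system message with the context, then the question only."""
    context = "\n\n".join(context_chunks)
    system = (
        "You are a helpful medical assistant. Answer using this context:\n"
        f"{context}\nAlways advise users to consult a doctor for personalized advice."
    )
    return [("system", system), ("human", question)]


corpus = [
    "Chest pain and shortness of breath are symptoms of heart disease.",
    "High blood pressure can damage arteries over time.",
    "A healthy diet includes vegetables, fruits and whole grains.",
    "Diabetes affects how the body uses blood sugar.",
]
question = "What are symptoms of heart disease?"
messages = build_messages(question, top_k(question, corpus, k=2))
for role, content in messages:
    print(f"{role}: {content}\n")
```

Replacing `embed` with a real embedding model and `top_k` with `vectorstore.as_retriever(search_kwargs={"k": 3})` roughly corresponds to the structure of the chain below.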

In [None]:
import os
import pandas as pd
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import BeautifulSoupTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

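# API key: read it from the environment or prompt for it; never hard-code secrets in a notebook
from getpass import getpass

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
openai_api_key = os.environ.get("OPENAI_API_KEY")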

# URLs from Mayo Clinic
urls = [
    "https://www.mayoclinic.org/diseases-conditions/heart-disease/symptoms-causes/syc-20353118",
    "https://www.mayoclinic.org/diseases-conditions/cancer/symptoms-causes/syc-20370588",
    "https://www.mayoclinic.org/diseases-conditions/diabetes/symptoms-causes/syc-20371444",
    "https://www.mayoclinic.org/healthy-lifestyle/nutrition-and-healthy-eating/in-depth/healthy-diets/art-20045016",
    "https://www.mayoclinic.org/diseases-conditions/high-blood-pressure/symptoms-causes/syc-20373410"
]

# Persist the URL list to urls.xlsx and read it back (pandas uses openpyxl for .xlsx files)
df = pd.DataFrame(urls, columns=["url"])
df.to_excel("urls.xlsx", index=False)

df = pd.read_excel("urls.xlsx")
urls = df["url"].tolist()

# Scraper
loader = AsyncHtmlLoader(urls)
docs = loader.load()

bs_transformer = BeautifulSoupTransformer()
docs_clean = bs_transformer.transform_documents(
    docs, tags_to_extract=["p", "li", "div", "article"]
)

# Chunk
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)
chunks = text_splitter.split_documents(docs_clean)

# Embedding
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

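# Vectorstore: build a FAISS index, save it, and reload it
# (the path assumes Google Drive is mounted at /content/drive, e.g. with google.colab.drive.mount)
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("/content/drive/MyDrive/mayoclinic/mayoclinic_faiss")
# allow_dangerous_deserialization unpickles the stored docstore; only enable it for index files you created yourself
vectorstore = FAISS.load_local("/content/drive/MyDrive/mayoclinic/mayoclinic_faiss", embeddings, allow_dangerous_deserialization=True)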

# Define LLM
llm = ChatOpenAI(model="gpt-4o-mini", openai_api_key=openai_api_key, temperature=0.2)

# Define prompt (the chat_history placeholder is always empty in this version)
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful medical assistant. Provide accurate health information based on the following context from reliable sources:\n\n{context}\n\nAlways advise users to consult a doctor for personalized advice."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{question}")
])

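# RAG Chain
def format_docs(docs):
    # Join retrieved chunks into plain text instead of inserting raw Document reprs into the prompt
    return "\n\n".join(doc.page_content for doc in docs)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # top-3 chunks per question

chain = (
    RunnableParallel({
        "context": itemgetter("question") | retriever | format_docs,
        "question": itemgetter("question"),  # pass only the question string, not the whole input dict
        "chat_history": lambda x: x.get("chat_history", [])
    })
    | prompt
    | llm  # temperature is already set on the ChatOpenAI instance
    | StrOutputParser()
)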

# Sample question
from langchain_community.callbacks import get_openai_callback

question = "What are the common symptoms and causes of heart disease?"

# cb collects token counts and estimated cost (cb.total_tokens, cb.total_cost) for calls made inside this block
with get_openai_callback() as cb:
    response = chain.invoke({
        "question": question,
        "chat_history": []
    })

print("Response:", response)

Fetching pages: 100%|##########| 5/5 [00:00<00:00,  7.40it/s]


Response: Common symptoms of heart disease can vary depending on the specific type of heart condition, but some general symptoms include:

1. **Chest Pain**: Discomfort or pain in the chest, which may feel like pressure, squeezing, fullness, or pain.
2. **Shortness of Breath**: Difficulty breathing during activity or at rest, and sometimes waking up at night short of breath.
3. **Fatigue**: Unusual tiredness or lack of energy.
4. **Dizziness or Lightheadedness**: Feeling faint or dizzy, which can sometimes lead to fainting.
5. **Irregular Heartbeats**: Palpitations, which may feel like a racing, pounding, or fluttering heartbeat.
6. **Swelling**: Swelling in the legs, ankles, or feet.

### Common Causes of Heart Disease:

1. **Coronary Artery Disease**: Often caused by atherosclerosis, where plaque builds up in the arteries, leading to narrowed or blocked arteries.
2. **High Cholesterol**: Excess cholesterol can contribute to plaque formation in the arteries.
3. **High Blood Pressure**: Ca
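The chunking step in the cell above passes `chunk_size=1000` and `chunk_overlap=200` to `RecursiveCharacterTextSplitter`. That splitter tries paragraph, line, and word separators before cutting inside words, so its real boundaries differ from this sketch; the hypothetical `sliding_window_chunks` below only illustrates what the two numbers mean. Each window holds at most `chunk_size` characters and starts `chunk_size - chunk_overlap` (here 800) characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Hypothetical fixed-size splitter: windows of chunk_size chars overlapping by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 800 characters between window starts by default
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2500))  # 2,500 characters of dummy text
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])             # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True
```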

# HeartWise-ChatBot (With Memory)
A RAG-based medical chatbot using LangChain and Mayo Clinic data, powered by OpenAI's GPT-4o-mini. This version keeps conversation history, limited to the last 4 messages (the two most recent question/answer exchanges), so it can answer follow-up questions in context. It is suited to interactive, multi-turn conversations about heart disease and related topics (cancer, diabetes, healthy eating, and high blood pressure).
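Each turn appends one `HumanMessage` and one `AIMessage`, so a 4-message limit keeps the two most recent exchanges. A minimal sketch of the same windowing in plain Python, assuming a `collections.deque` with `maxlen=4` and `(role, content)` tuples in place of LangChain message objects; `fake_answer` is a hypothetical stand-in for `chain.invoke`:

```python
from collections import deque

MAX_MESSAGES = 4  # two exchanges: each turn adds one human and one AI message

history = deque(maxlen=MAX_MESSAGES)  # holds (role, content) tuples; oldest items fall off


def fake_answer(question: str, past: list) -> str:
    """Hypothetical stand-in for chain.invoke: reports how much history it received."""
    return f"answer to {question!r} (saw {len(past)} earlier messages)"


def ask(question: str) -> str:
    past = list(history)  # what the prompt's chat_history placeholder would receive
    answer = fake_answer(question, past)
    history.append(("human", question))
    history.append(("ai", answer))  # deque discards the oldest entries beyond maxlen
    return answer


for q in ["Q1", "Q2", "Q3", "Q4"]:
    print(ask(q))

print([content for role, content in history if role == "human"])  # ['Q3', 'Q4']
```

The notebook's `ask_question` slices the list before each call instead of trimming on append; either way the prompt sees at most four earlier messages, and the prompt for the fourth question no longer includes the first exchange.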

In [None]:
import os
import pandas as pd
from langchain_community.document_loaders import AsyncHtmlLoader
from langchain_community.document_transformers import BeautifulSoupTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.messages import HumanMessage, AIMessage
from langchain_community.callbacks import get_openai_callback

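# API key: read it from the environment or prompt for it; never hard-code secrets in a notebook
from getpass import getpass

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
openai_api_key = os.environ.get("OPENAI_API_KEY")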

# URLs from Mayo Clinic
urls = [
    "https://www.mayoclinic.org/diseases-conditions/heart-disease/symptoms-causes/syc-20353118",
    "https://www.mayoclinic.org/diseases-conditions/cancer/symptoms-causes/syc-20370588",
    "https://www.mayoclinic.org/diseases-conditions/diabetes/symptoms-causes/syc-20371444",
    "https://www.mayoclinic.org/healthy-lifestyle/nutrition-and-healthy-eating/in-depth/healthy-diets/art-20045016",
    "https://www.mayoclinic.org/diseases-conditions/high-blood-pressure/symptoms-causes/syc-20373410"
]

df = pd.DataFrame(urls, columns=["url"])
df.to_excel("urls.xlsx", index=False)

df = pd.read_excel('urls.xlsx')
urls = df['url'].values.tolist()

# Scraper
loader = AsyncHtmlLoader(urls)
docs = loader.load()

bs_transformer = BeautifulSoupTransformer()
docs_clean = bs_transformer.transform_documents(
    docs, tags_to_extract=["p", "li", "div", "article"]
)

# Chunk
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    is_separator_regex=False,
)
chunks = text_splitter.split_documents(docs_clean)

# Embedding
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

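# Vectorstore: build a FAISS index, save it, and reload it
# (the path assumes Google Drive is mounted at /content/drive, e.g. with google.colab.drive.mount)
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("/content/drive/MyDrive/mayoclinic/mayoclinic_faiss")
# allow_dangerous_deserialization unpickles the stored docstore; only enable it for index files you created yourself
vectorstore = FAISS.load_local("/content/drive/MyDrive/mayoclinic/mayoclinic_faiss", embeddings, allow_dangerous_deserialization=True)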

# Define LLM
llm = ChatOpenAI(model="gpt-4o-mini", openai_api_key=openai_api_key, temperature=0.2)

# Define prompt with history
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful medical assistant. Provide accurate health information based on the following context from reliable sources:\n\n{context}\n\nRefer to previous questions and answers in the conversation to maintain context, and always advise users to consult a doctor for personalized advice."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{question}")
])

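# RAG Chain
def format_docs(docs):
    # Join retrieved chunks into plain text instead of inserting raw Document reprs into the prompt
    return "\n\n".join(doc.page_content for doc in docs)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})  # top-3 chunks per question

chain = (
    RunnableParallel({
        "context": itemgetter("question") | retriever | format_docs,
        "question": itemgetter("question"),  # pass only the question string, not the whole input dict
        "chat_history": lambda x: x.get("chat_history", [])
    })
    | prompt
    | llm  # temperature is already set on the ChatOpenAI instance
    | StrOutputParser()
)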

# Initialize chat history
chat_history = []

# Function to add to chat history and ask a question
def ask_question(question, chat_history):
    # Limit chat history to the last 4 messages
    if len(chat_history) > 4:
        chat_history = chat_history[-4:]

    with get_openai_callback() as cb:
        response = chain.invoke({
            "question": question,
            "chat_history": chat_history
        })
        # Add user question and AI response to chat history
        chat_history.append(HumanMessage(content=question))
        chat_history.append(AIMessage(content=response))
    print("Question:", question)
    print("Response:", response)
    print("-" * 50)
    return chat_history

# Test with the three questions
questions = [
    "What are the common symptoms of heart disease?",
    "What lifestyle changes can help prevent it?",
    "Can you explain more about how diet affects it?"
]

# Ask questions in sequence, maintaining chat history
for question in questions:
    chat_history = ask_question(question, chat_history)

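# Optional: a follow-up that refers back to earlier turns.
# With the 4-message window, the first exchange (symptoms) has already dropped out of the history at this point.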
chat_history = ask_question("How does high blood pressure relate to the heart disease symptoms we discussed earlier?", chat_history)

Fetching pages: 100%|##########| 5/5 [00:00<00:00,  7.69it/s]


Question: What are the common symptoms of heart disease?
Response: Common symptoms of heart disease can vary depending on the specific type of heart condition, but some general symptoms include:

1. **Chest Pain**: Discomfort or pain in the chest area.
2. **Shortness of Breath**: Difficulty breathing during activity or at rest.
3. **Fatigue**: Unusual tiredness or lack of energy.
4. **Dizziness or Lightheadedness**: Feeling faint or weak.
5. **Irregular Heartbeats**: Palpitations or fluttering sensations in the chest.
6. **Swollen Legs, Ankles, or Feet**: Accumulation of fluid in the lower extremities.
7. **Fainting or Near Fainting**: Sudden loss of consciousness or feeling close to it.

If you experience any of these symptoms, especially chest pain or shortness of breath, it's important to seek medical attention promptly. Always consult a healthcare professional for personalized advice and diagnosis.
--------------------------------------------------
Question: What lifestyle changes can help pre