<a href="https://colab.research.google.com/github/claudio1975/PyCon_Italia_2025/blob/main/Phi_3.5/Medical_Q%26A_from_BoW_to_Agent_Phi_3_5_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Prepare Workspace

In [None]:
!pip install -q torch transformers sentence-transformers faiss-cpu pypdf &> /dev/null


In [None]:
!pip install -U langchain-huggingface &>/dev/null


In [None]:
!pip install -q langchain langchain-community &> /dev/null


In [None]:
!pip install ipywidgets &>/dev/null

In [None]:
!pip install -q "huggingface_hub[hf_xet]" &> /dev/null

In [None]:
! pip install -U "autogen[openai]" &>/dev/null

In [None]:
import os
import warnings

import numpy as np
import pandas as pd
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# LangChain: current import paths (the old `langchain.*` re-exports are deprecated);
# unused imports (LLMMathChain, LLMChain, RetrievalQA, PyPDFLoader) removed
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFacePipeline
from langchain_text_splitters import RecursiveCharacterTextSplitter

import autogen

warnings.filterwarnings("ignore")


In [None]:
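llm_config = {
    "model": "gpt-4o-mini",
    # Read the OpenAI key from the environment instead of hard-coding it in the notebook
    "api_key": os.environ.get("OPENAI_API_KEY", ""),
}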

### Load the dataset

The notebook uses the ChatDoctor-iCliniq medical Q&A dataset (https://huggingface.co/datasets/lavita/ChatDoctor-iCliniq), hosted in the Lavita AI Hugging Face space (https://huggingface.co/lavita). LAVITA describes itself as a blockchain- and AI-based next-generation healthcare platform that makes large biomedical datasets available for research while preserving individual privacy and data ownership.

In [None]:
dataset = pd.read_parquet("hf://datasets/lavita/ChatDoctor-iCliniq/data/train-00000-of-00001-7f15f39e4c3a7ee9.parquet")
# Select relevant columns
selected_columns = ['input', 'answer_icliniq']
train_dataset = dataset[selected_columns].copy()

# Convert the DataFrame to a list of Document objects
documents = [
    Document(page_content=row['input'], metadata={'answer': row['answer_icliniq']})
    for _, row in train_dataset.iterrows()
]


In [None]:
#-----------------
# Look up
#-----------------



#### What it is:
Returns an answer only when the query is identical to a stored input, then looks up the corresponding answer. Useful for FAQ-style lookups.

#### Strengths & Weaknesses:
Simple and fast; no models or heavy computation are required. However, it only works when the query matches a stored input exactly, so any rephrasing, typo or extra space returns no answer.

In [None]:
# Return the stored answer whose input is identical to the query
def get_answer_exact(query, df):
    result = df[df["input"] == query]["answer_icliniq"]  # use the df argument, not a global
    return result.iloc[0] if not result.empty else "No exact match found"


In [None]:
# Example search
query = "HI doctor, What are common symptoms of flu?"
print(get_answer_exact(query, train_dataset))



No exact match found
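The flu query returns nothing because no stored input is identical to it, character for character. For FAQ-style use, a common pattern is a dictionary keyed on lightly normalized text (lower-cased, whitespace collapsed), which gives constant-time lookup and tolerates trivial formatting differences while still failing on rephrasings. The sketch below is not part of the original notebook; the toy FAQ, the placeholder answers and the helper names (`normalize`, `build_index`, `lookup`) are hypothetical.

```python
def normalize(text: str) -> str:
    # Lower-case and collapse runs of whitespace so trivial formatting differences still match
    return " ".join(text.lower().split())


def build_index(pairs):
    # Map normalized question -> answer; later duplicates overwrite earlier ones
    return {normalize(q): a for q, a in pairs}


def lookup(query, index, default="No exact match found"):
    # Constant-time dictionary lookup on the normalized query
    return index.get(normalize(query), default)


# Hypothetical toy FAQ with placeholder answers (not from the ChatDoctor-iCliniq dataset)
faq = [
    ("What are common symptoms of flu?", "<flu answer>"),
    ("How long does a cold last?", "<cold answer>"),
]
index = build_index(faq)

print(lookup("what are  common symptoms of FLU?", index))  # matches after normalization
print(lookup("Which symptoms does flu cause?", index))      # rephrased: still no match
```

Normalization widens the net only slightly; any change in wording still misses, which is the gap the BoW and embedding approaches below address.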


In [None]:
# Example search
query= "Hi doctor,My last USG report showed intrauterine pregnancy with 8 mm gestation sac, no fetal and yolk sac. Last week I had brown discharge with mild lower abdominal pain and was prescribed Susten 200 mg (oral). Now, I am feeling the pain frequently in the right lower abdomen with mild bleeding and pain. It is less when I lie on bed. What to do?"
print(get_answer_exact(query, train_dataset))



Hello, Welcome to Chat Doctor forum. I have read through your question and understand your concerns. I think it is threatened abortion. You should take bed rest along with tablet Susten 200 mg twice daily (Progesterone) for two weeks and Mbryosafe sachet once daily (L-arginine). I think it will help you and give you better prognosis. Ultrasound is required after two weeks for evaluation.


In [None]:
# Example search
query="Hello doctor,I am currently having acne on my cheeks. I am currently applying Benzoyl peroxide gel on my acne. It only reduces the inflammation but bumps are not reducing. I also have pigmentation caused by acne before. I only get acne on my cheeks. What may be the reason? I usually get cyst acne bumps on my cheeks. If acne clears, then it comes again. I have red pigmentation caused by acne. Can you recommend a product for acne and after acne has cleared, can you suggest a product for pigmentation? Can you recommend a product for controlling further breakouts? Also, I wanted to know which sunscreen and moisturizer to use?"
print(get_answer_exact(query, train_dataset))



Hi. Acne reduces in severity with aging. Use cleanser like Cetaphil to wash face. Apply Retino-A (Retinol) cream in the night, Clindamycin gel in the morning. Treat dan ChatDoctor.  You can use sunscreens like Shade (Avobenzone, Oxybenzone) lotion. Retino-A helps in pigmentation and to some extent scars. You can consider lasers later if scars persist. For more information consult a dermatologist online


In [None]:
#---------
# BoW
#---------




#### What it is:
Represents each text as a vector of word weights (raw counts or, as here, TF-IDF) and scores the query against every dataset input by term overlap, using the cosine similarity of their vectors.

#### Strengths & weaknesses:
Effective at finding documents that share vocabulary with the query, and computationally cheap. However, it has no deeper understanding of the text: it ignores word order and context, and it does not match synonyms.
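A quick way to see the word-order limitation: two sentences with the same words in a different order get identical TF-IDF vectors, so their cosine similarity is 1 even though they mean different things. The toy sentences are hypothetical; the calls are the same scikit-learn `TfidfVectorizer` and `cosine_similarity` used in the notebook, with separate variable names so the notebook's `vectorizer` and `X` are not overwritten.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sentences: same words, opposite meaning
demo_docs = ["the dog bit the man", "the man bit the dog"]

demo_vectorizer = TfidfVectorizer()
demo_X = demo_vectorizer.fit_transform(demo_docs)

# Bag-of-words discards word order, so both rows are the same vector
score = cosine_similarity(demo_X[0], demo_X[1])[0, 0]
print(round(score, 6))
```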



In [None]:
# Create BoW / TF-IDF vectors
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_dataset["input"])  # Convert input column into vector form

# Function to find the closest matching input
def retrieve_answer_bow(query, df, vectorizer, X):
    query_vec = vectorizer.transform([query])  # Vectorize the query
    similarities = cosine_similarity(query_vec, X).flatten()  # Compute similarity
    best_match_idx = similarities.argmax()  # Get the closest match index
    return df.iloc[best_match_idx]["answer_icliniq"]


In [None]:
# Example search
query = "HI doctor, What are common symptoms of flu?"
answer = retrieve_answer_bow(query, train_dataset, vectorizer, X)
print(answer)



Hello. According to your statement, you got a fever that subsided after two days and then developed flu and mouth sore. Mouth sore is a common complication of flu. Again mouth sore may develope by a viral infection like herpes simplex virus most commonly fungal infection, nutritional deficiency like iron, zinc, vitamin B12, etc. Some medications like corticosteroid, chemotherapy, sulfa  ChatDoctor.  Autoimmune disorders like lichen planus also. For your flu-like running nose and body ache you can take anti-histamine, Acetamenofen or Paracetamol, topical antiviral cream, vitamins containing B12 with zinc and iron. If any complications occur or your symptoms do not subside within seven to ten days inform me as soon as possible, I will try to help you further.
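Note that `retrieve_answer_bow` always returns the highest-scoring answer, even when the best score is low; for the flu query it returns an answer about mouth sores. A common refinement is to return the best score alongside the answer and reject matches below a cut-off. The sketch below is self-contained: the toy corpus, placeholder answers, helper name `retrieve_with_threshold` and the `min_score=0.2` value are all hypothetical, and a real threshold would need tuning on the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus of questions and placeholder answers
questions = [
    "what are flu symptoms",
    "how to treat acne on cheeks",
    "pain in lower abdomen during pregnancy",
]
answers = ["<flu answer>", "<acne answer>", "<pregnancy answer>"]

toy_vectorizer = TfidfVectorizer()
toy_X = toy_vectorizer.fit_transform(questions)


def retrieve_with_threshold(query, vectorizer, X, answers, min_score=0.2):
    # Cosine similarity between the query and every stored question
    scores = cosine_similarity(vectorizer.transform([query]), X).flatten()
    best = int(scores.argmax())
    # Reject weak matches instead of always returning the top hit
    if scores[best] < min_score:
        return None, float(scores[best])
    return answers[best], float(scores[best])


print(retrieve_with_threshold("acne cheeks treatment", toy_vectorizer, toy_X, answers))
print(retrieve_with_threshold("broken wrist", toy_vectorizer, toy_X, answers))
```

Words that never appear in the corpus (here "broken" and "wrist") produce an all-zero query vector, so every score is 0 and the function returns `None` instead of an unrelated answer.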


In [None]:
# Example search
query= "Hi doctor,My last USG report showed intrauterine pregnancy with 8 mm gestation sac, no fetal and yolk sac. Last week I had brown discharge with mild lower abdominal pain and was prescribed Susten 200 mg (oral). Now, I am feeling the pain frequently in the right lower abdomen with mild bleeding and pain. It is less when I lie on bed. What to do?"
answer = retrieve_answer_bow(query, train_dataset, vectorizer, X)
print(answer)



Hello, Welcome to Chat Doctor forum. I have read through your question and understand your concerns. I think it is threatened abortion. You should take bed rest along with tablet Susten 200 mg twice daily (Progesterone) for two weeks and Mbryosafe sachet once daily (L-arginine). I think it will help you and give you better prognosis. Ultrasound is required after two weeks for evaluation.


In [None]:
# Example search
query="Hello doctor,I am currently having acne on my cheeks. I am currently applying Benzoyl peroxide gel on my acne. It only reduces the inflammation but bumps are not reducing. I also have pigmentation caused by acne before. I only get acne on my cheeks. What may be the reason? I usually get cyst acne bumps on my cheeks. If acne clears, then it comes again. I have red pigmentation caused by acne. Can you recommend a product for acne and after acne has cleared, can you suggest a product for pigmentation? Can you recommend a product for controlling further breakouts? Also, I wanted to know which sunscreen and moisturizer to use?"
answer = retrieve_answer_bow(query, train_dataset, vectorizer, X)
print(answer)



Hi. Acne reduces in severity with aging. Use cleanser like Cetaphil to wash face. Apply Retino-A (Retinol) cream in the night, Clindamycin gel in the morning. Treat dan ChatDoctor.  You can use sunscreens like Shade (Avobenzone, Oxybenzone) lotion. Retino-A helps in pigmentation and to some extent scars. You can consider lasers later if scars persist. For more information consult a dermatologist online


In [None]:
#------------------------------------------------
# Semantic Search (Using Embeddings without RAG)
#------------------------------------------------


#### What it is:
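Transforms text into dense vector embeddings (here with the sentence-transformers model `all-MiniLM-L6-v2`, loaded through LangChain's `HuggingFaceEmbeddings`) and retrieves the stored inputs whose vectors are closest to the query's.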

#### Strengths & weaknesses:
Captures meaning beyond exact words, so it handles synonyms and paraphrases better than BoW. It requires computing and storing a vector for every document, and a general-purpose embedding model may miss domain-specific (e.g. medical) meanings.
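Under the hood, semantic search ranks stored vectors by how close they are to the query vector. LangChain's FAISS wrapper uses Euclidean (L2) distance by default; when embeddings are unit-length, ranking by smallest L2 distance is the same as ranking by largest cosine similarity, because ‖a − b‖² = 2 − 2·cos(a, b). The NumPy sketch below checks this on hypothetical random vectors standing in for sentence embeddings; it is a brute-force search, not FAISS itself, and the helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for sentence embeddings: 5 stored vectors and 1 query, dimension 8
toy_docs = rng.normal(size=(5, 8))
toy_query = rng.normal(size=8)

# Normalize to unit length, as many sentence-embedding models do
toy_docs /= np.linalg.norm(toy_docs, axis=1, keepdims=True)
toy_query /= np.linalg.norm(toy_query)


def top_k_cosine(query, docs, k=3):
    # For unit vectors, cosine similarity is the dot product; higher is closer
    sims = docs @ query
    return np.argsort(-sims)[:k]


def top_k_l2(query, docs, k=3):
    # Euclidean distance, as in an exact L2 index; lower is closer
    dists = np.linalg.norm(docs - query, axis=1)
    return np.argsort(dists)[:k]


print(top_k_cosine(toy_query, toy_docs), top_k_l2(toy_query, toy_docs))
```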

In [None]:
# Initialize HuggingFace Embeddings
embedding_model = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')

# Create FAISS index from Document objects
db_emb = FAISS.from_documents(documents, embedding_model)

# Initialize Retriever
retriever = db_emb.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 3}
)

# Function to retrieve answers using Semantic Search
# (the number of results is set by search_kwargs={'k': 3} on the retriever)
def retrieve_answer_semantic(query, retriever):
    retrieved_docs = retriever.invoke(query)  # replaces the deprecated get_relevant_documents
    if not retrieved_docs:
        return "No relevant answers found."
    # For simplicity, return the answer attached to the top-ranked document
    return retrieved_docs[0].metadata['answer']




In [None]:
# Example search
query = "HI doctor, What are common symptoms of flu?"
answer = retrieve_answer_semantic(query, retriever)
print(answer)


Hello, Welcome to Chat Doctor forum. I have gone through your query, and I can understand your concerns. The symptoms you have mentioned are suggestive of pharyngitis (a sore throat) along with a common cold. You do not need to worry regarding this as it is highly curable and complications are rare. I suggest you consult a general practitioner and get the following things done. For more information consult a general practitioner online


In [None]:
# Example search
query= "Hi doctor,My last USG report showed intrauterine pregnancy with 8 mm gestation sac, no fetal and yolk sac. Last week I had brown discharge with mild lower abdominal pain and was prescribed Susten 200 mg (oral). Now, I am feeling the pain frequently in the right lower abdomen with mild bleeding and pain. It is less when I lie on bed. What to do?"
answer = retrieve_answer_semantic(query, retriever)
print(answer)


Hello, Welcome to Chat Doctor forum. I have read through your question and understand your concerns. I think it is threatened abortion. You should take bed rest along with tablet Susten 200 mg twice daily (Progesterone) for two weeks and Mbryosafe sachet once daily (L-arginine). I think it will help you and give you better prognosis. Ultrasound is required after two weeks for evaluation.


In [None]:
# Example search
query="Hello doctor,I am currently having acne on my cheeks. I am currently applying Benzoyl peroxide gel on my acne. It only reduces the inflammation but bumps are not reducing. I also have pigmentation caused by acne before. I only get acne on my cheeks. What may be the reason? I usually get cyst acne bumps on my cheeks. If acne clears, then it comes again. I have red pigmentation caused by acne. Can you recommend a product for acne and after acne has cleared, can you suggest a product for pigmentation? Can you recommend a product for controlling further breakouts? Also, I wanted to know which sunscreen and moisturizer to use?"
answer = retrieve_answer_semantic(query, retriever)
print(answer)


Hi. Acne reduces in severity with aging. Use cleanser like Cetaphil to wash face. Apply Retino-A (Retinol) cream in the night, Clindamycin gel in the morning. Treat dan ChatDoctor.  You can use sunscreens like Shade (Avobenzone, Oxybenzone) lotion. Retino-A helps in pigmentation and to some extent scars. You can consider lasers later if scars persist. For more information consult a dermatologist online


In [None]:
#---------------
# RAG TIME
#---------------


#### What it is:
Combines document retrieval with an LLM: the top retrieved chunks are inserted into the prompt, and the model writes an answer grounded in that retrieved information.

#### Strengths & weaknesses:
Generates more fluent, context-aware answers by combining retrieval with generation. It is computationally heavier, and if retrieval returns poor context, the model can produce incorrect or hallucinated answers.
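Stripped of the framework, the RAG chain in this section does three things: retrieve the top-k stored entries for the question, paste them into a prompt template, and send the prompt to a generator. The sketch below spells out that data flow with a hypothetical keyword-overlap retriever, a hypothetical template, and a stand-in `generate` function in place of Phi-3.5; none of these names are LangChain APIs.

```python
# Hypothetical knowledge base: (question, placeholder answer) pairs
knowledge_base = [
    ("what are flu symptoms", "<flu answer>"),
    ("how to treat acne on cheeks", "<acne answer>"),
    ("pain in lower abdomen during pregnancy", "<pregnancy answer>"),
]

TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def retrieve(question, k=2):
    # Toy retriever: rank entries by how many words they share with the question
    q_words = set(question.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda qa: len(q_words & set(qa[0].split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, k=2):
    # Put both the stored question and its answer into the context block
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(question, k))
    return TEMPLATE.format(context=context, question=question)


def generate(prompt_text):
    # Stand-in for the LLM call; it only reports the prompt length
    return f"<model output for a {len(prompt_text)}-character prompt>"


toy_prompt = build_prompt("acne on my cheeks")
print(toy_prompt)
print(generate(toy_prompt))
```

In the notebook, the retriever is FAISS over embeddings, the template is `prompt_template`, and the generator is the Phi-3.5 pipeline; the shape of the flow is the same.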



In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(documents)

print("Dataset split into {0} chunks.".format(len(chunked_docs)))


Dataset split into 10343 chunks.
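`chunk_size` caps the length of each chunk (in characters, with the splitter's default length function) and `chunk_overlap` repeats the tail of one chunk at the start of the next, so text near a boundary appears in both neighbouring chunks. Because `split_documents` copies each document's metadata into every chunk, each chunk of a long question still carries its answer. LangChain's `RecursiveCharacterTextSplitter` first tries to split on paragraph, line and word separators; the simplified fixed-window splitter below ignores separators and only illustrates the size/overlap arithmetic. Its name, `fixed_window_split`, is hypothetical.

```python
def fixed_window_split(text, chunk_size=512, chunk_overlap=30):
    # Each new chunk starts chunk_size - chunk_overlap characters after the previous one
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(fixed_window_split("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

With the notebook's settings (512 and 30), a 1,000-character input would yield windows starting at 0, 482 and 964.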


In [None]:
# Reuse the embedding model loaded for semantic search instead of loading it again
db = FAISS.from_documents(chunked_docs, embedding_model)

retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 3}
)

model_name = "microsoft/Phi-3.5-mini-instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)


In [None]:
# Pipeline for text generation
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=500,
)

llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

# Prompt template to match desired output format
prompt_template = """
=================================================================================================
You are an expert doctor giving a precise answer based on the provided context.
If you don't find the response in the context, just say "I haven't found the answer".
If you don't know the reply, just say "I don't know", don't try to make up an answer.
Give the return in bullet points within 50 words.
=================================================================================================
Context:
{context}
=================================================================================================
Question: {question}
=================================================================================================
Answer:
=================================================================================================
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template,
)

llm_chain = prompt | llm | StrOutputParser()


rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)


Device set to use cuda:0


In [None]:
question = "HI doctor, What are common symptoms of flu?"
# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)



You are an expert doctor giving a precise answer based on the provided context.
If you don't find the response in the context, just say "I haven't found the answer".
If you don't know the reply, just say "I don't know", don't try to make up an answer.
Give the return in bullet points within 50 words.
Context:
[Document(id='0fe8af79-97cd-4bac-ac5a-36e0e4295890', metadata={'answer': 'Hi. I have read your query and understood your health concern. Infection by Streptococcus or viral. 1. Complete blood count.2. Chest x-ray.3. Nasal and throat swab culture.4. PCR.  Take care.'}, page_content='now. I am wondering are stomach cramps, chest pain, back pain, and labored breathing symptoms of flu? Please do the needful.'), Document(id='9c73ac0b-f4b1-4c88-9fdb-59b7376fea03', metadata={'answer': 'Hello, Welcome to Chat Doctor forum. I have gone through your query, and I can understand your concerns. The symptoms you have mentioned are suggestive of pharyngitis (a sore throat) along with a common c
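As the printed prompt shows, `{context}` receives the Python repr of the retrieved `Document` list, IDs and all. A common LCEL pattern is to pipe the retriever into a small formatting function before the prompt. Since each chunk here holds a patient question and the doctor's answer lives in `metadata['answer']`, the hypothetical `format_docs` below writes both. The `Doc` dataclass is only a stand-in so the sketch runs without LangChain; the function works on any object with `page_content` and `metadata` attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Minimal stand-in for langchain_core.documents.Document (same two attributes)
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    # Render each retrieved chunk as a question/answer pair, separated by blank lines
    return "\n\n".join(
        f"Patient question: {d.page_content}\nDoctor answer: {d.metadata.get('answer', '')}"
        for d in docs
    )


# In the notebook this could be wired in as (LCEL turns the function into a runnable):
# rag_chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | llm_chain

print(format_docs([Doc("stomach cramps and flu?", {"answer": "<answer text>"})]))
```

This keeps the answers visible to the model while dropping the document IDs and the list syntax from the prompt.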

In [None]:
question = "Hi doctor,My last USG report showed intrauterine pregnancy with 8 mm gestation sac, no fetal and yolk sac. Last week I had brown discharge with mild lower abdominal pain and was prescribed Susten 200 mg (oral). Now, I am feeling the pain frequently in the right lower abdomen with mild bleeding and pain. It is less when I lie on bed. What to do?"
# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)



You are an expert doctor giving a precise answer based on the provided context.
If you don't find the response in the context, just say "I haven't found the answer".
If you don't know the reply, just say "I don't know", don't try to make up an answer.
Give the return in bullet points within 50 words.
Context:
[Document(id='92b68f6f-f664-49c6-8449-66e8984fce30', metadata={'answer': 'Hello, Welcome to Chat Doctor forum. I have read through your question and understand your concerns. I think it is threatened abortion. You should take bed rest along with tablet Susten 200 mg twice daily (Progesterone) for two weeks and Mbryosafe sachet once daily (L-arginine). I think it will help you and give you better prognosis. Ultrasound is required after two weeks for evaluation.'}, page_content='Hi doctor,My last USG report showed intrauterine pregnancy with 8 mm gestation sac, no fetal and yolk sac. Last week I had brown discharge with mild lower abdominal pain and was prescribed Susten 200 mg (or

In [None]:
question = "Hello doctor,I am currently having acne on my cheeks. I am currently applying Benzoyl peroxide gel on my acne. It only reduces the inflammation but bumps are not reducing. I also have pigmentation caused by acne before. I only get acne on my cheeks. What may be the reason? I usually get cyst acne bumps on my cheeks. If acne clears, then it comes again. I have red pigmentation caused by acne. Can you recommend a product for acne and after acne has cleared, can you suggest a product for pigmentation? Can you recommend a product for controlling further breakouts? Also, I wanted to know which sunscreen and moisturizer to use?"
# Invoke the chain to generate answers
result = rag_chain.invoke(question)

# Display the output
print(result)



You are an expert doctor giving a precise answer based on the provided context.
If you don't find the response in the context, just say "I haven't found the answer".
If you don't know the reply, just say "I don't know", don't try to make up an answer.
Give the return in bullet points within 50 words.
Context:
[Document(id='db9ea6d4-9d02-4d7d-b026-deb54f16bef0', metadata={'answer': 'Hi. Acne reduces in severity with aging. Use cleanser like Cetaphil to wash face. Apply Retino-A (Retinol) cream in the night, Clindamycin gel in the morning. Treat dan ChatDoctor.  You can use sunscreens like Shade (Avobenzone, Oxybenzone) lotion. Retino-A helps in pigmentation and to some extent scars. You can consider lasers later if scars persist. For more information consult a dermatologist online'}, page_content='Hello doctor,I am currently having acne on my cheeks. I am currently applying Benzoyl peroxide gel on my acne. It only reduces the inflammation but bumps are not reducing. I also have pigment

In [None]:
#============================
# Agent
#============================

#### What it is:
The agent interprets the user's request, plans the steps needed to answer it, and carries them out, making the required decisions on its own. Here, an AutoGen `AssistantAgent` backed by `gpt-4o-mini` reviews the RAG output, refines the retrieved answers, and writes a concise, context-aware medical recommendation.

#### Strengths & weaknesses:
Adds a review-and-refinement step on top of retrieval. Its output depends on the quality of the underlying models, and it may need extensive prompt tuning to be accurate and reliable.

In [None]:
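# Pipeline for text generation that returns only the newly generated text.
# Note: rag_chain still wraps the earlier pipeline (return_full_text=True), so the
# agent below receives the full prompt, including the retrieved documents and their
# metadata. To use this pipeline instead, wrap it with HuggingFacePipeline and
# rebuild llm_chain and rag_chain.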
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    do_sample=True,
    repetition_penalty=1.1,
    return_full_text=False,
    max_new_tokens=500,
)



Device set to use cuda:0


In [None]:
medical_assistant = autogen.AssistantAgent(
    name="medical_assistant",
    system_message="You are a medical assistant involved in the analysis of patients' diseases.",
    llm_config=llm_config,
)

In [None]:
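# Note: this is a plain string, not an f-string, so "{result}" and "[metadata]" are sent
# to the agent literally; the actual RAG output is passed in a separate message below.
task = '''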
      Firstly review the {result} from "rag_chain.invoke(question)" and refine {result} adding details from [metadata].
      Then write a concise medical recommendation to user's question in bullet points.
      Please write both answers within 50 words for each one.
      '''
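The three agent cells that follow repeat the same steps: run the RAG chain, pack the question, the RAG output and the task into user messages, and ask the agent for a reply. A small helper (hypothetical name `review_with_agent`) can capture that pattern. It takes the RAG step and the reply step as callables, so in the notebook you would pass `rag_chain.invoke` and `lambda msgs: medical_assistant.generate_reply(messages=msgs)`; the stubs below are placeholders so the sketch runs on its own.

```python
def review_with_agent(question, run_rag, agent_reply, task):
    # Step 1: retrieval-augmented answer for the question
    result = run_rag(question)
    # Step 2: the same three user messages as in the notebook cells
    messages = [
        {"role": "user", "content": question},
        {"role": "user", "content": f"Here is the retrieved context: {result}"},
        {"role": "user", "content": task},
    ]
    # Step 3: let the agent review the RAG output and write the recommendation
    return agent_reply(messages)


# Placeholder stubs standing in for rag_chain.invoke and medical_assistant.generate_reply
def fake_rag(question):
    return f"<rag output for: {question}>"


def fake_agent(messages):
    return f"<agent reply to {len(messages)} messages>"


print(review_with_agent("What are common symptoms of flu?", fake_rag, fake_agent, "<task text>"))
```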

In [None]:
question = "HI doctor, What are common symptoms of flu?"
# Invoke the chain to generate answers
result = rag_chain.invoke(question)
# Prepare the messages for the assistant
messages = [
        {"role": "user", "content": question},
        {"role": "user", "content": f"Here is the retrieved context: {result}"},
        {"role": "user", "content": task}
]
# Generate the reply using the assistant
reply = medical_assistant.generate_reply(messages=messages)

# Display the output
print('=======================================================================================')
print("Agent Review with Medical Recommendation:")
print('=======================================================================================')
print(reply)


Agent Review with Medical Recommendation:
### Refined Result:
- Common symptoms of flu include:
  - Sore Throat
  - Headaches
  - Body Aches
  - High Temperature/Fever
- Symptoms may differ among individuals; consulting a medical professional is essential for an accurate diagnosis and treatment.

### Medical Recommendation:
- Monitor your symptoms closely.
- Stay hydrated and rest.
- Use over-the-counter medications for fever and pain relief.
- Seek medical attention if symptoms worsen or persist.


In [None]:
question = "Hi doctor,My last USG report showed intrauterine pregnancy with 8 mm gestation sac, no fetal and yolk sac. Last week I had brown discharge with mild lower abdominal pain and was prescribed Susten 200 mg (oral). Now, I am feeling the pain frequently in the right lower abdomen with mild bleeding and pain. It is less when I lie on bed. What to do?"
# Invoke the chain to generate answers
result = rag_chain.invoke(question)
# Prepare the messages for the assistant
messages = [
        {"role": "user", "content": question},
        {"role": "user", "content": f"Here is the retrieved context: {result}"},
        {"role": "user", "content": task}
]
# Generate the reply using the assistant
reply = medical_assistant.generate_reply(messages=messages)

# Display the output
print('=======================================================================================')
print("Agent Review with Medical Recommendation:")
print('=======================================================================================')
print(reply)


Agent Review with Medical Recommendation:
### Refined Result
* The report indicates a threatened abortion with an 8 mm gestation sac but no fetal or yolk sac visible. Brown discharge and abdominal pain were noted. Susten 200 mg was prescribed to support the pregnancy. Rest and an ultrasound follow-up are advised.

### Medical Recommendation
* Continue Susten 200 mg twice daily and consider L-arginine sachets.
* Take bed rest to alleviate pain and bleeding.
* Schedule a follow-up ultrasound in two weeks for further evaluation.


In [None]:
question = "Hello doctor,I am currently having acne on my cheeks. I am currently applying Benzoyl peroxide gel on my acne. It only reduces the inflammation but bumps are not reducing. I also have pigmentation caused by acne before. I only get acne on my cheeks. What may be the reason? I usually get cyst acne bumps on my cheeks. If acne clears, then it comes again. I have red pigmentation caused by acne. Can you recommend a product for acne and after acne has cleared, can you suggest a product for pigmentation? Can you recommend a product for controlling further breakouts? Also, I wanted to know which sunscreen and moisturizer to use?"
# Invoke the chain to generate answers
result = rag_chain.invoke(question)
# Prepare the messages for the assistant
messages = [
        {"role": "user", "content": question},
        {"role": "user", "content": f"Here is the retrieved context: {result}"},
        {"role": "user", "content": task}
]
# Generate the reply using the assistant
reply = medical_assistant.generate_reply(messages=messages)

# Display the output
print('=======================================================================================')
print("Agent Review with Medical Recommendation:")
print('=======================================================================================')
print(reply)


Agent Review with Medical Recommendation:
**Refined Result:**
- Severe acne may require Oral Isotretinoin treatment; consult a dermatologist.
- Use a gentle cleanser like Cetaphil and apply Retino-A (night) with Clindamycin (morning).
- For pigmentation, consider Retino-A or vitamin C serum post-acne.
- Use sunscreen with Avobenzone and moisturizer with hyaluronic acid and niacinamide.

**Concise Medical Recommendation:**
- Switch to Cetaphil cleanser; apply Retino-A at night, Clindamycin in the morning.
- Use vitamin C serum for pigmentation after acne clears.
- Apply broad-spectrum sunscreen with Avobenzone and a moisturizer containing hyaluronic acid and niacinamide to improve skin barrier and reduce inflammation.
