# üß† Workshop: Adding Knowledge to LLMs  
### Dataset: lavita/ChatDoctor-HealthCareMagic-100k  
HuggingFace: https://huggingface.co/datasets/lavita/ChatDoctor-HealthCareMagic-100k  

### Base Model: google/gemma-2-2b-it  
HuggingFace: https://huggingface.co/google/gemma-2-2b-it  

---

## 4Ô∏è‚É£ üìö RAG: Retrieval-Augmented Generation

In **RAG**, the model is augmented with a **retrieval system** that fetches relevant documents from a knowledge base at inference time.  

This allows the LLM to provide **up-to-date, evidence-backed answers** without needing to store all knowledge in its parameters.

---


In [1]:
# ============================================================
# Workshop: Adding Knowledge to LLMs
# ============================================================
# Dataset: lavita/ChatDoctor-HealthCareMagic-100k
#         HuggingFace Dataset Link: https://huggingface.co/datasets/lavita/ChatDoctor-HealthCareMagic-100k

# Model: google/gemma-2-2b-it
#         HuggingFace Model Link: https://huggingface.co/google/gemma-2-2b-it

# ============================================================
# Goal:
# - Fine-tune a model on Medical ChatDoctor Data using:
# 1) Full Fine-Tuning
# 2) LoRA
# 3) QLoRA (4-bit + LoRA)
# 4) Build a RAG baseline using the SAME data and Evaluate all approaches using the SAME questions
# 5) Create a Medical Agent
# ============================================================


In [2]:
# =====================================================
# RAG
# =====================================================

---

### üì¶ Step 0: Environment Setup


In [3]:
# =====================================================
# 0. Setup
# =====================================================
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer,
    DataCollatorForLanguageModeling, BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer
from sklearn.model_selection import train_test_split
from utils.utils import get_gpu_memory, generate_chat_response, generate_RAG_response
import bitsandbytes as bnb
import torch.nn as nn
from sentence_transformers import SentenceTransformer, util
import faiss
import random
from bert_score import score as bert_score_function
#from bleurt import score as bleurt_score
from rouge_score import rouge_scorer as rouge_scorer_function
import numpy as np
from transformers import logging


  from .autonotebook import tqdm as notebook_tqdm


In [4]:
# Define Environment Variables
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["DATA_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/datasets/ChatDoctor-dataset/data/"
os.environ["MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/gemma-2-2b-it"
os.environ["SAVE_PATH_FAISS_INDEX"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/RAG/RAG-faiss-index.idx"
os.environ["SENTENCE_TRANSFORMER_MODEL_PATH"]= "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/multi-qa-MiniLM-L6-cos-v1"


In [5]:
#!nvidia-smi

In [6]:
gpu_mem = get_gpu_memory()
print(gpu_mem)

{'total_gb': 63.42, 'used_gb': 27.96, 'free_gb': 35.46, 'source': 'torch'}


---

### üì• Step 1: Load Dataset


In [7]:
# =====================================================
# 1. Load ChatDoctor Dataset
# =====================================================
# Load the dataset from the local directory
chatdoctor = load_dataset(os.getenv("DATA_PATH", None))
device = "cuda" if torch.cuda.is_available() else "cpu"


In [8]:
device

'cuda'

---

### üìÇ Step 2: Define Model Path and Load Tokenizer



In [9]:
# =====================================================
# 2. Tokenizer
# =====================================================
# Define the model we want to fine tune.
model_path = os.getenv("MODEL_PATH", None)
model_name = str(model_path.split("/")[-1])

# Get Model Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token


In [10]:
print(f"Model used for RAG: {model_name}")

Model used for RAG: gemma-2-2b-it


---

### üßπ Step 3: Apply Chat Template to the Data 


In [11]:
# =====================================================
# 3. Apply Chat Template & Data Collator with Dynamic Padding
# =====================================================
def format_chat_template(row):
    row["text"] = f"PATIENT MESSAGE: {row['input']}\nANSWER: {row['output']}"
    return row

# Apply chat template to all data
chatdoctor = chatdoctor.map(format_chat_template, num_proc=4)

# Get train dataset
train_dataset = chatdoctor['train']
texts = [ex["text"] for ex in train_dataset]


Map (num_proc=4): 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 112165/112165 [00:01<00:00, 89883.04 examples/s] 


---

### üß© Step 4: Compute Embeddings


In [12]:
# =====================================================
# 4. Compute Embeddings
# =====================================================
# Define path of the Sentence Transformer Model (for Q&A detection).
ST_model_path = os.getenv("SENTENCE_TRANSFORMER_MODEL_PATH", None)

# Load embedding model for detecting Q&A
# Model that allow us to Encodes text into vectors:
embed_model = SentenceTransformer(ST_model_path, device=device)

# Encode all texts into embeddings with a progress bar .
embeddings = embed_model.encode(texts, convert_to_numpy=True, show_progress_bar=True)


Batches: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 3506/3506 [02:18<00:00, 25.39it/s]


---

### üóÇÔ∏è Step 5: Build FAISS Index

In [15]:
# =====================================================
# 5. Build FAISS index
# =====================================================
# Get embedding dimension.
embedding_dim = embeddings.shape[1]
print(f"embedding_dim: {embedding_dim} - embeddings: {embeddings.shape}")
# embeddings matrix has:
#   112,165 rows --> text chunks/documents/sentences
#   384 columns --> each text is represented by a 384-dimensional vector

# Normalize embeddings for cosine similarity.
# Scaling each embedding vector to a unit length (1).
faiss.normalize_L2(embeddings)

# Create FAISS index using inner product (cosine similarity after normalization).
# FAISS index that will store your embeddings for fast similarity search.
index = faiss.IndexFlatIP(embedding_dim)

# Add all embeddings to the index.
index.add(embeddings)

# Confirm number of vectors in the index
print("FAISS index created with", index.ntotal, "vectors")


embedding_dim: 384 - embeddings: (112165, 384)
FAISS index created with 112165 vectors


---

### üîç 6: Test RAG


In [12]:
# =====================================================
# 6.1. Test RAG Retriever
# =====================================================
# Define the user query to search relevant passages
query = "I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!"

# Encode the query into embedding using SentenceTransformer
query_emb = embed_model.encode([query], convert_to_numpy=True)

# Normalize the query embedding for cosine similarity search
faiss.normalize_L2(query_emb)

# Search the FAISS index for the top k=1 most similar passage
D, I = index.search(query_emb, k=3) # Distance & Indices
# D: is an array of numbers representing how similar each neighbor is to the query [-1, 1] --> Where 1 mor similar.
# I: is an array of integers representing which vectors in your index are most similar to the query.

# Print the retrieved passage(s)
print("Top retrieved passage:")
for idx in I[0]:
    print("-", texts[idx])


Top retrieved passage:
- PATIENT MESSAGE: I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!
ANSWER: Hi, Thank you for posting your query. The most likely cause for your symptoms is benign paroxysmal positional vertigo (BPPV), a type of peripheral vertigo. In this condition, the most common symptom is dizziness or giddiness, which is made worse with movements. Accompanying nausea and vomiting are common. The condition is due to problem in the ear, and improves in

In [13]:
# =====================================================
# 6.2. Test RAG
# =====================================================
# Read Base Model and Base Tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,    # Reduce GPU memory
    device_map="auto"              # Automatically put layers on GPU
)
base_tokenizer = AutoTokenizer.from_pretrained(model_path)


`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2/2 [00:12<00:00,  6.07s/it]


In [14]:
bold_text = "\033[1m"
reset_text = "\033[0m"
user_query = "I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!"

response = generate_RAG_response(
    query=user_query,
    index=index,
    qa_texts=texts,
    embed_model=embed_model,
    base_model=base_model,
    base_tokenizer=base_tokenizer,
    device="cuda",
    top_k=1,
)

print(f"{bold_text}Generated RAG Response:{reset_text}\n", response)


[1mGenerated RAG Response:[0m
 It sounds like you're experiencing a very uncomfortable situation.  I understand you've been feeling dizzy and nauseous, and it's concerning that it'll worsen when you move.  It's important to remember that I'm an AI and can't provide medical advice.  

Based on your description, it'd be wise to consult a medical professional.  They can properly assess your symptoms, determine the cause, and recommend the best course of action.  Here's why seeing a doctor is crucial:

* **Accurate Diagnosis:**  There are many potential causes for your dizziness, and a doctor can rule out serious conditions.
* **Personalized Treatment:**  Treatment depends on the underlying cause. A doctor can provide the most effective treatment plan for you.
 
Please don't hesitate to reach out to a doctor or seek medical attention.  Your health is important. 


**Explanation of the Answer:**

* The answer acknowledges the patient's concerns and validates their experience.
  
* It emph

---
#### üíæ Step 6.3: Save FAISS Index


In [17]:
# Save the index to disk
save_path_faiss_index = os.getenv("SAVE_PATH_FAISS_INDEX", None)
faiss.write_index(index, save_path_faiss_index)
print(f"Index saved to {save_path_faiss_index} with {index.ntotal} vectors.")


Index saved to /leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/notebooks/FT-models/RAG/faiss_index.idx with 112165 vectors.


---

### ‚ú® 7: Inference with Base Model and RAG

In [20]:
# =====================================================
# 7. Inference with Base Model and RAG
# =====================================================
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer,
    DataCollatorForLanguageModeling, BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer
from sklearn.model_selection import train_test_split
from utils.utils import get_gpu_memory, generate_chat_response, generate_RAG_response
import bitsandbytes as bnb
import torch.nn as nn
from sentence_transformers import SentenceTransformer, util
import faiss
import random
from bert_score import score as bert_score_function
from rouge_score import rouge_scorer as rouge_scorer_function
import numpy as np
from transformers import logging

os.environ["DATA_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/datasets/ChatDoctor-dataset/data/"
os.environ["MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/gemma-2-2b-it"
os.environ["SAVE_PATH_FAISS_INDEX"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/RAG/RAG-faiss-index.idx"
os.environ["SENTENCE_TRANSFORMER_MODEL_PATH"]= "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/multi-qa-MiniLM-L6-cos-v1"
os.environ["QLORA_FT_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/QLoRA_model_chatdoctor_gemma-2-2b-it"


# Define the model we want to fine tune.
model_path = os.getenv("MODEL_PATH", None)
model_name = str(model_path.split("/")[-1])
device = "cuda"

# Read Base Model and Base Tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,    # Reduce GPU memory
    device_map="auto"              # Automatically put layers on GPU
)
base_tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define path of the Sentence Transformer Model (for Q&A detection).
ST_model_path = os.getenv("SENTENCE_TRANSFORMER_MODEL_PATH", None)

# Load embedding model for detecting Q&A
embed_model = SentenceTransformer(ST_model_path, device=device)

# Load FAISS index from disk
save_path_faiss_index = os.getenv("SAVE_PATH_FAISS_INDEX", None)
index = faiss.read_index(save_path_faiss_index)
print("Index loaded, total vectors:", index.ntotal)

# Load the dataset from the local directory
chatdoctor = load_dataset(os.getenv("DATA_PATH", None))
device = "cuda" if torch.cuda.is_available() else "cpu"

def format_chat_template(row):
    row["text"] = f"PATIENT MESSAGE: {row['input']}\nANSWER: {row['output']}"
    return row

# Apply chat template to all data
chatdoctor = chatdoctor.map(format_chat_template, num_proc=4)

# Get train dataset
train_dataset = chatdoctor['train']
texts = [ex["text"] for ex in train_dataset]

bold_text = "\033[1m"
reset_text = "\033[0m"


Loading checkpoint shards: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2/2 [00:01<00:00,  1.17it/s]


Index loaded, total vectors: 112165


In [10]:
# How to do RAG inference?
#help(generate_RAG_response)

In [11]:
# =====================================================
#    7.1. Inference with Base Model
# =====================================================
instruction = "If you are a doctor, please answer the medical questions based on the patient's description."

user_message = "I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!"
user_message2 = "Hello, My husband is taking Oxycodone due to a broken leg/surgery. He has been taking this pain medication for one month. We are trying to conceive our second baby. Will this medication afect the fetus? Or the health of the baby? Or can it bring birth defects? Thank you."

messages = [
    {"role": "user", "content": f"INSTRUCTION:\n{instruction}\n\nPATIENT MESSAGE:\n{user_message}"}
]

response = generate_chat_response(
    messages=messages,
    model=base_model,
    tokenizer=base_tokenizer,
    device="cuda",
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.85,
    top_k=50,
    no_repeat_ngram_size=3,
)

print(response)


I understand you're experiencing dizziness and nausea, and I'm sorry to hear you've been feeling unwell.  

**It's important to note that I am not a medical professional and cannot provide medical advice. The information below is for general knowledge only and should not be considered a substitute for professional medical advice.**

Based on your description, it sounds like you might be experiencing **vertigo**, which is a feeling of dizziness or spinning sensation.  There are several possible causes of vertigo, and it's crucial to see a doctor to determine the underlying cause and receive appropriate treatment. 

Here are some potential causes of your symptoms:

* **Benign paroxysmal positional vertigo (BPPV):** This is a common cause of vertigo where tiny calcium crystals in the inner ear become dislodged, causing dizziness when you move your head.
* **Meniere's disease:** This inner ear disorder can cause vertigo, hearing loss, and tinnitus.
    
**What you should do:**

1. **See a 

In [12]:
# =====================================================
#    7.2. Inference with RAG
# =====================================================
response = generate_RAG_response(
    query=user_message,
    index=index,
    qa_texts=texts,
    embed_model=embed_model,
    base_model=base_model,
    base_tokenizer=base_tokenizer,
    device="cuda",
    top_k=1,
    #retrieved_print=True,
)

print(f"{bold_text}Generated RAG Response:{reset_text}\n", response)


[1mGenerated RAG Response:[0m
 It sounds like you're experiencing dizziness and nausea, which can be quite concerning.  It's important to remember that I am an AI and cannot provide medical advice.  

Based on your description, it's possible you've got something called Benign Paroxysmal Positional Vertigo (BVPV). This is a common condition where tiny crystals in your inner ear get dislodged, causing dizziness when you move your head in certain positions. 

Here's what I can suggest:

* **See a doctor:** It's crucial to get a proper diagnosis from a medical professional. They can perform tests to rule out other conditions and recommend the best treatment for you.
* **Rest:**  Give your body time to recover. 
*  **Stay hydrated:**  Drinking plenty of fluids can help with nausea.
 
Please remember, I am not a doctor.  Your health is important, so please seek professional medical advice for a proper evaluation and treatment plan. 


**Explanation of the Answer:**

* The answer acknowledg

### ‚öñÔ∏è 8: Compare Full FT, LoRA, QLoRA, and RAG

Evaluate and compare the performance of the different models and approaches:
- **Full Fine-Tuning (Full FT)**
- **LoRA Fine-Tuning**
- **QLoRA Fine-Tuning**
- **RAG-augmented Model**


In [13]:
# ==========================================================================================================
#    8. Comparison across Full FT Model, LoRA FT Model, QLoRA FT Model and RAG
# ==========================================================================================================
#########################
# Load Full FT Model
#########################
# Import models alone
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
from utils.utils import generate_chat_response
import os

os.environ["DATA_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/datasets/ChatDoctor-dataset/data/"
os.environ["MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/gemma-2-2b-it"
os.environ["SAVE_PATH_FAISS_INDEX"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/RAG/RAG-faiss-index.idx"
os.environ["SENTENCE_TRANSFORMER_MODEL_PATH"]= "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/multi-qa-MiniLM-L6-cos-v1"
os.environ["QLORA_FT_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/QLoRA_model_chatdoctor_gemma-2-2b-it"
os.environ["LORA_FT_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/LoRA_model_chatdoctor_gemma-2-2b-it/"
os.environ["FULL_FT_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/full_model_chatdoctor_gemma-2-2b-it/checkpoint-564/"
os.environ["ROBERTA_LARGE_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/roberta-large"
os.environ["SENTENCE_TRANSFORMER_SCORE_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/all-mpnet-base-v2"


# Define path of the Base Model
base_model_path = os.getenv("MODEL_PATH", None)
base_model_name = str(base_model_path.split("/")[-1])

# Define the path where Full FT Model is saved.
save_path_full_ft_model = os.getenv("FULL_FT_MODEL_PATH", None)

# Read Full FT Model and Full FT Tokenizer
full_model = AutoModelForCausalLM.from_pretrained(
    save_path_full_ft_model,
    torch_dtype=torch.bfloat16,    # Reduce GPU memory
    device_map="auto"             # Automatically put layers on GPU
)
full_tokenizer = AutoTokenizer.from_pretrained(save_path_full_ft_model)

#########################
# Load LoRA FT Model
#########################
# Import models alone
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
from utils.utils import generate_chat_response

# Define path of the Base Model
base_model_path = os.getenv("MODEL_PATH", None)
base_model_name = str(base_model_path.split("/")[-1])

# Define the path where LoRA FT Model is saved.
save_path_lora_ft_model = os.getenv("LORA_FT_MODEL_PATH", None)

# Read Base Model and Base Tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,    # Reduce GPU memory
    device_map="auto"              # Automatically put layers on GPU
)
base_tokenizer = AutoTokenizer.from_pretrained(base_model_path)

# Read LoRA FT Model and LoRA FT Tokenizer
lora_model = PeftModel.from_pretrained(base_model, save_path_lora_ft_model)

lora_tokenizer = AutoTokenizer.from_pretrained(save_path_lora_ft_model)

#os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
#os.environ["TORCH_USE_CUDA_DSA"] = "1"
#os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

#########################
# Load QLoRA FT Model
#########################
# Import models alone
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
from utils.utils import generate_chat_response

# Define path of the Base Model
base_model_path = os.getenv("MODEL_PATH", None)
base_model_name = str(base_model_path.split("/")[-1])
model_name = base_model_name

# Define the path where Full FT Model is saved.
save_path_qlora_ft_model = os.getenv("QLORA_FT_MODEL_PATH", None)

# Read QLoRA FT Model and QLoRA FT Tokenizer
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

qmodel = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    quantization_config=bnb_config,
    device_map="cuda:0", # cuda:0
)

# Read LoRA FT Model and LoRA FT Tokenizer
qlora_model = PeftModel.from_pretrained(qmodel, save_path_qlora_ft_model)

qlora_tokenizer = AutoTokenizer.from_pretrained(save_path_qlora_ft_model)


Loading checkpoint shards: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2/2 [00:01<00:00,  1.16it/s]
The module name  (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
Loading checkpoint shards: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2/2 [00:01<00:00,  1.18it/s]
Loading checkpoint shards: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 2/2 [00:03<00:00,  1.84s/it]


In [14]:
instruction = "If you are a doctor, please answer the medical questions based on the patient's description."
user_message = "I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!"

messages = [
    {"role": "user", "content": f"INSTRUCTION:\n{instruction}\n\nPATIENT MESSAGE:\n{user_message}"}
]


In [15]:
# =====================================================
#    8.1. Inference with Full FT Model
# =====================================================
response = generate_chat_response(
    messages=messages,
    model=full_model,
    tokenizer=full_tokenizer,
    device="cuda",
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.85,

    top_k=50,
    no_repeat_ngram_size=3,
)

print(response)


Hi, I think you may be having a problem called vertigo. In vertigo, the sensation of movement when you are not actually moving is caused by a number of factors including anxiety, stress, excessive coffee or alcohol, certain Chat Doctor.  In your case, it seems that you are having anxiety induced vertigo. There are a number medications that can help with this including betahistine, meclizine and linearizing preparations. I would suggest that you consult a physician or a psychiatrist for further evaluation and treatment. I hope this information is useful for you. Thanks.


In [16]:
# =====================================================
#    8.2. Inference with LoRA FT Model
# =====================================================
response = generate_chat_response(
    messages=messages,
    model=lora_model,
    tokenizer=lora_tokenizer,
    device="cuda",
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.85,
    top_k=50,
    no_repeat_ngram_size=3,
)

print(response)


Hi, I have gone through your query. I can understand your concern. You are having dizziness and nausea. It is not clear whether it is due to stomach discomfort or due to some other cause. You should consult your physician and get your blood sugar, blood pressure, and ECG done. If your blood pressure is high, you should take antihypertensive medication. If you have high blood sugar you should start taking antidiabetic medication. You can also take some antiemetic like domperidone. If the dizziness is due some other reason, you will need to consult a neurologist. Hope I have answered your query, if you have any further doubts or questions, do not hesitate to ask.


In [17]:
# =====================================================
#    8.3. Inference with QLoRA FT Model
# =====================================================
response = generate_chat_response(
    messages=messages,
    model=qlora_model,
    tokenizer=qlora_tokenizer,
    device="cuda",
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.85,
    top_k=50,
    no_repeat_ngram_size=3,
)

print(response)


Hi, I understand your concern. It is not uncommon to experience dizziness and nausea after a bowel movement. It could be due to a few reasons like dehydration, low blood sugar, or even a mild infection. I would suggest you to drink plenty of fluids and eat light food. If the symptoms persist for more than 24 hours, please consult a doctor. Hope I have answered your query. Let me know if I can assist you further.


In [21]:
# =====================================================
#    8.4. Inference with RAG
# =====================================================
response = generate_RAG_response(
    query=user_message,
    index=index,
    qa_texts=texts,
    embed_model=embed_model,
    base_model=base_model,
    base_tokenizer=base_tokenizer,
    device="cuda",
    top_k=3,
)

print(f"{bold_text}Generated RAG Response:{reset_text}\n", response)


[1mGenerated RAG Response:[0m
 Based on your description, it sounds like you might be experiencing **benign positional vertigo** (BVPV). This condition is characterized by sudden, brief episodes of dizziness and spinning sensations, often triggered by changes in head position. 

Here's why your symptoms align with BPP:

* **Spinning sensation:** You describe feeling like your room is rotating when you move.
* **Nausea and vomiting:** These are common accompanying symptoms of BPP.
 
**What to do:**

*  **Rest:**  Give your body time to recover.
 * **Hydrate:**  Drink plenty of water to prevent dehydration.
  * **Avoid Caffeine and Alcohol:** These can worsen dizziness.
   * **Over-the-counter medications:**  Panadol is a good option for pain and fever. 
   
**Important:** While BPP is usually benign and resolves on it's own, it'll be best to consult with a doctor to rule out any other potential causes and get personalized advice. 


**Remember:** This information is for general knowle

In [None]:
#chatdoctor['train']['input'][0:5]

In [24]:
# =====================================================
#    8.5. Compare/Eval Questions
# =====================================================
instruction = "If you are a doctor, please answer the medical questions based on the patient's description."
user_messages = ["I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!", "Hello, My husband is taking Oxycodone due to a broken leg/surgery. He has been taking this pain medication for one month. We are trying to conceive our second baby. Will this medication afect the fetus? Or the health of the baby? Or can it bring birth defects? Thank you.", "my husband was working on a project in the house and all of a sudden a bump about the size of a half dollar appeard on his left leg inside below the knee. He is 69 years old and had triple by pass surgery 7 years ago. It stung when it first happened. Doesn t hurt now. He is seated with his leg ellevated. Is this an emergency?", "hi, i have been recently diagnosed with H pylori.. i have been give the triple treatment of calithramycin, amoxxicilin and omeprazole.. is dis something very serious and does this have some long term implications etc.can u plz give some detailed information about Hpylori. thnx", "I sprained my foot on Friday. I stepped on what I thought was lawn. Instead I was a bit on the lawn, but the outside of my foot snapped down into a trough. My foot has been swollen since Friday. Today is Thursday. Kept ice on it for 24 hours. Bruising started after that, Now bruising is spreading. It is really dark on my toes. Pain is on outside of my foot, not on my ankle."]

#indices = random.sample(range(len(chatdoctor['train']['input'])), 5)
indices = random.sample(range(len(chatdoctor['train']['input'][0:5])), 5)
#inputs = [chatdoctor['train']['input'][i] for i in indices]
#outputs = [chatdoctor['train']['output'][i] for i in indices]

questions = list()
reference = list()
responsesFull = list()
responsesLoRA = list()
responsesQLoRA = list()
responsesRAG = list()

for idx in indices:
    user_message = chatdoctor['train']['input'][idx]
    output = chatdoctor['train']['output'][idx]
    
    messages = [
        {"role": "user", "content": f"INSTRUCTION:\n{instruction}\n\nPATIENT MESSAGE:\n{user_message}"}
    ]
    print(f"{bold_text}Question:{reset_text}\n {user_message}\n")
    questions.append(user_message)
    reference.append(output)
        
    responseFullFT = generate_chat_response(
        messages=messages,
        model=full_model,
        tokenizer=full_tokenizer,
        device="cuda",
        max_new_tokens=512,
        temperature=0.2,
        top_p=0.85,
        top_k=50,
        no_repeat_ngram_size=3,
    )
    responsesFull.append(responseFullFT)
    print(f"{bold_text}Full FT Response:{reset_text}\n {responseFullFT}\n")

    responseLoRAFT = generate_chat_response(
        messages=messages,
        model=lora_model,
        tokenizer=lora_tokenizer,
        device="cuda",
        max_new_tokens=512,
        temperature=0.2,
        top_p=0.85,
        top_k=50,
        no_repeat_ngram_size=3,
    )
    responsesLoRA.append(responseLoRAFT)
    print(f"{bold_text}LoRA FT Response:{reset_text}\n {responseLoRAFT}\n")
    
    responseQLoRAFT = generate_chat_response(
        messages=messages,
        model=qlora_model,
        tokenizer=qlora_tokenizer,
        device="cuda",
        max_new_tokens=512,
        temperature=0.2,
        top_p=0.85,
        top_k=50,
        no_repeat_ngram_size=3,
    )
    responsesQLoRA.append(responseQLoRAFT)
    print(f"{bold_text}QLoRA FT Response:{reset_text}\n {responseQLoRAFT}\n")

    responseRAG = generate_RAG_response(
        query=user_message,
        index=index,
        qa_texts=texts,
        embed_model=embed_model,
        base_model=base_model,
        base_tokenizer=base_tokenizer,
        device="cuda",
        top_k=3,
    )
    responsesRAG.append(responseRAG)
    print(f"{bold_text}RAG ReLORA_FT_MODEL_PATH:{reset_text}\n {responseRAG}\n")
    print()
    print()
    print()


[1mQuestion:[0m
 lump under left nipple and stomach pain (male) Hi,I have recently noticed a few weeks ago a lump under my nipple, it hurts to touch and is about the size of a quarter. Also I have bern experiencing stomach pains that prevent me from eating. I immediatly feel full and have extreme pain. Please help

[1mFull FT Response:[0m
 Hi, dairy have gone through your question. I can understand your concern. You may have some benign cyst like sebaceous cyst or Desmond cyst or some other soft tissue tumor.  You should go for fine needle aspiration cytology or biopsy of that lump. It will give you exact diagnosis. Then you should take treatment accordingly. Hope I have answered your question, if you have doubt then I will be happy to answer. Thanks for using Chat Doctor. Wish you a very good health.

[1mLoRA FT Response:[0m
 Hi, Thanks for your query. I have gone through your query and I can understand your concern. The lump under the nipple is a benign cyst or a lipoma. It is 

#### BERTScore

**BERTScore** evaluates the similarity between generated text and reference text using contextual embeddings from a pre-trained BERT model.

**How it works:**
1. Each token in the candidate and reference text is encoded into a contextual embedding using BERT.
2. A pairwise cosine similarity is computed between all candidate and reference token embeddings.
3. Precision, recall, and F1 scores are calculated based on matching embeddings, capturing semantic similarity rather than just exact word overlap.
Captures meaning and context, unlike traditional metrics like BLEU or ROUGE.

**Key Advantatge:** Captures meaning and context, unlike traditional metrics like BLEU or ROUGE.


In [25]:
# =====================================================
#    8.6. BERTScore Evaluation Questions
# =====================================================
# Only show errors, not warnings
logging.set_verbosity_error()

# BERTScore evaluates the similarity between generated text and reference text using contextual embeddings from a pre-trained BERT model.
roberta_large_model_path = os.getenv("ROBERTA_LARGE_MODEL_PATH", None)

# Full FT Model
P_full, R_full, F1_full = bert_score_function(responsesFull, reference, lang="en", model_type=roberta_large_model_path, num_layers=17, verbose=True)

# LoRA FT Model
P_lora, R_lora, F1_lora = bert_score_function(responsesLoRA, reference, lang="en", model_type=roberta_large_model_path, num_layers=17, verbose=True)

# QLoRA FT Model
P_qlora, R_qlora, F1_qlora = bert_score_function(responsesQLoRA, reference, lang="en", model_type=roberta_large_model_path, num_layers=17, verbose=True)

# RAG
P_rag, R_rag, F1_rag = bert_score_function(responsesRAG, reference, lang="en", model_type=roberta_large_model_path, num_layers=17, verbose=True)

#Precision (P), Recall (R), F1 (balanced measure of Precision & Recall)
# Always use F1 for BERTScore as the main comparison.
# You can also look at P vs R:
#    High P, low R ‚Üí model is precise but misses content
#    Low P, high R ‚Üí model is verbose but partially correct
    

calculating scores...
computing bert embedding.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00, 13.10it/s]


computing greedy matching.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00, 109.52it/s]

done in 0.09 seconds, 56.87 sentences/sec





calculating scores...
computing bert embedding.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00, 14.44it/s]


computing greedy matching.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00, 431.69it/s]

done in 0.07 seconds, 67.85 sentences/sec





calculating scores...
computing bert embedding.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00, 14.91it/s]


computing greedy matching.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00, 467.96it/s]

done in 0.07 seconds, 70.09 sentences/sec





calculating scores...
computing bert embedding.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00,  6.30it/s]


computing greedy matching.


100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 1/1 [00:00<00:00, 366.38it/s]

done in 0.16 seconds, 30.56 sentences/sec





In [26]:
#help(bert_score_function)

In [27]:
# BERT Score
# Full FT Model
bert_f1_full = F1_full.tolist()
print("BERTScore F1 Full:", sum(bert_f1_full)/len(bert_f1_full))

# LoRA FT Model
bert_f1_lora = F1_lora.tolist()
print("BERTScore F1 LoRA:", sum(bert_f1_lora)/len(bert_f1_lora))

# QLoRA FT Model
bert_f1_qlora = F1_qlora.tolist()
print("BERTScore F1 QLoRA:", sum(bert_f1_qlora)/len(bert_f1_qlora))

# RAG
bert_f1_rag = F1_rag.tolist()
print("BERTScore F1 RAG:", sum(bert_f1_rag)/len(bert_f1_rag))


BERTScore F1 Full: 0.879701840877533
BERTScore F1 LoRA: 0.8564440488815308
BERTScore F1 QLoRA: 0.8555079936981201
BERTScore F1 RAG: 0.8291263937950134


#### 8.7. ROUGE Score 

**ROUGE (Recall-Oriented Understudy for Gisting Evaluation)** measures the quality of generated text by comparing it to reference text using overlapping units such as n-grams, word sequences, or word pairs.

**How it works:**
1. Count overlapping n-grams (ROUGE-N), longest common subsequence (ROUGE-L), or skip-grams between candidate and reference text.
2. Compute Recall, Precision, and F1 based on these overlaps.
3. Higher scores indicate closer match to the reference text.

**Key Advantatge:** Simple and widely used for summarization and text generation evaluation.

In [28]:
# =====================================================
#    8.7. ROUGE Score Evaluation Questions
# =====================================================
# Initialize scorer with metrics you want
scorer = rouge_scorer_function.RougeScorer(
    ["rouge1", "rouge2", "rougeL"],  # Unigram, bigram, and longest common subsequence
    use_stemmer=True,
)

def compute_rouge(reference, responses, scorer):
    scores = [scorer.score(ref, pred) for ref, pred in zip(reference, responses)]

    rouge1_f1 = np.mean([s['rouge1'].fmeasure for s in scores])
    rouge2_f1 = np.mean([s['rouge2'].fmeasure for s in scores])
    rougeL_f1 = np.mean([s['rougeL'].fmeasure for s in scores])

    return rouge1_f1, rouge2_f1, rougeL_f1


In [29]:
# Full FT Model
r1, r2, rL = compute_rouge(reference, responsesFull, scorer)
print(f"{bold_text}Full FT Model{reset_text}")
print(f"\tAvg ROUGE-1 F1: {r1:.4f}")
print(f"\tAvg ROUGE-2 F1: {r2:.4f}")
print(f"\tAvg ROUGE-L F1: {rL:.4f}")

# LoRA FT Model
r1, r2, rL = compute_rouge(reference, responsesLoRA, scorer)
print(f"{bold_text}LoRA FT Model{reset_text}")
print(f"\tAvg ROUGE-1 F1: {r1:.4f}")
print(f"\tAvg ROUGE-2 F1: {r2:.4f}")
print(f"\tAvg ROUGE-L F1: {rL:.4f}")

# QLoRA FT Model
r1, r2, rL = compute_rouge(reference, responsesQLoRA, scorer)
print(f"{bold_text}QLoRA FT Model{reset_text}")
print(f"\tAvg ROUGE-1 F1: {r1:.4f}")
print(f"\tAvg ROUGE-2 F1: {r2:.4f}")
print(f"\tAvg ROUGE-L F1: {rL:.4f}")

# RAG Model
r1, r2, rL = compute_rouge(reference, responsesRAG, scorer)
print(f"{bold_text}RAG Model{reset_text}")
print(f"\tAvg ROUGE-1 F1: {r1:.4f}")
print(f"\tAvg ROUGE-2 F1: {r2:.4f}")
print(f"\tAvg ROUGE-L F1: {rL:.4f}")


[1mFull FT Model[0m
	Avg ROUGE-1 F1: 0.4262
	Avg ROUGE-2 F1: 0.2210
	Avg ROUGE-L F1: 0.3201
[1mLoRA FT Model[0m
	Avg ROUGE-1 F1: 0.2892
	Avg ROUGE-2 F1: 0.0573
	Avg ROUGE-L F1: 0.1671
[1mQLoRA FT Model[0m
	Avg ROUGE-1 F1: 0.3143
	Avg ROUGE-2 F1: 0.0518
	Avg ROUGE-L F1: 0.1801
[1mRAG Model[0m
	Avg ROUGE-1 F1: 0.2817
	Avg ROUGE-2 F1: 0.0364
	Avg ROUGE-L F1: 0.1458


#### 8.8: SentenceTransformers Embeddings Score

Use **SentenceTransformers embeddings** to evaluate the semantic similarity between generated answers and reference answers. This approach goes beyond exact word overlap by comparing **contextual embeddings**, providing a **meaning-aware evaluation** of the model‚Äôs responses.

**How it works:**
1. Convert both the generated answer and reference answer into vector embeddings using a SentenceTransformers model.
2. Compute cosine similarity between the embeddings to measure semantic closeness.
3. Higher similarity scores indicate that the generated answer is closer in meaning to the reference.

**Key Advantage:** Captures semantic meaning and context, not just exact word matches, making it ideal for evaluating paraphrased or flexible responses.


In [30]:
# =====================================================
#    8.8. SentenceTransformers embeddings Score Evaluation Questions
# =====================================================
sentence_transformer_model_path = os.getenv("SENTENCE_TRANSFORMER_SCORE_MODEL_PATH", None)

# Initialize model
modelSentenceTransformer = SentenceTransformer(sentence_transformer_model_path)

def compute_semantic_similarity(reference, responses, model):
    emb_ref = model.encode(reference, convert_to_tensor=True)
    emb_pred = model.encode(responses, convert_to_tensor=True)

    cos_scores = util.cos_sim(emb_pred, emb_ref).diagonal()
    cos_scores = cos_scores.tolist()

    avg_score = sum(cos_scores) / len(cos_scores)
    return avg_score, cos_scores


In [32]:
# Full FT Model
avg_score_full, _ = compute_semantic_similarity(reference, responsesFull, modelSentenceTransformer)
print(f"{bold_text}Average Full semantic similarity:{reset_text}", avg_score_full)

# LoRA FT Model
avg_score_lora, _ = compute_semantic_similarity(reference, responsesLoRA, modelSentenceTransformer)
print(f"{bold_text}Average LoRA semantic similarity:{reset_text}", avg_score_lora)

# QLoRA FT Model
avg_score_qlora, _ = compute_semantic_similarity(reference, responsesQLoRA, modelSentenceTransformer)
print(f"{bold_text}Average QLoRA semantic similarity:{reset_text}", avg_score_qlora)

# RAG
avg_score_rag, _ = compute_semantic_similarity(reference, responsesRAG, modelSentenceTransformer)
print(f"{bold_text}Average RAG semantic similarity:{reset_text}", avg_score_rag)

[1mAverage Full semantic similarity:[0m 0.7416023731231689
[1mAverage LoRA semantic similarity:[0m 0.7142608761787415
[1mAverage QLoRA semantic similarity:[0m 0.6832935690879822
[1mAverage RAG semantic similarity:[0m 0.8036062002182007


#### Human Scoring Rubric (1‚Äì5 scale)

| Dimension    | Description                 |
|-------------|-----------------------------|
| <strong>Correctness</strong> | Factually accurate           |
| <strong>Completeness</strong>| Fully answers question       |
| <strong>Clarity</strong>     | Easy to understand           |
| <strong>Faithfulness</strong>| Matches retrieved context    |
| <strong>Helpfulness</strong> | Actionable/useful            |
| <strong>Safety</strong>      | No harmful content           |