# 🧠 Workshop: Adding Knowledge to LLMs  
### Dataset: lavita/ChatDoctor-HealthCareMagic-100k  
HuggingFace: https://huggingface.co/datasets/lavita/ChatDoctor-HealthCareMagic-100k  

### Base Model: google/gemma-2-2b-it  
HuggingFace: https://huggingface.co/google/gemma-2-2b-it  

---

## 1️⃣ Full Fine-Tuning (Full FT)

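In **Full Fine-Tuning**, all model parameters are updated during training.  
This gives the model the most capacity to absorb the new domain. It also needs the most GPU memory and compute, because gradients and optimizer states must be stored for every parameter.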
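
A back-of-the-envelope sketch makes the memory cost concrete. It rests on three assumptions:

* gemma-2-2b-it has roughly 2.6 billion parameters (an approximation);
* the optimizer is AdamW, which keeps two moment buffers per parameter;
* only weights, gradients and optimizer state are counted, so activations and allocator overhead come on top.

The helper `full_ft_memory_gb` is illustrative and is not part of the workshop code.

```python
def full_ft_memory_gb(n_params, weight_bytes, grad_bytes, optimizer_bytes):
    """Approximate memory (GB) for weights + gradients + optimizer state.

    Activations, the CUDA caching allocator and framework overhead are NOT included.
    """
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes) / 1e9

N_PARAMS = 2.6e9  # assumption: approximate parameter count of gemma-2-2b-it

# Weights, gradients and both AdamW moments kept in bf16 (2 bytes each)
bf16_everywhere = full_ft_memory_gb(N_PARAMS, weight_bytes=2, grad_bytes=2, optimizer_bytes=2 * 2)

# Weights, gradients and both AdamW moments kept in fp32 (4 bytes each)
fp32_states = full_ft_memory_gb(N_PARAMS, weight_bytes=4, grad_bytes=4, optimizer_bytes=2 * 4)

# Inference only needs the bf16 weights
inference_only = full_ft_memory_gb(N_PARAMS, weight_bytes=2, grad_bytes=0, optimizer_bytes=0)

print(f"bf16 training state: ~{bf16_everywhere:.1f} GB")
print(f"fp32 training state: ~{fp32_states:.1f} GB")
print(f"bf16 inference weights: ~{inference_only:.1f} GB")
```

Step 4 loads the model in bf16 and trains with `bf16=True`. In that setup PyTorch's AdamW allocates its moments in the parameters' dtype, which corresponds to the first estimate.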

---


In [1]:
# ============================================================
# Workshop: Adding Knowledge to LLMs
# ============================================================
# Dataset: lavita/ChatDoctor-HealthCareMagic-100k
#         HuggingFace Dataset Link: https://huggingface.co/datasets/lavita/ChatDoctor-HealthCareMagic-100k

# Model: google/gemma-2-2b-it
#         HuggingFace Model Link: https://huggingface.co/google/gemma-2-2b-it

# ============================================================
# Goal:
# - Fine-tune a model on the ChatDoctor medical data using:
# 1) Full Fine-Tuning
# 2) LoRA
# 3) QLoRA (4-bit + LoRA)
# 4) Build a RAG baseline on the SAME data and evaluate all approaches on the SAME questions
# 5) Create a Medical Agent
# ============================================================


In [2]:
# =====================================================
# Full Fine-Tuning
# =====================================================
# Check Current Path
!pwd

/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/notebooks


---

### 📦 Step 0: Environment Setup


In [3]:
# =====================================================
# 0. Setup
# =====================================================
import os
import torch
from datasets import load_dataset
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer,
    DataCollatorForLanguageModeling, BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer
from sklearn.model_selection import train_test_split
from utils.utils import get_gpu_memory, generate_chat_response
import bitsandbytes as bnb
import torch.nn as nn



In [4]:
# Define Environment Variables
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["DATA_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/datasets/ChatDoctor-dataset/data/"
os.environ["MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/gemma-2-2b-it"
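# Final checkpoint (step 564) of the training run in Step 4; reused for inference in Step 6.
# NOTE: Step 4 also passes this path as `output_dir`. When re-training, point it to the parent folder,
# otherwise new checkpoints are written inside checkpoint-564/ and Step 4.1 overwrites its files.
os.environ["FULL_FT_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/full_model_chatdoctor_gemma-2-2b-it/checkpoint-564/"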


In [5]:
# Check the path where we want to store our Full Fine-Tuned Model
print(os.getenv("FULL_FT_MODEL_PATH", None))


/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/FT-models/test/full_model_chatdoctor_gemma-2-2b-it


In [6]:
# Check available GPUs
!nvidia-smi

Mon Feb 23 12:41:28 2026       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.274.02             Driver Version: 535.274.02   CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA A100-SXM-64GB           On  | 00000000:1D:00.0 Off |                    0 |
| N/A   42C    P0              63W / 468W |      4MiB / 65536MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM-64GB           On  | 00000000:56:00.0 Off |  

In [7]:
gpu_mem = get_gpu_memory()
print(gpu_mem)

{'total_gb': 63.42, 'used_gb': 0.47, 'free_gb': 62.95, 'source': 'torch'}


---

### 📥 Step 1: Load Dataset



In [8]:
# =====================================================
# 1. Load ChatDoctor Dataset
# =====================================================
# Load the dataset from the local directory
chatdoctor = load_dataset(os.getenv("DATA_PATH", None))


In [9]:
chatdoctor

DatasetDict({
    train: Dataset({
        features: ['instruction', 'input', 'output'],
        num_rows: 112165
    })
})

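The dataset contains single-turn medical conversations. Each row pairs a patient's message (`input`) with a doctor's reply (`output`), preceded by an `instruction` prompt. All 112,165 rows sit in a single `train` split, so Step 3 carves out a validation split.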
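
Step 3 holds out 20% of these rows for validation and then trains on a small subsample. Here is a quick sketch of the resulting sizes. Because 0.20 × 112,165 is a whole number (22,433), the rounding rule used by `train_test_split` does not affect the result.

```python
import math

n_rows = 112_165        # rows in the `train` split shown above
test_size = 0.20        # fraction held out in Step 3

n_val = math.ceil(test_size * n_rows)
n_train = n_rows - n_val

# Workshop subsample taken at the end of Step 3
n_train_used, n_val_used = 3_000, 300

print(n_train, n_val)
print(f"{n_train_used / n_train:.1%} of the training split, {n_val_used / n_val:.1%} of the validation split")
```

The workshop therefore fine-tunes on roughly 3% of the available training rows, which keeps the run short but makes overfitting more likely.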

---

### 📂 Step 2: Define Model Path and Load Tokenizer



In [10]:
# =====================================================
# 2. Model Path & Tokenizer
# =====================================================
# Define the model we want to fine-tune.
model_path = os.getenv("MODEL_PATH", None)
model_name = str(model_path.split("/")[-1])

# Get Model Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
tokenizer.pad_token = tokenizer.eos_token
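
Setting `pad_token` to `eos_token` makes padding and end-of-sequence share one id. When labels are copied from `input_ids`, the usual convention is to set padded positions to -100, the default `ignore_index` of PyTorch's cross-entropy loss. Masking by the attention mask rather than by token id keeps genuine EOS tokens as training targets. A pure-Python sketch with hypothetical token ids:

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def build_labels(input_ids, attention_mask):
    """Copy input_ids to labels, hiding padded positions (mask == 0) from the loss."""
    return [
        [tok if keep == 1 else IGNORE_INDEX for tok, keep in zip(ids, mask)]
        for ids, mask in zip(input_ids, attention_mask)
    ]

# Hypothetical ids: EOS and PAD share id 1, as after `tokenizer.pad_token = tokenizer.eos_token`.
EOS = PAD = 1
input_ids = [
    [2, 15, 27, 33, EOS],       # longest sequence, ends with a real EOS
    [2, 40, EOS, PAD, PAD],     # shorter sequence, right-padded with the same id
]
attention_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0],
]

labels = build_labels(input_ids, attention_mask)
print(labels)
# Masking by id instead (tok == PAD) would also hide the real EOS tokens.
```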


In [11]:
print(f"Model used for Full Fine-Tuning: {model_name}")

Model used for Full Fine-Tuning: gemma-2-2b-it


---

### 🧹 Step 3: Apply Chat Template to the Data + Tokenization


In [12]:
# =====================================================
# 3. Apply Chat Template & Data Collator with Dynamic Padding
# =====================================================
def format_chat_template(row):
    row_json = [
        {"role": "user", "content": f"INSTRUCTION:\n{row['instruction']}\n\nPATIENT MESSAGE:\n{row['input']}"},
        {"role": "assistant", "content": row["output"]}
    ]
    row["text"] = tokenizer.apply_chat_template(row_json, tokenize=False)
    return row

# Apply chat template to all data
chatdoctor = chatdoctor.map(format_chat_template, num_proc=4)

# Split into train (80%) and validation (20%) sets
split_dataset = chatdoctor['train'].train_test_split(
    test_size=0.20,
    seed=42,
    shuffle=True,
)
train_dataset = split_dataset['train']
val_dataset = split_dataset['test']

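# Define the Data Collator: tokenize each batch and pad it to its longest example (dynamic padding)
def data_collator(batch):
    tokenized = tokenizer(
        [example["text"] for example in batch],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=2048,
        add_special_tokens=False,  # the chat template already inserts <bos>; avoid a second one
    )
    # For causal LM, labels are the input_ids (the model shifts them internally).
    # Padding positions get -100 so the loss ignores them. Mask by attention_mask,
    # not by token id, because pad_token == eos_token here.
    labels = tokenized["input_ids"].clone()
    labels[tokenized["attention_mask"] == 0] = -100
    tokenized["labels"] = labels
    return tokenized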

# Subsample for the workshop: train_test_split(shuffle=True) already shuffled the rows,
# so the first N rows form a random sample.
# Alternative: train_dataset.shuffle(seed=42).select(range(2000))
train_data = train_dataset.select(range(3000))
val_data = val_dataset.select(range(300))
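
`apply_chat_template(..., tokenize=False)` turns each row into a single training string. The sketch below hand-writes an approximation of the Gemma 2 chat format for illustration. The authoritative Jinja template ships with the tokenizer as `tokenizer.chat_template`, so treat this rendering as an assumption. In this approximation:

* the `assistant` role is rendered as `model`;
* each turn is wrapped in `<start_of_turn>` and `<end_of_turn>`;
* the whole string starts with `<bos>`.

The example row is made up.

```python
def render_gemma_like(messages, bos="<bos>", add_generation_prompt=False):
    """Approximate Gemma 2 chat formatting (illustrative, not the real template)."""
    text = bos
    for message in messages:
        role = "model" if message["role"] == "assistant" else message["role"]
        text += f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n"
    if add_generation_prompt:
        text += "<start_of_turn>model\n"   # cue the model to answer (used at inference time)
    return text

# A made-up row with the same three fields as ChatDoctor
row = {
    "instruction": "If you are a doctor, please answer the medical questions based on the patient's description.",
    "input": "I have had a headache for two days.",
    "output": "Hi, thank you for your question.",
}
messages = [
    {"role": "user", "content": f"INSTRUCTION:\n{row['instruction']}\n\nPATIENT MESSAGE:\n{row['input']}"},
    {"role": "assistant", "content": row["output"]},
]
text = render_gemma_like(messages)
print(text)
```

Because the rendered text already begins with `<bos>`, tokenizing it again with default settings would prepend a second BOS token. Passing `add_special_tokens=False` to the tokenizer call avoids this.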


---
### 🤖 Step 4: Load the Gemma Model and Run Full Fine-Tuning


In [15]:
# =====================================================
# 4. Read Model + Full Fine-Tuning
# =====================================================
# Read Base Model
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map='auto')

# Define Training Arguments
train_args = TrainingArguments(
    per_device_train_batch_size=2,    # Try different batch sizes
    gradient_accumulation_steps=8,    # Try different accumulation steps (effective batch = 2 x 8 = 16)

    num_train_epochs=3,
    learning_rate=5e-5,               # Try different learning rates (e.g. 2e-4)
    
    fp16=False,
    bf16=True,
    
    logging_strategy="steps",
    logging_steps=100,
    warmup_steps=30,
    
    eval_steps=100,
    eval_strategy="steps",

    output_dir=os.environ["FULL_FT_MODEL_PATH"],
    save_total_limit=2,
    save_strategy="epoch",
    
    report_to="none",
    remove_unused_columns=False,
)

# Trainer class
full_trainer = SFTTrainer(
    model=model,
    args=train_args,
    train_dataset=train_data,
    eval_dataset=val_data,
    data_collator=data_collator,
)

# Before training
torch.cuda.reset_peak_memory_stats()
print("Allocated before training:", torch.cuda.memory_allocated()/1e9, "GB")
print("Reserved before training:", torch.cuda.memory_reserved()/1e9, "GB")

# Train the full model on the medical Q&A data.
full_trainer.train()

# After training, report peak memory usage on the current GPU
# (allocated = live tensors, reserved = memory held by PyTorch's caching allocator)
print("Peak Allocated during training:", torch.cuda.max_memory_allocated()/1e9, "GB")
print("Peak Reserved during training:", torch.cuda.max_memory_reserved()/1e9, "GB")


Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.17it/s]
Adding EOS to train dataset: 100%|██████████| 3000/3000 [00:00<00:00, 12856.89 examples/s]
Tokenizing train dataset: 100%|██████████| 3000/3000 [00:01<00:00, 1886.13 examples/s]
Truncating train dataset: 100%|██████████| 3000/3000 [00:01<00:00, 2928.35 examples/s]
Adding EOS to eval dataset: 100%|██████████| 300/300 [00:00<00:00, 10456.48 examples/s]
Tokenizing eval dataset: 100%|██████████| 300/300 [00:00<00:00, 1822.99 examples/s]
Truncating eval dataset: 100%|██████████| 300/300 [00:00<00:00, 54696.42 examples/s]
The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 1}.


Allocated before training: 4.006139904 GB
Reserved before training: 4.028628992 GB


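| Step | Training Loss | Validation Loss | Entropy | Num Tokens | Mean Token Accuracy |
|-----:|--------------:|----------------:|--------:|-----------:|--------------------:|
| 100 | 2.3471 | 1.738018 | 2.457841 | 421433 | 0.639764 |
| 200 | 1.9898 | 1.793467 | 1.690795 | 840380 | 0.647913 |
| 300 | 1.2034 | 1.816977 | 1.586412 | 1265617 | 0.651943 |
| 400 | 1.024 | 2.336069 | 1.012148 | 1688143 | 0.644566 |
| 500 | 0.3971 | 2.519331 | 0.908757 | 2109549 | 0.642725 |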


Peak Allocated during training: 20.903665664 GB
Peak Reserved during training: 60.599304192 GB
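
The checkpoint name `checkpoint-564` and the logged steps follow from the batch arithmetic. The sketch makes two assumptions:

* a single training process (no data parallelism);
* a final, partial gradient-accumulation window still counts as an optimizer step, which is what the `checkpoint-564` name implies.

It also maps the logged validation losses to epochs. The lowest value is at step 100, in the first epoch. From there validation loss climbs while training loss keeps falling through epochs 2 and 3, a typical sign of overfitting on a 3,000-example subsample.

```python
import math

# Settings from Step 3 / Step 4
n_train = 3000                 # len(train_data)
per_device_batch = 2           # per_device_train_batch_size
grad_accum = 8                 # gradient_accumulation_steps
num_epochs = 3                 # num_train_epochs

batches_per_epoch = math.ceil(n_train / per_device_batch)     # 1500 forward/backward passes
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)    # 188 optimizer steps (last one partial)
total_steps = steps_per_epoch * num_epochs                     # 564 -> checkpoint-564

# Validation losses copied from the training log above
eval_loss = {100: 1.738018, 200: 1.793467, 300: 1.816977, 400: 2.336069, 500: 2.519331}

def epoch_of(step):
    """1-based epoch in which a given optimizer step falls."""
    return math.ceil(step / steps_per_epoch)

best_step = min(eval_loss, key=eval_loss.get)
print(f"{steps_per_epoch} steps/epoch, {total_steps} steps in total")
print(f"lowest validation loss at step {best_step} (epoch {epoch_of(best_step)})")
```

With `save_strategy="epoch"`, checkpoints land at the end of each epoch. Keeping an earlier checkpoint, or stopping after one epoch, may generalize better than the final one.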


---
#### 💾 Step 4.1: Save Full Fine-Tuned Model


In [16]:
# =====================================================
#    4.1. Save the Full Fine-Tuned Model
# =====================================================
# FT Full Model - ChatDoctor
full_ft_model_chatdoctor = full_trainer.model

# Save Full Model - ChatDoctor
save_path_full_ft_model = os.getenv("FULL_FT_MODEL_PATH", None)
full_ft_model_chatdoctor.save_pretrained(save_path_full_ft_model)
tokenizer.save_pretrained(save_path_full_ft_model)


('/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/FT-models/test/full_model_chatdoctor_gemma-2-2b-it/tokenizer_config.json',
 '/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/FT-models/test/full_model_chatdoctor_gemma-2-2b-it/special_tokens_map.json',
 '/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/FT-models/test/full_model_chatdoctor_gemma-2-2b-it/chat_template.jinja',
 '/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/FT-models/test/full_model_chatdoctor_gemma-2-2b-it/tokenizer.model',
 '/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/FT-models/test/full_model_chatdoctor_gemma-2-2b-it/added_tokens.json',
 '/leonardo/home/userexternal/gcortiad/workshop-AddingKnowledgeToLLMs/FT-models/test/full_model_chatdoctor_gemma-2-2b-it/tokenizer.json')
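
The list printed above only covers the tokenizer files returned by `tokenizer.save_pretrained`. Before restarting the kernel, it is worth checking that the folder also holds the model config and weights, which avoids a failed reload later. Below is a minimal sketch. `looks_like_hf_checkpoint` is a hypothetical helper, and it assumes weights are stored as `model*.safetensors`, the default for `save_pretrained` in recent transformers, or as legacy `pytorch_model*.bin`. Trainer checkpoints also contain `training_args.bin`, which must not count as weights.

```python
import tempfile
from pathlib import Path

def looks_like_hf_checkpoint(path):
    """Hypothetical helper: True if `path` holds a config, model weights and a tokenizer config."""
    folder = Path(path)
    if not folder.is_dir():
        return False
    names = {p.name for p in folder.iterdir()}
    has_config = "config.json" in names
    has_weights = any(
        n.startswith(("model", "pytorch_model")) and n.endswith((".safetensors", ".bin"))
        for n in names
    )
    has_tokenizer = "tokenizer_config.json" in names
    return has_config and has_weights and has_tokenizer

# Demo on a throwaway folder with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    (Path(tmp) / "tokenizer_config.json").write_text("{}")
    (Path(tmp) / "training_args.bin").write_bytes(b"")
    before = looks_like_hf_checkpoint(tmp)          # False: no model weights yet
    (Path(tmp) / "model-00001-of-00002.safetensors").write_bytes(b"")
    after = looks_like_hf_checkpoint(tmp)           # True
print(before, after)
```

In the notebook, the call would be `looks_like_hf_checkpoint(save_path_full_ft_model)`.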

---
#### 🔄 Step 5: Restart Kernel


In [None]:
# ========================================
# 5. Restart Kernel 
# ========================================
# Restart the kernel to clear cached objects and training artifacts
# and to free GPU memory (VRAM). This ensures a clean state for inference
# and prevents Out-Of-Memory (OOM) errors.
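
Restarting is the simplest way to get a clean GPU. Freeing memory in-process is harder, because an object is only released once nothing references it. After training, `model` and `full_trainer.model` typically point to the same network, and `full_ft_model_chatdoctor` from Step 4.1 is a third reference. The pure-Python sketch below shows this with `weakref` and stand-in classes, which are hypothetical and not the real transformers objects. With real tensors you would also call `torch.cuda.empty_cache()` afterwards, so PyTorch's caching allocator returns the freed blocks to the driver.

```python
import gc
import weakref

class FakeModel:                 # hypothetical stand-in for the fine-tuned network
    pass

class FakeTrainer:               # hypothetical stand-in for SFTTrainer, which keeps `self.model`
    def __init__(self, model):
        self.model = model

model = FakeModel()
trainer = FakeTrainer(model)
alias = trainer.model            # mirrors `full_ft_model_chatdoctor = full_trainer.model`
probe = weakref.ref(model)       # observes the object without keeping it alive

del model
gc.collect()
alive_after_del_model = probe() is not None    # True: trainer.model and alias still refer to it

del trainer, alias
gc.collect()
freed_after_del_all = probe() is None          # True: no references left

print(alive_after_del_model, freed_after_del_all)
```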


---

### 🔮 Step 6: Inference with Base Model and Full FT Model


In [10]:
# =====================================================
# 6. Inference with Base Model and Full FT Model
# =====================================================
# Re-import libraries and helpers (the kernel was restarted)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
from utils.utils import generate_chat_response
import os

# Define Environment Variables
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["DATA_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/datasets/ChatDoctor-dataset/data/"
os.environ["MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/models/gemma-2-2b-it"
os.environ["FULL_FT_MODEL_PATH"] = "/leonardo_work/tra26_minwinsc/workshop-AddingKnowledgeToLLMs/FT-models/full_model_chatdoctor_gemma-2-2b-it/checkpoint-564/"

# Define path of the Base Model
base_model_path = os.getenv("MODEL_PATH", None)
base_model_name = str(base_model_path.split("/")[-1])

# Define the path where Full FT Model is saved.
save_path_full_ft_model = os.getenv("FULL_FT_MODEL_PATH", None)

# Read Base Model and Base Tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    torch_dtype=torch.bfloat16,    # Reduce GPU memory
    device_map="auto"              # Automatically put layers on GPU
)
base_tokenizer = AutoTokenizer.from_pretrained(base_model_path)


# Read Full FT Model and Full FT Tokenizer
full_model = AutoModelForCausalLM.from_pretrained(
    save_path_full_ft_model,
    torch_dtype=torch.bfloat16,    # Reduce GPU memory
    device_map="auto"              # Automatically put layers on GPU
)
full_tokenizer = AutoTokenizer.from_pretrained(save_path_full_ft_model)


Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.18it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.07it/s]
The module name  (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.


In [11]:
# To see how to call the inference helper, uncomment:
#help(generate_chat_response)
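
The inference calls in Step 6.1 and Step 6.2 pass `temperature=0.2`, `top_k=50` and `top_p=0.85` to `generate_chat_response`. Its internals are not shown here, and the assumption is that it forwards these values to `model.generate` with sampling enabled. The pure-Python sketch below shows what the three settings do to a single next-token distribution. It follows the order transformers applies them in: temperature, then top-k, then top-p. The logits are made up.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]   # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def top_k_top_p(probs, top_k, top_p):
    """Keep the k most likely tokens, then the smallest prefix whose mass reaches top_p.

    Returns {token_index: renormalized probability} for the surviving tokens.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / k_mass           # mass renormalized after top-k
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]   # made-up scores for four candidate tokens

for t in (1.0, 0.2):
    survivors = top_k_top_p(softmax(logits, temperature=t), top_k=50, top_p=0.85)
    print(f"temperature={t}: candidates {sorted(survivors)}")
```

In this toy case, temperature 1.0 leaves three candidates in the nucleus. Temperature 0.2 collapses it to the single most likely token, so sampling behaves almost greedily.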

---

### ✅ Step 6.1: Inference with Base Model


In [12]:
# =====================================================
#    6.1. Inference with Base Model
# =====================================================
instruction = "If you are a doctor, please answer the medical questions based on the patient's description."

user_message = "I woke up this morning feeling the whole room is spinning when i was sitting down. I went to the bathroom walking unsteadily, as i tried to focus i feel nauseous. I try to vomit but it wont come out.. After taking panadol and sleep for few hours, i still feel the same.. By the way, if i lay down or sit down, my head do not spin, only when i want to move around then i feel the whole world is spinning.. And it is normal stomach discomfort at the same time? Earlier after i relieved myself, the spinning lessen so i am not sure whether its connected or coincidences.. Thank you doc!"
user_message2 = "Hello, My husband is taking Oxycodone due to a broken leg/surgery. He has been taking this pain medication for one month. We are trying to conceive our second baby. Will this medication afect the fetus? Or the health of the baby? Or can it bring birth defects? Thank you."

messages = [
    {"role": "user", "content": f"INSTRUCTION:\n{instruction}\n\nPATIENT MESSAGE:\n{user_message}"}
]

response = generate_chat_response(
    messages=messages,
    model=base_model,
    tokenizer=base_tokenizer,
    device="cuda",
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.85,
    top_k=50,
    no_repeat_ngram_size=3,
)

print(response)


I understand you're experiencing dizziness and nausea, and I'm sorry to hear you've been feeling unwell.  

**I am an AI and cannot provide medical advice. The information below is for informational purposes only and does not constitute medical advice.**

Based on your description, it sounds like you might be experiencing **vertigo**, which is a feeling of dizziness or spinning.  It's important to see a doctor to get a proper diagnosis and treatment plan. 

Here's why you should seek medical attention:

* **Possible causes:** Your symptoms could be caused by various factors, including inner ear problems, migraines, low blood sugar, dehydration, or even medication side effects. 
* **Underlying conditions:**  Dizziness can sometimes be a symptom of a more serious underlying condition that requires medical attention.
*  **Proper diagnosis:** A doctor can perform a physical exam, review your medical history, and potentially order tests to determine the cause of your symptoms.

**What you c

---

### 🧪 Step 6.2: Inference with Full FT Model


In [13]:
# =====================================================
#    6.2. Inference with Full FT Model
# =====================================================
response = generate_chat_response(
    messages=messages,
    model=full_model,
    tokenizer=full_tokenizer,
    device="cuda",
    max_new_tokens=512,
    temperature=0.2,
    top_p=0.85,
    top_k=50,
    no_repeat_ngram_size=3,
)

print(response)


Hi, I think you may be having a problem called vertigo. In vertigo, the sensation of movement is experienced when a person is not actually moving. There are certain medical conditions that can cause vertigo, and the most common is benign positional paroxysmal vertigo (BPPV). BPPV is a condition that is completely reversible, and you may need treatment in the form of betahistine tablets. I would suggest that you consult your family physician for treatment. Hope I have answered your query. Let me know if I can assist you further.
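
Both inference calls also set `no_repeat_ngram_size=3`. In transformers this forbids any next token that would repeat a 3-gram already present in the sequence. For a decoder-only model like Gemma, that sequence includes the prompt tokens. Below is a simplified sketch of the banned-token computation, using words instead of token ids for readability.

```python
def banned_next_tokens(tokens, n):
    """Tokens that would complete an n-gram already present in `tokens`."""
    prefix = tuple(tokens[len(tokens) - n + 1:])   # the last n-1 tokens
    banned = set()
    for start in range(len(tokens) - n + 1):
        ngram = tuple(tokens[start:start + n])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

sequence = ["I", "hope", "this", "helps", ".", "I", "hope"]
print(banned_next_tokens(sequence, n=3))   # {'this'}
```

With n=3 the model cannot repeat any three-token phrase. In longer answers this can occasionally force awkward paraphrases of recurring terms.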
