### LLaMA Supervised Fine-Tuning on Ground Truth Answers

This notebook takes the ground truth answers from the Kabatubare Medical Dataset and fine-tunes the LLaMA model on them.

The goal is to test whether fine-tuning LLaMA on ground truth answers improves its performance on open-ended healthcare question answering.

In [1]:
import os

In [2]:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [3]:
import pandas as pd
import json
import torch
import pickle
from unsloth import FastLanguageModel, is_bfloat16_supported, train_on_responses_only
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from datasets import Dataset
from trl import SFTTrainer

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


🦥 Unsloth Zoo will now patch everything to make training faster!


#### Reading the Question and Answer Pairs from Training Dataset Phase 2

In [4]:
ques_list = []
ans_list = []

with open('phase2_data_kabatubare/train_kabatubare.jsonl', 'rb') as file: #only reading the training dataset
    for line in file:
        json_object = json.loads(line)
        ques_list.append(json_object['question'])
        ans_list.append(json_object['answer']) # ground truth answers from the dataset
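
The loading loop above can be wrapped in a small reusable helper. This is a minimal sketch rather than part of the original notebook: the function name `read_qa_pairs` and the decision to skip blank lines are assumptions, while the `question` and `answer` keys are the fields read above.

```python
import json

def read_qa_pairs(path):
    """Read a JSONL file and return parallel lists of questions and answers."""
    questions, answers = [], []
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            line = line.strip()
            if not line:  # tolerate blank lines (an assumption, not a dataset guarantee)
                continue
            record = json.loads(line)
            questions.append(record["question"])
            answers.append(record["answer"])
    return questions, answers
```

Calling `read_qa_pairs('phase2_data_kabatubare/train_kabatubare.jsonl')` would then return the same two lists that the cell above builds.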

In [6]:
groundtruth_data = pd.DataFrame({'question': ques_list, 'answer': ans_list})
groundtruth_data

| | question | answer |
|---|---|---|
| 0 | i have a small dull ache in my left testicle a... | just give it a rest my friend. it is not likel... |
| 1 | i've heard conflicting opinions. 7 weeks of pr... | hi the best thing is to live your life as norm... |
| 2 | my friend slept over that had fever blisters. ... | hi found you this little piece of info. oral h... |
| 3 | what are some common food triggers for migraines? | you can use information from your diary and tr... |
| 4 | why does grey hair itch so much? . why does my... | hair doesn't itch. scalps will itch. there are... |
| ... | ... | ... |
| 18744 | how to make money online? . a brand new approa... | please do not post advertising on webmd answer... |
| 18745 | what can i eat with the stomach flu? . i can't... | the best foods to eat while feeling nausea are... |
| 18746 | what exams and tests help doctors to evaluate ... | doctors can often easily recognize ringworm wh... |
| 18747 | could i be pregnant? | the only way to know for sure you're pregnant ... |


#### Creating the HuggingFace Dataset from the Pandas DataFrame

In [7]:
dataset = Dataset.from_pandas(groundtruth_data)
dataset = dataset.train_test_split(test_size=0.1, seed=42) # hold out 10% of the training data for validation; the fixed seed keeps the split reproducible across sessions
dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 16874
    })
    test: Dataset({
        features: ['question', 'answer'],
        num_rows: 1875
    })
})

### Fine-Tuning Code

#### Loading the model and tokenizer

In [8]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-3B-Instruct",
    max_seq_length = 2048,
    load_in_4bit = False, # 4 bit quantization to reduce memory
    load_in_8bit = True, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    dtype=None, #None for auto-detection. Can be torch.bfloat16 or torch.float16 (will be automatically detected)
    device_map="auto"
)

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.50.2.
   \\   /|    NVIDIA RTX A6000. Num GPUs = 1. Max memory: 47.413 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.4.1+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.0.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


#### Setting up the PEFT settings for the model

https://huggingface.co/blog/damjan-k/rslora\
https://medium.com/@fartypantsham/what-rank-r-and-alpha-to-use-in-lora-in-llm-1b4f025fd133

In [9]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, #max_full_rank=64 by default in FastLanguageModel
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64, # standard LoRA scales by lora_alpha/r, so lora_alpha = 2 * r would double the adapter weights, which can be unnecessary; with use_rslora=True the factor is lora_alpha/sqrt(r)
    lora_dropout = 0.1,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    use_rslora = True,
    loftq_config = None,
)

Unsloth: Making `model.base_model.model.model` require gradients
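
Because `use_rslora = True`, the adapter update is scaled by `lora_alpha / sqrt(r)` rather than the standard `lora_alpha / r`. The short sketch below compares the two for the values used above. The helper name `lora_scaling` is illustrative and is not part of PEFT or Unsloth.

```python
import math

def lora_scaling(lora_alpha, r, use_rslora=False):
    """Return the factor that multiplies the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

print(lora_scaling(64, 64))                   # standard LoRA: 1.0
print(lora_scaling(64, 64, use_rslora=True))  # rank-stabilized LoRA: 8.0
```

With `r = 64` and `lora_alpha = 64`, the rank-stabilized variant therefore applies a scale of 8 where standard LoRA would apply 1.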


#### Forming the chat template

In [10]:
# Define a function to apply the chat template
def format_chat_template(example):
        
    messages = [
        {"role": "system", "content": "You are a medical knowledge assistant trained to provide information and guidance on various health-related topics."},
        {"role": "user", "content": example['question']},
        {"role": "assistant", "content": example['answer']}
    ]
    
    prompt = tokenizer.apply_chat_template(messages, tokenize=False)

    return {"text": prompt}

In [11]:
dataset_formatted = dataset.map(format_chat_template)

Map: 100%|██████████| 16874/16874 [00:02<00:00, 6444.93 examples/s]
Map: 100%|██████████| 1875/1875 [00:00<00:00, 6524.32 examples/s]


In [12]:
print(dataset_formatted['train']['text'][0])

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 07 Apr 2025

You are a medical knowledge assistant trained to provide information and guidance on various health-related topics.<|eot_id|><|start_header_id|>user<|end_header_id|>

i am currently taking 1mg of klonopin twice a day & i would like to increase my 20mg prozac to twice a day. is this safe<|eot_id|><|start_header_id|>assistant<|end_header_id|>

the short answer is: no it's not safe. no one should adjust their doses of prescription drugs without consulting a doctor. in your case you're taking one anti-anxiety medication and one antidepressant. you should not adjust the dosage of either of these medications without checking with your doctor first. if you feel the prozac (fluoxetine) is not helping your depression i'd suggest you discuss your symptoms with your doctor. he or she may want to change your antidepressant medication to something else. there are plenty of ant
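
The printed prompt follows the Llama 3 layout of role headers and `<|eot_id|>` terminators. As a rough illustration of that layout, the sketch below renders a message list by hand. It is a simplification, not the tokenizer's actual template: `render_llama3_chat` is a hypothetical helper, and it omits the "Cutting Knowledge Date" and "Today Date" lines that the real template injects into the system turn.

```python
def render_llama3_chat(messages, add_generation_prompt=False):
    """Render messages in a simplified Llama 3 style chat layout."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        text += f"{message['content']}<|eot_id|>"
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its answer.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

example = render_llama3_chat([
    {"role": "user", "content": "example question"},
    {"role": "assistant", "content": "example answer"},
])
print(example)
```

During training the assistant turn is part of the text, as above. At inference time the same layout is used with `add_generation_prompt=True` and no assistant message, so the model writes the answer itself.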

#### Initializing the TRL SFTTrainer and related Arguments

In [13]:
peft_model_path = "./llama32-sft-peft-kabatubare-phase3-groundtruth" #use for LoRA based fine-tuning

training_args = TrainingArguments(
        output_dir=peft_model_path,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        # gradient_accumulation_steps=4,
        eval_strategy="steps",
        eval_steps=50,
        logging_strategy="steps",
        logging_steps=50,
        save_strategy="steps",
        save_steps=1000,
        warmup_steps = 5,
        num_train_epochs = 1,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        seed = 42,
        report_to = "none",
    )

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=dataset_formatted["train"],
    eval_dataset=dataset_formatted["test"],
    dataset_text_field = "text",
    max_seq_length = 2048,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer), #only use when using train_on_responses_only()
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = training_args)

Unsloth: Tokenizing ["text"] (num_proc=2): 100%|██████████| 16874/16874 [00:06<00:00, 2566.30 examples/s]
Unsloth: Tokenizing ["text"] (num_proc=2): 100%|██████████| 1875/1875 [00:01<00:00, 1141.32 examples/s]
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
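
To see how the step-based settings interact, the sketch below works out the schedule implied by these arguments for the 16,874 training rows shown earlier. The function `training_schedule` is illustrative only. It assumes a single GPU and no gradient accumulation, as configured above, and it lists only the saves triggered by `save_steps`; depending on the `transformers` version, a final checkpoint at the last step may also be written.

```python
import math

def training_schedule(num_examples, batch_size, eval_steps, save_steps, epochs=1):
    """Return total optimizer steps, number of evaluations, and scheduled save steps."""
    # load_best_model_at_end requires save_steps to be a multiple of eval_steps
    assert save_steps % eval_steps == 0
    total_steps = math.ceil(num_examples / batch_size) * epochs
    num_evals = total_steps // eval_steps
    scheduled_saves = list(range(save_steps, total_steps + 1, save_steps))
    return total_steps, num_evals, scheduled_saves

total, evals, saves = training_schedule(16874, 8, eval_steps=50, save_steps=1000)
print(total, evals, saves)  # 2110 42 [1000, 2000]
```

The run is thus evaluated 42 times, while `save_steps = 1000` schedules checkpoints only at steps 1000 and 2000.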


In [14]:
trainer.train_dataset

Dataset({
    features: ['question', 'answer', 'text', 'input_ids', 'attention_mask'],
    num_rows: 16874
})

In [15]:
print(tokenizer.decode(trainer.train_dataset['input_ids'][0]))

<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 07 Apr 2025

You are a medical knowledge assistant trained to provide information and guidance on various health-related topics.<|eot_id|><|start_header_id|>user<|end_header_id|>

i am currently taking 1mg of klonopin twice a day & i would like to increase my 20mg prozac to twice a day. is this safe<|eot_id|><|start_header_id|>assistant<|end_header_id|>

the short answer is: no it's not safe. no one should adjust their doses of prescription drugs without consulting a doctor. in your case you're taking one anti-anxiety medication and one antidepressant. you should not adjust the dosage of either of these medications without checking with your doctor first. if you feel the prozac (fluoxetine) is not helping your depression i'd suggest you discuss your symptoms with your doctor. he or she may want to change your antidepressant medication to something else. there 
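
Note that the decoded sequence starts with two `<|begin_of_text|>` tokens: the chat template already wrote one into the text, and tokenization appears to have prepended another. Below is a hedged sketch of how such a duplicate could be detected and removed at the token-id level. The helper `drop_duplicate_bos` is hypothetical, the BOS id 128000 is assumed to be the Llama 3 value (in practice read it from `tokenizer.bos_token_id`), and the other ids are arbitrary placeholders.

```python
def drop_duplicate_bos(input_ids, bos_token_id):
    """Collapse a run of leading BOS tokens into a single one."""
    start = 0
    while (start + 1 < len(input_ids)
           and input_ids[start] == input_ids[start + 1] == bos_token_id):
        start += 1
    return input_ids[start:]

BOS = 128000  # assumed Llama 3 BOS id; prefer tokenizer.bos_token_id
print(drop_duplicate_bos([BOS, BOS, 11, 22], BOS))  # [128000, 11, 22]
```

Whether the duplicate affects training quality is not established in this notebook; the sketch only shows how to check for it.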

#### Computing the loss only on the `Response Part`

In [16]:
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|start_header_id|>user<|end_header_id|>\n\n",
    response_part = "<|start_header_id|>assistant<|end_header_id|>\n\n",
)

Map (num_proc=64): 100%|██████████| 16874/16874 [00:02<00:00, 7832.24 examples/s] 
Map (num_proc=64): 100%|██████████| 1875/1875 [00:01<00:00, 1235.05 examples/s]


In [17]:
trainer.train_dataset

Dataset({
    features: ['question', 'answer', 'text', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 16874
})

In [18]:
# The labels keep only the response tokens. All system and user prompt tokens are set to -100 so that they are ignored in the loss calculation
trainer.train_dataset['labels'][0]

[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
 -100, -100, -100, -100, -100, -100, -100, -100, -100,
 1820, 2875, 4320, 374, 25, 912, 433, 596, 539, 6220,
 13, 912, 832, 1288, 7652, 872, 35130, 315, 22866, 11217,
 2085, 31831, 264, 10896, 13, 304, 701, 1162, 499, 2351,
 4737, 832, 7294, 19415, 16708, 24099, 323, 832, 65211, 519,
 13, 499, 1288, 539, 7652, 279, 47040, 315, 3060, 315,
 1521, 31010, 2085, 13598, 449, 701, 1089
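
The `-100` entries are positions the loss ignores. They cover the system and user turns, and real token ids begin where the assistant response starts. The sketch below shows the idea on plain lists. It is an illustration under simplifying assumptions (a single assistant turn, a hypothetical helper `mask_prompt_tokens`, made-up token ids), not Unsloth's `train_on_responses_only` implementation.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def find_subsequence(sequence, pattern):
    """Return the index where pattern first occurs in sequence, or -1."""
    for i in range(len(sequence) - len(pattern) + 1):
        if sequence[i:i + len(pattern)] == pattern:
            return i
    return -1

def mask_prompt_tokens(input_ids, response_marker_ids):
    """Copy input_ids to labels, masking everything up to and including the marker."""
    start = find_subsequence(input_ids, response_marker_ids)
    if start == -1:
        return [IGNORE_INDEX] * len(input_ids)  # no response found: ignore all tokens
    response_start = start + len(response_marker_ids)
    return [IGNORE_INDEX] * response_start + input_ids[response_start:]

# Toy ids: 1, 2 = prompt tokens, [7, 8] = assistant header, 30, 31 = response tokens
print(mask_prompt_tokens([1, 2, 7, 8, 30, 31], [7, 8]))
# [-100, -100, -100, -100, 30, 31]
```

Here the marker `[7, 8]` stands in for the tokenized `response_part` string passed to `train_on_responses_only` above.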

#### Train the model

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 16,874 | Num Epochs = 1 | Total steps = 2,110
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 1 x 1) = 8
 "-____-"     Trainable parameters = 97,255,424/3,310,005,248 (2.94% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
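
The trainable-parameter count in the banner can be reproduced by hand. A LoRA adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below sums this over the seven target modules. The layer dimensions (hidden size 3072, MLP size 8192, 8 key/value heads of dimension 128, 28 decoder layers) are the Llama-3.2-3B configuration values assumed here; they are not printed in this notebook, and `lora_param_count` is an illustrative helper.

```python
R = 64
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 3072, 8192, 8 * 128, 28

# (d_in, d_out) of each targeted projection in one decoder layer (assumed dimensions)
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(r, targets, num_layers):
    """Count LoRA weights: A is (r x d_in) and B is (d_out x r) for each module."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in targets.values())
    return per_layer * num_layers

trainable = lora_param_count(R, TARGETS, NUM_LAYERS)
total = 3_310_005_248  # total parameter count reported in the training banner
print(trainable)                          # 97255424
print(f"{100 * trainable / total:.2f}%")  # 2.94%
```

Under these assumed dimensions the result matches the `97,255,424` trainable parameters and the `2.94%` share reported by Unsloth.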


#### Saving the model and tokenizer

Save only the LoRA adapters, without merging them into the base model.

In [None]:
peft_model_path = "./llama32-sft-peft-kabatubare-phase3-groundtruth" #use for LoRA based fine-tuning

# Save the LoRA adapter weights and the tokenizer
model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

### Inference

In [None]:
peft_model_path = "./llama32-sft-peft-kabatubare-phase3-groundtruth" #use for LoRA based fine-tuning

In [8]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = peft_model_path,
    max_seq_length = 2048,
    load_in_4bit = False, # 4 bit quantization to reduce memory
    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    dtype=None, #None for auto-detection. Can be torch.bfloat16 or torch.float16 (will be automatically detected)
    device_map="auto"
)

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.50.2.
   \\   /|    NVIDIA RTX A6000. Num GPUs = 1. Max memory: 47.413 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.4.1+cu124. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.0.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.3.19 patched 28 layers with 0 QKV layers, 0 O layers and 0 MLP layers.


In [9]:
dataset['test']

Dataset({
    features: ['question', 'gpt_response_base'],
    num_rows: 1875
})

In [23]:
FastLanguageModel.for_inference(model)

# for idx in range(1,50):

idx = 1

print(dataset['test']['question'][idx])

messages = [{"role": "system", "content": "You are a medical knowledge assistant trained to provide information and guidance on various health-related topics."},
            {"role": "user", "content": dataset['test']['question'][idx]}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors='pt', padding=True, truncation=True).to(model.device)
prompt_len = inputs['input_ids'].shape[1] # number of prompt tokens

outputs = model.generate(**inputs, max_new_tokens=2048, num_return_sequences=1)

# Decode only the newly generated tokens (i.e., the assistant response)
resp = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)

print(resp)
print('---------------------------------------------------')

can you identify this strange blister? . what is a small blister-like flat bubble on my shoulder that causes no pain and no itchiness? it may have gone down a bit during the three weeks it has been there. no spreading and no additional blisters found. first i have ever noticed in my 57 years.
I'm not a doctor, but I can provide some general information. A small blister-like flat bubble on the skin that is not painful or itchy could be due to several reasons. Here are a few possibilities:

1. **Cyst**: A sebaceous cyst is a common benign growth that can feel like a small bump or bubble under the skin. They are usually painless and may change in size.

2. **Lipoma**: This is a benign tumor made of fat tissue. It's typically soft, movable, and painless.

3. **Insect bite or sting**: Sometimes, insect bites can create small, raised bumps that may resemble blisters, but they are usually itchy.

4. **Folliculitis**: This is an inflammation of hair follicles that can create small bumps, often