### LLaMA Supervised Fine-Tuning

This notebook takes GPT-4o's answers on the Kabatubare medical dataset and fine-tunes the LLaMA model on those answers.

The purpose of this exercise is to test whether fine-tuning LLaMA can distill the knowledge of GPT-4o and improve performance on open-ended healthcare question answering.

In [1]:
import os

In [2]:
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [3]:
import pandas as pd
import json
import torch
import pickle
from unsloth import FastLanguageModel
from datasets import Dataset
from tqdm import tqdm
import evaluate

#### Reading the Question and Answer Pairs from Test Dataset Phase 2

In [4]:
# Column-wise buffers for the fields of each test record
ques_list = []
ans_list = []
llama_resp_list = []
gpt_resp_list = []

# Each line of the JSONL file holds one JSON object
with open('phase2_data_kabatubare/test_kabatubare.jsonl', 'r', encoding='utf-8') as file:
    for line in file:
        json_object = json.loads(line)
        ques_list.append(json_object['question'])
        ans_list.append(json_object['answer'])
        llama_resp_list.append(json_object['llama_response_base'])
        gpt_resp_list.append(json_object['gpt_response_base'])
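
The loop above can be generalized into a small helper. The following is a minimal sketch, assuming each line of the file is one JSON object containing all of the requested fields; `read_jsonl_columns` is a hypothetical name introduced for illustration and is not part of the notebook.

```python
import json


def read_jsonl_columns(path, keys):
    """Read a JSON Lines file and collect the requested keys column-wise.

    Hypothetical helper for illustration only. Blank lines are skipped;
    a record that lacks one of the keys raises KeyError.
    """
    columns = {key: [] for key in keys}
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            for key in keys:
                columns[key].append(record[key])
    return columns
```

The returned dictionary maps each key to a list of equal length, so it can be passed directly to `pd.DataFrame`.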

In [5]:
test_dataset = pd.DataFrame({'question': ques_list,
                          'answer': ans_list,
                          'llama_response_base': llama_resp_list,
                          'gpt_response_base': gpt_resp_list})
test_dataset

| | question | answer | llama_response_base | gpt_response_base |
|---|---|---|---|---|
| 0 | i don't have periods due to taking nuvaring. h... | you really shouldn't worry about getting pregn... | I'm happy to help you with your concern. Since... | If you're using NuvaRing and have missed it by... |
| 1 | when you can't digest food. my daughter is 38 ... | i'm sorry your daughter is having a hard time.... | I'm so sorry to hear that your daughter is exp... | I'm sorry to hear about your daughter's strugg... |
| 2 | 8 wks post-hemorrhoidectomy. knife-like pain o... | i'm sorry you're going through this. hemorrhoi... | I'm not a doctor, but I can try to provide som... | I'm sorry to hear that you're experiencing the... |
| 3 | i sometimes feel a mild discomfort under my le... | after reading your full statement i would also... | I'm not a medical professional, but I can try ... | I'm not a doctor, but I can offer some general... |
| 4 | i had sex about 12 hours ago and i'm noticing ... | hi ok if your shaved and this is on your pubic... | I'm not a medical professional, but I can prov... | It's good that you're paying attention to chan... |
| ... | ... | ... | ... | ... |
| 4683 | bright blood in stool for a while now burning ... | hi but as your not really well now i still thi... | I'm not a doctor, but I can provide some gener... | I'm not a doctor, but the symptoms you're desc... |
| 4684 | i feel fine i'm in a great mood but food taste... | well i'm happy you're in a good mood anyway! i... | I'm glad you're feeling fine overall, but the ... | It's good to hear that you're feeling fine and... |
| 4685 | i have recently had unprotected sex and both m... | you had unprotected sex? it is your "other" he... | I'm here to provide you with information and g... | Experiencing irritation on the head of the pen... |
| 4686 | is there a coated naproxen medication to preve... | all over-the-counter and prescription-strength... | Yes, there are coated naproxen medications ava... | Yes, there are formulations of naproxen that a... |


### Inference

In [None]:
# full_model_path = "./llama32-sft-full-kabatubare-phase2-distill"
peft_model_path = "./llama32-sft-peft-kabatubare-phase2-distill"  # use for LoRA-based fine-tuning

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = peft_model_path,
    max_seq_length = 2048,
    load_in_4bit = False, # set to True for 4-bit quantization to reduce memory
    load_in_8bit = False, # set to True for 8-bit quantization: a bit more accurate, uses 2x memory
    full_finetuning = False, # False because a LoRA (PEFT) adapter is loaded
    dtype = None, # None auto-detects; can also be torch.bfloat16 or torch.float16
    device_map = "auto"
)

Inference is run one sample at a time. Batched inference did not work well with the fine-tuned model adapters: it produced degenerate responses such as `P P P P`.

In [None]:
def get_llama_response_ft(question_input: str) -> str:
    """Generate the fine-tuned model's answer to a single question."""
    llama_input = [{"role": "system", "content": "You are a medical knowledge assistant trained to provide information and guidance on various health-related topics."},
                   {"role": "user", "content": question_input}]

    # Render the chat template as text, ending with the assistant header
    prompt = tokenizer.apply_chat_template(llama_input, tokenize=False, add_generation_prompt=True)

    inputs = tokenizer(prompt, padding=True, truncation=True, return_tensors="pt").to(model.device)
    # Decoded prompt, used below to strip the prompt from the generated text
    temp_resp = tokenizer.decode(inputs['input_ids'][0], skip_special_tokens=True)

    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        num_return_sequences=1
    )

    resp = tokenizer.decode(outputs[0], skip_special_tokens=True)
    resp = resp[len(temp_resp):]  # keep only the assistant's response

    return resp
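
The function above recovers the assistant text by slicing the decoded string at the length of the decoded prompt. An alternative that does not depend on decoded string lengths is to slice at the token level before decoding. The following is a minimal sketch of that idea; `extract_generated_ids` is a hypothetical helper, and this approach was not used to produce the cached responses in this notebook.

```python
def extract_generated_ids(output_ids, prompt_length):
    """Return only the tokens produced after the prompt.

    `output_ids` is one generated sequence (prompt followed by continuation);
    `prompt_length` is the number of prompt tokens fed to the model.
    Works on Python lists and on 1-D tensors, since both support slicing.
    """
    if prompt_length < 0 or prompt_length > len(output_ids):
        raise ValueError("prompt_length must be between 0 and len(output_ids)")
    return output_ids[prompt_length:]


# Toy example with plain lists standing in for token ids.
prompt_ids = [101, 7, 8, 9]
generated = prompt_ids + [42, 43, 44]
new_ids = extract_generated_ids(generated, len(prompt_ids))
print(new_ids)  # [42, 43, 44]
```

Inside `get_llama_response_ft`, the equivalent would be to decode `outputs[0][inputs["input_ids"].shape[1]:]` with `skip_special_tokens=True`.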

In [None]:
# # Enabling Unsloth fast inference, then generating and caching the responses
# FastLanguageModel.for_inference(model)

# llama_responses_distill_ft = []
# for index, row in tqdm(test_dataset.iterrows(), total=len(test_dataset)):
#     question_input = row['question']
#     llama_resp = get_llama_response_ft(question_input)
#     llama_responses_distill_ft.append(llama_resp)

# with open('phase2_kabatubare_medical/llama_responses_distill_ft.pkl', 'wb') as file:
#     pickle.dump(llama_responses_distill_ft, file)
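
The notebook reports that batched inference produced degenerate output such as `P P P P`. The cause was not investigated here. One commonly reported cause with decoder-only models is right-padding, which places pad tokens between the prompt and the position where generation starts. The sketch below illustrates left-padding on plain token-id lists; `left_pad_batch` is a hypothetical helper, and whether left-padding resolves the issue for this adapter is an untested assumption.

```python
def left_pad_batch(sequences, pad_id):
    """Left-pad token-id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Padding goes in front so every prompt ends at the last position.
        input_ids.append([pad_id] * n_pad + list(seq))
        # Mask is 0 over padding and 1 over real tokens.
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


batch_ids, batch_mask = left_pad_batch([[5, 6, 7], [8]], pad_id=0)
print(batch_ids)   # [[5, 6, 7], [0, 0, 8]]
print(batch_mask)  # [[1, 1, 1], [0, 0, 1]]
```

With a Hugging Face tokenizer, the corresponding setting is `tokenizer.padding_side = "left"` before tokenizing the batch.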

In [None]:
with open('phase2_kabatubare_medical/llama_responses_distill_ft.pkl', 'rb') as file:
    llama_responses_distill_ft = pickle.load(file)
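
The two cells above follow a generate-once, reload-later pattern: the expensive generation loop is commented out and its pickled results are loaded instead. A minimal sketch of folding both steps into one helper follows; `load_or_compute` is a hypothetical name, not something the notebook defines.

```python
import os
import pickle


def load_or_compute(cache_path, compute_fn):
    """Return cached results if `cache_path` exists, else compute and cache them."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    results = compute_fn()
    cache_dir = os.path.dirname(cache_path)
    if cache_dir:
        # Create the target directory on first use.
        os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "wb") as fh:
        pickle.dump(results, fh)
    return results
```

Here `compute_fn` would wrap the generation loop, so rerunning the notebook skips inference whenever the pickle file is already present.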

### Adding the LLaMA Fine-Tuned Responses to the Complete DataFrame

In [None]:
test_dataset['llama_responses_distill_ft'] = llama_responses_distill_ft
test_dataset

| | question | answer | llama_response_base | gpt_response_base | llama_response_ft |
|---|---|---|---|---|---|
| 0 | i don't have periods due to taking nuvaring. h... | you really shouldn't worry about getting pregn... | I'm happy to help you with your concern. Since... | If you're using NuvaRing and have missed it by... | If you are using the NuvaRing and have missed ... |
| 1 | when you can't digest food. my daughter is 38 ... | i'm sorry your daughter is having a hard time.... | I'm so sorry to hear that your daughter is exp... | I'm sorry to hear about your daughter's strugg... | I'm sorry to hear about your daughter's strugg... |
| 2 | 8 wks post-hemorrhoidectomy. knife-like pain o... | i'm sorry you're going through this. hemorrhoi... | I'm not a doctor, but I can try to provide som... | I'm sorry to hear that you're experiencing the... | I'm not a doctor, but I can provide some gener... |
| 3 | i sometimes feel a mild discomfort under my le... | after reading your full statement i would also... | I'm not a medical professional, but I can try ... | I'm not a doctor, but I can offer some general... | I'm not a doctor, but I can provide some infor... |
| 4 | i had sex about 12 hours ago and i'm noticing ... | hi ok if your shaved and this is on your pubic... | I'm not a medical professional, but I can prov... | It's good that you're paying attention to chan... | It's important to monitor any new or unusual s... |
| ... | ... | ... | ... | ... | ... |
| 4683 | bright blood in stool for a while now burning ... | hi but as your not really well now i still thi... | I'm not a doctor, but I can provide some gener... | I'm not a doctor, but the symptoms you're desc... | I'm not a doctor, but the symptoms you're desc... |
| 4684 | i feel fine i'm in a great mood but food taste... | well i'm happy you're in a good mood anyway! i... | I'm glad you're feeling fine overall, but the ... | It's good to hear that you're feeling fine and... | It sounds like you're experiencing a change in... |
| 4685 | i have recently had unprotected sex and both m... | you had unprotected sex? it is your "other" he... | I'm here to provide you with information and g... | Experiencing irritation on the head of the pen... | It's not uncommon to experience some irritatio... |
| 4686 | is there a coated naproxen medication to preve... | all over-the-counter and prescription-strength... | Yes, there are coated naproxen medications ava... | Yes, there are formulations of naproxen that a... | Yes, there are formulations of naproxen that a... |


### Calculating the BLEU Results for Phase 2

LLaMA Response Base

In [8]:
bleu_eval = evaluate.load("bleu")
bleu_results = bleu_eval.compute(predictions=test_dataset['llama_response_base'].to_list(), references=test_dataset['answer'].to_list())
bleu_results

{'bleu': 0.008451293070833801,
 'precisions': [0.09561372031751067,
  0.015226588445850224,
  0.0034674400620675993,
  0.0010105571055748644],
 'brevity_penalty': 1.0,
 'length_ratio': 4.6308406208176915,
 'translation_length': 2377117,
 'reference_length': 513323}
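
To make the fields in this output easier to read, the sketch below recombines them using the standard corpus BLEU definition: the geometric mean of the four n-gram precisions multiplied by the brevity penalty. `bleu_from_stats` is a hypothetical helper written for illustration, not a function of the `evaluate` library.

```python
import math


def bleu_from_stats(precisions, translation_length, reference_length):
    """Combine n-gram precisions and lengths into a corpus BLEU score."""
    if min(precisions) <= 0:
        return 0.0
    # Geometric mean of the n-gram precisions (uniform weights).
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    # The brevity penalty only punishes outputs shorter than the references.
    if translation_length > reference_length:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - reference_length / translation_length)
    return brevity_penalty * math.exp(log_mean)


# Values copied from the LLaMA base output above.
precisions = [0.09561372031751067, 0.015226588445850224,
              0.0034674400620675993, 0.0010105571055748644]
score = bleu_from_stats(precisions, 2377117, 513323)
print(round(score, 6))  # 0.008451, matching the reported 'bleu'
```

Because the predictions are about 4.6 times longer than the references (`length_ratio`), the brevity penalty is 1.0 and the score is driven entirely by the n-gram precisions, whose denominators grow with prediction length.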

GPT Response Base

In [9]:
bleu_eval = evaluate.load("bleu")
bleu_results = bleu_eval.compute(predictions=test_dataset['gpt_response_base'].to_list(), references=test_dataset['answer'].to_list())
bleu_results

{'bleu': 0.012913859036620538,
 'precisions': [0.1465048240902543,
  0.022814594699299896,
  0.00533860905705456,
  0.00155858812772994],
 'brevity_penalty': 1.0,
 'length_ratio': 2.7546963607708985,
 'translation_length': 1414049,
 'reference_length': 513323}

LLaMA Response Fine-Tuned

In [None]:
bleu_eval = evaluate.load("bleu")
bleu_results = bleu_eval.compute(predictions=test_dataset['llama_responses_distill_ft'].to_list(), references=test_dataset['answer'].to_list())
bleu_results

{'bleu': 0.012102035371270584,
 'precisions': [0.1481477912408694,
  0.022289582326031166,
  0.004935573301325259,
  0.0013161305014180052],
 'brevity_penalty': 1.0,
 'length_ratio': 2.628049006181293,
 'translation_length': 1349038,
 'reference_length': 513323}

### Calculating the ROUGE Results for Phase 2

LLaMA Response Base

In [14]:
rouge_eval = evaluate.load("rouge")
rouge_results = rouge_eval.compute(predictions=test_dataset['llama_response_base'].to_list(), references=test_dataset['answer'].to_list())
rouge_results

{'rouge1': np.float64(0.17119481097208555),
 'rouge2': np.float64(0.03278281106824186),
 'rougeL': np.float64(0.0919086145945601),
 'rougeLsum': np.float64(0.1249113699917701)}
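
For intuition about what `rouge1` measures, here is a simplified sketch of a unigram-overlap F1 score for a single prediction and reference. It assumes lowercased whitespace tokenization, which is simpler than the tokenization used by the `evaluate` ROUGE metric, so it will not reproduce the scores above exactly; `rouge1_f1` is a hypothetical name.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two whitespace-tokenized, lowercased strings."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


print(round(rouge1_f1("the cat sat", "the cat"), 3))  # 0.8
```

In the example, two of the three predicted tokens appear in the reference (precision 2/3) and both reference tokens are covered (recall 1), giving an F1 of 0.8.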

GPT Response Base

In [15]:
rouge_eval = evaluate.load("rouge")
rouge_results = rouge_eval.compute(predictions=test_dataset['gpt_response_base'].to_list(), references=test_dataset['answer'].to_list())
rouge_results

{'rouge1': np.float64(0.22386597577345274),
 'rouge2': np.float64(0.04213889578666414),
 'rougeL': np.float64(0.11631453151944982),
 'rougeLsum': np.float64(0.15270435062331955)}

LLaMA Response Fine-Tuned

In [None]:
rouge_eval = evaluate.load("rouge")
rouge_results = rouge_eval.compute(predictions=test_dataset['llama_responses_distill_ft'].to_list(), references=test_dataset['answer'].to_list())
rouge_results

{'rouge1': np.float64(0.2221762223815988),
 'rouge2': np.float64(0.04036304115754091),
 'rougeL': np.float64(0.11650390932105509),
 'rougeLsum': np.float64(0.15147386648116043)}