This notebook runs batch inference on a medical healthcare question/answer dataset using a LLaMA model.

https://huggingface.co/datasets/Kabatubare/medical

After inference, the LLaMA answers are evaluated against the ground-truth answers using BLEU and ROUGE.
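To make the evaluation idea concrete, here is a minimal sketch of a ROUGE-1 style F1 score computed by hand. It assumes plain whitespace tokenization and lowercasing, and the function name `rouge1_f1` is hypothetical; a real evaluation would more likely rely on an established metrics library, which may tokenize and stem differently.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference answer and a candidate answer."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a token counts only as often as it appears in both texts
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("adhd and bipolar disorder can coexist",
                  "adhd can coexist with anxiety")
print(round(score, 3))
```

Here three of the five candidate tokens appear in the six-token reference, so precision is 0.6 and recall is 0.5. BLEU follows a similar overlap idea but is precision-oriented and uses higher-order n-grams with a brevity penalty.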

In [1]:
import os

In [2]:
os.environ["CUDA_VISIBLE_DEVICES"] = "7"

In [3]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset
from tqdm import tqdm
import pandas as pd
import numpy as np
import pickle



### Get the Medical Healthcare dataset

In [7]:
dataset_name = "Kabatubare/medical"
dataset = load_dataset(dataset_name, split="all")
dataset

Dataset({
    features: ['Context', 'Question', 'Answer'],
    num_rows: 23437
})

Reading one instance of the dataset

In [8]:
dataset[0]

{'Context': 'You are a medical knowledge assistant trained to provide information and guidance on various health-related topics.',
 'Question': 'my 5 1/2-year-old son displays adhd symptoms for 20 days then for 10 days he has none. is it adhd or another condition?',
 'Answer': "adhd and bipolar mood disorder (bmd) can coexist or be mistaken for one another. bmd usually is not diagnosed until young adulthood. however studies have shown that the earlier a person is diagnosed with bmd the more likely he is to have been diagnosed with adhd previously. in this case i would just like to reiterate that there is not enough information to discuss either possibility for your son. you mentioned that he becomes hyperactive for 3 weeks but not what his behaviors are like during those 10 days. you also do not mention irritability or mood swings just adhd symptoms. keep documenting the symptoms you are concerned about including what goes on in the home and at school when you see changes in behavior (

### Loading the LLaMA Model

In [9]:
with open('hf_token.key', 'r') as f:
    hf_token = f.read().strip()  # Drop any trailing newline or whitespace from the key file

model_id = "meta-llama/Llama-3.2-3B-Instruct"

Initializing the tokenizer

In [10]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
tokenizer.model_max_length = 4096

Loading the model shards

In [11]:
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto") # Must be float32 for MacBooks!
model.config.pad_token_id = tokenizer.pad_token_id # Updating the model config to use the special pad token

Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.00it/s]


Defining the `eos` token and the end-of-turn token `<|eot_id|>` as the terminators that stop generation

In [12]:
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

Defining the function to get the LLaMA Responses

In [13]:
def get_llama_response(question_inputs: list[str]) -> list[str]:
    """Generate one LLaMA response per question in the batch."""
    # Build one chat (system prompt + user question) per input question
    llama_inputs = [[{"role": "system", "content": "You are a medical knowledge assistant trained to provide information and guidance on various health-related topics."},
                     {"role": "user", "content": question}] for question in question_inputs]

    texts = tokenizer.apply_chat_template(llama_inputs, tokenize=False, add_generation_prompt=True)
    # The chat template already inserts the BOS token, so do not add special tokens a second time
    inputs = tokenizer(texts, padding="longest", truncation=True, add_special_tokens=False, return_tensors="pt")
    inputs = {key: val.to(model.device) for key, val in inputs.items()}
    prompt_len = inputs["input_ids"].shape[1]

    gen_tokens = model.generate(
        **inputs,
        max_new_tokens=4096,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.7,
        # top_p=0.9
    )

    # Inputs are left-padded, so every prompt fills the first prompt_len positions;
    # slicing them off leaves only the newly generated tokens
    gen_text = tokenizer.batch_decode(gen_tokens[:, prompt_len:], skip_special_tokens=True)

    return gen_text

Preparing the batches of data

In [14]:
batch_size = 100
dataset_questions = dataset['Question']
dataset_answers = dataset['Answer']
batch_indices = np.arange(0, len(dataset_questions), batch_size)
if batch_indices[-1] != len(dataset_questions):
    batch_indices = np.append(batch_indices, len(dataset_questions))
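The boundary array above marks where each batch starts and where the last one ends. As an illustration only, the same idea can be written without NumPy as a list of `(start, end)` pairs; the helper name `batch_bounds` is hypothetical and not part of the original notebook.

```python
def batch_bounds(n_items: int, batch_size: int) -> list[tuple[int, int]]:
    """Return (start, end) index pairs that cover range(n_items) in order."""
    return [(start, min(start + batch_size, n_items))
            for start in range(0, n_items, batch_size)]


# 23437 rows with a batch size of 100, as in this notebook
bounds = batch_bounds(23437, 100)
print(len(bounds), bounds[0], bounds[-1])
```

With these numbers the dataset splits into 235 batches, the last of which holds the remaining 37 questions.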

Running the batch inference to get responses from LLaMA

Only run the following cell when new inferences are needed. Otherwise, keep it commented out and move on to the next section for evaluation.

In [None]:
# question_list = []
# answer_list = []
# llama_responses = []
# for i in tqdm(range(0, len(batch_indices) - 1)):
#     questions_input = dataset_questions[batch_indices[i]:batch_indices[i+1]]
#     orig_answers = dataset_answers[batch_indices[i]:batch_indices[i+1]]
#     llama_resp = get_llama_response(questions_input)
    
#     question_list = question_list + questions_input
#     answer_list = answer_list + orig_answers
#     llama_responses = llama_responses + llama_resp

# with open('kabatubare_medical/questions.pkl', 'wb') as file:
#     pickle.dump(question_list, file)
    
# with open('kabatubare_medical/answers.pkl', 'wb') as file:
#     pickle.dump(answer_list, file)

# with open('kabatubare_medical/llama_resp.pkl', 'wb') as file:
#     pickle.dump(llama_responses, file)
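One possible alternative to commenting the cell in and out is a small load-or-compute helper that reuses a pickle file when it exists. This is only a sketch: `load_or_compute` and its `compute_fn` argument are hypothetical names, and the helper assumes the cached object fits in memory and can be pickled.

```python
import os
import pickle


def load_or_compute(path, compute_fn):
    """Return the object pickled at `path`; compute and save it first if absent."""
    if os.path.exists(path):
        with open(path, "rb") as file:
            return pickle.load(file)
    result = compute_fn()
    # Create the parent directory if the path has one and it does not exist yet
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as file:
        pickle.dump(result, file)
    return result
```

The expensive inference loop would then be wrapped in a function and passed as `compute_fn`, so it runs only when the pickle file is missing.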

### Evaluating the LLaMA Responses

Reading the saved questions, answers, and LLaMA responses for evaluation

In [16]:
with open('kabatubare_medical/questions.pkl', 'rb') as file:
    question_list = pickle.load(file)
    
with open('kabatubare_medical/answers.pkl', 'rb') as file:
    answer_list = pickle.load(file)

with open('kabatubare_medical/llama_resp.pkl', 'rb') as file:
    llama_responses = pickle.load(file)

In [20]:
question_list

['my 5 1/2-year-old son displays adhd symptoms for 20 days then for 10 days he has none. is it adhd or another condition?',
 'my son has add and mild autism. he has been successfully on concerta for 6+ years. can you help with his weight loss?',
 'my son is 13 and is depressed. he has been taking vyvanse for the last 3 years and never had a problem. can you help?',
 'my 17-year-old has stopped taking concerta after 10 years. what affect will this have on his adhd?',
 "i've been taking respa-ar for allergies. i can't seem to get it refilled. will it be back on the market soon?",
 'what is the role of a pharmacist in a community and how can he/she help a patient?',
 'my daughter is 15 months old has had severe feeding issuesshe lives on applesauce and fruitmelts and formulatests normal. she has had feeding issues since birth formula refusal throwing up entire bottle 2 to 3 times a day she also has hip dysplsia so far medical test are normal what should i do what is the matter with my bab

In [21]:
answer_list

["adhd and bipolar mood disorder (bmd) can coexist or be mistaken for one another. bmd usually is not diagnosed until young adulthood. however studies have shown that the earlier a person is diagnosed with bmd the more likely he is to have been diagnosed with adhd previously. in this case i would just like to reiterate that there is not enough information to discuss either possibility for your son. you mentioned that he becomes hyperactive for 3 weeks but not what his behaviors are like during those 10 days. you also do not mention irritability or mood swings just adhd symptoms. keep documenting the symptoms you are concerned about including what goes on in the home and at school when you see changes in behavior (do you work those weeks does he visit a relative or have a different aide in the classroom). you also mentioned that this began 7 months ago. i would also urge you to think about what also might have changed in your son's life about that time. consulting your pediatrician or a

In [22]:
llama_responses

['It\'s not uncommon for children with Attention Deficit Hyperactivity Disorder (ADHD) to experience fluctuations in symptoms, and it\'s great that you\'re paying attention to your son\'s behavior.\n\nThe pattern you described, where your son has symptoms for 20 days and then goes back to normal for 10 days, is often referred to as "episodes" or "episodic" ADHD. This can be a common manifestation of ADHD, especially in young children.\n\nHowever, it\'s essential to consider other potential causes for the fluctuations in symptoms. Here are a few possibilities:\n\n1. **Comorbid conditions**: ADHD often co-occurs with other conditions, such as anxiety, sleep disorders, or sensory processing issues. These conditions can contribute to the fluctuations in symptoms.\n2. **Sleep patterns**: Sleep deprivation or irregular sleep patterns can exacerbate ADHD symptoms. If your son is having trouble sleeping or experiencing daytime fatigue, this could be contributing to the fluctuations.\n3. **Stre