## Data Processing Kabatubare Medical

This notebook loads the Kabatubare Medical data (Questions, Answers, Context) together with the LLaMA and GPT-4o responses from Phase 1. After the data are combined, the dataset is shuffled and divided into train/test sets to be used for fine-tuning and evaluation.

In [1]:
import pickle
import pandas as pd
import json

### Reading the dataset containing Questions, GroundTruth Answers and LLaMA Responses from Phase 1

In [2]:
with open('phase1_kabatubare_medical/questions.pkl', 'rb') as file:
    question_list = pickle.load(file)
    
with open('phase1_kabatubare_medical/answers.pkl', 'rb') as file:
    answer_list = pickle.load(file)

with open('phase1_kabatubare_medical/llama_resp.pkl', 'rb') as file:
    llama_responses = pickle.load(file)

In [3]:
llama_response_df = pd.DataFrame({'question': question_list, 'answer': answer_list, 'llama_response_base': llama_responses})
llama_response_df

Unnamed: 0,question,answer,llama_response_base
0,my 5 1/2-year-old son displays adhd symptoms f...,adhd and bipolar mood disorder (bmd) can coexi...,It's not uncommon for children with Attention ...
1,my son has add and mild autism. he has been su...,stimulants in general tend to decrease appetit...,I can provide general guidance on weight loss ...
2,my son is 13 and is depressed. he has been tak...,while any of the stimulant medications can inc...,I'm so sorry to hear that your son is struggli...
3,my 17-year-old has stopped taking concerta aft...,seventy percent of teens diagnosed when they a...,"I'm not a doctor, but I can provide general in..."
4,i've been taking respa-ar for allergies. i can...,try claritin-d which is located behind the pha...,I'm not aware of any information about a medic...
...,...,...,...
23432,how can accidental of acetaminophen overdose b...,to avoid unintentional overdoses among adults ...,Accidental overdose of acetaminophen (also kno...
23433,what should i do if i take an overdose of maxalt?,if you take more medication than you have been...,"I'm not a doctor, but I can provide general gu..."
23434,what do i do in case of an overdose of relpax?,call your doctor or poison control center or g...,"I'm not a doctor, but I can provide general in..."
23435,is overdose with acetaminophen usually acciden...,in the u. s. suicide attempts account for over...,"According to various studies and data, the maj..."


### Reading the dataset containing Questions and GPT Responses from Phase 1

In [4]:
ques_list = []
gpt_resp_list = []
sample_id_list = []

with open('phase1_kabatubare_medical/kabatubare_medical_gpt4omini_qa_pairs.jsonl', 'rb') as file:
    for line in file:
        json_object = json.loads(line)
        sample_id_list.append(json_object['custom_id'])
        ques_list.append(json_object['Question'])
        gpt_resp_list.append(json_object['Answer'])

In [5]:
gpt_response_df = pd.DataFrame({'question':ques_list, 'gpt_response_base': gpt_resp_list}, index=sample_id_list)
gpt_response_df

Unnamed: 0,question,gpt_response_base
0,my 5 1/2-year-old son displays adhd symptoms f...,It’s important to remember that only a qualifi...
1,my son has add and mild autism. he has been su...,Weight management can be a concern for childre...
2,my son is 13 and is depressed. he has been tak...,I'm really sorry to hear that your son is feel...
3,my 17-year-old has stopped taking concerta aft...,"When a person, especially a teenager, stops ta..."
4,i've been taking respa-ar for allergies. i can...,Resp-A-R is a combination medication commonly ...
...,...,...
23432,how can accidental of acetaminophen overdose b...,Accidental acetaminophen overdose is a signifi...
23433,what should i do if i take an overdose of maxalt?,If you suspect that you have taken an overdose...
23434,what do i do in case of an overdose of relpax?,If you suspect an overdose of Relpax (eletript...
23435,is overdose with acetaminophen usually acciden...,Overdoses of acetaminophen (also known as para...


### Combining two dataframes into a single dataframe

Both dataframes have the same number of rows and, assuming the questions appear in the same order in each, we can copy the gpt_response_base column into llama_response_df by position to get the final dataframe.

In [6]:
llama_response_df['gpt_response_base'] = gpt_response_df['gpt_response_base'].tolist()
llama_response_df

Unnamed: 0,question,answer,llama_response_base,gpt_response_base
0,my 5 1/2-year-old son displays adhd symptoms f...,adhd and bipolar mood disorder (bmd) can coexi...,It's not uncommon for children with Attention ...,It’s important to remember that only a qualifi...
1,my son has add and mild autism. he has been su...,stimulants in general tend to decrease appetit...,I can provide general guidance on weight loss ...,Weight management can be a concern for childre...
2,my son is 13 and is depressed. he has been tak...,while any of the stimulant medications can inc...,I'm so sorry to hear that your son is struggli...,I'm really sorry to hear that your son is feel...
3,my 17-year-old has stopped taking concerta aft...,seventy percent of teens diagnosed when they a...,"I'm not a doctor, but I can provide general in...","When a person, especially a teenager, stops ta..."
4,i've been taking respa-ar for allergies. i can...,try claritin-d which is located behind the pha...,I'm not aware of any information about a medic...,Resp-A-R is a combination medication commonly ...
...,...,...,...,...
23432,how can accidental of acetaminophen overdose b...,to avoid unintentional overdoses among adults ...,Accidental overdose of acetaminophen (also kno...,Accidental acetaminophen overdose is a signifi...
23433,what should i do if i take an overdose of maxalt?,if you take more medication than you have been...,"I'm not a doctor, but I can provide general gu...",If you suspect that you have taken an overdose...
23434,what do i do in case of an overdose of relpax?,call your doctor or poison control center or g...,"I'm not a doctor, but I can provide general in...",If you suspect an overdose of Relpax (eletript...
23435,is overdose with acetaminophen usually acciden...,in the u. s. suicide attempts account for over...,"According to various studies and data, the maj...",Overdoses of acetaminophen (also known as para...
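
The positional copy above is only safe if both sources list the same questions in the same order; an equal row count alone does not guarantee that. A minimal sketch of a guard, written against plain lists so it can run on the lists collected earlier in the notebook. The helper name `first_mismatch` and the toy lists are hypothetical, not part of the original notebook:

```python
def first_mismatch(left, right):
    """Return the first position where two sequences disagree, or None if they match."""
    for position, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return position
    # All shared positions agree, so any difference is a length difference.
    if len(left) != len(right):
        return min(len(left), len(right))
    return None

# Toy stand-ins for the question lists read from the pickle and JSONL files
toy_question_list = ["q1", "q2", "q3"]
toy_ques_list = ["q1", "q2", "q3"]

aligned = first_mismatch(toy_question_list, toy_ques_list)          # None
misaligned = first_mismatch(toy_question_list, ["q1", "q3", "q2"])  # 1
```

In the notebook this could be called as `first_mismatch(question_list, ques_list)` before the column is copied; a result other than `None` would mean the rows need to be joined on the question or sample id instead of by position.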


### Shuffling the dataframe and dividing it into Train/Test datasets

In [7]:
llama_response_df_shuffled = llama_response_df.sample(frac=1, random_state=42)
llama_response_df_shuffled

Unnamed: 0,question,answer,llama_response_base,gpt_response_base
4421,i have a small dull ache in my left testicle a...,just give it a rest my friend. it is not likel...,"I'm here to provide you with information, but ...",It's not uncommon for individuals to experienc...
20934,i've heard conflicting opinions. 7 weeks of pr...,hi the best thing is to live your life as norm...,Congratulations on the new addition to your fa...,"During pregnancy, there are several dietary re..."
11833,my friend slept over that had fever blisters. ...,hi found you this little piece of info. oral h...,I can sense your concern and worry about the s...,I understand that you're feeling very anxious ...
16511,what are some common food triggers for migraines?,you can use information from your diary and tr...,"As a medical knowledge assistant, I can provid...",Migraines can be triggered by a variety of fac...
3455,why does grey hair itch so much? . why does my...,hair doesn't itch. scalps will itch. there are...,The itchy sensation associated with grey hair ...,Itching in gray hair can be attributed to seve...
...,...,...,...,...
11964,bright blood in stool for a while now burning ...,hi but as your not really well now i still thi...,"I'm not a doctor, but I can provide some gener...","I'm not a doctor, but the symptoms you're desc..."
21575,i feel fine i'm in a great mood but food taste...,well i'm happy you're in a good mood anyway! i...,"I'm glad you're feeling fine overall, but the ...",It's good to hear that you're feeling fine and...
5390,i have recently had unprotected sex and both m...,"you had unprotected sex? it is your ""other"" he...",I'm here to provide you with information and g...,Experiencing irritation on the head of the pen...
860,is there a coated naproxen medication to preve...,all over-the-counter and prescription-strength...,"Yes, there are coated naproxen medications ava...","Yes, there are formulations of naproxen that a..."


In [8]:
train_size = int(0.8 * len(llama_response_df_shuffled))
dataset_train = llama_response_df_shuffled.iloc[:train_size].reset_index(drop=True)
dataset_test = llama_response_df_shuffled.iloc[train_size:].reset_index(drop=True)
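
The shuffle and split can be wrapped in one function and checked on a small frame. This is a sketch under assumptions: the function name `shuffle_and_split` and the toy data are hypothetical, while the 0.8 fraction and seed 42 mirror the cells above.

```python
import pandas as pd

def shuffle_and_split(df, train_frac=0.8, seed=42):
    """Shuffle all rows reproducibly, then cut the frame into train and test parts."""
    shuffled = df.sample(frac=1, random_state=seed)
    train_size = int(train_frac * len(shuffled))
    train = shuffled.iloc[:train_size].reset_index(drop=True)
    test = shuffled.iloc[train_size:].reset_index(drop=True)
    return train, test

# Toy stand-in for llama_response_df
toy_df = pd.DataFrame({"question": [f"q{i}" for i in range(10)]})
toy_train, toy_test = shuffle_and_split(toy_df)

# The two parts should not share rows and should cover the whole frame.
overlap = set(toy_train["question"]) & set(toy_test["question"])
covered = set(toy_train["question"]) | set(toy_test["question"])
```

Because `random_state` is fixed, rerunning the split on the same input yields the same train and test rows, which keeps later evaluation comparable across runs.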

In [9]:
dataset_train

Unnamed: 0,question,answer,llama_response_base,gpt_response_base
0,i have a small dull ache in my left testicle a...,just give it a rest my friend. it is not likel...,"I'm here to provide you with information, but ...",It's not uncommon for individuals to experienc...
1,i've heard conflicting opinions. 7 weeks of pr...,hi the best thing is to live your life as norm...,Congratulations on the new addition to your fa...,"During pregnancy, there are several dietary re..."
2,my friend slept over that had fever blisters. ...,hi found you this little piece of info. oral h...,I can sense your concern and worry about the s...,I understand that you're feeling very anxious ...
3,what are some common food triggers for migraines?,you can use information from your diary and tr...,"As a medical knowledge assistant, I can provid...",Migraines can be triggered by a variety of fac...
4,why does grey hair itch so much? . why does my...,hair doesn't itch. scalps will itch. there are...,The itchy sensation associated with grey hair ...,Itching in gray hair can be attributed to seve...
...,...,...,...,...
18744,how to make money online? . a brand new approa...,please do not post advertising on webmd answer...,There are many legitimate ways to make money o...,I'm here to assist with health-related inquiri...
18745,what can i eat with the stomach flu? . i can't...,the best foods to eat while feeling nausea are...,"I'm not a doctor, but I can provide some gener...",When you're dealing with stomach flu (viral ga...
18746,what exams and tests help doctors to evaluate ...,doctors can often easily recognize ringworm wh...,"Ringworm of the scalp, also known as tinea cap...",To evaluate or test individuals for ringworm o...
18747,could i be pregnant?,the only way to know for sure you're pregnant ...,"I'm not a medical professional, but I can prov...",Whether you could be pregnant depends on sever...


In [10]:
dataset_test

Unnamed: 0,question,answer,llama_response_base,gpt_response_base
0,i don't have periods due to taking nuvaring. h...,you really shouldn't worry about getting pregn...,I'm happy to help you with your concern. Since...,If you're using NuvaRing and have missed it by...
1,when you can't digest food. my daughter is 38 ...,i'm sorry your daughter is having a hard time....,I'm so sorry to hear that your daughter is exp...,I'm sorry to hear about your daughter's strugg...
2,8 wks post-hemorrhoidectomy. knife-like pain o...,i'm sorry you're going through this. hemorrhoi...,"I'm not a doctor, but I can try to provide som...",I'm sorry to hear that you're experiencing the...
3,i sometimes feel a mild discomfort under my le...,after reading your full statement i would also...,"I'm not a medical professional, but I can try ...","I'm not a doctor, but I can offer some general..."
4,i had sex about 12 hours ago and i'm noticing ...,hi ok if your shaved and this is on your pubic...,"I'm not a medical professional, but I can prov...",It's good that you're paying attention to chan...
...,...,...,...,...
4683,bright blood in stool for a while now burning ...,hi but as your not really well now i still thi...,"I'm not a doctor, but I can provide some gener...","I'm not a doctor, but the symptoms you're desc..."
4684,i feel fine i'm in a great mood but food taste...,well i'm happy you're in a good mood anyway! i...,"I'm glad you're feeling fine overall, but the ...",It's good to hear that you're feeling fine and...
4685,i have recently had unprotected sex and both m...,"you had unprotected sex? it is your ""other"" he...",I'm here to provide you with information and g...,Experiencing irritation on the head of the pen...
4686,is there a coated naproxen medication to preve...,all over-the-counter and prescription-strength...,"Yes, there are coated naproxen medications ava...","Yes, there are formulations of naproxen that a..."


### Saving the data 

Saving the json elements in a list

In [11]:
# Convert each training row into a dict with the four fields written to the JSONL file
columns = ['question', 'answer', 'llama_response_base', 'gpt_response_base']
train_dataset_json_list = dataset_train[columns].to_dict(orient='records')

In [12]:
# Convert each test row into a dict with the four fields written to the JSONL file
columns = ['question', 'answer', 'llama_response_base', 'gpt_response_base']
test_dataset_json_list = dataset_test[columns].to_dict(orient='records')

Saving the samples line by line in a jsonl file

In [13]:
file_path = "phase2_data_kabatubare/train_kabatubare.jsonl"

with open(file_path, 'w') as file:
    for element in train_dataset_json_list:
        json_line = json.dumps(element)
        file.write(json_line + '\n')

In [14]:
file_path = "phase2_data_kabatubare/test_kabatubare.jsonl"

with open(file_path, 'w') as file:
    for element in test_dataset_json_list:
        json_line = json.dumps(element)
        file.write(json_line + '\n')
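
One way to confirm that the files were written correctly is to read them back and compare the result with the in-memory records. A minimal round-trip sketch, assuming one JSON object per line as above; the helper names `write_jsonl` and `read_jsonl`, the temporary file, and the toy record are hypothetical:

```python
import json
import os
import tempfile

def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Toy record with the same four fields as the notebook's samples
toy_records = [{
    "question": "q1",
    "answer": "a1",
    "llama_response_base": "l1",
    "gpt_response_base": "It\u2019s g1",  # non-ASCII text survives the round trip
}]

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.jsonl")
    write_jsonl(toy_path, toy_records)
    restored = read_jsonl(toy_path)
```

Applied to the notebook, the same comparison would be `read_jsonl(file_path) == train_dataset_json_list` (and likewise for the test file).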