# Generate Medical Multiple-Choice Questions (MCQs)

In this notebook, we use medical summaries and the Llama 3.1 model to generate educational learning objectives and their associated multiple-choice questions (MCQs). The generated MCQs are then used to train a small Phi-2 language model.

## Imports

In [1]:
import sys
from pathlib import Path

# Add the root directory to sys.path
root_path = Path().resolve().parent
sys.path.append(str(root_path))

%load_ext autoreload
%autoreload 2

In [2]:
import os
import json
import pandas as pd
from tqdm import tqdm
from collections import defaultdict
from src.mcq_generator import MCQGenerator

## Generate MCQs from summaries

In [3]:
SUMMARY_CSV = "../data/synthetic_contexts/summary_per_focus_1000.csv"
QA_CSV = "../data/parsed_csv/qa_clean.csv"
OUT_PATH = "../data/synthetic_contexts/generated_mcqs_llama_1000.jsonl"

In [4]:
df_summary = pd.read_csv(SUMMARY_CSV)
df_qa = pd.read_csv(QA_CSV)

In [5]:
# Map each focus subject to its set of question types (qtypes). The Llama model uses this set to generate a coherent learning objective.
focus_to_qtypes = defaultdict(set)
for _, row in df_qa.iterrows():
    focus_to_qtypes[row['focus']].add(row['question_type'].lower())
focus_to_qtypes['Anemia']

{'causes',
 'exams and tests',
 'information',
 'prevention',
 'susceptibility',
 'symptoms',
 'treatment'}
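
The same grouping can be sketched without `iterrows()`, which is slow on large frames and raises an error when a `question_type` cell is missing. This is a minimal sketch, not the notebook's method: `build_focus_to_qtypes` is a hypothetical helper that takes `(focus, question_type)` pairs, and the toy pairs below stand in for `zip(df_qa["focus"], df_qa["question_type"])`.

```python
from collections import defaultdict


def build_focus_to_qtypes(pairs):
    """Group lowercased question types by focus, skipping non-string values."""
    mapping = defaultdict(set)
    for focus, qtype in pairs:
        # Missing CSV cells arrive from pandas as float NaN, not str.
        if isinstance(focus, str) and isinstance(qtype, str):
            mapping[focus].add(qtype.lower())
    # Sorted lists give a stable order across runs, unlike raw sets.
    return {focus: sorted(qtypes) for focus, qtypes in mapping.items()}


# Toy pairs standing in for zip(df_qa["focus"], df_qa["question_type"]).
demo_pairs = [
    ("Anemia", "Symptoms"),
    ("Anemia", "treatment"),
    ("Anemia", "symptoms"),
    ("Glaucoma", float("nan")),  # missing question type, skipped
]
demo_mapping = build_focus_to_qtypes(demo_pairs)
print(demo_mapping)  # {'Anemia': ['symptoms', 'treatment']}
```

Returning sorted lists rather than sets is a design choice for reproducibility: the order of `qtypes` passed to the model would then not vary between runs.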

In [6]:
# Initialize generator
mcq_generator = MCQGenerator()
results = []

In [7]:
# Generate one MCQ per summary using the Llama model
for _, row in tqdm(df_summary.iterrows(), total=len(df_summary)):
    focus = row['focus']
    qtypes = list(focus_to_qtypes.get(focus, []))
    mcq = mcq_generator.generate_mcq(summary=row['summary'], qtypes=qtypes)
    if mcq:
        mcq_record = {
            "focus": focus,
            "qtypes": qtypes,
            "summary": row['summary'],
            "objective": mcq['objective'],
            "question": mcq['question'],
            "options": mcq['options'],
            "answer": mcq['answer'],
            "explanation": mcq['explanation'],
        }
        results.append(mcq_record)

100%|████████████████████████████████████| 1000/1000 [16:12:32<00:00, 58.35s/it]
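
Because the run above takes roughly 16 hours and keeps every record in memory until the end, an interruption would lose all progress. One possible mitigation, sketched below under stated assumptions, is to append each record to the JSONL file as soon as it is produced and to skip already-written entries on restart. The helpers `load_done_keys` and `generate_with_checkpoint` are hypothetical, `generate` is a stand-in callable for a wrapper around `mcq_generator.generate_mcq`, and the sketch assumes each `focus` appears only once in the summaries.

```python
import json
import os


def load_done_keys(path, key="focus"):
    """Return the set of keys already present in an existing JSONL file."""
    done = set()
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                try:
                    done.add(json.loads(line)[key])
                except (json.JSONDecodeError, KeyError):
                    # Ignore blank or partially written lines from a crash.
                    continue
    return done


def generate_with_checkpoint(rows, generate, out_path, key="focus"):
    """Append one JSON record per row, skipping rows that are already done.

    `rows` is an iterable of dicts; `generate` maps a row to a dict of MCQ
    fields, or to a falsy value on failure.
    """
    done = load_done_keys(out_path, key)
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    written = 0
    with open(out_path, "a", encoding="utf-8") as f:
        for row in rows:
            if row[key] in done:
                continue
            mcq = generate(row)
            if not mcq:
                continue
            f.write(json.dumps({key: row[key], **mcq}) + "\n")
            f.flush()  # push the record out before the next slow call
            done.add(row[key])
            written += 1
    return written
```

With the notebook's objects, `rows` could be `df_summary.to_dict("records")`, and `generate` a small function that calls `mcq_generator.generate_mcq` with the row's summary and qtypes.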


In [8]:
os.makedirs(os.path.dirname(OUT_PATH), exist_ok=True)
with open(OUT_PATH, 'w', encoding='utf-8') as f:
    for r in results:
        f.write(json.dumps(r) + '\n')

print(f"Generated {len(results)} MCQs in {OUT_PATH}")

Generated 1000 MCQs in ../data/synthetic_contexts/generated_mcqs_llama_1000.jsonl
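
Before the file is used for training, a quick sanity check that every record carries the fields written above can be useful. The following is a minimal sketch: `missing_fields` and `check_jsonl_lines` are hypothetical helpers, and the field list simply mirrors the keys of `mcq_record`. It checks only presence and non-emptiness, since the exact structure of `options` and `answer` depends on `MCQGenerator`.

```python
import json

# Keys written for each record in the generation loop.
REQUIRED_KEYS = ("focus", "qtypes", "summary", "objective",
                 "question", "options", "answer", "explanation")


def missing_fields(record):
    """List required keys that are absent, None, or an empty string."""
    return [k for k in REQUIRED_KEYS
            if record.get(k) is None or record.get(k) == ""]


def check_jsonl_lines(lines):
    """Map 1-based line numbers to their missing fields, for bad lines only."""
    problems = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        missing = missing_fields(json.loads(line))
        if missing:
            problems[lineno] = missing
    return problems


# Toy records: the second one lacks an answer.
good = {k: "x" for k in REQUIRED_KEYS}
bad = {k: "x" for k in REQUIRED_KEYS if k != "answer"}
demo_lines = [json.dumps(good), json.dumps(bad)]
print(check_jsonl_lines(demo_lines))  # {2: ['answer']}
```

To check the real output, an open file handle can be passed directly, for example `check_jsonl_lines(open(OUT_PATH, encoding="utf-8"))`.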


In [9]:
results[166]

{'focus': 'Piriformis Syndrome',
 'qtypes': ['outlook', 'treatment', 'information', 'research'],
 'summary': 'Piriformis Syndrome is a neuromuscular disorder that occurs when the piriformis muscle compresses or irritates the sciatic nerve, causing pain and numbness in the buttocks and legs. This condition can be exacerbated by activities such as sitting, climbing stairs, walking, or running.\n\nTreatment typically begins with stretching exercises, massage, and anti-inflammatory medication to reduce inflammation and alleviate symptoms. In some cases, a corticosteroid injection may provide temporary relief, while cessation of certain activities may be recommended. Surgery is also an option in severe cases.\n\nThe prognosis for most individuals with piriformis syndrome is good, with symptoms typically resolving once addressed through treatment. However, exercise regimens may need to be modified to reduce the likelihood of recurrence or worsening.\n\nResearch into piriformis syndrome is on