# Implementation of the MMLU objective assessment method
## 2-shot (2 solved examples + 1 test question)

- Adapted from the original evaluation code: https://github.com/hendrycks/test
- Dataset: https://people.eecs.berkeley.edu/~hendrycks/data.tar
- Paper: Measuring Massive Multitask Language Understanding (https://arxiv.org/abs/2009.03300)

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import os
import pandas as pd
from dotenv import load_dotenv
import time
from tqdm import tqdm

from IPython.display import display, Markdown
from reuse import gen_prompt, format_example  # local helper module (reuse.py), not shown here

## Setup

In [2]:
load_dotenv('../../.env')
pd.set_option('display.max_colwidth', None)

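# load_dotenv already copies OPENAI_API_KEY into os.environ; fail early if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; check ../../.env")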

## Loading the anatomy questions (validation and test splits)

In [3]:
df_val = pd.read_csv('../../data/mmlu/anatomy_val.csv', header=None)
print(df_val.shape)
df_val.head(5)

(14, 6)


| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | Which of the following terms describes the body's ability to maintain its normal state? | Anabolism | Catabolism | Tolerance | Homeostasis | D |
| 1 | Which of the following structures travel through the substance of the parotid gland? | The maxillary artery | The maxillary artery and retromandibular vein | The maxillary artery, retromandibular vein and facial artery | The maxillary artery, retromandibular vein, facial artery and buccal branch of the mandibular nerve | B |
| 2 | A medical practitioner places a stethoscope over the patient's seventh right intercostal space in the mid-axillary line. The stethoscope overlies the | upper lobe of the lung. | middle lobe of the lung. | lower lobe of the lung. | costo-diaphragmatic recess. | C |
| 3 | The maxillary sinus | is lined by stratified squamous epithelium. | drains into the superior meatus of the nasal cavities. | is innervated by branches of the maxillary division of the trigeminal nerve. | Receives its blood supply from the first part of the maxillary artery. | C |
| 4 | Which of the following is flexible connective tissue that is attached to bones at the joints? | Adipose | Cartilage | Epithelial | Muscle | B |

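Because the CSVs are read with `header=None`, pandas numbers the columns 0 to 5. Following the MMLU file layout, column 0 is the question, columns 1 to 4 are options A to D, and column 5 is the correct letter. When inspecting the data, descriptive names can help. The helper `name_mmlu_columns` below is hypothetical (it is not part of the notebook or of `reuse`). It returns a relabelled copy. Column positions do not change, so position-based code such as `.iloc` keeps working.

```python
import pandas as pd

# Assumed MMLU CSV layout: question, four options (A-D), correct letter.
COLUMN_NAMES = ["question", "A", "B", "C", "D", "answer"]


def name_mmlu_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of an MMLU split with descriptive column names."""
    if df.shape[1] != len(COLUMN_NAMES):
        raise ValueError(f"expected {len(COLUMN_NAMES)} columns, got {df.shape[1]}")
    named = df.copy()  # leave the original positional labels untouched
    named.columns = COLUMN_NAMES
    return named


sample = pd.DataFrame([[
    "Which of the following terms describes the body's ability to maintain its normal state?",
    "Anabolism", "Catabolism", "Tolerance", "Homeostasis", "D",
]])
named = name_mmlu_columns(sample)
# Look up the text of the correct option via the answer letter.
print(named.loc[0, named.loc[0, "answer"]])  # Homeostasis
```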

In [4]:
df_test = pd.read_csv('../../data/mmlu/anatomy_test.csv', header=None)
print(df_test.shape)
df_test.head(5)

(135, 6)


| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | A lesion causing compression of the facial nerve at the stylomastoid foramen will cause ipsilateral | paralysis of the facial muscles. | paralysis of the facial muscles and loss of taste. | paralysis of the facial muscles, loss of taste and lacrimation. | paralysis of the facial muscles, loss of taste, lacrimation and decreased salivation. | A |
| 1 | A "dished face" profile is often associated with | a protruding mandible due to reactivation of the condylar cartilage by acromegaly. | a recessive maxilla due to failure of elongation of the cranial base. | an enlarged frontal bone due to hydrocephaly. | defective development of the maxillary air sinus. | B |
| 2 | Which of the following best describes the structure that collects urine in the body? | Bladder | Kidney | Ureter | Urethra | A |
| 3 | Which of the following structures is derived from ectomesenchyme? | Motor neurons | Skeletal muscles | Melanocytes | Sweat glands | C |
| 4 | Which of the following describes the cluster of blood capillaries found in each nephron in the kidney? | Afferent arteriole | Glomerulus | Loop of Henle | Renal pelvis | B |


## Testing a single question

In [5]:
prompt_end = format_example(df_test, 4, include_answer=False)
train_prompt = gen_prompt(df_val, 'anatomy', 2)
prompt = train_prompt + prompt_end

display(Markdown(prompt))

The following are multiple choice questions (with answers) about  anatomy.

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer: Which of the following terms describes the body's ability to maintain its normal state?
A. Anabolism
B. Catabolism
C. Tolerance
D. Homeostasis
 - Answer: D

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer: Which of the following structures travel through the substance of the parotid gland?
A. The maxillary artery
B. The maxillary artery and retromandibular vein
C. The maxillary artery, retromandibular vein and facial artery
D. The maxillary artery, retromandibular vein, facial artery and buccal branch of the mandibular nerve
 - Answer: B

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer: Which of the following describes the cluster of blood capillaries found in each nephron in the kidney?
A. Afferent arteriole
B. Glomerulus
C. Loop of Henle
D. Renal pelvis
 - Answer:
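The `reuse` module is not included here. Judging from the rendered prompt above, its helpers resemble `format_example` and `gen_prompt` from hendrycks/test, with two differences: an instruction prefix before each question and a ` - Answer:` slot. The sketch below rests on that assumption. It is a reconstruction, not the contents of `reuse.py`. The instruction string is copied from the output. Everything else is modelled on the original repository, including the leading space that `format_subject` adds. That space would also account for the double space in "about  anatomy".

```python
import pandas as pd

CHOICES = ["A", "B", "C", "D"]
# Assumed instruction prefix, copied from the rendered prompt.
INSTRUCTION = ("Follow the answer instructions strictly, and answer only with "
               "the letter corresponding to the correct answer: ")


def format_subject(subject: str) -> str:
    # As in hendrycks/test: every word gets a leading space ("anatomy" -> " anatomy").
    return "".join(" " + word for word in subject.split("_"))


def format_example(df: pd.DataFrame, idx: int, include_answer: bool = True) -> str:
    """Render row `idx` as instruction + question, lettered options, answer slot."""
    prompt = INSTRUCTION + str(df.iloc[idx, 0])
    k = df.shape[1] - 2  # number of options between the question and answer columns
    for j in range(k):
        prompt += f"\n{CHOICES[j]}. {df.iloc[idx, j + 1]}"
    prompt += "\n - Answer:"
    if include_answer:
        prompt += f" {df.iloc[idx, k + 1]}\n\n"
    return prompt


def gen_prompt(train_df: pd.DataFrame, subject: str, k: int = -1) -> str:
    """Header followed by the first k solved examples (all rows when k == -1)."""
    prompt = ("The following are multiple choice questions (with answers) "
              f"about {format_subject(subject)}.\n\n")
    if k == -1:
        k = train_df.shape[0]
    for i in range(k):
        prompt += format_example(train_df, i)
    return prompt
```

Usage mirrors the cell above: `gen_prompt(df_val, 'anatomy', 2) + format_example(df_test, 4, include_answer=False)`. The few-shot prefix always takes the first `k` validation rows, so every test question sees the same two examples.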

In [6]:
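llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,   # minimize sampling randomness
    max_tokens=1,    # the expected answer is a single letter
    max_retries=2,   # retry transient API errors
)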

messages = [
    HumanMessage(content=prompt)
]
response = llm.invoke(messages)
print(response.content)

B
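With `max_tokens=1`, the reply is capped at a single token. That token is usually the bare letter, but nothing guarantees its casing or surrounding punctuation. A raw string comparison against the answer key would score a reply like `" b"` or `"B."` as wrong. The small normalizer below is a hypothetical helper, not part of the notebook. It maps such replies to a clean letter and returns `None` for anything unusable.

```python
from typing import Optional

VALID = ("A", "B", "C", "D")


def normalize_choice(reply: str) -> Optional[str]:
    """Map a short reply such as 'B', ' b', 'B.' or '(B)' to 'B'; None if unusable."""
    cleaned = reply.strip().strip("().:").upper()
    return cleaned if cleaned in VALID else None


print(normalize_choice(" b"))   # B
print(normalize_choice("The"))  # None
```

In the evaluation loop this would be applied as `answers.append(normalize_choice(response.content))`. Unparseable replies then count as wrong instead of slipping through as arbitrary strings.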


## Evaluating...

In [7]:
# The 2-shot prefix is identical for every question, so build it once
train_prompt = gen_prompt(df_val, 'anatomy', 2)

answers = []
for q in tqdm(range(35, 55)):  # test questions 35..54 (20 in total)
    prompt = train_prompt + format_example(df_test, q, include_answer=False)
    response = llm.invoke([HumanMessage(content=prompt)])
    answers.append(response.content)
    time.sleep(1)  # crude rate limiting between API calls

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:28<00:00,  1.44s/it]


In [8]:
answers[0:10]

['C', 'D', 'C', 'D', 'C', 'C', 'C', 'B', 'C', 'A']

In [9]:
# .iloc excludes the end position, matching range(35, 55) above;
# .loc[35:55] would include label 55 and return 21 answers
real_answers = df_test.iloc[35:55, -1].tolist()
real_answers[0:10]

['C', 'D', 'C', 'D', 'C', 'C', 'C', 'B', 'C', 'A']
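Label-based slicing with `.loc` includes both endpoints, while `range` and position-based `.iloc` exclude the end. The check below runs on a toy 60-row frame (made up for illustration) with a default integer index, like `df_test`, and shows the off-by-one.

```python
import pandas as pd

# Toy frame with a default RangeIndex (labels 0..59), standing in for df_test.
df = pd.DataFrame({"answer": list("ABCD") * 15})

by_label = df.loc[35:55, "answer"]   # label slice: includes both 35 and 55
by_position = df.iloc[35:55, -1]     # position slice: 35..54, end excluded
indices = range(35, 55)              # what the evaluation loop iterates over

print(len(by_label), len(by_position), len(indices))  # 21 20 20
```

`zip(answers, real_answers)` stops at the shorter list, so an extra label from a `.loc` slice is silently ignored when counting. It would not change a score like the one below, but it would also hide a genuine length mismatch.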

In [10]:
correct = sum(pred == gold for pred, gold in zip(answers, real_answers))
print(f"Grade: {correct} in {len(answers)}")

Grade: 20 in 20
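A raw count hides two things: how large the evaluated slice was, and which questions were missed. The hypothetical `score` helper below is not part of the notebook. It refuses mismatched lengths instead of letting `zip` truncate, and it reports accuracy as a fraction, the metric MMLU results are usually quoted in. It also returns the positions of wrong answers. These positions index into the evaluated slice, so position `i` corresponds to test question `35 + i` here.

```python
from typing import List, Optional, Sequence, Tuple


def score(predicted: Sequence[Optional[str]],
          expected: Sequence[str]) -> Tuple[float, List[int]]:
    """Return (accuracy, positions of wrong predictions)."""
    if len(predicted) != len(expected):
        raise ValueError(f"length mismatch: {len(predicted)} vs {len(expected)}")
    if not expected:
        return 0.0, []
    wrong = [i for i, (p, e) in enumerate(zip(predicted, expected)) if p != e]
    accuracy = (len(expected) - len(wrong)) / len(expected)
    return accuracy, wrong


accuracy, wrong = score(["C", "D", "A"], ["C", "D", "C"])
print(wrong)  # [2]
```

Applied to this notebook, `score(answers, real_answers)` would report an accuracy of 1.0 with no misses for the 20-question slice.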
