# Benchmark implementation for evaluating LLMs

* Based on the code at: https://github.com/hendrycks/test
* Dataset: https://people.eecs.berkeley.edu/~hendrycks/data.tar

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import os
import pandas as pd
from dotenv import load_dotenv
import time

from IPython.display import display, Markdown

In [2]:
load_dotenv()
pd.set_option('display.max_colwidth', None)

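# load_dotenv() already exports OPENAI_API_KEY from .env; fail early if it is missing.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")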

## Applying the MMLU (Measuring Massive Multitask Language Understanding) benchmark to the anatomy test only

### Reading the question-and-answer bank

In [3]:
df = pd.read_csv('./anatomy_val.csv', header=None)
print(df.shape)
df.head(5)

(14, 6)


| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | Which of the following terms describes the body's ability to maintain its normal state? | Anabolism | Catabolism | Tolerance | Homeostasis | D |
| 1 | Which of the following structures travel through the substance of the parotid gland? | The maxillary artery | The maxillary artery and retromandibular vein | The maxillary artery, retromandibular vein and facial artery | The maxillary artery, retromandibular vein, facial artery and buccal branch of the mandibular nerve | B |
| 2 | A medical practitioner places a stethoscope over the patient's seventh right intercostal space in the mid-axillary line. The stethoscope overlies the | upper lobe of the lung. | middle lobe of the lung. | lower lobe of the lung. | costo-diaphragmatic recess. | C |
| 3 | The maxillary sinus | is lined by stratified squamous epithelium. | drains into the superior meatus of the nasal cavities. | is innervated by branches of the maxillary division of the trigeminal nerve. | Receives its blood supply from the first part of the maxillary artery. | C |
| 4 | Which of the following is flexible connective tissue that is attached to bones at the joints? | Adipose | Cartilage | Epithelial | Muscle | B |


In [4]:
df.columns

Index([0, 1, 2, 3, 4, 5], dtype='int64')
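
The file has no header row, so pandas labels the columns 0 through 5. Judging by the rows shown above, column 0 holds the question, columns 1 to 4 hold the four options, and column 5 holds the answer letter. A minimal sketch of that layout using only the standard-library `csv` module (`parse_row` is a hypothetical helper, not part of the notebook):

```python
import csv
import io

def parse_row(line):
    """Split one headerless MMLU-style CSV line into (question, options, answer).

    Assumes six fields: the question, four options, and the answer letter.
    """
    fields = next(csv.reader(io.StringIO(line)))
    question, *options, answer = fields
    return question, options, answer

# First row of anatomy_val.csv, as displayed above.
line = ("Which of the following terms describes the body's ability to "
        "maintain its normal state?,Anabolism,Catabolism,Tolerance,Homeostasis,D")
question, options, answer = parse_row(line)
print(options)  # ['Anabolism', 'Catabolism', 'Tolerance', 'Homeostasis']
print(answer)   # D
```

Using `csv.reader` rather than `str.split(",")` matters here because several options contain commas and are therefore quoted in the file.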

### Functions for building the prompt

In [5]:
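# Answer letters, in the order the option columns appear in the CSV.
choices = ["A", "B", "C", "D"]

def format_subject(subject):
    """Turn a subject id such as "high_school_biology" into " high school biology".

    Each word is prefixed with a space, so the result starts with a space.
    """
    return "".join(" " + entry for entry in subject.split("_"))

def format_example(df, idx, include_answer=True):
    """Format row `idx` as the instruction, the question, its options and, optionally, the answer."""
    prompt = "Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer:"
    prompt += df.iloc[idx, 0]  # column 0: question text
    k = df.shape[1] - 2  # number of options (every column except question and answer)
    for j in range(k):
        prompt += "\n{}. {}".format(choices[j], df.iloc[idx, j+1])
    prompt += "\n - Answer:"
    if include_answer:
        # The last column holds the correct letter; it is shown only in solved examples.
        prompt += " {}\n\n".format(df.iloc[idx, k + 1])
    return prompt

def gen_prompt(train_df, subject, k=-1):
    """Build the header plus the first `k` rows as solved examples (all rows if k == -1)."""
    prompt = "The following are multiple choice questions (with answers) about {}.\n\n".format(format_subject(subject))
    if k == -1:
        k = train_df.shape[0]
    for i in range(k):
        prompt += format_example(train_df, i)
    return prompt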

## Zero-shot

In [6]:
prompt_end = format_example(df, 4, include_answer=False)
train_prompt = gen_prompt(df, 'anatomy', 0)
prompt = train_prompt + prompt_end

display(Markdown(prompt))

The following are multiple choice questions (with answers) about  anatomy.

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer:Which of the following is flexible connective tissue that is attached to bones at the joints?
A. Adipose
B. Cartilage
C. Epithelial
D. Muscle
 - Answer:

In [7]:
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,  # reduce sampling randomness
    max_tokens=1,   # the reply should be a single answer letter
    max_retries=2,
)
messages = [
    HumanMessage(content=prompt)
]
response = llm.invoke(messages)
print(response.content)

B


In [8]:
answers = []
# With k=0 the prompt prefix is just the header line, identical for every question.
train_prompt = gen_prompt(df, 'anatomy', 0)
for q in range(4, 14):  # rows 4-13: the ten questions that are graded
    prompt = train_prompt + format_example(df, q, include_answer=False)
    response = llm.invoke([HumanMessage(content=prompt)])
    answers.append(response.content)
    time.sleep(1)  # brief pause between API calls

answers

['B', 'D', 'A', 'A', 'C', 'B', 'B', 'D', 'C', 'D']
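
With `max_tokens=1` the model returned a bare letter for every question in this run, but nothing guarantees that: a reply could come back lowercase, padded with whitespace, or as some unrelated token. One possible guard before grading, sketched here with a hypothetical `normalize_answer` helper that is not part of the original notebook, maps each reply to one of the four letters or to `None`:

```python
VALID_CHOICES = ("A", "B", "C", "D")

def normalize_answer(raw):
    """Return the answer letter found in `raw`, or None if it is not A-D.

    Hypothetical helper: strips whitespace and a trailing period, then upper-cases.
    """
    cleaned = raw.strip().rstrip(".").upper()
    return cleaned if cleaned in VALID_CHOICES else None

replies = ["B", " d", "a.", "Sure"]
print([normalize_answer(r) for r in replies])  # ['B', 'D', 'A', None]
```

Replies that normalize to `None` can then be counted as wrong or inspected by hand, instead of silently failing the equality check.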

In [9]:
# Answer key for rows 4-13 (positional, end-exclusive slice on the last column).
real_answers = df.iloc[4:14, -1].tolist()
real_answers

['B', 'C', 'A', 'D', 'C', 'B', 'B', 'B', 'C', 'D']

In [10]:
# Count the positions where the model's letter matches the answer key.
correct = sum(1 for x, y in zip(answers, real_answers) if x == y)
print(f"Grade: {correct}")

Grade: 7
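
The raw count is easier to compare across runs when expressed as a fraction of the questions asked. A minimal sketch using the two lists printed above (`accuracy` is an illustrative name, not part of the notebook):

```python
def accuracy(predicted, expected):
    """Fraction of positions where the prediction equals the expected letter."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Values copied from the zero-shot run and the answer key above.
zero_shot = ['B', 'D', 'A', 'A', 'C', 'B', 'B', 'D', 'C', 'D']
expected = ['B', 'C', 'A', 'D', 'C', 'B', 'B', 'B', 'C', 'D']
print(f"Accuracy: {accuracy(zero_shot, expected):.0%}")  # Accuracy: 70%
```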


## Few-shot

In [11]:
prompt_end = format_example(df, 4, include_answer=False)
train_prompt = gen_prompt(df, 'anatomy', 3)
prompt = train_prompt + prompt_end

display(Markdown(prompt))

The following are multiple choice questions (with answers) about  anatomy.

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer:Which of the following terms describes the body's ability to maintain its normal state?
A. Anabolism
B. Catabolism
C. Tolerance
D. Homeostasis
 - Answer: D

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer:Which of the following structures travel through the substance of the parotid gland?
A. The maxillary artery
B. The maxillary artery and retromandibular vein
C. The maxillary artery, retromandibular vein and facial artery
D. The maxillary artery, retromandibular vein, facial artery and buccal branch of the mandibular nerve
 - Answer: B

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer:A medical practitioner places a stethoscope over the patient's seventh right intercostal space in the mid-axillary line. The stethoscope overlies the
A. upper lobe of the lung.
B. middle lobe of the lung.
C. lower lobe of the lung.
D. costo-diaphragmatic recess.
 - Answer: C

Follow the answer instructions strictly, and answer only with the letter corresponding to the correct answer:Which of the following is flexible connective tissue that is attached to bones at the joints?
A. Adipose
B. Cartilage
C. Epithelial
D. Muscle
 - Answer:

In [12]:
answers = []
# The three solved examples (rows 0-2) are the same for every question.
train_prompt = gen_prompt(df, 'anatomy', 3)
for q in range(4, 14):  # rows 4-13: the same ten questions as in the zero-shot run
    prompt = train_prompt + format_example(df, q, include_answer=False)
    response = llm.invoke([HumanMessage(content=prompt)])
    answers.append(response.content)
    time.sleep(1)  # brief pause between API calls

answers

['B', 'D', 'A', 'D', 'C', 'B', 'B', 'B', 'C', 'D']

In [13]:
real_answers

['B', 'C', 'A', 'D', 'C', 'B', 'B', 'B', 'C', 'D']

In [14]:
# Count the positions where the model's letter matches the answer key.
correct = sum(1 for x, y in zip(answers, real_answers) if x == y)
print(f"Grade: {correct}")

Grade: 9
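
To see which questions account for the difference between the two runs, the answer lists can be compared position by position. A sketch, assuming the lists printed above and the starting row of 4 used in the evaluation loops (`wrong_rows` is a hypothetical helper):

```python
def wrong_rows(predicted, expected, first_row=4):
    """Return the DataFrame row numbers where the prediction misses the answer key."""
    return [first_row + i
            for i, (p, e) in enumerate(zip(predicted, expected))
            if p != e]

# Values copied from the outputs above.
expected = ['B', 'C', 'A', 'D', 'C', 'B', 'B', 'B', 'C', 'D']
zero_shot = ['B', 'D', 'A', 'A', 'C', 'B', 'B', 'D', 'C', 'D']
few_shot = ['B', 'D', 'A', 'D', 'C', 'B', 'B', 'B', 'C', 'D']

print(wrong_rows(zero_shot, expected))  # [5, 7, 11]
print(wrong_rows(few_shot, expected))   # [5]
```

In this run the three solved examples correct rows 7 and 11, while row 5 is answered `D` instead of `C` in both settings.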
