# NBME Dataset Exploration

This notebook explores the NBME "Score Clinical Patient Notes" dataset.
The goal is to understand the structure, clinical language style, and annotation characteristics
to support cross-dataset robustness evaluation of open-source LLMs.

In [1]:
import pandas as pd
import numpy as np

pd.set_option("display.max_colwidth", 300)

In [3]:
# Load the three NBME tables
notes = pd.read_csv("../data/nbme/patient_notes.csv")
features = pd.read_csv("../data/nbme/features.csv")
train = pd.read_csv("../data/nbme/train.csv")

print(notes.shape, features.shape, train.shape)

(42146, 3) (143, 3) (14300, 6)


## Dataset Overview

- `patient_notes.csv` (42,146 rows): Exam-style clinical notes written by test takers
- `features.csv` (143 rows): Rubric-based clinical features per case
- `train.csv` (14,300 rows): Annotated feature spans for a subset of notes, one row per note-feature pair

We treat NBME as **structured, exam-style clinical text**, distinct from:
- synthetic EHR narratives (Synthea)
- real-world transcription notes (MTSamples)
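
The three files share key columns (`pn_num`, `case_num`, `feature_num`), so they can be joined into one table with the note text and the feature label next to each annotation. The sketch below shows one way to do that join. The toy DataFrames are made-up stand-ins for the real CSVs, not actual rows, and `merged` is a name introduced only for this illustration.

```python
import pandas as pd

# Toy stand-ins for the three CSVs; values are illustrative, not real rows.
notes = pd.DataFrame({
    "pn_num": [16, 41],
    "case_num": [0, 0],
    "pn_history": ["17 yo male with chest pressure", "17 yo m c/o palpitations"],
})
features = pd.DataFrame({
    "feature_num": [2, 3],
    "case_num": [0, 0],
    "feature_text": ["Chest-pressure", "Intermittent-symptoms"],
})
train = pd.DataFrame({
    "id": ["00016_002", "00016_003"],
    "case_num": [0, 0],
    "pn_num": [16, 16],
    "feature_num": [2, 3],
    "annotation": ["['chest pressure']", "[]"],
    "location": ["['16 30']", "[]"],
})

# Left joins keep every annotation row; validate= raises if a key is not unique
# on the right-hand side, which would silently duplicate annotation rows.
merged = (
    train
    .merge(notes, on=["pn_num", "case_num"], how="left", validate="many_to_one")
    .merge(features, on=["feature_num", "case_num"], how="left", validate="many_to_one")
)
print(merged[["id", "feature_text", "pn_history"]])
```

Joining on `case_num` as well as the numeric ID avoids a duplicated `case_num_x` / `case_num_y` column pair in the result.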

In [4]:
# Inspect patient notes
notes.head()

Unnamed: 0,pn_num,case_num,pn_history
0,0,0,"17-year-old male, has come to the student health clinic complaining of heart pounding. Mr. Cleveland's mother has given verbal consent for a history, physical examination, and treatment\r\n-began 2-3 months ago,sudden,intermittent for 2 days(lasting 3-4 min),worsening,non-allev/aggrav\r\n-associ..."
1,1,0,"17 yo male with recurrent palpitations for the past 3 mo lasting about 3 - 4 min, it happened about 5 - 6 times since the beginning. One time durign a baskeball game two days ago light headedness, pressure in the chest, catching breath, but no fainting. During teh episodes no sweating. No diarrh..."
2,2,0,"Dillon Cleveland is a 17 y.o. male patient with no significant PMH who presents with complaints of heart pounding. This has been going on for a few months and happens once or twice a month. He cannot think of any triggers, and it has occurred both with activity and at rest. Occasionally, it is a..."
3,3,0,a 17 yo m c/o palpitation started 3 mos ago; \r\nNOTHING IMPROVES OR EXACERBATES THE SYMPTOMS ACCORDING TO HIM; IT CAN HAPPEN ANY TIME; MAY TAKE A FEW MINUTES; LAST TIME HAPPENED 2 DAYS AGO DURING PLAYING A GAME AND IT WAS ASSOCIATED WITH RETROSTERNAL PRESSURE LIKE DISCOMFORT; AND HE FELT LIGHTH...
4,4,0,"17yo male with no pmh here for evaluation of palpitations. States for the last 3-4mo he has felt that his heart with intermittently ""beat out of his chest,"" with some associated difficulty catching his breath. States that the most recent event was 2 days ago, and during activity at a soccer game..."


In [5]:
notes.sample(5)[["pn_num", "case_num", "pn_history"]]

Unnamed: 0,pn_num,case_num,pn_history
37748,90781,9,"HPI 20 year old female presented for heacache that started yesterday morning for the first time. The pain is dull and constant, getting worse, bilateral, rated as 8/10, worsens with bending forward and walking. She reported nausea and 3 episodes of vomiting. She denies weakness, vertigo, numbnes..."
12277,37384,3,35 yo m staomach pain mid epigastrium for 2 month\r\n5/10 ganawing pain burning tried tums in the beginning not help right now\r\nno aggrav factor \r\naw fatigue nausea back pain malena eat fast goods \r\nno weight changes fever emesis chills cough no frequency dysuria urgency conjunctiivtsi...
16391,41658,4,45 yo F presents with persistent nervousness which began a few weeks ago and is getting worse. She feels nervous about everything in every situation. Especially exacerbated when preparing for lectures. No alleviating factors. Reports drinking 5-6 drinks of coffee/day. Reports difficulty falling ...
16450,41719,4,"45 yo F with nervousness\r\n- started 2 weeks ago, patient is taking care of in laws and mother and children and feels increasing anxiety \r\n- anxiety makes it hard for her to give lectures as an English professor \r\n- she has decreased appetite and difficulty falling asleep\r\n- denies feelin..."
2463,10238,1,"Ms Suzanne Powelton is a 20 year old female presenting to the clinic for abdominal pain. \r\n\r\nHPI: Pt was woken up from sleep last night by severe pain in right lower quadrant of abdomen, described as being dull and achey. Has persisted for 8-10 hours without change. Pt took ibuprofen without..."


In [6]:
# Note length analysis: characters and whitespace-separated words per note
notes["text_length"] = notes["pn_history"].str.len()
notes["word_count"] = notes["pn_history"].str.split().str.len()

notes[["text_length", "word_count"]].describe()

Unnamed: 0,text_length,word_count
count,42146.0,42146.0
mean,818.176814,135.465975
std,136.712013,24.28826
min,30.0,8.0
25%,736.0,121.0
50%,859.0,141.0
75%,939.0,154.0
max,950.0,194.0


## Clinical Feature Vocabulary

NBME features represent clinically relevant concepts used for rubric-based scoring.

In [7]:
features.sample(10)

Unnamed: 0,feature_num,case_num,feature_text
139,913,9,Female
94,607,6,Duration-x-1-day
61,402,4,Stress-due-to-caring-for-elderly-parents
18,105,1,No-bloody-bowel-movements
59,400,4,Lack-of-other-thyroid-symptoms
27,201,2,Last-Pap-smear-I-year-ago
36,210,2,LMP-2-months-ago-or-Last-menstrual-period-2-months-ago
101,702,7,heavy-periods-OR-irregular-periods
65,406,4,Insomnia
3,3,0,Intermittent-symptoms


In [8]:
features["feature_text"].value_counts().head(10)

feature_text
Female                                                             7
Male                                                               3
20-year                                                            2
Nausea                                                             2
35-year                                                            2
17-year                                                            2
Family-history-of-MI-OR-Family-history-of-myocardial-infarction    1
Worse-with-deep-breath-OR-pleuritic                                1
Chest-pain                                                         1
Duration-x-1-day                                                   1
Name: count, dtype: int64
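
The labels in the outputs above are hyphen-joined, and some pack alternative wordings into one label with `-OR-` (or lowercase `-or-`). To use them as readable symptom references, it can help to split them into plain phrases first. The helper below is a minimal sketch; `feature_to_phrases` is a hypothetical name, and the splitting rule is an assumption based only on the labels shown in this notebook.

```python
import re

def feature_to_phrases(feature_text):
    """Split a rubric label on '-OR-' (any case) and turn hyphens into spaces."""
    alternatives = re.split(r"-or-", feature_text, flags=re.IGNORECASE)
    return [alt.replace("-", " ").strip() for alt in alternatives]

print(feature_to_phrases("heavy-periods-OR-irregular-periods"))
# ['heavy periods', 'irregular periods']
print(feature_to_phrases("Duration-x-1-day"))
# ['Duration x 1 day']
```

Replacing every hyphen also rewrites labels such as `20-year` to `20 year`, which may or may not be wanted depending on how the references are used.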

## Annotation Structure

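Each row of `train.csv` pairs one note (`pn_num`) with one feature (`feature_num`).
`annotation` lists the exact text spans that express the feature, and `location` gives their start and end character offsets in `pn_history`; an empty list indicates that no span was annotated for that feature.
These annotations are used **only for evaluation and error analysis**, not for supervised training.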
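
When read from CSV, `annotation` and `location` arrive as strings that look like Python lists, so they need parsing before use. The sketch below uses `ast.literal_eval` for that. `parse_spans` is a hypothetical helper, the sample note text is invented, and the handling of `;` inside an entry is an assumption about how discontinuous spans might be encoded, not something shown in this notebook.

```python
import ast

def parse_spans(location):
    """Turn a string such as "['70 91', '176 183']" into [(70, 91), (176, 183)]."""
    spans = []
    for entry in ast.literal_eval(location):
        # Assumption: a ';' inside an entry separates pieces of a discontinuous span.
        for piece in entry.split(";"):
            start, end = piece.split()
            spans.append((int(start), int(end)))
    return spans

print(parse_spans("['70 91', '176 183']"))  # [(70, 91), (176, 183)]
print(parse_spans("[]"))                    # []

# Sanity check on an invented note: the offsets should slice out the annotation.
text = "17 yo male with chest pressure"
annotation = ast.literal_eval("['chest pressure']")
spans = parse_spans("['16 30']")
recovered = [text[start:end] for start, end in spans]
print(recovered == annotation)  # True
```

In the rows shown in this notebook, the offset differences equal the annotation lengths (for example `203 217` for the 14-character `chest pressure`), which is consistent with end-exclusive offsets and lets them be used directly as Python slices.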

In [9]:
train.head()

Unnamed: 0,id,case_num,pn_num,feature_num,annotation,location
0,00016_000,0,16,0,['dad with recent heart attcak'],['696 724']
1,00016_001,0,16,1,"['mom with ""thyroid disease']",['668 693']
2,00016_002,0,16,2,['chest pressure'],['203 217']
3,00016_003,0,16,3,"['intermittent episodes', 'episode']","['70 91', '176 183']"
4,00016_004,0,16,4,['felt as if he were going to pass out'],['222 258']


In [10]:
train.sample(5)[["pn_num", "feature_num", "annotation"]]

Unnamed: 0,pn_num,feature_num,annotation
245,489,11,['17 yo']
13765,93712,909,[]
11591,81877,817,"['trouble sleeping', 'trouble sleeping', 'trouble staying asleep', 'getting about 4-5 hours of sleep a night']"
6620,43724,400,"['denies cold intolerance', 'denies heat intolerance', 'denies palpitations', 'denies diarrhea', 'denies constipation']"
12324,83843,812,['lack of interest']


## Alignment with Our Task

In this study, NBME notes are used as:
- Exam-style clinical narratives
- Clean, structured patient histories

We adapt the NBME dataset for **clinical symptom identification** by:
- Treating `pn_history` as model input
- Using feature descriptions as symptom references
- Evaluating extraction performance and robustness without task-specific tuning
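
One way to score extraction against the annotated spans is character-level precision, recall, and F1. This notebook does not fix a metric, so the sketch below is an assumed choice rather than the study's evaluation protocol. `spans_to_chars` and `char_prf` are hypothetical helpers, and spans are `(start, end)` pairs with an exclusive end.

```python
def spans_to_chars(spans):
    """Expand (start, end) spans into the set of character positions they cover."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars

def char_prf(gold_spans, pred_spans):
    """Character-level precision, recall, and F1 for one note-feature pair."""
    gold = spans_to_chars(gold_spans)
    pred = spans_to_chars(pred_spans)
    if not gold and not pred:
        # Convention chosen for this sketch: correctly predicting "absent" is a perfect score.
        return 1.0, 1.0, 1.0
    overlap = len(gold & pred)
    precision = overlap / len(pred) if pred else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# The prediction covers the first 10 of 21 gold characters.
p, r, f1 = char_prf([(70, 91)], [(70, 80)])
print(round(p, 3), round(r, 3), round(f1, 3))  # 1.0 0.476 0.645
```

Scoring on characters rather than exact span matches gives partial credit when a model extracts only part of a phrase, which is useful for error analysis.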

In [11]:
# Collect key stats for cross-dataset comparison
nbme_stats = {
    "num_notes": len(notes),
    "avg_words": notes["word_count"].mean(),
    "median_words": notes["word_count"].median()
}

nbme_stats

{'num_notes': 42146, 'avg_words': 135.46597541878234, 'median_words': 141.0}

### Key Observations

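- NBME notes are short (median 141 words; none exceeds 950 characters), structured, and exam-oriented, though the sampled notes contain typos and heavy clinical shorthand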
- Feature annotations include symptoms, demographics, and rubric-specific concepts
- NBME serves as a clean benchmark for evaluating model behavior under controlled conditions

### Summary

NBME provides structured, exam-style clinical notes with explicit feature annotations.
This dataset complements synthetic and transcription-style data, enabling controlled
evaluation of cross-dataset robustness in clinical NLP.