### Competition/Problem Overview and Description
When you visit a doctor, how they interpret your symptoms can determine whether your diagnosis is accurate. By the time they’re licensed, physicians have had a lot of practice writing patient notes that document the history of the patient’s complaint, physical exam findings, possible diagnoses, and follow-up care. Learning and assessing the skill of writing patient notes requires feedback from other doctors, a time-intensive process that could be improved with the addition of machine learning.

Until recently, the Step 2 Clinical Skills examination was one component of the United States Medical Licensing Examination® (USMLE®). The exam required test-takers to interact with Standardized Patients (people trained to portray specific clinical cases) and write a patient note. Trained physician raters later scored patient notes with rubrics that outlined each case’s important concepts (referred to as features). The more such features found in a patient note, the higher the score (among other factors that contribute to the final score for the exam).

However, having physicians score patient note exams requires significant time, along with human and financial resources. Approaches using natural language processing have been created to address this problem, but patient notes can still be challenging to score computationally because features may be expressed in many ways. For example, the feature "loss of interest in activities" can be expressed as "no longer plays tennis." Other challenges include the need to map concepts by combining multiple text segments, or cases of ambiguous negation such as “no cold intolerance, hair loss, palpitations, or tremor” corresponding to the key essential “lack of other thyroid symptoms.”

In this competition, you’ll identify specific clinical concepts in patient notes. Specifically, you'll develop an automated method to map clinical concepts from an exam rubric (e.g., “diminished appetite”) to various ways in which these concepts are expressed in clinical patient notes written by medical students (e.g., “eating less,” “clothes fit looser”). Great solutions will be both accurate and reliable.

If successful, you'll help tackle the biggest practical barriers in patient note scoring, making the approach more transparent, interpretable, and easing the development and administration of such assessments. As a result, medical practitioners will be able to explore the full potential of patient notes to reveal information relevant to clinical skills assessment.

This competition is sponsored by the National Board of Medical Examiners® (NBME®). Through research and innovation, NBME supports medical school and residency program educators in addressing issues around the evolution of teaching, learning, technology, and the need for meaningful feedback. NBME offers high-quality assessments and educational services for students, professionals, educators, regulators, and institutions dedicated to the evolving needs of medical education and health care. To serve these communities, NBME collaborates with a diverse and comprehensive array of practicing health professionals, medical educators, state medical board members, test developers, academic researchers, scoring experts and public representatives.

NBME gratefully acknowledges the valuable input of Dr Le An Ha from the University of Wolverhampton’s Research Group in Computational Linguistics.

### Understanding the problem
_In this competition, you’ll identify specific clinical concepts in patient notes. Specifically, you'll develop an automated method to map clinical concepts from an exam rubric (e.g., “diminished appetite”) to various ways in which these concepts are expressed in clinical patient notes written by medical students (e.g., “eating less,” “clothes fit looser”). Great solutions will be both accurate and reliable._  
Based on that paragraph (taken from the overview), the organizers' general ask is to identify the rubric entities within the patient notes, a task more generally known as NER, or "Named Entity Recognition".  

The _incredibly oversimplified_ problem:  
**Given a patient note, tag the natural language note with standardized clinical terms**  

Example: “clothes fit looser” is tagged as "diminished appetite".


### My Approach
This problem is in essence an NER (Named-Entity Recognition) problem. I'll start by understanding the state of the data: how it's stored and separated across files, what it actually contains, etc. I'm hoping that based on this information I can formulate a _very basic_ approach that may or may not be effective and treat it as our "baseline". I'll then move on to a more traditional NER approach that leverages a library like Hugging Face's `transformers`.  

Finally, given the time constraint, I'll take the best-performing model (baseline or neural net) and put it behind a FastAPI API.

In [30]:
import os
import json

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
os.listdir('data/')

['.DS_Store',
 'test.csv',
 'patient_notes.csv',
 'train.csv',
 'features.csv',
 'sample_submission.csv']

In [4]:
base_dir = "data"  # no trailing slash; the f-strings below add the separator
train = pd.read_csv(f"{base_dir}/train.csv")
test = pd.read_csv(f"{base_dir}/test.csv")
patient_notes = pd.read_csv(f"{base_dir}/patient_notes.csv")
features = pd.read_csv(f"{base_dir}/features.csv")
sample_submission = pd.read_csv(f"{base_dir}/sample_submission.csv")

### Data Description
[From the website](https://www.kaggle.com/competitions/nbme-score-clinical-patient-notes/data)
The text data presented here is from the USMLE® Step 2 Clinical Skills examination, a medical licensure exam. This exam measures a trainee's ability to recognize pertinent clinical facts during encounters with standardized patients.

During this exam, each test taker sees a Standardized Patient, a person trained to portray a clinical case. After interacting with the patient, the test taker documents the relevant facts of the encounter in a patient note. Each patient note is scored by a trained physician who looks for the presence of certain key concepts or features relevant to the case as described in a rubric. The goal of this competition is to develop an automated way of identifying the relevant features within each patient note, with a special focus on the patient history portions of the notes where the information from the interview with the standardized patient is documented.

Important Terms
- Clinical Case: The scenario (e.g., symptoms, complaints, concerns) the Standardized Patient presents to the test taker (medical student, resident or physician). Ten clinical cases are represented in this dataset.
- Patient Note: Text detailing important information related by the patient during the encounter (physical exam and interview).
- Feature: A clinically relevant concept. A rubric describes the key concepts relevant to each case.

Training Data  
- patient_notes.csv - A collection of about 40,000 Patient Note history portions. Only a subset of these have features annotated. You may wish to apply unsupervised learning techniques on the notes without annotations. The patient notes in the test set are not included in the public version of this file.
    - pn_num - A unique identifier for each patient note.
    - case_num - A unique identifier for the clinical case a patient note represents.
    - pn_history - The text of the encounter as recorded by the test taker.
- features.csv - The rubric of features (or key concepts) for each clinical case.
    - feature_num - A unique identifier for each feature.
    - case_num - A unique identifier for each case.
    - feature_text - A description of the feature.
- train.csv - Feature annotations for 1000 of the patient notes, 100 for each of ten cases.
    - id - Unique identifier for each patient note / feature pair.
    - pn_num - The patient note annotated in this row.
    - feature_num - The feature annotated in this row.
    - case_num - The case to which this patient note belongs.
    - annotation - The text(s) within a patient note indicating a feature. A feature may be indicated multiple times within a single note.
    - location - Character spans indicating the location of each annotation within the note. Multiple spans may be needed to represent an annotation, in which case the spans are delimited by a semicolon ;.
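
The `location` format described above can be parsed into numeric spans. The following is a minimal sketch, assuming each cell is a stringified Python list as it appears in `train.csv` (e.g. `['0 6']`); `parse_location` is a hypothetical helper name, not part of the competition code.

```python
import ast

def parse_location(location_cell: str) -> list[tuple[int, int]]:
    """Turn a raw `location` cell such as "['418 462;518 575']" into (start, end) pairs."""
    spans = []
    # The cell is a stringified Python list; literal_eval parses it without running code.
    for entry in ast.literal_eval(location_cell):
        # One entry may hold several spans separated by semicolons.
        for span in entry.split(";"):
            start, end = span.split()
            spans.append((int(start), int(end)))
    return spans

print(parse_location("['418 462;518 575']"))  # [(418, 462), (518, 575)]
print(parse_location("[]"))                   # []
```

An empty list means the feature was not annotated in that note, so the parser simply returns no spans.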

Example Test Data
To help you author submission code, we include a few example instances selected from the training set. When your submitted notebook is scored, this example data will be replaced by the actual test data. The patient notes in the test set will be added to the patient_notes.csv file. These patient notes are from the same clinical cases as the patient notes in the training set. There are approximately 2000 patient notes in the test set.

- test.csv - Example instances selected from the training set.
- sample_submission.csv - A sample submission file in the correct format.


Let's do some quick EDA on the `train`, `features` and `patient_notes` datasets, just to see what the data looks like: where the interesting data is, how we can combine it to get what we need, and any cleanup we may need to do.  
I'll start with `train`, as it seems to be the center of it all.

In [5]:
print("Shape:")
print(train.shape)
print("Info:")
print(train.info())
print("Sample:")
train.sample(1)

Shape:
(14300, 6)
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14300 entries, 0 to 14299
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           14300 non-null  object
 1   case_num     14300 non-null  int64 
 2   pn_num       14300 non-null  int64 
 3   feature_num  14300 non-null  int64 
 4   annotation   14300 non-null  object
 5   location     14300 non-null  object
dtypes: int64(3), object(3)
memory usage: 670.4+ KB
None
Sample:


Unnamed: 0,id,case_num,pn_num,feature_num,annotation,location
6089,41130_409,4,41130,409,['45 y/o'],['0 6']


No `null` values are found in this dataset, which makes null-check cleanup _much_ easier!

In [6]:
print("Shape:")
print(features.shape)
print("Info:")
print(features.info())
print("Sample:")
features.sample(1)

Shape:
(143, 3)
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 143 entries, 0 to 142
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   feature_num   143 non-null    int64 
 1   case_num      143 non-null    int64 
 2   feature_text  143 non-null    object
dtypes: int64(2), object(1)
memory usage: 3.5+ KB
None
Sample:


Unnamed: 0,feature_num,case_num,feature_text
104,705,7,Fatigue


Again, no `null` values is great to see. But there being 143 features and exactly 14300 examples in the training set looks a little suspicious at first. Based on the `Data Description`, the training set includes _feature annotations for 1000 patient notes_, or _100 patient notes per case (for 10 cases)_.  
So 143 unique features across 10 cases is 14.3 features per case on average. If every annotated note gets one row for each feature of its case, that gives 143 × 100 = 14300 rows, which matches the shape of `train`.
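
This row count can be sanity-checked in code. Below is a small sketch on made-up toy frames, assuming each annotated note contributes one row per feature of its case; `toy_features`, `toy_notes` and `toy_train` are illustrative stand-ins, and the same two `groupby` calls could be pointed at the real `features` and `train` frames.

```python
import pandas as pd

# Toy stand-ins for `features` and `train` (values are made up for illustration).
toy_features = pd.DataFrame({
    "feature_num": [0, 1, 2, 100, 101],
    "case_num":    [0, 0, 0, 1, 1],
})
toy_notes = pd.DataFrame({"pn_num": [10, 11, 20], "case_num": [0, 0, 1]})

# One row per (note, feature) pair within the same case.
toy_train = toy_notes.merge(toy_features, on="case_num")

# Expected rows = sum over cases of (features in case) * (annotated notes in case).
features_per_case = toy_features.groupby("case_num").size()
notes_per_case = toy_train.groupby("case_num")["pn_num"].nunique()
expected_rows = int((features_per_case * notes_per_case).sum())

print(expected_rows, len(toy_train))  # 8 8
```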

In [7]:
print("Shape:")
print(patient_notes.shape)
print("Info:")
print(patient_notes.info())
print("Sample:")
patient_notes.sample(1)

Shape:
(42146, 3)
Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42146 entries, 0 to 42145
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   pn_num      42146 non-null  int64 
 1   case_num    42146 non-null  int64 
 2   pn_history  42146 non-null  object
dtypes: int64(2), object(1)
memory usage: 987.9+ KB
None
Sample:


Unnamed: 0,pn_num,case_num,pn_history
31029,72429,7,"CC ""problems with period""\r\nHPI Ms Tompkins i..."


In [8]:
# Let's see if we can join these datasets together to get a more complete picture of the data in the training set
full_data = train.join(features[["feature_num", "feature_text"]].set_index("feature_num"), on="feature_num")
full_data = full_data.join(patient_notes[["pn_num", "pn_history"]].set_index("pn_num"), on="pn_num")

In [9]:
# `sample().index[0]` is an index label, so look the row up with `.loc` rather than `.iloc`
ex = full_data.sample().index[0]
print(full_data.loc[ex].pn_history)
print(full_data.loc[ex].annotation)
print(full_data.loc[ex].feature_text)

CC: Irregular periods.
HPI: 35-yo female complaining of heavy and irregular periods for the past 6 months. She has had 2 periods in the past 6 months, lasting about 7 days. She would soak a pad every couple hours during that time. Has also noticed decreased energy. The patient reports some darkening of the skin around her knuckles and the back of her neck. 7-8 lbs. weight gain in past 6 months, appetite normal. Denies other skin changes. No changes in hair. Denies recent increases in stress, headaches. 
PMH: No significant PMH. No medications.
SH: Scheduler for a social work company. Has one beer on the weekends. Never smoker. Divorced. She has two children, ages 3 and 5, who are adopted. The patient states that she and her ex-husband tried to get pregnant for many years but were unsuccessful. No formal workup for infertility in either her or her ex-husband.
FH: Grandmother with cervical cancer. Aunt with breast cancer.
['weight gain', '7-8 lbs gain']
Weight-Gain


There are only 143 features across the entire dataset. I wonder if each feature corresponds to a particular case :thinking_emoji:

In [10]:
features[["feature_text", "case_num"]].groupby("feature_text").size().reset_index(name="counts").sort_values("counts")

Unnamed: 0,feature_text,counts
0,1-day-duration-OR-2-days-duration,1
95,Recent-nausea-vomiting-OR-Recent-flulike-symptoms,1
94,Recent-heavy-lifting-at-work-OR-recent-rock-cl...,1
93,Prior-normal-periods,1
92,Prior-episodes-of-diarrhea,1
...,...,...
5,35-year,2
3,20-year,2
1,17-year,2
63,Male,3


It does seem that the feature text has a pretty high degree of uniqueness across the different cases, with only the `Male` and `Female` identifiers occurring more than twice.  

Possible improvement: **Consider finding similarities between the feature texts to help improve identification of those features in unseen cases.**

I think I have found a quick and logic-based solution that can give us a relatively decent baseline!  
For each case (we know the case number of future patient notes), we can build a map of our tags (ex: `24-year` or `17-year`) to their corresponding annotations.  
Scheme:  
```
{
    case_num: {
        "tag": ["annotations", ...],
        ...
    },
    ...
}
```  

Ex:  
``` 
{
    0: {
        "1-day-duration-OR-2-days-duration": ['1 day duration', 'yesterday', '2 day', ....]
    }
}
```  

Given a map like this, when a patient note comes in we can find the tag-annotation map for that case and look for examples of existing annotations in the incoming patient document and return the given tag if found!  
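
The lookup idea can be sketched end to end on toy data. This is only an illustration of the scheme above, under the assumption that matching is plain lowercase substring search; `toy_rows`, `toy_map` and `lookup` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical training rows: (case_num, feature_text, annotation).
toy_rows = [
    (0, "Male", "male"),
    (0, "17-year", "17 yo"),
    (0, "17-year", "17 year old"),
]

# Build {case_num: {feature_text: [annotations]}}.
toy_map = defaultdict(lambda: defaultdict(list))
for case_num, feature_text, annotation in toy_rows:
    known = toy_map[case_num][feature_text]
    if annotation.lower() not in known:
        known.append(annotation.lower())

def lookup(case_num: int, note: str) -> dict:
    """Return {feature_text: [(start, end), ...]} for known annotations found in the note."""
    note = note.lower()
    hits = {}
    for feature_text, annotations in toy_map[case_num].items():
        # Record the first occurrence of each known annotation present in the note.
        spans = [(note.find(a), note.find(a) + len(a)) for a in annotations if a in note]
        if spans:
            hits[feature_text] = spans
    return hits

print(lookup(0, "17 yo male with heart racing"))
# {'Male': [(6, 10)], '17-year': [(0, 5)]}
```

One caveat of plain substring search is that "male" also matches inside "female", so short annotations can produce false positives.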

First I want to see if there is a lot of overlap of annotations and features within cases, and while doing so, we'll build the map. Also, since the features/tags seem pretty specific to the cases, I'll hold out about 20% of each case from the training set as a validation set.
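
Note that sampling 20% of the rows per case means rows from the same patient note can land on both sides of the split. One alternative, which is not what this notebook does, is to hold out whole notes. A sketch on a made-up toy frame, assuming pandas 1.1 or later for `DataFrameGroupBy.sample`:

```python
import pandas as pd

# Toy frame: 2 cases, 5 notes each, 2 feature rows per note (made-up values).
toy = pd.DataFrame({
    "case_num": [c for c in (0, 1) for _ in range(10)],
    "pn_num":   [100 * c + n for c in (0, 1) for n in range(5) for _ in range(2)],
})

# Sample 20% of the distinct notes within each case, then hold out all their rows.
notes = toy[["case_num", "pn_num"]].drop_duplicates()
held_out = notes.groupby("case_num").sample(frac=0.2, random_state=24)
is_val = toy["pn_num"].isin(held_out["pn_num"])

toy_val, toy_train = toy[is_val], toy[~is_val]
print(len(toy_train), len(toy_val))  # 16 4
```

With this variant no `pn_num` appears in both splits, at the cost of the validation size being a multiple of the rows per note.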

In [11]:
validation_dataset = pd.DataFrame()
for case_num in full_data.case_num.unique():
    validation_dataset = pd.concat([validation_dataset, full_data[full_data.case_num == case_num].sample(frac=0.2, random_state=24)])
    
train_dataset = full_data.drop(validation_dataset.index)
print(train_dataset.shape)
print(validation_dataset.shape)
print(full_data.shape)

(11440, 8)
(2860, 8)
(14300, 8)


In [12]:
# example feature text
full_data[full_data.feature_text == "Male"].sample(10)

Unnamed: 0,id,case_num,pn_num,feature_num,annotation,location,feature_text,pn_history
9301,60911_601,6,60911,601,['male'],['15 19'],Male,Kane is a 17yo male with a history of exercise...
701,01165_012,0,1165,12,['M'],['6 7'],Male,17 yo M c/o heart racing\r\n- Episodes have be...
428,00769_012,0,769,12,['m'],['17 18'],Male,Mr. Cole is a 17 m with c/o heart pounding. He...
506,00817_012,0,817,12,['M'],['23 24'],Male,Mr. Cleveland is a 17yoM presenting with palpi...
5044,34742_308,3,34742,308,['M'],['11 12'],Male,HPI: 35 yo M c/o stomach pain\r\n-started 2 mo...
9829,61718_601,6,61718,601,['male'],['12 16'],Male,17 year old male presenting with sharp stabbin...
8857,60229_601,6,60229,601,['M'],['21 22'],Male,Mr. Smith is a 17 yo M here with chest pain. H...
415,00768_012,0,768,12,['M'],['8 9'],Male,A 17 YO M COLLEGE STUDENT C/O HEART POUNDING X...
4340,30168_308,3,30168,308,['M'],['25 26'],Male,Patient is a 35 year old M who presents c/o ep...
9865,61761_601,6,61761,601,['male'],['6 10'],Male,17 yo male presents with chest pain since yest...


In [38]:
import ast

def preprocess_text(text: str) -> str:
    """Basic text preprocessing to normalize text.
    Only `lower()` for now.
    """
    normalized_text = text.lower()
    
    return normalized_text

case_feature_map = {}
for case_num in features.case_num.unique():
    case_num = case_num.item()  # plain int so the keys serialize cleanly to JSON
    case_feature_map[case_num] = {}
    feature_texts = features[features.case_num == case_num].feature_text.unique()
    for feature in feature_texts:
        case_feature_map[case_num][feature] = []
        # Build the annotation map from the training split only
        annotations = train_dataset[(train_dataset.case_num == case_num) & (train_dataset.feature_text == feature)].annotation
        for annotation_list_string in annotations:
            # The lists are stored as strings; `ast.literal_eval` parses them without executing arbitrary code
            annotation_list = ast.literal_eval(annotation_list_string)
            for annotation in annotation_list:
                # Normalize the annotations
                annotation = preprocess_text(annotation)

                # Many annotations are single letters (e.g. "M" for the feature "Male").
                # To avoid matching every occurrence of that character, naively pad them with spaces.
                if len(annotation) == 1:
                    annotation = " " + annotation + " "
                if annotation not in case_feature_map[case_num][feature]:
                    case_feature_map[case_num][feature].append(annotation)

# save map for use later.
with open("baseline_feat_map.json", 'w') as savefile:
    json.dump(case_feature_map, savefile, indent=4)

Let's see if there is much overlap between features (within the same case).

In [35]:
case_feature_map

{0: {'Family-history-of-MI-OR-Family-history-of-myocardial-infarction': ['dad with recent heart attcak',
   'father: heart attack',
   'father mi',
   'dad-mi',
   'father had acute mi',
   'father heart attach',
   'dad had recent mi',
   'father heart problem',
   'mi in the father',
   'dad mi',
   'father had an mi',
   'dad had mi',
   'father had a heart attack',
   'father had mi',
   'father with mi',
   'father had heart attack',
   'dad had heart attack',
   'heart attack 1 year ago - father',
   'father heart attack',
   'heart attack in father',
   'mi at 52 for father',
   'dad - heart attack',
   'father-recently had an mi',
   'father-mi',
   'mi in his father',
   'father had problems with heart',
   'fh positive for a recent heart attack',
   'father- mi',
   'dad has cardiac issues',
   'dad with recent heart attack',
   'father - mi',
   'father had possible mi',
   'dad, heart attack',
   'dad had a heart attack',
   'father suffer an mi',
   'father with herat atta

In [15]:
case_feature_map[0].keys()

dict_keys(['Family-history-of-MI-OR-Family-history-of-myocardial-infarction', 'Family-history-of-thyroid-disorder', 'Chest-pressure', 'Intermittent-symptoms', 'Lightheaded', 'No-hair-changes-OR-no-nail-changes-OR-no-temperature-intolerance', 'Adderall-use', 'Shortness-of-breath', 'Caffeine-use', 'heart-pounding-OR-heart-racing', 'Few-months-duration', '17-year', 'Male'])

In [16]:
for c in list(case_feature_map.keys()):
    feature_list = list(case_feature_map[c].keys())
    for i, f in enumerate(feature_list):
        for next_f in feature_list[i+1:]:
            overlap = set(case_feature_map[c][f]).intersection(set(case_feature_map[c][next_f]))
            if len(overlap) > 0:
                print(c, f, next_f, overlap)
# set(case_feature_map[0]['Family-history-of-MI-OR-Family-history-of-myocardial-infarction']).intersection(set(case_feature_map[0]['Family-history-of-thyroid-disorder']))

1 Prior-episodes-of-diarrhea Recurrent-bouts-over-past-6-months {'similar episodes 3-4 times in past 6 months'}
2 Female 44-year {'44f'}
5 Onset-5-years-ago Increased-frequency-recently {'got worse 3 weeks ago'}
5 Female 26-year {'26 y/o'}
5 No-illicit-drug-use Increased-stress {'does not use recreational drugs'}
6 Male Sharp-OR-stabbing-OR-7-to-8-out-of-10-on-pain-scale {'sharp'}
6 Recent-upper-respiratory-symptoms Worse-with-deep-breath-OR-pleuritic {'worse with deep breathing'}


Honestly, that's less overlap than I expected. A future improvement on the baseline would be to find ways of resolving these overlaps to avoid confusion. For now, we'll ignore them and accept a small amount of noise in our baseline.  

Next I want to consider how we evaluate the results.  
The Kaggle competition uses the micro-F1 score as its evaluation metric, so I'll use that here. However, there are other ways we could calculate meaningful metrics.  
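
As a small worked example of micro-F1, here is a sketch that counts true positives, false positives and false negatives over character indices and pools them across rows before computing the score. The spans in `rows` are made up, and `span_to_chars` is a hypothetical helper.

```python
def span_to_chars(spans):
    """Expand (start, end) pairs into the set of character indices they cover."""
    return {i for start, end in spans for i in range(start, end)}

# Two hypothetical rows: (predicted spans, true spans).
rows = [
    ([(0, 5)], [(0, 8)]),   # partial hit: 5 TP, 3 FN
    ([(10, 14)], []),       # spurious prediction: 4 FP
]

tp = fp = fn = 0
for pred_spans, true_spans in rows:
    pred, true = span_to_chars(pred_spans), span_to_chars(true_spans)
    tp += len(pred & true)
    fp += len(pred - true)
    fn += len(true - pred)

# "Micro" means the counts are pooled first and the ratios are computed once.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
micro_f1 = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, round(micro_f1, 3))  # 5 4 3 0.588
```

Because the counts are pooled, rows with long annotations weigh more than rows with short ones.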


Another way of validating:  
Our validation set contains the character locations of the features within the text via annotations. I think a good way to evaluate how our system is doing is to calculate something akin to mAP in computer vision, which, simplified for our use case, basically becomes a "mean Jaccard similarity".  

That is, for an individual feature, we find annotations in our map that exist in the patient note and return both the substring and its character indexes. Comparing the _true annotation(s)_ to the _predicted annotation(s)_, we can use the Jaccard similarity (intersection / union), averaged if there are multiple, to determine how close we are.  
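
For intuition, here is a minimal sketch of the Jaccard similarity applied to character indices. The two spans are hypothetical, and treating two empty sets as a perfect match is an assumption about how negative examples should be scored.

```python
def jaccard(pred: set, true: set) -> float:
    """Intersection over union; two empty sets count as a perfect match."""
    if not pred and not true:
        return 1.0
    return len(pred & true) / len(pred | true)

true_chars = set(range(418, 462))   # hypothetical true span "418 462"
pred_chars = set(range(430, 470))   # hypothetical predicted span "430 470"

# 32 shared characters out of 52 covered by either span.
print(round(jaccard(pred_chars, true_chars), 3))  # 0.615
print(jaccard(set(), set()))                      # 1.0
```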

The problem itself, again, is to return the best annotation text/location for a given case+pn+feature.

In [17]:
import re

In [69]:
preds = {}
for _, r in validation_dataset.iterrows():
    preds[r.id] = {'annotation': [], 'location': []}
    # Normalize the note once per row rather than once per candidate annotation
    pn_history = preprocess_text(r.pn_history)
    for annotation in case_feature_map[r.case_num][r.feature_text]:
        # `str.find` returns -1 when absent; only the first occurrence is recorded
        start_idx = pn_history.find(annotation)
        if start_idx != -1:
            end_idx = start_idx + len(annotation)
            preds[r.id]['annotation'].append(annotation)
            preds[r.id]['location'].append(f'{start_idx} {end_idx}')

            # for match in re.finditer(annotation, pn_history):
            #     start_idx = match.start()
            #     end_idx = match.end()
            #     preds[r.id]['annotation'].append(annotation)
            #     preds[r.id]['location'].append(f'{start_idx} {end_idx}')
    

In [70]:
def jaccard_sim(pred_list, actual_list) -> float:
     return len(set(pred_list).intersection(set(actual_list))) / len(set(pred_list).union(set(actual_list)))

def jaccard_score(pred_list, actual_list) -> float:
    # if its a negative example and we found no annotations, call it good
    if len(pred_list) == 0 and len(actual_list) == 0:
        return 1.0

    return jaccard_sim(pred_list, actual_list)

def tp_score(pred_span: set, true_span: set) -> int:
    return len(pred_span.intersection(true_span))

def fn_score(pred_span: set, true_span: set) -> int:
    return len(true_span) - len(pred_span.intersection(true_span))

def fp_score(pred_span: set, true_span: set) -> int:
    return len(pred_span) - len(pred_span.intersection(true_span))

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp+fn)

def f1(precision, recall):
    return 2 * ((precision*recall) / (precision+recall))

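def location_list_to_set(loc_list: list) -> set:
    """Expand location strings such as '418 462;518 575' into a set of character indices."""
    char_indices = set()
    for loc in loc_list:
        for span in loc.split(";"):
            # `int` is enough here and safer than `eval` for parsing the offsets
            start, end = (int(x) for x in span.split())
            char_indices.update(range(start, end))

    return char_indices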

sim_scores = {}
for i, pred in preds.items():
    sim_scores.update({
        i: {}
    })
    actual = validation_dataset[validation_dataset.id == i].squeeze()
    
    # Jaccard
    true_annot_list = eval(actual.annotation)
    sim_scores[i]['jaccard_sim'] = jaccard_score(pred['annotation'], true_annot_list)
    
    # Span overlap
    true_location_list = eval(actual.location)
    true_location_set: set = location_list_to_set(true_location_list)
    pred_location_set: set = location_list_to_set(pred["location"])
    sim_scores[i]['tp'] = tp_score(pred_location_set, true_location_set)
    sim_scores[i]['fn'] = fn_score(pred_location_set, true_location_set)
    sim_scores[i]['fp'] = fp_score(pred_location_set, true_location_set)
    

In [71]:
total_tp = sum(v['tp'] for v in sim_scores.values())
total_fp = sum(v['fp'] for v in sim_scores.values())
total_fn = sum(v['fn'] for v in sim_scores.values())
micro_precision = precision(total_tp, total_fp)
micro_recall = recall(total_tp, total_fn)
micro_f1 = f1(micro_precision, micro_recall)
print("TP sum:", total_tp)
print("FP sum:", total_fp)
print("FN sum:", total_fn)
print("Micro precision:", micro_precision)
print("Micro recall:", micro_recall)
print("Micro f1:", micro_f1)

TP sum: 16895
FP sum: 2472
FN sum: 20931
Micro precision: 0.8723602003407859
Micro recall: 0.44665045207000476
Micro f1: 0.5908065672372492


The results are mostly what I would expect to see. Since we're looking for very specific strings, a matching annotation from our map is very likely to be a good annotation, hence the high precision (~0.87). However, many of the unseen patient histories and annotations do not match anything we've built our map with, hence the low recall (~0.45). Language is a fickle beast. I've only done basic text normalization to this point as well, which would be another point of improvement given some time.  
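
One cheap recall improvement to try: the baseline records only the first occurrence of each known annotation, while a feature may be indicated several times in a note. Below is a sketch of finding every occurrence, assuming lowercase matching as in `preprocess_text`; `find_all` is a hypothetical helper and its effect on the scores above has not been measured.

```python
import re

def find_all(annotation: str, note: str) -> list[tuple[int, int]]:
    """Return (start, end) for every non-overlapping occurrence of `annotation` in `note`."""
    # re.escape keeps characters such as ")" or "." in an annotation from being read as regex syntax.
    pattern = re.escape(annotation.lower())
    return [m.span() for m in re.finditer(pattern, note.lower())]

note = "Palpitations x 5 years. No chest pain. Palpitations worse recently."
print(find_all("palpitations", note))  # [(0, 12), (39, 51)]
```

Escaping matters here because several annotations in the map contain characters like `)` that would otherwise make the pattern invalid.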

At this point we have a baseline to compare any future approach to, which is the main goal. Below is some pseudo-random analysis of the results, looking for any low-hanging fruit for improving this implementation. 

In [22]:
results_df = pd.DataFrame(sim_scores).transpose()
results_df.sort_values("fn")

Unnamed: 0,jaccard_sim,tp,fn,fp
02223_003,1.0,8.0,0.0,0.0
60315_607,0.5,17.0,0.0,0.0
61407_605,0.0,23.0,0.0,6.0
60620_606,1.0,10.0,0.0,0.0
60116_609,1.0,0.0,0.0,0.0
...,...,...,...,...
55578_505,0.0,0.0,101.0,0.0
31543_313,0.0,0.0,103.0,0.0
52661_505,0.0,0.0,104.0,0.0
20360_200,0.0,0.0,115.0,0.0


In [23]:
ex_id = "55578_505"

Ms. Whelan is a 26 yo F complaining of palpitations. She reports episodes of palpitations for past five years, however, recently they have increased in frequency without clear trigger. During an episode she reports feeling shortness of breath, nausea, feeling overheated and then clammy and cold, and sometimes a sense that she might die. These episodes are unpleasant and have caused her quite a bit of distress. She visited ED two weeks ago for such an episode where she also experienced hand numbness. At that time cardiac enzymes, ECG, CBC and metabolic panel were normal, but Ms. Whalen reports she was no longer feeling her symptoms when they performed the ECG.\r\nNo pyschiatric or thyroid history. No chest pain. No headche, weakness, numbness, heat intolerance, skin/hair changes.\r\nPMH/PSH: none\r\nMeds: none\r\nAllergies: NKDA\r\nFMHx: no history of palpitations\r\nSocial: minimal alcohol usage, no smoking/drugs, under stress due to unemployment

In [24]:
preds[ex_id]

{'annotation': [], 'location': []}

In [25]:
validation_dataset[validation_dataset.id == ex_id].squeeze().pn_history

'Ms. Whelan is a 26 yo F complaining of palpitations. She reports episodes of palpitations for past five years, however, recently they have increased in frequency without clear trigger. During an episode she reports feeling shortness of breath, nausea, feeling overheated and then clammy and cold, and sometimes a sense that she might die. These episodes are unpleasant and have caused her quite a bit of distress. She visited ED two weeks ago for such an episode where she also experienced hand numbness. At that time cardiac enzymes, ECG, CBC and metabolic panel were normal, but Ms. Whalen reports she was no longer feeling her symptoms when they performed the ECG.\r\nNo pyschiatric or thyroid history. No chest pain. No headche, weakness, numbness, heat intolerance, skin/hair changes.\r\nPMH/PSH: none\r\nMeds: none\r\nAllergies: NKDA\r\nFMHx: no history of palpitations\r\nSocial: minimal alcohol usage, no smoking/drugs, under stress due to unemployment'

In [26]:
ex_val = validation_dataset[validation_dataset.id == ex_id].squeeze()
validation_dataset[validation_dataset.id == ex_id]

Unnamed: 0,id,case_num,pn_num,feature_num,annotation,location,feature_text,pn_history
8381,55578_505,5,55578,505,['visited ED two weeks ago for such an episode...,['418 462;518 575'],Recent-visit-to-emergency-department-with-nega...,Ms. Whelan is a 26 yo F complaining of palpita...


In [27]:
case_feature_map[ex_val.case_num][ex_val.feature_text]

['was seen 2 weeks ago for similar symptoms with a w/u wnl',
 'in ed 2 weeks ago bloodwork and ecg) were all unremarkable',
 'in ed 2 weeks ago cbc, cmp, cardiac enzymes and ecg) were all unremarkable',
 '2 weeks ago evaluation in emergency room was unrevelaing',
 '2 weeks ago in emergency room cbc, metabolic panel, cardiac enzymes and ekg in normal limits',
 '2 weeks ago presented to the ed evaluation was unrevelaing',
 '2 weeks ago she went to ed with normal workup',
 '2 weeks ago ed work up included ekg, cbc, cmp were normal',
 '1 week ago went to the ed cbc, bpm, ekg came back normal',
 '1 week ago at the ed cbc, bpm, ekg came back normal',
 'visit to the emergency department 2 weeks ago: complete blood count - within normal limits; metabolic panel - within normal limits; cardiac enzymes - within normal limits; ecg - within normal limits',
 '3 weeks ago visited the er, the ecg, heart enzymes, metabolic panel and cbc weree all normal',
 'in ed 2 weeks ago normal cbc, chem panel, car

In [29]:
results_df[(results_df.fn > 0.0) & (results_df.tp == 0.0) & (results_df.fp == 0.0)].shape[0]/results_df[(results_df.fn > 0.0)].shape[0]

0.6378504672897196

In about 64% of the rows where we had false negatives (missed annotations), we didn't find any annotations at all. This is a hard problem to solve without a more complex approach.  

Now that we have a baseline and a better idea of our data, I'll switch to the `ner_approach.ipynb` notebook to continue the NER analysis over there.  

I'll also add this approach as an endpoint in our API service for the sake of completeness.