### Purpose
Test several models to see whether any of them gives reasonable extraction of clinical entities from discharge notes:
1. flan-t5-large (as text generation)
2. d4data/biomedical-ner-all (as NER)
3. deepset/roberta-base-squad2 (as question-answer)
4. valhalla/bart-large-finetuned-squadv1 (as question answering; may need a score threshold to filter out null answers)
5. deepset/xlm-roberta-large-squad2 (looks promising, but doesn't run on my laptop)
6. facebook/bart-large (no easy way to use it for text generation found yet; will investigate further)
7. meta-llama/Meta-Llama-3.1-8B-Instruct (runs very slowly; a GPU would be preferable)

### 1. Load and use the first 10 notes as test examples

In [1]:
import pandas as pd

df = pd.read_csv('raw_data/Assignment_Data.csv')
notes = df['discharge_note'].values[:10]
notes

array(['Good recovery trajectory. Follow-up scan scheduled next month.',
       'Stable post-surgery. Advised to avoid physical exertion.',
       'Symptoms controlled. Monitoring for relapse advised.',
       'Stable post-surgery. Advised to avoid physical exertion.',
       'Stable post-surgery. Advised to avoid physical exertion.',
       'Good recovery trajectory. Follow-up scan scheduled next month.',
       'Discharge after recovery from pneumonia. No complications observed.',
       'Patient discharged in stable condition. Recommend follow-up in 2 weeks.',
       'Patient showed improvement. Prescribed antibiotics for 5 days.',
       'Blood pressure under control. Continue current medication.'],
      dtype=object)

### 2. Test performance of each model

In [2]:
from transformers import pipeline



In [3]:
pipe = pipeline('text2text-generation', model='google/flan-t5-large')

prompt = '''
Extract diagnoses mentioned in the following clinical note. 
If no diagnoses are present, return an empty string.

Note: {note}
'''

for note in notes:
    print(pipe(prompt.format(note=note)))

Device set to use cpu


[{'generated_text': 'Good'}]
[{'generated_text': 'Stable post-surgery. Advised to avoid physical exertion'}]
[{'generated_text': 'Symptoms'}]
[{'generated_text': 'Stable post-surgery. Advised to avoid physical exertion'}]
[{'generated_text': 'Stable post-surgery. Advised to avoid physical exertion'}]
[{'generated_text': 'Good'}]
[{'generated_text': 'pneumonia'}]
[{'generated_text': 'stable'}]
[{'generated_text': 'antibiotics'}]
[{'generated_text': 'Blood pressure under control'}]


In [4]:
pipe = pipeline("ner", model='d4data/biomedical-ner-all')

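# NOTE: the full prompt (instructions + note) is passed here, so the
# start/end offsets below index into the prompt string, not the note.
for note in notes:
    print(pipe(prompt.format(note=note), aggregation_strategy="max"))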

Device set to use cpu


[]
[{'entity_group': 'Lab_value', 'score': 0.97761816, 'word': 'stable', 'start': 122, 'end': 128}]
[{'entity_group': 'Therapeutic_procedure', 'score': 0.9738928, 'word': 'relapse', 'start': 158, 'end': 165}]
[{'entity_group': 'Lab_value', 'score': 0.97761816, 'word': 'stable', 'start': 122, 'end': 128}]
[{'entity_group': 'Lab_value', 'score': 0.97761816, 'word': 'stable', 'start': 122, 'end': 128}]
[]
[{'entity_group': 'Clinical_event', 'score': 0.9506174, 'word': 'discharge', 'start': 122, 'end': 131}, {'entity_group': 'Sign_symptom', 'score': 0.93387884, 'word': 'complications', 'start': 166, 'end': 179}]
[{'entity_group': 'Clinical_event', 'score': 0.99574226, 'word': 'discharged', 'start': 130, 'end': 140}, {'entity_group': 'Lab_value', 'score': 0.9987619, 'word': 'stable', 'start': 144, 'end': 150}, {'entity_group': 'Clinical_event', 'score': 0.96198046, 'word': 'follow', 'start': 172, 'end': 178}]
[{'entity_group': 'Medication', 'score': 0.99982125, 'word': 'antibiotics', 'start
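Because the cell above feeds the whole prompt to the NER pipeline, each `start`/`end` offset points into the prompt rather than the note (for example, `stable` at 122–128 is the first word of the note). The simplest fix is to run NER on the note alone, e.g. `pipe(note, aggregation_strategy="max")`. For output already produced with the prompt, a minimal sketch for shifting offsets back and keeping only entity groups of interest is below; the helper `entities_in_note` and the contents of `KEEP_GROUPS` are illustrative assumptions, and `Disease_disorder` in particular is an assumed label name for this model.

```python
# Entity groups worth keeping for this task. "Disease_disorder" is an assumed
# label name for this model; the other three appear in the output above.
KEEP_GROUPS = {"Disease_disorder", "Sign_symptom", "Medication", "Therapeutic_procedure"}


def entities_in_note(entities, template, keep=KEEP_GROUPS):
    """Map prompt-relative NER offsets back onto the note and filter by group.

    Assumes `template` has a single "{note}" field and no other format
    fields or escaped braces before it, so the text preceding the note in
    the formatted prompt is exactly template.split("{note}")[0].
    """
    offset = len(template.split("{note}", 1)[0])
    # Copy each kept entity so the pipeline's original dicts are not mutated.
    return [
        {**ent, "start": ent["start"] - offset, "end": ent["end"] - offset}
        for ent in entities
        if ent["entity_group"] in keep
    ]


# Usage with the notebook's own `prompt` and NER pipeline:
# for note in notes:
#     ents = pipe(prompt.format(note=note), aggregation_strategy="max")
#     print(entities_in_note(ents, prompt))
```

Passing the notebook's own `prompt` as `template` matters: the offset is derived from the exact prefix text, including any trailing spaces.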

In [5]:
pipe = pipeline("question-answering", model="deepset/roberta-base-squad2")
question = "What is the diagnosis of this patient"

for note in notes:
    print(pipe(question=question, context=note))

Device set to use cpu


{'score': 0.45707806944847107, 'start': 0, 'end': 24, 'answer': 'Good recovery trajectory'}
{'score': 0.4245307445526123, 'start': 0, 'end': 19, 'answer': 'Stable post-surgery'}
{'score': 0.4784463346004486, 'start': 9, 'end': 19, 'answer': 'controlled'}
{'score': 0.4245307445526123, 'start': 0, 'end': 19, 'answer': 'Stable post-surgery'}
{'score': 0.4245307445526123, 'start': 0, 'end': 19, 'answer': 'Stable post-surgery'}
{'score': 0.45707806944847107, 'start': 0, 'end': 24, 'answer': 'Good recovery trajectory'}
{'score': 0.02830599807202816, 'start': 41, 'end': 66, 'answer': 'No complications observed'}
{'score': 0.5185332894325256, 'start': 22, 'end': 38, 'answer': 'stable condition'}
{'score': 0.0926193818449974, 'start': 0, 'end': 26, 'answer': 'Patient showed improvement'}
{'score': 0.10162846744060516, 'start': 0, 'end': 28, 'answer': 'Blood pressure under control'}


In [6]:
pipe = pipeline("question-answering", model="valhalla/bart-large-finetuned-squadv1")
question = "What is the diagnosis of this patient"

for note in notes:
    print(pipe(question=question, context=note))

You passed along `num_labels=3` with an incompatible id to label map: {'0': 'LABEL_0', '1': 'LABEL_1'}. The number of labels wil be overwritten to 2.
Device set to use cpu


{'score': 0.03655242547392845, 'start': 0, 'end': 13, 'answer': 'Good recovery'}
{'score': 0.06140803173184395, 'start': 0, 'end': 6, 'answer': 'Stable'}
{'score': 0.43134814500808716, 'start': 36, 'end': 43, 'answer': 'relapse'}
{'score': 0.06140803173184395, 'start': 0, 'end': 6, 'answer': 'Stable'}
{'score': 0.06140803173184395, 'start': 0, 'end': 6, 'answer': 'Stable'}
{'score': 0.03655242547392845, 'start': 0, 'end': 13, 'answer': 'Good recovery'}
{'score': 0.9923176169395447, 'start': 30, 'end': 39, 'answer': 'pneumonia'}
{'score': 0.6014525890350342, 'start': 22, 'end': 38, 'answer': 'stable condition'}
{'score': 0.11202436685562134, 'start': 15, 'end': 26, 'answer': 'improvement'}
{'score': 0.1728876680135727, 'start': 0, 'end': 28, 'answer': 'Blood pressure under control'}
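A SQuAD v1 model such as `valhalla/bart-large-finetuned-squadv1` always returns some span, so one way to "remove the null answer" is a score cut-off. A minimal sketch follows; the helper `filter_answers` and the 0.4 threshold are illustrative assumptions, not tuned values. The input reuses the (rounded) scores and answers printed above.

```python
def filter_answers(results, threshold=0.4):
    """Blank out question-answering results whose score is below `threshold`.

    `results` is a list of dicts like those printed by the question-answering
    pipeline; only the 'score' and 'answer' keys are used.
    """
    return [r["answer"] if r["score"] >= threshold else "" for r in results]


# Scores (rounded) and answers copied from the bart-large-finetuned-squadv1 run above.
results = [
    {"score": 0.0366, "answer": "Good recovery"},
    {"score": 0.0614, "answer": "Stable"},
    {"score": 0.4313, "answer": "relapse"},
    {"score": 0.0614, "answer": "Stable"},
    {"score": 0.0614, "answer": "Stable"},
    {"score": 0.0366, "answer": "Good recovery"},
    {"score": 0.9923, "answer": "pneumonia"},
    {"score": 0.6015, "answer": "stable condition"},
    {"score": 0.1120, "answer": "improvement"},
    {"score": 0.1729, "answer": "Blood pressure under control"},
]
print(filter_answers(results))
# ['', '', 'relapse', '', '', '', 'pneumonia', 'stable condition', '', '']
```

A threshold only removes low-confidence spans; it does not correct wrong ones. For example, `stable condition` (0.60) passes the 0.4 cut-off but is not a diagnosis.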



### 3. Process all notes with valhalla/bart-large-finetuned-squadv1

In [8]:
df = pd.read_csv('raw_data/Assignment_Data.csv')
notes = df['discharge_note'].tolist()

In [9]:
todo_list = {
    'diagnosis': 'What diagnosis is mentioned in the note?',
    'treatment': 'What treatment or procedure is mentioned in the note?',
    'symptom': 'What symptom is described in the note?',
    'medication': 'What medication or drug is mentioned in the note?',
    'actions': 'What follow-up action or recommendation is mentioned in the note?'
}

In [10]:
pipe = pipeline("question-answering", model="valhalla/bart-large-finetuned-squadv1")

# For each question, run it against every note in a single batched call
all_dfs = []
for key, question in todo_list.items():
    print(f"doing {key}")
    batch = [
        {'question': question, 'context': note} for note in notes
    ]
    result = pipe(batch)
    tmp = pd.DataFrame([{f'{key}_score': x['score'], f'{key}_answer': x['answer']} for x in result])
    all_dfs.append(tmp)

You passed along `num_labels=3` with an incompatible id to label map: {'0': 'LABEL_0', '1': 'LABEL_1'}. The number of labels wil be overwritten to 2.
Device set to use cpu


doing diagnosis
doing treatment
doing symptom
doing medication
doing actions


In [15]:
tmp = pd.concat(all_dfs, axis=1)
df_output = pd.concat([df[['patient_id', 'discharge_note']], tmp], axis=1)
df_output.to_csv('note_extraction_output.csv', index=False)
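One way to compare how confident the model is in each category across all notes is to summarize the `<key>_score` columns of `df_output`. A minimal sketch, assuming the column layout built above; the function name `summarize_scores` and the 0.4 threshold are assumptions for illustration.

```python
import pandas as pd


def summarize_scores(df_output, keys, threshold=0.4):
    """Per-category mean score and share of notes whose answer clears `threshold`.

    Assumes the column layout built above: '<key>_score' and '<key>_answer'.
    """
    rows = []
    for key in keys:
        scores = df_output[f"{key}_score"]
        rows.append({
            "category": key,
            "mean_score": scores.mean(),
            # Mean of a boolean Series is the fraction of True values.
            "share_above_threshold": (scores >= threshold).mean(),
        })
    return pd.DataFrame(rows).set_index("category")


# Usage against the output assembled above (threshold is an arbitrary choice):
# summary = summarize_scores(df_output, todo_list.keys())
# print(summary.sort_values("mean_score", ascending=False))
```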

### Final Thoughts

1. Among the models tested, valhalla/bart-large-finetuned-squadv1 shows the most promise overall; on the 10 test notes, for example, it extracted `pneumonia` with a score of 0.99.
2. Extraction of follow-up actions works relatively well, but the other categories (diagnosis, treatment, symptom, medication) remain inconsistent, particularly when the model's confidence scores are low.