## Import the medical note

Run this cell to load the discharge notes dataset.

In [1]:
import pandas as pd

# This example uses the discharge notes table
discharge_gz = 'C:/3163dataset/discharge.csv.gz'
discharge = pd.read_csv(discharge_gz, compression='gzip')

Select the notes of a single patient (`subject_id` 10000032) and store them in a variable.

In [2]:
patient_admission_notes = discharge.loc[discharge['subject_id'] == 10000032, 'text']
patient_admission_notes

0     \nName:  ___                     Unit No:   _...
1     \nName:  ___                     Unit No:   _...
2     \nName:  ___                     Unit No:   _...
3     \nName:  ___                     Unit No:   _...
Name: text, dtype: object
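
The filtered Series keeps the row labels of the original table. For this patient the labels happen to start at 0, but that will not hold for most other patients. The sketch below uses a small made-up table (the values are illustrative, not from the dataset) to show the difference between label-based and positional access.

```python
import pandas as pd

# Toy stand-in for the discharge table (values are made up for illustration).
toy = pd.DataFrame(
    {
        "subject_id": [1, 2, 2, 3],
        "text": ["note a", "note b", "note c", "note d"],
    }
)

# Boolean mask and column label in a single .loc call.
notes = toy.loc[toy["subject_id"] == 2, "text"]

# The filtered Series keeps the original row labels (1 and 2 here),
# so notes[0] would raise a KeyError; .iloc[0] selects by position instead.
first_note = notes.iloc[0]
print(list(notes.index), first_note)
```

Using `.iloc[0]` for "the first note of this patient" avoids depending on where the patient's rows sit in the full table.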

## NLP stage 

### BART Model
- This uses the BART model from Hugging Face Transformers (`facebook/bart-large-cnn`, a standard summarisation checkpoint)
- Install the packages beforehand: `transformers` and PyTorch (`torch`)

Credits to: https://blog.gopenai.com/simplifying-healthcare-text-summarization-of-medical-notes-with-python-391c3a1e738d

In [3]:
# This variable stores only one medical note
# (.iloc[0] selects the first note by position, whatever its row label is)
note = patient_admission_notes.iloc[0]

# INSTALL transformers AND pytorch beforehand
from transformers import BartForConditionalGeneration, BartTokenizer
import torch

# BART model
model_name = "facebook/bart-large-cnn"
model = BartForConditionalGeneration.from_pretrained(model_name)
tokenizer = BartTokenizer.from_pretrained(model_name)

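# Tokenize the medical note with the BART tokenizer (truncated to the first 1024 tokens) and summarise it
# Note: the "summarize: " prefix is a T5 convention; BART does not require it and treats it as ordinary input text
inputs = tokenizer.encode("summarize: " + note, return_tensors="pt", max_length=1024, truncation=True)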
summary_ids = model.generate(inputs, max_length=500, min_length=200, length_penalty=2.0, num_beams=4, early_stopping=True)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print(summary)

Pt has HCV cirrhosis c/b ascites, hiv on ART, h/o IVDU, COPD, PTSD, PTSD. She reported self-discontinuing lasix and spirnolactone because she feels like "they don't do anything" and that she "doesn't want to put more chemicals in her" In the past week, she notes that she                 has been having worsening abd distension and discomfort. She denies easy bruising, melena, BRBPR,                 hemetesis, hemoptysis, or hematuria. She also had a skin lesion, which was biopsied and showed  skin cancer per patient report. She is not aware of any liver disease or liver disease in her family. Her last alcohol consumption was one drink two months ago. She had food poisoning a week ago from eating stale                 cake (n/v 20 min after food ingestion) She denies other recent illness or sick contacts.


### The result using BART model

- (+) It summarised the very long medical note into a single paragraph in 41.3 seconds
- (-) `facebook/bart-large-cnn` was fine-tuned on news articles (CNN/DailyMail), not on clinical or biomedical text
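
The BART cell passes `truncation=True` with `max_length=1024`, so only the first 1024 tokens of the note reach the model and the rest is ignored. One possible workaround, sketched here under assumptions, is to split the token ids into windows and summarise each window separately. The helper `chunk_token_ids` and its `overlap` parameter are hypothetical names introduced for illustration; they are not part of Transformers.

```python
from typing import List, Sequence


def chunk_token_ids(
    token_ids: Sequence[int], max_tokens: int = 1024, overlap: int = 0
) -> List[List[int]]:
    """Split a token id sequence into windows of at most max_tokens ids.

    Hypothetical helper: consecutive windows share `overlap` ids so that
    sentences cut at a boundary appear in both neighbouring windows.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(list(token_ids[start:start + max_tokens]))
        # Stop once a window reaches the end of the sequence.
        if start + max_tokens >= len(token_ids):
            break
    return chunks


# Small demonstration with made-up ids.
demo_chunks = chunk_token_ids(list(range(10)), max_tokens=4, overlap=1)
print(demo_chunks)
```

In the notebook, the ids could come from `tokenizer.encode(note, add_special_tokens=False)`, with `max_tokens` set slightly below 1024 to leave room for the special tokens the model expects. Whether per-chunk summaries combine into a good overall summary is untested here.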

### BioGPT

In [3]:
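# This variable stores only one medical note
note = patient_admission_notes.iloc[0]

# The BioGPT tokenizer also needs the sacremoses package (pip install sacremoses)
from transformers import AutoTokenizer, BioGptForCausalLM

# BioGPT: BioGptForCausalLM includes the language-modelling head that generate() needs;
# the bare BioGptModel only returns hidden states
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

# Tokenization and generation
# BioGPT is a causal language model that was not trained for summarisation,
# so the "summarize: " prompt is only an experiment.
# The prompt is truncated to 512 tokens to leave room for the generated tokens.
inputs = tokenizer.encode("summarize: " + note, return_tensors="pt", max_length=512, truncation=True)
summary_ids = model.generate(inputs, max_new_tokens=200, length_penalty=2.0, num_beams=4, early_stopping=True)
# Decode only the newly generated tokens, not the echoed prompt
summary = tokenizer.decode(summary_ids[0][inputs.shape[1]:], skip_special_tokens=True)
print(summary)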

ImportError: You need to install sacremoses to use BioGptTokenizer. See https://pypi.org/project/sacremoses/ for installation.
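
The ImportError above means the BioGPT tokenizer depends on the `sacremoses` package, which was not installed when the cell ran. A minimal sketch for checking the dependencies before loading any model follows; the helper name `missing_packages` is hypothetical, and the list of required packages is an assumption based on the cells in this notebook.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Assumed requirements for the BART and BioGPT cells in this notebook.
required = ["transformers", "torch", "sacremoses"]
missing = missing_packages(required)

if missing:
    print("Install before running the BioGPT cell: pip install " + " ".join(missing))
else:
    print("All required packages are importable.")
```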

### Multiple patient notes

In [None]:
# TO-DO
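
One possible shape for this step, offered as a sketch and not as the notebook's final design: loop over (identifier, text) pairs and apply a summariser passed in as a function. The names `summarise_notes` and `first_sentence` are hypothetical; `first_sentence` is a trivial stand-in so the example runs without downloading a model.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple


def summarise_notes(
    notes: Iterable[Tuple[Hashable, str]],
    summarise_fn: Callable[[str], str],
) -> Dict[Hashable, str]:
    """Apply summarise_fn to every non-empty note, keyed by its identifier."""
    summaries: Dict[Hashable, str] = {}
    for note_id, text in notes:
        # Skip missing or blank notes instead of sending them to the model.
        if not isinstance(text, str) or not text.strip():
            continue
        summaries[note_id] = summarise_fn(text)
    return summaries


def first_sentence(text: str) -> str:
    """Stand-in summariser: keep only the first sentence."""
    return text.strip().split(".")[0] + "."


# Made-up notes for demonstration.
demo_notes = {
    0: "Pt has HCV cirrhosis. Denies fever.",
    1: "   ",
    2: "Admitted for ascites. Stable.",
}
demo_summaries = summarise_notes(demo_notes.items(), first_sentence)
print(demo_summaries)
```

With the notebook's data, the pairs could come from `patient_admission_notes.items()`, and `summarise_fn` could wrap the tokenize, generate, and decode steps of the BART cell. Running the model once per note will take roughly as long per note as the single-note run.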