# Viewing Dataset File

---

## Intentions

- Extract a predetermined subset of the overall dataset to allow a brief overview and analysis.
- After extracting the subset, analyse it using spaCy or Hugging Face Transformers (HFT).

In [1]:
# imports
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

In [2]:
# read all the csv files and store them respectively

# create function to limit rows read
def read_first_n_rows(data, n):
    df = pd.read_csv(data, nrows=n)
    return df

rows = 1000
discharge_detail_data = read_first_n_rows("discharge_detail.csv", rows)

In [3]:
discharge_detail_data.head()

Unnamed: 0,note_id,subject_id,field_name,field_value,field_ordinal
0,10000032-DS-21,10000032,author,___,1
1,10000032-DS-22,10000032,author,___,1
2,10000032-DS-23,10000032,author,___,1
3,10000032-DS-24,10000032,author,___,1
4,10000084-DS-17,10000084,author,___,1


In [4]:
# read in discharge.csv 
discharge_data = read_first_n_rows("discharge.csv", rows)


In [5]:
discharge_data.head()

Unnamed: 0,note_id,subject_id,hadm_id,note_type,note_seq,charttime,storetime,text
0,10000032-DS-21,10000032,22595853,DS,21,2180-05-07 00:00:00,2180-05-09 15:26:00,\nName: ___ Unit No: _...
1,10000032-DS-22,10000032,22841357,DS,22,2180-06-27 00:00:00,2180-07-01 10:15:00,\nName: ___ Unit No: _...
2,10000032-DS-23,10000032,29079034,DS,23,2180-07-25 00:00:00,2180-07-25 21:42:00,\nName: ___ Unit No: _...
3,10000032-DS-24,10000032,25742920,DS,24,2180-08-07 00:00:00,2180-08-10 05:43:00,\nName: ___ Unit No: _...
4,10000084-DS-17,10000084,23052089,DS,17,2160-11-25 00:00:00,2160-11-25 15:09:00,\nName: ___ Unit No: __...


In [6]:
# retrieve the column names
discharge_data_column_names = list(discharge_data.columns)
discharge_data_column_names

['note_id',
 'subject_id',
 'hadm_id',
 'note_type',
 'note_seq',
 'charttime',
 'storetime',
 'text']

In [7]:
radiology_detail_data = read_first_n_rows("radiology_detail.csv", rows)
radiology_detail_data.head()

Unnamed: 0,note_id,subject_id,field_name,field_value,field_ordinal
0,10000032-RR-14,10000032,exam_code,C11,1
1,10000032-RR-14,10000032,exam_name,CHEST (PA & LAT),1
2,10000032-RR-15,10000032,exam_code,U314,1
3,10000032-RR-15,10000032,exam_code,U644,3
4,10000032-RR-15,10000032,exam_code,W82,2


In [8]:
radiology_data = read_first_n_rows("radiology.csv", rows)
radiology_data.head()

Unnamed: 0,note_id,subject_id,hadm_id,note_type,note_seq,charttime,storetime,text
0,10000032-RR-14,10000032,22595853.0,RR,14,2180-05-06 21:19:00,2180-05-06 23:32:00,EXAMINATION: CHEST (PA AND LAT)\n\nINDICATION...
1,10000032-RR-15,10000032,22595853.0,RR,15,2180-05-06 23:00:00,2180-05-06 23:26:00,EXAMINATION: LIVER OR GALLBLADDER US (SINGLE ...
2,10000032-RR-16,10000032,22595853.0,RR,16,2180-05-07 09:55:00,2180-05-07 11:15:00,"INDICATION: ___ HCV cirrhosis c/b ascites, hi..."
3,10000032-RR-18,10000032,,RR,18,2180-06-03 12:46:00,2180-06-03 14:01:00,EXAMINATION: Ultrasound-guided paracentesis.\...
4,10000032-RR-20,10000032,,RR,20,2180-07-08 13:18:00,2180-07-08 14:15:00,EXAMINATION: Paracentesis\n\nINDICATION: ___...


# Next Step - HFT Text Summarisation

---

The next step is to evaluate whether the HFT library can summarise the text to the desired level.

## Install the HFT Library

### Installing Transformers and Importing Dependencies


In [9]:
%pip install transformers
%pip install tensorflow

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


### Load Summarisation Pipeline

In [10]:
from transformers import pipeline

summarizer = pipeline("summarization")

  torch.utils._pytree._register_pytree_node(
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
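The warning above suggests pinning the model explicitly. A minimal sketch of that, reusing the checkpoint name and revision reported in the log; the helper name `build_summarizer` and the two constants are hypothetical and only for illustration:

```python
# Checkpoint and revision copied from the warning message above.
SUMMARISER_MODEL = "sshleifer/distilbart-cnn-12-6"
SUMMARISER_REVISION = "a4f8f3e"


def build_summarizer(model=SUMMARISER_MODEL, revision=SUMMARISER_REVISION):
    """Create a summarisation pipeline with an explicit model and revision.

    Hypothetical helper: transformers is imported lazily so that defining
    the function does not trigger the import or a model download.
    """
    from transformers import pipeline

    return pipeline("summarization", model=model, revision=revision)
```

Calling `summarizer = build_summarizer()` should then load the same default checkpoint without the warning, assuming the revision is still available on the Hub.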


### Summarise Text


In [11]:
text = """
‘Have some shame’: Boxing world reacts to Mike Tyson news
Boxing has been stunned by the announcement Mike Tyson will fight one of the biggest names in the sport and not everyone is happy.

2 min read
March 8, 2024 - 6:51AM
4 comments
Boxing: Jake Paul took down Ryan Bourland with a first round TKO in their bout in... more
Up Next
Spying probe finds comms equipment on Chinese cranes at…
Cancel
More From Boxing
Former heavyweight champ’s major call on Aussie young gun
Former heavyweight champ’s major call on Aussie young gun
Meet the undefeated Aussie who could be our next UFC champion
Meet the undefeated Aussie who could be our next UFC champion
‘I will take it straight away’: Hall chases Gallen, SBW, rematches
‘I will take it straight away’: Hall chases Gallen, SBW, rematches
Former heavyweight champion Mike Tyson will return to the ring to face Youtuber-turned-boxer Jake Paul in an exhibition bout screened on Netflix, organisers announced on Friday (AEDT).

Tyson, 57, and Paul will face off at the AT&T Stadium in Arlington, Texas, the home of the NFL’s Dallas Cowboys, on July 21, Netflix and Most Valuable Promotions (MVP) announced.

Early reports claimed the fight would be free to Netflix subscribers.

“Given the names involved, that it’s not on PPV and is on Netflix, which is everywhere globally it may be the most watched boxing event ever,” tweeted boxing reporter Dan Rafael.

Mike Tyson and Jake Paul square off. Picture: Netflix
Mike Tyson and Jake Paul square off. Picture: Netflix
Tyson said in a statement he was looking forward to fighting an opponent who is 30 years his junior, insisting that he had been impressed by Paul’s performances in his fledgling boxing career.

“He’s grown significantly as a boxer over the years, so it will be a lot of fun to see what the will and ambition of a ‘kid’ can do with the experience and aptitude of a GOAT,” Tyson said.

Paul fought on the undercard of Tyson’s last outing, an eight-round exhibition against former middleweight king Roy Jones Jr. in Los Angeles in 2020.

Mike Tyson was in top shape when he last fought against Roy Jones Jr in 2020. (Photo by Joe Scarnici / GETTY)
Mike Tyson was in top shape when he last fought against Roy Jones Jr in 2020. (Photo by Joe Scarnici / GETTY)
“I started him on his boxing journey on the undercard of my fight with Roy Jones and now I plan to finish him,” Tyson quipped.

The fight is the latest in what has become a popular trend in recent years, pitting internet celebrities against each other or against recognised boxers, who have already retired or are well past their prime.

Jake Paul’s brother Logan Paul played a key role in pioneering the trend, even fighting against boxing icon Floyd Mayweather in 2021.

Jake Paul was far too good for Ryan Bourland during their cruiserweight fight on March 2. (Photo by Al Bello/Getty Images)
Jake Paul was far too good for Ryan Bourland during their cruiserweight fight on March 2. (Photo by Al Bello/Getty Images)
Jake Paul however has become a bigger draw, building a 9-1 record with six knockouts.

He scored back-to-back knockouts against professional boxers in his last two fights with victories over Ryan Bourland and Andre August.

Tyson, meanwhile, was regarded as one of the most ferocious heavyweight boxers in history.

He reigned as undisputed champion between 1987 and 1990 and won his first belt at the age of 20 years, four months and 22 days to become the youngest heavyweight champion in history.

NEWS.COM.AU00:20
Mike Tyson's insane power at 53-years-old
UP NEXT







Mike Tyson lets the world know that he's still got incredible power at the age of 53.
“He’s the greatest heavyweight of all time…the most vicious KO artist ever,” Paul tweeted after the bout was announced.

“But I’m younger, I’m faster and I’m going to be working my ass off to get stronger. A member of my team sent me this video that Mike’s coach put up two weeks ago and asked me if I’m sure that I want to do this ... yes, yes I do. Heavyweight.”

“My sights are set on becoming a world champion, and now I have a chance to prove myself against the greatest heavyweight champion of the world, the baddest man on the planet and the most dangerous boxer of all time. Time to put Iron Mike to sleep,” Paul added.

But it wasn’t long before other fighters started criticising the match-up.

More Coverage

Horner accuser suspended by Red Bull

Boxing upset throws Tszyu plan into disarray
“You’re fighting someone who was born in 1966,” tweeted Dillon Danis, who fought and lost to Logan Paul. “Have some shame.”

“You should be ashamed of yourself,” UFC great Michael Bisping tweeted to Paul. “And the biggest joke is you don’t even slightly realise why.”
"""

In [14]:
summarizer(text, min_length=120, do_sample=False)

IndexError: index out of range in self
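
This `IndexError` most likely arises because the article is longer than the model's maximum input length (BART-family checkpoints such as distilbart-cnn-12-6 accept at most 1024 tokens). One possible workaround, sketched below, is to split the text into smaller pieces, summarise each piece, and join the partial summaries. The helper names and the 400-word chunk size are assumptions chosen for illustration, not values taken from the notebook:

```python
def chunk_text(text, max_words=400):
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarise_long_text(text, summarizer, max_words=400, **kwargs):
    """Summarise each chunk separately and join the partial summaries.

    `summarizer` is expected to behave like a Hugging Face summarisation
    pipeline: called on a string, it returns [{"summary_text": ...}].
    truncation=True asks the pipeline to cut any chunk that is still too long.
    """
    partial_summaries = [
        summarizer(chunk, truncation=True, **kwargs)[0]["summary_text"]
        for chunk in chunk_text(text, max_words)
    ]
    return " ".join(partial_summaries)
```

With the pipeline loaded earlier, this could be called as `summarise_long_text(text, summarizer, do_sample=False)`. Splitting on word counts is crude because it can cut sentences in half; splitting on paragraphs or sentences may give cleaner summaries.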

# Summarising Patient Medical History ID - 10000032

---

Using the BART model, we will summarise the medical history of patient 10000032.

Steps:
- Collect all the medical notes for this specific patient.
- Pass this information into the HFT model.
- Analyse the output.
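
The first step above can be sketched as follows, assuming a DataFrame with `subject_id` and `text` columns like `radiology_data`. The helper `collect_patient_notes` and the tiny example frame are hypothetical and only illustrate the filtering; they are not part of the dataset:

```python
import pandas as pd


def collect_patient_notes(notes, patient_id):
    """Join every note belonging to one patient into a single string."""
    patient_rows = notes.loc[notes["subject_id"] == patient_id, "text"]
    # Drop missing notes so that join() only ever sees strings.
    return "\n".join(patient_rows.dropna())


# Made-up frame standing in for radiology_data (illustrative only).
example = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "text": ["first note", "second note", "other patient"],
})

notes_for_patient_1 = collect_patient_notes(example, 1)
```

On the real data the equivalent call would be `collect_patient_notes(radiology_data, 10000032)`.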

In [9]:
subject_info = []
patient_id = 10000032

# keep only the patient identifier and the note text (all patients, not yet filtered)
df = radiology_data[["subject_id", "text"]]
df

Unnamed: 0,subject_id,text
0,10000032,EXAMINATION: CHEST (PA AND LAT)\n\nINDICATION...
1,10000032,EXAMINATION: LIVER OR GALLBLADDER US (SINGLE ...
2,10000032,"INDICATION: ___ HCV cirrhosis c/b ascites, hi..."
3,10000032,EXAMINATION: Ultrasound-guided paracentesis.\...
4,10000032,EXAMINATION: Paracentesis\n\nINDICATION: ___...
...,...,...
995,10003299,EXAM: MRI brain. MRA head. MRA neck.\n\nCLI...
996,10003299,INDICATION: Evaluation of patient who stepped...
997,10003299,INDICATION: Bilateral hard tissue per requisi...
998,10003299,HISTORY: Screening.\n\nDIGITAL SCREENING MAMM...


In [11]:
# concatenate every note per patient into one newline-terminated string
patient_list = (
    df.groupby("subject_id", sort=False)["text"]
    .apply(lambda notes: "\n".join(notes) + "\n")
    .to_dict()
)

patient_list

{10000032: "EXAMINATION:  CHEST (PA AND LAT)\n\nINDICATION:  ___ with new onset ascites  // eval for infection\n\nTECHNIQUE:  Chest PA and lateral\n\nCOMPARISON:  None.\n\nFINDINGS: \n\nThere is no focal consolidation, pleural effusion or pneumothorax.  Bilateral\nnodular opacities that most likely represent nipple shadows. The\ncardiomediastinal silhouette is normal.  Clips project over the left lung,\npotentially within the breast. The imaged upper abdomen is unremarkable.\nChronic deformity of the posterior left sixth and seventh ribs are noted.\n\nIMPRESSION: \n\nNo acute cardiopulmonary process.\n\nEXAMINATION:  LIVER OR GALLBLADDER US (SINGLE ORGAN)\n\nINDICATION:  ___ year-old female with cirrhosis, jaundice.\n\nTECHNIQUE:  Grey scale and color Doppler ultrasound images of the abdomen were\nobtained.\n\nCOMPARISON:  None.\n\nFINDINGS: \n\nLIVER: The liver is coarsened and nodular in echotexture. There is no focal\nliver mass. Main portal vein and its major branches are patent wi

## Summarisation using the BioGPT Model

--- 

The BioGPT model was proposed in "BioGPT: generative pre-trained transformer for biomedical text generation and mining" by Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon and Tie-Yan Liu. BioGPT is a domain-specific generative pre-trained Transformer language model for biomedical text generation and mining. It follows the Transformer language model backbone and is pre-trained from scratch on 15M PubMed abstracts.

---

## Configuring BioGPT

---

`BioGptConfig` is the configuration class that stores the configuration of a `BioGptModel`. It is used to instantiate a BioGPT model according to the specified arguments, which define the model architecture. Instantiating a configuration with the defaults yields a configuration similar to that of the microsoft/biogpt architecture.

Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.

Example:

```py
from transformers import BioGptModel, BioGptConfig

# Initializing a BioGPT microsoft/biogpt style configuration
configuration = BioGptConfig()

# Initializing a model from the microsoft/biogpt style configuration
model = BioGptModel(configuration)

# Accessing the model configuration
configuration = model.config
```
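
Note that building a model from a configuration object only defines the architecture; the weights are randomly initialised rather than pre-trained. To work with the published weights, a sketch along the following lines may be used. The helper name `load_pretrained_biogpt` is hypothetical, and the tokenizer is assumed to need the `sacremoses` package installed:

```python
def load_pretrained_biogpt(checkpoint="microsoft/biogpt"):
    """Load the BioGPT tokenizer and language model with pre-trained weights.

    Hypothetical helper: transformers is imported lazily so that defining
    the function does not trigger the import or a download.
    """
    from transformers import BioGptForCausalLM, BioGptTokenizer

    tokenizer = BioGptTokenizer.from_pretrained(checkpoint)
    model = BioGptForCausalLM.from_pretrained(checkpoint)
    return tokenizer, model
```

Calling `tokenizer, model = load_pretrained_biogpt()` downloads the checkpoint on first use, so it needs network access and enough disk space.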

---

## NOTE

When using a new HFT module, restart the kernel and do not reinstall the previous modules.

In [12]:
%pip install torch torchvision
from transformers import BioGptModel, BioGptConfig

# initialise a microsoft/biogpt style configuration
configuration = BioGptConfig()

# initialise a model from the microsoft/biogpt style configuration
model = BioGptModel(configuration)

# Accessing the model configuration
configuration = model.config

Note: you may need to restart the kernel to use updated packages.


  torch.utils._pytree._register_pytree_node(
