<a href="https://colab.research.google.com/github/sajeshkayyath/tron_hackathon_2023/blob/main/tron_hackathon_2023_re_trained.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

# **Fine-tuning the T5-small model on medical-domain text**

---

In [1]:
# transformers is pinned to 4.28.0, the version this notebook was run with.
!pip install datasets transformers==4.28.0
!pip install evaluate rouge_score



In [2]:
from huggingface_hub import notebook_login
notebook_login()


In [3]:
from datasets import load_dataset

In [4]:
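# Use only the first 50,000 training examples (switch to split = 'train' for the full set).
split = 'train[0:50000]'
# The variable name `billsum` is kept as-is; the dataset loaded here is medical_meadow_cord19.
billsum = load_dataset("medalpaca/medical_meadow_cord19", split=split)
# Hold out 20% of the examples for evaluation.
billsum = billsum.train_test_split(test_size=0.2)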
billsum["train"][0]

{'output': 'Proposal of a recursive compartment model of epidemics and applications to the Covid-19 pandemic',
 'instruction': 'Please summerize the given abstract to a title',
 'input': 'This is work in progress. We make it accessible hoping that people might find the idea useful. We propose a discrete, recursive 5-compartment model for the spread of epidemics, which we call {\\em SEPIR-model}. Under mild assumptions which typically are fulfilled for the Covid-19 pandemic it can be used to reproduce the development of an epidemic from a small number of parameters closely related to the data. We demonstrate this at the development in Germany and Switzerland. It also allows model predictions assuming nearly constant reproduction numbers. Thus it might be a useful tool for shedding light on which interventions might be most effective in the future. In future work we will discuss other aspects of the model and more countries.'}

In [5]:
from transformers import AutoTokenizer

In [6]:
checkpoint = "t5-small"
# checkpoint = "t5-base"

In [7]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [8]:
prefix = "summarize: "

In [9]:
def preprocess_function(examples):
    # T5 is a text-to-text model, so the task is signalled with a text prefix.
    inputs = [prefix + doc for doc in examples["input"]]
    model_inputs = tokenizer(inputs, max_length=1024, truncation=True)

    # Tokenize the target titles; their token ids become the labels.
    labels = tokenizer(text_target=examples["output"], max_length=128, truncation=True)

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

In [10]:
tokenized_billsum = billsum.map(preprocess_function, batched=True)

Map:   0%|          | 0/40000 [00:00<?, ? examples/s]

Map:   0%|          | 0/10000 [00:00<?, ? examples/s]

In [11]:
from transformers import DataCollatorForSeq2Seq

In [12]:
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)
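
The collator pads each batch to the length of its longest sequence instead of padding the whole dataset up front. The pure-Python sketch below illustrates that idea only; `pad_batch` is a hypothetical helper, not the library's implementation. It assumes T5's pad token id of 0 and the collator's default label padding value of -100, which the loss function ignores.

```python
def pad_batch(features, pad_token_id=0, label_pad_token_id=-100):
    """Pad input_ids and labels to the longest sequence in this batch only."""
    max_input = max(len(f["input_ids"]) for f in features)
    max_label = max(len(f["labels"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_input - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        n_label_pad = max_label - len(f["labels"])
        batch["labels"].append(f["labels"] + [label_pad_token_id] * n_label_pad)
    return batch


# Two toy examples with made-up token ids.
example = pad_batch([
    {"input_ids": [21, 7, 1], "labels": [5, 1]},
    {"input_ids": [21, 1], "labels": [5, 9, 1]},
])
print(example["input_ids"])  # [[21, 7, 1], [21, 1, 0]]
print(example["labels"])     # [[5, 1, -100], [5, 9, 1]]
```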

In [13]:
import evaluate

In [14]:
rouge = evaluate.load("rouge")

In [15]:
import numpy as np

In [16]:
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Label padding is stored as -100 so the loss ignores it; swap it back to the pad token id before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

    # Mean number of non-padding tokens per generated summary.
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)

    return {k: round(v, 4) for k, v in result.items()}
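
To give a feel for what the ROUGE-1 score measures, here is a simplified sketch of a unigram-overlap F1. `rouge1_f1` is a hypothetical helper written for illustration: it lowercases and splits on whitespace, and it applies no stemming, so its numbers will not match the `rouge` metric used above with `use_stemmer=True`.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two lowercased, whitespace-split strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up prediction and reference, for illustration only.
score = rouge1_f1(
    "a recursive model of epidemics",
    "a recursive compartment model of epidemics",
)
print(round(score, 4))  # 0.9091
```

All five predicted words appear in the six-word reference, so precision is 1, recall is 5/6, and the F1 is 10/11.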

In [17]:
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

In [18]:
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

In [37]:
# !pip install accelerate -U
# !pip install transformers[torch]

In [19]:
training_args = Seq2SeqTrainingArguments(
    output_dir="medical_diagnostic_summarizer",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=4,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
)

In [None]:
# !pip list

In [20]:
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_billsum["train"],
    eval_dataset=tokenized_billsum["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

/content/medical_diagnostic_summarizer is already a clone of https://huggingface.co/Sajesh/medical_diagnostic_summarizer. Make sure you pull the latest changes with `repo.git_pull()`.


In [21]:
trainer.train()

You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|
| 1 | 2.4288 | 2.194402 | 0.3895 | 0.1972 | 0.3304 | 0.3303 | 17.8459 |
| 2 | 2.3376 | 2.138129 | 0.3948 | 0.2012 | 0.3347 | 0.3347 | 17.8277 |
| 3 | 2.2978 | 2.115509 | 0.3972 | 0.2027 | 0.3365 | 0.3366 | 17.8694 |
| 4 | 2.3072 | 2.109933 | 0.398 | 0.2035 | 0.3373 | 0.3373 | 17.8606 |


TrainOutput(global_step=10000, training_loss=2.3860719848632814, metrics={'train_runtime': 7222.7258, 'train_samples_per_second': 22.152, 'train_steps_per_second': 1.385, 'total_flos': 2.851118002785485e+16, 'train_loss': 2.3860719848632814, 'epoch': 4.0})
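
The figures in `TrainOutput` can be cross-checked with a little arithmetic. The sketch below assumes a single device and no gradient accumulation, neither of which the notebook states explicitly; the other numbers are copied from the cells and outputs above.

```python
import math

train_examples = 40_000    # train split size, from the Map progress bar
epochs = 4                 # num_train_epochs
batch_size = 16            # per_device_train_batch_size (one device assumed)
train_runtime = 7222.7258  # seconds, from TrainOutput

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
samples_per_second = train_examples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(total_steps)                   # 10000
print(round(samples_per_second, 3))  # 22.152
print(round(steps_per_second, 3))    # 1.385
```

These match `global_step`, `train_samples_per_second`, and `train_steps_per_second` in the output, which is consistent with the single-device assumption.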

In [22]:
trainer.push_to_hub()

To https://huggingface.co/Sajesh/medical_diagnostic_summarizer
   b1308dd..9142787  main -> main


To https://huggingface.co/Sajesh/medical_diagnostic_summarizer
   9142787..c19633e  main -> main




'https://huggingface.co/Sajesh/medical_diagnostic_summarizer/commit/91427875e7552202b64ed3cee5a0538abab31099'

In [None]:
# text = "summarize: The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country. It'll lower the deficit and ask the ultra-wealthy and corporations to pay their fair share. And no one making under $400,000 per year will pay a penny more in taxes."

# text = """summarize:
# About acne
# Acne is a common skin condition that affects most people at some point. It causes spots, oily skin and sometimes skin that's hot or painful to touch.

# Acne most commonly develops on the:

# face – this affects almost everyone with acne
# back – this affects more than half of people with acne
# chest – this affects about 15% of people with acne
# Types of spots
# There are 6 main types of spot caused by acne:

# blackheads – small black or yellowish bumps that develop on the skin; they're not filled with dirt, but are black because the inner lining of the hair follicle produces pigmentation (colouring)
# whiteheads – have a similar appearance to blackheads, but may be firmer and won't empty when squeezed
# papules – small red bumps that may feel tender or sore
# pustules – similar to papules, but have a white tip in the centre, caused by a build-up of pus
# nodules – large hard lumps that build up beneath the surface of the skin and can be painful
# cysts – the most severe type of spot caused by acne; they're large pus-filled lumps that look similar to boils and carry the greatest risk of causing permanent scarring
# """

# text = "summarize: Brucellosis is a zoonotic infection that is usually transmitted from cattle to humans through ingestion of animal milk, direct contact with animal parts, or inhalation of aerosolized particles. In Korea, brucellosis seem to be transmitted through close contact with blood, fetus, urine, and placenta of domestic cow that has been infected by Brucella abortus, or inhalation of B. arbortus while examining or slaughtering cow. Brucella melitensis infection is rare in Korea and there have been no reported cases of B. melitensis originating from other countries until now. This report details a case of complicated brucellosis with infective spondylitis in a 48-year-old male construction worker recently returned from Iraq. Infection with B. melitensis was confirmed using 16s rRNA sequencing and omp31 gene analysis. The patient was successfully treated using a combination of rifampin, doxycycline, and streptomycin, in accordance with WHO guidelines. This is the first reported case of complicated brucellosis with infective spondylitis in Korea caused by B. melitensis originating from Iraq."

text = """summarize:
COURSE WHILE IN HOSPITAL
Relevant Complaint(s) and Concerns:
1. Upon arrival: Patient presented with five days of increased urinary frequency, urgency and dysuria as well as
48 hours of fever and rigors. He was hypotensive and tachycardic upon arrival to the emergency department.
The internal medicine service was consulted. The following issues were addressed during the hospitalization:
Summary Course in Hospital (Issues Addressed):
2. Fever and urinary symptoms: A preliminary diagnosis of pyelonephritis was established. Other causes of fever
were possible but less likely. The patient was hypotensive on initial assessment with a blood pressure of
80/40. Serum lactate was elevated at 6.1. A bolus of IV fluid was administered (1.5L) but the patient remained
hypotensive. Our colleagues from ICU were consulted. An arterial line was inserted for hemodynamic
monitoring. Hemodynamics were supported with levophed and crystalloids. Piptazo was started after blood
and urine cultures were drawn. After 12 hours serum lactate had normalized and hemodynamics had
stabilized. Blood cultures were positive for E.Coli that was sensitive to all antibiotics. The patient was stepped
down to oral ciprofloxacin to complete a total 14 day course of antibiotics.
On further review it was learned that the patient has been experiencing symptoms of prostatism for the last
year. An abdominal ultrasound performed for elevated liver enzymes and acute kidney injury confirmed a
Ensure Primary Care /
Referring Provider
information is correct
Be concise and itemize
descriptions of patient’s
initial presentation
Indicate pre-existing
conditions patient
arrived with vs.
conditions that
developed during stay
SAMPLE Page 2 of 3
Printed by: Snow, Mike on 15-OCT-2015
severely enlarged prostate. Urinary retention secondary to BPH was the likely underlying mechanism that
contributed to the development of pyelonephritis in this patient. He was started on Tamsulosin 0.4mg PO qhs
and tolerated it well with no orthostatic intolerance. Post void residuals show 150-200cc of retained urine in
the bladder. An outpatient referral to Urology has been requested by our team.
3. Elevated liver enzymes and creatinine. Both of these were thought to be related to end organ hypoperfusion
in the setting of sepsis. Values improved with the administration of IV fluid and stabilization of the patients
hemodynamics. Abdominal ultrasound with doppler flow and urine analysis ruled out other possible
etiologies. Liver enzymes remain slightly above normal values at the time of discharge. We ask that the
patients’ family physician repeat these tests in 2 weeks’ time to ensure complete resolution.
"""

In [None]:
# from transformers import pipeline

In [None]:
# import locale
# locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
# !pip install keras

In [None]:
# summarizer = pipeline("summarization", model="Sajesh/my_awesome_billsum_model")
# summarizer(text)

In [None]:
from transformers import AutoTokenizer

In [None]:
tokenizer = AutoTokenizer.from_pretrained("Sajesh/medical_diagnostic_summarizer")
inputs = tokenizer(text, return_tensors="pt").input_ids

Token indices sequence length is longer than the specified maximum sequence length for this model (634 > 512). Running this sequence through the model will result in indexing errors
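
The warning says the prompt tokenizes to 634 ids, above the 512-token maximum the tokenizer reports for this checkpoint. One option is to pass `truncation=True, max_length=512` to the tokenizer call, which drops everything past the limit. Another is to split the ids into overlapping windows and summarize each one. The sketch below shows only the windowing step on a plain list of ids; `chunk_ids` is a hypothetical helper, and the overlap of 32 is an arbitrary value chosen for illustration. Re-adding the task prefix and end-of-sequence token to each window is left out.

```python
def chunk_ids(token_ids, max_len=512, overlap=32):
    """Split a list of token ids into windows of at most max_len ids.

    Consecutive windows share `overlap` ids, so text near a window
    boundary appears in both neighbouring windows.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        # Stop once a window has reached the end of the sequence.
        if start + max_len >= len(token_ids):
            break
    return chunks


# Stand-in for the 634 token ids reported in the warning.
windows = chunk_ids(list(range(634)))
print([len(w) for w in windows])  # [512, 154]
```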


In [None]:
from transformers import AutoModelForSeq2SeqLM

In [None]:
model = AutoModelForSeq2SeqLM.from_pretrained("Sajesh/medical_diagnostic_summarizer")
outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)

In [None]:
tokenizer.decode(outputs[0], skip_special_tokens=True)

'a preliminary diagnosis of pyelonephritis was established. a bolus of IV fluid was administered (1.5L) but the patient remained hypotensive.'

# **Init - summarization**

In [2]:
!pip install datasets transformers==4.28.0


In [3]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# **Sample test data**

In [46]:
# text = "summarize: The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country. It'll lower the deficit and ask the ultra-wealthy and corporations to pay their fair share. And no one making under $400,000 per year will pay a penny more in taxes."

In [21]:
# text = "summarize: Brucellosis is a zoonotic infection that is usually transmitted from cattle to humans through ingestion of animal milk, direct contact with animal parts, or inhalation of aerosolized particles. In Korea, brucellosis seem to be transmitted through close contact with blood, fetus, urine, and placenta of domestic cow that has been infected by Brucella abortus, or inhalation of B. arbortus while examining or slaughtering cow. Brucella melitensis infection is rare in Korea and there have been no reported cases of B. melitensis originating from other countries until now. This report details a case of complicated brucellosis with infective spondylitis in a 48-year-old male construction worker recently returned from Iraq. Infection with B. melitensis was confirmed using 16s rRNA sequencing and omp31 gene analysis. The patient was successfully treated using a combination of rifampin, doxycycline, and streptomycin, in accordance with WHO guidelines. This is the first reported case of complicated brucellosis with infective spondylitis in Korea caused by B. melitensis originating from Iraq."

In [24]:
# text = """summarize:
# COURSE WHILE IN HOSPITAL
# Relevant Complaint(s) and Concerns:
# 1. Upon arrival: Patient presented with five days of increased urinary frequency, urgency and dysuria as well as
# 48 hours of fever and rigors. He was hypotensive and tachycardic upon arrival to the emergency department.
# The internal medicine service was consulted. The following issues were addressed during the hospitalization:
# Summary Course in Hospital (Issues Addressed):
# 2. Fever and urinary symptoms: A preliminary diagnosis of pyelonephritis was established. Other causes of fever
# were possible but less likely. The patient was hypotensive on initial assessment with a blood pressure of
# 80/40. Serum lactate was elevated at 6.1. A bolus of IV fluid was administered (1.5L) but the patient remained
# hypotensive. Our colleagues from ICU were consulted. An arterial line was inserted for hemodynamic
# monitoring. Hemodynamics were supported with levophed and crystalloids. Piptazo was started after blood
# and urine cultures were drawn. After 12 hours serum lactate had normalized and hemodynamics had
# stabilized. Blood cultures were positive for E.Coli that was sensitive to all antibiotics. The patient was stepped
# down to oral ciprofloxacin to complete a total 14 day course of antibiotics.
# On further review it was learned that the patient has been experiencing symptoms of prostatism for the last
# year. An abdominal ultrasound performed for elevated liver enzymes and acute kidney injury confirmed a
# Ensure Primary Care /
# Referring Provider
# information is correct
# Be concise and itemize
# descriptions of patient’s
# initial presentation
# Indicate pre-existing
# conditions patient
# arrived with vs.
# conditions that
# developed during stay
# SAMPLE Page 2 of 3
# Printed by: Snow, Mike on 15-OCT-2015
# severely enlarged prostate. Urinary retention secondary to BPH was the likely underlying mechanism that
# contributed to the development of pyelonephritis in this patient. He was started on Tamsulosin 0.4mg PO qhs
# and tolerated it well with no orthostatic intolerance. Post void residuals show 150-200cc of retained urine in
# the bladder. An outpatient referral to Urology has been requested by our team.
# 3. Elevated liver enzymes and creatinine. Both of these were thought to be related to end organ hypoperfusion
# in the setting of sepsis. Values improved with the administration of IV fluid and stabilization of the patients
# hemodynamics. Abdominal ultrasound with doppler flow and urine analysis ruled out other possible
# etiologies. Liver enzymes remain slightly above normal values at the time of discharge. We ask that the
# patients’ family physician repeat these tests in 2 weeks’ time to ensure complete resolution.
# """

In [27]:
# text = """summarize:
# INTRODUCTION: Infection with the new coronavirus responsible for Severe Acute Respiratory Syndrome (SARS-CoV-2) continues to spread worldwide. In Brazil, there are already more than 230 thousand dead, many of these older adults. OBJECTIVE: To present the clinical characteristics of older Brazilian adults infected by COVID-19, in the epidemiological weeks (EW) 34-52, and to verify factors responsible for the increased risk of death. METHODS: Retrospective and observational study conducted with secondary publicly available data, provided by the Brazilian Ministry of Health. 1,544 confirmed cases of registered COVID-19 infection were included between August 16 and December 26, 2020, aged 60 or older. OUTCOMES: Demographic data, comorbidity, symptoms for disease, clinical information: days of hospitalization, chest X-ray, type of RT-PCR. RESULTS: 48% of patients admitted to the ICU with evidence for SARS-CoV-2 died. Symptoms and comorbidities related to increased chance of death (OR) were immunodeficiency (188%), kidney disease (166%), neurological disease (103%), dyspnea (86%), pneumopathy (55%), O2 saturation <95% (53%), respiratory discomfort (49%), age (36%), sore throat (31%), and sex (0.5%). There was a 5% increase in the chance of death for each year of life. CONCLUSION: Heart disease and Diabetes mellitus were the most frequent comorbidities, but did not indicate an increased risk of death from SARS-CoV-2 infection. Age, sex, sore throat, dyspnea, respiratory discomfort, O2 saturation <95%, neurological disease, pneumopathy, immunodeficiency, and kidney disease were significantly associated with risk of death from COVID-19.
# """

In [30]:
# text = """summarize:
# DIAGNOSIS:
# A. SKIN, RIGHT ARM, SHAVE BIOPSY:
# COMPATIBLE WITH PERFORATING DISORDER WITH FEATURES OF
# ELASTOSIS PERFORANS SERPIGINOSUM.
# B. SKIN, LEFT NECK, SHAVE BIOPSY:
# 1. COMPATIBLE WITH PERFORATING DISORDER WITH FEATURES
# OF ELASTOSIS PERFORANS SERPIGINOSUM.
# 2. ASSOCIATED SPONGIOTIC DERMATITIS WITH OCCASIONAL
# EOSINOPHILS (SEE NOTE).
# """

In [33]:
# text = """summarize:
# Laboratory diagnosis of Lyme neuroborreliosis (LNB) is challenging, and validated diagnostic algorithms are lacking. Therefore, this retrospective cross-sectional study aimed to compare the diagnostic performance of seven commercial antibody assays for LNB diagnosis. Random forest (RF) modeling was conducted to investigate whether the diagnostic performance using the antibody assays could be improved by including several routine cerebrospinal fluid (CSF) parameters (i.e., leukocyte count, total protein, blood-CSF barrier functionality, and intrathecal total antibody synthesis), two-tier serology on serum, the CSF level of the B-cell chemokine (C-X-C motif) ligand 13 (CXCL13), and a Borrelia species PCR on CSF. In total, 156 patients were included who were classified as definite LNB (n = 10), possible LNB (n = 7), or non-LNB patient (n = 139) according to the criteria of the European Federation of Neurological Societies using a consensus strategy for intrathecal Borrelia-specific antibody synthesis. The seven antibody assays showed sensitivities ranging from 47.1% to 100% and specificities ranging from 95.7% to 100%. RF modeling demonstrated that the sensitivities of most antibody assays could be improved by including other parameters to the diagnostic repertoire for diagnosing LNB (range: 94.1% to 100%), although with slightly lower specificities (range: 92.8% to 96.4%). The most important parameters for LNB diagnosis are the detection of intrathecally produced Borrelia-specific antibodies, two-tier serology on serum, CSF-CXCL13, Reibergram classification, and pleocytosis. In conclusion, this study shows that LNB diagnosis is best supported using multiparameter analysis. Furthermore, a collaborative prospective study is proposed to investigate if a standardized diagnostic algorithm can be developed for improved LNB diagnosis. 
# IMPORTANCE The diagnosis of LNB is established by clinical symptoms, pleocytosis, and proof of intrathecal synthesis of Borrelia-specific antibodies. Laboratory diagnosis of LNB is challenging, and validated diagnostic algorithms are lacking. Therefore, this retrospective cross-sectional study aimed to compare the diagnostic performance of seven commercial antibody assays for LNB diagnosis. Multiparameter analysis was conducted to investigate whether the diagnostic performance using the antibody assays could be improved by including several routine (CSF) parameters. The results of this study show that LNB diagnosis is best supported using the detection of intrathecally produced Borrelia-specific antibodies, two-tier serology on serum, CSF-CXCL13, Reibergram classification, and pleocytosis. Furthermore, we propose a collaborative prospective study to investigate the potential role of constructing a diagnostic algorithm using multiparameter analysis for improved LNB diagnosis.
# """

In [36]:
# text = """summarize:
# Both objective and perceived social isolations were associated with future cognitive decline and increase risk of Alzheimer’s disease (AD). However, the impacts of perceived social isolation depending on different clinical stages of AD have not been elucidated. The aim of this study was to investigate the influence of perceived social isolation or loneliness on brain structure and future cognitive trajectories in patients who are living with or are at risk for AD. A total of 176 elderly patients (mean age of 78 years) who had complaint of memory problems (39 subjective cognitive decline [SCD], 53 mild cognitive impairment [MCI], 84 AD) underwent structural MRI and neuropsychological testing. Loneliness was measured by one binary item question “Do you often feel lonely?.” Voxel-based morphometry was conducted to evaluate regional gray matter volume (rGMV) difference associated with loneliness in each group. To evaluate individual differences in cognitive trajectories based on loneliness, subgroup analysis was performed in 51 patients with AD (n = 23) and pre-dementia status (SCD-MCI, n = 28) using the longitudinal scores of Alzheimer’s Disease Assessment Scale-cognitive component-Japanese version (ADAS-Jcog). Whole brain VBM analysis comparing lonely to non-lonely patients revealed loneliness was associated with decreased rGMV in bilateral thalamus in SCD patients and in the left middle occipital gyrus and the cerebellar vermal lobules I − V in MCI patients. Annual change of ADAS-Jcog in patients who reported loneliness was significantly greater comparing to these non-lonely in SCD-MCI group, but not in AD group. Our results indicate that perceived social isolation, or loneliness, might be a comorbid symptom of patients with SCD or MCI, which makes them more vulnerable to the neuropathology of future AD progression. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s11357-022-00584-6.
# """

In [39]:
# text = """summarize:
# SPECIMEN: A. RIGHT ARM SHAVE BIOPSY
# B. LEFT NECK SHAVE BIOPSY
# DIAGNOSIS:
# A. SKIN, RIGHT ARM, SHAVE BIOPSY:
# COMPATIBLE WITH PERFORATING DISORDER WITH FEATURES OF
# ELASTOSIS PERFORANS SERPIGINOSUM.
# B. SKIN, LEFT NECK, SHAVE BIOPSY:
# 1. COMPATIBLE WITH PERFORATING DISORDER WITH FEATURES
# OF ELASTOSIS PERFORANS SERPIGINOSUM.
# 2. ASSOCIATED SPONGIOTIC DERMATITIS WITH OCCASIONAL
# EOSINOPHILS (SEE NOTE).
# NOTE: Elastosis perforans serpiginosa presents as small papules, either grouped or in a circular or
# serpiginous arrangement, often on the face, neck, upper extremities, and trunk. The disorder has a predilection
# for males and typically occurs in the second decade, and up to a third of cases have been reported to be
# associated with an associated systemic disorder or connective tissue disorder. Some cases are associated with
# prolonged use of penicillamine. Clinical correlation is recommended.
# DERMATOPATHOLOGY REPORT
# A. Received in formalin in a container labeled with the patient's name and "R arm" is a single 0.5 x 0.4 x 0.1 cm irregular light
# grey-tan rough portion of tissue. The specimen is inked, bisected, wrapped in biopsy paper and submitted entirely in cassette A.
# B. Received in formalin in a container labeled with the patient's name and "L neck" is a single 0.5 x 0.4 x 0.1 cm irregular tan
# smooth portion of tissue. The specimen is inked, bisected, wrapped in biopsy paper and submitted entirely in cassette B.
# """

In [18]:
# text = """summarize:
# About acne
# Acne is a common skin condition that affects most people at some point. It causes spots, oily skin and sometimes skin that's hot or painful to touch.

# Acne most commonly develops on the:

# face – this affects almost everyone with acne
# back – this affects more than half of people with acne
# chest – this affects about 15% of people with acne
# Types of spots
# There are 6 main types of spot caused by acne:

# blackheads – small black or yellowish bumps that develop on the skin; they're not filled with dirt, but are black because the inner lining of the hair follicle produces pigmentation (colouring)
# whiteheads – have a similar appearance to blackheads, but may be firmer and won't empty when squeezed
# papules – small red bumps that may feel tender or sore
# pustules – similar to papules, but have a white tip in the centre, caused by a build-up of pus
# nodules – large hard lumps that build up beneath the surface of the skin and can be painful
# cysts – the most severe type of spot caused by acne; they're large pus-filled lumps that look similar to boils and carry the greatest risk of causing permanent scarring
# """

In [42]:
text = """summarize:
COURSE WHILE IN HOSPITAL
Relevant Complaint(s) and Concerns:
1. Upon arrival: Patient presented with five days of increased urinary frequency, urgency and dysuria as well as
48 hours of fever and rigors. He was hypotensive and tachycardic upon arrival to the emergency department.
The internal medicine service was consulted. The following issues were addressed during the hospitalization:
Summary Course in Hospital (Issues Addressed):
2. Fever and urinary symptoms: A preliminary diagnosis of pyelonephritis was established. Other causes of fever
were possible but less likely. The patient was hypotensive on initial assessment with a blood pressure of
80/40. Serum lactate was elevated at 6.1. A bolus of IV fluid was administered (1.5L) but the patient remained
hypotensive. Our colleagues from ICU were consulted. An arterial line was inserted for hemodynamic
monitoring. Hemodynamics were supported with levophed and crystalloids. Piptazo was started after blood
and urine cultures were drawn. After 12 hours serum lactate had normalized and hemodynamics had
stabilized. Blood cultures were positive for E.Coli that was sensitive to all antibiotics. The patient was stepped
down to oral ciprofloxacin to complete a total 14 day course of antibiotics.
On further review it was learned that the patient has been experiencing symptoms of prostatism for the last
year. An abdominal ultrasound performed for elevated liver enzymes and acute kidney injury confirmed a
"""

# **Run with the base T5 model**

In [43]:
model_name = "t5-small"

# Baseline: the pretrained t5-small checkpoint, without fine-tuning
tokenizert5 = AutoTokenizer.from_pretrained(model_name)

inputst5 = tokenizert5(text, return_tensors="pt").input_ids

modelt5 = AutoModelForSeq2SeqLM.from_pretrained(model_name)

outputst5 = modelt5.generate(inputst5, max_new_tokens=100, do_sample=False)

tokenizert5.decode(outputst5[0], skip_special_tokens=True)

'patient presented with five days of increased urinary frequency, urgency and dysuria. he was hypotensive and tachycardic upon arrival to the emergency department.'

# **Run with the fine-tuned model**

In [44]:
model_name_custom="Sajesh/medical_diagnostic_summarizer"

# My custom summarizer
tokenizer = AutoTokenizer.from_pretrained(model_name_custom)

inputs = tokenizer(text, return_tensors="pt").input_ids

model = AutoModelForSeq2SeqLM.from_pretrained(model_name_custom)

outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)

tokenizer.decode(outputs[0], skip_special_tokens=True)

'Patient with a pyelonephritis diagnosis: A summary course in hospital'
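
The two runs above repeat the same tokenize, generate, decode steps. A small helper can make the comparison less repetitive. This is a sketch, not part of the original notebook: `summarize` is a hypothetical function, and the `truncation=True, max_length=512` arguments are an added assumption, so for prompts longer than 512 tokens its output may differ from cells that pass the full prompt.

```python
def summarize(text, tokenizer, model, max_new_tokens=100, max_input_tokens=512):
    """Greedy-decode a summary, truncating the prompt to max_input_tokens."""
    input_ids = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    ).input_ids
    # do_sample=False keeps decoding deterministic, as in the cells above.
    output_ids = model.generate(
        input_ids, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects already loaded in this notebook, `summarize(text, tokenizert5, modelt5)` would run the base checkpoint and `summarize(text, tokenizer, model)` the fine-tuned one.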