
# QUB-Cirdan at "Discharge Me!": Zero-Shot Discharge Letter Generation by Open-Source LLM

More information about the challenge can be found on the [Discharge Me! Challenge FAQ page](https://stanford-aimi.github.io/discharge-me/#faq).

---


## Overview

This repository hosts the solution to the "Discharge Me!" challenge, which focuses on generating sections of discharge summaries from the MIMIC-IV dataset. The corresponding paper is available in the [ACL Anthology](https://aclanthology.org/2024.bionlp-1.58/).

The main objective is to automatically generate the **"Brief Hospital Course"** and **"Discharge Instructions"** sections.

---



## Usage

Ensure Python is installed along with the necessary dependencies, including:
- `langchain`
- `pandas`
- `ollama`

The file paths used in these scripts must be adapted to your local setup.
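
One way to keep the path changes in a single place is sketched below. It only collects the default folders that appear in the notebook output of this README under one root directory. The `Paths` class, the `load_paths` helper, and the `MIMIC_ROOT` environment variable are hypothetical names introduced for illustration; the repository scripts do not read them.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Paths:
    """Folder layout derived from one root directory (hypothetical helper)."""

    root: Path

    @property
    def hosp(self) -> Path:
        # MIMIC-IV hosp module, the default input of aggregate_discharge.py
        return self.root / "mimic-iv-2.2" / "hosp"

    @property
    def discharge(self) -> Path:
        # Challenge CSV files, the default input of discharge_dataset.py
        return self.root / "discharge"

    @property
    def dataset(self) -> Path:
        # Default output folder of the two preprocessing scripts
        return self.discharge / "dataset"

    @property
    def result(self) -> Path:
        # Folder where generation.py saves its JSON results
        return self.discharge / "result"


def load_paths(env=None) -> Paths:
    """Build the layout from MIMIC_ROOT (assumed name) or the README default."""
    env = os.environ if env is None else env
    return Paths(Path(env.get("MIMIC_ROOT", "/mnt/datadisk/mimic")))


paths = load_paths()
print(paths.hosp, paths.dataset, paths.result)
```

With such a helper, moving the data to another disk would only require changing the root directory.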

### Steps:
1. Run the `aggregate_discharge.py` script to augment the discharge summary table.
2. Run the `discharge_dataset.py` script to process the discharge summaries.
3. Run the `generation.py` script to generate the missing sections in the discharge summaries.

The Jupyter notebook `discharge_me_analysis.ipynb` can be used to predict the word count of the missing sections. This prediction approach is not the one preferred in the final paper.


## Scripts

The paper section **"Dataset Exploration"** is implemented in the following two scripts:

### `aggregate_discharge.py`

This script augments the discharge summary table by integrating other relevant tables from the MIMIC-IV dataset, providing a fuller context and background for each patient’s stay.

#### Features:
- Reads additional tables to augment the data.
- Integrates multiple data sources to enhance the discharge summary.
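
The kind of integration described above can be sketched as follows: a diagnosis table is joined with its code dictionary and collapsed into one text summary per hospital admission, keyed by `hadm_id`. This is a minimal sketch on toy data, not the script's actual logic. The column names follow the public MIMIC-IV `diagnoses_icd` and `d_icd_diagnoses` tables, while the function name, the `"; "` separator, and the `"unknown diagnosis"` fallback are assumptions.

```python
import json

import pandas as pd


def build_diagnoses_dict(diagnoses_icd, d_icd_diagnoses):
    """Return {hadm_id (str): "title 1; title 2; ..."} ordered by seq_num."""
    # Attach the human-readable title to every coded diagnosis.
    merged = diagnoses_icd.merge(
        d_icd_diagnoses, on=["icd_code", "icd_version"], how="left"
    )
    merged = merged.sort_values(["hadm_id", "seq_num"])
    # Codes missing from the dictionary get a placeholder (assumed choice).
    merged["long_title"] = merged["long_title"].fillna("unknown diagnosis")
    return {
        str(hadm_id): "; ".join(group["long_title"])
        for hadm_id, group in merged.groupby("hadm_id")
    }


# Toy rows standing in for the real tables.
diagnoses_icd = pd.DataFrame(
    {
        "hadm_id": [100, 100, 200],
        "seq_num": [2, 1, 1],
        "icd_code": ["I10", "J441", "E119"],
        "icd_version": [10, 10, 10],
    }
)
d_icd_diagnoses = pd.DataFrame(
    {
        "icd_code": ["I10", "J441"],
        "icd_version": [10, 10],
        "long_title": [
            "Essential (primary) hypertension",
            "Chronic obstructive pulmonary disease with (acute) exacerbation",
        ],
    }
)

diagnoses_dict = build_diagnoses_dict(diagnoses_icd, d_icd_diagnoses)
# JSON object keys must be strings, hence the str(hadm_id) above.
print(json.dumps(diagnoses_dict, indent=2))
```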

---

In [2]:
# The default input folder for the MIMIC files is
# /mnt/datadisk/mimic/mimic-iv-2.2/hosp
# The hosp module can be downloaded from the MIMIC-IV dataset website
%run aggregate_discharge.py

Loading MIMIC-IV data...
Generating admission summaries...


100%|████████████████████████████████| 431231/431231 [00:08<00:00, 49275.08it/s]


Generating transfer summaries and durations...


100%|██████████████████████████████████| 431231/431231 [09:14<00:00, 777.80it/s]
100%|██████████████████████████████████| 431231/431231 [07:19<00:00, 981.43it/s]


Generating diagnosis summaries...


100%|██████████████████████████████████| 430852/430852 [07:51<00:00, 913.64it/s]


Saving results to JSON files...
Processing complete.


The default output folder is `/mnt/datadisk/mimic/discharge/dataset`.

In [3]:
!ls -l /mnt/datadisk/mimic/discharge/dataset

total 294436
-rw-rw-r-- 1 ktpuser ktpuser 168658120 Mar 26 16:34 diagnoses_dict.json
-rw-rw-r-- 1 ktpuser ktpuser  64861400 Mar 26 16:34 patients_admissions_dict.json
-rw-rw-r-- 1 ktpuser ktpuser  54916991 Mar 26 16:34 transfer_summary_dict.json
-rw-rw-r-- 1 ktpuser ktpuser  13056095 Mar 26 16:34 transfer_total_duration_dict.json


### `discharge_dataset.py`

Processes the discharge summaries by segmenting them into specific sections, truncating excessive content, and aggregating the results into a structured table.

#### Features:
- Segments discharge summaries into manageable parts.
- Truncates sections that are overly verbose.
- Aggregates sections to form a structured output table.
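
The three steps above can be illustrated with a small, self-contained sketch: split a note on its section headers, cap each section at a word budget, and collect the results as rows of a table. The header list, the function names, and the 50-word budget are assumptions made for this example and are not taken from `discharge_dataset.py`.

```python
import re

# Assumed subset of section headers; the real script may use a different list.
SECTION_HEADERS = [
    "Chief Complaint",
    "History of Present Illness",
    "Brief Hospital Course",
    "Discharge Instructions",
]


def segment(text, headers=SECTION_HEADERS):
    """Split a note into {header: body} using headers at the start of a line."""
    names = "|".join(re.escape(h) for h in headers)
    pattern = re.compile(r"^(%s):" % names, re.MULTILINE)
    matches = list(pattern.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # A section runs until the next header or the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1)] = text[match.end():end].strip()
    return sections


def truncate_words(text, max_words):
    """Keep at most max_words words; longer text is re-joined with spaces."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])


note = (
    "Chief Complaint:\nshortness of breath\n\n"
    "History of Present Illness:\nMs. ___ is a ___ year old woman with COPD.\n\n"
    "Brief Hospital Course:\nTreated with nebulizers and steroids.\n"
)

# One row per note, one column per section, ready for a DataFrame or CSV.
row = {name: truncate_words(body, 50) for name, body in segment(note).items()}
print(row)
```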

---

In [5]:
# The default input folder for this is /mnt/datadisk/mimic/discharge
# The output folder is /mnt/datadisk/mimic/discharge/dataset
%run discharge_dataset.py

Loading discharge CSV files...
Loading discharge CSV files...
Loading JSON dictionaries...
Loading JSON dictionaries...
100%|████████████████████████████████████| 68785/68785 [05:55<00:00, 193.70it/s]
100%|████████████████████████████████████| 14719/14719 [01:15<00:00, 195.72it/s]
100%|████████████████████████████████████| 14702/14702 [01:15<00:00, 195.91it/s]
100%|████████████████████████████████████| 10962/10962 [00:55<00:00, 196.41it/s]
Processing completed and files saved.
Processing completed and files saved.


In [1]:
!ls -l /mnt/datadisk/mimic/discharge/dataset

total 1687664
-rw-rw-r-- 1 ktpuser ktpuser  191602309 Mar 26 17:05 df_discharge_test_phase_1_segmented.csv
-rw-rw-r-- 1 ktpuser ktpuser  147736281 Mar 26 17:05 df_discharge_test_phase_2_segmented.csv
-rw-rw-r-- 1 ktpuser ktpuser 1087314911 Mar 26 17:04 df_discharge_train_segmented.csv
-rw-rw-r-- 1 ktpuser ktpuser  168658120 Mar 26 16:34 diagnoses_dict.json
-rw-rw-r-- 1 ktpuser ktpuser       7104 Mar 26 17:03 lengths_df_statistics.xlsx
-rw-rw-r-- 1 ktpuser ktpuser   64861400 Mar 26 16:34 patients_admissions_dict.json
-rw-rw-r-- 1 ktpuser ktpuser   54916991 Mar 26 16:34 transfer_summary_dict.json
-rw-rw-r-- 1 ktpuser ktpuser   13056095 Mar 26 16:34 transfer_total_duration_dict.json


### `generation.py`

Generates the missing sections of the discharge summaries using retrieval-augmented generation (RAG) and Llama 3, with Ollama serving the `llama3` model. Two prompt templates are curated for this purpose, and RAG retrieves a target section's word count, which is then used as the target word count of the generated output.
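
The idea of retrieving a word count can be sketched as follows: find the most similar training input, take the word count of its target section, and put that number into the prompt. This is a toy illustration under stated assumptions, not the repository's retriever. Jaccard token overlap stands in for whatever similarity measure `generation.py` uses, and the prompt wording and function names are hypothetical rather than the curated templates.

```python
def tokens(text):
    """Lower-cased set of whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_target_word_count(query, train_pairs):
    """train_pairs holds (input_text, target_section_text) tuples.

    Returns the word count of the target section that belongs to the
    training input most similar to the query.
    """
    if not train_pairs:
        raise ValueError("train_pairs must not be empty")
    query_tokens = tokens(query)
    best = max(train_pairs, key=lambda pair: jaccard(query_tokens, tokens(pair[0])))
    return len(best[1].split())


def build_prompt(context, section_name, word_count):
    """Hypothetical prompt wording that carries the retrieved length target."""
    return (
        f"Write the '{section_name}' section of a discharge summary "
        f"in about {word_count} words.\n\nPatient record:\n{context}"
    )


train_pairs = [
    ("copd exacerbation wheezing cough", "Treated with nebulizers and steroids."),
    ("hip fracture surgery", "Underwent repair."),
]
query = "cough and wheezing with copd"
target_len = retrieve_target_word_count(query, train_pairs)
print(build_prompt(query, "Brief Hospital Course", target_len))
```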

#### Parameters:
- `start_index`: The index of the first input to consider for generation.
- `end_index`: The total number of inputs to process for generation.
- `bhc`: Type of section to generate (`1` for Brief Hospital Course, `0` for Discharge Instructions).
- `dataset_dir`: Directory containing the segmented CSV files.
- `result_dir`: Directory to save the generated results.
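
A command-line interface with these parameters could look like the sketch below. The flags `--start_index`, `--end_index`, and `--bhc` match the notebook cells in this README, and the `bhc_0_3.json` / `di_0_3.json` naming matches the saved files shown in the output. The `--dataset_dir` and `--result_dir` flag spellings, all default values, and the helper names are assumptions rather than the script's confirmed interface.

```python
import argparse


def build_parser():
    """Argument parser mirroring the parameters listed above (sketch)."""
    parser = argparse.ArgumentParser(
        description="Generate missing discharge summary sections."
    )
    parser.add_argument("--start_index", type=int, default=0)
    parser.add_argument("--end_index", type=int, default=3)
    # 1 = Brief Hospital Course, 0 = Discharge Instructions
    parser.add_argument("--bhc", type=int, choices=[0, 1], default=1)
    # Default folders are the ones shown in the README output (assumed defaults).
    parser.add_argument(
        "--dataset_dir", default="/mnt/datadisk/mimic/discharge/dataset"
    )
    parser.add_argument(
        "--result_dir", default="/mnt/datadisk/mimic/discharge/result"
    )
    return parser


def result_path(args):
    """Output file name following the pattern seen in the saved results."""
    prefix = "bhc" if args.bhc == 1 else "di"
    return f"{args.result_dir}/{prefix}_{args.start_index}_{args.end_index}.json"


args = build_parser().parse_args(
    ["--start_index", "0", "--end_index", "3", "--bhc", "1"]
)
print(result_path(args))
```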

---

In [6]:
# Run for three patients only
%run generation.py --start_index 0 --end_index 3 --bhc 1

Start index: 0
End index: 3
Generating Brief Hospital Course


  df_discharge_train_segmented.fillna('', inplace=True)
  df_discharge_test_phase_2_segmented.fillna('', inplace=True)
Processing:   0%|                                                   | 0/3 [00:00<?, ?it/s]

Processing index 0


Processing:  33%|██████████████▎                            | 1/3 [00:17<00:35, 17.86s/it]

Data saved to file: /mnt/datadisk/mimic/discharge/result/bhc_0_3.json
Time taken to process index 0: 17.85502529144287 seconds
Processing index 1


Processing:  67%|████████████████████████████▋              | 2/3 [00:36<00:18, 18.15s/it]

Data saved to file: /mnt/datadisk/mimic/discharge/result/bhc_0_3.json
Time taken to process index 1: 18.350970029830933 seconds
Processing index 2


Processing: 100%|███████████████████████████████████████████| 3/3 [00:51<00:00, 17.18s/it]

Data saved to file: /mnt/datadisk/mimic/discharge/result/bhc_0_3.json
Time taken to process index 2: 15.334206342697144 seconds





In [13]:
# read /mnt/datadisk/mimic/discharge/result/bhc_0_3.json
import json
with open('/mnt/datadisk/mimic/discharge/result/bhc_0_3.json') as f:
    data = json.load(f)

In [14]:
data

{'0': [24962904,
  'Ms. ___ was a 76-year-old BLACK/AFRICAN AMERICAN female admitted to the emergency room at ___ with a chief complaint of shortness of breath, cough, and wheezing. She had a history of COPD on home O2, atrial fibrillation on apixaban, hypertension, CAD, and hyperlipidemia. \n# Active Issues \nMs. ___ presented with symptoms of acute exacerbation of COPD, including increased cough productive of red-tinged sputum, wheezing, and shortness of breath. Initial vital signs showed limited air movement with wheezing bilaterally. Laboratory results revealed normal white blood cell count, hemoglobin, and platelet count. Imaging studies, including a CXR, showed mild basilar atelectasis without definite focal consolidation. \n- She was initially treated with Duonebs and solumedrol 125mg IV in the emergency department. - Her oxygen flow was increased to 4L without significant improvement in symptoms. - She received tiotropium IH, theophylline, and Advair IH as prescribed at home. \

In [15]:
# Run for three patients only
%run generation.py --start_index 0 --end_index 3 --bhc 0

Start index: 0
End index: 3
Generating Discharge Instructions


  df_discharge_train_segmented.fillna('', inplace=True)
  df_discharge_test_phase_2_segmented.fillna('', inplace=True)
Processing:   0%|                                                   | 0/3 [00:00<?, ?it/s]

Processing index 0


Processing:  33%|██████████████▎                            | 1/3 [00:32<01:04, 32.06s/it]

Data saved to file: /mnt/datadisk/mimic/discharge/result/di_0_3.json
Time taken to process index 0: 32.05663800239563 seconds
Processing index 1


Processing:  67%|████████████████████████████▋              | 2/3 [00:50<00:24, 24.27s/it]

Data saved to file: /mnt/datadisk/mimic/discharge/result/di_0_3.json
Time taken to process index 1: 18.814332962036133 seconds
Processing index 2


Processing: 100%|███████████████████████████████████████████| 3/3 [01:03<00:00, 21.29s/it]

Data saved to file: /mnt/datadisk/mimic/discharge/result/di_0_3.json
Time taken to process index 2: 12.984682083129883 seconds





In [17]:
# read /mnt/datadisk/mimic/discharge/result/di_0_3.json
import json
with open('/mnt/datadisk/mimic/discharge/result/di_0_3.json') as f:
    data = json.load(f)

In [18]:
data

{'0': [24962904,
  "Dear Ms. ___, \n\nIt was a pleasure taking care of you at ___. \nYou were admitted to our hospital due to shortness of breath, cough, and wheezing for one day, which worsened over time. Your symptoms are consistent with severe COPD exacerbation, likely triggered by a decrease in your steroids. You also had an anxiety component. Our team evaluated you and recommended discharge to inpatient pulmonary rehabilitation program. \nDuring your hospital stay, we provided treatment with nebulizers, steroids, azithromycin, and oxygen therapy. Your wheezing, cough, and shortness of breath improved significantly, and you were able to return to baseline oxygen requirements. \nWhen you go home, please continue taking the medications as prescribed: ___. Take the Azithromycin for another 4 days until ___ and Prednisone for 5 days at a dose of 40mg daily, tapered by 5mg every 5 days. We also recommend finishing your current course of Duonebs q6h and Albuterol q2h prn. \nTo manage you