# Reproducibility Project: Readmission Prediction via Deep Contextual Embedding of Clinical Concepts

## Reproducibility summary

This project reproduces CONTENT, a deep learning model that predicts hospital readmissions using interpretable patient representations learned from Electronic Health Records (EHR). Because the original source code is incomplete and relies on outdated Python libraries, we reimplemented the model independently in PyTorch. We compare our CONTENT implementation against an RNN with Gated Recurrent Units (GRU) and against the results reported in the original paper.

The CONTENT model, which combines topic modeling with a Recurrent Neural Network (RNN), is reported in the original paper to outperform the GRU model. Our CONTENT implementation likewise outperformed our RNN with GRU baseline consistently. The LSTM variant of CONTENT had lower accuracy and a longer training time than the GRU-based CONTENT model, suggesting that GRU is better suited to this dataset. Our CONTENT implementation also scores higher than the results reported in the original paper: accuracy is 11.82 percentage points higher, and ROC-AUC and PR-AUC are 16.1481% and 8.1496% higher in relative terms.
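
The improvement figures can be checked against the mean scores in the results table. The sketch below recomputes them under the assumption that the rounded table means are the inputs; the helper names are illustrative. With these inputs the accuracy figure matches an absolute difference in percentage points, while the ROC-AUC and PR-AUC figures are close to relative gains (small deviations presumably come from unrounded run averages).

```python
# Mean scores copied from the results table (own PyTorch CONTENT vs. paper).
ours = {"ACC": 0.8352, "ROC-AUC": 0.7998, "PR-AUC": 0.6501}
paper = {"ACC": 0.7170, "ROC-AUC": 0.6886, "PR-AUC": 0.6011}


def absolute_gain(metric):
    """Difference between the two scores, in percentage points."""
    return (ours[metric] - paper[metric]) * 100


def relative_gain(metric):
    """Difference expressed as a percentage of the paper's score."""
    return (ours[metric] - paper[metric]) / paper[metric] * 100


for metric in ours:
    print(f"{metric}: +{absolute_gain(metric):.2f} points, "
          f"+{relative_gain(metric):.2f}% relative")
```

Stating which kind of gain is meant avoids reading a relative improvement as an absolute one.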

The performance discrepancies could be due to differences in hyperparameters and in how we implemented the missing functions and files. Because the original source code is incomplete, it is difficult to identify the exact source of the performance gap between our PyTorch implementation and the original Lasagne-based model.

In [1]:
import pandas as pd
import os
import time
import DataProcessing as dp
from Hyperpara_config import Config
from PatientDataLoader import Data_Loader
import PyTorch_CONTENT

RAW_DATA_PATH = "./S1_File.txt"
RAW_DATA_SORTED_PATH = "./raw_data_sorted.txt"

In [2]:
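def get_data(isSorted = False):
    # Read the tab-separated raw events file (the first row is the header)
    data = pd.read_csv(RAW_DATA_PATH, sep="\t", header=0)
    # NOTE: `isSorted` is not used here and no sorting is applied;
    # the rows are written to RAW_DATA_SORTED_PATH in their original order
    data.to_csv(RAW_DATA_SORTED_PATH, sep="\t", index=False)
    return data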

data = get_data()

## Visualization of data statistics

In [3]:
print("First 5 rows of raw data:")
print(data.head())
print()

columns = list(data.columns)
print("Column names:", ", ".join(columns))
print(f"Number of rows (events): {data.shape[0]}")
print(f"Number of columns: {data.shape[1]}")
print(f"Number of patients: {data['PID'].nunique()}")
print(f"Number of distinct DX group descriptions: {data['DX_GROUP_DESCRIPTION'].nunique()}")
# Count each distinct (PID, DAY_ID) pair among inpatient events as one admission
inpatient_events = data[data['SERVICE_LOCATION'] == "INPATIENT HOSPITAL"]
num_admission = len(inpatient_events.groupby(['PID', 'DAY_ID']))
print(f"Number of admissions: {num_admission}")
print(f"Average number of events per patient: {data.shape[0]/data['PID'].nunique()}")
# num_combinations = len(data.groupby(['PID', 'DAY_ID', 'SERVICE_LOCATION']))
# print("Number of unique combinations:", num_combinations)

First 5 rows of raw data:
   PID  DAY_ID                               DX_GROUP_DESCRIPTION  \
0    1   73888                                    ANGINA PECTORIS   
1    1   73888  MONONEURITIS OF UPPER LIMB AND MONONEURITIS MU...   
2    1   73888  SYMPTOMS INVOLVING RESPIRATORY SYSTEM AND OTHE...   
3    1   73880                                 ACUTE APPENDICITIS   
4    1   73880                                  DIABETES MELLITUS   

     SERVICE_LOCATION  OP_DATE  
0      DOCTORS OFFICE    74084  
1      DOCTORS OFFICE    74084  
2      DOCTORS OFFICE    74084  
3  INPATIENT HOSPITAL    74084  
4  INPATIENT HOSPITAL    74084  

Column names: PID, DAY_ID, DX_GROUP_DESCRIPTION, SERVICE_LOCATION, OP_DATE
Number of rows (events): 685482
Number of columns: 5
Number of patients: 3000
Number of distinct DX group descriptions: 1412
Number of admissions: 30742
Average number of events per patient: 228.494
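
The statistics above count one admission per distinct (PID, DAY_ID) pair among events whose SERVICE_LOCATION is "INPATIENT HOSPITAL". As a minimal illustration of that counting rule, the sketch below applies it to a handful of made-up rows without pandas; the rows and the `summarize` helper are hypothetical and not part of the project code.

```python
# Toy rows in the same five-column layout as the raw file; all values are made up.
rows = [
    # (PID, DAY_ID, DX_GROUP_DESCRIPTION, SERVICE_LOCATION, OP_DATE)
    (1, 100, "ANGINA PECTORIS", "DOCTORS OFFICE", 500),
    (1, 105, "ACUTE APPENDICITIS", "INPATIENT HOSPITAL", 500),
    (1, 105, "DIABETES MELLITUS", "INPATIENT HOSPITAL", 500),
    (1, 140, "DIABETES MELLITUS", "INPATIENT HOSPITAL", 500),
    (2, 105, "ANGINA PECTORIS", "INPATIENT HOSPITAL", 600),
]


def summarize(rows):
    """Compute the same summary counts as the notebook cell, on plain tuples."""
    patients = {pid for pid, *_ in rows}
    # Several inpatient events on the same day for one patient form one admission.
    admissions = {
        (pid, day)
        for pid, day, _, location, _ in rows
        if location == "INPATIENT HOSPITAL"
    }
    return {
        "events": len(rows),
        "patients": len(patients),
        "admissions": len(admissions),
        "events_per_patient": len(rows) / len(patients),
    }


print(summarize(rows))
```

Patient 1 has two inpatient events on day 105, which collapse into a single admission, so the toy data yields three admissions from four inpatient events.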


## Methodology
In this reproducibility study, we follow the evaluation approach of the original paper to assess the CONTENT model. Because the original source code is incomplete and depends on outdated Python libraries, we implemented CONTENT independently from the model description in the paper. We then compare our implementation against the results reported in the original paper, and against an RNN with GRU baseline, to check the paper's claim that CONTENT outperforms the GRU model.
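
The comparison rests on three scores: ROC-AUC, PR-AUC, and accuracy. As a rough illustration of what each one measures, here is a dependency-free sketch on toy data. It is not the evaluation code used in this project; the function names, the 0.5 decision threshold, and the step-wise (average precision) reading of PR-AUC are assumptions made for the example.

```python
def roc_auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / sum(labels)


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [int(s >= threshold) for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy example: 1 = readmitted, 0 = not readmitted; scores are made up.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores), average_precision(labels, scores), accuracy(labels, scores))
```

ROC-AUC and PR-AUC depend only on how the scores rank the patients, whereas accuracy also depends on the chosen threshold, which is one reason the three scores can move differently between models.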

The following code block processes the data, then trains and tests our CONTENT implementation as an example. In the full experiments, each model was run 10 times to obtain the reported results; here the CONTENT model is run once for demonstration.

In [4]:
def DataProcessing(isSorted = False):
    print("------------Loading and processing data------------")
    start_time = time.time()
    print("Loading and sorting raw data......")
    data = dp.get_data(isSorted)  # load the raw data
    print("Creating Stop and Vocab files......")
    # Writes vocab.txt and stop.txt from the raw data;
    # vocab.txt holds the descriptions that appear more than x times
    dp.stop_vocab_generation(data)
    print("Mapping vocab to index......")
    word_index_dict = dp.load_vocab_index_dict()  # saves vocab.pkl; the result is a dict
    print("Extracting in patient events......")
    eventsDF = dp.extract_inpatient_events()  # DataFrame of all inpatient events
    print("Processing data to sequence and labels......")
    seq, labels = dp.seq_label_gen(word_index_dict, eventsDF)  # sequences and labels for training
    print("Splitting data into training, testing and validation sets......")
    dp.splits(seq, labels)  # split into training, validation and testing sets
    print("Total data processing time: {:.3f}s".format(time.time() - start_time))
    print("------------Complete loading and processing data------------")

DataProcessing()
FLAGS = Config()
data_set = Data_Loader(FLAGS)
iterator = data_set.iterator()
isRNN = False
isCONTENT = True
isLSTM = False
PyTorch_CONTENT.run(data_set, FLAGS, isCONTENT, isRNN, isLSTM)

------------Loading and processing data------------
Loading and sorting raw data......
Creating Stop and Vocab files......
Mapping vocab to index......
Extracting in patient events......
Processing data to sequence and labels......
Splitting data into training, testing and validation sets......
Total data processing time: 169.391s
------------Complete loading and processing data------------
------CONTENT model------
Training...


Epoch 1 		 Training Loss: 84.53390168952942
Epoch 1 		 Validation Loss: 84.16148558934529
Validation Loss Decreased(inf--->50496.891354) 	 Saving The Model
Epoch 1 of 6 took 452.805s
Epoch 2 		 Training Loss: 82.91657370185852
Epoch 2 		 Validation Loss: 82.98400868733724
Validation Loss Decreased(50496.891354--->49790.405212) 	 Saving The Model
Epoch 2 of 6 took 465.414s
Epoch 3 		 Training Loss: 82.47320268440247
Epoch 3 		 Validation Loss: 83.29619951883952
Epoch 3 of 6 took 464.233s
Epoch 4 		 Training Loss: 82.01510676765442
Epoch 4 		 Validation Loss: 82.8860942586263
Validation Loss Decreased(49790.405212--->49731.656555) 	 Saving The Model
Epoch 4 of 6 took 455.043s
Epoch 5 		 Training Loss: 81.7331255569458
Epoch 5 		 Validation Loss: 83.05654661814371
Epoch 5 of 6 took 451.772s
Epoch 6 		 Training Loss: 81.27637366485595
Epoch 6 		 Validation Loss: 83.02824993769327
Epoch 6 of 6 took 497.585s
Total time to train: 2787.6416029930115
Testing...
Test roc_auc:		0.801897
Test pr_a
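
The log above saves the model only when the validation loss improves on the best value seen so far. A minimal sketch of that rule, replayed on the per-epoch validation losses from the log (rounded to four decimals), is shown below. The function name is hypothetical, and the actual training loop in `PyTorch_CONTENT` may differ; note that the values printed inside the "Validation Loss Decreased" messages are on a different scale, but the ordering of epochs is the same.

```python
import math

# Per-epoch validation losses copied from the training log, rounded to 4 decimals.
val_losses = [84.1615, 82.9840, 83.2962, 82.8861, 83.0565, 83.0282]


def epochs_that_checkpoint(losses):
    """Return the 1-based epochs at which a best-so-far checkpoint is written."""
    best = math.inf
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            # A real training loop would persist the model state here,
            # for example with torch.save(...).
            saved.append(epoch)
    return saved


print(epochs_that_checkpoint(val_losses))  # [1, 2, 4]
```

This reproduces the pattern in the log, where the model is saved after epochs 1, 2, and 4, so the checkpoint used for testing would be the one from epoch 4 under this rule.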

Results over 10 runs of each model:

| Model | ROC-AUC | PR-AUC | ACC |
|-------|---------|--------|-----|
| CONTENT (PyTorch, own implementation) | 0.7998±0.0014 | 0.6501±0.0018 | 0.8352±0.0001 |
| CONTENT (reported in paper) | 0.6886±0.0074 | 0.6011±0.0191 | 0.7170±0.0069 |
| GRU (reported in paper) | 0.6881±0.0048 | 0.5929±0.0100 | 0.7141±0.0040 |
| GRU (own implementation) | 0.7937±0.0003 | 0.6445±0.0012 | 0.8318±0.0017 |
| CONTENT w/ LSTM (own implementation) | 0.7937±0.0024 | 0.6440±0.0003 | 0.8320±0.0010 |
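
Each table entry summarizes 10 runs in the form `value±spread`. The sketch below shows one way such an entry could be produced, assuming the first number is the mean and the second is the sample standard deviation across runs (the source does not state which spread measure was used). The per-run scores are placeholders, not results from this project.

```python
from statistics import mean, stdev

# Hypothetical per-run ROC-AUC scores; placeholders, not project results.
runs = [0.80, 0.79, 0.81, 0.80, 0.80, 0.79, 0.81, 0.80, 0.80, 0.80]


def summarize_runs(scores):
    """Format repeated-run scores as 'mean±std' with four decimals."""
    return f"{mean(scores):.4f}±{stdev(scores):.4f}"


print(summarize_runs(runs))  # 0.8000±0.0067
```

Reporting the spread alongside the mean makes it easier to judge whether small gaps, such as those between the GRU and LSTM variants, exceed run-to-run variation.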

## References
Cao Xiao, Tengfei Ma, Adji B. Dieng, David M. Blei, and Fei Wang. 2018. [Readmission prediction via deep contextual embedding of clinical concepts](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0195024). PLOS ONE, 13:e0195024.

Cao Xiao, Tengfei Ma, Adji B. Dieng, David M. Blei, and Fei Wang. 2018. CONTENT (Version 1.0.0) [Computer software]. https://doi.org/10.1371/journal.pone.0195024