# Testing Baseline with Bootstrapping (Global Model)

Importing the functions needed from the `mtl_patients` module:

In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd

import sys
pathname = "../code/"
if pathname not in sys.path:
    sys.path.append(pathname)

from mtl_patients import run_mortality_prediction_task

2023-03-29 17:52:29.255112: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


## `run_mortality_prediction_task()`

Let's run the mortality prediction task for the *global* model without bootstrapping:

In [138]:
metrics_df_with_no_bootstrapping = run_mortality_prediction_task()

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Preparing the data
--------------------------------------------------------------------------------
    Loading data from MIMIC-Extract pipeline...
    Adding SAPS II score to static dataset...
    Adding mortality columns to static dataset...
    Discretizing X...
        X.shape: (2200954, 33), X.subject_id.nunique(): 34472
        X_discrete.shape: (2200954, 225), X_discrete.subject_id.nunique(): 34472
    Keep only X_discrete[X_discrete.hours_in < 24]...
        New X_discrete.shape: (808539, 223), new X_discrete.subject_id.nunique(): 34472
    Padding patients with less than 24 hours of data...
    Merging dataframes to create X_full...
    Mortality per careunit...
        MICU: 1138 out of 11403
        SICU: 409 out of 5187
        CCU: 344 out of 4907
        CSRU: 139 out of 6971
        TSICU: 291 out of 4245
    Final shape of X: (32713, 24, 232)
    Number of positive samples: 2321
    Done!
+

In [139]:
metrics_df_with_no_bootstrapping

| Cohort | AUC | PPV | Specificity |
|---|---|---|---|
| CCU | 0.8665 | 0.666667 | 0.993457 |
| CSRU | 0.889536 | 0.333333 | 0.997141 |
| MICU | 0.828538 | 0.661765 | 0.988731 |
| SICU | 0.853458 | 0.619048 | 0.991407 |
| TSICU | 0.866083 | 0.583333 | 0.993679 |
| Macro | 0.860823 | 0.572829 | 0.992883 |
| Micro | 0.86396 | 0.632 | 0.992433 |


In [140]:
metrics_df_with_no_bootstrapping.dtypes

AUC            float64
PPV            float64
Specificity    float64
dtype: object
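
The `Macro` row in the table above is consistent with an unweighted mean of the five care-unit rows. Here is a small check that rebuilds the per-cohort values by hand (copied from the table, not read from `metrics_df_with_no_bootstrapping`) and averages them. The `Micro` row cannot be reproduced this way, since it appears to be computed over all test samples together rather than from the per-cohort rows.

```python
import pandas as pd

# Per-cohort metrics copied by hand from the table above
cohort_metrics = pd.DataFrame(
    {
        "AUC": [0.8665, 0.889536, 0.828538, 0.853458, 0.866083],
        "PPV": [0.666667, 0.333333, 0.661765, 0.619048, 0.583333],
        "Specificity": [0.993457, 0.997141, 0.988731, 0.991407, 0.993679],
    },
    index=["CCU", "CSRU", "MICU", "SICU", "TSICU"],
)

# Macro average: every cohort counts equally, regardless of its size
macro = cohort_metrics.mean()
print(macro.round(6))
```

Because every cohort gets the same weight, the low PPV of the small CSRU cohort pulls the macro PPV well below the micro PPV.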

Let's run the mortality prediction task for the *global* model with bootstrapping:

In [192]:
metrics_df_with_bootstrapping = run_mortality_prediction_task(bootstrap=True)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Preparing the data
--------------------------------------------------------------------------------
    Loading data from MIMIC-Extract pipeline...
    Adding SAPS II score to static dataset...
    Adding mortality columns to static dataset...
    Discretizing X...
        X.shape: (2200954, 33), X.subject_id.nunique(): 34472
        X_discrete.shape: (2200954, 225), X_discrete.subject_id.nunique(): 34472
    Keep only X_discrete[X_discrete.hours_in < 24]...
        New X_discrete.shape: (808539, 223), new X_discrete.subject_id.nunique(): 34472
    Padding patients with less than 24 hours of data...
    Merging dataframes to create X_full...
    Mortality per careunit...
        MICU: 1138 out of 11403
        SICU: 409 out of 5187
        CCU: 344 out of 4907
        CSRU: 139 out of 6971
        TSICU: 291 out of 4245
    Final shape of X: (32713, 24, 232)
    Number of positive samples: 2321
    Done!
+

100%|██████████| 100/100 [00:12<00:00,  8.08it/s]

    Bootstrap prediction for task "CSRU"...
100%|██████████| 100/100 [00:14<00:00,  6.86it/s]

    Bootstrap prediction for task "MICU"...
100%|██████████| 100/100 [00:19<00:00,  5.02it/s]

    Bootstrap prediction for task "SICU"...
100%|██████████| 100/100 [00:12<00:00,  8.24it/s]

    Bootstrap prediction for task "TSICU"...
100%|██████████| 100/100 [00:11<00:00,  8.78it/s]

    Bootstrap prediction for task "all"...
100%|██████████| 100/100 [00:44<00:00,  2.26it/s]

    Done!
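
Each progress bar above covers 100 bootstrap samples for one task. The general idea can be sketched as follows: resample the test set with replacement, then recompute the metrics on each resample. This is only an illustration under stated assumptions. `bootstrap_metric_demo` is a hypothetical helper, not the `bootstrap_predict` used in this notebook, the labels and scores are synthetic, and it only computes PPV and specificity at a 0.5 threshold.

```python
import numpy as np

def bootstrap_metric_demo(y_true, y_scores, n_samples=100, seed=0):
    """Hypothetical helper: PPV and specificity on resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    ppv, specificity = [], []
    for _ in range(n_samples):
        # draw n indices with replacement: one bootstrap sample
        idx = rng.integers(0, n, size=n)
        yt = y_true[idx]
        yp = (y_scores[idx] > 0.5).astype(int)
        tp = np.sum((yp == 1) & (yt == 1))
        fp = np.sum((yp == 1) & (yt == 0))
        tn = np.sum((yp == 0) & (yt == 0))
        # fall back to 0.0 when a denominator is empty
        ppv.append(tp / (tp + fp) if (tp + fp) > 0 else 0.0)
        specificity.append(tn / (tn + fp) if (tn + fp) > 0 else 0.0)
    return np.array(ppv), np.array(specificity)

# synthetic labels and scores, for illustration only
demo_rng = np.random.default_rng(1)
y_true_demo = demo_rng.integers(0, 2, size=200)
y_scores_demo = np.clip(0.6 * y_true_demo + demo_rng.normal(0.2, 0.2, size=200), 0, 1)

ppv_samples, spec_samples = bootstrap_metric_demo(y_true_demo, y_scores_demo)
print(ppv_samples.mean(), spec_samples.mean())
```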





In [193]:
metrics_df_with_bootstrapping

| Cohort | Sample | AUC | PPV | Specificity |
|---|---|---|---|---|
| CCU | 1 | 0.857437 | 0.588235 | 0.992481 |
| CCU | 2 | 0.892022 | 0.375000 | 0.989669 |
| CCU | 3 | 0.869949 | 0.750000 | 0.994463 |
| CCU | 4 | 0.888846 | 0.578947 | 0.991605 |
| CCU | 5 | 0.835649 | 0.615385 | 0.994624 |
| ... | ... | ... | ... | ... |
| Micro | 97 | 0.873432 | 0.682540 | 0.993420 |
| Micro | 98 | 0.871268 | 0.685484 | 0.993584 |
| Micro | 99 | 0.849979 | 0.675439 | 0.993913 |
| Micro | 100 | 0.867830 | 0.642857 | 0.992597 |
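
A frame indexed by (`Cohort`, `Sample`) like the one above can be collapsed into a mean and a percentile interval per cohort. The sketch below shows one way to do that with `groupby`. The values are synthetic stand-ins (only the index layout mirrors the table), and the 2.5% and 97.5% percentiles are an assumed choice for the interval.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cohorts = ["CCU", "CSRU", "MICU", "SICU", "TSICU", "Micro"]
idx = pd.MultiIndex.from_product([cohorts, range(1, 101)], names=["Cohort", "Sample"])

# synthetic values standing in for the bootstrapped metrics
demo_df = pd.DataFrame(
    rng.uniform(0.8, 0.9, size=(len(idx), 3)),
    index=idx,
    columns=["AUC", "PPV", "Specificity"],
)

# one row per cohort: bootstrap mean and a 95% percentile interval for the AUC
grouped = demo_df.groupby(level="Cohort")["AUC"]
summary = pd.DataFrame(
    {
        "mean": grouped.mean(),
        "ci_low": grouped.quantile(0.025),
        "ci_high": grouped.quantile(0.975),
    }
)
print(summary)
```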


## `run_mortality_prediction_task()` step by step

### Imports needed

In [15]:
import os
import tensorflow as tf
import numpy as np
import pandas as pd
import random
from keras.callbacks import EarlyStopping
from keras.layers import Input, Dense, LSTM, RepeatVector
from keras.models import Model, Sequential
from keras.optimizers import Adam
from sklearn.metrics import roc_auc_score, precision_score, recall_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

In [4]:
from mtl_patients import set_global_determinism, prepare_data, stratified_split, create_single_task_learning_model

### Arguments

In [8]:
model_type='global'
cutoff_hours=24
gap_hours=12
save_to_folder='../data/'
cohort_criteria_to_select='careunits'
seed=0
cohort_unsupervised_filename='../data/unsupervised_clusters.npy'
lstm_layer_size=16
epochs=30
learning_rate=0.0001
use_cohort_inv_freq_weights=False
bootstrap=False
num_bootstrapped_samples=100
SEED=0

### Code common to all models

In [100]:
# setting the seeds to get reproducible results
# taken from https://stackoverflow.com/questions/36288235/how-to-get-stable-results-with-tensorflow-setting-random-seed
set_global_determinism(seed=seed)

# create folders to store models and results (no error if they already exist)
for folder in ['results', 'models']:
    os.makedirs(os.path.join(save_to_folder, folder), exist_ok=True)

X, Y, careunits, sapsii_quartile, subject_ids = prepare_data(cutoff_hours=cutoff_hours, gap_hours=gap_hours)
Y = Y.astype(int) # Y is originally a boolean

print('+' * 80, flush=True)
print('Running the Mortality Prediction Task', flush=True)
print('-' * 80, flush=True)

# fetch right cohort criteria
if cohort_criteria_to_select == 'careunits':
    cohort_criteria = careunits
elif cohort_criteria_to_select == 'sapsii_quartile':
    cohort_criteria = sapsii_quartile
elif cohort_criteria_to_select == 'unsupervised':
    cohort_criteria = np.load(cohort_unsupervised_filename)
else:
    # fail early instead of hitting a NameError further down
    raise ValueError(f"Unknown cohort criteria: {cohort_criteria_to_select!r}")

# Do train/validation/test split using `cohort_criteria` as the cohort classifier
print('    Splitting data into train/validation/test sets...', flush=True)
X_train, X_val, X_test, y_train, y_val, y_test, cohorts_train, cohorts_val, cohorts_test = \
    stratified_split(X, Y, cohort_criteria, train_val_random_seed=seed)

# one task by distinct cohort
tasks = np.unique(cohorts_train)

# calculate number of samples per cohort and its reciprocal
# (to be used in sample weight calculation)
print('    Calculating number of training samples in cohort...', flush=True)
task_weights = {}
for cohort in tasks:
    num_samples_in_cohort = int(np.sum(cohorts_train == cohort))
    print(f"        # of patients in cohort {cohort} is {num_samples_in_cohort}")
    task_weights[cohort] = len(X_train) / num_samples_in_cohort

sample_weight = None
if use_cohort_inv_freq_weights:
    # calculate sample weight as the cohort's inverse frequency corresponding to each sample
    sample_weight = np.array([task_weights[cohort] for cohort in cohorts_train])

model_filename = f"{save_to_folder}models/model_{cutoff_hours}+{gap_hours}_{cohort_criteria_to_select}"
filename_part_bootstrap = "bootstrap-ON" if bootstrap else "bootstrap-OFF"
results_filename = f'{save_to_folder}results/model_{cutoff_hours}+{gap_hours}_{cohort_criteria_to_select}_{filename_part_bootstrap}.h5'

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Preparing the data
--------------------------------------------------------------------------------
    Loading data from MIMIC-Extract pipeline...
    Adding SAPS II score to static dataset...
    Adding mortality columns to static dataset...
    Discretizing X...
        X.shape: (2200954, 33), X.subject_id.nunique(): 34472
        X_discrete.shape: (2200954, 225), X_discrete.subject_id.nunique(): 34472
    Keep only X_discrete[X_discrete.hours_in < 24]...
        New X_discrete.shape: (808539, 223), new X_discrete.subject_id.nunique(): 34472
    Padding patients with less than 24 hours of data...
    Merging dataframes to create X_full...
    Mortality per careunit...
        MICU: 1138 out of 11403
        SICU: 409 out of 5187
        CCU: 344 out of 4907
        CSRU: 139 out of 6971
        TSICU: 291 out of 4245
    Final shape of X: (32713, 24, 232)
    Number of positive samples: 2321
    Done!
+
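
The `task_weights` loop above gives each cohort a weight of `len(X_train) / num_samples_in_cohort`, and `sample_weight` then looks that weight up for every training sample. Here is the same idea on a toy array. The eight cohort labels are made up for illustration, and the vectorized `np.unique` form is just one alternative way to write it.

```python
import numpy as np

# hypothetical cohort labels for eight training samples
cohorts_demo = np.array(["MICU", "MICU", "MICU", "MICU", "SICU", "SICU", "CCU", "CSRU"])

# count samples per cohort and remember which cohort each sample belongs to
labels, inverse, counts = np.unique(cohorts_demo, return_inverse=True, return_counts=True)

# inverse frequency: total number of samples divided by the cohort size
weights_per_label = len(cohorts_demo) / counts
task_weights_demo = dict(zip(labels.tolist(), weights_per_label.tolist()))

# one weight per sample, in the original sample order
sample_weight_demo = weights_per_label[inverse]

print(task_weights_demo)
print(sample_weight_demo)
```

With these weights, every cohort contributes the same total weight to the loss (here 8.0 per cohort), so small cohorts are not drowned out by the large ones.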

### Global model common code

In [101]:
#-----------------------
# train the global model

print('    ' + '~' * 76)
print(f"    Training '{model_type}' model...")

model = create_single_task_learning_model(lstm_layer_size=lstm_layer_size, input_dims=X_train.shape[1:],
                                          output_dims=1, learning_rate=learning_rate)
print(model.summary())

early_stopping = EarlyStopping(monitor='val_loss', patience=4)

model.fit(X_train, y_train, epochs=epochs, batch_size=100, sample_weight=sample_weight,
          callbacks=[early_stopping], validation_data=(X_val, y_val))
model.save(model_filename)

print('    ' + '~' * 76)
print(f"    Predicting using '{model_type}' model...", flush=True)
y_scores = np.squeeze(model.predict(X_test))
y_pred = (y_scores > 0.5).astype("int32")

# calculate AUC, PPV, and Specificity for every cohort
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8156826/
# https://stackoverflow.com/questions/56253863/precision-recall-and-confusion-matrix-problems-in-sklearn
# https://stackoverflow.com/questions/33275461/specificity-in-scikit-learn
# PPV (Positive Predictive Value) is the same as precision
# Specificity is same as recall of the negative class... using that trick to get it in sklearn

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Training 'global' model...
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_3 (LSTM)               (None, 16)                15936     
                                                                 
 dense_3 (Dense)             (None, 1)                 17        
                                                                 
Total params: 15,953
Trainable params: 15,953
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
INFO:tensorflow:Assets wri
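
The comments in the cell above describe how to get PPV and specificity out of scikit-learn: PPV is the precision of the positive class, and specificity is the recall of the negative class. Here is a minimal sketch of that trick on a small hand-made example (the labels and predictions are invented for illustration). Passing `zero_division=0` is an assumed choice that returns 0 instead of warning when a sample contains no positive predictions.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# hand-made labels and thresholded predictions
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 0, 1, 0])

# confusion counts for reference: TP=2, FP=1, FN=1, TN=4
tp = np.sum((y_pred_demo == 1) & (y_true_demo == 1))
fp = np.sum((y_pred_demo == 1) & (y_true_demo == 0))
tn = np.sum((y_pred_demo == 0) & (y_true_demo == 0))

# PPV = TP / (TP + FP), i.e. precision of the positive class
ppv = precision_score(y_true_demo, y_pred_demo, zero_division=0)

# specificity = TN / (TN + FP), i.e. recall with the negative class as "positive"
specificity = recall_score(y_true_demo, y_pred_demo, pos_label=0)

print(ppv, specificity)
```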

In [115]:
# get `num_bootstrapped_samples` and calculate AUC, PPV, and specificity

lst_of_tasks = list(tasks)
lst_of_tasks.append('Micro')

idx = pd.MultiIndex.from_product([lst_of_tasks, list(np.arange(1, num_bootstrapped_samples + 1))], names=['Cohort', 'Sample'])
metrics_df = pd.DataFrame(index=idx, columns=['AUC', 'PPV', 'Specificity'])

for task in tasks:
    all_auc, all_ppv, all_specificity = bootstrap_predict(X_test, y_test, cohorts_test, task, model,
                                num_bootstrap_samples=num_bootstrapped_samples)
    metrics_df.loc[task, 'AUC'] = all_auc
    metrics_df.loc[task, 'PPV'] = all_ppv
    metrics_df.loc[task, 'Specificity'] = all_specificity

# calculate macro AUC
metrics_df.loc['Macro', :] = metrics_df.query("Cohort != 'Micro'").mean()

# calculate micro metrics (bootstrap over all test samples, task 'all')
all_auc, all_ppv, all_specificity = bootstrap_predict(X_test, y_test, cohorts_test, 'all', model,
                            num_bootstrap_samples=num_bootstrapped_samples)
metrics_df.loc['Micro', 'AUC'] = all_auc
metrics_df.loc['Micro', 'PPV'] = all_ppv
metrics_df.loc['Micro', 'Specificity'] = all_specificity

    Bootstrap prediction for task "CCU"...
100%|██████████| 100/100 [00:12<00:00,  8.27it/s]

    Bootstrap prediction for task "CSRU"...
  _warn_prf(average, modifier, msg_start, len(result))
100%|██████████| 100/100 [00:14<00:00,  6.68it/s]

    Bootstrap prediction for task "MICU"...
100%|██████████| 100/100 [00:20<00:00,  4.95it/s]

    Bootstrap prediction for task "SICU"...
100%|██████████| 100/100 [00:12<00:00,  8.25it/s]

    Bootstrap prediction for task "TSICU"...
100%|██████████| 100/100 [00:11<00:00,  8.81it/s]



ValueError: Must have equal len keys and value when setting with an iterable
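
The error appears right after the last per-cohort bootstrap and before the `"all"` run, which points at the `metrics_df.loc['Macro', :] = ...` assignment. A likely cause is that `idx` is built from the cohorts plus `'Micro'` only, so there is no `'Macro'` label to assign to, and `.mean()` returns a single three-value Series rather than one row per bootstrap sample. One possible fix, sketched below on synthetic data, is to give `Macro` its own per-sample rows, computed as the mean over cohorts for each sample. Whether the macro average should be taken per sample is an assumption here, and all names ending in `_demo` are illustrative.

```python
import numpy as np
import pandas as pd

n_samples_demo = 4  # small stand-in for num_bootstrapped_samples
tasks_demo = ["CCU", "CSRU"]
metric_names = ["AUC", "PPV", "Specificity"]
sample_index = pd.RangeIndex(1, n_samples_demo + 1, name="Sample")

# one frame of synthetic bootstrapped metrics per cohort, plus the micro run
rng = np.random.default_rng(0)
frames = {}
for task in tasks_demo + ["Micro"]:
    values = rng.uniform(0.5, 1.0, size=(n_samples_demo, len(metric_names)))
    frames[task] = pd.DataFrame(values, index=sample_index, columns=metric_names)

# macro: for each bootstrap sample, average the metrics over the cohorts
macro = frames[tasks_demo[0]].copy()
for task in tasks_demo[1:]:
    macro = macro + frames[task]
frames["Macro"] = macro / len(tasks_demo)

# stack everything into one frame indexed by (Cohort, Sample)
metrics_demo = pd.concat(frames, names=["Cohort", "Sample"])
print(metrics_demo)
```

Building the per-cohort frames first and concatenating them at the end also avoids assigning into a pre-allocated frame of `object` dtype.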