# Testing Baseline (Multitask Learning Model)

Importing the functions needed from the `mtl_patients` module:

In [1]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd

import sys
pathname = "../code/"
if pathname not in sys.path:
    sys.path.append(pathname)

from mtl_patients import get_summaries, run_mortality_prediction_task



Run the summaries. By default (no arguments), only the first 24 hours of each patient's data are used, followed by a 12-hour gap before the window in which mortality is predicted.

In [2]:
pat_summ_by_cu_df, pat_summ_by_sapsiiq_df, vitals_labs_summ_df = get_summaries()

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Creating summaries
--------------------------------------------------------------------------------
    Loading data from MIMIC-Extract pipeline...
    Adding SAPS II score to static dataset...
    Adding mortality columns to static dataset...
    Merging dataframes to create X_full...
    Creating summary by careunit...
    Creating summary by SAPS II score quartile...
    Creating summary by vitals/labs...
    Done!


In [3]:
pat_summ_by_cu_df

| Careunit | N | n | Class Imbalance | Age (Mean) | Gender (Male) |
|---|---|---|---|---|---|
| CCU | 4907 | 344 | 0.07 | 82.56 | 0.58 |
| CSRU | 6971 | 139 | 0.02 | 69.49 | 0.67 |
| MICU | 11403 | 1138 | 0.1 | 77.97 | 0.51 |
| SICU | 5187 | 409 | 0.079 | 72.65 | 0.52 |
| TSICU | 4245 | 291 | 0.069 | 67.2 | 0.61 |
| Overall | 32713 | 2321 | 0.071 | 74.61 | 0.57 |
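The columns are not described in the output. Reading N as the number of patients in the careunit, n as the number of patients who died (these match the "Mortality per careunit" log printed by `prepare_data()`), and Class Imbalance as n / N is consistent with every row, e.g. 2321 / 32713 ≈ 0.071. A small check under that assumption, recomputing the column from the N and n values above (this is not the code of `get_summaries()`):

```python
import pandas as pd

# N (patients) and n (deaths) per careunit, copied from the table above
per_unit = pd.DataFrame(
    {"N": [4907, 6971, 11403, 5187, 4245], "n": [344, 139, 1138, 409, 291]},
    index=["CCU", "CSRU", "MICU", "SICU", "TSICU"],
)
# the "Overall" row is the column-wise sum over careunits
overall = per_unit.sum().to_frame("Overall").T
counts = pd.concat([per_unit, overall])

# assumption: "Class Imbalance" is the fraction of positive (deceased) patients
counts["Class Imbalance"] = (counts["n"] / counts["N"]).round(3)
print(counts)
```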


In [4]:
pat_summ_by_sapsiiq_df

| SAPS II Quartile | N | n | Class Imbalance | Age (Mean) | Gender (Male) | SAPS II (Min) | SAPS II (Mean) | SAPS II (Max) |
|---|---|---|---|---|---|---|---|---|
| 0 | 7099 | 62 | 0.009 | 45.69 | 0.61 | 0 | 16.61 | 22 |
| 1 | 10033 | 259 | 0.026 | 68.94 | 0.58 | 23 | 27.74 | 32 |
| 2 | 8127 | 552 | 0.068 | 86.52 | 0.55 | 33 | 36.72 | 41 |
| 3 | 7454 | 1448 | 0.194 | 96.8 | 0.54 | 42 | 51.42 | 118 |
| Overall | 32713 | 2321 | 0.071 | 74.61 | 0.57 | 0 | 32.95 | 118 |
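The min/max columns show that the quartiles cover disjoint integer score ranges: 0 to 22, 23 to 32, 33 to 41 and 42 to 118. Ties in the integer scores would explain why the quartiles do not hold equal numbers of patients. A sketch that maps scores to these quartile labels with `np.digitize`, assuming the cut points read off the table (`get_summaries()` may derive the quartiles differently, e.g. from the score distribution):

```python
import numpy as np

# Lower bounds of quartiles 1, 2 and 3, read off the "SAPS II (Min)" column above.
# Assumption: fixed cut points; the module may compute them from the data instead.
QUARTILE_LOWER_BOUNDS = [23, 33, 42]

def sapsii_quartile(scores):
    """Map integer SAPS II scores to quartile labels 0-3."""
    # with increasing bins, np.digitize returns i such that bins[i-1] <= x < bins[i]
    return np.digitize(np.asarray(scores), bins=QUARTILE_LOWER_BOUNDS)

print(sapsii_quartile([0, 22, 23, 32, 33, 41, 42, 118]))  # [0 0 1 1 2 2 3 3]
```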


In [5]:
vitals_labs_summ_df

| variable | min | avg | max | std | N | pres. |
|---|---|---|---|---|---|---|
| anion gap | 5.0 | 13.62 | 50.0 | 3.84 | 178506 | 0.0832 |
| bicarbonate | 0.0 | 24.32 | 53.0 | 4.68 | 187223 | 0.0873 |
| blood urea nitrogen | 0.0 | 26.07 | 250.0 | 21.63 | 189120 | 0.0882 |
| chloride | 50.0 | 105.19 | 175.0 | 6.26 | 205674 | 0.0959 |
| creatinine | 0.1 | 1.39 | 46.6 | 1.48 | 189944 | 0.0886 |
| diastolic blood pressure | 0.0 | 60.95 | 307.0 | 14.08 | 1866709 | 0.8703 |
| fraction inspired oxygen | 0.21 | 0.53 | 1.0 | 0.19 | 95643 | 0.0446 |
| glascow coma scale total | 3.0 | 12.59 | 15.0 | 3.5 | 367332 | 0.1713 |
| glucose | 33.0 | 140.03 | 1591.0 | 56.29 | 502487 | 0.2343 |
| heart rate | 0.0 | 84.88 | 300.0 | 17.13 | 1927016 | 0.8985 |
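Each row aggregates all recorded values of one vital sign or lab test. A sketch of how the min, avg, max, std and N columns could be produced from long-format measurements with a pandas `groupby`, assuming a hypothetical frame with `variable` and `value` columns (the actual MIMIC-Extract layout differs). The `pres.` column is left out because its denominator is not stated here, and `std` below is the pandas default sample standard deviation (ddof=1), which may not match the module:

```python
import pandas as pd

# Hypothetical long-format measurements: one row per recorded value
measurements = pd.DataFrame({
    "variable": ["heart rate", "heart rate", "heart rate", "glucose", "glucose"],
    "value": [80.0, 90.0, 100.0, 120.0, 160.0],
})

summary = (
    measurements.groupby("variable")["value"]
    .agg(["min", "mean", "max", "std", "count"])  # std uses ddof=1
    .rename(columns={"mean": "avg", "count": "N"})
    .round(2)
)
print(summary)
```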


Run the mortality prediction task using the global model. As with the summaries, the default (no arguments) uses the first 24 hours of each patient's data, followed by a 12-hour gap before the window in which mortality is predicted.

In [5]:
pd.options.display.max_rows = 9999
metrics_df = run_mortality_prediction_task()

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Preparing the data
--------------------------------------------------------------------------------
    Loading data from MIMIC-Extract pipeline...
    Adding SAPS II score to static dataset...
    Adding mortality columns to static dataset...
    Discretizing X...
        X.shape: (2200954, 33), X.subject_id.nunique(): 34472
        X_discrete.shape: (2200954, 225), X_discrete.subject_id.nunique(): 34472
    Keep only X_discrete[X_discrete.hours_in < 24]...
        New X_discrete.shape: (808539, 223), new X_discrete.subject_id.nunique(): 34472
    Padding patients with less than 24 hours of data...
    Merging dataframes to create X_full...
    Mortality per careunit...
        MICU: 1138 out of 11403
        SICU: 409 out of 5187
        CCU: 344 out of 4907
        CSRU: 139 out of 6971
        TSICU: 291 out of 4245
    Final shape of X: (32713, 24, 232)
    Number of positive samples: 2321
    Done!
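The log shows two reshaping steps: rows at or after hour 24 are dropped, and patients with fewer than 24 hours of data are padded, so X ends up with shape (patients, 24, features). A minimal sketch of those two steps on toy data, assuming one row per patient-hour with `subject_id` and `hours_in` columns and zero padding; the padding value and the helper name `keep_and_pad` are assumptions, not taken from `prepare_data()`:

```python
import numpy as np
import pandas as pd

def keep_and_pad(df, cutoff_hours=24, id_col="subject_id", hour_col="hours_in"):
    """Hypothetical helper: keep rows with hour < cutoff_hours and zero-pad every
    patient to cutoff_hours time steps.

    Returns the sorted patient ids and an array of shape
    (n_patients, cutoff_hours, n_features).
    """
    df = df[df[hour_col] < cutoff_hours]
    feature_cols = [c for c in df.columns if c not in (id_col, hour_col)]
    ids = np.sort(df[id_col].unique())
    row_of = {pid: i for i, pid in enumerate(ids)}
    # start from zeros, so hours without data stay zero-padded
    out = np.zeros((len(ids), cutoff_hours, len(feature_cols)), dtype=np.float32)
    rows = df[id_col].map(row_of).to_numpy()
    hours = df[hour_col].to_numpy().astype(int)
    out[rows, hours, :] = df[feature_cols].to_numpy(dtype=np.float32)
    return ids, out

# Toy data: patient 1 has 30 hours of data, patient 2 only 5
toy = pd.DataFrame({
    "subject_id": [1] * 30 + [2] * 5,
    "hours_in": list(range(30)) + list(range(5)),
    "heart_rate": np.arange(35, dtype=float),
})
ids, X_3d = keep_and_pad(toy)
print(X_3d.shape)  # (2, 24, 1)
```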

First run.

In [4]:
metrics_df

| Cohort | AUC |
|---|---|
| CCU | 0.8665 |
| CSRU | 0.889536 |
| MICU | 0.828538 |
| SICU | 0.853458 |
| TSICU | 0.866083 |
| Macro | 0.860823 |
| Micro | 0.86396 |


Second run.

In [6]:
metrics_df

| Cohort | AUC |
|---|---|
| CCU | 0.8665 |
| CSRU | 0.889536 |
| MICU | 0.828538 |
| SICU | 0.853458 |
| TSICU | 0.866083 |
| Macro | 0.860823 |
| Micro | 0.86396 |


Both runs produce identical AUCs, so the results are reproducible.
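The runs match because every random number generator is seeded before training: `run_mortality_prediction_task()` calls `set_global_determinism(seed=seed)` for this. A sketch of the seeding part, modelled on the idea of the Stack Overflow answer cited in the code and not the module's own function; full determinism on some hardware may additionally need deterministic TensorFlow ops (e.g. `tf.config.experimental.enable_op_determinism()` in recent TensorFlow versions):

```python
import os
import random

import numpy as np

def set_seeds(seed: int = 0) -> None:
    """Seed Python, NumPy and, if installed, TensorFlow random number generators."""
    # PYTHONHASHSEED only affects Python processes started after this point
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return
    tf.random.set_seed(seed)

def toy_run(seed: int = 0) -> np.ndarray:
    """Stand-in for a training run: any computation driven by the seeded generators."""
    set_seeds(seed)
    return np.random.permutation(10) + random.random()

first, second = toy_run(0), toy_run(0)
print(np.array_equal(first, second))  # True
```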

## `run_mortality_prediction_task()` step by step

Imports:

In [7]:
import os
import tensorflow as tf
import numpy as np
import pandas as pd
import random
from keras.callbacks import EarlyStopping
from keras.layers import Input, Dense, LSTM, RepeatVector
from keras.models import Model, Sequential
from keras.optimizers import Adam
from sklearn.metrics import roc_auc_score, precision_score, recall_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from tqdm.autonotebook import tqdm

Arguments for `run_mortality_prediction_task()`:

In [11]:
model_type='multitask'
cutoff_hours=24
gap_hours=12
save_to_folder='../data/'
cohort_criteria_to_select='careunits'
seed=0
cohort_unsupervised_filename='../data/unsupervised_clusters.npy'
lstm_layer_size=16
epochs=30
learning_rate=0.0001
use_cohort_inv_freq_weights=False
bootstrap=False
num_bootstrapped_samples=100

Imports for local functions needed by `run_mortality_prediction_task()`:

In [9]:
from mtl_patients import set_global_determinism, prepare_data, stratified_split
from mtl_patients import create_single_task_learning_model, create_multitask_learning_model
from mtl_patients import bootstrap_predict
from mtl_patients import get_mtl_sample_weights, get_correct_task_mtl_outputs

Code in `run_mortality_prediction_task()` common to all models:

In [13]:
# setting the seeds to get reproducible results
# taken from https://stackoverflow.com/questions/36288235/how-to-get-stable-results-with-tensorflow-setting-random-seed
set_global_determinism(seed=seed)

# create folders to store models and results
for folder in ['results', 'models']:
    if not os.path.exists(os.path.join(save_to_folder, folder)):
        os.makedirs(os.path.join(save_to_folder, folder))

X, Y, careunits, sapsii_quartile, subject_ids = prepare_data(cutoff_hours=cutoff_hours, gap_hours=gap_hours)
Y = Y.astype(int) # Y is originally a boolean

print('+' * 80, flush=True)
print('Running the Mortality Prediction Task', flush=True)
print('-' * 80, flush=True)

# fetch the requested cohort criteria
if cohort_criteria_to_select == 'careunits':
    cohort_criteria = careunits
elif cohort_criteria_to_select == 'sapsii_quartile':
    cohort_criteria = sapsii_quartile
elif cohort_criteria_to_select == 'unsupervised':
    cohort_criteria = np.load(cohort_unsupervised_filename)
else:
    raise ValueError(f"Unknown cohort_criteria_to_select: {cohort_criteria_to_select!r}")

# Do train/validation/test split using `cohort_criteria` as the cohort classifier
print('    Splitting data into train/validation/test sets...', flush=True)
X_train, X_val, X_test, y_train, y_val, y_test, cohorts_train, cohorts_val, cohorts_test = \
    stratified_split(X, Y, cohort_criteria, train_val_random_seed=seed)

# one task by distinct cohort
tasks = np.unique(cohorts_train)

# count training samples per cohort and store the cohort's inverse frequency,
# len(X_train) / cohort size (used below for the optional sample weights)
print('    Calculating number of training samples in cohort...', flush=True)
task_weights = {}
for cohort in tasks:
    num_samples_in_cohort = np.sum(cohorts_train == cohort)
    print(f"        # of patients in cohort {cohort} is {num_samples_in_cohort}")
    task_weights[cohort] = len(X_train) / num_samples_in_cohort

sample_weight = None
if use_cohort_inv_freq_weights:
    # calculate sample weight as the cohort's inverse frequency corresponding to each sample
    sample_weight = np.array([task_weights[cohort] for cohort in cohorts_train])

model_filename = f"{save_to_folder}models/model_{model_type}_{cutoff_hours}+{gap_hours}_{cohort_criteria_to_select}"
filename_part_bootstrap = "bootstrap-ON" if bootstrap else "bootstrap-OFF"
results_filename = f'{save_to_folder}results/model_{model_type}_{cutoff_hours}+{gap_hours}'
results_filename = results_filename + f'_{cohort_criteria_to_select}_{filename_part_bootstrap}.h5'

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Preparing the data
--------------------------------------------------------------------------------
    Loading data from MIMIC-Extract pipeline...
    Adding SAPS II score to static dataset...
    Adding mortality columns to static dataset...
    Discretizing X...
        X.shape: (2200954, 33), X.subject_id.nunique(): 34472
        X_discrete.shape: (2200954, 225), X_discrete.subject_id.nunique(): 34472
    Keep only X_discrete[X_discrete.hours_in < 24]...
        New X_discrete.shape: (808539, 223), new X_discrete.subject_id.nunique(): 34472
    Padding patients with less than 24 hours of data...
    Merging dataframes to create X_full...
    Mortality per careunit...
        MICU: 1138 out of 11403
        SICU: 409 out of 5187
        CCU: 344 out of 4907
        CSRU: 139 out of 6971
        TSICU: 291 out of 4245
    Final shape of X: (32713, 24, 232)
    Number of positive samples: 2321
    Done!
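`task_weights` stores each cohort's inverse frequency in the training set, len(X_train) divided by the number of training patients in that cohort. With `use_cohort_inv_freq_weights=True`, every patient gets the weight of its cohort, so patients from small careunits count more. A toy illustration of the same formula with made-up cohort labels:

```python
import numpy as np

# Hypothetical cohort labels for six training patients
cohorts_train = np.array(["MICU", "MICU", "MICU", "CCU", "CCU", "CSRU"])

tasks = np.unique(cohorts_train)
# total training patients / patients in the cohort
task_weights = {str(c): len(cohorts_train) / float(np.sum(cohorts_train == c)) for c in tasks}
sample_weight = np.array([task_weights[c] for c in cohorts_train])

print(task_weights)   # {'CCU': 3.0, 'CSRU': 6.0, 'MICU': 2.0}
print(sample_weight)  # [2. 2. 2. 3. 3. 6.]
```

With these weights each cohort contributes the same total weight (3 × 2 = 2 × 3 = 1 × 6 = 6), regardless of its size.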

Code specific to the multitask learning model:

In [14]:
model_filename

'../data/models/model_multitask_24+12_careunits'

In [21]:
#--------------------------
# train the multitask model

print('    ' + '~' * 76)
print(f"    Training '{model_type}' model...")

num_tasks = len(tasks)
cohort_to_index = dict(zip(tasks, range(num_tasks)))
model = create_multitask_learning_model(lstm_layer_size=lstm_layer_size, input_dims=X_train.shape[1:],
                                        output_dims=1, tasks=tasks, learning_rate=learning_rate)
model.summary()  # prints the summary itself and returns None

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Training 'multitask' model...
Model: "multitask_learning_model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input (InputLayer)             [(None, 24, 232)]    0           []                               
                                                                                                  
 lstm (LSTM)                    (None, 16)           15936       ['input[0][0]']                  
                                                                                                  
 CCU (Dense)                    (None, 1)            17          ['lstm[0][0]']                   
                                                                                                  
 CSRU (Dense)                   (None, 1)            17    
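The summary shows one shared LSTM layer feeding a separate single-output Dense head per careunit, and its parameter counts follow from the layer sizes. A Keras LSTM with u units on d input features has four gates, each with an input kernel (d × u), a recurrent kernel (u × u) and a bias (u), so 4·u·(d + u + 1) parameters; a one-unit Dense head on u inputs has u weights plus one bias. A quick arithmetic check (no TensorFlow needed), assuming one head for each of the five careunits since the summary above is cut off:

```python
def lstm_params(input_dim: int, units: int) -> int:
    # 4 gates (input, forget, cell, output), each with input kernel, recurrent kernel and bias
    return 4 * units * (input_dim + units + 1)

def dense_params(input_dim: int, units: int) -> int:
    # weight matrix plus one bias per output unit
    return input_dim * units + units

lstm = lstm_params(input_dim=232, units=16)
head = dense_params(input_dim=16, units=1)
print(lstm, head, lstm + 5 * head)  # 15936 17 16021
```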

In [30]:
early_stopping = EarlyStopping(monitor='val_loss', patience=4)

model.fit(X_train, [y_train for _ in range(num_tasks)], epochs=epochs, batch_size=100,
          sample_weight=get_mtl_sample_weights(y_train, cohorts_train, tasks, sample_weight=sample_weight),
          callbacks=[early_stopping],
          validation_data=(X_val, [y_val for _ in range(num_tasks)]))
model.save(model_filename)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30


Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
INFO:tensorflow:Assets written to: ../data/models/model_multitask_24+12_careunits/assets
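Every head is given the same targets `y_train`, so the per-head sample weights are what keep a head from being trained on other careunits' patients. One plausible reading of `get_mtl_sample_weights()`, stated here as an assumption and not as its actual code, is one weight vector per head that is zero outside the head's own cohort, optionally multiplied by the inverse-frequency `sample_weight`. A sketch with a hypothetical helper:

```python
import numpy as np

def masked_task_weights(cohorts, tasks, sample_weight=None):
    """Hypothetical helper: one weight vector per task head, in the order of `tasks`.

    Head k only gets non-zero weight on samples whose cohort is tasks[k].
    """
    base = np.ones(len(cohorts)) if sample_weight is None else np.asarray(sample_weight, dtype=float)
    return [np.where(cohorts == task, base, 0.0) for task in tasks]

cohorts = np.array(["CCU", "MICU", "CCU", "SICU"])
tasks = np.unique(cohorts)  # ['CCU', 'MICU', 'SICU']
weights = masked_task_weights(cohorts, tasks)
print([w.tolist() for w in weights])
# [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
```

The list has one array per head, in the same order as the list of targets passed to `model.fit` above.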


In [60]:
print('    ' + '~' * 76)
print(f"    Predicting using '{model_type}' model...", flush=True)
y_scores = np.squeeze(model.predict(X_test))
y_pred = (y_scores > 0.5).astype("int32")

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Predicting using 'multitask' model...


In [61]:
y_scores

array([[9.7142458e-03, 5.7850674e-02, 4.4487682e-03, ..., 3.6409721e-04,
        2.5717876e-04, 9.8056757e-05],
       [7.3966999e-03, 4.1378792e-02, 2.0379403e-03, ..., 2.3206249e-04,
        1.2437073e-03, 4.2549564e-05],
       [2.0372530e-02, 5.2386966e-02, 1.5142230e-02, ..., 6.6932658e-04,
        1.0302681e-03, 1.1604048e-03],
       [2.0169087e-02, 5.8898907e-02, 1.9637668e-03, ..., 3.5769146e-04,
        4.2110807e-03, 5.6139819e-05],
       [3.1219753e-02, 4.3450404e-02, 2.7379468e-03, ..., 9.0671389e-04,
        1.1229646e-02, 1.0131980e-02]], dtype=float32)

In [62]:
y_pred

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int32)

In [41]:
len(y_pred)

5
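`len(y_pred)` is 5 because a multi-output Keras model's `predict` returns a list with one array per head, each of shape (n_test, 1), and `np.squeeze` turns that list into an array of shape (num_tasks, n_test): the row is the head, the column is the test patient. This is why the metric code below indexes `y_scores[head, patient]`. A toy illustration of the shapes, and of picking each patient's score from the head of its own cohort (the head indices here are made up):

```python
import numpy as np

# Toy stand-in for model.predict on a 3-head model and 4 test patients:
# a list with one (n_test, 1) array per head
head_outputs = [
    np.array([[0.1], [0.2], [0.3], [0.4]]),  # head 0
    np.array([[0.5], [0.6], [0.7], [0.8]]),  # head 1
    np.array([[0.9], [0.8], [0.7], [0.6]]),  # head 2
]
y_scores = np.squeeze(head_outputs)
print(y_scores.shape)  # (3, 4)

# Hypothetical index of the head that matches each test patient's cohort
own_head = np.array([2, 0, 1, 0])
own_scores = y_scores[own_head, np.arange(y_scores.shape[1])]
print(own_scores)  # [0.9 0.2 0.7 0.4]
```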

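How many test patients does each task head (careunit) predict will die? Each head scores every test patient, not only the patients in its own careunit, so these counts are over the whole test set: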

In [40]:
[np.sum(y_pred[i]) for i in np.arange(num_tasks)]

[2, 2, 94, 3, 53]

## Without bootstrapping

In [94]:
metrics_df = pd.DataFrame(index=np.append(tasks, ['Macro', 'Micro']), dtype=float)

for task in tasks:
    # y_scores and y_pred have shape (num_tasks, num_test_patients): row = head, column = patient
    y_scores_in_cohort = y_scores[cohort_to_index[task], cohorts_test == task]
    y_pred_in_cohort = y_pred[cohort_to_index[task], cohorts_test == task]
    y_true_in_cohort = y_test[cohorts_test == task]
    auc = roc_auc_score(y_true_in_cohort, y_scores_in_cohort)
    ppv = precision_score(y_true_in_cohort, y_pred_in_cohort, zero_division=0)
    specificity = recall_score(y_true_in_cohort, y_pred_in_cohort, pos_label=0)
    metrics_df.loc[task, 'AUC'] = auc
    metrics_df.loc[task, 'PPV'] = ppv
    metrics_df.loc[task, 'Specificity'] = specificity

# macro average: unweighted mean of the per-cohort AUC, PPV and specificity
metrics_df.loc['Macro', :] = metrics_df.loc[(metrics_df.index != 'Macro') & (metrics_df.index != 'Micro')].mean()

# micro average: pool all test patients, each scored by the head of its own cohort
own_head = [cohort_to_index[c] for c in cohorts_test]
patients = np.arange(len(y_test))
metrics_df.loc['Micro', 'AUC'] = roc_auc_score(y_test, y_scores[own_head, patients])
metrics_df.loc['Micro', 'PPV'] = precision_score(y_test, y_pred[own_head, patients], zero_division=0)
metrics_df.loc['Micro', 'Specificity'] = recall_score(y_test, y_pred[own_head, patients], pos_label=0)

In [95]:
metrics_df

| Cohort | AUC | PPV | Specificity |
|---|---|---|---|
| CCU | 0.876631 | 0.0 | 1.0 |
| CSRU | 0.897097 | 0.0 | 1.0 |
| MICU | 0.828308 | 0.714286 | 0.992161 |
| SICU | 0.843071 | 1.0 | 1.0 |
| TSICU | 0.871614 | 0.714286 | 0.997472 |
| Macro | 0.863344 | 0.485714 | 0.997926 |
| Micro | 0.867372 | 0.71875 | 0.997039 |
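Macro gives each careunit equal weight by averaging the per-cohort metrics; Micro pools all test patients (each scored by its own careunit's head) and computes the metric once, so large cohorts such as MICU weigh more. A NumPy-only sketch for AUC on made-up data, using the rank-based definition of ROC AUC (the probability that a random positive scores above a random negative, ties counting one half), which gives the same value as `roc_auc_score` for binary labels:

```python
import numpy as np

def auc(y_true, y_score):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Made-up labels, own-head scores and cohorts for 8 test patients
y = np.array([1, 0, 0, 1, 0, 1, 0, 0])
s = np.array([0.9, 0.3, 0.6, 0.4, 0.2, 0.8, 0.7, 0.1])
cohort = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

per_cohort = {str(c): auc(y[cohort == c], s[cohort == c]) for c in np.unique(cohort)}
macro = float(np.mean(list(per_cohort.values())))
micro = auc(y, s)
print(per_cohort, macro, round(micro, 4))  # {'A': 0.75, 'B': 1.0} 0.875 0.8667
```

Here the two averages differ (0.875 vs. 0.8667) even though every cohort has the same size, because pooling also compares positives of one cohort with negatives of the other.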


## With bootstrapping

In [96]:
# one row per (cohort, bootstrap sample) to hold AUC, PPV, and specificity

lst_of_tasks = list(tasks)
lst_of_tasks.append('Micro')

sample_labels = np.arange(1, num_bootstrapped_samples + 1).astype(str)
idx = pd.MultiIndex.from_product([lst_of_tasks, list(sample_labels)], names=['Cohort', 'Sample'])
metrics_df = pd.DataFrame(index=idx, columns=['AUC', 'PPV', 'Specificity'], dtype=float)

In [97]:
metrics_df

| Cohort | Sample | AUC | PPV | Specificity |
|---|---|---|---|---|
| CCU | 1 | | | |
| CCU | 2 | | | |
| CCU | 3 | | | |
| CCU | 4 | | | |
| CCU | 5 | | | |
| ... | ... | ... | ... | ... |
| Micro | 96 | | | |
| Micro | 97 | | | |
| Micro | 98 | | | |
| Micro | 99 | | | |


In [98]:
for task in tasks:
    all_auc, all_ppv, all_specificity = bootstrap_predict(X_test, y_test, cohorts_test, task, model,
                                                          tasks=tasks, num_bootstrap_samples=num_bootstrapped_samples)
    metrics_df.loc[task, 'AUC'] = all_auc
    metrics_df.loc[task, 'PPV'] = all_ppv
    metrics_df.loc[task, 'Specificity'] = all_specificity
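`bootstrap_predict()` returns one AUC, PPV and specificity per bootstrap sample for the given cohort, which fills that cohort's `num_bootstrapped_samples` rows. A sketch of the resampling idea for AUC alone, assuming test patients are drawn with replacement (whether `bootstrap_predict()` resamples exactly this way is not shown here); the spread of the resampled values can then be summarised, e.g. as a 95% percentile interval:

```python
import numpy as np

def auc(y_true, y_score):
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    pos, neg = y_score[y_true], y_score[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def bootstrap_auc(y_true, y_score, num_samples=100, seed=0):
    """AUC on num_samples resamples (with replacement) of the test patients."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score, dtype=float)
    if len(np.unique(y_true)) < 2:
        raise ValueError("need both positive and negative samples")
    rng = np.random.default_rng(seed)
    aucs = []
    while len(aucs) < num_samples:
        idx = rng.integers(0, len(y_true), size=len(y_true))
        # AUC is undefined if a resample contains only one class; draw again
        if len(np.unique(y_true[idx])) < 2:
            continue
        aucs.append(auc(y_true[idx], y_score[idx]))
    return np.array(aucs)

# Made-up test labels and scores
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
s = np.clip(0.3 * y + rng.normal(0.4, 0.2, size=200), 0, 1)
aucs = bootstrap_auc(y, s, num_samples=100)
low, high = np.percentile(aucs, [2.5, 97.5])
print(f"mean AUC {aucs.mean():.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```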