# Mortality Prediction


Reference: Soenksen, L.R., Ma, Y., Zeng, C. et al. Integrated multimodal artificial intelligence framework for healthcare applications. npj Digit. Med. 5, 149 (2022). https://doi.org/10.1038/s41746-022-00689-4

In this notebook, the task is to predict 48-hour mortality from the CSV embeddings file.



## Introduction


The goal of this part of the study is to build models to predict the probability that a patient will expire during the next 48 h as a binary classification problem: expired ≤48 h (1) or otherwise (0). In the case of a patient whose hospital exit status is not expiration, the class label is set to 0. A patient can acquire different target class labels at different time points during their stay due to changes in status and proximity to the discharge or time of death. Similar to the length-of-stay modeling, each sample in this predictive task corresponds to a single patient-admission EHR time point where an X-ray image was obtained (N = 45,050).
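The labelling rule described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `mortality_label`, the use of `None` for patients who did not expire, and the inclusive 48 h boundary are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Optional

HORIZON = timedelta(hours=48)


def mortality_label(sample_time: datetime, death_time: Optional[datetime]) -> int:
    """Return 1 if death occurs within 48 h after sample_time, otherwise 0.

    death_time is None when the hospital exit status is not expiration
    (an assumption of this sketch about how that case is encoded).
    """
    if death_time is None:
        return 0
    delta = death_time - sample_time
    return int(timedelta(0) <= delta <= HORIZON)


# One hypothetical admission sampled once per day; death occurs on day 4.
admission = datetime(2020, 1, 1, 8, 0)
death = datetime(2020, 1, 5, 8, 0)
labels = [mortality_label(admission + timedelta(days=d), death) for d in range(4)]
print(labels)  # [0, 0, 1, 1]
```

The same patient receives label 0 at the early time points and label 1 once the sample falls within 48 h of death, which is why the target can change during a stay.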


#### Imports

In [1]:
import os
os.chdir('../')

from pandas import read_csv

from src.data import constants
from src.data.dataset import HAIMDataset
from src.evaluation.pycaret_evaluator import PyCaretEvaluator
from src.utils.metric_scores import *

#### Read data from local source



In [2]:
df = read_csv(constants.FILE_DF, nrows=constants.N_DATA)

#### Create a custom dataset for the HAIM experiment


Build the target column for the task at hand and set the dataset specifics: use ``haim_id`` as the ``global_id`` and use all sources for prediction.

In [3]:
dataset = HAIMDataset(df,  
                      constants.ALL_PREDICTORS, 
                      constants.ALL_MODALITIES, 
                      constants.MORTALITY, 
                      constants.IMG_ID, 
                      constants.GLOBAL_ID)

#### Set hyper-parameters

In [4]:
# Define the grid of hyper-parameters for tuning
grid_hps = {'max_depth': [5, 6, 7, 8],
            'n_estimators': [200, 300],
            'learning_rate': [0.3, 0.1, 0.05],
            }

### Model training and predictions using an XGBClassifier model with GridSearchCV and hyperparameter optimization


The goal of this section of the notebook is to compute the following metrics:

``ACCURACY_SCORE, BALANCED_ACCURACY_SCORE, SENSITIVITY, SPECIFICITY, AUC, BRIER SCORE, BINARY CROSS-ENTROPY``
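As a rough guide to what these metrics measure, the sketch below computes each of them from true labels and predicted probabilities in plain Python. The helper name `binary_metrics`, the 0.5 decision threshold, and the toy data are assumptions for illustration; the notebook's own implementations live in `src.utils.metric_scores` and may differ.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5, eps=1e-15):
    """Compute common binary classification metrics.

    Assumes both classes are present in y_true.
    """
    y_pred = [int(p >= threshold) for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate

    # AUC as the probability that a random positive outranks a random
    # negative, counting ties as one half.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)

    n = len(y_true)
    clipped = [min(max(p, eps), 1 - eps) for p in y_prob]
    return {
        "accuracy": (tp + tn) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "auc": wins / (len(pos) * len(neg)),
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n,
        "bce": -sum(
            t * math.log(p) + (1 - t) * math.log(1 - p)
            for t, p in zip(y_true, clipped)
        ) / n,
    }


# Toy example: four negatives and two positives.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]
metrics = binary_metrics(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With a rare positive class, accuracy can stay high while sensitivity is low, so balanced accuracy and AUC are usually more informative for this task.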


The hyperparameter combinations of individual XGBoost models were selected within each training loop using a ``fivefold cross-validated grid search`` on the training set (80%). This XGBoost ``tuning process`` selected the ``maximum depth of the trees (5–8)``, the number of ``estimators (200 or 300)``, and the ``learning rate (0.05, 0.1, 0.3)`` according to the parameter value combination leading to the highest observed AUROC within the training loop.
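To give a sense of the search cost, the grid described above can be enumerated with the standard library. This sketch only counts candidates; it assumes an exhaustive grid search in which every combination is fitted once per inner fold.

```python
from itertools import product

# Same value ranges as the grid described above.
grid_hps = {
    "max_depth": [5, 6, 7, 8],
    "n_estimators": [200, 300],
    "learning_rate": [0.3, 0.1, 0.05],
}

# Cartesian product of all value lists, one dict per candidate.
combos = [dict(zip(grid_hps, values)) for values in product(*grid_hps.values())]

n_inner_folds = 5
n_fits = len(combos) * n_inner_folds

print(len(combos))  # 24 candidate combinations
print(n_fits)       # 120 model fits per training loop
print(combos[0])    # {'max_depth': 5, 'n_estimators': 200, 'learning_rate': 0.3}
```

Under these assumptions each training loop fits 120 models before the final refit, which is consistent with the long tuning step seen in the run output.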


As mentioned previously, all XGBoost models were trained ``five times with five different data splits`` to repeat the experiments and compute average metrics.


Refer to page 8 of the study: https://doi.org/10.1038/s41746-022-00689-4
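The nesting of the two loops can be sketched as follows. This is a structural illustration only: it assumes the five data splits form a fivefold outer loop (80% training, 20% test) with a fivefold inner cross-validation on each training portion, and the helper `kfold` is hypothetical. No model is trained here.

```python
import random


def kfold(indices, k, seed):
    """Shuffle indices and deal them into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


n_samples = 100  # toy size; the study reports N = 45,050
outer_folds = kfold(range(n_samples), k=5, seed=42)

summary = []
for outer_id, test_idx in enumerate(outer_folds, start=1):
    # Outer split: one fold (20%) is held out, the other four (80%) are used for tuning.
    train_idx = [
        i
        for j, fold in enumerate(outer_folds, start=1)
        if j != outer_id
        for i in fold
    ]
    # Inner split: fivefold cross-validation restricted to the training indices.
    inner_folds = kfold(train_idx, k=5, seed=outer_id)
    for val_idx in inner_folds:
        held_out = set(val_idx)
        fit_idx = [i for i in train_idx if i not in held_out]
        # Each grid candidate would be fitted on fit_idx and scored on val_idx;
        # the outer test indices are never seen during tuning.
        assert not held_out & set(test_idx)
    summary.append((outer_id, len(train_idx), len(test_idx), len(fit_idx)))

for row in summary:
    print(row)  # (outer fold, train size, test size, inner fit size)
```

Keeping the outer test fold out of the inner search is what allows the averaged outer metrics to serve as an estimate of performance on unseen data.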

In [5]:
# Initialize the PyCaret Evaluator
evaluator = PyCaretEvaluator(dataset=dataset, target="Mortality", experiment_name="CP_Mortality", filepath="./results/mortality")

# Model training
evaluator.run_experiment(
    train_size=0.8,
    fold=5,
    fold_strategy='kfold',
    outer_fold=5,
    outer_strategy='kfold',
    session_id=42,
    model='xgboost',
    optimize='AUC',
    custom_grid=grid_hps
)

2024-10-04 13:54:33,961	INFO worker.py:1777 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265

(run_fold pid=631326) Outer fold 1

(raylet) Spilled 3869 MiB, 2 objects, write throughput 589 MiB/s. Set RAY_verbose_spill_logs=0 to disable this message.
(raylet) Spilled 7738 MiB, 4 objects, write throughput 583 MiB/s.
(raylet) Spilled 11607 MiB, 5 objects, write throughput 579 MiB/s.

(run_fold pid=631326) Configuring PyCaret for outer fold 1

(run_fold pid=631326)       Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
(run_fold pid=631326) Fold
(run_fold pid=631326) 0       0.9806  0.9568  0.3675  0.8971  0.5214  0.5132  0.5673
(run_fold pid=631326) 1       0.9846  0.9536  0.3897  0.8983  0.5436  0.5370  0.5861
(run_fold pid=631326) 2       0.9844  0.9631  0.4082  0.9524  0.5714  0.5648  0.6181
(run_fold pid=631326) 3       0.9813  0.9470  0.3772  0.9403  0.5385  0.5307  0.5892
(run_fold pid=631326) 4       0.9832  0.9540  0.4177  0.9296  0.5764  0.5691  0.6170
(run_fold pid=631326) Mean    0.9828  0.9549  0.3921  0.9235  0.5503  0.5430  0.5955
(run_fold pid=631326) Std     0.0016  0.0052  0.0187  0.0223  0.0207  0.0211  0.0195
(run_fold pid=631326) Tuning hyperparameters for model xgboost with custom grid using grid search
(run_fold pid=631326) Transformation P

(raylet) Spilled 19844 MiB, 11 objects, write throughput 498 MiB/s.

(run_fold pid=631326) Configuring PyCaret for outer fold 2

(run_fold pid=631326)       Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
(run_fold pid=631326) Fold
(run_fold pid=631326) 0       0.9801  0.9662  0.3846  0.9589  0.5490  0.5407  0.6006
(run_fold pid=631326) 1       0.9854  0.9470  0.4046  0.8983  0.5579  0.5516  0.5975
(run_fold pid=631326) 2       0.9820  0.9571  0.3469  0.8644  0.4951  0.4877  0.5412
(run_fold pid=631326) 3       0.9825  0.9588  0.4438  0.9146  0.5976  0.5898  0.6304
(run_fold pid=631326) 4       0.9846  0.9611  0.4333  0.9420  0.5936  0.5868  0.6333
(run_fold pid=631326) Mean    0.9829  0.9580  0.4027  0.9157  0.5587  0.5513  0.6006
(run_fold pid=631326) Std     0.0019  0.0063  0.0348  0.0331  0.0371  0.0372  0.0332
(run_fold pid=631326) Tuning hyperparameters for model xgboost with custom grid using grid search
(run_fold pid=631326) Transformation P

(run_fold pid=631326) Configuring PyCaret for outer fold 3

(run_fold pid=631326)       Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
(run_fold pid=631326) Fold
(run_fold pid=631326) 0       0.9792  0.9500  0.3062  0.8448  0.4495  0.4413  0.5014
(run_fold pid=631326) 1       0.9863  0.9555  0.4730  0.9859  0.6393  0.6332  0.6780
(run_fold pid=631326) 2       0.9844  0.9686  0.4615  0.9231  0.6154  0.6083  0.6467
(run_fold pid=631326) 3       0.9814  0.9443  0.4057  0.9595  0.5703  0.5624  0.6175
(run_fold pid=631326) 4       0.9814  0.9446  0.3113  0.9400  0.4677  0.4606  0.5352
(run_fold pid=631326) Mean    0.9826  0.9526  0.3915  0.9307  0.5484  0.5412  0.5958
(run_fold pid=631326) Std     0.0025  0.0090  0.0713  0.0477  0.0768  0.0773  0.0669
(run_fold pid=631326) Tuning hyperparameters for model xgboost with custom grid using grid search
(run_fold pid=631326) Transformation P

(run_fold pid=631326) Configuring PyCaret for outer fold 4

(run_fold pid=631326)       Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
(run_fold pid=631326) Fold
(run_fold pid=631326) 0       0.9825  0.9562  0.3812  0.9683  0.5471  0.5399  0.6018
(run_fold pid=631326) 1       0.9849  0.9665  0.4286  0.9545  0.5915  0.5850  0.6343
(run_fold pid=631326) 2       0.9806  0.9432  0.3013  0.9400  0.4563  0.4491  0.5263
(run_fold pid=631326) 3       0.9811  0.9359  0.3943  0.9583  0.5587  0.5508  0.6083
(run_fold pid=631326) 4       0.9828  0.9433  0.3733  0.9180  0.5308  0.5236  0.5795
(run_fold pid=631326) Mean    0.9824  0.9490  0.3757  0.9478  0.5369  0.5297  0.5900
(run_fold pid=631326) Std     0.0015  0.0109  0.0417  0.0174  0.0449  0.0450  0.0363
(run_fold pid=631326) Tuning hyperparameters for model xgboost with custom grid using grid search
(run_fold pid=631326) Transformation P

(run_fold pid=631326) Configuring PyCaret for outer fold 5

(run_fold pid=631326)       Accuracy     AUC  Recall   Prec.      F1   Kappa     MCC
(run_fold pid=631326) Fold
(run_fold pid=631326) 0       0.9811  0.9537  0.3636  0.9375  0.5240  0.5163  0.5775
(run_fold pid=631326) 1       0.9849  0.9607  0.4345  0.9265  0.5915  0.5849  0.6289
(run_fold pid=631326) 2       0.9807  0.9553  0.3765  0.9275  0.5356  0.5275  0.5843
(run_fold pid=631326) 3       0.9811  0.9491  0.3681  0.9091  0.5240  0.5161  0.5719
(run_fold pid=631326) 4       0.9837  0.9530  0.4214  0.9710  0.5877  0.5807  0.6341
(run_fold pid=631326) Mean    0.9823  0.9544  0.3928  0.9343  0.5526  0.5451  0.5993
(run_fold pid=631326) Std     0.0017  0.0038  0.0293  0.0205  0.0306  0.0311  0.0266
(run_fold pid=631326) Tuning hyperparameters for model xgboost with custom grid using grid search
(run_fold pid=631326) Transformation P

(raylet) [2024-10-07 08:24:39,646 E 631211 631211] (raylet) node_manager.cc:3065: 1 Workers (tasks / actors) killed due to memory pressure (OOM), 0 Workers crashed due to other reasons at node (ID: 4a76051cba771f3e77cb11a71acab1dc5b4773f9ce3f3946b8921878, IP: 10.44.86.85) over the last time period. To see more information about the Workers killed on this node, use `ray logs raylet.out -ip 10.44.86.85`
(raylet) Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.


Final metrics table:
     Metric     Mean   Std Dev
0  Accuracy  0.98066  0.001385
1       AUC  0.96012  0.004346
2    Recall  0.35174  0.034586
3     Prec.  0.94114  0.021459
4        F1  0.51146  0.038680
5     Kappa  0.50372  0.038772
6       MCC  0.56860  0.032578
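
A table of this shape can be produced by averaging one score per outer fold. The sketch below shows that aggregation for a single metric. As example inputs it reuses the five inner cross-validation mean AUC values printed in the log above; the final table was presumably computed from a different set of per-fold scores, so the result is not expected to match it. Whether the evaluator reports a sample or a population standard deviation is not stated, so both are shown.

```python
from statistics import mean, pstdev, stdev

# Inner cross-validation mean AUC for outer folds 1 to 5, copied from the log above.
inner_cv_mean_auc = [0.9549, 0.9580, 0.9526, 0.9490, 0.9544]

auc_mean = mean(inner_cv_mean_auc)
auc_sample_std = stdev(inner_cv_mean_auc)       # divides by n - 1
auc_population_std = pstdev(inner_cv_mean_auc)  # divides by n

print(f"mean = {auc_mean:.5f}")
print(f"sample std = {auc_sample_std:.6f}")
print(f"population std = {auc_population_std:.6f}")
```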
