sbhattacharyay/dynamic_GOSE_model
Ordinal, full-context prognosis-based trajectories of traumatic brain injury patients in European ICUs

Mining the contribution of intensive care clinical course to outcome after traumatic brain injury

Contents

- Overview
- Abstract
- Code
- Citation

Overview

This repository contains the code underlying the article entitled Mining the contribution of intensive care clinical course to outcome after traumatic brain injury from the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) consortium. In this file, we present the abstract to outline the motivation for the work and its findings, followed by a brief description of the code with which we generated these findings.

The code in this repository is commented throughout to provide a description of each step alongside the code that achieves it.

Abstract

Existing methods to characterise the evolving condition of traumatic brain injury (TBI) patients in the intensive care unit (ICU) do not capture the context necessary for individualising treatment. Here, we integrate all heterogeneous data stored in medical records (1,166 pre-ICU and ICU variables) to model the individualised contribution of clinical course to six-month functional outcome on the Glasgow Outcome Scale-Extended (GOSE). On a prospective cohort (n=1,550, 65 centres) of TBI patients, we train recurrent neural network models to map a token-embedded time series representation of all variables (including missing values) to an ordinal GOSE prognosis every two hours. The full range of variables explains up to 52% (95% CI: 50%-54%) of the ordinal variance in functional outcome. Up to 91% (95% CI: 90%-91%) of this explanation is derived from pre-ICU and admission information (i.e., static variables). Information collected in the ICU (i.e., dynamic variables) increases explanation (by up to 5% [95% CI: 4%-6%]), though not enough to counter poorer overall performance in longer-stay (>5.75 days) patients. Highest-contributing variables include physician-based prognoses, CT features, and markers of neurological function. Whilst static information currently accounts for the majority of functional outcome explanation after TBI, data-driven analysis highlights investigative avenues to improve dynamic characterisation of longer-stay patients. Moreover, our modelling strategy proves useful for converting large patient records into interpretable time series with missing data integration and minimal processing.

Code

All of the code used in this work can be found in the ./scripts directory as Python (.py), R (.R), or bash (.sh) scripts. Moreover, custom classes have been saved in the ./scripts/classes sub-directory, custom functions have been saved in the ./scripts/functions sub-directory, and custom PyTorch models have been saved in the ./scripts/models sub-directory.

1. Extract and characterise the study sample

  1. In this .py file, we extract the study sample from the CENTER-TBI dataset, filter patients by our study criteria, and determine ICU admission and discharge times for time-window discretisation. We also perform a proportional odds logistic regression analysis to determine significant effects among summary characteristics.

2. Partition CENTER-TBI for stratified, repeated k-fold cross-validation

  1. In this .py file, we create 100 partitions, stratified by 6-month GOSE, for repeated k-fold cross-validation and save the splits into a dataframe for use by subsequent scripts.
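Stratified, repeated k-fold partitioning of this kind can be sketched with scikit-learn's `RepeatedStratifiedKFold`. The 20-repeat x 5-fold decomposition of the 100 partitions, and the `GUPI` column name for patient identifiers, are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold

rng = np.random.default_rng(42)
gose = rng.integers(1, 9, size=1550)      # illustrative 6-month GOSE labels
patient_ids = np.arange(1550)

# 100 partitions = 20 repeats x 5 folds (assumed split), stratified by GOSE
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=2022)

rows = []
for partition, (train_idx, test_idx) in enumerate(
        rskf.split(patient_ids.reshape(-1, 1), gose)):
    rows.append(pd.DataFrame({'PARTITION_IDX': partition,
                              'GUPI': patient_ids[test_idx],
                              'SET': 'test'}))
splits = pd.concat(rows, ignore_index=True)   # one long dataframe of splits
print(splits.PARTITION_IDX.nunique())         # 100
```

Saving the splits as a single long dataframe lets every downstream script reconstruct its fold assignment from one file.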

3. Tokenise all CENTER-TBI variables and place into discretised ICU stay time windows

  1. In this .py file, we extract all heterogeneous types of variables from CENTER-TBI and fix erroneous timestamps and formats.
  2. In this .py file, we convert all CENTER-TBI variables into tokens depending on variable type and compile full dictionaries of tokens across the full dataset.
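The tokenisation idea above — discretise numerics into bins, concatenate variable names with bin or category labels, and encode missingness as its own token — can be sketched as follows. The token naming scheme (`VAR_BINx`, `VAR_NAN`) and binning choices are hypothetical, not the repository's exact convention.

```python
import numpy as np
import pandas as pd

def tokenise(df, numeric_cols, cat_cols, n_bins=20):
    """Convert a mixed-type dataframe into string tokens, one list per row.

    Missing values get an explicit VAR_NAN token so that missingness itself
    is represented in the token-embedded time series.
    """
    tokens = pd.DataFrame(index=df.index)
    for col in numeric_cols:
        # quantile-bin numeric values; drop duplicate edges for skewed variables
        binned = pd.qcut(df[col], q=n_bins, labels=False, duplicates='drop')
        tokens[col] = np.where(df[col].isna(),
                               col + '_NAN',
                               col + '_BIN' + binned.astype('Int64').astype(str))
    for col in cat_cols:
        tokens[col] = np.where(df[col].isna(),
                               col + '_NAN',
                               col + '_' + df[col].astype(str))
    # one token list per row (e.g., per 2-hour ICU time window)
    return pd.Series(tokens.values.tolist(), index=tokens.index)
```

A dictionary is then compiled from the unique tokens across the full dataset, mapping each token to an integer index for the embedding layer.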

4. Train and evaluate full-context ordinal-trajectory-generating models

  1. In this .py file, we train the trajectory-generating models across the repeated cross-validation splits and the hyperparameter configurations. This is run, with multi-array indexing, on the HPC using a bash script.
  2. In this .py file, we compile the training, validation, and testing set trajectories generated by the models and create bootstrapping resamples for validation set dropout.
  3. In this .py file, we calculate validation set trajectory calibration and discrimination based on provided bootstrapping resample row index. This is run, with multi-array indexing, on the HPC using a bash script.
  4. In this .py file, we compile the validation set performance metrics and perform bias-corrected bootstrapping dropout for cross-validation (BBCD-CV) to reduce the number of hyperparameter configurations. We also create testing set resamples for final performance calculation bootstrapping.
  5. In this .py file, we calculate the model calibration and explanation metrics to assess model reliability and information, respectively. This is run, with multi-array indexing, on the HPC using a bash script.
  6. In this .py file, we compile the performance metrics and summarise them across bootstrapping resamples to define the 95% confidence intervals for statistical inference.

5. Interpret variable effects on trajectory generation and evaluate baseline comparison model

  1. In this .py file, we extract and summarise the learned weights from the model relevance layers (trained as PyTorch Embedding layers).
  2. In this .py file, we define and identify significant transitions in individual patient trajectories, partition them for bootstrapping, and calculate summarised testing set trajectory information in preparation for TimeSHAP feature contribution calculation.
  3. In this .py file, we calculate variable and time-window TimeSHAP values for each individual's significant transitions. This is run, with multi-array indexing, on the HPC using a bash script.
  4. In this .py file, we load all the calculated TimeSHAP values and summarise them for population-level variable and time-window analysis. We also extract the model trajectories of an individual example patient for exploration.
  5. In this .py file, we perform a variable robustness check by calculating the Kendall's Tau rank correlation coefficient between variable values and the TimeSHAP corresponding to each value. This is run, with multi-array indexing, on the HPC using a bash script.
  6. In this .py file, we determine which variables have a significantly robust relationship with GOSE outcome and drop out those which do not.
  7. In this .py file, we calculate the baseline model calibration and explanation metrics to assess model reliability and information, respectively. This model was developed in our previous work (see ordinal GOSE prediction repository) and serves as our baseline for comparison to determine the added value of information collected in the ICU over time. This is run, with multi-array indexing, on the HPC using a bash script.
  8. In this .py file, we compile the baseline model performance metrics and summarise them across bootstrapping resamples to define the 95% confidence intervals for statistical inference.
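The robustness check in step 5 above amounts to a rank correlation between a variable's values and the TimeSHAP attributions those values received. A minimal sketch with synthetic values, using scipy's `kendalltau`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Illustrative: observed values of one variable and the TimeSHAP value
# attributed to each observation (here simulated with a monotone trend)
variable_values = rng.normal(size=200)
timeshap_values = 0.4 * variable_values + rng.normal(scale=0.5, size=200)

tau, p_value = stats.kendalltau(variable_values, timeshap_values)
robust = p_value < 0.05   # keep the variable only if the association holds
print(tau, robust)
```

Kendall's Tau is appropriate here because only the monotonicity of the value-to-attribution relationship matters, not its functional form.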

6. Sensitivity analysis to parse effect of length of stay

  1. In this .py file, we construct a list of model trajectories to generate from just the static variable set, compile static variable outputs, prepare bootstrapping resamples for the ICU stay duration cut-off analysis, and characterise the study population remaining in the ICU over time.
  2. In this .py file, we generate patient trajectories solely based on static variables for baseline comparison based on provided bootstrapping resample row index. This is run, with multi-array indexing, on the HPC using a bash script.
  3. In this .py file, we calculate testing set sensitivity analysis metrics based on provided bootstrapping resample row index. This is run, with multi-array indexing, on the HPC using a bash script.
  4. In this .py file, we load all the calculated sensitivity testing set performance values and summarise them for statistical inference for our sensitivity analysis.
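The bootstrap summarisation used throughout (e.g., the 95% confidence intervals in the abstract) reduces to taking percentiles across resamples. The metric values below are synthetic stand-ins for per-resample performance values.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative per-resample metric values (e.g., ordinal variance explained),
# one value per bootstrapping resample
metric = rng.normal(loc=0.52, scale=0.01, size=1000)

point = np.median(metric)
lo, hi = np.percentile(metric, [2.5, 97.5])   # 95% CI from bootstrap percentiles
print(f'{point:.2f} (95% CI: {lo:.2f}-{hi:.2f})')
```

Differences between models (e.g., full vs. static-only) are assessed the same way, by taking percentiles of the per-resample metric differences.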

7. Visualise results for the manuscript

  1. In this .R file, we produce the figures for the manuscript and the supplementary figures. The large majority of the quantitative figures are produced using the ggplot2 package.

Citation

@Article{10.1038/s41746-023-00895-8,
    author={Bhattacharyay, Shubhayu and Caruso, Pier Francesco and {\AA}kerlund, Cecilia and Wilson, Lindsay and Stevens, Robert D. and Menon, David K. and Steyerberg, Ewout W. and Nelson, David W. and Ercole, Ari and the CENTER-TBI investigators and participants},
    title={Mining the contribution of intensive care clinical course to outcome after traumatic brain injury},
    journal={npj Digital Medicine},
    year={2023},
    month={Aug},
    day={21},
    volume={6},
    number={1},
    pages={154},
    abstract={Existing methods to characterise the evolving condition of traumatic brain injury (TBI) patients in the intensive care unit (ICU) do not capture the context necessary for individualising treatment. Here, we integrate all heterogenous data stored in medical records (1166 pre-ICU and ICU variables) to model the individualised contribution of clinical course to 6-month functional outcome on the Glasgow Outcome Scale -Extended (GOSE). On a prospective cohort (n{\thinspace}={\thinspace}1550, 65 centres) of TBI patients, we train recurrent neural network models to map a token-embedded time series representation of all variables (including missing values) to an ordinal GOSE prognosis every 2{\thinspace}h. The full range of variables explains up to 52{\%} (95{\%} CI: 50--54{\%}) of the ordinal variance in functional outcome. Up to 91{\%} (95{\%} CI: 90--91{\%}) of this explanation is derived from pre-ICU and admission information (i.e., static variables). Information collected in the ICU (i.e., dynamic variables) increases explanation (by up to 5{\%} [95{\%} CI: 4--6{\%}]), though not enough to counter poorer overall performance in longer-stay (>5.75 days) patients. Highest-contributing variables include physician-based prognoses, CT features, and markers of neurological function. Whilst static information currently accounts for the majority of functional outcome explanation after TBI, data-driven analysis highlights investigative avenues to improve the dynamic characterisation of longer-stay patients. Moreover, our modelling strategy proves useful for converting large patient records into interpretable time series with missing data integration and minimal processing.},
    issn={2398-6352},
    doi={10.1038/s41746-023-00895-8},
    url={https://doi.org/10.1038/s41746-023-00895-8}
}