sbhattacharyay/dynamic_GOSE_model
Ordinal, full-context prognosis-based trajectories of traumatic brain injury patients in European ICUs

Mining the contribution of intensive care clinical course to outcome after traumatic brain injury

Contents

- Overview
- Abstract
- Code
- Citation

Overview

This repository contains the code underlying the article entitled Mining the contribution of intensive care clinical course to outcome after traumatic brain injury from the Collaborative European NeuroTrauma Effectiveness Research in TBI (CENTER-TBI) consortium. In this file, we present the abstract to outline the motivation for the work and its findings, followed by a brief description of the code with which we generated these findings.

The code in this repository is commented throughout to provide a description of each step alongside the code that achieves it.

Abstract

Existing methods to characterise the evolving condition of traumatic brain injury (TBI) patients in the intensive care unit (ICU) do not capture the context necessary for individualising treatment. Here, we integrate all heterogeneous data stored in medical records (1,166 pre-ICU and ICU variables) to model the individualised contribution of clinical course to six-month functional outcome on the Glasgow Outcome Scale-Extended (GOSE). On a prospective cohort (n=1,550, 65 centres) of TBI patients, we train recurrent neural network models to map a token-embedded time series representation of all variables (including missing values) to an ordinal GOSE prognosis every two hours. The full range of variables explains up to 52% (95% CI: 50%-54%) of the ordinal variance in functional outcome. Up to 91% (95% CI: 90%-91%) of this explanation is derived from pre-ICU and admission information (i.e., static variables). Information collected in the ICU (i.e., dynamic variables) increases explanation (by up to 5% [95% CI: 4%-6%]), though not enough to counter poorer overall performance in longer-stay (>5.75 days) patients. Highest-contributing variables include physician-based prognoses, CT features, and markers of neurological function. Whilst static information currently accounts for the majority of functional outcome explanation after TBI, data-driven analysis highlights investigative avenues to improve dynamic characterisation of longer-stay patients. Moreover, our modelling strategy proves useful for converting large patient records into interpretable time series with missing data integration and minimal processing.

Code

All of the code used in this work can be found in the ./scripts directory as Python (.py), R (.R), or bash (.sh) scripts. Moreover, custom classes have been saved in the ./scripts/classes sub-directory, custom functions have been saved in the ./scripts/functions sub-directory, and custom PyTorch models have been saved in the ./scripts/models sub-directory.

1. Extract and characterise the study sample

  1. In this .py file, we extract the study sample from the CENTER-TBI dataset, filter patients by our study criteria, and determine ICU admission and discharge times for time-window discretisation. We also perform a proportional odds logistic regression analysis to determine significant effects among summary characteristics.

2. Partition CENTER-TBI for stratified, repeated k-fold cross-validation

  1. In this .py file, we create 100 partitions, stratified by 6-month GOSE, for repeated k-fold cross-validation and save the splits into a dataframe for use by subsequent scripts.
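Stratified, repeated k-fold partitioning of this kind can be sketched with scikit-learn's `RepeatedStratifiedKFold`. The 20-repeat x 5-fold decomposition of the 100 partitions, and the `GUPI` column name for patient identifiers, are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold

rng = np.random.default_rng(42)
gose = rng.integers(1, 9, size=1550)      # illustrative 6-month GOSE labels
patient_ids = np.arange(1550)

# 100 partitions = 20 repeats x 5 folds (assumed split), stratified by GOSE
rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=2022)

rows = []
for partition, (train_idx, test_idx) in enumerate(
        rskf.split(patient_ids.reshape(-1, 1), gose)):
    rows.append(pd.DataFrame({'PARTITION_IDX': partition,
                              'GUPI': patient_ids[test_idx],
                              'SET': 'test'}))
splits = pd.concat(rows, ignore_index=True)   # one long dataframe of splits
print(splits.PARTITION_IDX.nunique())         # 100
```

Saving the splits as a single long dataframe lets every downstream script reconstruct its fold assignment from one file.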

3. Tokenise all CENTER-TBI variables and place into discretised ICU stay time windows

  1. In this .py file, we extract all heterogeneous types of variables from CENTER-TBI and fix erroneous timestamps and formats.
  2. In this .py file, we convert all CENTER-TBI variables into tokens depending on variable type and compile full dictionaries of tokens across the full dataset.
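The tokenisation idea above — discretise numerics into bins, concatenate variable names with bin or category labels, and encode missingness as its own token — can be sketched as follows. The token naming scheme (`VAR_BINx`, `VAR_NAN`) and binning choices are hypothetical, not the repository's exact convention.

```python
import numpy as np
import pandas as pd

def tokenise(df, numeric_cols, cat_cols, n_bins=20):
    """Convert a mixed-type dataframe into string tokens, one list per row.

    Missing values get an explicit VAR_NAN token so that missingness itself
    is represented in the token-embedded time series.
    """
    tokens = pd.DataFrame(index=df.index)
    for col in numeric_cols:
        # quantile-bin numeric values; drop duplicate edges for skewed variables
        binned = pd.qcut(df[col], q=n_bins, labels=False, duplicates='drop')
        tokens[col] = np.where(df[col].isna(),
                               col + '_NAN',
                               col + '_BIN' + binned.astype('Int64').astype(str))
    for col in cat_cols:
        tokens[col] = np.where(df[col].isna(),
                               col + '_NAN',
                               col + '_' + df[col].astype(str))
    # one token list per row (e.g., per 2-hour ICU time window)
    return pd.Series(tokens.values.tolist(), index=tokens.index)
```

A dictionary is then compiled from the unique tokens across the full dataset, mapping each token to an integer index for the embedding layer.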

4. Train and evaluate full-context ordinal-trajectory-generating models

  1. In this .py file, we train the trajectory-generating models across the repeated cross-validation splits and the hyperparameter configurations. This is run, with multi-array indexing, on the HPC using a bash script.
  2. In this .py file, we compile the training, validation, and testing set trajectories generated by the models and create bootstrapping resamples for validation set dropout.
  3. In this .py file, we calculate validation set trajectory calibration and discrimination based on provided bootstrapping resample row index. This is run, with multi-array indexing, on the HPC using a bash script.
  4. In this .py file, we compile the validation set performance metrics and perform bias-corrected bootstrapping dropout for cross-validation (BBCD-CV) to reduce the number of hyperparameter configurations. We also create testing set resamples for final performance calculation bootstrapping.
  5. In this .py file, we calculate the model calibration and explanation metrics to assess model reliability and information, respectively. This is run, with multi-array indexing, on the HPC using a bash script.
  6. In this .py file, we compile the performance metrics and summarise them across bootstrapping resamples to define the 95% confidence intervals for statistical inference.

5. Interpret variable effects on trajectory generation and evaluate baseline comparison model

  1. In this .py file, we extract and summarise the learned weights from the model relevance layers (trained as PyTorch Embedding layers).
  2. In this .py file, we define and identify significant transitions in individual patient trajectories, partition them for bootstrapping, and calculate summarised testing set trajectory information in preparation for TimeSHAP feature contribution calculation.
  3. In this .py file, we calculate variable and time-window TimeSHAP values for each individual's significant transitions. This is run, with multi-array indexing, on the HPC using a bash script.
  4. In this .py file, we load all the calculated TimeSHAP values and summarise them for population-level variable and time-window analysis. We also extract the model trajectories of an individual example patient for exploration.
  5. In this .py file, we perform a variable robustness check by calculating the Kendall's Tau rank correlation coefficient between variable values and the TimeSHAP corresponding to each value. This is run, with multi-array indexing, on the HPC using a bash script.
  6. In this .py file, we determine which variables have a significantly robust relationship with GOSE outcome and drop out those which do not.
  7. In this .py file, we calculate the baseline model calibration and explanation metrics to assess model reliability and information, respectively. This model was developed in our previous work (see ordinal GOSE prediction repository) and serves as our baseline for comparison to determine the added value of information collected in the ICU over time. This is run, with multi-array indexing, on the HPC using a bash script.
  8. In this .py file, we compile the baseline model performance metrics and summarise them across bootstrapping resamples to define the 95% confidence intervals for statistical inference.
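The robustness check in step 5 above amounts to a rank correlation between a variable's values and the TimeSHAP attributions those values received. A minimal sketch with synthetic values, using scipy's `kendalltau`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Illustrative: observed values of one variable and the TimeSHAP value
# attributed to each observation (here simulated with a monotone trend)
variable_values = rng.normal(size=200)
timeshap_values = 0.4 * variable_values + rng.normal(scale=0.5, size=200)

tau, p_value = stats.kendalltau(variable_values, timeshap_values)
robust = p_value < 0.05   # keep the variable only if the association holds
print(tau, robust)
```

Kendall's Tau is appropriate here because only the monotonicity of the value-to-attribution relationship matters, not its functional form.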

6. Sensitivity analysis to parse effect of length of stay

  1. In this .py file, we construct a list of model trajectories to generate from just the static variable set, compile static variable outputs, prepare bootstrapping resamples for the ICU stay duration cut-off analysis, and characterise the study population remaining in the ICU over time.
  2. In this .py file, we generate patient trajectories solely based on static variables for baseline comparison based on provided bootstrapping resample row index. This is run, with multi-array indexing, on the HPC using a bash script.
  3. In this .py file, we calculate testing set sensitivity analysis metrics based on provided bootstrapping resample row index. This is run, with multi-array indexing, on the HPC using a bash script.
  4. In this .py file, we load all the calculated sensitivity testing set performance values and summarise them for statistical inference for our sensitivity analysis.
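The bootstrap summarisation used throughout (e.g., the 95% confidence intervals in the abstract) reduces to taking percentiles across resamples. The metric values below are synthetic stand-ins for per-resample performance values.

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative per-resample metric values (e.g., ordinal variance explained),
# one value per bootstrapping resample
metric = rng.normal(loc=0.52, scale=0.01, size=1000)

point = np.median(metric)
lo, hi = np.percentile(metric, [2.5, 97.5])   # 95% CI from bootstrap percentiles
print(f'{point:.2f} (95% CI: {lo:.2f}-{hi:.2f})')
```

Differences between models (e.g., full vs. static-only) are assessed the same way, by taking percentiles of the per-resample metric differences.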

7. Visualise results for the manuscript

  1. In this .R file, we produce the figures for the manuscript and the supplementary figures. The large majority of the quantitative figures are produced using the ggplot2 package.

Citation

@Article{10.1038/s41746-023-00895-8,
    author={Bhattacharyay, Shubhayu and Caruso, Pier Francesco and {\AA}kerlund, Cecilia and Wilson, Lindsay and Stevens, Robert D. and Menon, David K. and Steyerberg, Ewout W. and Nelson, David W. and Ercole, Ari and the CENTER-TBI investigators and participants},
    title={Mining the contribution of intensive care clinical course to outcome after traumatic brain injury},
    journal={npj Digital Medicine},
    year={2023},
    month={Aug},
    day={21},
    volume={6},
    number={1},
    pages={154},
    abstract={Existing methods to characterise the evolving condition of traumatic brain injury (TBI) patients in the intensive care unit (ICU) do not capture the context necessary for individualising treatment. Here, we integrate all heterogenous data stored in medical records (1166 pre-ICU and ICU variables) to model the individualised contribution of clinical course to 6-month functional outcome on the Glasgow Outcome Scale -Extended (GOSE). On a prospective cohort (n{\thinspace}={\thinspace}1550, 65 centres) of TBI patients, we train recurrent neural network models to map a token-embedded time series representation of all variables (including missing values) to an ordinal GOSE prognosis every 2{\thinspace}h. The full range of variables explains up to 52{\%} (95{\%} CI: 50--54{\%}) of the ordinal variance in functional outcome. Up to 91{\%} (95{\%} CI: 90--91{\%}) of this explanation is derived from pre-ICU and admission information (i.e., static variables). Information collected in the ICU (i.e., dynamic variables) increases explanation (by up to 5{\%} [95{\%} CI: 4--6{\%}]), though not enough to counter poorer overall performance in longer-stay (>5.75 days) patients. Highest-contributing variables include physician-based prognoses, CT features, and markers of neurological function. Whilst static information currently accounts for the majority of functional outcome explanation after TBI, data-driven analysis highlights investigative avenues to improve the dynamic characterisation of longer-stay patients. Moreover, our modelling strategy proves useful for converting large patient records into interpretable time series with missing data integration and minimal processing.},
    issn={2398-6352},
    doi={10.1038/s41746-023-00895-8},
    url={https://doi.org/10.1038/s41746-023-00895-8}
}