sandro2462/sys-score

SYS Score v2.0

Surgical Risk Score for Infective Endocarditis — Machine Learning Pipeline

European Heart Journal submission · Gelsomino S, Parise G, Parise O, Di Mauro M, Actis Dato G, Lorusso R


Overview

SYS Score v2.0 is a bagged XGBoost ensemble blended with RISK-E for predicting 30-day surgical mortality in patients operated for infective endocarditis (IE). It was developed on the GIROC multicenter registry (n=5,255 development patients, 36 centres) and externally validated on an independent held-out centre (Centro 12, n=597).

Key performance (external validation):

  • AUC = 0.882 (95% CI 0.858–0.906)
  • DeLong p < 0.05 vs all 6 established IE scores
  • Robust across 30 independent random seeds: all 6 comparisons reached significance in 100% of runs (6/6)

Repository Structure

sys-score/
├── 01_train.py              # Training pipeline (XGB bagged + calibration + blend)
├── 02_analyses.py           # NRI/IDI, DCA, bootstrap stability, multi-seed, TIMA-3
├── 03_figures.py            # Figure 1 (2×2) + Figure 2 supplementary
├── 04_distill_browser.py    # Ridge distillation + standalone browser calculator
├── 05_validate.py           # Certification script (all checks must pass)
├── requirements.txt         # Python dependencies
├── GIROC_TIMA1.csv          # Master cleaned dataset (n=5,403)
├── synthetic_1000ep.csv     # CTGAN synthetic data (n=100,000)
├── dbEndocarditiGIROC_DEFINITIVO_DICEMBRE_2023_2.xlsx  # Source for CENTRO variable
├── EHJ_TIMA_COMPLETED_WITH_10_BLINDED_CLINICIANS.xlsx  # TIMA-3 blind test
├── models/                  # Saved models (10 XGB bags, calibrators, config)
├── predictions/             # Model predictions on study + external sets
├── reports/                 # JSON/CSV metrics (metrics_final, NRI_IDI, DCA, multiseed)
├── figures/                 # Publication figures (PNG + PDF, 300 dpi)
├── calculator/              # Standalone HTML calculator + model_weights.json
├── manuscript/              # Paper draft, cover letter, tables
└── zenodo/                  # Zenodo metadata and deposit instructions

Quickstart

1. Install dependencies

pip install -r requirements.txt

Requires Python ≥ 3.10 and a CUDA-capable GPU (tested on an NVIDIA RTX 5070).

2. Run pipeline

python 01_train.py       # ~3-5 min on GPU
python 02_analyses.py    # ~15-20 min (30-seed multi-validation)
python 03_figures.py     # ~1 min
python 04_distill_browser.py  # ~2 min
python 05_validate.py    # Certification (must print ALL CHECKS PASSED)
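
The five steps above can also be chained from one small driver. The following is only a sketch, not part of the repository: `run_pipeline` is a hypothetical helper, and it assumes each script signals failure through a non-zero exit code, which the scripts themselves may or may not do.

```python
import subprocess
import sys

# Script names are taken from the repository listing; the order matters.
STEPS = [
    "01_train.py",
    "02_analyses.py",
    "03_figures.py",
    "04_distill_browser.py",
    "05_validate.py",
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each script in order and stop at the first non-zero exit code.

    Returns the list of scripts that completed successfully.
    `runner` is injectable so the control flow can be checked without a GPU.
    """
    completed = []
    for script in steps:
        # Use the current interpreter so the active environment is reused.
        result = runner([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}")
            break
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root runs the whole chain. Stopping at the first failure avoids generating figures or a calculator from a model that did not train cleanly.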

3. Create release ZIP

python -c "
import pathlib, zipfile
out = pathlib.Path('sys-score-v2.0.zip')
with zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as z:
    for p in sorted(pathlib.Path('.').rglob('*')):
        # Skip the .git directory and the archive that is being written
        if p.is_file() and '.git' not in p.parts and p != out:
            z.write(p)
print('Created', out)
"

Hyperparameters

| Parameter | Value |
| --- | --- |
| SEED | 42 |
| N_BAG | 10 |
| N_EVENTS (synth) | 88,000 |
| N_NONEVENTS | 12,000 |
| W_XGB | 0.60 |
| W_RISKE | 0.40 |
| XGB max_depth | 3 |
| XGB n_estimators | 500 |
| XGB lr | 0.03 |
| Device | cuda (GPU) |
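
For readers who want these values in code, they can be collected in a single immutable object. This is a hedged sketch: the class name `SysScoreConfig`, its field names, and the per-bag seed scheme are assumptions made for illustration and may differ from what `01_train.py` actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SysScoreConfig:
    """Values copied from the hyperparameter table; names are illustrative."""

    seed: int = 42
    n_bag: int = 10
    n_events: int = 88_000      # N_EVENTS (synth)
    n_nonevents: int = 12_000   # N_NONEVENTS
    w_xgb: float = 0.60
    w_riske: float = 0.40
    max_depth: int = 3
    n_estimators: int = 500
    learning_rate: float = 0.03
    device: str = "cuda"

    def bag_seeds(self):
        """One seed per bag. Assumed scheme: SEED, SEED + 1, ..."""
        return [self.seed + i for i in range(self.n_bag)]

    def xgb_params(self):
        """Keyword arguments named as in xgboost's scikit-learn API."""
        return {
            "max_depth": self.max_depth,
            "n_estimators": self.n_estimators,
            "learning_rate": self.learning_rate,
            "device": self.device,
        }


cfg = SysScoreConfig()
```

Freezing the dataclass keeps a run from silently mutating a hyperparameter between training and validation.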

Model Description

Features (38 total): 30 clinical variables + 8 interaction terms:

Base: ETA, female, PROTESI, ASCESSO, FISTOLA, PSEUDOANEURISMA, IPERTENSIONE, DIABETE, OBESITA, BPCO, FE, TOSSICODIPENDENTE, NUMERODIREDO, IABP_PRE, PATNEUROLOGICAPRECEDENTE, SHOCK, INTUBPRE, PAPSgt50, IRC, DIALISI, VEGETAZIONI, PERFORAZIONELEMBO, aortic, mitral, CABG, INFECTION_SEPSIS, MOF, periannular, saureus, fungal

Interactions: shock×saureus, shock×MOF, fungal×PROTESI, lvef_low (FE<35), lvef_low×SHOCK, periannular×PROTESI, MOF×AKI, INTUBPRE×SHOCK
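
The eight derived terms listed above can be computed from a single patient record as follows. This is a minimal sketch under stated assumptions: the output key names (for example `shock_x_saureus`) are hypothetical, binary features are assumed to be coded 0/1, FE is assumed to be in percent, and an `AKI` field is assumed to exist in the dataset even though it is not in the base feature list.

```python
def add_interactions(row):
    """Return a copy of `row` with the eight derived terms added.

    `row` maps feature names to numbers (0/1 flags, FE in percent).
    """
    out = dict(row)
    lvef_low = 1 if row["FE"] < 35 else 0   # lvef_low (FE<35)
    out["lvef_low"] = lvef_low
    out["shock_x_saureus"] = row["SHOCK"] * row["saureus"]
    out["shock_x_MOF"] = row["SHOCK"] * row["MOF"]
    out["fungal_x_PROTESI"] = row["fungal"] * row["PROTESI"]
    out["lvef_low_x_SHOCK"] = lvef_low * row["SHOCK"]
    out["periannular_x_PROTESI"] = row["periannular"] * row["PROTESI"]
    out["MOF_x_AKI"] = row["MOF"] * row["AKI"]   # AKI: assumed field
    out["INTUBPRE_x_SHOCK"] = row["INTUBPRE"] * row["SHOCK"]
    return out


# Invented example patient, showing only the fields the function reads.
example = {
    "FE": 30, "SHOCK": 1, "saureus": 1, "MOF": 0, "fungal": 0,
    "PROTESI": 1, "periannular": 1, "AKI": 1, "INTUBPRE": 0,
}
derived = add_interactions(example)
```

With 0/1 coding, each product is 1 only when both conditions are present, which is what lets a shallow tree (max_depth 3) pick up joint effects directly.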

Architecture:

  1. 10 bagged XGBoost classifiers (GPU) trained on oversampled CTGAN synthetic data
  2. Isotonic + sigmoid calibration on real development patients
  3. Final blend: 0.60 × calibrated XGB + 0.40 × RISK-E
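
Step 3 is a fixed-weight average of two probabilities. The sketch below shows only that last step and takes the calibrated XGBoost probability as given; the function name `blend` is an assumption, and how the isotonic and sigmoid calibrators are combined is not specified here, so it is not reproduced.

```python
W_XGB = 0.60    # weight of the calibrated XGBoost probability
W_RISKE = 0.40  # weight of the RISK-E probability


def blend(calibrated_xgb, risk_e, w_xgb=W_XGB, w_riske=W_RISKE):
    """Weighted average of two predicted probabilities of 30-day mortality."""
    for name, p in (("calibrated_xgb", calibrated_xgb), ("risk_e", risk_e)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    return w_xgb * calibrated_xgb + w_riske * risk_e


# Worked example with invented inputs: 0.60 * 0.20 + 0.40 * 0.10, about 0.16
risk = blend(0.20, 0.10)
```

Because the weights sum to 1, the blended value always lies between the two inputs and stays a valid probability.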

Comparison Scores

| Score | Variable |
| --- | --- |
| EuroSCORE II | EuroSCORE_logistic_prob |
| EndoSCORE | EndoSCORE_prob |
| RISK-E | RISK_E_prob |
| AEPEI | AEPEI_prob |
| APORTEI | APORTEI_prob |
| STS-IE | STS_IE_prob |
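
To illustrate how the discrimination of each comparison score can be measured, the sketch below computes the AUC as the fraction of event/non-event pairs that a score ranks correctly. It is an illustration only: the outcome key `death_30d` and the helper names are hypothetical, the toy rows are invented, and the DeLong test used for the reported p-values is not implemented here.

```python
def auc(labels, scores):
    """AUC via pairwise comparison: P(score of event > score of non-event).

    Ties count as half a win. Cost is O(n_events * n_nonevents).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auc_by_score(rows, outcome_key, score_keys):
    """Compute the AUC of every score column against the same outcome."""
    labels = [row[outcome_key] for row in rows]
    return {k: auc(labels, [row[k] for row in rows]) for k in score_keys}


# Invented toy data; `death_30d` is a hypothetical outcome column name.
toy = [
    {"death_30d": 0, "RISK_E_prob": 0.10, "EndoSCORE_prob": 0.30},
    {"death_30d": 0, "RISK_E_prob": 0.40, "EndoSCORE_prob": 0.20},
    {"death_30d": 1, "RISK_E_prob": 0.35, "EndoSCORE_prob": 0.25},
    {"death_30d": 1, "RISK_E_prob": 0.80, "EndoSCORE_prob": 0.90},
]
results = auc_by_score(toy, "death_30d", ["RISK_E_prob", "EndoSCORE_prob"])
```

The pairwise form is slower than a rank-based implementation, but it is easy to verify by hand and fast enough for a validation set of a few hundred patients.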

TIMA-3 Synthetic Data Realism

The GIROC-TIMA blind realism test (TIMA-3) involved 10 blinded expert cardiac surgeons who classified 50 cases (real vs synthetic) from the CTGAN 1000-epoch synthetic dataset. Results are reported in reports/TIMA3_realism.json.


Clinical Disclaimer

FOR RESEARCH USE ONLY. This tool has not been approved by any regulatory agency. It must not be used for individual patient management decisions without additional clinical validation and institutional approval. The authors accept no liability for any clinical use of this software.


Citation

Gelsomino S, Parise G, Parise O, Di Mauro M, Actis Dato G, Lorusso R. SYS Score v2.0: A Synthetic-Data-Augmented Machine Learning Score for Surgical Mortality Prediction in Infective Endocarditis. European Heart Journal, 2025. doi:10.1093/eurheartj/[TBD]


License

MIT License — see LICENSE for full text.

About

Synthetic Data-Augmented Risk Prediction for 30-Day Mortality after Cardiac Surgery for Infective Endocarditis · External AUC 0.882 · 6/6 DeLong p<0.005 · Zenodo DOI: 10.5281/zenodo.20327975
