Surgical Risk Score for Infective Endocarditis — Machine Learning Pipeline
European Heart Journal submission · Gelsomino S, Parise G, Parise O, Di Mauro M, Actis Dato G, Lorusso R
SYS Score v2.0 is a bagged XGBoost ensemble blended with RISK-E for predicting 30-day surgical mortality in patients operated for infective endocarditis (IE). It was developed on the GIROC multicenter registry (n=5,255 development patients, 36 centres) and externally validated on an independent held-out centre (Centro 12, n=597).
Key performance (external validation):
- AUC = 0.882 (95% CI 0.858–0.906)
- DeLong p < 0.05 vs all 6 established comparator scores (EuroSCORE II plus five IE-specific scores)
- Robust across 30 independent random seeds: significant vs all 6 comparators (6/6) in all 30 seeds
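The README does not state how the 95% CI on the AUC was obtained. As an illustration only, the sketch below assumes a percentile bootstrap over patients, with the AUC computed as the Mann-Whitney probability that a patient who died is scored above one who survived. `auc` and `bootstrap_auc_ci` are hypothetical helpers, not functions from 02_analyses.py.

```python
import random

def auc(y_true, y_score):
    """Mann-Whitney AUC: share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=42):
    """Point AUC plus an (assumed) percentile bootstrap CI, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # AUC needs both classes; skip one-class resamples
            stats.append(auc(yb, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, y_score), (lo, hi)

# Toy example (not study data): 2 deaths, 2 survivors
point, ci = bootstrap_auc_ci([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], n_boot=200)
print(point, ci)
```

The pairwise AUC is O(events × non-events) per resample, which is acceptable for a sketch on a cohort the size of Centro 12 but slow for 1,000 resamples; a rank-based AUC would be the usual speed-up.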
sys-score/
├── 01_train.py # Training pipeline (XGB bagged + calibration + blend)
├── 02_analyses.py # NRI/IDI, DCA, bootstrap stability, multi-seed, TIMA-3
├── 03_figures.py # Figure 1 (2×2) + Figure 2 supplementary
├── 04_distill_browser.py # Ridge distillation + standalone browser calculator
├── 05_validate.py # Certification script (all checks must pass)
├── requirements.txt # Python dependencies
├── GIROC_TIMA1.csv # Master cleaned dataset (n=5,403)
├── synthetic_1000ep.csv # CTGAN synthetic data (n=100,000)
├── dbEndocarditiGIROC_DEFINITIVO_DICEMBRE_2023_2.xlsx # Source for CENTRO variable
├── EHJ_TIMA_COMPLETED_WITH_10_BLINDED_CLINICIANS.xlsx # TIMA-3 blind test
├── models/ # Saved models (10 XGB bags, calibrators, config)
├── predictions/ # Model predictions on study + external sets
├── reports/ # JSON/CSV metrics (metrics_final, NRI_IDI, DCA, multiseed)
├── figures/ # Publication figures (PNG + PDF, 300 dpi)
├── calculator/ # Standalone HTML calculator + model_weights.json
├── manuscript/ # Paper draft, cover letter, tables
└── zenodo/ # Zenodo metadata and deposit instructions
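04_distill_browser.py is described only as "Ridge distillation". One plausible reading, sketched below under that assumption, is to fit a ridge regression from the 38 features to the logit of the ensemble's predicted probability, so that a browser calculator needs only an intercept and one coefficient per feature. The function names, the internal standardization, and the idea that such coefficients are what calculator/model_weights.json stores are all assumptions; the sketch uses NumPy.

```python
import numpy as np

def logit(p, eps=1e-6):
    p = np.clip(p, eps, 1 - eps)  # keep the log finite for p near 0 or 1
    return np.log(p / (1 - p))

def distill_ridge(X, teacher_prob, alpha=1.0):
    """Ridge fit of the teacher's logit on the features (assumed distillation target).

    Features are standardized so the penalty treats them evenly; coefficients are
    mapped back to the raw feature scale, and the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    z = logit(np.asarray(teacher_prob, dtype=float))
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns: avoid division by zero
    Xs = (X - mu) / sd
    A = Xs.T @ Xs + alpha * np.eye(X.shape[1])
    beta_std = np.linalg.solve(A, Xs.T @ (z - z.mean()))
    coef = beta_std / sd
    intercept = z.mean() - mu @ coef
    return intercept, coef

def student_prob(X, intercept, coef):
    """What a linear browser calculator would evaluate: sigmoid(intercept + X @ coef)."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ coef + intercept)))
```

A larger `alpha` trades fidelity to the ensemble for smaller, more stable coefficients; the value actually used by 04_distill_browser.py is not given in this README.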
pip install -r requirements.txt
Requires Python ≥ 3.10 and a CUDA-capable GPU (tested on NVIDIA RTX 5070).
python 01_train.py # ~3-5 min on GPU
python 02_analyses.py # ~15-20 min (30-seed multi-validation)
python 03_figures.py # ~1 min
python 04_distill_browser.py # ~2 min
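python 05_validate.py # Certification (must print ALL CHECKS PASSED)
To package the repository as a release archive:
python -c "
import zipfile, pathlib
out = pathlib.Path('sys-score-v2.0.zip')
with zipfile.ZipFile(out, 'w', zipfile.ZIP_DEFLATED) as z:
    for p in pathlib.Path('.').rglob('*'):
        # skip anything under .git/ and the archive currently being written
        if p.is_file() and '.git' not in p.parts and p.resolve() != out.resolve():
            z.write(p)
print('Created', out)
"
Key configuration:
| Parameter | Value |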
|---|---|
| SEED | 42 |
| N_BAG | 10 |
| N_EVENTS (synth) | 88,000 |
| N_NONEVENTS | 12,000 |
| W_XGB | 0.60 |
| W_RISKE | 0.40 |
| XGB max_depth | 3 |
| XGB n_estimators | 500 |
| XGB lr | 0.03 |
| Device | cuda (GPU) |
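The table can be mirrored as a small frozen config object so that invariants such as "blend weights sum to 1" are checked once at startup. This is a sketch: the field names are hypothetical, and deriving each bag's `random_state` as `seed + bag` is an assumption. The returned keys (`max_depth`, `n_estimators`, `learning_rate`, `device`, `random_state`) are keyword arguments accepted by `xgboost.XGBClassifier` in xgboost 2.x.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Values copied from the configuration table; field names are hypothetical."""
    seed: int = 42
    n_bag: int = 10
    n_events_synth: int = 88_000
    n_nonevents: int = 12_000
    w_xgb: float = 0.60
    w_riske: float = 0.40
    xgb_max_depth: int = 3
    xgb_n_estimators: int = 500
    xgb_lr: float = 0.03
    device: str = "cuda"

    def __post_init__(self):
        # the final score is a weighted average, so the two weights must sum to 1
        if abs(self.w_xgb + self.w_riske - 1.0) > 1e-9:
            raise ValueError("W_XGB + W_RISKE must equal 1")
        if self.n_bag < 1:
            raise ValueError("N_BAG must be at least 1")

    @property
    def n_synthetic(self) -> int:
        return self.n_events_synth + self.n_nonevents  # 100,000 with the defaults

    def xgb_params(self, bag: int) -> dict:
        """Keyword arguments for one bag's XGBClassifier (seed offset per bag is assumed)."""
        return {
            "max_depth": self.xgb_max_depth,
            "n_estimators": self.xgb_n_estimators,
            "learning_rate": self.xgb_lr,
            "device": self.device,
            "random_state": self.seed + bag,
        }
```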
Features (38 total): 30 base clinical variables + 8 derived terms (7 pairwise interactions and the lvef_low threshold indicator):
Base: ETA, female, PROTESI, ASCESSO, FISTOLA, PSEUDOANEURISMA, IPERTENSIONE, DIABETE, OBESITA, BPCO, FE, TOSSICODIPENDENTE, NUMERODIREDO, IABP_PRE, PATNEUROLOGICAPRECEDENTE, SHOCK, INTUBPRE, PAPSgt50, IRC, DIALISI, VEGETAZIONI, PERFORAZIONELEMBO, aortic, mitral, CABG, INFECTION_SEPSIS, MOF, periannular, saureus, fungal
Derived terms: shock×saureus, shock×MOF, fungal×PROTESI, lvef_low (FE<35), lvef_low×SHOCK, periannular×PROTESI, MOF×AKI, INTUBPRE×SHOCK
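The derived terms can be built row by row as below. This is a hypothetical sketch: the output column names are invented, "shock" in the list is assumed to be the `SHOCK` column, and `AKI` is assumed to be an available binary column even though it is not among the 30 base variables.

```python
# Hypothetical output names; the README lists the terms but not their column names.
INTERACTIONS = [
    ("shock_x_saureus", "SHOCK", "saureus"),
    ("shock_x_MOF", "SHOCK", "MOF"),
    ("fungal_x_PROTESI", "fungal", "PROTESI"),
    ("lvef_low_x_SHOCK", "lvef_low", "SHOCK"),
    ("periannular_x_PROTESI", "periannular", "PROTESI"),
    ("MOF_x_AKI", "MOF", "AKI"),  # assumes an AKI indicator exists outside the base list
    ("INTUBPRE_x_SHOCK", "INTUBPRE", "SHOCK"),
]

def add_derived_features(row: dict) -> dict:
    """Return a copy of one patient record with the 8 derived terms appended."""
    out = dict(row)
    out["lvef_low"] = int(row["FE"] < 35)  # FE = ejection fraction (%), threshold from the README
    for name, a, b in INTERACTIONS:
        out[name] = int(out[a]) * int(out[b])  # product of two binary flags is a binary flag
    return out
```

`lvef_low` is computed before the interaction loop because `lvef_low×SHOCK` depends on it.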
Architecture:
- 10 bagged XGBoost classifiers (GPU) trained on oversampled CTGAN synthetic data
- Isotonic + sigmoid calibration on real development patients
- Final blend: 0.60 × calibrated XGB + 0.40 × RISK-E
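Read literally, the final score is a fixed weighted average: SYS = 0.60 × calibrated XGB + 0.40 × RISK-E. The sketch below assumes the 10 bag outputs are averaged before calibration and uses a hand-written isotonic (pool-adjacent-violators) calibrator as a stand-in; the sigmoid half of the calibration step is not shown, and the exact ordering in 01_train.py may differ. `fit_isotonic` and `sys_blend` are hypothetical names.

```python
import bisect
from statistics import fmean

def fit_isotonic(x, y):
    """Pool-adjacent-violators fit; returns a non-decreasing step-function calibrator."""
    blocks = []  # each block: [sum_of_y, count, largest_x]
    for xi, yi in sorted(zip(x, y)):
        if blocks and blocks[-1][2] == xi:  # pool tied x values into one block
            blocks[-1][0] += yi
            blocks[-1][1] += 1
        else:
            blocks.append([yi, 1, xi])
        # merge backwards while adjacent block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, xmax = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = xmax
    knots = [b[2] for b in blocks]
    levels = [b[0] / b[1] for b in blocks]

    def calibrate(p):
        i = bisect.bisect_left(knots, p)  # first block whose largest x >= p
        return levels[min(i, len(levels) - 1)]
    return calibrate

def sys_blend(bag_probs, calibrate, riske_prob, w_xgb=0.60, w_riske=0.40):
    """0.60 x calibrated XGB + 0.40 x RISK-E; bag outputs averaged first (assumption)."""
    return w_xgb * calibrate(fmean(bag_probs)) + w_riske * riske_prob
```

Fitting the calibrator on real development patients rather than on the synthetic training data matches the architecture list above; the step-function output is a simplification of the interpolated mapping many libraries use.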
Comparator scores and their probability columns:
| Score | Probability column |
|---|---|
| EuroSCORE II | EuroSCORE_logistic_prob |
| EndoSCORE | EndoSCORE_prob |
| RISK-E | RISK_E_prob |
| AEPEI | AEPEI_prob |
| APORTEI | APORTEI_prob |
| STS-IE | STS_IE_prob |
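The headline comparisons against these columns use the DeLong test for two correlated AUCs measured on the same patients. The pure-Python sketch below implements DeLong's structural-components variance for the difference of two paired AUCs; it is an illustration, not the repository's implementation, and it assumes binary outcomes with at least two events and two non-events.

```python
import math

def _psi(a, b):
    """Kernel of the Mann-Whitney statistic: 1 if a > b, 0.5 on ties, else 0."""
    return 1.0 if a > b else 0.5 if a == b else 0.0

def _components(pos, neg):
    """DeLong structural components for one score."""
    v10 = [sum(_psi(p, q) for q in neg) / len(neg) for p in pos]  # one per event
    v01 = [sum(_psi(p, q) for p in pos) / len(pos) for q in neg]  # one per non-event
    return v10, v01

def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for the difference of two paired (same-patient) AUCs."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos), len(neg)
    a10, a01 = _components([score_a[i] for i in pos], [score_a[i] for i in neg])
    b10, b01 = _components([score_b[i] for i in pos], [score_b[i] for i in neg])
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    if var <= 0:
        raise ValueError("zero variance of the AUC difference")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p
```

Usage would pair the SYS prediction with one comparator column at a time, e.g. `delong_test(y, sys_prob, euroscore_prob)`, where both argument names are placeholders for columns from the predictions/ outputs.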
In the GIROC-TIMA blind realism test (TIMA-3), 10 blinded expert cardiac surgeons classified 50 cases as real or synthetic; the synthetic cases came from the CTGAN 1000-epoch dataset.
Results are reported in reports/TIMA3_realism.json.
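The statistic stored in reports/TIMA3_realism.json is not described here. A common summary for this kind of test, sketched below as an assumption, is the accuracy of the real-vs-synthetic calls compared with 50% chance using an exact two-sided binomial test. Pooling calls across raters treats them as independent even though every rater judged the same cases, so a per-rater or clustered analysis would be more conservative. The helper names are hypothetical.

```python
from math import comb

def binom_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial test: total probability of outcomes no likelier than k."""
    probs = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance so tied probabilities are included
    return min(1.0, sum(q for q in probs if q <= cutoff))

def realism_summary(truth, calls):
    """Pooled accuracy of real-vs-synthetic calls and its p-value against 50% chance."""
    correct = sum(t == c for t, c in zip(truth, calls))
    return correct / len(truth), binom_two_sided_p(correct, len(truth))
```

An accuracy close to 0.5 with a large p-value is consistent with raters being unable to tell synthetic records from real ones; it does not by itself show that the synthetic data preserve outcome relationships.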
FOR RESEARCH USE ONLY. This tool has not been approved by any regulatory agency. It must not be used for individual patient management decisions without additional clinical validation and institutional approval. The authors accept no liability for any clinical use of this software.
Gelsomino S, Parise G, Parise O, Di Mauro M, Actis Dato G, Lorusso R. SYS Score v2.0: A Synthetic-Data-Augmented Machine Learning Score for Surgical Mortality Prediction in Infective Endocarditis. European Heart Journal, 2025. doi:10.1093/eurheartj/[TBD]
MIT License — see LICENSE for full text.