TransformerECG achieved the highest macro-AUC (0.885) in a systematic benchmark of 5 deep learning architectures for multi-label ECG diagnosis. The benchmark uses the 21,799 12-lead ECGs in PTB-XL, which carry 27,765 superclass labels in total because a recording can belong to more than one superclass, and classifies across 5 cardiac superclasses under identical experimental conditions.
Published in the style of The New England Journal of Statistics in Data Science (2025).
ECG interpretation is critical for cardiac diagnosis but suffers from significant inter-reader variability among clinicians. This study conducts the first controlled benchmark of CNN, multi-resolution CNN, Transformer, graph-based, and wavelet-enhanced architectures on the PTB-XL dataset, all trained under identical preprocessing, splits, and evaluation conditions. Patient-level metadata (age, sex, recording site) was integrated directly into each model.
| Model | Macro AUC | Macro F1 | Label Accuracy |
|---|---|---|---|
| TransformerECG | 0.885 | 0.703 | 0.876 |
| ResNet1D (Baseline) | 0.823 | 0.740 | 0.888 |
| MultiResCNN | 0.794 | 0.656 | — |
| MRMT-GNN | 0.743 | — | — |
| WaveletAttention | 0.712 | — | — |
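TransformerECG led on macro-AUC (0.885 vs. 0.823), while ResNet1D led on macro F1 (0.740 vs. 0.703) and label accuracy (0.888 vs. 0.876). Macro-AUC measures how well a model ranks positives above negatives and does not depend on a decision threshold, whereas F1 and accuracy do, so strong ranking performance doesn't always translate to stronger threshold-based classification.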
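The gap between ranking metrics and threshold-based metrics can be illustrated with a small sketch. The labels and scores below are toy values invented for illustration, not outputs of any model in this benchmark, and the 0.5 and 0.25 thresholds are assumptions; the repo's own evaluation code may differ.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

# Toy multi-label ground truth: 6 recordings x 3 labels (hypothetical data).
y_true = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
])

# Scores that rank every positive above every negative for each label,
# but that all sit below a 0.5 decision threshold.
y_score = np.where(y_true == 1, 0.40, 0.10)

# Macro-AUC only looks at the ranking, so it is perfect here.
macro_auc = roc_auc_score(y_true, y_score, average="macro")

# Macro F1 at a fixed 0.5 threshold predicts nothing as positive.
y_pred_default = (y_score >= 0.5).astype(int)
macro_f1_default = f1_score(y_true, y_pred_default, average="macro", zero_division=0)

# Lowering the threshold (assumed value 0.25) recovers every label.
y_pred_tuned = (y_score >= 0.25).astype(int)
macro_f1_tuned = f1_score(y_true, y_pred_tuned, average="macro", zero_division=0)

print(macro_auc, macro_f1_default, macro_f1_tuned)
```

In this contrived case the same scores give a perfect macro-AUC and a macro F1 that swings between the two extremes depending only on the threshold, which is one plausible reason two models can swap places across the columns of the results table.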
| Architecture | Approach |
|---|---|
| ResNet1D | Residual 1D CNN — strong baseline for ECG morphology |
| MultiResCNN | Parallel convolutions (kernels 3, 7, 15) for multi-scale features |
| TransformerECG | Multi-head self-attention over 12-lead temporal embeddings |
| MRMT-GNN | Dilated convolutions + Transformer + graph neural network over label co-occurrence |
| WaveletAttention | Wavelet-inspired multi-scale filters + attention encoder |
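As a rough illustration of the parallel-convolution idea listed for MultiResCNN, the sketch below runs three 1D convolutions with kernel sizes 3, 7, and 15 side by side and concatenates their outputs. The class name `MultiResBlock`, the branch width of 16 channels, the batch norm and ReLU layers, and the 1,000-sample input length (10 s at 100 Hz) are all assumptions made for this example; the actual MultiResCNN in this repo may be structured differently.

```python
import torch
import torch.nn as nn


class MultiResBlock(nn.Module):
    """Hypothetical multi-resolution block: parallel 1D convolutions."""

    def __init__(self, in_channels=12, branch_channels=16, kernel_sizes=(3, 7, 15)):
        super().__init__()
        # padding = k // 2 keeps the time axis length unchanged for odd kernels,
        # so the branch outputs can be concatenated along the channel axis.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(in_channels, branch_channels, kernel_size=k, padding=k // 2),
                nn.BatchNorm1d(branch_channels),
                nn.ReLU(),
            )
            for k in kernel_sizes
        ])

    def forward(self, x):
        # x has shape (batch, leads, time)
        return torch.cat([branch(x) for branch in self.branches], dim=1)


block = MultiResBlock()
ecg = torch.randn(4, 12, 1000)  # 4 random stand-in recordings, 12 leads
features = block(ecg)
print(features.shape)  # (batch, 3 * branch_channels, time)
```

Short kernels respond to sharp, local morphology and long kernels to slower waveform components, so concatenating the branches gives later layers access to several time scales at once.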
PTB-XL: 21,799 clinically acquired 12-lead ECGs (100 Hz), annotated with 71 SCP-ECG statements whose diagnostic subset maps to 5 superclasses:
| Superclass | Count | % |
|---|---|---|
| Normal (NORM) | 9,514 | 43.6% |
| Myocardial Infarction (MI) | 5,469 | 25.1% |
| ST/T Abnormalities (STTC) | 5,235 | 24.0% |
| Conduction Disorders (CD) | 4,898 | 22.5% |
| Hypertrophy (HYP) | 2,649 | 12.2% |
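Because a recording can carry more than one superclass, the counts above sum to 27,765 labels across 21,799 recordings and the percentages sum to more than 100%. A minimal sketch of the multi-hot target encoding this implies is shown below; the five example recordings are invented for illustration, and the use of scikit-learn's `MultiLabelBinarizer` is an assumption rather than the repo's confirmed approach.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

SUPERCLASSES = ["NORM", "MI", "STTC", "CD", "HYP"]

# Hypothetical per-recording superclass sets (not real PTB-XL records).
records = [
    {"NORM"},
    {"MI", "STTC"},
    {"CD"},
    {"MI", "CD", "HYP"},
    set(),  # a recording with no diagnostic superclass
]

# Fixing `classes` pins the column order to SUPERCLASSES.
mlb = MultiLabelBinarizer(classes=SUPERCLASSES)
Y = mlb.fit_transform(records)  # shape: (n_recordings, 5), entries 0/1

# Per-class prevalence is computed over recordings, so the values
# can sum to more than 1 when recordings carry several labels.
prevalence = Y.mean(axis=0)
labels_per_record = Y.sum(axis=1)

print(Y)
print(dict(zip(SUPERCLASSES, prevalence)))
```

Each row of `Y` is the target vector for one recording, which is the shape a sigmoid output layer with one unit per superclass would be trained against.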
Splits: folds 1–8 train, fold 9 validation, fold 10 test (official stratified PTB-XL protocol).
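The fold-based split can be sketched with pandas as follows. The snippet assumes the metadata table exposes a `strat_fold` column with values 1 to 10, as in the `ptbxl_database.csv` file distributed with PTB-XL; the small DataFrame here is a hypothetical stand-in so the example runs without the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the PTB-XL metadata table: 20 recordings,
# two per fold. With the real dataset this would be read from the CSV.
meta = pd.DataFrame({
    "ecg_id": range(1, 21),
    "strat_fold": [i % 10 + 1 for i in range(20)],
})

# Folds 1-8 for training, fold 9 for validation, fold 10 for testing.
train = meta[meta["strat_fold"] <= 8]
val = meta[meta["strat_fold"] == 9]
test = meta[meta["strat_fold"] == 10]

# Sanity check: the three subsets must not share any recording.
assert set(train["ecg_id"]).isdisjoint(val["ecg_id"])
assert set(train["ecg_id"]).isdisjoint(test["ecg_id"])
assert set(val["ecg_id"]).isdisjoint(test["ecg_id"])

print(len(train), len(val), len(test))
```

Splitting on the precomputed fold column rather than sampling at random keeps results comparable with other work that follows the same protocol.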
Data: PTB-XL on PhysioNet — publicly available, not included in this repo due to size.
| Notebook | Description |
|---|---|
| 01_eda.ipynb | Exploratory data analysis: label distribution, demographics, ECG signal visualization |
Tech stack: Python, PyTorch, HuggingFace, wfdb, scikit-learn, pandas, numpy, matplotlib, seaborn, Google Colab.
Built as part of BA878 (Deep Learning for Healthcare) at Boston University with Bhuvan S. Gowda, Sumanth H. Kamath, and Rishabh R. Suravaram.