ECG Classification: Benchmarking Deep Learning Architectures

TransformerECG achieved the highest macro-AUC (0.885) in a systematic benchmark of 5 deep learning architectures for multi-label ECG diagnosis. All models were trained and evaluated on the 21,799 12-lead ECGs in PTB-XL, which carry 27,765 labels across 5 cardiac superclasses, under identical experimental conditions.

Published in the style of The New England Journal of Statistics in Data Science (2025).

Overview

ECG interpretation is critical for cardiac diagnosis but suffers from significant inter-reader variability among clinicians. This study conducts the first controlled benchmark of CNN, multi-resolution CNN, Transformer, graph-based, and wavelet-enhanced architectures on the PTB-XL dataset, with every model trained under identical preprocessing, splits, and evaluation conditions. Patient-level metadata (age, sex, recording site) was integrated directly into each model.

Results

Model	Macro AUC	Macro F1	Label Accuracy
TransformerECG	0.885	0.703	0.876
ResNet1D (Baseline)	0.823	0.740	0.888
MultiResCNN	0.794	0.656	—
MRMT-GNN	0.743	—	—
WaveletAttention	0.712	—	—

TransformerECG led on macro-AUC (0.885 vs. 0.823 for ResNet1D), while ResNet1D led on macro-F1 (0.740 vs. 0.703) and label accuracy (0.888 vs. 0.876). Strong ranking performance therefore does not always translate into stronger threshold-based classification.
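The gap between ranking metrics and threshold-based metrics can be reproduced on toy data. The sketch below is dependency-free and purely illustrative: the labels and scores are made up, the 0.5 threshold is an assumption, and "label accuracy" is assumed to mean element-wise accuracy over all record/label pairs, which the report may define differently.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(labels, preds):
    """F1 for one label from binary ground truth and binary predictions."""
    tp = sum(y == 1 and p == 1 for y, p in zip(labels, preds))
    fp = sum(y == 0 and p == 1 for y, p in zip(labels, preds))
    fn = sum(y == 1 and p == 0 for y, p in zip(labels, preds))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_score, threshold=0.5):
    """Return (macro AUC, macro F1, label accuracy) for multi-label outputs."""
    cols_true = list(zip(*y_true))    # one column per label
    cols_score = list(zip(*y_score))
    cols_pred = [[int(s >= threshold) for s in col] for col in cols_score]
    n_labels = len(cols_true)
    macro_auc = sum(auc(t, s) for t, s in zip(cols_true, cols_score)) / n_labels
    macro_f1 = sum(f1(t, p) for t, p in zip(cols_true, cols_pred)) / n_labels
    correct = sum(t == p for tc, pc in zip(cols_true, cols_pred) for t, p in zip(tc, pc))
    return macro_auc, macro_f1, correct / (n_labels * len(y_true))


# Toy data: 6 records, 2 labels (made up for illustration).
y_true = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [1, 1]]
# Ranks every positive above every negative, but most positives score below 0.5.
ranker = [[0.45, 0.05], [0.40, 0.15], [0.10, 0.45], [0.20, 0.35], [0.30, 0.25], [0.60, 0.55]]
# Ranks less cleanly, but its scores separate well around 0.5.
classifier = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.85, 0.8], [0.1, 0.85], [0.7, 0.7]]

print([round(m, 3) for m in evaluate(y_true, ranker)])      # [1.0, 0.5, 0.667]
print([round(m, 3) for m in evaluate(y_true, classifier)])  # [0.778, 0.857, 0.833]
```

With real model outputs, scikit-learn's `roc_auc_score` and `f1_score` (both accept `average="macro"`) would be the more usual route; tuning the decision threshold per label on the validation fold is one common way to narrow this kind of gap.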

Architecture Overview

Architecture	Approach
ResNet1D	Residual 1D CNN — strong baseline for ECG morphology
MultiResCNN	Parallel convolutions (kernels 3, 7, 15) for multi-scale features
TransformerECG	Multi-head self-attention over 12-lead temporal embeddings
MRMT-GNN	Dilated convolutions + Transformer + graph neural network over label co-occurrence
WaveletAttention	Wavelet-inspired multi-scale filters + attention encoder
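As an illustration of the multi-scale idea behind MultiResCNN, here is a minimal PyTorch sketch of one block with parallel convolutions at kernel sizes 3, 7, and 15. Only the kernel sizes come from the table above; the class name `MultiResBlock`, the branch width, and the use of batch normalization and ReLU are assumptions, and this is not the project's actual model code.

```python
import torch
import torch.nn as nn


class MultiResBlock(nn.Module):
    """Hypothetical block: parallel 1D convolutions whose outputs are concatenated."""

    def __init__(self, in_channels=12, branch_channels=16, kernel_sizes=(3, 7, 15)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding = k // 2 keeps the time axis unchanged for odd kernel sizes
                nn.Conv1d(in_channels, branch_channels, kernel_size=k, padding=k // 2),
                nn.BatchNorm1d(branch_channels),
                nn.ReLU(),
            )
            for k in kernel_sizes
        ])

    def forward(self, x):
        # x: (batch, leads, time); concatenate branch outputs along the channel axis
        return torch.cat([branch(x) for branch in self.branches], dim=1)


# A batch of four 12-lead, 10-second ECGs sampled at 100 Hz (1,000 samples each).
x = torch.randn(4, 12, 1000)
features = MultiResBlock()(x)
print(features.shape)  # torch.Size([4, 48, 1000])
```

Small kernels respond to sharp, short events and large kernels to slower waveform shapes; concatenating the branches lets later layers draw on all three scales at once.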

Dataset

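PTB-XL: 21,799 clinically acquired 12-lead ECGs (100 Hz), annotated with 71 SCP-ECG statements, of which the diagnostic statements are mapped to 5 superclasses. A record can carry more than one superclass, so the percentages below sum to more than 100%: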

Superclass	Count	%
Normal (NORM)	9,514	43.6%
Myocardial Infarction (MI)	5,469	25.1%
ST/T Abnormalities (STTC)	5,235	24.0%
Conduction Disorders (CD)	4,898	22.5%
Hypertrophy (HYP)	2,649	12.2%
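Because a record can belong to several superclasses, each target is naturally a multi-hot vector rather than a single class index. The sketch below shows one way to build such a vector. The function name `multi_hot`, the column order, and the six-entry mapping are assumptions made for illustration; the full code-to-superclass mapping ships with PTB-XL in `scp_statements.csv`.

```python
SUPERCLASSES = ["NORM", "MI", "STTC", "CD", "HYP"]  # column order is an assumption

# Illustrative excerpt only, not the full PTB-XL mapping.
SCP_TO_SUPERCLASS = {
    "NORM": "NORM",
    "IMI": "MI",
    "ASMI": "MI",
    "NDT": "STTC",
    "CLBBB": "CD",
    "LVH": "HYP",
}


def multi_hot(scp_codes):
    """Turn one record's SCP codes into a 0/1 vector over the 5 superclasses."""
    target = [0] * len(SUPERCLASSES)
    for code in scp_codes:
        superclass = SCP_TO_SUPERCLASS.get(code)
        if superclass is not None:  # non-diagnostic codes (e.g. rhythm) are skipped
            target[SUPERCLASSES.index(superclass)] = 1
    return target


# PTB-XL stores codes per record as a dict of code -> likelihood.
record_codes = {"IMI": 100.0, "CLBBB": 50.0, "SR": 0.0}
print(multi_hot(record_codes))  # [0, 1, 0, 1, 0]
```

This example record counts toward both the MI and CD rows of the table above, which is how 21,799 records can yield 27,765 superclass labels.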

Splits: folds 1–8 train, fold 9 validation, fold 10 test (official stratified PTB-XL protocol).
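The fold-based split can be sketched in a few lines. PTB-XL's metadata file assigns each record a `strat_fold` value from 1 to 10; the toy records and the helper name `assign_split` below are hypothetical and stand in for the real metadata table.

```python
def assign_split(strat_fold):
    """Map a PTB-XL fold number to the split used in this benchmark."""
    if 1 <= strat_fold <= 8:
        return "train"
    if strat_fold == 9:
        return "val"
    if strat_fold == 10:
        return "test"
    raise ValueError(f"unexpected fold: {strat_fold}")


# Toy metadata: 30 made-up records cycling through folds 1..10.
records = [{"ecg_id": i, "strat_fold": (i % 10) + 1} for i in range(30)]

splits = {"train": [], "val": [], "test": []}
for rec in records:
    splits[assign_split(rec["strat_fold"])].append(rec["ecg_id"])

print({name: len(ids) for name, ids in splits.items()})
# {'train': 24, 'val': 3, 'test': 3}
```

Splitting on the published folds rather than at random keeps the results comparable with other work that follows the same protocol.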

Data: PTB-XL on PhysioNet — publicly available, not included in this repo due to size.

Notebooks

Notebook	Description
`01_eda.ipynb`	Exploratory data analysis — label distribution, demographics, ECG signal visualization

Tech Stack

Python · PyTorch · HuggingFace · wfdb · scikit-learn · pandas · numpy · matplotlib · seaborn · Google Colab

Team

Built as part of BA878 (Deep Learning for Healthcare) at Boston University with Bhuvan S. Gowda, Sumanth H. Kamath, and Rishabh R. Suravaram.

Links

Final Report · Presentation · PTB-XL Dataset

