EEG-GCNN: Dataset, Task, Models, and Neurological Disease Detection Pipeline#3
Merged
Conversation
* Fixed repo to be able to run TUEV/TUAB + updated example scripts
* Args need to be passed correctly
* Minor fixes and precomputed STFT logic
* Fix the test files to reflect codebase changes
* Args update
* test script fixes
* dataset path update
* fix contrawr - small change
* divide by 0 error
* Incorporate tfm logic
* Fix label stuff
* tuab fixes
* fix metrics
* aggregate alphas
* Fix splitting and add tfm weights
* fix tfm+tuab
* updates scripts and haoyu splitter
* fix conflict
* Remove weightfiles from tracking and add to .gitignore — weight files are large binaries distributed separately; untrack all existing .pth files under weightfiles/ and add weightfiles/ to .gitignore so they are excluded from future commits and the PR. (Made-with: Cursor)
…uiuc#904)

* feat: add optional dependency groups for graph and NLP extras (sunlabuiuc#890)

  Add [project.optional-dependencies] to pyproject.toml so users can install domain-specific dependencies via pip extras:

      pip install pyhealth[graph]  # torch-geometric for GraphCare, KG
      pip install pyhealth[nlp]    # editdistance, rouge_score, nltk

  The codebase already uses try/except ImportError with HAS_PYG flags for torch-geometric, and the NLP metrics define their required versions in each scorer class. This change exposes those dependencies through standard Python packaging so pip can resolve them. Version pins match the requirements declared in the code:

  - editdistance~=0.8.1 (pyhealth/nlp/metrics.py:356)
  - rouge_score~=0.1.2 (pyhealth/nlp/metrics.py:415)
  - nltk~=3.9.1 (pyhealth/nlp/metrics.py:397)
  - torch-geometric>=2.6.0 (compatible with PyTorch 2.7)

  Closes sunlabuiuc#890

* fix: move optional-dependencies after scalar fields to fix TOML structure

  Move [project.optional-dependencies] from between dependencies and license (line 49) to after keywords (line 62), before [project.urls]. In TOML, a sub-table header like [project.optional-dependencies] closes the parent [project] table, so placing it before license and keywords caused those fields to be excluded from [project]. This broke CI validation. Verified with tomllib that all project fields (name, license, keywords, optional-dependencies, urls) parse correctly under [project].
* init commit
* RNN memory fix
* add example scripts here
* more bug fixes?
* commit to see new changes
* add test cases
* fix basemodel leakage of args
* fixes to tests and examples
* more examples
* reduce unnecessary checks, enable crashing when a cache is invalid
* fix nested sequence rnn problems
* fixes for the ConCare and Transformer models exploding in memory
* fix concare merge conflict again
* fix for 3D channel for CNN
* update and delete defunct docs
* better loc comparisons and also a bunch of model fixes, hopefully
* test case updates to match our bug fixes
* fix instability in calibration tests for CP

tl;dr: fixes a variety of dataset loading and run bugs and the splits for TUEV/TUAB, and adds a good number of performance fixes for Transformer and ConCare. We can always iterate on our fixes later.
Bypassing a PR review, because of speed/reviewer bottleneck reasons.
Implements the EEG-GCNN paper (Wagh & Varatharajah, ML4H @ NeurIPS 2020) as a PyHealth 2.0 contribution with dataset, task, tests, docs, and an ablation study script. New files:

- pyhealth/datasets/eeg_gcnn.py — EEGGCNNDataset (TUAB normal + MPI LEMON)
- pyhealth/datasets/configs/eeg_gcnn.yaml — YAML config
- pyhealth/tasks/eeg_gcnn_nd_detection.py — EEGGCNNDiseaseDetection task (PSD features, spatial/functional/combined adjacency, configurable bands)
- tests/test_eeg_gcnn.py — 23 tests using synthetic data
- examples/eeg_gcnn_nd_detection_gcn.py — 3 ablation experiments
- RST docs for dataset and task
"coherence" must be passed as "coh" to spectral_connectivity_epochs.
The ablation script can now run without real TUAB/LEMON data using `--demo`. Generates 40 synthetic patients with reproducible random PSD features and adjacency matrices, runs all 3 experiments through the full GCN training pipeline. Also fixes a bug where roc_auc_score failed on single-column y_prob output from binary classification.
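The roc_auc_score failure mode mentioned above comes from sklearn expecting a 1-D score vector for binary tasks. A minimal sketch of the kind of guard that fixes it (binary_auroc is a hypothetical helper, not PyHealth API):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def binary_auroc(y_true, y_prob):
    """AUROC that tolerates (N,), (N, 1), and (N, 2) probability arrays.

    sklearn's roc_auc_score wants a 1-D score vector for binary labels,
    so a single-column y_prob is flattened and a two-column softmax
    output is reduced to the positive-class column.
    """
    y_prob = np.asarray(y_prob)
    if y_prob.ndim == 2:
        y_prob = y_prob[:, -1] if y_prob.shape[1] == 2 else y_prob.ravel()
    return roc_auc_score(y_true, y_prob)

# (N, 1) single-column output no longer crashes and matches the 1-D score
assert binary_auroc([0, 0, 1, 1], [[0.1], [0.4], [0.35], [0.8]]) == 0.75
# (N, 2) softmax output reduces to the positive-class column
assert binary_auroc([0, 1], [[0.9, 0.1], [0.2, 0.8]]) == 1.0
```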
Rename task schema keys (psd_features→node_features, adjacency→adj_matrix) to match the EEGGraphConvNet model interface. Add self-contained combined pipeline script that runs both GCN and GAT models with demo mode, requiring only torch + torch_geometric (no PyHealth install needed).
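As a rough, library-free sketch of what consumes those two schema keys: the real EEGGraphConvNet is a torch_geometric model, but its shape contract (a node_features matrix of 8 electrodes by 6 PSD bands plus an 8x8 adj_matrix) can be mimicked in NumPy with a symmetrically normalized two-layer graph convolution. All dimensions and weights here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_bands, hidden = 8, 6, 16        # 8 bipolar channels, 6 PSD bands

node_features = rng.random((n_nodes, n_bands))   # schema key "node_features"
adj_matrix = rng.random((n_nodes, n_nodes))      # schema key "adj_matrix"
adj_matrix = (adj_matrix + adj_matrix.T) / 2     # symmetric edge weights

a_hat = adj_matrix + np.eye(n_nodes)             # add self-loops
d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt         # D^-1/2 (A + I) D^-1/2

w1 = rng.standard_normal((n_bands, hidden))      # toy layer weights
w2 = rng.standard_normal((hidden, 2))            # 2 classes: diseased / healthy

h = np.maximum(a_norm @ node_features @ w1, 0.0) # GCN layer 1 + ReLU
logits = (a_norm @ h @ w2).mean(axis=0)          # GCN layer 2 + mean pooling

assert h.shape == (n_nodes, hidden)
assert logits.shape == (2,)
```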
…sunlabuiuc#935) The v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset) no longer accepts code_mapping, dev, or refresh_cache parameters. These were part of the legacy BaseEHRDataset API. Update README.rst, example scripts, and leaderboard utilities to use the current v2.0 API. Note: task file docstrings and pyhealth/datasets/mimicextract.py still reference code_mapping but are left for separate PRs since mimicextract.py has not yet been migrated to v2.0. Fixes sunlabuiuc#535
…ncy band ablations.
…d not also on training to reduce the time needed
* rename arg name for chefer
* Initial attempts to fix the interpretability target_class_idx
* Support negative prediction for interpretability metric
* Fix tests
* Fix more tests
* Revert "Support negative prediction for interpretability metric." (reverts commit fe8c8ad)
* Reapply "Support all samples for interpretability metric"
* Initial attempt for the filter
* Fixup
* Fix sample_class handling
* fixup
* fix test
* Fix arg name
* Add example
* fix docs
…xamples/eeg_gatcnn
* feat: migrate GRASP model from PyHealth 1.0 to 2.0 API
  Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
  Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
  Co-Authored-By: ddhangdd <dfung2@wisc.edu>
* feat: add GRASP mortality prediction notebook and fix cluster_num
* Restore code_mapping support in SequenceProcessor for PyHealth 2.0 — adds an optional code_mapping parameter to SequenceProcessor that maps granular medical codes to grouped vocabularies (e.g. ICD9CM→CCSCM) before building the embedding table. Resolves the functional gap from the 1.x→2.0 rewrite where code_mapping was removed. Ref sunlabuiuc#535
* Add RNN baseline and code_mapping comparison notebooks for MIMIC-III — two identical notebooks for A/B testing the code_mapping impact on mortality prediction. The only difference is the schema override in Step 2. Both use seed=42 for reproducible splits.
* fix(tasks): extract NDC codes instead of drug names for prescription mapping — event.drug returns drug names (e.g. "Aspirin"), which produce zero matches in CrossMap NDC→ATC; event.ndc returns actual NDC codes, enabling 3/3 feature mapping for the mortality and readmission tasks.
* test(tasks): add tests verifying NDC extraction in drug tasks — checks that the mortality and readmission task processors build their vocabulary from NDC codes (numeric strings) rather than drug names (e.g. "Aspirin"), confirming the event.drug → event.ndc fix works correctly.
* fix(tasks): fix missed MortalityPredictionMIMIC4 event.drug and update docs — fix event.drug → event.ndc in MortalityPredictionMIMIC4 (line 282); update readmission task docstrings to reflect NDC extraction.
* fix(tasks): fix DrugRecommendationMIMIC3 to extract NDC codes — DrugRecommendationMIMIC3 used prescriptions/drug (drug names) via a Polars column select; changed to prescriptions/ndc to match the MIMIC-4 variant and enable NDC→ATC code mapping.
* fix(models): guard RNNLayer and ConCare against zero-length sequences — RNNLayer: clamp sequence lengths to a minimum of 1 so pack_padded_sequence does not crash on all-zero masks, matching TCNLayer (tcn.py:186). ConCare: guard the covariance divisor with max(n−1, 1) to prevent ZeroDivisionError when attention produces single-element features. Both edge cases are triggered when code_mapping collapses vocabularies and some patients have all codes map to <unk>, producing all-zero embeddings and all-zero masks.
* docs: add docstrings to SequenceProcessor class and fit method
* docs: add docstrings, type hints, and fix test dims for GRASP module
* feat: add GRASP mortality prediction notebooks for baseline and code_mapping — the baseline notebook runs GRASP with raw ICD-9/NDC codes; the code_mapping notebook collapses the vocab via ICD9CM→CCSCM, ICD9PROC→CCSPROC, NDC→ATC for trainable embeddings on full MIMIC-III.
* fix(models): guard ConCare and GRASP against batch_size=1 crashes
  - ConCare FinalAttentionQKV: a bare .squeeze() removed the batch dim when batch_size=1, causing an IndexError in softmax. Use .squeeze(-1) and .squeeze(1) to target only the intended dimensions.
  - ConCare cov(): division by zero when x.size(1)==1. Guard with max().
  - GRASP grasp_encoder: remove a stale torch.squeeze(hidden_t, 0) that collapsed [1, hidden] to [hidden] with batch_size=1. Both RNNLayer and ConCareLayer already return [batch, hidden].
  - GRASP random_init: clamp num_centers to num_points to prevent a ValueError when cluster_num > batch_size.
* Add code_mapping as a task __init__ argument — allow tasks to accept a code_mapping dict that upgrades input_schema entries so SequenceProcessor maps raw codes (e.g. ICD9CM) to grouped vocabularies (e.g. CCSCM) at fit/process time, avoiding manual schema manipulation after task construction. Adds a code_mapping parameter to BaseTask.__init__(), threads **kwargs + super().__init__() through all task subclasses with existing __init__ methods (4 readmission tasks, 1 multimodal mortality task), and adds 17 tests covering SequenceProcessor mapping and task-level code_mapping initialization.
* Update code_mapping notebook to use the task init argument — replace the manual task.input_schema override with the new code_mapping parameter on MortalityPredictionMIMIC3().
* feat(examples): add ConCare hyperparameter grid sweep script — mirrors the GRASP+ConCare mortality notebook pipeline exactly (same tables, split, seed, metrics) but sweeps 72 configurations of embedding_dim, hidden_dim, cluster_num, lr, and weight_decay. Results are logged to sweep_results.csv. Supports --root for pointing at local MIMIC-III, --code-mapping, --dev, and --monitor.
* chore(sweep): increase early stopping patience from 10 to 15 epochs — smaller ConCare configs (embedding_dim=8/16) may learn more slowly and need more epochs before plateauing.
* Initial plan
* fix: filter falsy NDCs, guard None tokens in process(), fix NDC regex
* refactor(sweep): rename and generalize sweep script for all backbones — rename sweep_concare_grasp.py → sweep_grasp.py. Now supports --block GRU|ConCare|LSTM with per-backbone default grids, --resume for crash recovery, --grid JSON override, auto-dated output dirs (sweep/{BLOCK}_{YYYYMMDD}_{HHMMSS}_{mapping}/), and config.json saved alongside results for reproducibility.
* test(sweep): add unit and integration tests for sweep_grasp utilities — covers grid building, combo hashing, CSV resume parsing, output directory naming, and end-to-end single-config runs for GRU and ConCare on synthetic data (13 tests, all passing).
* docs(sweep): add tmux copy-paste instructions for each paper run
* chore(examples): clean up examples, remove util script
* Delete tests/core/test_grasp.py — we removed the grasp script from examples, so the test was dropped
* Revert "Delete tests/core/test_grasp.py" (reverts commit 0d95758)
* fix: remove orphaned sweep test, restore grasp tests
* feat(grasp): add static_key support for demographic features with tests
* fix(test): add valid NDC to test prescriptions so the readmit test produces both labels

---------

Co-authored-by: lookman-olowo <lookmanolowo@hotmail.com>
Co-authored-by: christiana-beard <christyanamarie116@gmail.com>
Co-authored-by: ddhangdd <dfung2@wisc.edu>
Co-authored-by: Lookman Olowo <42081779+lookman-olowo@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ddhangdd <43976109+ddhangdd@users.noreply.github.com>
Co-authored-by: ddhangdd <desmondfung123@gmail.com>
Co-authored-by: Colton Loew <loewcx@illinois.edu>
Co-authored-by: lookman-olowo <lookman-olowo@github.com>
Co-authored-by: christiana-beard <christiana-beard@github.com>
Co-authored-by: ddhangdd <ddhangdd@github.com>
Co-authored-by: lookman-olowo <lookman-olowo@users.noreply.github.com>
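The code_mapping idea threaded through these commits can be sketched in a few lines. This is a toy illustration, not the pyhealth.medcode CrossMap API: granular codes are collapsed into grouped vocabularies before the embedding table is built, so the vocabulary shrinks and rare codes share statistics (unmapped codes fall back to <unk>, which is exactly what triggered the all-zero-mask guards above).

```python
# Toy subset of an ICD9CM -> CCSCM grouping (hypothetical mapping table)
ICD9_TO_CCS = {"4280": "108", "42821": "108", "25000": "49"}

def map_codes(codes, mapping, unk="<unk>"):
    """Map each raw code to its grouped vocabulary; unmapped codes become <unk>."""
    return [mapping.get(c, unk) for c in codes]

raw = ["4280", "42821", "25000", "V4501"]
mapped = map_codes(raw, ICD9_TO_CCS)

assert mapped == ["108", "108", "49", "<unk>"]
assert len(set(mapped)) < len(set(raw))   # vocabulary collapsed
```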
…peline docs, and tests
…LEMON CSV generation
…LEMON data, expand pipeline docs with dataset processing section
…to match EEGGCNNDiseaseDetection class name
Completely rewrote the pipeline doc to cover both data paths (raw EEG
and FigShare pre-computed), both models (GCN float32, GAT float64),
all signal-processing steps, output schema, training/heldout-eval
instructions, adjacency ablation results (α ∈ {0.0, 0.5, 0.75, 1.0}),
band ablation usage, and complete API reference for all six classes.
Also adds the pre_compute_gcnn.py / pre_compute_gatcnn.py quick-start
commands that were missing from the previous version.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Adjacency comparison: remove alpha=0.75 (not run); keep only the three actual configs — functional (0.0), combined (0.5), spatial (1.0). Combined (the paper default) is now correctly marked as best.
- Band ablation: rename the section to "Spectral Frequency Analysis" and describe the leave-one-out / keep-one-in methodology that run_band_ablation.py actually performs (13 conditions, inference-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… comparison

Add separate GCN and GAT result tables for all four alpha configs (0.0, 0.5, 0.75, 1.0) with AUC, Youden's J, Bal. Acc, and Recall. Add a Paper Table 2 comparison row (Shallow EEG-GCNN: AUC=0.90±0.02, Bal.Acc=0.83±0.02). Update narrative: GCN α=0.75 is best; GCN beats GAT by ~6% AUC across all configs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add full 13-condition LOO/KOI results table (AUC, Bal.Acc, F1, mean±std across 10 folds). Key finding: LOO low_gamma causes largest AUC drop (0.898→0.594); KOI low_gamma is best single-band (AUC=0.656), confirming low gamma (30–50 Hz) is the most discriminative band. Update code examples to highlight low_gamma as the band of interest. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… slides

- LOO delta: 0.772→0.802 AUC, 0.728→0.746 Bal.Acc
- LOO high gamma: completely wrong (0.623→0.785 AUC, 0.605→0.737 Bal.Acc, 0.644→0.800 F1)
- KOI delta/theta/alpha/beta: all rows were shifted/swapped, now corrected
- KOI low gamma: 0.656→0.723 AUC, 0.633→0.682 Bal.Acc, 0.840→0.781 F1
- KOI beta F1: 0.915→0.856, Bal.Acc std: 0.083→0.138
- Key finding text: KOI low gamma AUC updated to 0.723

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- GCN F1: functional=0.493, combined=0.503, spatial-heavy=0.508, spatial=0.481
- GAT F1: functional=0.521, combined=0.484, spatial-heavy=0.473, spatial=0.475

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ing has been removed.
…ents to include in the precompute step.
…nloaded recently have a different external directory numbering
- Clarify pipeline diagram: two independent paths (raw EEG vs FigShare), both feeding EEGGCNNDataset + EEGGCNNClassification with the same 5-array schema. Adds a path comparison table.
- Data Sources: split into "Path A — Raw inputs" (TUAB, LEMON) and "Path B — Pre-computed input" (FigShare); each entry is now labelled raw vs precomputed.
- Training (GCN): clarify that it works with either Path A output or the FigShare download.
- Adjacency ablation: remove the numerical results tables, keep methodology only; add α=0.25 to the configs (5 total: 0.0, 0.25, 0.5, 0.75, 1.0).
- Spectral analysis: remove the numerical results table, keep methodology only.
- Heldout evaluation: remove the paper-matching results block; point to the slides.

All numerical results now live in the project slides.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Health into eeg-gcnn-contribution
* dl4h final project kobeguo2 - CaliForest
* Update CaliForest to require explicit fit before inference
* Remove unused logit_scale from CaliForest
- Update Robert Coffey's NetID from racoffey2 to rc37
- Rename heading from 'Signal Processing (EEGGCNNDiseaseDetection)' to 'Signal Processing'; add a note clarifying it is implemented in both EEGGCNNRawDataset.precompute_features() (batch) and EEGGCNNDiseaseDetection.__call__() (streaming)
- Correct PSD band names and ranges to match the actual code and FigShare arrays: delta/theta/alpha/beta/low_gamma/high_gamma with boundaries matching _BAND_RANGES (7.5, 13.0, 30.0, 40.0 Hz)
- Attribute the graph adjacency alpha formula to EEGGCNNDataset, where it lives
- Fix adjacency description: 'geodesic distance' (not 'arc length on unit sphere')
- Remove non-existent download_lemon.py reference from Quick Start

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
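The per-band Welch PSD features those band names refer to can be sketched as follows. The band edges below are illustrative assumptions (the authoritative ranges live in _BAND_RANGES in the codebase; the commit above only confirms the 7.5, 13.0, 30.0 and 40.0 Hz boundaries): compute a Welch PSD for one 10 s window and average the spectrum inside each band.

```python
import numpy as np
from scipy.signal import welch

# Illustrative band edges only; see _BAND_RANGES in the code for the real ones.
BANDS = {
    "delta": (1.0, 4.0), "theta": (4.0, 7.5), "alpha": (7.5, 13.0),
    "beta": (13.0, 30.0), "low_gamma": (30.0, 40.0), "high_gamma": (40.0, 50.0),
}

fs = 250.0                                      # pipeline resampling rate
rng = np.random.default_rng(0)
window = rng.standard_normal(int(10 * fs))      # one 10 s channel window (synthetic)

f, psd = welch(window, fs=fs, nperseg=512)      # Welch power spectral density

# One scalar feature per band: mean PSD over the band's frequency bins
features = {name: psd[(f >= lo) & (f < hi)].mean()
            for name, (lo, hi) in BANDS.items()}

assert len(features) == 6
assert all(v > 0 for v in features.values())    # power is strictly positive
```

Stacking these six scalars over the eight bipolar channels yields the 8x6 node-feature matrix the graph models consume.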
jburhan pushed a commit that referenced this pull request on Apr 21, 2026:
Brings in post-PR-#3 changes from the EEG-GCNN pipeline branch:

- docs: updated eeg_gcnn_pipeline.rst (band names, adjacency attribution, EEGGCNNDiseaseDetection vs EEGGCNNRawDataset clarification)
- examples: renamed scripts to the required convention (eeg_gcnn_classification_gcn_training.py, _evaluation.py, etc.)
- examples: added sample raw + precomputed data for quick-start testing
- tests: consolidated the 41-test suite into tests/core/test_eeg_gcnn_dataset.py (merged raw-path tests, fixed a stale import, removed tests/test_eeg_gcnn.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributors
Type of Contribution
Option 4: Full Pipeline — Dataset + Task + Model (60 pts)
Paper
Wagh & Varatharajah (2020) — EEG-GCNN: Augmenting Electroencephalogram-based Neurological Disease Diagnosis using a Domain-guided Graph Convolutional Neural Network, ML4H @ NeurIPS 2020
Summary
End-to-end EEG-GCNN pipeline for neurological disease detection from EEG recordings, replicating and extending Wagh & Varatharajah (2020).
Dataset & Task (Jimmy Burhan — jburhan2)
- EEGGCNNRawDataset — raw EEG loader for TUAB (1,385 clinically-normal patients, label 0) + LEMON (208 healthy volunteers, label 1): resamples to 250 Hz, applies a 1 Hz high-pass + 60/50 Hz notch filter, derives 8 bipolar channels, cuts 10 s non-overlapping windows, computes 6-band Welch PSD features (delta/theta/alpha/beta/low_gamma/high_gamma), and writes 5 FigShare-format arrays via precompute_features()
- EEGGCNNDataset — FigShare pre-computed dataset (1,593 subjects, 225,334 windows); builds graph adjacency via a tunable blend of geodesic distance and spectral coherence: edge_weight = α·geodesic + (1−α)·coherence
- EEGGCNNDiseaseDetection — per-sample streaming task with configurable adjacency type (spatial / functional / combined) and band subset; drives both ablation studies
- EEGGCNNClassification — window-level classification task for FigShare data

Models & Training (Robert Coffey — rc37)
- EEGGraphConvNet — two-layer GCN classifier (reproduces the paper's Shallow EEG-GCNN, float32)
- EEGGATConvNet — two-layer graph attention network classifier (float64)

Ablation Studies (Original Extensions)
Ablation 1 — Adjacency Type (α sweep, GCN)
Varied α ∈ {0.0, 0.25, 0.5, 0.75, 1.0} on the full FigShare dataset (477 test patients, 70/30 split, 10-fold CV). The paper uses only α=0.5; sweeping the full range is our novel contribution:
α=1.0 (spatial-only) collapses performance — functional connectivity is essential. α=0.75 best for auroc_patient and F1; α=0.50 best for balanced accuracy — both within each other's error margin.
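The α blend swept in this ablation can be sketched directly from the edge-weight formula. Inputs here are random placeholders; in the pipeline the geodesic term comes from electrode positions and the coherence term from spectral connectivity.

```python
import numpy as np

def blended_adjacency(geodesic, coh, alpha):
    """edge_weight = alpha * geodesic + (1 - alpha) * coherence."""
    return alpha * geodesic + (1.0 - alpha) * coh

rng = np.random.default_rng(0)
geo = rng.random((8, 8))   # placeholder spatial (geodesic) weights
coh = rng.random((8, 8))   # placeholder functional (coherence) weights

# The three named configurations from the sweep:
assert np.allclose(blended_adjacency(geo, coh, 0.0), coh)        # functional-only
assert np.allclose(blended_adjacency(geo, coh, 1.0), geo)        # spatial-only
assert np.allclose(blended_adjacency(geo, coh, 0.5), (geo + coh) / 2)  # combined (paper default)
```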
Ablation 2 — Spectral Frequency Analysis (GCN, inference-only)
Leave-one-out (LOO) and keep-one-in (KOI) across all 6 bands using trained GCN checkpoints — no retraining required. Low gamma is the single most important band (removing it drops AUROC from 0.898 → 0.594; keeping only it recovers 0.723). All bands make a meaningful contribution — none is redundant. Numerical results in project slides.
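Because the ablation is inference-only, each condition is just a feature mask applied before evaluating the trained checkpoint. A sketch with a hypothetical helper (the 13 conditions are presumably 6 LOO + 6 KOI + the all-band baseline): LOO zeroes one band's column, KOI zeroes every column except one.

```python
import numpy as np

BANDS = ["delta", "theta", "alpha", "beta", "low_gamma", "high_gamma"]

def mask_bands(node_features, band, mode):
    """Zero out PSD columns for inference-only band ablation.

    'loo' removes the named band; 'koi' keeps only that band. The trained
    model is then evaluated on the masked features without retraining.
    """
    x = node_features.copy()
    idx = BANDS.index(band)
    if mode == "loo":
        x[:, idx] = 0.0
    else:  # "koi"
        keep = x[:, idx].copy()
        x[:] = 0.0
        x[:, idx] = keep
    return x

feats = np.ones((8, 6))   # 8 channels x 6 bands, dummy features
assert mask_bands(feats, "low_gamma", "loo").sum() == 8 * 5   # one band removed
assert mask_bands(feats, "low_gamma", "koi").sum() == 8       # one band kept
assert feats.sum() == 48                                      # input left untouched
```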
Best Results (GAT, tuned learning rate)
GAT with tuned LR outperforms GCN on every patient-level metric. End-to-end run on locally pre-computed features matched FigShare-trained results.
File Guide
- pyhealth/datasets/eeg_gcnn.py — EEGGCNNDataset — FigShare pre-computed loader
- pyhealth/datasets/eeg_gcnn_raw.py — EEGGCNNRawDataset — raw TUAB + LEMON loader with precompute_features()
- pyhealth/datasets/configs/eeg_gcnn.yaml
- pyhealth/datasets/configs/eeg_gcnn_raw.yaml
- pyhealth/tasks/eeg_gcnn_classification.py — EEGGCNNClassification — window-level task for the FigShare path
- pyhealth/tasks/eeg_gcnn_disease_detection.py — EEGGCNNDiseaseDetection — streaming task for the raw path
- pyhealth/models/eeg_gcnn.py — EEGGraphConvNet (GCN)
- pyhealth/models/eeg_gatcnn.py — EEGGATConvNet (GAT)
- examples/eeg_gcnn/pre_compute.py
- examples/eeg_gcnn/training_pipeline_shallow_gcnn.py
- examples/eeg_gcnn/heldout_test_run_gcnn.py
- examples/eeg_gcnn/run_band_ablation.py
- examples/eeg_gatcnn/pre_compute_gatcnn.py
- examples/eeg_gatcnn/training_pipeline_shallow_gatcnn.py
- examples/eeg_gatcnn/heldout_test_run_gatcnn.py
- pyhealth/tests/test_eeg_gcnn.py
- docs/api/eeg_gcnn_pipeline.rst
- docs/api/datasets/pyhealth.datasets.EEGGCNNDataset.rst
- docs/api/datasets/pyhealth.datasets.EEGGCNNRawDataset.rst
- docs/api/models/pyhealth.models.EEGGraphConvNet.rst
- docs/api/models/pyhealth.models.EEGGATConvNet.rst
- docs/api/tasks/pyhealth.tasks.EEGGCNNClassification.rst
- docs/api/tasks/pyhealth.tasks.eeg_gcnn_disease_detection.rst

Test Plan
pytest pyhealth/tests/test_eeg_gcnn.py -v # 23 tests, all passing, ~12 seconds, no real EEG data required