EEG-GCNN: Dataset, Task, Models, and Neurological Disease Detection Pipeline #3

Merged
jburhan merged 65 commits into master from eeg-gcnn-contribution on Apr 20, 2026

Conversation


@jburhan jburhan commented Apr 18, 2026

Contributors

  • Jimmy Burhan — jburhan2 (Dataset + Task)
  • Robert Coffey — rc37 (Models + Training)

Type of Contribution

Option 4: Full Pipeline — Dataset + Task + Model (60 pts)

Paper

Wagh & Varatharajah (2020) — EEG-GCNN: Augmenting Electroencephalogram-based Neurological Disease Diagnosis using a Domain-guided Graph Convolutional Neural Network, ML4H @ NeurIPS 2020

Summary

End-to-end EEG-GCNN pipeline for neurological disease detection from EEG recordings, replicating and extending Wagh & Varatharajah (2020).

Dataset & Task (Jimmy Burhan — jburhan2)

  • EEGGCNNRawDataset — raw EEG loader for TUAB (1,385 clinically-normal patients, label 0) + LEMON (208 healthy volunteers, label 1): resamples to 250 Hz, 1 Hz high-pass + 60/50 Hz notch filter, 8 bipolar channels, 10 s non-overlapping windows, 6-band Welch PSD features (delta/theta/alpha/beta/low_gamma/high_gamma), writes 5 FigShare-format arrays via precompute_features()
  • EEGGCNNDataset — FigShare pre-computed dataset (1,593 subjects, 225,334 windows); builds graph adjacency via tunable blend of geodesic distance and spectral coherence: edge_weight = α·geodesic + (1−α)·coherence
  • EEGGCNNDiseaseDetection — per-sample streaming task with configurable adjacency type (spatial / functional / combined) and band subset; drives both ablation studies
  • EEGGCNNClassification — window-level classification task for FigShare data
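The 6-band PSD featurization described above can be sketched as follows. This is a simplified numpy periodogram standing in for the Welch estimator used by precompute_features(); the band edges here are illustrative assumptions, not the exact _BAND_RANGES boundaries:

```python
import numpy as np

# Band names from the pipeline; the exact edges below are assumptions
# for this sketch, not the pipeline's _BAND_RANGES values.
BANDS = {
    "delta": (1.0, 4.0), "theta": (4.0, 7.5), "alpha": (7.5, 13.0),
    "beta": (13.0, 30.0), "low_gamma": (30.0, 40.0), "high_gamma": (40.0, 50.0),
}

def band_psd_features(window: np.ndarray, fs: float = 250.0) -> np.ndarray:
    """Mean log-PSD per channel in each band for one (channels, samples) window.

    A plain periodogram stands in for Welch averaging; the (channels, 6)
    output matches the 6-band feature layout described above.
    """
    n = window.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(window, axis=-1)) ** 2 / (fs * n)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=-1))
    return np.log10(np.stack(feats, axis=-1) + 1e-12)

# 8 bipolar channels, one 10 s window at 250 Hz -> (8, 6) feature matrix
x = np.random.default_rng(0).standard_normal((8, 2500))
print(band_psd_features(x).shape)  # (8, 6)
```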

Models & Training (Robert Coffey — rc37)

  • EEGGraphConvNet — two-layer GCN classifier (reproduces paper's Shallow EEG-GCNN, float32)
  • EEGGATConvNet — two-layer graph attention network classifier (float64)
  • Training pipelines, holdout evaluation scripts, band frequency ablation script
  • 23 unit tests (synthetic fixtures, no real EEG required)
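The propagation rule a two-layer GCN like EEGGraphConvNet stacks can be illustrated with a dense numpy sketch; the actual model uses torch_geometric layers, so this is only the underlying math, not the implementation:

```python
import numpy as np

def gcn_layer(h: np.ndarray, adj: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A+I) D^-1/2 H W).

    Dense numpy illustration of symmetric-normalized propagation;
    the real EEGGraphConvNet uses torch_geometric, not this code.
    """
    a_hat = adj + np.eye(adj.shape[0])            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)        # ReLU

rng = np.random.default_rng(0)
h = rng.standard_normal((8, 6))                   # 8 electrodes, 6 band features
adj = rng.random((8, 8)); adj = (adj + adj.T) / 2 # symmetric edge weights
h1 = gcn_layer(h, adj, rng.standard_normal((6, 16)))
h2 = gcn_layer(h1, adj, rng.standard_normal((16, 2)))
print(h2.shape)  # (8, 2): per-node logits before pooling
```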

Ablation Studies (Original Extensions)

Ablation 1 — Adjacency Type (α sweep, GCN)

Varied α ∈ {0.0, 0.25, 0.5, 0.75, 1.0} on the full FigShare dataset (477 test patients, 70/30 split, 10-fold CV). The paper uses only α=0.5; the full sweep is our novel contribution:

| α | auroc_patient | bal_acc | f1 |
|---|---|---|---|
| 0.0 (coherence only) | 0.8841 ± 0.0132 | 0.8087 ± 0.0130 | 0.8222 ± 0.0176 |
| 0.25 | 0.8943 ± 0.0072 | 0.8069 ± 0.0122 | 0.8385 ± 0.0162 |
| 0.50 (paper default) | 0.8984 ± 0.0102 | 0.8276 ± 0.0122 | 0.8399 ± 0.0281 |
| 0.75 | 0.8991 ± 0.0066 | 0.8209 ± 0.0051 | 0.8491 ± 0.0323 |
| 1.0 (geodesic only) | 0.7891 ± 0.0471 | 0.7406 ± 0.0349 | 0.8211 ± 0.0599 |

α=1.0 (spatial-only) collapses performance — functional connectivity is essential. α=0.75 best for auroc_patient and F1; α=0.50 best for balanced accuracy — both within each other's error margin.
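The blend being swept is the edge-weight formula from EEGGCNNDataset, edge_weight = α·geodesic + (1−α)·coherence. A minimal sketch (matrix names are illustrative, not the dataset's attribute names):

```python
import numpy as np

def blend_adjacency(geodesic: np.ndarray, coherence: np.ndarray,
                    alpha: float) -> np.ndarray:
    """edge_weight = alpha * geodesic + (1 - alpha) * coherence.

    alpha=1.0 keeps only spatial (geodesic) structure, alpha=0.0 only
    functional (coherence) structure; the sweep varies alpha over
    {0.0, 0.25, 0.5, 0.75, 1.0}.
    """
    return alpha * geodesic + (1.0 - alpha) * coherence

rng = np.random.default_rng(0)
geo = rng.random((8, 8)); geo = (geo + geo.T) / 2  # symmetric spatial weights
coh = rng.random((8, 8)); coh = (coh + coh.T) / 2  # symmetric coherence
a_combined = blend_adjacency(geo, coh, 0.5)        # the paper-default blend
print(np.allclose(blend_adjacency(geo, coh, 1.0), geo))  # True: spatial-only
```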

Ablation 2 — Spectral Frequency Analysis (GCN, inference-only)

Leave-one-out (LOO) and keep-one-in (KOI) across all 6 bands using trained GCN checkpoints — no retraining required. Low gamma is the single most important band (removing it drops AUROC from 0.898 → 0.594; keeping only it recovers 0.723). All bands make a meaningful contribution — none is redundant. Numerical results in project slides.
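The 13 inference-only conditions (full + 6 LOO + 6 KOI) amount to masking band columns of the feature matrix at inference time. An illustrative sketch of how run_band_ablation.py could enumerate them (the actual script's internals may differ):

```python
import numpy as np

BANDS = ["delta", "theta", "alpha", "beta", "low_gamma", "high_gamma"]

def ablation_conditions():
    """Yield (name, mask) for the 13 inference-only conditions:
    the full feature set, leave-one-out (zero one band), and
    keep-one-in (zero all but one band)."""
    n = len(BANDS)
    yield "full", np.ones(n, dtype=bool)
    for i, band in enumerate(BANDS):
        mask = np.ones(n, dtype=bool); mask[i] = False
        yield f"loo_{band}", mask
    for i, band in enumerate(BANDS):
        mask = np.zeros(n, dtype=bool); mask[i] = True
        yield f"koi_{band}", mask

conds = list(ablation_conditions())
print(len(conds))  # 13
# Applying a mask to (channels, 6) features zeroes the ablated bands:
x = np.ones((8, 6))
name, mask = conds[5]            # "loo_low_gamma"
print(name, (x * mask).sum())    # low_gamma column zeroed: 8 * 5 = 40.0
```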

Best Results (GAT, tuned learning rate)

| auroc_patient | bal_acc | f1 | precision | recall |
|---|---|---|---|---|
| 0.9415 ± 0.0251 | 0.8854 ± 0.0322 | 0.9201 ± 0.0231 | 0.9855 ± 0.0089 | 0.8638 ± 0.0409 |

GAT with tuned LR outperforms GCN on every patient-level metric. End-to-end run on locally pre-computed features matched FigShare-trained results.

File Guide

| File | Description |
|---|---|
| pyhealth/datasets/eeg_gcnn.py | EEGGCNNDataset — FigShare pre-computed loader |
| pyhealth/datasets/eeg_gcnn_raw.py | EEGGCNNRawDataset — raw TUAB + LEMON loader with precompute_features() |
| pyhealth/datasets/configs/eeg_gcnn.yaml | FigShare dataset config |
| pyhealth/datasets/configs/eeg_gcnn_raw.yaml | Raw dataset config |
| pyhealth/tasks/eeg_gcnn_classification.py | EEGGCNNClassification — window-level task for FigShare path |
| pyhealth/tasks/eeg_gcnn_disease_detection.py | EEGGCNNDiseaseDetection — streaming task for raw path |
| pyhealth/models/eeg_gcnn.py | EEGGraphConvNet (GCN) |
| pyhealth/models/eeg_gatcnn.py | EEGGATConvNet (GAT) |
| examples/eeg_gcnn/pre_compute.py | Batch precompute: raw TUAB + LEMON → 5 FigShare-format arrays |
| examples/eeg_gcnn/training_pipeline_shallow_gcnn.py | GCN training (10-fold CV) |
| examples/eeg_gcnn/heldout_test_run_gcnn.py | GCN holdout evaluation (Youden's J threshold) |
| examples/eeg_gcnn/run_band_ablation.py | Spectral frequency ablation (LOO + KOI, 13 conditions) |
| examples/eeg_gatcnn/pre_compute_gatcnn.py | Batch precompute for GAT (float64) |
| examples/eeg_gatcnn/training_pipeline_shallow_gatcnn.py | GAT training (tuned LR) |
| examples/eeg_gatcnn/heldout_test_run_gatcnn.py | GAT holdout evaluation |
| pyhealth/tests/test_eeg_gcnn.py | 23 unit tests: dataset, task, models (all synthetic data) |
| docs/api/eeg_gcnn_pipeline.rst | Full pipeline documentation |
| docs/api/datasets/pyhealth.datasets.EEGGCNNDataset.rst | API doc — FigShare dataset |
| docs/api/datasets/pyhealth.datasets.EEGGCNNRawDataset.rst | API doc — raw dataset |
| docs/api/models/pyhealth.models.EEGGraphConvNet.rst | API doc — GCN model |
| docs/api/models/pyhealth.models.EEGGATConvNet.rst | API doc — GAT model |
| docs/api/tasks/pyhealth.tasks.EEGGCNNClassification.rst | API doc — classification task |
| docs/api/tasks/pyhealth.tasks.eeg_gcnn_disease_detection.rst | API doc — disease detection task |
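The holdout scripts binarize patient-level scores at the Youden's J optimal threshold. A sketch of that selection (the real scripts may use sklearn's roc_curve; this hand-rolled version is only illustrative):

```python
import numpy as np

def youdens_j_threshold(y_true: np.ndarray, y_prob: np.ndarray) -> float:
    """Pick the probability cutoff maximizing Youden's J = TPR - FPR.

    Illustrative sketch of the threshold selection the holdout
    evaluation scripts perform; not the scripts' actual code.
    """
    best_t, best_j = 0.5, -1.0
    for t in np.unique(y_prob):
        pred = y_prob >= t
        tpr = (pred & (y_true == 1)).sum() / max((y_true == 1).sum(), 1)
        fpr = (pred & (y_true == 0)).sum() / max((y_true == 0).sum(), 1)
        if tpr - fpr > best_j:
            best_t, best_j = t, tpr - fpr
    return best_t

y = np.array([0, 0, 0, 1, 1, 1])
p = np.array([0.1, 0.2, 0.4, 0.6, 0.7, 0.9])
print(youdens_j_threshold(y, p))  # 0.6: perfectly separates the classes
```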

Test Plan

```shell
pytest pyhealth/tests/test_eeg_gcnn.py -v
# 23 tests, all passing, ~12 seconds, no real EEG data required
```

lehendo and others added 30 commits March 26, 2026 19:04
* Fixed repo to be able to run TUEV/TUAB + updated example scripts

* Args need to be passed correctly

* Minor fixes and precomputed STFT logic

* Fix the test files to reflect codebase changes

* Args update

* test script fixes

* dataset path update

* fix contrawr - small change

* divide by 0 error

* Incorporate tfm logic

* Fix label stuff

* tuab fixes

* fix metrics

* aggregate alphas

* Fix splitting and add tfm weights

* fix tfm+tuab

* updates scripts and haoyu splitter

* fix conflict

* Remove weightfiles from tracking and add to .gitignore

Weight files are large binaries distributed separately; untrack all
existing .pth files under weightfiles/ and add weightfiles/ to
.gitignore so they are excluded from future commits and the PR.

Made-with: Cursor
…uiuc#904)

* feat: add optional dependency groups for graph and NLP extras (sunlabuiuc#890)

Add [project.optional-dependencies] to pyproject.toml so users can
install domain-specific dependencies via pip extras:

  pip install pyhealth[graph]   # torch-geometric for GraphCare, KG
  pip install pyhealth[nlp]     # editdistance, rouge_score, nltk

The codebase already uses try/except ImportError with HAS_PYG flags
for torch-geometric, and the NLP metrics define their required
versions in each scorer class. This change exposes those dependencies
through standard Python packaging so pip can resolve them.

Version pins match the requirements declared in the code:
- editdistance~=0.8.1 (pyhealth/nlp/metrics.py:356)
- rouge_score~=0.1.2 (pyhealth/nlp/metrics.py:415)
- nltk~=3.9.1 (pyhealth/nlp/metrics.py:397)
- torch-geometric>=2.6.0 (compatible with PyTorch 2.7)

Closes sunlabuiuc#890

* fix: move optional-dependencies after scalar fields to fix TOML structure

Move [project.optional-dependencies] from between dependencies and
license (line 49) to after keywords (line 62), before [project.urls].

In TOML, a sub-table header like [project.optional-dependencies]
closes the parent [project] table, so placing it before license and
keywords caused those fields to be excluded from [project]. This
broke CI validation.

Verified with tomllib that all project fields (name, license,
keywords, optional-dependencies, urls) parse correctly under
[project].
* init commit

* RNN memory fix

* add example scripts here

* more bug fixes?

* commit to see new changes

* add test cases

* fix basemodel leakage of args

* fixes to tests and examples

* more examples

* reduce unnecessary checks, enable crashing when a cache is invalid

* fix nested sequence rnn problems

* fixes for the concare and transformer model exploding in memory

* fix concare merge conflict again

* fix for 3D channel for CNN

* update and delete defunct docs

* better loc comparisons and also a bunch of model fixes hopefully

* test case updates to match our bug fixes

* fix instability in calibration tests for CP


tldr; Fixes a variety of dataset loading, run bugs, splits for TUEV/TUAB, adds a good number of performance fixes for Transformer and Concare. We can always iterate on our fixes later.
Bypassing PR review because of speed/reviewer bottleneck reasons.
Implements the EEG-GCNN paper (Wagh & Varatharajah, ML4H @ NeurIPS 2020)
as a PyHealth 2.0 contribution with dataset, task, tests, docs, and
ablation study script.

New files:
- pyhealth/datasets/eeg_gcnn.py — EEGGCNNDataset (TUAB normal + MPI LEMON)
- pyhealth/datasets/configs/eeg_gcnn.yaml — YAML config
- pyhealth/tasks/eeg_gcnn_nd_detection.py — EEGGCNNDiseaseDetection task
  (PSD features, spatial/functional/combined adjacency, configurable bands)
- tests/test_eeg_gcnn.py — 23 tests using synthetic data
- examples/eeg_gcnn_nd_detection_gcn.py — 3 ablation experiments
- RST docs for dataset and task
"coherence" must be passed as "coh" to spectral_connectivity_epochs.
The ablation script can now run without real TUAB/LEMON data using
`--demo`. Generates 40 synthetic patients with reproducible random
PSD features and adjacency matrices, runs all 3 experiments through
the full GCN training pipeline.

Also fixes a bug where roc_auc_score failed on single-column y_prob
output from binary classification.
Rename task schema keys (psd_features→node_features, adjacency→adj_matrix)
to match the EEGGraphConvNet model interface. Add self-contained combined
pipeline script that runs both GCN and GAT models with demo mode, requiring
only torch + torch_geometric (no PyHealth install needed).
…sunlabuiuc#935)

The v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset) no longer
accepts code_mapping, dev, or refresh_cache parameters. These were
part of the legacy BaseEHRDataset API.

Update README.rst, example scripts, and leaderboard utilities to use
the current v2.0 API.

Note: task file docstrings and pyhealth/datasets/mimicextract.py
still reference code_mapping but are left for separate PRs since
mimicextract.py has not yet been migrated to v2.0.

Fixes sunlabuiuc#535
…d not also on training to reduce the time needed
* rename arg name for chefer

* Initial attempts to fix the interpretability target_class_idx

* Support negative prediction for interpretability metric.

* Fix tests

* Fix more tests

* Revert "Support negative prediction for interpretability metric."

This reverts commit fe8c8ad.

* Reapply "Support all samples for interpretability metric"

* Initial attempt for the filter

* Fixup

* Fix sample_class handling

* fixup

* fix test

* Fix arg name

* Add example

* fix docs
Jimmy Burhan and others added 5 commits April 14, 2026 13:17
* feat: migrate GRASP model from PyHealth 1.0 to 2.0 API

Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat: add GRASP mortality prediction notebook and fix cluster_num

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Restore code_mapping support in SequenceProcessor for PyHealth 2.0

Adds optional code_mapping parameter to SequenceProcessor that maps
granular medical codes to grouped vocabularies (e.g. ICD9CM→CCSCM)
before building the embedding table. Resolves the functional gap
from the 1.x→2.0 rewrite where code_mapping was removed. Ref sunlabuiuc#535

Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>

* Add RNN baseline and code_mapping comparison notebooks for MIMIC-III

Two identical notebooks for A/B testing code_mapping impact on mortality
prediction. Only difference is the schema override in Step 2. Both use
seed=42 for reproducible splits.

Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>

* fix(tasks): extract NDC codes instead of drug names for prescription mapping

event.drug returns drug names (e.g. "Aspirin") which produce zero matches
in CrossMap NDC→ATC; event.ndc returns actual NDC codes enabling 3/3
feature mapping for mortality and readmission tasks.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* test(tasks): add tests verifying NDC extraction in drug tasks

Checks that mortality and readmission task processors build vocabulary
from NDC codes (numeric strings) rather than drug names (e.g. "Aspirin"),
confirming the event.drug -> event.ndc fix works correctly.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(tasks): fix missed MortalityPredictionMIMIC4 event.drug and update docs

- Fix event.drug -> event.ndc in MortalityPredictionMIMIC4 (line 282)
- Update readmission task docstrings to reflect NDC extraction

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(tasks): fix DrugRecommendationMIMIC3 to extract NDC codes

DrugRecommendationMIMIC3 used prescriptions/drug (drug names) via Polars
column select; changed to prescriptions/ndc to match MIMIC-4 variant and
enable NDC->ATC code mapping.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(models): guard RNNLayer and ConCare against zero-length sequences

RNNLayer: clamp sequence lengths to min 1 so pack_padded_sequence
does not crash on all-zero masks, matching TCNLayer (tcn.py:186).

ConCare: guard covariance divisor with max(n-1, 1) to prevent
ZeroDivisionError when attention produces single-element features.

Both edge cases are triggered when code_mapping collapses vocabularies
and some patients have all codes map to <unk>, producing all-zero
embeddings and all-zero masks.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* docs: add docstrings to SequenceProcessor class and fit method

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* docs: add docstrings, type hints, and fix test dims for GRASP module

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat: add GRASP mortality prediction notebooks for baseline and code_mapping

Baseline notebook runs GRASP with raw ICD-9/NDC codes. Code_mapping
notebook collapses vocab via ICD9CM→CCSCM, ICD9PROC→CCSPROC, NDC→ATC
for trainable embeddings on full MIMIC-III.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* fix(models): guard ConCare and GRASP against batch_size=1 crashes

- ConCare FinalAttentionQKV: bare .squeeze() removed batch dim when
  batch_size=1, causing IndexError in softmax. Use .squeeze(-1) and
  .squeeze(1) to target only the intended dimensions.
- ConCare cov(): division by zero when x.size(1)==1. Guard with max().
- GRASP grasp_encoder: remove stale torch.squeeze(hidden_t, 0) that
  collapsed [1, hidden] to [hidden] with batch_size=1. Both RNNLayer
  and ConCareLayer already return [batch, hidden].
- GRASP random_init: clamp num_centers to num_points to prevent
  ValueError when cluster_num > batch_size.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat: add GRASP mortality prediction notebooks for baseline and code_mapping

Baseline notebook runs GRASP with raw ICD-9/NDC codes. Code_mapping
notebook collapses vocab via ICD9CM→CCSCM, ICD9PROC→CCSPROC, NDC→ATC
for trainable embeddings on full MIMIC-III.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Add code_mapping as task __init__ argument

Allow tasks to accept a code_mapping dict that upgrades input_schema
entries so SequenceProcessor maps raw codes (e.g. ICD9CM) to grouped
vocabularies (e.g. CCSCM) at fit/process time. This avoids manual
schema manipulation after task construction.

- Add code_mapping parameter to BaseTask.__init__()
- Thread **kwargs + super().__init__() through all task subclasses
  with existing __init__ methods (4 readmission tasks, 1 multimodal
  mortality task)
- Add 17 tests covering SequenceProcessor mapping and task-level
  code_mapping initialization

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Update code_mapping notebook to use task init argument

Replace manual task.input_schema override with the new
code_mapping parameter on MortalityPredictionMIMIC3().

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* feat(examples): add ConCare hyperparameter grid sweep script

Mirrors the GRASP+ConCare mortality notebook pipeline exactly
(same tables, split, seed, metrics) but sweeps 72 configurations
of embedding_dim, hidden_dim, cluster_num, lr, and weight_decay.

Results are logged to sweep_results.csv. Supports --root for
pointing at local MIMIC-III, --code-mapping, --dev, and --monitor.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* chore(sweep): increase early stopping patience from 10 to 15 epochs

Smaller ConCare configs (embedding_dim=8/16) may learn slower and
need more epochs before plateauing.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* Initial plan

* fix: filter falsy NDCs, guard None tokens in process(), fix NDC regex

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-authored-by: ddhangdd <43976109+ddhangdd@users.noreply.github.com>

* refactor(sweep): rename and generalize sweep script for all backbones

Rename sweep_concare_grasp.py → sweep_grasp.py. Now supports
--block GRU|ConCare|LSTM with per-backbone default grids, --resume
for crash recovery, --grid JSON override, auto-dated output dirs
(sweep/{BLOCK}_{YYYYMMDD}_{HHMMSS}_{mapping}/), and config.json
saved alongside results for reproducibility.

Co-Authored-By: Colton Loew <colton.loew@gmail.com>
Co-Authored-By: lookman-olowo <lookmanolowo@hotmail.com>
Co-Authored-By: christiana-beard <christyanamarie116@gmail.com>
Co-Authored-By: ddhangdd <dfung2@wisc.edu>

* test(sweep): add unit and integration tests for sweep_grasp utilities

Covers grid building, combo hashing, CSV resume parsing, output
directory naming, and end-to-end single-config runs for GRU and ConCare
on synthetic data (13 tests, all passing).

Co-Authored-By: Colton Loew <loewcx@illinois.edu>
Co-Authored-By: lookman-olowo <lookman-olowo@github.com>
Co-Authored-By: christiana-beard <christiana-beard@github.com>
Co-Authored-By: ddhangdd <ddhangdd@github.com>

* docs(sweep): add tmux copy-paste instructions for each paper run

Co-Authored-By: Colton Loew <loewcx@illinois.edu>
Co-Authored-By: lookman-olowo <lookman-olowo@github.com>
Co-Authored-By: christiana-beard <christiana-beard@github.com>
Co-Authored-By: ddhangdd <ddhangdd@github.com>

* chore(examples): adds cleans examples, removes util script

* Delete tests/core/test_grasp.py

we removed grasp script from examples, dropped test

* Revert "Delete tests/core/test_grasp.py"

This reverts commit 0d95758.

* fix: remove orphaned sweep test, restore grasp tests

* feat(grasp): add static_key support for demographic features with tests

* fix(test): add valid NDC to test prescriptions so readmit test produces both labels

---------

Co-authored-by: lookman-olowo <lookmanolowo@hotmail.com>
Co-authored-by: christiana-beard <christyanamarie116@gmail.com>
Co-authored-by: ddhangdd <dfung2@wisc.edu>
Co-authored-by: Lookman Olowo <42081779+lookman-olowo@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ddhangdd <43976109+ddhangdd@users.noreply.github.com>
Co-authored-by: ddhangdd <desmondfung123@gmail.com>
Co-authored-by: Colton Loew <loewcx@illinois.edu>
Co-authored-by: lookman-olowo <lookman-olowo@github.com>
Co-authored-by: christiana-beard <christiana-beard@github.com>
Co-authored-by: ddhangdd <ddhangdd@github.com>
Co-authored-by: lookman-olowo <lookman-olowo@users.noreply.github.com>
@jburhan jburhan self-assigned this Apr 18, 2026
Jimmy Burhan and others added 22 commits April 18, 2026 20:10
…LEMON data, expand pipeline docs with dataset processing section
…to match EEGGCNNDiseaseDetection class name
Completely rewrote the pipeline doc to cover both data paths (raw EEG
and FigShare pre-computed), both models (GCN float32, GAT float64),
all signal-processing steps, output schema, training/heldout-eval
instructions, adjacency ablation results (α ∈ {0.0, 0.5, 0.75, 1.0}),
band ablation usage, and complete API reference for all six classes.
Also adds the pre_compute_gcnn.py / pre_compute_gatcnn.py quick-start
commands that were missing from the previous version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adjacency comparison: remove alpha=0.75 (not run), keep only the three
actual configs — functional (0.0), combined (0.5), spatial (1.0).
Combined (paper default) is now correctly marked as best.

Band ablation: rename section to "Spectral Frequency Analysis" and
describe the leave-one-out / keep-one-in methodology that run_band_ablation.py
actually performs (13 conditions, inference-only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… comparison

Add separate GCN and GAT result tables for all four alpha configs
(0.0, 0.5, 0.75, 1.0) with AUC, Youden's J, Bal. Acc, and Recall.
Add Paper Table 2 comparison row (Shallow EEG-GCNN: AUC=0.90±0.02,
Bal.Acc=0.83±0.02). Update narrative: GCN α=0.75 is best, GCN beats
GAT by ~6% AUC across all configs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add full 13-condition LOO/KOI results table (AUC, Bal.Acc, F1, mean±std
across 10 folds). Key finding: LOO low_gamma causes largest AUC drop
(0.898→0.594); KOI low_gamma is best single-band (AUC=0.656), confirming
low gamma (30–50 Hz) is the most discriminative band. Update code examples
to highlight low_gamma as the band of interest.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… slides

LOO delta: 0.772→0.802 AUC, 0.728→0.746 Bal.Acc
LOO high gamma: completely wrong (0.623→0.785 AUC, 0.605→0.737 Bal.Acc, 0.644→0.800 F1)
KOI delta/theta/alpha/beta: all rows were shifted/swapped, now corrected
KOI low gamma: 0.656→0.723 AUC, 0.633→0.682 Bal.Acc, 0.840→0.781 F1
KOI beta F1: 0.915→0.856, Bal.Acc std: 0.083→0.138
Key finding text: KOI low gamma AUC updated to 0.723

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GCN F1: functional=0.493, combined=0.503, spatial-heavy=0.508, spatial=0.481
GAT F1: functional=0.521, combined=0.484, spatial-heavy=0.473, spatial=0.475

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nloaded recently have a different external directory numbering
- Clarify pipeline diagram: two independent paths (raw EEG vs FigShare),
  both feeding EEGGCNNDataset + EEGGCNNClassification with the same
  5-array schema. Adds a path comparison table.
- Data Sources: split into "Path A — Raw inputs" (TUAB, LEMON) and
  "Path B — Pre-computed input" (FigShare); each entry now labelled
  raw vs precomputed.
- Training (GCN): clarify it works with either Path A output or FigShare
  download.
- Adjacency ablation: remove numerical results tables, keep methodology
  only; add α=0.25 to the configs (5 total: 0.0, 0.25, 0.5, 0.75, 1.0).
- Spectral analysis: remove numerical results table, keep methodology only.
- Heldout evaluation: remove paper-matching results block, point to slides.

All numerical results now live in the project slides.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* dl4h final project kobeguo2 - CaliForest

* Update CaliForest to require explicit fit before inference

* Remove unused logit_scale from CaliForest
- Update Robert Coffey's NetID from racoffey2 to rc37
- Rename heading from 'Signal Processing (EEGGCNNDiseaseDetection)' to
  'Signal Processing'; add note clarifying it is implemented in both
  EEGGCNNRawDataset.precompute_features() (batch) and
  EEGGCNNDiseaseDetection.__call__() (streaming)
- Correct PSD band names and ranges to match actual code and FigShare
  arrays: delta/theta/alpha/beta/low_gamma/high_gamma with boundaries
  matching _BAND_RANGES (7.5, 13.0, 30.0, 40.0 Hz)
- Attribute graph adjacency alpha formula to EEGGCNNDataset where it lives
- Fix adjacency description: 'geodesic distance' (not 'arc length on unit sphere')
- Remove non-existent download_lemon.py reference from Quick Start

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jburhan jburhan merged commit 9e411cb into master Apr 20, 2026
jburhan pushed a commit that referenced this pull request Apr 21, 2026
Brings in post-PR-#3 changes from the EEG-GCNN pipeline branch:
- docs: updated eeg_gcnn_pipeline.rst (band names, adjacency attribution,
  EEGGCNNDiseaseDetection vs EEGGCNNRawDataset clarification)
- examples: renamed scripts to required convention
  (eeg_gcnn_classification_gcn_training.py, _evaluation.py, etc.)
- examples: added sample raw + precomputed data for quick-start testing
- tests: consolidated 41-test suite into tests/core/test_eeg_gcnn_dataset.py
  (merged raw-path tests, fixed stale import, removed tests/test_eeg_gcnn.py)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>