OmicsGraphNet

Graph-Attentive Multi-Omics Fusion for Blood-Based PTSD Classification Across Military and Civilian Cohorts

Overview

OmicsGraphNet integrates six molecular feature blocks from whole blood across eight publicly available GEO cohorts (n = 1,173 samples) into a unified multi-omics GNN classifier for PTSD. The model chains:

Per-modality β-VAE encoders (β = 4.0, z = 64) for robust latent compression of each feature block
Multi-head cross-modal attention (4 heads) for learned modality fusion with soft per-modality gates
GATv2 graph attention over the STRING v12 PPI knowledge graph (14,690 nodes, 191,324 edges)
Multi-task decoder for simultaneous PTSD classification and PCL-5 severity regression
Monte Carlo Dropout (T = 50 passes) for predictive uncertainty quantification

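Evaluation uses six-fold leave-one-cohort-out cross-validation (LOCO-CV): each fold holds out one entire cohort for testing, so every test sample comes from a study the model never saw during training. This is a stringent test of cross-cohort generalisation for blood-based PTSD biomarker discovery. Two further cohorts (GSE67663 and GSE860) are kept as fixed holdouts.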
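
As a rough illustration of the LOCO-CV scheme, the sketch below generates one fold per cohort from a list of per-sample cohort labels. The function name `loco_folds` and the toy labels are hypothetical and are not taken from the repository's training script.

```python
from collections import defaultdict


def loco_folds(cohorts):
    """Yield (held_out, train_idx, test_idx), one fold per distinct cohort.

    `cohorts` is a sequence giving the cohort label of each sample.
    """
    by_cohort = defaultdict(list)
    for idx, cohort in enumerate(cohorts):
        by_cohort[cohort].append(idx)

    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        # Training indices are every sample from every other cohort.
        train_idx = sorted(
            i for c, idxs in by_cohort.items() if c != held_out for i in idxs
        )
        yield held_out, train_idx, test_idx


# Toy example with three hypothetical cohort labels.
toy_cohorts = ["A", "A", "B", "C", "B", "C", "C"]
folds = list(loco_folds(toy_cohorts))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Because no sample from the held-out cohort appears in training, batch and platform effects specific to that cohort cannot be learned, which is what makes this split harder than a random k-fold split.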

Key Results

Model	AUC (mean ± SD)	F1	Sensitivity	Specificity
OmicsGraphNet	0.603 ± 0.176	0.132	0.105	0.907
ElasticNet	0.541 ± 0.040	0.324	0.351	0.706
Random Forest	0.548 ± 0.051	0.041	0.031	0.970
XGBoost	0.528 ± 0.041	0.295	0.360	0.678
ConcatMLP	0.542 ± 0.043	0.403	0.546	0.477

Best fold: GSE164877 (WTC First Responders 2) — AUC = 0.897
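
For readers comparing the threshold-dependent columns, here is a minimal sketch of how sensitivity, specificity and F1 follow from confusion-matrix counts. The counts are invented for illustration and do not come from any cohort in this repository.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Illustrative counts only: a conservative classifier that rarely
# predicts the positive class.
example = binary_metrics(tp=10, fp=10, tn=90, fn=90)
print(example)
```

A classifier of this kind combines high specificity with low sensitivity and therefore a low F1, the same qualitative pattern as the OmicsGraphNet and Random Forest rows, even though AUC (which is threshold-free) can still be above chance.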

Datasets

All data are publicly available from NCBI GEO. No patient-level data are included in this repository.

GEO Accession	Cohort	n	Assay	LOCO Role
GSE97356	WTC First Responders	324	RNA-seq	Fold 1
GSE164877	WTC First Responders 2	226	RNA-seq	Fold 2 (best)
GSE63878	Marines MRS	96	Microarray	Fold 3
GSE81761	Military Service Members	66	RNA-seq	Fold 4
GSE109409	Canadian Infantry	85	RNA-seq	Fold 5
GSE64813	US Service Members	188	RNA-seq	Fold 6
GSE67663	PTSD+MDD Comorbid	184	Microarray	Fixed holdout
GSE860	ER Trauma Survivors	33	Microarray	Fixed civilian holdout

Architecture

Input (537 features)           β-VAE Encoders    Cross-Modal Attn    GATv2 PPI Graph    Decoder
──────────────────             ──────────────    ────────────────    ───────────────    ───────
Block A: Transcriptomics (500)  VAE (z=64) ──┐
Block B: EWAS CpG proxies  (12) VAE (z=64) ──┤
Block C: Cell-type fracs   (11) VAE (z=64) ──┤  MHA (4 heads) ──► GATv2 (2 layers) ──► PTSD prob
Block D: PRS                (1) VAE (z=64) ──┤  Soft gates         14,690 nodes          PCL-5 score
Block E: TWAS brain genes  (10) VAE (z=64) ──┤  Query vector       191,324 edges
Block F: Clinical           (3) VAE (z=64) ──┘

Total parameters: 896,413
Loss: L_CE  +  β · KL  +  λ_r · MSE_recon  +  λ_s · MSE_PCL5
MC Dropout  (p = 0.25,  T = 50)  for uncertainty quantification
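
The loss line above can be read as a weighted sum of four terms. The NumPy sketch below evaluates that sum for a single latent block; it assumes binary cross-entropy for L_CE and a diagonal-Gaussian KL term against a standard normal prior. The function names are hypothetical, and the default values of `lambda_r` and `lambda_s` are placeholders because the README does not state them.

```python
import numpy as np


def binary_cross_entropy(prob, label, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(label * np.log(prob) + (1 - label) * np.log(1 - prob))))


def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dims, averaged over samples."""
    return float(np.mean(-0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=1)))


def total_loss(prob, label, mu, logvar, x, x_recon, pcl5, pcl5_pred,
               beta=4.0, lambda_r=1.0, lambda_s=1.0):
    """L_CE + beta * KL + lambda_r * MSE_recon + lambda_s * MSE_PCL5.

    beta = 4.0 is taken from the README; lambda_r and lambda_s are placeholders.
    """
    ce = binary_cross_entropy(prob, label)
    kl = gaussian_kl(mu, logvar)
    mse_recon = float(np.mean((x - x_recon) ** 2))
    mse_pcl5 = float(np.mean((pcl5 - pcl5_pred) ** 2))
    return ce + beta * kl + lambda_r * mse_recon + lambda_s * mse_pcl5


# Toy inputs: a posterior equal to the prior, perfect reconstruction and perfect
# severity prediction leave only the cross-entropy term.
n, z = 4, 64
mu, logvar = np.zeros((n, z)), np.zeros((n, z))
x = np.ones((n, 537))
label = np.array([0, 1, 0, 1])
prob = np.full(n, 0.5)
pcl5 = np.array([10.0, 40.0, 5.0, 55.0])
loss = total_loss(prob, label, mu, logvar, x, x, pcl5, pcl5)
print(loss)
```

With six encoders, the KL and reconstruction terms would presumably be accumulated over blocks; that detail is not specified here and is left out of the sketch.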

Installation

# 1. Create environment
conda create -n deepomics python=3.11 -y
conda activate deepomics

# 2. Install PyTorch (CPU; for GPU see https://pytorch.org/get-started/locally/)
pip install torch==2.2.2 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install PyTorch Geometric
pip install torch-scatter torch-sparse torch-cluster torch-geometric --no-build-isolation

No GPU required. Training one fold takes ~2 hours on an M2 MacBook Pro (CPU).

Quick Start — Inference with Pre-trained Checkpoints

The checkpoints/ directory contains 11 trained models (~3.5 MB each). Inference needs the processed feature matrix, label file and PPI graph files, so complete Steps 1–6 of the Full Pipeline first, then run:

python predict.py \
  --checkpoint checkpoints/omicsgraphnet_fold2_GSE164877.pt \
  --features   data/processed/feature_matrix.csv \
  --labels     data/processed/sample_labels.csv \
  --ppi_edges  data/processed/ppi_graph_edges.csv \
  --ppi_nodes  data/processed/ppi_node_features.csv \
  --output     predictions.csv

Output: sample_id · ptsd_prob · ptsd_pred · uncertainty_std · uncertainty_entropy · [true_label]
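
To make the uncertainty columns concrete, the sketch below shows one common way to summarise T stochastic forward passes into a mean probability, a standard deviation and a predictive entropy. It is an assumption that predict.py uses these exact definitions (population standard deviation, natural-log binary entropy of the mean, 0.5 decision threshold); the function name `summarise_mc_passes` is hypothetical.

```python
import numpy as np


def summarise_mc_passes(probs, threshold=0.5, eps=1e-12):
    """Summarise MC Dropout outputs.

    probs: array of shape (T, n_samples) holding P(PTSD) from T stochastic passes.
    """
    probs = np.asarray(probs, dtype=float)
    mean = probs.mean(axis=0)
    std = probs.std(axis=0)
    p = np.clip(mean, eps, 1.0 - eps)
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return {
        "ptsd_prob": mean,
        "ptsd_pred": (mean >= threshold).astype(int),
        "uncertainty_std": std,
        "uncertainty_entropy": entropy,
    }


# Toy input with T = 50 passes and three samples: a stable positive,
# a sample whose passes disagree, and a stable negative.
toy_probs = np.array([[0.9, 0.2, 0.1], [0.9, 0.8, 0.1]] * 25)
summary = summarise_mc_passes(toy_probs)
print(summary["uncertainty_std"], summary["uncertainty_entropy"])
```

The second sample illustrates why both measures are useful: its passes disagree (high standard deviation) and its mean sits near 0.5 (entropy close to ln 2), whereas the other two samples are stable across passes.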

Checkpoint	Test cohort	AUC	Use for
`omicsgraphnet_fold2_GSE164877.pt`	WTC FR 2	0.897	Best overall
`omicsgraphnet_fold1_GSE97356.pt`	WTC FR 1	~0.60	WTC-independent
`omicsgraphnet_fold5_GSE109409.pt`	Canadian infantry	~0.55	Military population

Full Pipeline

# 1. Download all 8 GEO cohorts (~2 GB)
python data/download_geo.py

# 2. Extract and standardise PTSD labels
python data/fix_labels.py

# 3. Normalise, ComBat batch-correct, run SVA
python src/preprocessing/normalize_expression.py

# 4. NNLS cell-type deconvolution (Hwang 2025 atlas)
python src/preprocessing/run_deconvolution.py

# 5. Build 537-feature matrix (6 blocks)
python src/preprocessing/build_feature_matrix.py

# 6. Build STRING v12 PPI graph (auto-downloads ~500 MB)
python src/preprocessing/build_ppi_graph.py --download_string

# 7. LOCO-CV training — 6 folds, 200 epochs each
python src/training/train_omicsgraphnet.py --epochs 200 --cv loco

# 8. Integrated Gradients biomarker attribution
python src/evaluation/interpret.py \
       --checkpoint checkpoints/omicsgraphnet_fold2_GSE164877.pt

# 9. Statistical tests (DeLong AUC CI, McNemar, bootstrap)
python src/evaluation/statistical_tests.py

# 10. Generate figures
python make_figures.py
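
Step 9 lists a bootstrap among the statistical tests. As an illustration of the idea only, the sketch below computes a percentile bootstrap confidence interval for AUC with NumPy. It is not the code in statistical_tests.py; the function names, the toy scores and the choice of 1,000 resamples are assumptions.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (len(pos) * len(neg)))


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for AUC, resampling samples with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():
            continue  # resample contains a single class, so AUC is undefined
        aucs.append(roc_auc(y_true[idx], scores[idx]))
    low, high = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)


# Toy labels and scores (illustrative only).
y = [0, 0, 0, 1, 1, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.8, 0.7, 0.3, 0.2, 0.9]
auc = roc_auc(y, s)
ci_low, ci_high = bootstrap_auc_ci(y, s)
print(auc, ci_low, ci_high)
```

On small held-out cohorts the interval can be wide, which is worth keeping in mind when comparing per-fold AUC values.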

Adapting to Your Own Data

OmicsGraphNet can be adapted to other binary phenotypes for which whole-blood transcriptomics are available.

Feature matrix format — CSV with samples as rows, 537 columns in block order:

Block A (cols   0–499):  500 transcriptomics features (log2-TPM, ComBat-corrected)
Block B (cols 500–511):  12  EWAS CpG proxy genes
Block C (cols 512–522):  11  NNLS cell-type fractions
Block D (col  523    ):   1  PRS value (0 if unavailable)
Block E (cols 524–533):  10  TWAS PrediXcan z-scores
Block F (cols 534–536):   3  clinical (PCL-5, sex, platform)

See data/processed/feature_metadata.csv for the full column list. Unavailable blocks can be zero-filled, as is done for the PRS block (Block D) in the released models; the soft per-modality gates in the cross-modal attention are intended to down-weight such blocks.
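
Before running inference on a new matrix, it can help to check the block layout and zero-fill any missing blocks explicitly. The sketch below encodes the column ranges listed above; the dictionary keys and the helper names (`check_layout`, `split_blocks`, `zero_fill`) are hypothetical and not part of the repository.

```python
import numpy as np

# Block layout copied from the format description: name -> (first column, width).
BLOCKS = {
    "A_transcriptomics": (0, 500),
    "B_ewas_proxies": (500, 12),
    "C_cell_fractions": (512, 11),
    "D_prs": (523, 1),
    "E_twas": (524, 10),
    "F_clinical": (534, 3),
}
N_FEATURES = 537


def check_layout(blocks=BLOCKS, n_features=N_FEATURES):
    """Verify the blocks are contiguous, non-overlapping and cover every column."""
    expected_start = 0
    for name, (start, width) in blocks.items():
        if start != expected_start:
            raise ValueError(f"{name} starts at {start}, expected {expected_start}")
        expected_start = start + width
    if expected_start != n_features:
        raise ValueError(f"blocks cover {expected_start} columns, expected {n_features}")


def split_blocks(X, blocks=BLOCKS):
    """Split an (n_samples, 537) array into a dict of per-block views."""
    return {name: X[:, start:start + width] for name, (start, width) in blocks.items()}


def zero_fill(X, unavailable, blocks=BLOCKS):
    """Return a copy of X with the named blocks set to zero."""
    X = X.copy()
    for name in unavailable:
        start, width = blocks[name]
        X[:, start:start + width] = 0.0
    return X


check_layout()
X = np.ones((4, N_FEATURES))
X_filled = zero_fill(X, ["B_ewas_proxies", "D_prs"])
parts = split_blocks(X_filled)
print({name: part.shape for name, part in parts.items()})
```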

Fine-tuning a checkpoint:

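import torch
from src.models.omics_graph_net import OmicsGraphNet

# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust.
ckpt  = torch.load("checkpoints/omicsgraphnet_fold2_GSE164877.pt", map_location="cpu", weights_only=False)
# Rebuild the model with the block sizes stored in the checkpoint, then restore its weights.
model = OmicsGraphNet(block_dims=ckpt["block_dims"])
model.load_state_dict(ckpt["model_state"])

# A small learning rate keeps fine-tuning close to the pre-trained weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
# standard PyTorch training loop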

Project Structure

OmicsGraphNet/
├── data/
│   ├── download_geo.py          # Download 8 GEO cohorts via GEOparse
│   ├── fix_labels.py            # Cohort-specific PTSD label extraction
│   ├── processed/               # Generated outputs (git-ignored; run pipeline to create)
│   └── external/
│       ├── ewas/                # PTSD EWAS CpG target genes
│       ├── prs/                 # PGC Freeze 3 GWAS loci + PRSice2/LDPred2 templates
│       ├── ppi/                 # STRING v12 protein aliases
│       └── singlecell/          # Hwang 2025 cell-type marker signatures
├── src/
│   ├── models/
│   │   ├── omics_graph_net.py   # OmicsGraphNet: β-VAE → Attention → GATv2
│   │   └── baselines.py         # ElasticNet, SVM-RFE, RF, XGBoost, ConcatMLP, MOLI, MOGONET
│   ├── preprocessing/
│   │   ├── normalize_expression.py  # VST + ComBat + SVA
│   │   ├── run_deconvolution.py     # NNLS cell-type deconvolution (11 types)
│   │   ├── build_feature_matrix.py  # Assembles 537-feature 6-block matrix
│   │   └── build_ppi_graph.py       # STRING v12 → PyG graph
│   ├── training/
│   │   └── train_omicsgraphnet.py   # LOCO-CV, AdamW, CosineAnnealingLR, early stopping
│   ├── evaluation/
│   │   ├── interpret.py             # Integrated Gradients + cross-modal attention
│   │   └── statistical_tests.py     # DeLong, McNemar, bootstrap, BH-FDR
│   └── visualization/
│       └── make_figures.py          # Figure utilities (Okabe-Ito palette, 600 DPI)
├── checkpoints/                 # 11 pre-trained models (~3.5 MB each)
├── predict.py                   # Inference with MC Dropout uncertainty
├── make_figures.py              # Publication figures (Fig 1–3)
├── run_loco_cv_folds.sh         # Run all 6 LOCO folds sequentially
├── requirements.txt
├── CITATION.cff
├── LICENSE
└── README.md

Key Biomarkers

Top features by Integrated Gradients (GSE164877 fold, AUC = 0.897):

Rank	Feature	Block	Mean |IG|	Biological context
1	Neutrophil fraction	C — Cell-type	6.3 × 10⁻⁵	Elevated NLR in PTSD (innate immunity)
2	PV interneuron score	C — Cell-type	4.8 × 10⁻⁵	GABAergic interneuron proxy
3	NK cell fraction	C — Cell-type	4.8 × 10⁻⁵	NK functional deficit (glucocorticoid)
4	Astrocyte score	C — Cell-type	3.8 × 10⁻⁵	Neuroinflammatory proxy (cross-tissue)
5	IFI44	A — Transcriptomics	1.4 × 10⁻⁶	ISG15 / type-I interferon pathway
6	RSAD2	A — Transcriptomics	1.2 × 10⁻⁶	Viperin / type-I interferon pathway

Full ranked list: results/interpretation/saliency_top_genes.csv
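
For readers unfamiliar with Integrated Gradients, the sketch below applies the method to a stand-in logistic-regression model with an analytic gradient. It is not the repository's interpret.py: the model, weights, input and all-zero baseline are invented for illustration. The "Mean |IG|" column above would correspond to averaging the absolute attributions over samples.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Stand-in differentiable model: logistic regression."""
    return sigmoid(x @ w + b)


def model_grad(x, w, b):
    """Analytic gradient of the logistic output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.stack([model_grad(baseline + a * (x - baseline), w, b) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


# Hypothetical weights and a single input; the baseline is the all-zero vector.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)
output_change = model(x, w, b) - model(baseline, w, b)
print(attributions, attributions.sum(), output_change)
```

The attributions sum to approximately the change in model output between baseline and input (the completeness property), which is a useful sanity check on any IG implementation.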

Pending Data Access

Item	Status	Source
Polygenic risk scores (Block D)	Zero-imputed placeholder	dbGaP: phs000864, phs000560, phs002046
EWAS methylation (Block B)	Expression-proxy (12 surrogates)	PGC PTSD EWAS working group
Hwang 2025 atlas (.h5ad)	Required for deconvolution	Zenodo: 10.5281/zenodo.15186498
STRING v12	Auto-downloaded	`build_ppi_graph.py --download_string`

Reproducibility

All analyses were run on a standard macOS workstation (Apple M-series, 16 GB RAM). No HPC or GPU is required. Seeds: torch.manual_seed(42), np.random.seed(42).
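
A small helper along the following lines can apply the seeds in one place. The name `set_seed` is hypothetical, and seeding Python's built-in `random` module is an addition beyond the two calls listed above. PyTorch is imported lazily so the helper also works in environments without it.

```python
import random

import numpy as np


def set_seed(seed=42):
    """Seed Python, NumPy and (if installed) PyTorch RNGs.

    Returns True if PyTorch was found and seeded, False otherwise.
    """
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return False
    torch.manual_seed(seed)
    return True


set_seed(42)
first_draw = np.random.rand(3)
set_seed(42)
second_draw = np.random.rand(3)
print(first_draw, second_draw)
```

Seeding fixes the random number streams but does not by itself guarantee bit-identical results across hardware or library versions.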

Versions: Python 3.11 · PyTorch 2.2.2 · PyG 2.7.0 · scikit-learn 1.3 · neuroCombat 0.9.1 · GEOparse 2.0.4

Citation

@article{roy2026omicsgraphnet,
  title   = {{OmicsGraphNet}: Graph-Attentive Multi-Omics Fusion for Blood-Based {PTSD}
             Classification Across Military and Civilian Cohorts},
  author  = {Roy, Kushal Raj},
  year    = {2026}
}

License

MIT — see LICENSE
