Single-Cell Multimodal Integration Pipeline for scRNA-seq + scATAC-seq
A complete end-to-end pipeline integrating paired scRNA-seq and scATAC-seq data (10x Multiome, SHARE-seq, SNARE-seq) using scGLUE and MOFA+, with automatic cell type annotation and gene regulatory network (GRN) inference.
- Multimodal Integration: scGLUE (graph-linked unified embedding) + MOFA+ (multi-omics factor analysis)
- Cell Type Annotation: Marker-based annotation validated across both RNA and ATAC modalities
- GRN Inference: Peak → gene regulatory links via GLUE cosine similarity + TF motif scanning
- Standard Output: UMAP plots, cluster labels, peak-gene links, GRN edges, and a reproducible `.h5mu` bundle
```bash
pip install muon scanpy scglue anndata mofapy2 leidenalg python-igraph \
    matplotlib seaborn pandas numpy scipy \
    --break-system-packages -q
```

```python
from multiome import run_multiome_skill

mdata, metrics, grn = run_multiome_skill(
    out_dir="multiome_results_demo",
    run_scglue=True,
    run_mofa=True,
    run_grn=True,
    max_epochs=100,  # reduce for faster demo
)
```

```python
from multiome import run_multiome_skill

# 10x .h5 or .h5mu file
mdata, metrics, grn = run_multiome_skill(
    input_path="your_multiome.h5mu",
    out_dir="my_analysis",
    max_epochs=500,
)
```

1. Load Data
└── 10x Multiome .h5 / .h5mu / separate .h5ad files
2. Quality Control
├── RNA: gene count, total counts, mitochondrial %
├── ATAC: peak count, total counts
└── Intersect: keep cells present in both modalities
3. Preprocessing
├── RNA: normalize → log1p → HVG → scale → PCA
└── ATAC: TF-IDF → LSI → HVG → scale → PCA
4. Multimodal Integration
├── scGLUE: genomic coordinate prior (peak → gene proximity)
└── MOFA+: multi-omics factor analysis
5. Cell Type Annotation
└── Marker gene scoring across modalities
6. GRN Inference
└── Peak → gene cosine similarity → TF motif → GRN
7. Outputs
├── multiome_integrated.h5mu (full MuData)
├── cell_metadata.csv (cluster labels)
├── peak_gene_links.csv (regulatory pairs)
└── UMAP figures
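The ATAC branch of step 3 starts with TF-IDF followed by LSI. The sketch below shows one common TF-IDF variant (log-scaled term frequency times inverse document frequency) and LSI as a truncated SVD, using only NumPy on a toy count matrix. The function names `tfidf` and `lsi`, the scale factor of 1e4, and the toy data are assumptions for illustration; the pipeline may use a different variant or a library routine.

```python
import numpy as np


def tfidf(counts, scale=1e4):
    """TF-IDF transform for a cells x peaks count matrix (one common variant)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each cell's counts divided by that cell's total counts.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Inverse document frequency: peaks with fewer total counts get larger weights.
    n_cells = counts.shape[0]
    idf = n_cells / np.maximum(counts.sum(axis=0, keepdims=True), 1.0)
    # Log scaling keeps highly weighted peaks from dominating the SVD.
    return np.log1p(tf * idf * scale)


def lsi(counts, n_components=2):
    """Latent semantic indexing: truncated SVD of the TF-IDF matrix."""
    x = tfidf(counts)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # Cell embedding: left singular vectors scaled by their singular values.
    return u[:, :n_components] * s[:n_components]


# Toy data: 4 cells x 4 peaks.
toy_counts = np.array(
    [
        [3, 0, 1, 0],
        [2, 0, 0, 1],
        [0, 4, 0, 2],
        [0, 3, 1, 1],
    ]
)
embedding = lsi(toy_counts, n_components=2)
print(embedding.shape)
```

On real data the count matrix is sparse and far larger, so a sparse, truncated SVD solver would normally replace the dense `np.linalg.svd` call used here.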
| File | Description |
|---|---|
| `multiome_integrated.h5mu` | Complete MuData object with all embeddings |
| `cell_metadata.csv` | Cell × cluster assignments (RNA, ATAC, joint) |
| `peak_gene_links.csv` | GLUE-scored peak → gene regulatory pairs |
| `joint_umap_clusters.png` | Main UMAP: RNA clusters, ATAC clusters, joint clusters |
| `marker_dotplot.png` | Canonical marker gene expression by cluster |
```text
muon>=0.1.6
scanpy>=1.9.6
scglue>=0.3.3
anndata>=0.10.0
mofapy2>=0.7.1
leidenalg>=0.10.1
python-igraph>=0.11.0
matplotlib>=3.7
seaborn>=0.12
pandas>=1.5
numpy>=1.24
scipy>=1.10
scikit-learn>=1.3
requests>=2.28
```

Python 3.9+ is required. A GPU is recommended: scGLUE auto-detects CUDA and trains roughly 5–10× faster than on CPU.
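Before starting a long training run, it can help to confirm whether CUDA is actually visible. The check below is a minimal sketch that queries PyTorch (which scGLUE is built on) and falls back to CPU if PyTorch is not importable. The helper name `pick_device` is hypothetical and not part of this pipeline.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA device, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU training is possible.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```

If this reports `cpu`, a lower `max_epochs` value, as in the demo call, keeps run time manageable.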
scGLUE uses genomic coordinate proximity (peaks within 1 Mb of genes) as a prior knowledge graph to align the RNA and ATAC modalities in a shared latent space. Restricting candidate peak-gene links to this biologically grounded prior is intended to reduce false positives from spurious correlations.
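The proximity prior can be pictured as a list of peak-gene edges. The sketch below builds such edges in plain Python for toy coordinates, linking a peak to a gene when they share a chromosome and the peak lies within a 1 Mb window. Measuring distance to the transcription start site (rather than the gene body), the `Gene` and `Peak` types, and the function names are all assumptions for illustration; this is not scGLUE's own implementation.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int  # transcription start site (assumed anchor point)


class Peak(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def distance_to_tss(peak, gene):
    """Distance in bp from the peak interval to the gene's TSS (0 if overlapping)."""
    if peak.start <= gene.tss <= peak.end:
        return 0
    return min(abs(peak.start - gene.tss), abs(peak.end - gene.tss))


def proximity_edges(genes, peaks, window=1_000_000):
    """Return (peak, gene, distance) edges for same-chromosome pairs within the window."""
    edges = []
    for gene in genes:
        for peak in peaks:
            if peak.chrom != gene.chrom:
                continue
            dist = distance_to_tss(peak, gene)
            if dist <= window:
                edges.append((peak.name, gene.name, dist))
    return edges


genes = [Gene("GENE1", "chr1", 5_000_000)]
peaks = [
    Peak("peak_a", "chr1", 4_100_000, 4_100_500),  # within 1 Mb
    Peak("peak_b", "chr1", 6_200_000, 6_200_500),  # too far away
    Peak("peak_c", "chr2", 5_000_000, 5_000_500),  # different chromosome
]
edges = proximity_edges(genes, peaks)
print(edges)
```

The nested loop is quadratic in the number of features, so a genome-scale version would use an interval index or a sorted sweep instead.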
MOFA+ (Multi-Omics Factor Analysis) learns latent factors that capture both shared and modality-specific variation, so each factor can be interpreted as a candidate biological process.
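One way to see the difference between a shared and a modality-specific factor is the variance each factor explains in each modality. The sketch below does not run MOFA+. It simulates two views from known toy factors and weights, then computes a per-factor R² as one minus the residual sum of squares over the total sum of squares on centered data. All names, sizes, and the noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_peaks = 200, 30, 50

# Two latent factors shared across cells, centered per factor.
Z = rng.standard_normal((n_cells, 2))
Z -= Z.mean(axis=0)

# Per-view weights: factor 0 is active in both views, factor 1 only in RNA.
W = {
    "rna": rng.standard_normal((n_genes, 2)),
    "atac": rng.standard_normal((n_peaks, 2)),
}
W["atac"][:, 1] = 0.0

# Simulated data: low-rank signal plus small Gaussian noise, centered per feature.
Y = {
    view: Z @ w.T + 0.1 * rng.standard_normal((n_cells, w.shape[0]))
    for view, w in W.items()
}
Y = {view: y - y.mean(axis=0) for view, y in Y.items()}


def variance_explained(y, z, w, k):
    """Fraction of variance in view y explained by factor k alone."""
    recon = np.outer(z[:, k], w[:, k])
    return 1.0 - ((y - recon) ** 2).sum() / (y ** 2).sum()


r2 = {
    view: [variance_explained(Y[view], Z, W[view], k) for k in range(2)]
    for view in Y
}
for view, values in r2.items():
    print(view, [round(float(v), 3) for v in values])
```

Factor 0 explains variance in both views, so it behaves as a shared factor, while factor 1 explains variance only in RNA and is modality-specific.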
GLUE feature embeddings place genes and peaks in the same vector space. Peaks with high cosine similarity to a gene embedding are predicted cis-regulatory elements. TF motif scanning (via JASPAR) on these peaks yields a three-layer network: TF → enhancer peak → target gene.
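The scoring and joining steps described above can be sketched on toy vectors. The example below computes cosine similarity between peak and gene embeddings, keeps pairs above a cutoff, and joins them with motif hits to form TF → peak → gene triples. The embeddings, the 0.8 cutoff, the feature names, and the `motif_hits` mapping are made up for illustration; in the pipeline the embeddings come from the trained GLUE model and the motif hits from JASPAR scanning.

```python
import numpy as np


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


genes = ["GENE_A", "GENE_B"]
peaks = ["peak_1", "peak_2", "peak_3"]

# Toy 2-D feature embeddings; real embeddings have more dimensions.
gene_emb = np.array([[1.0, 0.0], [0.0, 2.0]])
peak_emb = np.array([[2.0, 0.1], [0.0, 1.0], [-1.0, 0.0]])

sim = cosine_similarity(peak_emb, gene_emb)  # shape: peaks x genes

# Keep peak-gene pairs whose similarity passes the (assumed) cutoff.
threshold = 0.8
links = [
    (peaks[i], genes[j], float(sim[i, j]))
    for i, j in zip(*np.where(sim >= threshold))
]

# Hypothetical motif scan result: TF name -> peaks containing its motif.
motif_hits = {"TF_X": ["peak_1"], "TF_Y": ["peak_2", "peak_3"]}

# Three-layer network: TF -> enhancer peak -> target gene.
grn_edges = [
    (tf, peak, gene)
    for tf, tf_peaks in motif_hits.items()
    for peak in tf_peaks
    for linked_peak, gene, _ in links
    if linked_peak == peak
]
print(grn_edges)
```

Here `peak_3` points away from both genes in embedding space, so it is dropped even though it carries a motif for `TF_Y`.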
If you use scMultiome in your research, please cite:
Bredikhin, D. et al. (2022). MUON: multimodal omics analysis framework. Genome Biology.
Cao, Z.-J. & Gao, G. (2022). Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nature Biotechnology.