# pyPAGE Comprehensive Tutorial

This notebook is the consolidated end-to-end walkthrough for the current API.

Sections:
1. Setup
2. Bulk PAGE on the demo differential-expression table, with gene sets loaded from GMT
3. Gene ID conversion with `GeneMapper`
4. Single-cell PAGE on AnnData
5. CLI quick reference


## 1. Setup


In [None]:
import numpy as np
import pandas as pd
import anndata
from pypage import PAGE, SingleCellPAGE, ExpressionProfile, GeneSets, GeneMapper


## 2. Bulk PAGE (continuous scores)


In [None]:
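# Load the demo DESeq log fold-change table (tab-separated)
expr = pd.read_csv("../example_data/test_DESeq_logFC.txt", sep="\t")
# Discretize the continuous log2 fold changes into 9 bins
exp = ExpressionProfile(expr["GENE"], expr["log2FoldChange"], n_bins=9)
# MSigDB hallmark gene sets, keyed by gene symbol
gs_h = GeneSets.from_gmt("../example_data/h.all.v2026.1.Hs.symbols.gmt")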

np.random.seed(42)
p = PAGE(exp, gs_h, n_shuffle=200, function="cmi", k=20, n_jobs=1)
bulk_results, bulk_hm = p.run()
bulk_results.head()
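To build intuition for what `n_bins` and `n_shuffle` control, here is a minimal sketch of a mutual-information permutation test on toy data. It is not pyPAGE's implementation: it uses plain mutual information rather than the `cmi` option, assumes equal-frequency binning via `pd.qcut`, and every name in it (`mutual_information`, `toy_scores`, `toy_member`) is hypothetical.

```python
import numpy as np
import pandas as pd

def mutual_information(x, y):
    """Mutual information (in bits) between two non-negative integer arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)           # contingency table of co-occurrences
    joint /= joint.sum()                  # joint probabilities
    px = joint.sum(axis=1, keepdims=True) # marginal over bins
    py = joint.sum(axis=0, keepdims=True) # marginal over membership
    nz = joint > 0                        # skip empty cells (0 * log 0 = 0)
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)

# Toy stand-in for a log fold-change column: one score per gene.
toy_scores = rng.normal(size=900)

# Assumption: equal-frequency binning into 9 bins (labels 0..8).
toy_bins = pd.qcut(toy_scores, q=9, labels=False)

# Toy gene set that is over-represented in the highest bin.
prob = np.where(toy_bins == 8, 0.5, 0.05)
toy_member = (rng.random(900) < prob).astype(int)

observed = mutual_information(toy_bins, toy_member)

# Null distribution: shuffle membership labels, keeping the bins fixed.
null = np.array([
    mutual_information(toy_bins, rng.permutation(toy_member))
    for _ in range(200)
])

# Empirical p-value with the usual +1 correction.
p_value = (1 + np.sum(null >= observed)) / (1 + len(null))
print(f"observed MI = {observed:.4f} bits, p = {p_value:.4f}")
```

More shuffles give a finer-grained empirical p-value at the cost of runtime; with 200 shuffles the smallest attainable value in this sketch is 1/201.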


In [None]:
if bulk_hm is not None:
    bulk_hm.save("comprehensive_bulk_heatmap.pdf")
    bulk_hm.to_html("comprehensive_bulk_heatmap.html")


## 3. GeneMapper example

`GeneMapper` needs network access only the first time it builds its mapping table, that is, when no local cache exists.


In [None]:
# Example Ensembl gene IDs (TP53 and BRCA1); whether they map depends on the cached species table
ids = np.array(["ENSG00000141510", "ENSG00000012048"])
try:
    mapper = GeneMapper(species="human")
    converted, unmapped = mapper.convert(ids, from_type="ensg", to_type="symbol")
    print(converted)
    print(f"unmapped: {len(unmapped)}")
except Exception as e:
    print("GeneMapper initialization skipped in this environment:", e)
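If the mapping cache cannot be built (for example, on a machine without network access), the shape of the result can still be illustrated offline. The sketch below is a toy stand-in, not `GeneMapper` itself: `convert_ids` and the two-entry `ENSG_TO_SYMBOL` table are hypothetical, and the choice to drop unmapped IDs from the converted array is an assumption about behavior that this notebook does not confirm.

```python
import numpy as np

# Hypothetical two-entry lookup table; a real table covers the whole genome.
ENSG_TO_SYMBOL = {
    "ENSG00000141510": "TP53",
    "ENSG00000012048": "BRCA1",
}

def convert_ids(ids, table):
    """Return (converted, unmapped) arrays for the given IDs.

    Assumption: IDs missing from the table are dropped from `converted`
    and reported separately in `unmapped`.
    """
    converted = np.array([table[i] for i in ids if i in table])
    unmapped = np.array([i for i in ids if i not in table])
    return converted, unmapped

toy_ids = np.array(["ENSG00000141510", "ENSG00000012048", "NOT_A_REAL_ID"])
toy_converted, toy_unmapped = convert_ids(toy_ids, ENSG_TO_SYMBOL)
print(toy_converted)
print(f"unmapped: {len(toy_unmapped)}")
```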


## 4. Single-cell PAGE (CRC AnnData)


In [None]:
adata = anndata.read_h5ad("../example_data/CRC.h5ad")
gs_sc = GeneSets.from_gmt("../example_data/c2.all.v2026.1.Hs.symbols.gmt")

sc = SingleCellPAGE(
    adata=adata,
    genesets=gs_sc,
    function="cmi",
    n_bins=10,
    fast_mode=True,
    filter_redundant=True,
    n_jobs=1,
)
sc_results = sc.run(n_permutations=200)
sc_results.head()
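Both PAGE runs above read gene sets from GMT files. In that format each line is tab-separated: the set name, a description field, then the member genes. As a sketch of what such a file contains (not how `GeneSets.from_gmt` is implemented), a minimal parser might look like this; `parse_gmt` and the toy records are hypothetical.

```python
import io

def parse_gmt(handle):
    """Parse GMT text into a dict mapping set name -> list of genes."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets

# Toy in-memory GMT content with two gene sets.
toy_gmt = io.StringIO(
    "SET_A\tdescription or URL\tTP53\tBRCA1\n"
    "SET_B\tna\tMYC\tEGFR\tKRAS\n"
)
toy_sets = parse_gmt(toy_gmt)
for name, genes in toy_sets.items():
    print(name, len(genes), genes)
```

The same function accepts an open file handle, so a quick check of set sizes before a long run is possible with `with open(path) as fh: parse_gmt(fh)`.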


## 5. CLI quick reference

```bash
# Bulk full run
pypage -e ../example_data/test_DESeq_logFC.txt \
    --gmt ../example_data/h.all.v2026.1.Hs.symbols.gmt \
    --type continuous --cols GENE,log2FoldChange --seed 42

# Bulk draw-only
pypage --draw-only --outdir ../example_data/test_DESeq_logFC_cont_PAGE

# Single-cell full run
pypage-sc --adata ../example_data/CRC.h5ad \
    --gene-column gene \
    --gmt ../example_data/c2.all.v2026.1.Hs.symbols.gmt \
    --groupby PhenoGraph_clusters --fast-mode

# Single-cell draw-only
pypage-sc --draw-only --outdir ../example_data/CRC_scPAGE --groupby PhenoGraph_clusters
```
