ICePop: Informative Cell Population

This repository contains source code for ICePop (link).

The data used in this study are available on Zenodo.

The code used to reproduce the analyses in the paper is available at: https://github.com/krishnanlab/icepop_analysis

Dependencies

python>=3.11,<3.12

Installation

ICePop requires torch==2.1.1. To enable GPU acceleration, which substantially speeds up metacell reconstruction, we recommend installing PyTorch following the official instructions on the PyTorch website to ensure compatibility with your system. Specifically, we used torch-2.1.1+cu121 in the paper.
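Before running the GPU-dependent steps, it can help to confirm that the installed PyTorch build actually sees a CUDA device. The following is a minimal sketch and not part of ICePop: the helper name pick_device is hypothetical, and it relies only on torch.cuda.is_available(), falling back to 'cpu' when torch is not installed or no GPU is visible.

```python
def pick_device() -> str:
    """Return 'cuda' if PyTorch can see a GPU, otherwise 'cpu'.

    Hypothetical helper for illustration; not part of the ICePop package.
    """
    try:
        # Imported lazily so the check also works before torch is installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The returned string matches the values accepted by the --device option described under Step 1.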

After installing torch, install ICePop via pip:

pip install git+https://github.com/krishnanlab/icepop

Run ICePop

Before running the analysis, we recommend downloading the processed data from Zenodo.

Expand and place the downloaded files under ../data, then run the following commands.

A more detailed tutorial is available at notebook/ICePop_tutorial.ipynb

Step 1: Extract metacells

icepop metacell \
    --h5ad ../data/TM_FACS/TM_FACS_cnt.h5ad \
    --outdir ../results/TM_FACS_mc \
    --save_name TM_FACS

Input options

--h5ad (str) Path to input AnnData (.h5ad) file containing single-cell expression count data
--outdir (str) Output directory where MetaQ results will be written
--save_name (str; default='metaq_res') Prefix for MetaQ output files under ./save/; give a name only, not a path
--ncell_per_mc (int; default=75) Target number of cells per metacell. The total number of metacells is determined as approximately n_cells / ncell_per_mc
--ct_key (str; default='cell_type') Column name in adata.obs specifying cell-type annotations. Used to evaluate metacell purity
--device (str; default='cuda') Compute device to use. Options include 'cuda' or 'cpu'
--batch_size (int; default=512) Batch size used when running MetaQ

A GPU is recommended for this step, as it substantially speeds up metacell reconstruction.
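To get a feel for --ncell_per_mc, the approximate relationship stated in the option list can be written out directly. This is only an illustration of the arithmetic: the function name and the rounding rule (round to nearest, with at least one metacell) are assumptions, and the exact number of metacells ICePop produces may differ.

```python
def approx_n_metacells(n_cells: int, ncell_per_mc: int = 75) -> int:
    """Approximate metacell count as n_cells / ncell_per_mc.

    Rounding to the nearest integer (minimum 1) is an assumption made
    for this sketch, not a documented ICePop behavior.
    """
    if n_cells <= 0 or ncell_per_mc <= 0:
        raise ValueError("n_cells and ncell_per_mc must be positive")
    return max(1, round(n_cells / ncell_per_mc))


# A dataset of 45,000 cells with the default target size of 75
print(approx_n_metacells(45_000))  # 600
```

A smaller --ncell_per_mc yields more, finer-grained metacells; a larger value yields fewer, coarser ones.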

Outputs

metacell assignment: outdir/mc_assign.csv
metacell statistics: outdir/mc_stats.csv

Step 2: Get association, mixture and influence diagnostics

icepop association \
    --h5ad ../data/TM_FACS/TM_FACS_cnt.h5ad \
    --mc_assign ../results/TM_FACS_mc/mc_assign.csv \
    --magmaz ../data/magmaz/asd.genes.out \
    --sp mmusculus \
    --outdir ../results/TM_FACS_association

Input options

--h5ad (str) Input AnnData file containing single-cell expression count data
--mc_assign (str) CSV file mapping cells to metacell assignments (output from step 1: outdir/mc_assign.csv)
--magmaz (str) MAGMA gene-level association file (*.genes.out) for the trait of interest
--spec_score (str; default=None) Precomputed specificity scores; will be calculated if not provided
--outdir (str) Output directory for association results
--n_jobs (int; default=20) Number of parallel workers
--sp (str; default='mmusculus') Species identifier for gene ID conversion
--ct_key (str; default='cell_type') Column in adata.obs defining cell types
--trait_name (str; optional) Trait name used for output file naming
--n_perm (int; default=1000) Number of permutations for null distribution estimation
--q_thres (float; default=0.1) FDR threshold for significance
--min_purity (float; default=0.2) Minimum metacell purity required for inclusion in cell type aggregation
--min_mc_size (int; default=20) Minimum metacell size required for inclusion in cell type aggregation
--output_dfbs (bool; default=True) Whether to output the gene-level influence diagnostics (DFBETAS)
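The --min_purity and --min_mc_size options describe a filter applied to metacells before cell-type aggregation. A minimal sketch of such a filter is shown below. It assumes that purity is the fraction of a metacell's cells carrying its most common cell-type label and that both thresholds are inclusive; those definitions, the function names, and the toy labels are assumptions for illustration and may not match ICePop's internals.

```python
from collections import Counter


def metacell_purity(labels):
    """Fraction of cells sharing the most common label (assumed definition)."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def keep_metacells(mc_to_labels, min_purity=0.2, min_mc_size=20):
    """Return IDs of metacells that pass both the size and purity thresholds."""
    kept = []
    for mc_id, labels in mc_to_labels.items():
        if len(labels) >= min_mc_size and metacell_purity(labels) >= min_purity:
            kept.append(mc_id)
    return kept


# Toy input: metacell ID -> cell-type label of each member cell
toy = {
    "mc0": ["T cell"] * 18 + ["B cell"] * 6,   # size 24, purity 0.75: kept
    "mc1": ["T cell"] * 10,                    # size 10: dropped (too small)
    "mc2": [f"type{i}" for i in range(25)],    # size 25, purity 0.04: dropped
}
print(keep_metacells(toy))  # ['mc0']
```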

Outputs

outdir/celltype__trait-*.csv: Disease-cell type association table
outdir/dfbs__trait-*.npz: Gene-level influence scores (DFBETAS) for each disease–cell type association
outdir/metacell__trait-*.csv: Disease-metacell association table
outdir/mc_spec_score.npz: Metacell expression specificity (if --spec_score is not provided, the computed specificity scores are written to this path)
outdir/mcfdr__trait-*.csv: Cell type × metacell matrix indicating significant disease-associated metacells within each cell type

Here * is the trait name. We assume the MAGMA file is named *.genes.out.
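Because the output file names depend on the trait name, it may help to see how a name could be derived from a MAGMA file that follows the *.genes.out convention. The helper below is a hypothetical sketch, not ICePop's own code; the output file name it prints simply follows the patterns in the Outputs list.

```python
from pathlib import Path

SUFFIX = ".genes.out"


def trait_from_magma_path(path: str) -> str:
    """Strip the directory and the '.genes.out' suffix to get a trait name.

    Hypothetical helper for illustration only.
    """
    name = Path(path).name
    if not name.endswith(SUFFIX) or name == SUFFIX:
        raise ValueError(f"expected a file named <trait>{SUFFIX}, got {name!r}")
    return name[: -len(SUFFIX)]


trait = trait_from_magma_path("../data/magmaz/asd.genes.out")
print(trait)                           # asd
print(f"celltype__trait-{trait}.csv")  # celltype__trait-asd.csv
```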

Step 3: Enrichment analysis and interactive output

# run all gene sets
icepop interactive \
  --outdir ../results/TM_FACS_association \
  --mcdir ../results/TM_FACS_mc \
  --geneset_collections All \
  --adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad

or

# run specific gene sets
icepop interactive \
  --outdir ../results/TM_FACS_association \
  --mcdir ../results/TM_FACS_mc \
  --geneset_collections BIOCARTA \
  --adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad

or 

# custom gene sets
icepop interactive \
  --outdir ../results/TM_FACS_association \
  --mcdir ../results/TM_FACS_mc \
  --geneset_collections none \
  --geneset_path custom.gmt \
  --adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad

Input options

--outdir (str) Output directory for association results. Enrichment results and reports will also be saved here.
--mcdir (str) Directory for metacell assignments (This dir can be the same as --outdir)
--geneset_collections (str) Gene set collection to test. Options: 'All', 'BIOCARTA', 'KEGG', 'REACTOME', 'WIKIPATHWAYS', 'MIR', 'TF', 'GOBP', 'GOCC', 'GOMF', 'HP'; set to none to use a custom file given by --geneset_path
--geneset_path (str) Path to a custom GMT file; used when --geneset_collections is set to none
--adata_path (str) Path to the AnnData file containing single-cell expression count data
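A custom gene set file passed via --geneset_path uses the GMT format. In the common GMT layout, each line is tab-separated: a gene set name, a description, then the member genes. The sketch below writes and reads a tiny file in that layout so you can sanity-check your own file before running ICePop. The file content and gene set names are made up, the read_gmt helper is hypothetical, and ICePop may apply additional requirements (for example on gene identifiers) that are not covered here.

```python
import tempfile
from pathlib import Path


def read_gmt(path):
    """Parse a GMT file into {gene_set_name: [genes]}.

    Assumes the common layout: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) < 3:
                raise ValueError(
                    f"line {line_no}: expected name, description, and at least one gene"
                )
            name, _description, *genes = fields
            gene_sets[name] = [g for g in genes if g]
    return gene_sets


# Write a made-up two-set GMT file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    gmt = Path(tmp) / "custom.gmt"
    gmt.write_text(
        "SET_A\tdemo set\tGENE1\tGENE2\nSET_B\tna\tGENE3\n", encoding="utf-8"
    )
    sets = read_gmt(gmt)

print(sets)  # {'SET_A': ['GENE1', 'GENE2'], 'SET_B': ['GENE3']}
```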

Outputs

outdir/icepop-report.ipynb: Interactive Jupyter notebook containing all results
outdir/icepop-report.html: Rendered HTML version of the notebook for easy viewing
outdir/enrichment: Directory containing gene set enrichment analysis results
