
ICePop: Informative Cell Population

This repository contains source code for ICePop (link).

The data used in this study are available on Zenodo

The code used to reproduce the analyses in the paper is available at: https://github.com/krishnanlab/icepop_analysis

Dependencies

python>=3.11,<3.12

Installation

ICePop requires torch==2.1.1. To enable GPU acceleration, which substantially speeds up metacell reconstruction, we recommend installing PyTorch following the official instructions on the PyTorch website to ensure compatibility with your system. Specifically, we used torch-2.1.1+cu121 in the paper.

After installing torch, install ICePop via pip:

pip install git+https://github.com/krishnanlab/icepop
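Before installing, it can help to confirm that the interpreter and any existing torch install match the versions listed above. The following is a minimal sketch using only the standard library; the helper names `python_version_ok` and `torch_status` are illustrative and are not part of ICePop.

```python
import importlib.metadata
import sys


def python_version_ok(version_info=sys.version_info):
    """Return True for Python 3.11.x, the range listed under Dependencies."""
    return (3, 11) <= tuple(version_info[:2]) < (3, 12)


def torch_status():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version supported:", python_version_ok())
    version = torch_status()
    if version is None:
        print("torch is not installed; install torch 2.1.1 first")
    else:
        # A local build tag such as '+cu121' may follow the base version.
        base_version = version.split("+")[0]
        print("torch version:", version, "| matches 2.1.1:", base_version == "2.1.1")
```

Once torch is installed, `torch.cuda.is_available()` reports whether a GPU can be used.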

Run ICePop

Before running the analysis, we recommend downloading the processed data from Zenodo.

Extract the downloaded files and place them under ../data, then run the following commands.

A more detailed tutorial is available at notebook/ICePop_tutorial.ipynb

Step 1: Extract metacells

icepop metacell \
    --h5ad ../data/TM_FACS/TM_FACS_cnt.h5ad \
    --outdir ../results/TM_FACS_mc \
    --save_name TM_FACS

Input options

  1. --h5ad (str) Path to input AnnData (.h5ad) file containing single-cell expression count data
  2. --outdir (str) Output directory where MetaQ results will be written
  3. --save_name (str; default='metaq_res') Prefix for MetaQ output written under ./save/*; provide a name only, not a path
  4. --ncell_per_mc (int; default=75) Target number of cells per metacell. The total number of metacells is determined as approximately n_cells / ncell_per_mc
  5. --ct_key (str; default='cell_type') Column name in adata.obs specifying cell-type annotations. Used to evaluate metacell purity
  6. --device (str; default='cuda') Compute device to use. Options include 'cuda' or 'cpu'
  7. --batch_size (int; default=512) Batch size used when running MetaQ

This step runs substantially faster on a GPU.
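To get a feel for how --ncell_per_mc sets the number of metacells, the relation n_cells / ncell_per_mc can be evaluated directly. This is only a rough sketch: the function name is illustrative, and rounding to the nearest integer is an assumption, since the exact rounding used by the tool is not stated here.

```python
def approx_n_metacells(n_cells, ncell_per_mc=75):
    """Approximate metacell count as n_cells / ncell_per_mc.

    Rounding to the nearest integer (and returning at least 1) is an
    assumption of this sketch; the tool may round differently.
    """
    if n_cells <= 0 or ncell_per_mc <= 0:
        raise ValueError("n_cells and ncell_per_mc must be positive")
    return max(1, round(n_cells / ncell_per_mc))


for n_cells in (7_500, 45_000, 100_000):
    print(n_cells, "cells ->", approx_n_metacells(n_cells), "metacells (approx.)")
```

A smaller --ncell_per_mc yields more, finer-grained metacells for the same dataset.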

Outputs

  1. metacell assignment: outdir/mc_assign.csv
  2. metacell statistics: outdir/mc_stats.csv
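The --ct_key option is used to evaluate metacell purity, but purity is not defined in this README. A common definition, assumed in the sketch below, is the fraction of cells in a metacell that carry its most frequent cell-type label. The function name and toy labels are illustrative, and nothing is assumed about the columns of mc_assign.csv or mc_stats.csv.

```python
from collections import Counter, defaultdict


def metacell_purity(metacell_ids, cell_types):
    """Return {metacell_id: (size, purity)}.

    Purity is taken to be the share of the most frequent cell type within a
    metacell (an assumed definition, not confirmed by the ICePop source).
    """
    if len(metacell_ids) != len(cell_types):
        raise ValueError("inputs must have one entry per cell")
    labels_by_mc = defaultdict(list)
    for mc_id, cell_type in zip(metacell_ids, cell_types):
        labels_by_mc[mc_id].append(cell_type)
    stats = {}
    for mc_id, labels in labels_by_mc.items():
        top_count = Counter(labels).most_common(1)[0][1]
        stats[mc_id] = (len(labels), top_count / len(labels))
    return stats


# Toy data: metacell 0 is pure, metacell 1 mixes two cell types.
mc_ids = [0, 0, 0, 1, 1, 1, 1]
types = ["T cell", "T cell", "T cell", "B cell", "B cell", "B cell", "NK cell"]
for mc_id, (size, purity) in metacell_purity(mc_ids, types).items():
    print(f"metacell {mc_id}: size={size}, purity={purity:.2f}")
```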

Step 2: Get association, mixture and influence diagnostics

icepop association \
    --h5ad ../data/TM_FACS/TM_FACS_cnt.h5ad \
    --mc_assign ../results/TM_FACS_mc/mc_assign.csv \
    --magmaz ../data/magmaz/asd.genes.out \
    --sp mmusculus \
    --outdir ../results/TM_FACS_association

Input options

  1. --h5ad (str) Input AnnData file containing single-cell expression count data
  2. --mc_assign (str) CSV file mapping cells to metacell assignments (output from step 1: outdir/mc_assign.csv)
  3. --magmaz (str) MAGMA gene-level association file (*.genes.out) for the trait of interest
  4. --spec_score (str; default=None) Precomputed specificity scores; will be calculated if not provided
  5. --outdir (str) Output directory for association results
  6. --n_jobs (int; default=20) Number of parallel workers
  7. --sp (str; default='mmusculus') Species identifier for gene ID conversion
  8. --ct_key (str; default='cell_type') Column in adata.obs defining cell types
  9. --trait_name (str; optional) Trait name used for output file naming
  10. --n_perm (int; default=1000) Number of permutations for null distribution estimation
  11. --q_thres (float; default=0.1) FDR threshold for significance
  12. --min_purity (float; default=0.2) Minimum metacell purity required for inclusion in cell type aggregation
  13. --min_mc_size (int; default=20) Minimum metacell size required for inclusion in cell type aggregation
  14. --output_dfbs (bool; default=True) Whether to output the influence testing (DFBETAS) results
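The --n_perm and --q_thres options control permutation-based null estimation and FDR thresholding. The test statistic and FDR procedure used by ICePop are not described in this README, so the sketch below shows only the generic idea, assuming a one-sided empirical p-value with the add-one correction and Benjamini-Hochberg adjustment. All names and the simulated statistics are illustrative.

```python
import random


def empirical_p_value(observed, null_stats):
    """One-sided permutation p-value with the add-one correction."""
    n_extreme = sum(1 for s in null_stats if s >= observed)
    return (1 + n_extreme) / (1 + len(null_stats))


def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


rng = random.Random(0)
n_perm = 1000   # mirrors the --n_perm default
q_thres = 0.1   # mirrors the --q_thres default
# Hypothetical observed statistics for four metacells; nulls are simulated.
observed_stats = [3.5, 0.2, 2.8, -0.5]
p_vals = [
    empirical_p_value(obs, [rng.gauss(0, 1) for _ in range(n_perm)])
    for obs in observed_stats
]
q_vals = benjamini_hochberg(p_vals)
significant = [q <= q_thres for q in q_vals]
print(list(zip(observed_stats, p_vals, q_vals, significant)))
```

With the add-one correction, the smallest attainable p-value is 1 / (n_perm + 1), so a larger --n_perm gives finer resolution at the cost of runtime.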

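The --output_dfbs option writes gene-level influence scores (DFBETAS). As a reminder of what that statistic measures, the sketch below computes DFBETAS for the slope of a simple linear regression by refitting without each observation in turn. It illustrates the general statistic only: the regression model fitted by ICePop is not described here, and the function names and toy data are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss, sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, rss, sxx


def dfbetas_slope(xs, ys):
    """DFBETAS of the slope for each observation, via leave-one-out refits."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    _, slope_full, _, sxx_full = fit_line(xs, ys)
    scores = []
    for i in range(n):
        xs_loo = xs[:i] + xs[i + 1:]
        ys_loo = ys[:i] + ys[i + 1:]
        _, slope_loo, rss_loo, _ = fit_line(xs_loo, ys_loo)
        # Residual standard error without observation i
        # (n - 1 points, 2 fitted parameters).
        s_loo = math.sqrt(rss_loo / (n - 3))
        scores.append((slope_full - slope_loo) / (s_loo * math.sqrt(1.0 / sxx_full)))
    return scores


# Toy data: the last point is an outlier that pulls the slope upward.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
for i, score in enumerate(dfbetas_slope(xs, ys)):
    print(f"observation {i}: DFBETAS(slope) = {score:+.2f}")
```

Observations with a large absolute DFBETAS are the ones whose removal would shift the fitted coefficient the most.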
Outputs

  1. outdir/celltype__trait-*.csv: Disease-cell type association table
  2. outdir/dfbs__trait-*.npz: Gene-level influence scores (DFBETAS) for each disease–cell type association
  3. outdir/metacell__trait-*.csv: Disease-metacell association table
  4. outdir/mc_spec_score.npz: Metacell expression specificity (if --spec_score is not provided, the specificity scores are computed and saved to this path)
  5. outdir/mcfdr__trait-*.csv: Cell type × metacell matrix indicating significant disease-associated metacells within each cell type

where * is the trait name; the MAGMA file name is assumed to be *.genes.out.
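The naming convention above can be sketched as follows, assuming the trait name is simply the MAGMA file name with the .genes.out suffix removed. Both helper functions are illustrative and not part of ICePop; how the tool derives the name internally (or how --trait_name overrides it) is not shown here.

```python
from pathlib import Path

MAGMA_SUFFIX = ".genes.out"


def trait_from_magma_path(magma_path):
    """Derive the trait name from a MAGMA file named <trait>.genes.out."""
    name = Path(magma_path).name
    if not name.endswith(MAGMA_SUFFIX):
        raise ValueError(f"expected a file name ending in {MAGMA_SUFFIX!r}: {name}")
    return name[: -len(MAGMA_SUFFIX)]


def expected_association_outputs(outdir, trait):
    """List the per-trait output paths named in the Outputs list above."""
    outdir = Path(outdir)
    return [
        outdir / f"celltype__trait-{trait}.csv",
        outdir / f"dfbs__trait-{trait}.npz",
        outdir / f"metacell__trait-{trait}.csv",
        outdir / f"mcfdr__trait-{trait}.csv",
    ]


trait = trait_from_magma_path("../data/magmaz/asd.genes.out")
for path in expected_association_outputs("../results/TM_FACS_association", trait):
    print(path, "(exists)" if path.exists() else "(not found yet)")
```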

Step 3: Enrichment analysis and interactive output

# run all gene sets
icepop interactive \
  --outdir ../results/TM_FACS_association \
  --mcdir ../results/TM_FACS_mc \
  --geneset_collections All \
  --adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad

or

# run specific gene sets
icepop interactive \
  --outdir ../results/TM_FACS_association \
  --mcdir ../results/TM_FACS_mc \
  --geneset_collections BIOCARTA \
  --adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad

or

# custom gene sets
icepop interactive \
  --outdir ../results/TM_FACS_association \
  --mcdir ../results/TM_FACS_mc \
  --geneset_collections none \
  --geneset_path custom.gmt \
  --adata_path ../data/TM_FACS/TM_FACS_cnt.h5ad

Input options

  1. --outdir (str) Output directory for association results. Enrichment results and reports will also be saved here.
  2. --mcdir (str) Directory for metacell assignments (This dir can be the same as --outdir)
  3. --geneset_collections (str) Gene set collection to run. Options: 'All', 'BIOCARTA', 'KEGG', 'REACTOME', 'WIKIPATHWAYS', 'MIR', 'TF', 'GOBP', 'GOCC', 'GOMF', 'HP'; use none together with --geneset_path for custom gene sets
  4. --geneset_path (str) Path to a custom GMT file; used when --geneset_collections is set to none
  5. --adata_path (str) Path to the AnnData file containing single-cell expression count data
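For the custom gene set route, a GMT file is a tab-separated text file with one gene set per line: the set name, a description, then the member genes. The sketch below writes and re-reads a small custom.gmt. The gene sets are hypothetical and only show the layout; which gene identifier type ICePop expects in a custom file is not stated here, so check that before use.

```python
import tempfile
from pathlib import Path


def write_gmt(path, gene_sets):
    """Write {set_name: (description, [genes])} as tab-separated GMT lines."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, (description, genes) in gene_sets.items():
            handle.write("\t".join([name, description, *genes]) + "\n")


def read_gmt(path):
    """Parse a GMT file into {set_name: [genes]}, skipping incomplete lines."""
    gene_sets = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # needs a name, a description, and at least one gene
            gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets


# Hypothetical gene sets, used only to show the file layout.
example_sets = {
    "SYNAPSE_DEMO": ("illustrative set", ["SHANK3", "NRXN1", "SYNGAP1"]),
    "CHROMATIN_DEMO": ("illustrative set", ["CHD8", "ARID1B"]),
}
gmt_path = Path(tempfile.mkdtemp()) / "custom.gmt"
write_gmt(gmt_path, example_sets)
print(read_gmt(gmt_path))
```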

Outputs

  1. outdir/icepop-report.ipynb: Interactive Jupyter notebook containing all results
  2. outdir/icepop-report.html: Rendered HTML version of the notebook for easy viewing
  3. outdir/enrichment: Directory containing gene set enrichment analysis results
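The three steps can be chained from a small driver script. The sketch below only builds and prints the command lines, using the flags documented above. The function name and the directory layout (`<save_name>_mc` and `<save_name>_association` under one results root) are assumptions made for illustration.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(h5ad, magma, results_root, save_name, collections="All"):
    """Return argv lists for the metacell, association, and interactive steps."""
    root = Path(results_root)
    mc_dir = root / f"{save_name}_mc"              # assumed layout
    assoc_dir = root / f"{save_name}_association"  # assumed layout
    return [
        ["icepop", "metacell", "--h5ad", str(h5ad),
         "--outdir", str(mc_dir), "--save_name", save_name],
        ["icepop", "association", "--h5ad", str(h5ad),
         "--mc_assign", str(mc_dir / "mc_assign.csv"),
         "--magmaz", str(magma), "--sp", "mmusculus",
         "--outdir", str(assoc_dir)],
        ["icepop", "interactive", "--outdir", str(assoc_dir),
         "--mcdir", str(mc_dir), "--geneset_collections", collections,
         "--adata_path", str(h5ad)],
    ]


commands = build_pipeline_commands(
    h5ad="../data/TM_FACS/TM_FACS_cnt.h5ad",
    magma="../data/magmaz/asd.genes.out",
    results_root="../results",
    save_name="TM_FACS",
)
for argv in commands:
    # Printing only; pass argv to subprocess.run(argv, check=True) to execute.
    print(shlex.join(argv))
```

Building each command as a list keeps the step 1 output path and the step 2 --mc_assign input in sync, since both derive from the same directory variable.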
