Firmament is a Python package for performing high-throughput gene signature searches on large collections of single-cell RNA-seq datasets stored as H5AD files. It helps identify datasets that contain cells enriched for the expression of a gene signature.
The package leverages the signature score calculation methods from the pegasus package to compute and search for gene signatures. Firmament's two-phase approach (offline computation + online search) enables efficient signature score searches across 100M+ single cells.
Install the package with pip:

pip install firmament

Easily compute Z-scores for gene signatures across your single-cell data:
import anndata
import firmament
# Load your data
ad = anndata.read_h5ad("path/to/your/data.h5ad")
# Compute Z-scores
zscore_df = firmament.compute.calc_zscores(
    ad,
    layer=None,  # Use .X matrix (or specify a layer name)
    cell_type_labels="level1class",  # Column in adata.obs containing cell type labels
)
print(zscore_df)

### output
Tspan12 Tshz1 Fnbp1l Adamts15 Cldn12 Rxfp1 \
1772071015_C02 -0.996770 -0.162126 -0.171727 -0.454830 -0.370999 -0.424958
1772071017_G12 -0.773850 -0.321869 -0.658392 -0.516157 -0.128007 -0.273857
1772071017_A05 -0.830232 -0.829437 0.106758 -0.402295 -0.438613 -0.577686
1772071014_B06 0.046787 -0.065567 -0.399475 -0.278929 -0.757392 -0.670489
1772067065_H06 -1.004179 -0.033353 -0.709410 -0.278839 -0.722943 -0.405029
... ... ... ... ... ... ...
One can imagine a system that stores these computed signature Z-score matrices and uses them to rapidly identify cell types or individual cells enriched for the expression of specific genes across large data collections.
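The store-then-query idea can be sketched in a few lines. The example below is only an illustration, not Firmament's implementation: it hard-codes two columns from the printed output above into a plain dictionary (standing in for a stored Z-score matrix), and the helper `rank_cells` and its mean-of-Z-scores scoring rule are assumptions made for this sketch.

```python
from statistics import mean

# Stand-in for a stored Z-score matrix: cell -> {gene: z-score}.
# Values are copied from the Tspan12 and Fnbp1l columns printed above.
ZSCORES = {
    "1772071015_C02": {"Tspan12": -0.996770, "Fnbp1l": -0.171727},
    "1772071017_G12": {"Tspan12": -0.773850, "Fnbp1l": -0.658392},
    "1772071017_A05": {"Tspan12": -0.830232, "Fnbp1l": 0.106758},
    "1772071014_B06": {"Tspan12": 0.046787, "Fnbp1l": -0.399475},
    "1772067065_H06": {"Tspan12": -1.004179, "Fnbp1l": -0.709410},
}


def rank_cells(zscores, genes):
    """Hypothetical helper: rank cells by their mean Z-score over `genes`."""
    scored = []
    for cell, row in zscores.items():
        missing = [g for g in genes if g not in row]
        if missing:
            raise KeyError(f"genes not in stored matrix: {missing}")
        scored.append((cell, mean(row[g] for g in genes)))
    # Highest mean Z-score first
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_cells(ZSCORES, ["Tspan12", "Fnbp1l"])
for cell, score in ranked:
    print(f"{cell}\t{score:+.4f}")
```

With a pandas DataFrame shaped like `zscore_df` (cells as rows, genes as columns), the same ranking could be written as `zscore_df[genes].mean(axis=1).sort_values(ascending=False)`.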
Search for cells or cell types enriched for specific gene signatures:
from firmament import signature_search
# Search for cells enriched for a set of genes
results = signature_search(
    "path/to/your/data.h5ad",
    genes=["Fnbp1l", "Tspan12", "Vipas39"],
    layer_name=None,  # Use .X matrix (or specify a layer name)
    cell_type_label_column_name="level1class",
    verbose=True,
)
print(results)  # Can also be converted into a Pandas DataFrame

### output
fdr_prop total_count fdr_count zscore_hist pval_hist mean_zscore set_zscore set_pval label
0 0.001997 3005 6 {'bins': ['-3.20', '-3.14', '-3.09', '-3.03', ... {'bins': ['0.01', '0.02', '0.03', '0.04', '0.0... -0.932011 -51.090881 1.0 overall
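Some of the columns in this output appear to be related by simple arithmetic. The sketch below checks two such relationships against the printed row. They are inferred from the numbers alone, not from Firmament's documentation, so treat the formulas (a proportion, and a Stouffer-style combination of per-cell Z-scores with an upper-tail p-value) as assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the "overall" row printed above
total_count = 3005
fdr_count = 6
mean_zscore = -0.932011

# Assumption: fdr_prop is the fraction of cells counted in fdr_count
fdr_prop = fdr_count / total_count

# Assumption: set_zscore combines per-cell Z-scores Stouffer-style,
# i.e. sum(z) / sqrt(n), which equals mean(z) * sqrt(n)
set_zscore = mean_zscore * sqrt(total_count)

# Assumption: set_pval is the upper-tail probability of set_zscore
set_pval = 1.0 - NormalDist().cdf(set_zscore)

print(f"fdr_prop   = {fdr_prop:.6f}")
print(f"set_zscore = {set_zscore:.4f}")
print(f"set_pval   = {set_pval:.1f}")
```

Under these assumptions the computed values agree with the printed `fdr_prop`, `set_zscore`, and `set_pval` to within rounding. A strongly negative `set_zscore` with a p-value of 1.0 would then indicate that the signature is, if anything, depleted in this group.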
For larger collections of files, you can use the batch_signature_search function to perform searches on-the-fly across multiple datasets.
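The signature of `batch_signature_search` is not shown here, so the following is not that function. It is a minimal sketch of the same idea under stated assumptions: a hypothetical helper, `search_collection`, applies any per-file search callable (for example `signature_search` from the example above) to each file in a collection and keys the results by path.

```python
from typing import Any, Callable, Dict, Iterable, List


def search_collection(
    paths: Iterable[Any],
    genes: List[str],
    search_fn: Callable[..., Any],
    **search_kwargs: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: run `search_fn` on every path and collect results.

    `search_fn` is called as search_fn(path, genes=genes, **search_kwargs),
    mirroring how signature_search is called in the example above.
    """
    results: Dict[str, Any] = {}
    # Sort so the output order does not depend on filesystem listing order
    for path in sorted(str(p) for p in paths):
        results[path] = search_fn(path, genes=genes, **search_kwargs)
    return results


# Usage sketch (requires firmament and real files, so left as a comment):
#
#   from pathlib import Path
#   from firmament import signature_search
#
#   hits = search_collection(
#       Path("path/to/your/collection").glob("*.h5ad"),
#       genes=["Fnbp1l", "Tspan12", "Vipas39"],
#       search_fn=signature_search,
#       layer_name=None,
#       cell_type_label_column_name="level1class",
#   )
```

Passing the search function in as an argument keeps the loop independent of any one library call, which also makes it easy to try out with a stub before pointing it at real data.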
This project has been set up using BiocSetup and PyScaffold.