cshift

A tool to perform cluster enrichment/depletion analyses

Background

cshift performs cluster enrichment/depletion analyses for known groups of cells. It was built with single-cell perturbation studies in mind, to answer the question of how each perturbation may shift Leiden cluster occupancy relative to non-targeting controls.

Method

Cluster Enrichment

For every group and every cluster in the dataset, a chi-square test compares the group's distribution with the distribution of each non-targeting control (reference) group. The resulting p-values are aggregated across the non-targeting controls using a geometric mean. Finally, the aggregated p-values are adjusted for multiple hypothesis testing with the Benjamini-Hochberg procedure.
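The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the package's own implementation: it assumes each test is a 2x2 contingency table of (cells in the cluster, cells not in the cluster) for one group versus one reference group, scored with `scipy.stats.chi2_contingency`, and it implements Benjamini-Hochberg by hand. The function names `cluster_pvalues` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, gmean


def cluster_pvalues(clusters, groups, reference):
    """Return (group_ids, cluster_ids, p-value matrix aggregated over references).

    Assumption (hypothetical formulation): each test is a 2x2 table of
    (in cluster, not in cluster) counts for one group vs. one reference group.
    """
    clusters = np.asarray(clusters)
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    cluster_ids = np.unique(clusters)
    pvals = np.ones((group_ids.size, cluster_ids.size))
    for i, g in enumerate(group_ids):
        in_g = groups == g
        for j, c in enumerate(cluster_ids):
            in_c = clusters == c
            per_ref = []
            for r in reference:
                in_r = groups == r
                table = np.array([
                    [np.sum(in_g & in_c), np.sum(in_g & ~in_c)],
                    [np.sum(in_r & in_c), np.sum(in_r & ~in_c)],
                ])
                # A zero row or column makes the test undefined; treat it as "no evidence".
                if (table.sum(axis=0) == 0).any() or (table.sum(axis=1) == 0).any():
                    per_ref.append(1.0)
                else:
                    per_ref.append(chi2_contingency(table)[1])
            # Geometric mean across references; clip so a p-value of 0 cannot break the log.
            pvals[i, j] = gmean(np.clip(per_ref, 1e-300, 1.0))
    return group_ids, cluster_ids, pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) over all entries of pvals."""
    p = np.asarray(pvals, dtype=float)
    flat = p.ravel()
    n = flat.size
    order = np.argsort(flat)
    # scale each sorted p-value by n / rank
    ranked = flat[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    qvals = np.empty(n)
    qvals[order] = np.minimum(ranked, 1.0)
    return qvals.reshape(p.shape)
```

On the toy dataset from the Usage section every group has one cell in each cluster, so every table is balanced and every aggregated p-value is 1.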

Visualization

Results are visualized as the negative log of the false discovery rate (q-value), multiplied by the sign of the percent change so that enrichment is positive [+] and depletion is negative [-].
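The signed score described above can be sketched as below. This is an illustration under assumptions, not the package's code: `percent_change` and `signed_score` are hypothetical names, the baseline for the percent change is assumed to be the pooled cluster fractions of all reference cells, and the logarithm is assumed to be base 10.

```python
import numpy as np


def percent_change(clusters, groups, reference):
    """Percent change of each group's cluster fraction vs. the pooled references.

    Assumption: the baseline is the cluster fraction of all reference cells pooled.
    """
    clusters = np.asarray(clusters)
    groups = np.asarray(groups)
    group_ids = np.unique(groups)
    cluster_ids = np.unique(clusters)
    # fraction of each group's cells that fall in each cluster
    frac = np.array([
        [np.mean(clusters[groups == g] == c) for c in cluster_ids]
        for g in group_ids
    ])
    ref_cells = clusters[np.isin(groups, reference)]
    ref_frac = np.array([np.mean(ref_cells == c) for c in cluster_ids])
    return 100.0 * (frac - ref_frac) / ref_frac


def signed_score(pcc, qval):
    """sign(percent change) * -log10(q-value): positive = enriched, negative = depleted."""
    q = np.clip(np.asarray(qval, dtype=float), 1e-300, 1.0)
    return np.sign(np.asarray(pcc, dtype=float)) * -np.log10(q)
```

For example, a q-value of 0.001 on a cluster whose occupancy dropped gives a score of -3, while the same q-value on an increase gives +3.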

By default, only groups with at least one significant cluster are shown; pass filter_significant=False to cs.plot() to show all groups.

The plot is drawn with seaborn's clustermap, and cs.plot() accepts **kwargs that are forwarded to clustermap, so any relevant clustermap argument can be passed through the cs.plot() interface.
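The filtering and keyword forwarding described above follow a common pattern, sketched here with hypothetical helpers `significant_rows` and `plot_scores`; this is not the cs.plot() source. The significance cutoff `alpha=0.05` and the default `cmap`/`center` values are assumptions. seaborn is imported inside the plotting function so the filtering helper works without it.

```python
import numpy as np


def significant_rows(qval, alpha=0.05):
    """Boolean mask of groups (rows) with at least one cluster below alpha (assumed cutoff)."""
    return (np.asarray(qval, dtype=float) < alpha).any(axis=1)


def plot_scores(score, qval, filter_significant=True, alpha=0.05, **kwargs):
    """Draw a clustermap of signed scores, forwarding any extra kwargs to seaborn."""
    import seaborn as sns  # imported lazily; only needed for plotting

    score = np.asarray(score, dtype=float)
    if filter_significant:
        score = score[significant_rows(qval, alpha)]
    if score.shape[0] == 0:
        raise ValueError("no significant groups to plot")
    # caller-supplied kwargs take precedence over these illustrative defaults
    kwargs.setdefault("cmap", "RdBu_r")
    kwargs.setdefault("center", 0)
    return sns.clustermap(score, **kwargs)
```

A call such as `plot_scores(score, qval, filter_significant=False, figsize=(8, 6), col_cluster=False)` would pass `figsize` and `col_cluster` straight through to `seaborn.clustermap`.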

Usage

Toy Dataset

Here is an example starting from scratch:

import numpy as np
from cshift import CShift

# define your array of clusters
clusters = np.array([0, 0, 0, 1, 1, 1])

# define your array of groups
groups = np.array([0, 1, 2, 0, 1, 2])

# define your reference group (this is what we are comparing everything against)
reference = np.array([0])

# create the CShift object
cs = CShift(
    clusters,
    groups,
    reference,
)

# perform all the chi-square tests and calculate the percent change
# and false discovery rate (q-value) matrices
pcc, qval = cs.fit()

# both matrices have one row per group and one column per cluster
assert pcc.shape == (3, 2)
assert qval.shape == (3, 2)

# plot the signed negative log adjusted p-values as a clustermap
# (commented out because this toy dataset has no significant enrichments or depletions)
# cs.plot()

With Single Cell Sequencing

Here is an example of what this looks like in the context of a single-cell analysis:

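import numpy as np
from cshift import CShift

# adata is an AnnData object (as used by scanpy) whose .obs table
# holds the Leiden cluster labels and the guide assigned to each cell
cs = CShift(
    clusters=adata.obs.leiden.values,
    groups=adata.obs.guide_name.values,
    reference=np.array(["NTC_5406", "NTC_5408", "NTC_5412", "NTC_5430"]),
)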
pcc, qval = cs.fit()

cs.plot()
