CELLEX Workflow

See Timshel (eLife, 2020) for details on the CELLEX method.

Overview

This section describes the default CELLEX workflow. Several of these steps can be modified through the arguments of the CELLEX object function calls.

Input

The preferred input to CELLEX is unnormalized expression data (UMI counts or TPM-like values). The CELLEX expression specificity (ES) metrics assume that all expression values are non-negative, so expression data containing negative values (for example, after batch correction) cannot be used.
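The non-negativity requirement can be checked before handing data to CELLEX. The following is a minimal sketch, not part of CELLEX itself: the helper name `check_expression_input` and the toy matrix are hypothetical, and the genes-by-cells layout of the pandas DataFrame is an assumption.

```python
import pandas as pd


def check_expression_input(data: pd.DataFrame) -> None:
    """Raise ValueError if the expression matrix contains negative values.

    Hypothetical helper for illustration; assumes rows are genes and
    columns are cells.
    """
    values = data.to_numpy(dtype=float)
    n_negative = int((values < 0).sum())
    if n_negative > 0:
        raise ValueError(
            f"expression matrix contains {n_negative} negative value(s); "
            "use unnormalized, non-negative expression data instead"
        )


# Toy genes-by-cells matrix of UMI counts (made-up data).
counts = pd.DataFrame(
    [[0, 3, 1], [5, 0, 2]],
    index=["geneA", "geneB"],
    columns=["cell1", "cell2", "cell3"],
)

check_expression_input(counts)  # passes silently for non-negative input
```

Running the check up front gives a clear error message instead of silently producing unreliable ES estimates from batch-corrected values.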

Normalization

By default, the data will be normalized using "log common transcript count"-normalization. If the data input has already been normalized, then run CELLEX with ESObject(data=data, annotation=metadata, **normalize=False**, verbose=True).
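To illustrate what a "log common transcript count" normalization typically involves, here is a minimal sketch. It assumes that every cell is scaled to the same total count and then transformed with log(x + 1). The function name `log_common_count_normalize`, the scale factor of 10,000, and the genes-by-cells layout are assumptions made for illustration and may differ from what CELLEX does internally.

```python
import numpy as np
import pandas as pd


def log_common_count_normalize(data: pd.DataFrame, scale: float = 1e4) -> pd.DataFrame:
    """Scale each cell (column) to `scale` total counts, then apply log(x + 1).

    Illustrative only; `scale` is an assumed value. Cells with a total
    count of zero would produce NaN and should be removed beforehand.
    """
    totals = data.sum(axis=0)                   # total counts per cell
    scaled = data.div(totals, axis=1) * scale   # common transcript count per cell
    return np.log1p(scaled)                     # log(x + 1) keeps zeros at zero


# Toy genes-by-cells matrix of UMI counts (made-up data).
counts = pd.DataFrame(
    [[0, 3, 1], [5, 0, 2]],
    index=["geneA", "geneB"],
    columns=["cell1", "cell2", "cell3"],
)

normalized = log_common_count_normalize(counts)
```

Undoing the log transform with `np.expm1(normalized)` and summing per cell recovers the common total, which is a quick way to confirm whether a matrix has already been normalized this way.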

Gene filtering

ES metrics are prone to falsely estimating genes with ‘sporadic’ expression levels as highly expression specific. Most notably, the EP metric will estimate genes with very low expression that appear in only a small number of cells as highly expression specific. To solve this problem, we use an ANOVA model to estimate the background noise level for each gene, which lets us distinguish genes with sporadic expression levels from genes with confident expression levels. By default, sporadically expressed genes are removed before ES metrics are computed, which reduces false-positive ES genes. If the data input has already been filtered or you want to disable this step, then run CELLEX with ESObject(data=data, annotation=metadata, **anova=False**, verbose=True).
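The idea behind an ANOVA-based gene filter can be sketched as follows. This is an illustration under stated assumptions, not the CELLEX implementation: it runs a one-way ANOVA per gene across cell-type annotations with `scipy.stats.f_oneway` and keeps genes whose p-value falls below a cutoff. The function name `anova_keep_mask`, the cutoff `alpha=1e-5`, and the toy data are all hypothetical.

```python
import pandas as pd
from scipy.stats import f_oneway


def anova_keep_mask(data: pd.DataFrame, annotation: pd.Series, alpha: float = 1e-5) -> pd.Series:
    """Return a boolean Series (indexed by gene) marking genes to keep.

    Illustrative only. `data` is assumed to be genes-by-cells and
    `annotation` maps each cell to a cell type. `alpha` is an assumed cutoff.
    """
    labels = annotation.loc[data.columns]       # align annotations to cells
    cell_types = labels.unique()
    keep = {}
    for gene, row in data.iterrows():
        # One sample of expression values per cell type.
        samples = [row[(labels == ct).to_numpy()].to_numpy() for ct in cell_types]
        _, p_value = f_oneway(*samples)
        # A NaN p-value (e.g. constant expression) compares False, so the gene is dropped.
        keep[gene] = bool(p_value < alpha)
    return pd.Series(keep, name="keep")


# Toy data: 40 cells in two cell types (made-up values).
cells = [f"cell{i}" for i in range(40)]
cell_types = pd.Series(["typeA"] * 20 + ["typeB"] * 20, index=cells)
expr = pd.DataFrame(
    {
        "marker_gene": [4, 6] * 10 + [0, 1] * 10,   # clearly higher in typeA
        "sporadic_gene": [1] + [0] * 39,            # a single count in one cell
    },
    index=cells,
).T  # transpose to genes-by-cells

mask = anova_keep_mask(expr, cell_types)
filtered = expr.loc[mask]
```

In this toy example the gene with a single stray count does not differ reliably between cell types and is filtered out, while the consistently expressed marker gene is retained.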