Overview

We expect a cell_filtering HDF5 group at the root of the file, containing information about the filtering of low-quality cells. The group itself contains the parameters and results subgroups.

Definitions:

  • rna_available: whether RNA data is available. This is typically determined from the inputs step.
  • adt_available: whether ADT data is available. This is typically determined from the inputs step.
  • crispr_available: whether CRISPR data is available. This is typically determined from the inputs step.
  • num_cells: number of cells in the unfiltered dataset. This is typically determined from the inputs step.

Parameters

parameters should contain:

  • use_rna: a scalar integer to be interpreted as a boolean, indicating whether the RNA data should be used for quality control.
  • use_adt: a scalar integer to be interpreted as a boolean, indicating whether the ADT data should be used for quality control.
  • use_crispr: a scalar integer to be interpreted as a boolean, indicating whether the CRISPR data should be used for quality control.

Results

In this section, the effective number of modalities is defined as the number of modalities (i.e., rna, adt or crispr) that are both available (i.e., *_available is true) and in use for quality control (i.e., use_* is true).
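The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `effective_modalities` and the dictionary-based inputs are assumptions made for this example, not part of the format.

```python
MODALITIES = ("rna", "adt", "crispr")


def effective_modalities(available, use):
    """Return the modalities that are both available and in use for QC.

    `available` maps each modality to its *_available flag and `use` maps
    each modality to its use_* parameter. Values may be bools or integers,
    since the use_* parameters are integers interpreted as booleans.
    Both dictionaries are hypothetical stand-ins for values read elsewhere.
    """
    return [
        m for m in MODALITIES
        if bool(available.get(m, False)) and bool(use.get(m, 0))
    ]


# ADT is available but not in use; CRISPR is in use but not available.
available = {"rna": True, "adt": True, "crispr": False}
use = {"rna": 1, "adt": 0, "crispr": 1}

active = effective_modalities(available, use)
print(active, len(active))
```

Here only rna satisfies both conditions, so the effective number of modalities is 1.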

If the effective number of modalities is greater than 1, results should contain:

  • discards: an integer dataset of length equal to num_cells. Each value is interpreted as a boolean and specifies whether the corresponding cell would be discarded by the filter thresholds. This is typically some function of the per-modality discards from the modality-specific *_quality_control steps.
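The format does not prescribe how the per-modality discards are combined. One plausible choice, shown here purely as an assumption, is an elementwise OR, so that a cell is discarded if any modality flags it. The helper name `combine_discards` is hypothetical.

```python
def combine_discards(per_modality):
    """Combine per-modality discard vectors with an elementwise OR.

    `per_modality` maps a modality name to a sequence of integers of length
    num_cells, each interpreted as a boolean. The OR rule is an assumption
    for illustration; the format only says "some function" of these vectors.
    """
    vectors = list(per_modality.values())
    if not vectors:
        raise ValueError("need at least one per-modality discard vector")
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all discard vectors must have length num_cells")
    # A cell is discarded (1) if any modality marks it as discarded.
    return [int(any(flags)) for flags in zip(*vectors)]


combined = combine_discards({
    "rna": [0, 1, 0, 0],
    "adt": [0, 0, 1, 0],
})
print(combined)  # [0, 1, 1, 0]
```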

If the effective number of modalities is 1, discards may be absent. In this case, the discards dataset is implicitly defined as the discards from the single modality that was used for QC.

If the effective number of modalities is 0, discards may be absent. In this case, the discards dataset is implicitly defined as an all-zero vector of length num_cells, i.e., no cells are removed.
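A reader of the file might resolve the three cases as follows. This is a minimal sketch with a hypothetical helper `resolve_discards`; it takes already-loaded Python sequences rather than HDF5 handles, and treating a missing dataset with more than one effective modality as an error is a strictness assumption of this example.

```python
def resolve_discards(stored, active_discards, num_cells):
    """Return the discard vector for the cell filtering step.

    stored: the discards dataset from results as a sequence, or None if absent.
    active_discards: one discard vector per effective modality, i.e. per
        modality that is both available and in use for quality control.
    num_cells: number of cells in the unfiltered dataset.
    """
    if stored is not None:
        # An explicit dataset is used as-is, whatever the modality count.
        if len(stored) != num_cells:
            raise ValueError("discards must have length num_cells")
        return [int(bool(x)) for x in stored]

    n = len(active_discards)
    if n > 1:
        raise ValueError("discards is absent but more than one modality is effective")
    if n == 1:
        # Implicitly the discards of the single modality used for QC.
        return [int(bool(x)) for x in active_discards[0]]
    # No effective modality: no cells are removed.
    return [0] * num_cells


print(resolve_discards(None, [], 3))           # [0, 0, 0]
print(resolve_discards(None, [[0, 1, 0]], 3))  # [0, 1, 0]
print(resolve_discards([1, 0, 1], [[0, 1, 0], [1, 0, 0]], 3))  # [1, 0, 1]
```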

History

Updated in version 3.0, with the following changes from the previous version:

  • This step is now responsible for deciding which modalities to use for filtering. Upstream *_quality_control steps will always compute metrics, allowing them to be reliably used for diagnostic purposes.