We expect a cell_filtering
HDF5 group at the root of the file, containing information about the filtering of low-quality cells.
The itself contains the parameters
and results
subgroups.
Definitions:
rna_available
: whether RNA data is available. This is typically determined from theinputs
step.adt_available
: whether ADT data is available. This is typically determined from theinputs
step.crispr_available
: whether CRISPR data is available. This is typically determined from theinputs
step.num_cells
: number of cells in the unfiltered dataset. This is typically determined from theinputs
step.
parameters
should contain:
use_rna
: a scalar integer to be interpreted as a boolean, indicating whether the RNA data should be used for quality control.use_adt
: a scalar integer to be interpreted as a boolean, indicating whether the ADT data should be used for quality control.use_crispr
: a scalar integer to be interpreted as a boolean, indicating whether the CRISPR data should be used for quality control.
In this section, the effective number of modalities is defined as the number of modalities (i.e., rna
, adt
or crispr
)
that are both available (i.e., *_available
is true) and in use for quality control (i.e., use_*
is true).
If the effective number of modalities is greater than 1, results
should contain:
discards
: an integer dataset of length equal tonum_cells
. Each value is interpreted as a boolean and specifies whether the corresponding cell would be discarded by the filter thresholds. This is typically some function of the per-modalitydiscards
from the modality-specific*_quality_control
steps.
If the effective number of modalities is 1, discards
may be absent.
In this case, the discard dataset is implicitly defined as the discards
from the single modality that was used for QC.
If the effective number of modalities is 0, discards
may be absent.
In this case, the discard dataset is implicitly defined as an all-zero vector of length num_cells
, i.e., no removal of any cells.
Updated in version 3.0, with the following changes from the previous version:
- This step is now responsible for deciding which modalities to use for filtering.
Upstream
*_quality_control
steps will always compute metrics, allowing them to be reliably used for diagnostic purposes.