omics-rust/rsomics-struc-zero

rsomics-struc-zero

Structural-zero detection for compositional feature tables, as a single fast CLI. Equivalent to skbio.stats.composition.struc_zero.

Structural zeros are features systematically absent from a sample group — their observed counts are all (or nearly all) zero. ANCOM-BC and related differential-abundance methods detect them as a preprocessing step: a feature that is a structural zero in a group is automatically differentially abundant there and should be excluded from log-ratio analyses of that group. Run this on the raw table, before any pseudocount replacement.

rsomics-struc-zero --table table.tsv --metadata meta.tsv --grouping <col> [--neg-lb] [-o grid.tsv]
  • table.tsv — feature table: a header row of feature IDs (corner cell ignored), then one line per sample of the form sample_id value..., giving the sample ID followed by one value per feature. Empty / NA cells count as zero.
  • meta.tsv — sample metadata: sample ID in the first column, named covariate columns in the header.
  • --grouping — the metadata column whose labels partition samples into groups.
  • --neg-lb — flag a feature when the 95% Wald lower bound of its group prevalence is ≤ 0, not only when it is absent from every sample. Recommended for larger per-group sample sizes.
  • --csv — comma-separated I/O instead of tab.
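The cell rule for table.tsv (empty, NA, and NaN cells count as zero) can be sketched as follows. This is a minimal illustration under stated assumptions, not the crate's actual parser: the helper names parse_cell and parse_sample_line are hypothetical, and matching "NA" case-insensitively is an assumption.

```rust
use std::num::ParseFloatError;

/// Parse one table cell into a value.
/// Empty cells, "NA" (case-insensitive here, an assumption) and NaN count as zero.
fn parse_cell(cell: &str) -> Result<f64, ParseFloatError> {
    let trimmed = cell.trim();
    if trimmed.is_empty() || trimmed.eq_ignore_ascii_case("na") {
        return Ok(0.0);
    }
    let value: f64 = trimmed.parse()?;
    Ok(if value.is_nan() { 0.0 } else { value })
}

/// Split one sample line into its sample ID and per-feature values.
/// `delimiter` would be '\t' by default and ',' with --csv.
fn parse_sample_line(
    line: &str,
    delimiter: char,
) -> Result<(String, Vec<f64>), ParseFloatError> {
    let mut fields = line.split(delimiter);
    // The first field is the sample ID; the rest are feature values.
    let sample_id = fields.next().unwrap_or("").to_string();
    let values = fields.map(parse_cell).collect::<Result<Vec<_>, _>>()?;
    Ok((sample_id, values))
}
```

For a line such as `s1<TAB>3<TAB><TAB>NA<TAB>0.5`, this sketch yields the ID `s1` and the values 3, 0, 0, 0.5.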

Output is a boolean grid: a header of the sorted group labels, then one line per feature of the form feature<TAB>True/False..., with one True/False value per group. A value is True where the feature is a structural zero in that group.

Features are independent, so -t parallelises the per-group scan across them.
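Because each feature's result depends only on its own column, the scan can be split across threads without any shared state. The sketch below shows one way to do that with scoped threads from the standard library, for a single group and the plain "absent from every sample" rule. It is an illustration only: the function name scan_parallel is hypothetical, and whether the crate uses std threads or a library such as rayon is not stated in this README.

```rust
use std::thread;

/// Flag, for one group, every feature whose values are all zero.
/// `columns[i]` holds feature i's values across the group's samples.
/// NaN is treated as zero, matching the README's cell rule.
fn scan_parallel(columns: &[Vec<f64>], threads: usize) -> Vec<bool> {
    let threads = threads.max(1);
    // Ceiling division; at least 1 so chunks() never receives 0.
    let chunk = ((columns.len() + threads - 1) / threads).max(1);
    let mut flags = vec![false; columns.len()];

    thread::scope(|scope| {
        // Each worker gets a disjoint slice of features and of the output.
        for (cols, out) in columns.chunks(chunk).zip(flags.chunks_mut(chunk)) {
            scope.spawn(move || {
                for (col, flag) in cols.iter().zip(out.iter_mut()) {
                    *flag = col.iter().all(|v| *v == 0.0 || v.is_nan());
                }
            });
        }
    });

    flags
}
```

The output does not depend on the thread count, since each worker writes only to its own slice of `flags`.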

Origin

This crate is an independent Rust reimplementation of skbio.stats.composition.struc_zero based on:

  • Kaul, A., Mandal, S., Davidov, O., Peddada, S. D., "Analysis of Microbiome Data in the Presence of Excess Zeros", Frontiers in Microbiology 8:2114, 2017. DOI: 10.3389/fmicb.2017.02114
  • Lin, H., Peddada, S. D., "Analysis of compositions of microbiomes with bias correction", Nature Communications 11:3514, 2020. DOI: 10.1038/s41467-020-17041-7
  • The scikit-bio implementation (Modified BSD License), read and cited: for each group it counts nonzero samples per feature to get a prevalence p, optionally replaces it with the Wald lower bound p - 1.96·sqrt(p(1-p)/n), and flags the feature when p ≤ 0. NaN/empty cells are treated as zero.
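The per-feature rule described in the last bullet can be sketched in a few lines of Rust. This is a minimal illustration, not the crate's actual code: the function name is_structural_zero and its signature are hypothetical, and returning false for an empty group is an assumption made only to avoid dividing by zero.

```rust
/// Decide whether one feature is a structural zero in one group.
/// `values` holds the feature's values across that group's samples.
/// With `neg_lb` set, the prevalence is replaced by its Wald lower bound
/// p - 1.96 * sqrt(p * (1 - p) / n) before the test p <= 0.
fn is_structural_zero(values: &[f64], neg_lb: bool) -> bool {
    if values.is_empty() {
        // Assumption: a group with no samples is never flagged.
        return false;
    }
    let n = values.len() as f64;
    // NaN cells are treated as zero, so they do not count as present.
    let nonzero = values
        .iter()
        .filter(|v| **v != 0.0 && !v.is_nan())
        .count() as f64;
    let mut p = nonzero / n;
    if neg_lb {
        p -= 1.96 * (p * (1.0 - p) / n).sqrt();
    }
    p <= 0.0
}
```

For example, a feature observed in one of three samples has p = 1/3 and is not flagged by the plain rule. With the lower bound, p becomes roughly 1/3 - 0.53 = -0.20, so the same feature is flagged.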

The boolean grid matches scikit-bio exactly, since the method is only counting plus a closed-form threshold, with no RNG and no iteration. tests/compat.rs diffs the output against both a committed golden file captured from skbio and a live skbio.stats.composition.struc_zero run.

Group labels are sorted lexicographically as strings; scikit-bio sorts via numpy.unique after a float-cast of the metadata, so a column of numeric-looking labels orders numerically there. Pass non-numeric labels for identical column order.
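The difference between the two orderings is easy to see on numeric-looking labels. The sketch below contrasts a plain string sort with a sort on the parsed numeric value, which mimics the float-cast behaviour described above. Both function names are hypothetical, and mapping unparseable labels to NaN in the numeric sort is an assumption.

```rust
/// Sort unique labels as strings, the order this README describes for the tool.
fn sort_lexicographic(labels: &[&str]) -> Vec<String> {
    let mut sorted: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
    sorted.sort();
    sorted.dedup();
    sorted
}

/// Sort unique labels by numeric value, mimicking a float-cast before sorting.
/// Labels that do not parse are mapped to NaN (an assumption of this sketch).
fn sort_numeric(labels: &[&str]) -> Vec<String> {
    let mut sorted: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
    sorted.sort_by(|a, b| {
        let x: f64 = a.parse().unwrap_or(f64::NAN);
        let y: f64 = b.parse().unwrap_or(f64::NAN);
        x.total_cmp(&y)
    });
    sorted.dedup();
    sorted
}
```

With the labels 10, 2, 1, the string sort gives 1, 10, 2, while the numeric sort gives 1, 2, 10. Labels such as g01, g02, g10 sort the same way under both.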

License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org/ (Modified BSD License).
