rsomics-struc-zero

Structural-zero detection for compositional feature tables, as a single fast CLI. Equivalent to skbio.stats.composition.struc_zero.

Structural zeros are features systematically absent from a sample group — their observed counts are all (or nearly all) zero. ANCOM-BC and related differential-abundance methods detect them as a preprocessing step: a feature that is a structural zero in a group is automatically differentially abundant there and should be excluded from log-ratio analyses of that group. Run this on the raw table, before any pseudocount replacement.

rsomics-struc-zero --table table.tsv --metadata meta.tsv --grouping <col> [--neg-lb] [-o grid.tsv]

table.tsv: feature table. A header row of feature IDs (corner cell ignored), then one sample_id<TAB>value... line per sample. Empty / NA cells count as zero.
meta.tsv: sample metadata. Sample ID in the first column, named covariate columns in the header.
--grouping: the metadata column whose labels partition samples into groups.
--neg-lb: flag a feature when the lower bound of the 95% Wald interval for its group prevalence is ≤ 0, not only when it is absent from every sample. Recommended for larger per-group sample sizes.
--csv: comma-separated I/O instead of tab.
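
To see what --neg-lb changes, here is a minimal Rust sketch of the lower-bound rule, using the formula p - 1.96·sqrt(p(1-p)/n) quoted under Origin. The name `flagged_with_neg_lb` is hypothetical and is not part of this crate's API; the sketch assumes n > 0.

```rust
/// Hypothetical helper (not this crate's API): would a feature seen in
/// `present` of the `n` samples of one group be flagged under --neg-lb?
/// Assumes n > 0.
fn flagged_with_neg_lb(present: usize, n: usize) -> bool {
    let n = n as f64;
    // Group prevalence: fraction of samples with a nonzero count.
    let p = present as f64 / n;
    // Lower bound of the Wald interval around the prevalence.
    let lower = p - 1.96 * (p * (1.0 - p) / n).sqrt();
    lower <= 0.0
}
```

With 10 samples in a group, a feature seen once has a bound of about -0.086 and is flagged, while a feature seen in 4 samples has a bound of about 0.096 and is not. A feature present in every sample has p = 1, so its bound is 1 and it is never flagged.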

Output is a boolean grid: a header of the (sorted) group labels, then one feature<TAB>True/False... line per feature, True where the feature is a structural zero in that group.

Features are independent, so -t parallelises the per-group scan across them.
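
Because each feature's flag depends only on its own column, the scan splits cleanly across threads. The sketch below shows one way to do that with `std::thread::scope`. It is an illustration under assumptions, not this crate's actual threading code, and `parallel_all_absent` is a hypothetical name. For brevity it handles a single group and the default rule (absent from every sample).

```rust
use std::thread;

/// Hypothetical sketch: `columns[j]` holds the counts of feature j across the
/// samples of one group. Returns one flag per feature, true when every count
/// is zero (NaN is treated as zero).
fn parallel_all_absent(columns: &[Vec<f64>], threads: usize) -> Vec<bool> {
    let mut flags = vec![false; columns.len()];
    if columns.is_empty() {
        return flags;
    }
    let threads = threads.max(1);
    // Ceiling division so that every feature lands in some chunk.
    let chunk = (columns.len() + threads - 1) / threads;
    thread::scope(|s| {
        // Pair each chunk of input columns with the matching chunk of output
        // flags; the chunks are disjoint, so no locking is needed.
        for (cols, out) in columns.chunks(chunk).zip(flags.chunks_mut(chunk)) {
            s.spawn(move || {
                for (col, flag) in cols.iter().zip(out.iter_mut()) {
                    *flag = col.iter().all(|&v| v == 0.0 || v.is_nan());
                }
            });
        }
    });
    flags
}
```

The result does not depend on the thread count, since each worker writes only to its own slice of the output.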

Origin

This crate is an independent Rust reimplementation of skbio.stats.composition.struc_zero based on:

Kaul, A., Mandal, S., Davidov, O., Peddada, S. D., "Analysis of Microbiome Data in the Presence of Excess Zeros", Frontiers in Microbiology 8:2114, 2017. DOI: 10.3389/fmicb.2017.02114
Lin, H., Peddada, S. D., "Analysis of compositions of microbiomes with bias correction", Nature Communications 11:3514, 2020. DOI: 10.1038/s41467-020-17041-7
The scikit-bio implementation (Modified BSD License), read and cited: for each group it counts nonzero samples per feature to get a prevalence p, optionally replaces it with the Wald lower bound p - 1.96·sqrt(p(1-p)/n), and flags the feature when p ≤ 0. NaN/empty cells are treated as zero.
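
The procedure described above can be sketched in a few lines of Rust. This is a simplified illustration, not this crate's source: `struc_zero_grid` and its signature are hypothetical, and the table is assumed to be rectangular (one row per sample, one column per feature) with NaN standing in for empty cells.

```rust
use std::collections::BTreeMap;

/// Hypothetical sketch (not this crate's API). `table[i][j]` is the count of
/// feature j in sample i and `groups[i]` is that sample's group label.
/// Returns the sorted group labels and a features x groups boolean grid.
fn struc_zero_grid(
    table: &[Vec<f64>],
    groups: &[&str],
    neg_lb: bool,
) -> (Vec<String>, Vec<Vec<bool>>) {
    let n_features = table.first().map_or(0, |row| row.len());
    // Per group: (sample count, nonzero count per feature).
    // BTreeMap iterates its keys in lexicographic order.
    let mut counts: BTreeMap<&str, (usize, Vec<usize>)> = BTreeMap::new();
    for (row, &label) in table.iter().zip(groups) {
        let entry = counts
            .entry(label)
            .or_insert_with(|| (0, vec![0; n_features]));
        entry.0 += 1;
        for (j, &v) in row.iter().enumerate() {
            // NaN (an empty cell) is treated as zero.
            if v != 0.0 && !v.is_nan() {
                entry.1[j] += 1;
            }
        }
    }
    let labels: Vec<String> = counts.keys().map(|s| s.to_string()).collect();
    let mut grid = vec![vec![false; labels.len()]; n_features];
    for (g, (n, nonzero)) in counts.values().enumerate() {
        let n = *n as f64;
        for (j, &k) in nonzero.iter().enumerate() {
            let mut p = k as f64 / n; // prevalence in this group
            if neg_lb {
                p -= 1.96 * (p * (1.0 - p) / n).sqrt(); // Wald lower bound
            }
            grid[j][g] = p <= 0.0;
        }
    }
    (labels, grid)
}
```

For a table with two samples in group "a" and two in group "b", a feature that is nonzero in only one sample of "a" is not flagged for "a" by default, but is flagged there with `neg_lb` set, because the bound 0.5 - 1.96·sqrt(0.25/2) is negative.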

The boolean grid reproduces scikit-bio's output exactly: the computation is counting plus a closed-form threshold, with no RNG and no iteration. tests/compat.rs diffs the output against a committed skbio-captured golden file and against a live skbio.stats.composition.struc_zero run.

Group labels are sorted lexicographically as strings; scikit-bio sorts via numpy.unique after a float-cast of the metadata, so a column of numeric-looking labels orders numerically there. Pass non-numeric labels for identical column order.
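
The difference between the two orderings is easy to reproduce. The sketch below contrasts a plain string sort with a numeric sort of the same labels; both function names are hypothetical, and the numeric version only stands in for the float-cast ordering described above rather than reproducing numpy.

```rust
/// Lexicographic order, as this tool sorts group labels.
fn sort_as_strings(labels: &[&str]) -> Vec<String> {
    let mut v: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
    v.sort();
    v
}

/// Numeric order, standing in for a sort after a float cast.
/// Returns None if any label does not parse as a number.
fn sort_as_numbers(labels: &[&str]) -> Option<Vec<f64>> {
    let mut v: Vec<f64> = labels
        .iter()
        .map(|s| s.parse::<f64>().ok())
        .collect::<Option<Vec<f64>>>()?;
    v.sort_by(|a, b| a.total_cmp(b));
    Some(v)
}
```

For the labels "10", "2", "1", the string sort gives 1, 10, 2 while the numeric sort gives 1, 2, 10, so the grid columns would appear in a different order. Labels such as "ctrl" and "case" do not parse as numbers, so only the string order applies to them.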

License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org/ (Modified BSD License).
