omics-rust/rsomics-bioenv

rsomics-bioenv

BIO-ENV / BEST — find the subset of environmental variables whose standardized Euclidean distances are maximally rank-correlated with a community distance matrix (Clarke & Ainsworth 1993). Drop-in compatible with skbio.stats.distance.bioenv.

rsomics-bioenv dm.tsv --env env.tsv [--columns a,b,c] [-o result.tsv]

dm.tsv is an lsmat-format community distance matrix (a blank top-left corner, a tab-separated id header, then one id<TAB>values… row per sample). env.tsv is a samples × variables table: a header <id-label><TAB>var1<TAB>var2…, then one sampleid<TAB>v1<TAB>v2… row per sample. Env rows are reindexed onto the distance-matrix ids, so the ids must match but need not be in the same order; extra env rows are ignored. All variable values must be numeric.
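The reindexing step described above can be sketched as follows. This is a minimal illustration, not the crate's code: the function name `reindex_env`, the in-memory row representation, and the choice to return an error for a distance-matrix id with no env row are all assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical helper: reorder env rows to follow the distance-matrix id order.
/// Env rows whose id is not in `dm_ids` are silently dropped.
/// Assumption: a distance-matrix id with no env row is reported as an error.
fn reindex_env(
    dm_ids: &[&str],
    env_rows: &[(&str, Vec<f64>)],
) -> Result<Vec<Vec<f64>>, String> {
    // Map each env sample id to its row of values.
    let by_id: HashMap<&str, &Vec<f64>> =
        env_rows.iter().map(|(id, vals)| (*id, vals)).collect();

    // Walk the distance-matrix ids in order and pull the matching env row.
    dm_ids
        .iter()
        .map(|id| {
            by_id
                .get(id)
                .map(|vals| vals.to_vec())
                .ok_or_else(|| format!("sample id '{id}' missing from env table"))
        })
        .collect()
}
```

With env rows given in the order s3, s1, extra, s2 and distance-matrix ids s1, s2, s3, the result follows the distance-matrix order and the `extra` row is dropped.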

For each subset size from 1 to the number of variables, every variable subset of that size is evaluated and the one with the highest correlation is reported. The variables are standardized first (centered, divided by the sample standard deviation), their Euclidean distances are computed over the matrix's upper triangle, and Spearman's ρ is taken against the community distances. Output is a TSV of size, the best correlation, and the comma-joined vars.
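The standardization and distance steps can be sketched roughly as below. The function names and the column-major layout (one `Vec<f64>` per variable) are assumptions made for illustration; the crate's internals may differ.

```rust
/// Center each variable and divide by its sample standard deviation (n - 1 denominator).
/// Assumed layout: `cols[v][i]` is the value of variable `v` for sample `i`.
fn standardize(cols: &[Vec<f64>]) -> Vec<Vec<f64>> {
    cols.iter()
        .map(|col| {
            let n = col.len() as f64;
            let mean = col.iter().sum::<f64>() / n;
            let var = col.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
            let sd = var.sqrt();
            col.iter().map(|x| (x - mean) / sd).collect()
        })
        .collect()
}

/// Euclidean distances between samples over the chosen variables,
/// in upper-triangle (condensed) order: (0,1), (0,2), ..., (n-2,n-1).
fn condensed_euclidean(std_cols: &[Vec<f64>], subset: &[usize]) -> Vec<f64> {
    let n = std_cols.first().map_or(0, |c| c.len());
    let mut out = Vec::with_capacity(n * n.saturating_sub(1) / 2);
    for i in 0..n {
        for j in (i + 1)..n {
            // Sum of squared differences across the selected variables only.
            let sq: f64 = subset
                .iter()
                .map(|&v| (std_cols[v][i] - std_cols[v][j]).powi(2))
                .sum();
            out.push(sq.sqrt());
        }
    }
    out
}
```

Note that in this sketch a constant variable has zero standard deviation and standardizes to NaN; how the crate handles that case is not stated here.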

This is an exhaustive search over all 2^p − 1 non-empty subsets of p variables, so runtime grows exponentially with the variable count; scikit-bio gives the same warning.
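To make the growth concrete, here is a small sketch that enumerates the size-k subsets in lexicographic order and counts the total number of subsets evaluated. Both function names are hypothetical and the enumeration order is an assumption, not something taken from the crate.

```rust
/// All size-k subsets of the variable indices 0..p, in lexicographic order.
fn combinations(p: usize, k: usize) -> Vec<Vec<usize>> {
    let mut out = Vec::new();
    if k == 0 || k > p {
        return out;
    }
    let mut idx: Vec<usize> = (0..k).collect();
    loop {
        out.push(idx.clone());
        // Find the rightmost position that can still be advanced.
        let mut i = k;
        while i > 0 && idx[i - 1] == p - k + (i - 1) {
            i -= 1;
        }
        if i == 0 {
            return out; // every position is at its maximum: enumeration is done
        }
        idx[i - 1] += 1;
        // Reset everything to the right to the smallest valid values.
        for j in i..k {
            idx[j] = idx[j - 1] + 1;
        }
    }
}

/// Number of non-empty subsets of p variables: 2^p - 1 (valid for p < 64).
fn subsets_evaluated(p: u32) -> u64 {
    (1u64 << p) - 1
}
```

For p = 10 this is 1,023 subsets, for p = 20 it is 1,048,575, and each subset requires a distance computation over all sample pairs plus a rank correlation.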

Origin

This crate is an independent Rust reimplementation of skbio.stats.distance.bioenv, informed by its BSD-3-licensed source (the center-and-scale standardization with sample standard deviation, the upper-triangle condensed Euclidean distances, the per-subset-size exhaustive search, and the "first subset on a tie" rule matching vegan::bioenv) and by the method's primary reference:

  • Clarke, K. R. & Ainsworth, M. (1993). "A method of linking multivariate community structure to environmental variables." Marine Ecology Progress Series 92: 205–219. doi:10.3354/meps092205.
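The "first subset on a tie" rule mentioned above amounts to replacing the current best only on a strictly greater correlation. A minimal sketch, assuming the candidates arrive in enumeration order as hypothetical `(subset, correlation)` pairs:

```rust
/// Return the first candidate with the highest correlation.
/// Later candidates with an equal correlation do not replace it.
fn best_first_on_tie(scored: &[(Vec<usize>, f64)]) -> Option<&(Vec<usize>, f64)> {
    let mut best: Option<&(Vec<usize>, f64)> = None;
    for cand in scored {
        let replace = match best {
            None => true,
            // Strict `>` keeps the earlier subset when correlations are equal.
            Some(b) => cand.1 > b.1,
        };
        if replace {
            best = Some(cand);
        }
    }
    best
}
```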

Spearman's ρ is computed as Pearson correlation on average-ranked distances, matching scipy.stats.spearmanr; the community ranks are centered once and reused across all subsets.
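The rank-then-Pearson formulation can be sketched as follows. This is an illustrative version with assumed function names; it omits the optimization of centering the community ranks once and reusing them, and simply ranks both inputs on each call.

```rust
/// Average ranks (1-based); tied values share the mean of their rank positions.
fn average_ranks(values: &[f64]) -> Vec<f64> {
    let mut order: Vec<usize> = (0..values.len()).collect();
    order.sort_by(|&a, &b| values[a].total_cmp(&values[b]));
    let mut ranks = vec![0.0; values.len()];
    let mut start = 0;
    while start < order.len() {
        // Extend `end` over the run of values equal to the one at `start`.
        let mut end = start + 1;
        while end < order.len() && values[order[end]] == values[order[start]] {
            end += 1;
        }
        // Sorted positions start..end hold ranks start+1..=end; use their mean.
        let avg = (start + 1 + end) as f64 / 2.0;
        for &i in &order[start..end] {
            ranks[i] = avg;
        }
        start = end;
    }
    ranks
}

/// Pearson correlation of two equal-length slices.
fn pearson(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mx = x.iter().sum::<f64>() / n;
    let my = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (a, b) in x.iter().zip(y) {
        sxy += (a - mx) * (b - my);
        sxx += (a - mx).powi(2);
        syy += (b - my).powi(2);
    }
    sxy / (sxx * syy).sqrt()
}

/// Spearman's rho as Pearson correlation on average ranks.
fn spearman(x: &[f64], y: &[f64]) -> f64 {
    pearson(&average_ranks(x), &average_ranks(y))
}
```

Because the community distances never change during the search, their ranks and centered values only need to be computed once; only the environmental side has to be re-ranked per subset.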

License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org (BSD-3-Clause).
