omics-rust/rsomics-permdisp

rsomics-permdisp

PERMDISP: Marti Anderson's test of homogeneity of multivariate dispersions (Anderson 2006), the distance-based analog of Levene's test and a companion to PERMANOVA and ANOSIM. Given a symmetric distance matrix and a categorical grouping, it reduces the distances to principal coordinates, measures each sample's distance to its group's centre, and reports a one-way ANOVA F-statistic with a permutation p-value. This is the same test as vegan::betadisper.

rsomics-permdisp dm.tsv -g groups.tsv --test median --seed 42
rsomics-permdisp dm.tsv -g groups.tsv --test centroid -p 0   # statistic only

Input

  • Distance matrix (dm.tsv): square and symmetric, in lsmat layout (a blank top-left cell, sample IDs across the header row, each data row prefixed by its ID). This is the same format that rsomics-permanova and rsomics-anosim consume.
  • Grouping (-g groups.tsv): id<tab>group per line; an optional header is detected and skipped. Every matrix ID must have a group.
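
The lsmat layout described above can be read with a few lines of standard-library Rust. This is a minimal sketch, not the crate's parser: `DistanceMatrix` and `parse_lsmat` are hypothetical names, and it assumes that data rows appear in the same order as the header IDs and that symmetry is checked with an absolute tolerance of 1e-12. Neither assumption is documented behaviour. Passing `','` as the separator plays the role of --csv.

```rust
/// A parsed distance matrix: sample IDs plus a dense row-major matrix.
#[derive(Debug)]
struct DistanceMatrix {
    ids: Vec<String>,
    data: Vec<Vec<f64>>,
}

/// Parse an lsmat-style table (hypothetical helper, not the crate's API).
/// Assumes data rows appear in header order; `sep` is '\t' or ','.
fn parse_lsmat(text: &str, sep: char) -> Result<DistanceMatrix, String> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().ok_or("empty input")?;
    // The first header cell is the blank top-left corner; the rest are sample IDs.
    let ids: Vec<String> = header
        .split(sep)
        .skip(1)
        .map(|s| s.trim().to_string())
        .collect();

    let mut data = Vec::with_capacity(ids.len());
    for (i, line) in lines.enumerate() {
        let mut cells = line.split(sep);
        let row_id = cells.next().unwrap_or("").trim();
        // Each data row must start with the ID in the matching header position.
        if ids.get(i).map(String::as_str) != Some(row_id) {
            return Err(format!("row {} has ID {:?}, expected header order", i + 1, row_id));
        }
        let row: Vec<f64> = cells
            .map(|c| c.trim().parse::<f64>().map_err(|e| format!("{}: {}", row_id, e)))
            .collect::<Result<_, _>>()?;
        if row.len() != ids.len() {
            return Err(format!("row {} has {} values, expected {}", row_id, row.len(), ids.len()));
        }
        data.push(row);
    }
    if data.len() != ids.len() {
        return Err(format!("{} rows for {} IDs: matrix is not square", data.len(), ids.len()));
    }
    // Symmetry check with an assumed absolute tolerance.
    for i in 0..data.len() {
        for j in 0..i {
            if (data[i][j] - data[j][i]).abs() > 1e-12 {
                return Err(format!("not symmetric at ({}, {})", ids[i], ids[j]));
            }
        }
    }
    Ok(DistanceMatrix { ids, data })
}
```

For example, a three-sample table `"\tA\tB\tC\nA\t0\t1\t2\nB\t1\t0\t3\nC\t2\t3\t0\n"` parses to IDs `A`, `B`, `C`. A matrix whose (A, B) and (B, A) entries differ is rejected.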

Options

  • --test centroid|median — the dispersion centre. median (the spatial / geometric median, default) is the more robust test; centroid uses the group mean. The two give slightly different F-statistics.
  • -p, --permutations <N> — permutations for the p-value (default 999; 0 skips it and reports NA).
  • --seed <S> — seed for the permutation RNG.
  • --csv — parse inputs as comma-separated.
  • --precision <N> — decimal places in the output (default 6).
  • -o, --output <path> — output path (- for stdout).

Output

A key<tab>value table: method name, test statistic name (F-value), sample size, number of groups, the F-statistic, the p-value, and the number of permutations.

Method

Distances are reduced to principal coordinates (PCoA, Gower 1966): double-center -0.5·D², eigendecompose, and scale each eigenvector by sqrt(eigenvalue). Axes with a non-positive eigenvalue are kept as zero coordinates, as scikit-bio does. Each sample's dispersion is its Euclidean distance, in that coordinate space, to its group's centroid (--test centroid) or spatial median (--test median, found by the modified-Weiszfeld iteration). The F-statistic is a one-way ANOVA over those per-sample dispersions; the p-value permutes the grouping labels.
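
The geometric steps on either side of the eigendecomposition can be sketched directly. The first is Gower's double-centering of -0.5·D², where entry (i, j) becomes a_ij minus its row mean, minus its column mean, plus the grand mean. The second, once principal coordinates are available, is each sample's Euclidean distance to its group centroid (the --test centroid case). This is a hedged sketch with hypothetical function names (`gower_center`, `dispersions_to_centroid`). The eigendecomposition itself, and the sqrt(eigenvalue) scaling with non-positive axes set to zero, are assumed to come from a linear-algebra crate and are not shown.

```rust
/// Gower (1966) double-centering: B = J * A * J with A = -0.5 * D^2 (elementwise)
/// and J = I - (1/n) * 1 1^T. Entry-wise:
/// B[i][j] = a[i][j] - row_mean[i] - col_mean[j] + grand_mean.
fn gower_center(d: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = d.len();
    let nf = n as f64;
    let a: Vec<Vec<f64>> = d
        .iter()
        .map(|row| row.iter().map(|x| -0.5 * x * x).collect())
        .collect();
    let row_mean: Vec<f64> = a.iter().map(|r| r.iter().sum::<f64>() / nf).collect();
    let col_mean: Vec<f64> = (0..n)
        .map(|j| a.iter().map(|r| r[j]).sum::<f64>() / nf)
        .collect();
    let grand = row_mean.iter().sum::<f64>() / nf;
    (0..n)
        .map(|i| (0..n).map(|j| a[i][j] - row_mean[i] - col_mean[j] + grand).collect())
        .collect()
}

/// Per-sample Euclidean distance to its group's centroid in PCoA space.
/// `coords[i]` is sample i's coordinate vector; `labels[i]` is its group in 0..k.
fn dispersions_to_centroid(coords: &[Vec<f64>], labels: &[usize], k: usize) -> Vec<f64> {
    let dim = coords[0].len();
    let mut sums = vec![vec![0.0; dim]; k];
    let mut counts = vec![0usize; k];
    // Accumulate per-group coordinate sums and sizes.
    for (x, &g) in coords.iter().zip(labels) {
        counts[g] += 1;
        for (s, v) in sums[g].iter_mut().zip(x) {
            *s += v;
        }
    }
    // Distance from each sample to the mean of its own group.
    coords
        .iter()
        .zip(labels)
        .map(|(x, &g)| {
            x.iter()
                .zip(&sums[g])
                .map(|(v, s)| (v - s / counts[g] as f64).powi(2))
                .sum::<f64>()
                .sqrt()
        })
        .collect()
}
```

As a check, three points on a line at 0, 1 and 2 give B = X·Xᵀ for the centred coordinates (-1, 0, 1), that is [[1, 0, -1], [0, 0, 0], [-1, 0, 1]].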

The F-statistic matches scikit-bio's to within 1e-9 for both tests. The p-value is a seeded permutation estimate: scikit-bio permutes with NumPy's Generator.permutation (PCG64), while this crate uses its own seeded PCG64 Fisher-Yates shuffle. For a given seed the two p-values are therefore independent Monte-Carlo estimates of the same true p; they agree within Monte-Carlo tolerance, not bit-for-bit, and increasing -p reduces the Monte-Carlo error of both. The crate's estimate is reproducible for a fixed --seed and does not depend on the thread count.
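
A seeded label-permutation test looks roughly like the sketch below. It is illustrative, not the crate's code. SplitMix64 is swapped in for PCG64 to keep the example dependency-free, so it will not reproduce the crate's (or NumPy's) p-value for a given seed. The p-value uses (hits + 1) / (permutations + 1), which is scikit-bio's convention; that this crate uses the same formula is an assumption. `f_oneway` here is the textbook between/within partition, not scipy's grand-mean-centred form. Permuting labels moves the group centres, so the statistic is passed as a closure over the labels. A real caller would recompute centres and dispersions for every permutation inside that closure.

```rust
/// One-way ANOVA F over `values` split by `labels` (group indices 0..k):
/// F = (SS_between / (k - 1)) / (SS_within / (n - k)).
fn f_oneway(values: &[f64], labels: &[usize], k: usize) -> f64 {
    let n = values.len();
    let grand = values.iter().sum::<f64>() / n as f64;
    let mut sums = vec![0.0; k];
    let mut counts = vec![0usize; k];
    for (&v, &g) in values.iter().zip(labels) {
        sums[g] += v;
        counts[g] += 1;
    }
    let means: Vec<f64> = sums.iter().zip(&counts).map(|(s, &c)| s / c as f64).collect();
    let ss_between: f64 = means.iter().zip(&counts).map(|(m, &c)| c as f64 * (m - grand).powi(2)).sum();
    let ss_within: f64 = values.iter().zip(labels).map(|(v, &g)| (v - means[g]).powi(2)).sum();
    (ss_between / (k - 1) as f64) / (ss_within / (n - k) as f64)
}

/// SplitMix64: a small seeded generator standing in for PCG64 in this sketch.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Index in 0..bound via multiply-shift (tiny bias, negligible for small bounds).
    fn below(&mut self, bound: usize) -> usize {
        ((self.next_u64() as u128 * bound as u128) >> 64) as usize
    }
}

/// In-place Fisher-Yates shuffle.
fn shuffle<T>(v: &mut [T], rng: &mut SplitMix64) {
    for i in (1..v.len()).rev() {
        let j = rng.below(i + 1);
        v.swap(i, j);
    }
}

/// Returns (observed statistic, p-value); the p-value is None when permutations == 0.
fn permutation_p_value<F>(labels: &[usize], permutations: usize, seed: u64, stat: F) -> (f64, Option<f64>)
where
    F: Fn(&[usize]) -> f64,
{
    let observed = stat(labels);
    if permutations == 0 {
        return (observed, None); // mirrors "-p 0 skips it and reports NA"
    }
    let mut rng = SplitMix64(seed);
    let mut perm = labels.to_vec();
    let mut hits = 0usize;
    for _ in 0..permutations {
        shuffle(&mut perm, &mut rng);
        if stat(&perm) >= observed {
            hits += 1;
        }
    }
    (observed, Some((hits + 1) as f64 / (permutations + 1) as f64))
}
```

With values 1 to 6 split into groups {1, 2, 3} and {4, 5, 6}, F = 13.5. Only 2 of the 20 distinct labellings reach that value, so the estimated p-value lands near 0.1. Because the generator is seeded, rerunning with the same seed reproduces it exactly.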

Origin

This crate is an independent Rust reimplementation of the PERMDISP operation provided by scikit-bio (skbio.stats.distance.permdisp), based on:

  • The published method: Anderson, M. J. "Distance-based tests for homogeneity of multivariate dispersions." Biometrics 62.1 (2006): 245–253. doi:10.1111/j.1541-0420.2005.00440.x
  • The scikit-bio source (BSD-3-Clause), read and cited for the exact pipeline: PCoA via double-centering and eigendecomposition (skbio.stats.ordination.pcoa, method="eigh", Gower 1966), the per-sample distance-to-centre, and the scipy.stats.f_oneway one-way ANOVA (its grand-mean-centered sum-of-squares partition, replicated for numerical agreement).
  • The spatial-median iteration mirrors scikit-bio's geomedian_axis_one, which scikit-bio itself ports from hdmedians v0.14.2 (BSD-3-Clause) — the Vardi-Zhang modified-Weiszfeld algorithm, with the same eps/maxiters defaults.
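
The modified-Weiszfeld iteration named above can be sketched as follows. It starts at the centroid and repeatedly moves to the inverse-distance-weighted mean of the points. When the current estimate coincides with one or more data points, it applies the Vardi-Zhang correction, which damps the step by the number of coincident points. The sketch is modelled on the update used in hdmedians but is not a port of scikit-bio's `geomedian_axis_one`. `spatial_median` is a hypothetical name, and the `eps`/`max_iters` values a caller passes here are not the upstream defaults, which this README does not state.

```rust
/// Euclidean distance between two coordinate vectors.
fn euclid(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Spatial (geometric) median of `points` via a Vardi-Zhang modified-Weiszfeld
/// iteration; stops when a step moves less than `eps` or after `max_iters` steps.
fn spatial_median(points: &[Vec<f64>], eps: f64, max_iters: usize) -> Vec<f64> {
    let dim = points[0].len();
    let n = points.len();
    // Start from the coordinate-wise mean (the centroid).
    let mut y: Vec<f64> = (0..dim)
        .map(|k| points.iter().map(|p| p[k]).sum::<f64>() / n as f64)
        .collect();
    for _ in 0..max_iters {
        let mut t = vec![0.0; dim]; // inverse-distance-weighted sum of points
        let mut inv_sum = 0.0; // sum of 1/d over points not at y
        let mut n_zero = 0usize; // points that coincide with y
        for p in points {
            let d = euclid(p, &y);
            if d == 0.0 {
                n_zero += 1;
                continue;
            }
            inv_sum += 1.0 / d;
            for k in 0..dim {
                t[k] += p[k] / d;
            }
        }
        if n_zero == n {
            return y; // every point sits at y
        }
        for v in t.iter_mut() {
            *v /= inv_sum; // plain Weiszfeld update
        }
        let y1: Vec<f64> = if n_zero == 0 {
            t
        } else {
            // Vardi-Zhang correction when y coincides with data points.
            let r = t.iter().zip(&y).map(|(a, b)| ((a - b) * inv_sum).powi(2)).sum::<f64>().sqrt();
            let rinv = if r == 0.0 { 0.0 } else { n_zero as f64 / r };
            t.iter().zip(&y).map(|(a, b)| (1.0 - rinv).max(0.0) * a + rinv.min(1.0) * b).collect()
        };
        let step = euclid(&y, &y1);
        y = y1;
        if step < eps {
            break;
        }
    }
    y
}
```

Unlike the centroid, the spatial median resists outliers. For the collinear points (0, 0), (1, 0) and (10, 0), the iteration converges to (1, 0), while the centroid sits at about (3.67, 0).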

License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org (BSD-3-Clause); hdmedians https://github.com/daleroberts/hdmedians (BSD-3-Clause).
