PERMDISP — Marti Anderson's test of homogeneity of multivariate dispersions
(Anderson 2006), the distance-based analog of Levene's test and the companion
to PERMANOVA and ANOSIM. Given a symmetric distance matrix and a categorical
grouping, it reduces the distances to principal coordinates, measures each
sample's distance to its group's centre, and reports a one-way ANOVA F-statistic
with a permutation p-value. This is the same test as vegan::betadisper.
rsomics-permdisp dm.tsv -g groups.tsv --test median --seed 42
rsomics-permdisp dm.tsv -g groups.tsv --test centroid -p 0 # statistic only
- Distance matrix (dm.tsv): square, symmetric, lsmat layout, i.e. a blank top-left cell, sample IDs across the header, and each data row prefixed by its ID. This is the format rsomics-permanova and rsomics-anosim consume.
- Grouping (-g groups.tsv): id<tab>group per line; an optional header is detected and skipped. Every matrix ID must have a group.
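As an illustration of the two input layouts, the following Python sketch parses them from strings. It is not the crate's parser: the function names, the requirement that row IDs follow the header order, and the way a header line in the grouping file is tolerated (it just becomes an unused entry) are assumptions made for the example.

```python
import csv
import io


def parse_lsmat(text, delimiter="\t"):
    """Parse an lsmat-style matrix: blank top-left cell, IDs in header and row prefix."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    ids = rows[0][1:]          # drop the blank top-left cell
    body = rows[1:]
    if len(body) != len(ids):
        raise ValueError("matrix is not square")
    matrix = []
    for expected, row in zip(ids, body):
        # Assumption of this sketch: rows appear in the same order as the header.
        if row[0] != expected or len(row) != len(ids) + 1:
            raise ValueError(f"bad row for sample {expected!r}")
        matrix.append([float(cell) for cell in row[1:]])
    for i in range(len(ids)):
        for j in range(i):
            if abs(matrix[i][j] - matrix[j][i]) > 1e-12:
                raise ValueError("matrix is not symmetric")
    return ids, matrix


def parse_grouping(text, ids, delimiter="\t"):
    """Parse id<tab>group lines and return the groups in matrix order."""
    mapping = {}
    for line in text.splitlines():
        if line.strip():
            sample_id, group = line.split(delimiter)[:2]
            mapping[sample_id] = group   # a header line ends up as an unused entry
    missing = [s for s in ids if s not in mapping]
    if missing:
        raise ValueError(f"samples without a group: {missing}")
    return [mapping[s] for s in ids]


dm_text = "\ta\tb\tc\na\t0\t1\t2\nb\t1\t0\t3\nc\t2\t3\t0\n"
groups_text = "id\tgroup\na\tx\nb\tx\nc\ty\n"
ids, matrix = parse_lsmat(dm_text)
groups = parse_grouping(groups_text, ids)
```

Passing `delimiter=","` would correspond to comma-separated input.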
- --test centroid|median: the dispersion centre. median (the spatial / geometric median, default) is the more robust test; centroid uses the group mean. The two give slightly different F-statistics.
- -p, --permutations <N>: permutations for the p-value (default 999; 0 skips it and reports NA).
- --seed <S>: seed for the permutation RNG.
- --csv: parse inputs as comma-separated.
- --precision <N>: decimal places in the output (default 6).
- -o, --output <path>: output path (- for stdout).
The output is a key<tab>value table: method name, test statistic name (F-value),
sample size, number of groups, the F-statistic, the p-value, and the number of
permutations.
Distances are reduced to principal coordinates (PCoA, Gower 1966): double-center
-0.5·D², eigendecompose, and scale each eigenvector by sqrt(eigenvalue).
Axes with a non-positive eigenvalue are kept as zero coordinates, as scikit-bio
does. Each sample's dispersion is its Euclidean distance, in that coordinate
space, to its group's centroid (--test centroid) or spatial median
(--test median, found by the modified-Weiszfeld iteration). The F-statistic is
a one-way ANOVA over those per-sample dispersions; the p-value permutes the
grouping labels.
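The last two steps can be sketched in Python for the centroid variant. This is a minimal illustration, not the crate's code: it assumes the principal coordinates have already been computed, uses the textbook between/within sum-of-squares partition, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_dispersions(coords, groups):
    """Distance from each sample to the mean of its own group."""
    members = defaultdict(list)
    for point, g in zip(coords, groups):
        members[g].append(point)
    centroids = {
        g: [sum(axis) / len(pts) for axis in zip(*pts)]
        for g, pts in members.items()
    }
    return [euclidean(p, centroids[g]) for p, g in zip(coords, groups)]


def anova_f(values, groups):
    """One-way ANOVA F: (SS_between / (k - 1)) / (SS_within / (n - k))."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    n, k = len(values), len(by_group)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in by_group.values()
    )
    ss_within = sum(
        (v - sum(vs) / len(vs)) ** 2 for vs in by_group.values() for v in vs
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy coordinates: group "b" is three times as spread out as group "a".
coords = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (0.0, 0.0), (6.0, 0.0), (3.0, 0.0)]
groups = ["a", "a", "a", "b", "b", "b"]
dispersions = centroid_dispersions(coords, groups)
f_stat = anova_f(dispersions, groups)
```

Here the dispersions are 1, 1, 0 for group a and 3, 3, 0 for group b, which gives F = 1.6 with 1 and 4 degrees of freedom.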
The F-statistic matches scikit-bio to within 1e-9 for both tests. The p-value
is a seeded permutation estimate: scikit-bio permutes with NumPy's
Generator.permutation (PCG64), while this crate uses its own seeded PCG64
Fisher-Yates shuffle. The two draw different permutation sequences even from
the same seed, so their p-values are independent Monte-Carlo estimates of the
same true p: they agree within Monte-Carlo tolerance, not bit-for-bit.
Increasing -p tightens both. The estimate is reproducible for a fixed --seed
and independent of thread count.
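The behaviour of a seeded permutation estimate can be illustrated with a small Python sketch. Several things here are assumptions for the example and not statements about the crate: the RNG is Python's `random.Random` (Mersenne Twister, not PCG64), the statistic is a toy 1-D dispersion F recomputed for every shuffled labelling, and the p-value uses the common (hits + 1) / (permutations + 1) convention.

```python
import math
import random


def dispersion_f(coords, labels):
    """Toy statistic: one-way ANOVA F over |x - group mean| for 1-D coordinates."""
    by_group = {}
    for x, g in zip(coords, labels):
        by_group.setdefault(g, []).append(x)
    disp = [[abs(x - sum(xs) / len(xs)) for x in xs] for xs in by_group.values()]
    n, k = len(coords), len(disp)
    grand_mean = sum(sum(ds) for ds in disp) / n
    ss_between = sum(len(ds) * (sum(ds) / len(ds) - grand_mean) ** 2 for ds in disp)
    ss_within = sum((d - sum(ds) / len(ds)) ** 2 for ds in disp for d in ds)
    if ss_within == 0.0:
        return math.inf
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(coords, labels, permutations=999, seed=42):
    """Share of shuffled labellings whose statistic is at least the observed one."""
    if permutations == 0:
        return None  # statistic only, analogous to reporting NA
    observed = dispersion_f(coords, labels)
    rng = random.Random(seed)          # all randomness flows from the seed
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)          # in-place Fisher-Yates shuffle
        if dispersion_f(coords, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)


coords = [0.0, 1.0, 2.0, 3.0, 0.0, 5.0, 10.0, 15.0]
labels = ["a"] * 4 + ["b"] * 4
p_first = permutation_p_value(coords, labels, seed=42)
p_again = permutation_p_value(coords, labels, seed=42)   # same seed, same estimate
p_other = permutation_p_value(coords, labels, seed=7)    # independent estimate
```

Two runs with the same seed return the same value, while a different seed (or a different shuffle algorithm) gives another estimate of the same quantity. As a rough guide, for a true p near 0.05 and 999 permutations the Monte-Carlo standard error is about sqrt(0.05 · 0.95 / 999) ≈ 0.007, which is the scale on which two independent estimates can be expected to differ.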
This crate is an independent Rust reimplementation of the PERMDISP operation
provided by scikit-bio (skbio.stats.distance.permdisp), based on:
- The published method: Anderson, M. J. "Distance-based tests for homogeneity of multivariate dispersions." Biometrics 62.1 (2006): 245–253. doi:10.1111/j.1541-0420.2005.00440.x
- The scikit-bio source (BSD-3-Clause), read and cited for the exact pipeline: PCoA via double-centering and eigendecomposition (skbio.stats.ordination.pcoa, method="eigh", Gower 1966), the per-sample distance-to-centre, and the scipy.stats.f_oneway one-way ANOVA (its grand-mean-centered sum-of-squares partition, replicated for numerical agreement).
- The spatial-median iteration mirrors scikit-bio's geomedian_axis_one, which scikit-bio itself ports from hdmedians v0.14.2 (BSD-3-Clause): the Vardi-Zhang modified-Weiszfeld algorithm, with the same eps/maxiters defaults.
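For readers unfamiliar with the spatial median, a compact Python sketch of a Vardi-Zhang style modified-Weiszfeld iteration follows. It is written from the general description of the algorithm, not copied from scikit-bio, hdmedians, or this crate; the function name and the `eps` and `max_iters` values are placeholders and are not the defaults referred to above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def spatial_median(points, eps=1e-9, max_iters=1000):
    """Approximate the point minimising the sum of Euclidean distances to `points`."""
    n, dim = len(points), len(points[0])
    y = [sum(p[k] for p in points) / n for k in range(dim)]   # start at the mean
    for _ in range(max_iters):
        dists = [euclidean(p, y) for p in points]
        nonzero = [(p, d) for p, d in zip(points, dists) if d > 0.0]
        num_zeros = n - len(nonzero)
        if not nonzero:
            return y                      # every point coincides with y
        inv_sum = sum(1.0 / d for _, d in nonzero)
        # Plain Weiszfeld step: inverse-distance weighted mean.
        t = [sum(p[k] / d for p, d in nonzero) / inv_sum for k in range(dim)]
        if num_zeros == 0:
            y_next = t
        else:
            # Modification for the case where y sits exactly on a data point.
            r_vec = [(t[k] - y[k]) * inv_sum for k in range(dim)]
            r = math.sqrt(sum(c * c for c in r_vec))
            r_inv = 0.0 if r == 0.0 else num_zeros / r
            y_next = [
                max(0.0, 1.0 - r_inv) * t[k] + min(1.0, r_inv) * y[k]
                for k in range(dim)
            ]
        if euclidean(y, y_next) < eps:
            return y_next
        y = y_next
    return y


# Four points forming a convex quadrilateral; one is far from the rest.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
median = spatial_median(points)
mean = [sum(p[k] for p in points) / len(points) for k in range(2)]
```

For four points in convex position the spatial median is the intersection of the diagonals, here (0.5, 0.5), while the mean is pulled to (2.75, 2.75) by the distant point. That difference is the sense in which the median-based test is the more robust one.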
License: MIT OR Apache-2.0. Upstream credit: scikit-bio https://scikit-bio.org (BSD-3-Clause); hdmedians https://github.com/daleroberts/hdmedians (BSD-3-Clause).