v2.2.0
Highlights
- SIMD RNG / Ziggurat speedups: single-sample `dist.sample(rng)` loops are ~2–4× faster vs the v2.1 `wide 1.3` baseline. `next_f64_array` / `next_f32_array` go from 8 scalar shift+cast+mul loops to SIMD shift + magic-number bit-cast + subtract (52-bit / 23-bit precision). New `fill_uniform_f64` / `fill_uniform_f32` direct-write APIs skip the `[T; 8]` return-by-value round-trip. Fused `1/λ` scaling into `SimdExp::fill_exp_scaled`. [819b1c0]
- Opt-in `dual-stream-rng` feature: `SimdRngDual` carries two independent Xoshiro state pairs (`engine_a`, `engine_b`) so the OoO core can interleave the Ziggurat scalar table-load chains across batches. Apple Silicon: −5% to −11% on the `SimdNormal::fill_slice` Ziggurat sweet-spot (n = 4096). Uniform fills are not engine-bound, so no speedup there. Stream-bit-exactness changes (KS-validated). [207cb42]
- Generic `R: SimdRngExt` across all 18 distributions: every `SimdXxx<T, ..., R>` struct is parameterised on the backing RNG (default `SimdRng`), letting `SimdNormalDual` / `SimdExpDual` / `SimdExpZigDual` reuse the exact same code paths via type alias. [f886c7b, 207cb42]
- Unified seed-handling API (breaking): `with_seed(u64)` and `from_seed_source(&src)` are removed; every distribution and process now exposes a single canonical `new(args, &seed)` constructor accepting any `SeedExt` strategy (`Unseeded`, `Deterministic::new(u64)`, or a custom source). `SeedExt::reseed(u64)` swaps a `Deterministic` source in place for calibration sweeps. [f886c7b, d74f824]
- Seed propagation fixes: `non_central_chi_squared::sample` is now generic over `SeedExt` (was hardcoding `&Unseeded`), restoring reproducibility in the `Svcgmy` v-path. All four FGN GPU/Accelerate backends (`fgn/accelerate.rs`, `fgn/gpu.rs`, `fgn/metal.rs`, `fgn/cuda_native.rs`) now thread `&self.seed` instead of pulling from the global RNG. [88f7e0d, cf3c557]
- Documentation: new `seeding` concept page covers the unified constructor, `SeedExt` strategies, `SimdRngExt` generic backing RNG, and the dual-stream feature. [d74f824, f67c6ac]
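
The "shift + magic-number bit-cast + subtract" conversion from the first highlight can be illustrated in scalar form. This is only a sketch of the general technique, not the crate's SIMD code; the function names `u64_to_unit_f64` and `u32_to_unit_f32` are made up for illustration. The idea is to keep the top 52 (or 23) random bits as the mantissa, OR in the exponent bits of 1.0 so the value lands in [1, 2), and then subtract 1.0.

```rust
/// Map 64 random bits to an f64 in [0, 1) with 52 bits of precision.
/// (Hypothetical helper name, scalar stand-in for the SIMD version.)
fn u64_to_unit_f64(x: u64) -> f64 {
    // Keep the top 52 bits as the mantissa; 0x3FF0_0000_0000_0000 is the
    // bit pattern of 1.0_f64, so the result of the OR lies in [1, 2).
    let bits = (x >> 12) | 0x3FF0_0000_0000_0000;
    f64::from_bits(bits) - 1.0
}

/// Map 32 random bits to an f32 in [0, 1) with 23 bits of precision.
/// (Hypothetical helper name.)
fn u32_to_unit_f32(x: u32) -> f32 {
    // 0x3F80_0000 is the bit pattern of 1.0_f32.
    let bits = (x >> 9) | 0x3F80_0000;
    f32::from_bits(bits) - 1.0
}
```

With these definitions, an input of `0` maps to `0.0`, `1 << 63` maps to `0.5`, and `u64::MAX` maps to the largest representable value below 1.0. No integer-to-float cast or multiply is needed, which is what makes the pattern friendly to vectorisation.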
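
The dual-stream and generic-RNG highlights can be pictured with a small standalone model. Everything here is an assumption for illustration: `RngLike`, `Xoshiro`, `DualXoshiro`, `UniformDist`, and `UniformDistDual` are local stand-ins, not the crate's types. The notes only say "Xoshiro", so the xoshiro256++ variant is a guess, and alternating engines on every draw is a simplification of interleaving across batches.

```rust
/// Stand-in for the crate's RNG trait (hypothetical; real methods not shown in these notes).
trait RngLike {
    fn from_seed(seed: u64) -> Self;
    fn next_u64(&mut self) -> u64;
}

/// SplitMix64 step, used here only to expand a u64 seed into state words.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Scalar xoshiro256++ (variant assumed).
struct Xoshiro([u64; 4]);

impl RngLike for Xoshiro {
    fn from_seed(seed: u64) -> Self {
        let mut sm = seed;
        Xoshiro([
            splitmix64(&mut sm),
            splitmix64(&mut sm),
            splitmix64(&mut sm),
            splitmix64(&mut sm),
        ])
    }
    fn next_u64(&mut self) -> u64 {
        let s = &mut self.0;
        let out = s[0].wrapping_add(s[3]).rotate_left(23).wrapping_add(s[0]);
        let t = s[1] << 17;
        s[2] ^= s[0];
        s[3] ^= s[1];
        s[1] ^= s[2];
        s[0] ^= s[3];
        s[2] ^= t;
        s[3] = s[3].rotate_left(45);
        out
    }
}

/// Two independent engines; consecutive draws have no data dependency on
/// each other, so their update chains can overlap in an out-of-order core.
struct DualXoshiro {
    engine_a: Xoshiro,
    engine_b: Xoshiro,
    use_b: bool,
}

impl RngLike for DualXoshiro {
    fn from_seed(seed: u64) -> Self {
        // Derive two distinct sub-seeds from the one user seed.
        let mut sm = seed;
        let a = splitmix64(&mut sm);
        let b = splitmix64(&mut sm);
        DualXoshiro { engine_a: Xoshiro::from_seed(a), engine_b: Xoshiro::from_seed(b), use_b: false }
    }
    fn next_u64(&mut self) -> u64 {
        let out = if self.use_b { self.engine_b.next_u64() } else { self.engine_a.next_u64() };
        self.use_b = !self.use_b;
        out
    }
}

/// A distribution generic over its backing RNG, with a default type parameter.
struct UniformDist<R = Xoshiro> {
    rng: R,
}

impl<R: RngLike> UniformDist<R> {
    fn new(seed: u64) -> Self {
        Self { rng: R::from_seed(seed) }
    }
    fn sample(&mut self) -> f64 {
        // Same bit-cast trick: top 52 bits as mantissa of a value in [1, 2).
        f64::from_bits((self.rng.next_u64() >> 12) | 0x3FF0_0000_0000_0000) - 1.0
    }
}

/// The dual-stream variant is only a type alias; the sampling code is shared.
type UniformDistDual = UniformDist<DualXoshiro>;
```

In this model, `let mut d: UniformDist = UniformDist::new(7);` picks the default engine, while `UniformDistDual::new(7)` swaps in the two-engine RNG without duplicating any sampling logic. The two produce different streams for the same seed, which mirrors the bit-exactness caveat in the notes.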
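
To make the shape of the unified constructor concrete, here is a self-contained mock of the pattern. It is not the crate's API: the trait and structs below are defined locally and merely borrow the names from the notes. The `seed_value` method, the `ToyUniform` type, and placing `reseed` on `Deterministic` rather than on the trait are all assumptions made for this sketch.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Local stand-in for a seed strategy; `seed_value` is a hypothetical method.
trait SeedExt {
    fn seed_value(&self) -> u64;
}

/// Non-reproducible strategy: a fresh value on every call.
struct Unseeded;

impl SeedExt for Unseeded {
    fn seed_value(&self) -> u64 {
        // std-only entropy: each RandomState instance is randomly keyed.
        RandomState::new().build_hasher().finish()
    }
}

/// Reproducible strategy wrapping a fixed u64.
struct Deterministic {
    seed: u64,
}

impl Deterministic {
    fn new(seed: u64) -> Self {
        Self { seed }
    }
    /// Swap the seed in place, e.g. between runs of a calibration sweep.
    fn reseed(&mut self, seed: u64) {
        self.seed = seed;
    }
}

impl SeedExt for Deterministic {
    fn seed_value(&self) -> u64 {
        self.seed
    }
}

/// Toy distribution with a single `new(args, &seed)` constructor.
struct ToyUniform {
    lo: f64,
    hi: f64,
    state: u64,
}

impl ToyUniform {
    fn new<S: SeedExt>(lo: f64, hi: f64, seed: &S) -> Self {
        Self { lo, hi, state: seed.seed_value() }
    }
    fn sample(&mut self) -> f64 {
        // 64-bit LCG step (Knuth MMIX constants); a placeholder engine only.
        self.state = self
            .state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let u = f64::from_bits((self.state >> 12) | 0x3FF0_0000_0000_0000) - 1.0;
        self.lo + (self.hi - self.lo) * u
    }
}
```

In this mock, two `ToyUniform` values built from the same `Deterministic::new(42)` yield identical samples, and calling `reseed(7)` before the next construction behaves like a fresh `Deterministic::new(7)`. Passing `&Unseeded` gives a different stream on each construction. The point of the pattern is that reproducibility is decided by the caller's seed strategy rather than by which constructor was called.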
Full Changelog: v2.1.0...v2.2.0