feat(swarm): ADR-149 evaluation harness — GDOP, IQM+bootstrap, noise sweep #875
Merged
…se sweep

Stage-1 kinematic evaluator per ADR-149 (peer-reviewed). Pure Rust, no new deps.

evals/:
- gdop.rs: 2D Geometric Dilution of Precision ((HᵀH)⁻¹ trace-sqrt); None for <2 observers or collinear/singular geometry
- stats.rs: IQM (Agarwal 2021) + 95% stratified-bootstrap CI (deterministic LCG) + probability_of_improvement
- metrics.rs: EpisodeMetrics + AggregateMetrics::from_strata (IQM±CI, seed-stratified)
- runner.rs: seeded kinematic rollout (FlightPattern-driven), seed×episode matrix, 3σ×3κ default noise sweep (Gaussian amplitude × von Mises phase)
- report.rs + eval_swarm bin: generates evals/RESULTS.md leaderboard

RESULTS.md surfaces the real coverage-vs-localization-precision trade-off via GDOP: partitioned wins coverage (100%) but single-drone sightings (GDOP 0 → 7.0m); pheromone gets multistatic fusion (GDOP 1.6 → 4.1m). Wi2SAR 5m paper-baseline row included.

Stage-2 (Gazebo/PX4 SITL false-alarm + collision on median seeds) is documented follow-on.

Tests: 116 default / 133 full+train (+13 eval tests), 0 failed. Clippy clean (-D warnings).

Co-Authored-By: claude-flow <ruv@ruv.net>
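The gdop.rs entry above can be illustrated with a minimal sketch. It assumes a range-only model in which each row of H is the unit line-of-sight vector from the target to one observer. The function name `gdop_2d`, the tuple-based signature, and the singularity tolerance are hypothetical and are not taken from the PR's code.

```rust
/// Hypothetical 2D GDOP helper (illustrative only, not the PR's gdop.rs).
/// Assumes H has one row per observer: the unit vector from target to observer.
/// Returns sqrt(trace((HᵀH)⁻¹)), or None when the geometry is degenerate.
pub fn gdop_2d(target: (f64, f64), observers: &[(f64, f64)]) -> Option<f64> {
    // A single observer cannot constrain a 2D position.
    if observers.len() < 2 {
        return None;
    }
    // Accumulate the symmetric 2x2 matrix HᵀH = [[a, b], [b, d]].
    let (mut a, mut b, mut d) = (0.0_f64, 0.0_f64, 0.0_f64);
    for &(ox, oy) in observers {
        let (dx, dy) = (ox - target.0, oy - target.1);
        let r = (dx * dx + dy * dy).sqrt();
        if r < 1e-9 {
            return None; // observer sits on the target: direction undefined
        }
        let (ux, uy) = (dx / r, dy / r);
        a += ux * ux;
        b += ux * uy;
        d += uy * uy;
    }
    // det(HᵀH) vanishes when all lines of sight are parallel
    // (observers collinear with the target), so the inverse does not exist.
    let det = a * d - b * b;
    if det < 1e-9 {
        return None; // assumed tolerance for "singular"
    }
    // For a 2x2 matrix, trace of the inverse is (a + d) / det.
    Some(((a + d) / det).sqrt())
}
```

Under these assumptions, two observers at right angles around the target give a GDOP of √2, while a single observer or two observers on the same bearing yield `None`, matching the "None for <2 observers or collinear/singular geometry" behaviour described in the commit message.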
d6407ae to aabf7a7
Summary
Implements ADR-149 (statistically-rigorous swarm evaluation methodology, peer-reviewed/Accepted). Adds the Stage-1 kinematic evaluation harness for ruview-swarm: seeded multi-run rollouts → SAR + MARL metrics with IQM + 95% stratified-bootstrap CIs, a (σ, κ) CSI-noise sweep, GDOP tracking, and a RESULTS.md leaderboard generator. Pure Rust, no new dependencies.
What's new (src/evals/)
- gdop.rs: 2D Geometric Dilution of Precision ((HᵀH)⁻¹ trace-sqrt); None for <2 observers / collinear / singular geometry
- stats.rs: IQM (Agarwal 2021) + 95% stratified-bootstrap CI (deterministic LCG) + probability_of_improvement
- metrics.rs: EpisodeMetrics + AggregateMetrics::from_strata (IQM±CI, seed-stratified)
- runner.rs: seeded kinematic rollout (FlightPattern), seed×episode matrix, 3σ×3κ default noise sweep
- report.rs + bin/eval_swarm: generates the evals/RESULTS.md leaderboard
The result it surfaces (the point of the methodology)
GDOP tracking exposes a real coverage-vs-localization-precision trade-off that point estimates would hide:
partitioned wins coverage (disjoint strips) but its single-drone sightings (GDOP→0) give the worst localization; pheromone co-locates drones (GDOP 1.6) for better fusion. Coverage and localization precision genuinely trade off, which is exactly what the harness is built to reveal.
Methodology (ADR-149)
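As a rough illustration of the IQM + 95% stratified-bootstrap CI approach named in the summary, the sketch below treats each seed as one stratum, resamples episodes with replacement within each stratum, and takes percentile bounds over the resampled IQMs. All names (`Lcg`, `iqm`, `stratified_bootstrap_ci`), the LCG constants (Knuth's MMIX values), and the percentile rule are assumptions made for this example; the PR's stats.rs may differ.

```rust
/// Minimal 64-bit LCG using Knuth's MMIX constants (an assumption for this
/// sketch). Deterministic for a given seed, so CIs are reproducible.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    /// Advance the state and return an index in 0..n (n must be > 0).
    fn next_index(&mut self, n: usize) -> usize {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Use the high bits, which are better distributed in an LCG.
        ((self.0 >> 33) as usize) % n
    }
}

/// Interquartile mean: drop floor(n/4) scores from each end, average the rest.
pub fn iqm(scores: &[f64]) -> Option<f64> {
    if scores.is_empty() {
        return None;
    }
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let cut = sorted.len() / 4;
    let middle = &sorted[cut..sorted.len() - cut];
    Some(middle.iter().sum::<f64>() / middle.len() as f64)
}

/// 95% percentile CI for the IQM, resampling within each stratum (seed)
/// so every resample keeps the original per-seed episode counts.
pub fn stratified_bootstrap_ci(
    strata: &[Vec<f64>],
    resamples: usize,
    seed: u64,
) -> Option<(f64, f64)> {
    if resamples == 0 || strata.is_empty() || strata.iter().any(|s| s.is_empty()) {
        return None;
    }
    let mut rng = Lcg::new(seed);
    let mut estimates = Vec::with_capacity(resamples);
    let mut pooled = Vec::new();
    for _ in 0..resamples {
        pooled.clear();
        for stratum in strata {
            for _ in 0..stratum.len() {
                pooled.push(stratum[rng.next_index(stratum.len())]);
            }
        }
        estimates.push(iqm(&pooled)?);
    }
    estimates.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank percentile lookup on the sorted bootstrap estimates.
    let at = |q: f64| estimates[((resamples - 1) as f64 * q).round() as usize];
    Some((at(0.025), at(0.975)))
}
```

Resampling within strata rather than over the pooled episodes keeps seed-to-seed variation from being mixed with within-seed variation, which is the motivation for the seed-stratified aggregation mentioned above. Because the generator is seeded, calling `stratified_bootstrap_ci` twice with the same inputs returns the same interval.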
Tests
- --no-default-features: 116/116 (+13 eval tests)
- --features full,train: 133/133
- Clippy clean (-D warnings --no-deps)
- cargo run --bin eval_swarm produces RESULTS.md ✓
Covered by the ruview-swarm CI guard (path-scoped feature matrix + clippy + ITAR guards).
Related
docs/adr/ADR-149-swarm-benchmarking-evaluation-methodology.md
🤖 Generated with claude-flow