S6: validation report + figures + MkDocs Validation page + parity README; remove legacy zdt/ (milestone-12a) #18
…E parity paragraph; remove legacy zdt/ suite (milestone-12a)
Adversarial review: APPROVE (final step, s6-reporting)
Audited the committed diff and verified everything by execution in the worktree.
Consistency gate (the citation shield)
- Tables trace to JSON (no hardcoded numbers): sphere/objective_error, zdt4/igd_plus, zdt4/hypervolume (degenerate), zdt6/hypervolume, schwefel/solution_distance, and zdt2/zdt3 igd_plus all match the committed verdicts.
- Prose ↔ verdicts: SO 6/6 equivalent on both error metrics; MO zdt1/2/3/dtlz2 equivalent on all three metrics; ZDT4/ZDT6 framed only as in-our-favour exceptions (grep "worse" → none). No superiority framing.
- ZDT3 'visible scatter' caption replaced with the real cause (legacy per-individual PM, ~30× under-mutation).
Legacy removal
- benchmarks/zdt/ tree gone; no real imports remain.
- Do-not-touch set (problems/metrics/harness/run.py/stats.py/config/tests/pyproject/uv.lock/.github) untouched.
Acceptance
- Render gate PASS + deterministic (figures byte-identical on re-render).
- ruff/ty(src) PASS.
- Doctest gate: 11 passed (collects render+reproduce).
- Full suite: 525 passed @ 98.89%.
- CI: 4/4 green (py3.11/3.12/3.13 + numpy-floor). mergeStateStatus CLEAN.
Merging: completes milestone-12a (8/8 steps).
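The "byte-identical on re-render" check in the acceptance notes can be approximated with a digest comparison. This is a minimal sketch, not the project's actual gate: `digest_tree` and `is_deterministic` are hypothetical helpers, and the `render` callable stands in for whatever entry point writes the figures.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def digest_tree(root: Path, pattern: str = "*.png") -> dict[str, str]:
    """Map each matching file (path relative to root) to its SHA-256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(pattern))
    }


def is_deterministic(render: Callable[[Path], None], out_dir: Path) -> bool:
    """Run `render` twice into `out_dir` and compare the per-file digests.

    Returns False if no files were produced, so an empty render cannot pass.
    """
    render(out_dir)
    first = digest_tree(out_dir)
    render(out_dir)
    second = digest_tree(out_dir)
    return bool(first) and first == second


if __name__ == "__main__":
    # Stand-in renderer (assumption): writes fixed bytes instead of a real figure.
    def fake_render(out: Path) -> None:
        (out / "figure.png").write_bytes(b"fixed bytes")

    with tempfile.TemporaryDirectory() as tmp:
        print(is_deterministic(fake_render, Path(tmp)))
```

Comparing digests rather than file sizes or timestamps catches any byte-level drift between renders, including embedded metadata.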
s6-reporting (milestone-12a) — validation report, docs, legacy removal (FINAL step)
Turns the committed 30-seed results JSON into the human-facing report + a citable docs page, and removes the superseded legacy suite. Completes the milestone.
Files
- `benchmarks/render.py` (NEW): JSON → markdown tables (injected between `<!-- BEGIN:id -->` / `<!-- END:id -->` markers) + matplotlib figures. Carries a fail-fast consistency gate (`check_report_consistency`) called inside `render()` before any file write: it raises if the report prose's claimed-indistinguishable/at-least-as-good set contradicts any committed verdict cell, so the citation page can never self-contradict.
- `benchmarks/reproduce.py` (NEW): the one `uv run python benchmarks/reproduce.py` regenerate-everything entry point (sweep → render).
- `benchmarks/README.md` (REWRITE): canonical report, tables auto-rendered from the committed JSON + the 3 figures embedded.
- `README.md` (root): the `## Benchmarks` link replaced with one honest parity paragraph + links.
- `mkdocs.yml` + `docs/validation.md` (NEW) + `docs/assets/validation/*.png`: a citable Validation page (passes `mkdocs build --strict`).
- `benchmarks/zdt/` tree (problems, run_benchmark.py, old results/png/log): superseded; nothing imports it.
Report framing (parity, not superiority; reconciled to the committed 30-seed verdicts)
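One way report tables can stay reconciled to the committed JSON is the marker-based injection described for `benchmarks/render.py` in the file list. The sketch below illustrates that general technique only; `inject_block` and `to_markdown_table` are hypothetical names, and the real module's functions and table layout are not shown in this PR.

```python
import re


def to_markdown_table(header: list[str], rows: list[list[str]]) -> str:
    """Render a header and rows as a pipe-delimited markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def inject_block(document: str, block_id: str, rendered: str) -> str:
    """Replace everything between the BEGIN/END markers for `block_id`.

    The markers themselves are kept, so the operation can be repeated.
    """
    begin = f"<!-- BEGIN:{block_id} -->"
    end = f"<!-- END:{block_id} -->"
    pattern = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    replacement = f"{begin}\n{rendered}\n{end}"
    # A callable replacement keeps backslashes in `rendered` from being
    # interpreted as regex escapes.
    updated, count = pattern.subn(lambda _match: replacement, document)
    if count != 1:
        raise ValueError(f"expected exactly one {block_id!r} marker pair, found {count}")
    return updated


# Illustrative input only; the id, column names, and row are made up.
doc = "Intro\n<!-- BEGIN:so -->\nstale table\n<!-- END:so -->\nOutro"
table = to_markdown_table(["problem", "verdict"], [["example", "equivalent"]])
updated = inject_block(doc, "so", table)
```

Because the markers survive each injection, re-running the render step over an already rendered file yields the same text, which is what keeps hand-written prose around the tables intact.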
Acceptance (verified twice — critic + orchestrator)
- `render.py` (gate passes, 7 tables + 3 figures), `mkdocs build --strict`, `ruff`, `ty check src/`: all pass.
- `uv run pytest`: 525 passed @ 98.89% (the s5 doctest gate now also covers render/reproduce).
- Legacy `zdt/` gone; nothing imports it.
Plan converged via planner↔critic over 2 rounds (round 2 added the prose↔verdict consistency gate; critic verified it passes on the committed JSON and raises on a flipped cell).
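The pass-on-committed, raise-on-flipped behaviour of a prose↔verdict gate can be sketched as follows. This is an illustration under stated assumptions, not the code of `check_report_consistency`: the nested `{problem: {metric: verdict}}` layout, the verdict labels, and the names `check_claims_against_verdicts` and `ReportInconsistency` are all hypothetical.

```python
# Assumed verdict layout: {problem: {metric: verdict_label}}.
Verdicts = dict[str, dict[str, str]]

# Assumed labels that are compatible with a parity (not superiority) claim.
PARITY_OK = {"equivalent", "better"}


class ReportInconsistency(ValueError):
    """Raised when the prose claims parity for a cell whose verdict disagrees."""


def check_claims_against_verdicts(claimed: dict[str, set[str]], verdicts: Verdicts) -> None:
    """Raise if any claimed (problem, metric) cell lacks a parity-compatible verdict.

    A missing cell counts as a contradiction, so the prose cannot cite
    results that were never committed.
    """
    contradictions = []
    for problem, metrics in sorted(claimed.items()):
        for metric in sorted(metrics):
            verdict = verdicts.get(problem, {}).get(metric)
            if verdict not in PARITY_OK:
                contradictions.append(f"{problem}/{metric}: verdict is {verdict!r}")
    if contradictions:
        raise ReportInconsistency("prose contradicts verdicts: " + "; ".join(contradictions))


# Illustrative data only, not the committed results.
claimed = {"zdt1": {"igd_plus", "hypervolume"}}
committed = {"zdt1": {"igd_plus": "equivalent", "hypervolume": "equivalent"}}
check_claims_against_verdicts(claimed, committed)  # returns silently

flipped = {"zdt1": {"igd_plus": "worse", "hypervolume": "equivalent"}}
try:
    check_claims_against_verdicts(claimed, flipped)
except ReportInconsistency as exc:
    print(exc)
```

Running a check like this before any file is written means a contradiction aborts the render with the old report still on disk, rather than leaving a half-updated page.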