S6: validation report + figures + MkDocs Validation page + parity README; remove legacy zdt/ (milestone-12a)#18

Merged
CooperBigFoot merged 1 commit into main from milestone-12a/s6-reporting on Jun 26, 2026

@CooperBigFoot

s6-reporting (milestone-12a) — validation report, docs, legacy removal (FINAL step)

Turns the committed 30-seed results JSON into the human-facing report + a citable docs page, and removes the superseded legacy suite. Completes the milestone.

Files

  • benchmarks/render.py (NEW): JSON → markdown tables (injected between <!-- BEGIN:id -->/<!-- END:id --> markers) + matplotlib figures. Carries a fail-fast consistency gate, check_report_consistency, which render() calls before any file write: it RAISES if the set of results the report prose claims are indistinguishable/at-least-as-good contradicts any committed verdict cell, so the citation page can never contradict itself.
  • benchmarks/reproduce.py (NEW): the single regenerate-everything entry point, uv run python benchmarks/reproduce.py (sweep → render).
  • benchmarks/README.md (REWRITE): canonical report, tables auto-rendered from the committed JSON + the 3 figures embedded.
  • README.md (root): the ## Benchmarks link replaced with one honest parity paragraph + links.
  • mkdocs.yml + docs/validation.md (NEW) + docs/assets/validation/*.png: a citable Validation page (passes mkdocs build --strict).
  • Removed the entire legacy benchmarks/zdt/ tree (problems, run_benchmark.py, old results/png/log) — superseded; nothing imports it.
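The table injection that render.py performs can be sketched as a regex substitution between a paired BEGIN/END marker. This is a minimal illustration, not the PR's code: the name inject_block is hypothetical (the PR's own step is called inject_tables), and raising on a missing marker pair is an assumed design choice that keeps a mistyped id from silently leaving a stale table in place.

```python
import re


def inject_block(text: str, block_id: str, content: str) -> str:
    """Replace everything between <!-- BEGIN:id --> and <!-- END:id --> with content.

    Hypothetical helper: raises ValueError if the marker pair is missing.
    """
    begin = f"<!-- BEGIN:{block_id} -->"
    end = f"<!-- END:{block_id} -->"
    # Non-greedy match with DOTALL so the span may cover several lines.
    pattern = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if not pattern.search(text):
        raise ValueError(f"markers for {block_id!r} not found")
    # A function replacement stops backslashes in the table from being read as escapes.
    return pattern.sub(lambda _m: f"{begin}\n{content}\n{end}", text, count=1)


doc = "intro\n<!-- BEGIN:so -->\nold table\n<!-- END:so -->\noutro\n"
print(inject_block(doc, "so", "| metric | verdict |\n|---|---|"))
```

Because the markers themselves are kept, re-running the injection with the same content leaves the file unchanged, which is what makes a regenerate-everything script safe to run repeatedly.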

Report framing (parity, not superiority; reconciled to the committed 30-seed verdicts)

  • SO: indistinguishable from pymoo and DEAP on all six functions (objective_error + solution_distance). success_rate is documented as a strict-threshold non-convergence property at this budget: it is 0 for all three libraries, so the error metrics carry the SO evidence.
  • MO: indistinguishable on ZDT1, ZDT2, ZDT3, DTLZ2 (IGD+, GD, hypervolume). ZDT4/ZDT6 are documented as the two in-our-favour exceptions: none of the three libraries converges at this budget, and ctrl-freak's IGD+/GD/HV is at least as good as both baselines (on ZDT4, HV is degenerate: 0/0/0).
  • The old "pymoo ZDT3 scatter" caption is removed and replaced with the real cause: the legacy baseline applied polynomial mutation (PM) per individual instead of per gene (~0.03 vs ~1.0 expected gene flips per offspring, ~30× under-mutation). Fixed in s4b.
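The ~30× figure follows from simple arithmetic. A minimal sketch, assuming the usual ZDT1-ZDT3 setup of n = 30 decision variables and the conventional per-gene PM rate of 1/n. The legacy behaviour is modelled here as an extra per-individual gate at 1/n on top of that per-gene rate; this is one hypothetical model that reproduces the quoted ~0.03, not necessarily what the old code did. The function names are illustrative.

```python
import random

N_VAR = 30            # assumed: ZDT1-ZDT3 are usually run with 30 decision variables
P_GENE = 1.0 / N_VAR  # conventional PM rate: each gene mutates with probability 1/n


def expected_flips(n_var: int, p_individual: float, p_gene: float) -> float:
    """Closed form: P(offspring is mutated) * genes * P(gene is mutated)."""
    return p_individual * n_var * p_gene


def simulate_flips(n_var: int, p_individual: float, p_gene: float,
                   trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        if rng.random() < p_individual:  # is this offspring mutated at all?
            total += sum(rng.random() < p_gene for _ in range(n_var))
    return total / trials


per_gene = expected_flips(N_VAR, 1.0, P_GENE)  # every offspring eligible
legacy = expected_flips(N_VAR, P_GENE, P_GENE)  # hypothetical per-individual gate
print(f"per-gene: {per_gene:.3f}, legacy: {legacy:.3f}, ratio: {per_gene / legacy:.0f}x")
# per-gene: 1.000, legacy: 0.033, ratio: 30x
```

The practical effect is that under the gated model roughly 29 of every 30 offspring are exact crossover copies, which is enough to stall spread along a disconnected front such as ZDT3's.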

Acceptance (verified twice — critic + orchestrator)

  • render.py (gate passes, 7 tables + 3 figures), mkdocs build --strict, ruff, ty check src/ — all pass.
  • full uv run pytest: 525 passed @ 98.89% (the s5 doctest gate now also covers render/reproduce). Legacy zdt/ gone; nothing imports it.

Plan converged via planner↔critic over 2 rounds (round 2 added the prose↔verdict consistency gate; critic verified it passes on the committed JSON and raises on a flipped cell).

…E parity paragraph; remove legacy zdt/ suite (milestone-12a)
@CooperBigFoot

Adversarial review: APPROVE (final step, s6-reporting)

Audited the committed diff and verified everything by execution in the worktree.

Consistency gate (the citation shield)

  • Passes on the committed 30-seed JSON; gate is fail-fast — render.py:622-631 runs build_tables → check_report_consistency → inject_tables → figures, before any file write.
  • Raise-on-tamper proven: a scratch JSON with zdt1/igd_plus flipped to "ctrl-freak worse" makes render() raise ValueError and leaves README.md + docs/validation.md byte-unchanged (sha256 before==after).
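The gate-before-write ordering and the raise-on-tamper check can be sketched as follows. The name check_report_consistency mirrors the PR, but its signature here, the JSON shape {problem: {metric: verdict}}, and the verdict strings "equivalent" and "ctrl-freak better" are assumptions; only "ctrl-freak worse" appears in the PR text. The demo flips zdt1/igd_plus, as in the review, and confirms the page's SHA-256 is unchanged.

```python
import hashlib
import tempfile
from pathlib import Path

# Assumed verdict vocabulary; only "ctrl-freak worse" is quoted in the PR.
EQUIVALENT = "equivalent"
BETTER = "ctrl-freak better"
WORSE = "ctrl-freak worse"


def check_report_consistency(verdicts, claimed_equivalent, claimed_at_least_as_good):
    """Raise ValueError if any (problem, metric) claim in the prose contradicts a verdict."""
    errors = []
    for problem, metric in claimed_equivalent:
        cell = verdicts.get(problem, {}).get(metric)
        if cell != EQUIVALENT:
            errors.append(f"{problem}/{metric}: prose says indistinguishable, verdict is {cell!r}")
    for problem, metric in claimed_at_least_as_good:
        cell = verdicts.get(problem, {}).get(metric)
        if cell not in (EQUIVALENT, BETTER):  # a missing cell also counts as a contradiction
            errors.append(f"{problem}/{metric}: prose says at least as good, verdict is {cell!r}")
    if errors:
        raise ValueError("report prose contradicts verdicts:\n" + "\n".join(errors))


def render(verdicts, claimed_equivalent, claimed_at_least_as_good, page: Path, text: str) -> None:
    """Gate first; the page is written only if every claim matches its verdict."""
    check_report_consistency(verdicts, claimed_equivalent, claimed_at_least_as_good)
    page.write_text(text)


def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


verdicts = {"zdt1": {"igd_plus": EQUIVALENT}, "zdt6": {"igd_plus": BETTER}}
claims_eq = [("zdt1", "igd_plus")]
claims_ge = [("zdt6", "igd_plus")]

with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "validation.md"
    render(verdicts, claims_eq, claims_ge, page, "parity page\n")  # gate passes, file written
    before = sha256(page)
    tampered = {**verdicts, "zdt1": {"igd_plus": WORSE}}  # flip one verdict cell
    try:
        render(tampered, claims_eq, claims_ge, page, "rewritten page\n")
    except ValueError as exc:
        print(exc)
    assert sha256(page) == before  # the page is byte-unchanged
```

Keeping all rendering in memory until the gate passes means a failure can never leave a half-updated README next to an up-to-date docs page.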

Tables trace to JSON (no hardcoded numbers) — sphere/objective_error, zdt4/igd_plus, zdt4/hypervolume (degenerate), zdt6/hypervolume, schwefel/solution_distance, zdt2/zdt3 igd_plus all match committed verdicts.
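"No hardcoded numbers" means every table cell is read from the results JSON at render time. A minimal sketch of one such table builder, assuming a hypothetical {problem: {metric: verdict}} shape; the function name verdict_table is illustrative and not the PR's build_tables.

```python
import json


def verdict_table(results: dict, metrics: list[str]) -> str:
    """Render one markdown table straight from the results JSON (hypothetical schema).

    Every cell is looked up in `results`, so nothing in the table is typed by hand.
    """
    header = "| problem | " + " | ".join(metrics) + " |"
    separator = "|---|" + "---|" * len(metrics)
    rows = [
        f"| {problem} | " + " | ".join(str(results[problem][m]) for m in metrics) + " |"
        for problem in sorted(results)  # stable row order keeps re-renders identical
    ]
    return "\n".join([header, separator, *rows])


results = json.loads('{"zdt2": {"igd_plus": "equivalent"}, "zdt1": {"igd_plus": "equivalent"}}')
print(verdict_table(results, ["igd_plus"]))
# | problem | igd_plus |
# |---|---|
# | zdt1 | equivalent |
# | zdt2 | equivalent |
```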

Prose ↔ verdicts — SO 6/6 equivalent on both error metrics; MO zdt1/2/3/dtlz2 equivalent on all three metrics; ZDT4/ZDT6 framed only as in-our-favour exceptions (grep "worse" → none). No superiority framing. ZDT3 'visible scatter' caption replaced with the real cause (legacy per-individual PM, ~30× under-mutation).

Legacy removal — benchmarks/zdt/ tree gone; no real imports remain. Do-not-touch set (problems/metrics/harness/run.py/stats.py/config/tests/pyproject/uv.lock/.github) untouched.

Acceptance — render gate PASS + deterministic (figures byte-identical on re-render); ruff/ty(src) PASS; doctest gate 11 passed (collects render+reproduce); full suite 525 passed @ 98.89%; mkdocs build --strict exit 0.
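The "byte-identical on re-render" check amounts to rendering twice and comparing file hashes. A minimal sketch, where digest_tree, check_deterministic, and the two stand-in render functions are all hypothetical; a real run would pass the actual figure-rendering step instead of the stand-ins.

```python
import hashlib
import itertools
import tempfile
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_deterministic(render_fn, out_dir: Path) -> None:
    """Render twice into out_dir and raise if any output file's bytes differ."""
    render_fn(out_dir)
    first = digest_tree(out_dir)
    render_fn(out_dir)
    second = digest_tree(out_dir)
    changed = sorted(k for k in first.keys() | second.keys() if first.get(k) != second.get(k))
    if changed:
        raise AssertionError(f"outputs differ between renders: {changed}")


def stable_render(out: Path) -> None:
    # Stand-in for a deterministic figure writer.
    (out / "fig.png").write_bytes(b"same bytes every time")


counter = itertools.count()


def unstable_render(out: Path) -> None:
    # Stand-in for a writer that embeds something run-dependent.
    (out / "fig.png").write_bytes(str(next(counter)).encode())


with tempfile.TemporaryDirectory() as tmp:
    check_deterministic(stable_render, Path(tmp))  # passes silently
```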

CI: 4/4 green (py3.11/3.12/3.13 + numpy-floor). mergeStateStatus CLEAN.

Merging — completes milestone-12a (8/8 steps).

@CooperBigFoot CooperBigFoot merged commit 5893c63 into main Jun 26, 2026
4 checks passed
@CooperBigFoot CooperBigFoot deleted the milestone-12a/s6-reporting branch June 26, 2026 21:15