
perf(d-3 followup): harden compare.exs against schema drift (standards#99)#30

Merged
hyperpolymath merged 1 commit into main from perf/d-3-compare-schema-drift
Jun 2, 2026

Conversation

@hyperpolymath
Owner

Summary

Phase D-3 follow-up under the single-lane HCG tier-2 channel (standards#91). PR #26 (D-4 bootstrap) deferred this as a "separate defensive D-3 follow-up, not coupled to D-4 collection": once bench/baseline.json _status is flipped to active (which the D-4 ritual eventually does), two directions of schema drift between bench/results.json and bench/baseline.json silently passed the gate. This PR closes both.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

NOT Closes #99: joint-close is owner-only; the D-4 maintainer-dispatch rebaseline workflow plus the _status flip to active still pend under #99 after this lands. Same posture as PRs #14 (D-2), #22 (D-3), #26 (D-4 bootstrap).

The gap

The comparator's old emit_table/3 iterated over results scenarios only and let check_regression(nil, …) fall through to "no baseline", which never bubbled up to the :regressed exit:

  1. Results-only scenario (new harness scenario landed without a rebaseline) → Map.get(baseline, name) returns nil → check_regression(nil, …) returns "no baseline" → reduce keeps acc = :ok → exit 0. The gate has no anchor for the new scenario, so it cannot meaningfully report regression for it, but it should say so instead of silently passing.
  2. Baseline-only scenario (the harness dropped a scenario the baseline still claims) → never enters the Enum.reduce(stats, …) body at all → invisible in the report and exit 0. A scenario was removed (or the harness crashed before emitting it) and the gate did not notice.
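The two failure paths above can be reproduced in a few lines. This is a minimal sketch, not the old compare.exs code: the module GapSketch, its function names, and the fixture maps are hypothetical, and the real threshold comparison is elided because only the set of visited scenarios matters here.

```elixir
defmodule GapSketch do
  # Hypothetical stand-in for a results-only loop; not the real emit_table/3.
  def old_verdict(results, baseline) do
    Enum.reduce(results, :ok, fn {name, _stats}, acc ->
      case Map.get(baseline, name) do
        # Results-only scenario: no anchor, so the accumulator stays :ok.
        nil -> acc
        # Threshold comparison elided; the sketch only tracks which rows are visited.
        _anchor -> acc
      end
    end)
  end

  # Baseline names that a results-only loop never visits.
  def unvisited(results, baseline) do
    Map.keys(baseline) -- Map.keys(results)
  end
end

results = %{"new_scenario" => %{"p50" => 1.0}}
baseline = %{"dropped_scenario" => %{"p50" => 1.0}}

IO.inspect(GapSketch.old_verdict(results, baseline))
IO.inspect(GapSketch.unvisited(results, baseline))
```

With these fixtures the verdict is :ok even though neither scenario has a counterpart, and "dropped_scenario" never enters the loop body at all.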

In scaffold-placeholder mode the existing code was fine — every row was tagged "scaffold" and the build exited 0 unconditionally — but it gave operators no preview of how the eventual active-mode verdict would look.

What changed

  • bench/compare.exs

    • emit_table/4 now iterates the union of scenario names from results.statistics and baseline.scenarios (sorted lexicographically), so both drift directions are visible.
    • Both directions are surfaced inline:
      • results-only → MISSING IN BASELINE
      • baseline-only → MISSING IN RESULTS
    • enforce: bool opt replaces the previous nil baseline sentinel — compare/2 now always passes Map.get(baseline, "scenarios", %{}) and uses enforce: false in scaffold-placeholder mode, enforce: true in active mode.
    • In enforce: false mode, rows are displayed as scaffold (would fail: MISSING IN BASELINE) etc. so a rebaseline PR previews the active-mode verdict; the build still exits 0.
    • In enforce: true mode, rows are displayed as bare MISSING IN BASELINE / MISSING IN RESULTS / REGRESSED and the comparator exits 1 if any row shows drift or a regression.
    • The now-unreachable check_regression(nil, _, _, _, _) -> "no baseline" clause is removed.
    • Latent crash fixed: when baseline values are TODO sentinels (or any non-number), num/1 returns nil, and the old (bp50 && p50 && p50 > bp50 * t50) or (…) or (…) chain raised BadBooleanError, because or requires a boolean left operand and nil is not one. The inner && already short-circuits to nil, so the outer joins are now ||, which accepts nil and lets the whole expression short-circuit consistently. This was previously masked because scaffold mode never reached check_regression; the new flow does, so it had to be fixed in the same PR.
  • docs/perf-contract.md

    • New ## Schema drift section between ## Regression-alert tolerance and ## Baseline lifecycle documents the two directions, the active vs scaffold display difference, and the fail-closed semantic.
    • Updated SCAFFOLD-MODE banner inside compare.exs to mention that drift is now surfaced inline.
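The latent crash in the list above comes down to the difference between Elixir's strict `or` and its relaxed `||`. The following is a minimal sketch of that difference only; BooleanJoinSketch and its functions are hypothetical names, and the single-percentile check is an assumed simplification of the chain quoted above.

```elixir
defmodule BooleanJoinSketch do
  # Hypothetical helper with the same shape as one term of the chain:
  # it yields nil as soon as either side is unknown.
  def breach?(baseline, observed, tolerance) do
    baseline && observed && observed > baseline * tolerance
  end

  # Strict `or` insists on a boolean left operand, so a nil first term raises.
  def any_breach_strict(a, b), do: a or b

  # `||` accepts any term and treats nil and false as falsy.
  def any_breach_relaxed(a, b), do: a || b
end

# A TODO sentinel that parsed to nil makes the breach check return nil.
unknown = BooleanJoinSketch.breach?(nil, 12.0, 1.1)

strict =
  try do
    BooleanJoinSketch.any_breach_strict(unknown, unknown)
  rescue
    e in BadBooleanError -> {:raised, e.term}
  end

IO.inspect(unknown)
IO.inspect(strict)
IO.inspect(BooleanJoinSketch.any_breach_relaxed(unknown, unknown))
```

The strict join raises on the nil term, while the relaxed join returns nil, which the caller can treat as "no breach".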

Behaviour matrix (smoke-tested)

| Mode (_status) | results.json | baseline.json | Status column | Exit |
| --- | --- | --- | --- | --- |
| scaffold | A only | (A absent) | scaffold (would fail: MISSING IN BASELINE) | 0 |
| scaffold | (B absent) | B only | scaffold (would fail: MISSING IN RESULTS) | 0 |
| scaffold | C | C (TODO) | scaffold | 0 |
| scaffold | D | D (real, over tol.) | scaffold (would fail: REGRESSED) | 0 |
| active | A only | (A absent) | MISSING IN BASELINE | 1 |
| active | (B absent) | B only | MISSING IN RESULTS | 1 |
| active | C | C (TODO) | ok (TODO parses as nil → no breach) | 0 |
| active | D | D (real, over tol.) | REGRESSED | 1 |
| active | E | E (real, within tol.) | ok | 0 |

The behaviour pivots on _status in bench/baseline.json — no code change is needed to arm the schema checks once the D-4 rebaseline + active flip lands.
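The matrix can be approximated by a small function that walks the union of scenario names and labels each row according to an enforce flag. This is a sketch under stated assumptions, not the merged emit_table/4: DriftSketch, the p50-only comparison, and the 1.10 tolerance are all hypothetical, and table rendering is left out.

```elixir
defmodule DriftSketch do
  # Hypothetical reduction of the union-based flow described in this PR.
  def verdict(results, baseline, enforce) do
    names =
      (Map.keys(results) ++ Map.keys(baseline))
      |> Enum.uniq()
      |> Enum.sort()

    rows =
      Enum.map(names, fn name ->
        {name, label(results[name], baseline[name], enforce)}
      end)

    # Only active mode (enforce: true) can turn a bad row into a failing exit.
    failed? = enforce and Enum.any?(rows, fn {_name, status} -> status != "ok" end)
    {rows, if(failed?, do: :regressed, else: :ok)}
  end

  defp label(result, anchor, enforce) do
    case {drift(result, anchor), enforce} do
      {nil, true} -> "ok"
      {nil, false} -> "scaffold"
      {problem, true} -> problem
      {problem, false} -> "scaffold (would fail: #{problem})"
    end
  end

  defp drift(_result, nil), do: "MISSING IN BASELINE"
  defp drift(nil, _anchor), do: "MISSING IN RESULTS"

  defp drift(%{"p50" => p50}, %{"p50" => bp50}) when is_number(p50) and is_number(bp50) do
    # 1.10 is an assumed tolerance for this sketch only.
    if p50 > bp50 * 1.10, do: "REGRESSED", else: nil
  end

  # Non-numeric anchors such as a TODO sentinel cannot be breached.
  defp drift(_result, _anchor), do: nil
end

results = %{"A" => %{"p50" => 1.0}, "D" => %{"p50" => 2.0}}
baseline = %{"B" => %{"p50" => 1.0}, "D" => %{"p50" => 1.0}}

IO.inspect(DriftSketch.verdict(results, baseline, true))
IO.inspect(DriftSketch.verdict(results, baseline, false))
```

Passing the mode as a plain boolean keeps the labelling logic identical in both modes, so the scaffold rows preview exactly what active mode would report.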

Local verification

Smoke-tested via a synthetic-fixture harness against the four named cases (active-with-drift → :regressed, scaffold-with-drift → :ok with (would fail: …) rows, active-clean → :ok, active-TODO-sentinels → :ok no crash). Output matched expected status strings and exit returns in every case.

Build was not verified end-to-end — the session environment has Elixir 1.14 only, no Elixir 1.19 / OTP 28 toolchain — but Code.format_string!/1 reports the file is already formatted and Code.string_to_quoted!/1 round-trips under 1.14. The existing perf-regression.yml workflow exercises the comparator end-to-end on CI.

Test plan

  • CI green: Perf Regression workflow runs mix run bench/compare.exs end-to-end and posts a scaffold-mode markdown report (still non-blocking — _status is scaffold-placeholder).
  • CI green: existing workflows (governance, hypatia-scan, dogfood-gate, codeql, scorecard) unaffected.
  • CI green: mix test still passes — bench/compare.exs is not exercised by mix test; no production-code change in this PR.
  • Manual (post-merge, owner): when D-4 rebaseline PR lands real numbers, the scaffold-mode Status column should still read scaffold for every scenario unless a true drift is present.
  • Manual (post-active-flip): when _status is flipped to active, the comparator exits 1 on any MISSING IN BASELINE, MISSING IN RESULTS, or REGRESSED row.

Downstream unblock

The boj-server rollout-prerequisite checklist in docs/integration/hcg-tier2-rollout-runbook.md §1.1 lists "Phase D-3 (gate armed)" and "Phase D-4 (numbers populated)" as the remaining open items gating Phase E rollout. This PR doesn't tick either box directly — neither requires schema-drift hardening — but it hardens the gate before it gets armed, so the first time _status flips to active the gate already covers the failure modes a future scenario rename / removal would otherwise hide.

Owner merges; not for admin-merge.

🤖 Generated with Claude Code



…s#99)

Phase D-3 follow-up under the single-lane HCG tier-2 channel
(standards#91). PR #26 (D-4 bootstrap) deferred this as a "separate
defensive D-3 follow-up, not coupled to D-4 collection": when
bench/baseline.json `_status` is flipped to `active`, a scenario
present in results.json but absent from baseline.json (a new harness
scenario landed without rebaseline) silently passed the gate, and a
scenario present in baseline.json but absent from results.json (the
harness dropped a scenario without rebaselining) was never even
checked. Both directions of schema drift now fail-closed in active
mode and surface as informational "scaffold (would fail: ...)" rows
in scaffold-placeholder mode so a rebaseline PR previews the
active-mode verdict before the gate is armed.

The comparator now iterates the union of scenario names across
results and baseline rather than the results map alone, and uses a
single `enforce: bool` opt to pivot between scaffold and active mode
(replaces the previous `nil` sentinel). check_regression/5 also has
a latent crash fixed in the process — when baseline values are TODO
sentinels (or any non-number), num/1 returns nil and the `or` chain
raises BadBooleanError; the inner `&&` short-circuit already returns
nil for unknowns, so the outer joins are switched from `or` to `||`
to match. Previously this was masked by scaffold mode never reaching
check_regression at all (the `nil` sentinel skipped it); the new
flow exposes that path in scaffold mode too.

docs/perf-contract.md gains a "Schema drift" section explaining the
two directions, the active vs scaffold display difference, and the
fail-closed semantic. The behaviour pivots on `_status` in
bench/baseline.json — no code change is needed to arm the schema
checks once Phase D-4 maintainer-only rebaseline + active flip lands.

Smoke-tested locally against synthetic results/baseline fixtures
(four cases: active+drift→regressed, scaffold+drift→ok-with-warnings,
active+clean→ok, active+TODO-sentinels→ok-no-crash). Build is not
verified end-to-end — the session environment has Elixir 1.14 only,
no Elixir 1.19 / OTP 28 toolchain — but Code.format_string!/1 reports
the file is already formatted and Code.string_to_quoted!/1 round-trips
under 1.14. Repo CI (`Perf Regression`, governance, hypatia-scan,
dogfood-gate, codeql, scorecard) is the verification gate.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#99

NOT Closes #99: joint-close is owner-only; D-4 baseline collection
plus the `_status` flip to active still pend under #99 after this
lands. Same posture as PRs #14, #22, #26.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 2, 2026

🔍 Hypatia Security Scan

Findings: 65 issues detected

| Severity | Count |
| --- | --- |
| 🔴 Critical | 6 |
| 🟠 High | 17 |
| 🟡 Medium | 42 |

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Issue in boj-build.yml",
    "type": "missing_timeout_minutes",
    "file": "boj-build.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in casket-pages.yml",
    "type": "missing_timeout_minutes",
    "file": "casket-pages.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in casket-pages.yml",
    "type": "missing_timeout_minutes",
    "file": "casket-pages.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in codeql.yml",
    "type": "missing_timeout_minutes",
    "file": "codeql.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in dogfood-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "dogfood-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in governance.yml",
    "type": "missing_timeout_minutes",
    "file": "governance.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath marked this pull request as ready for review June 2, 2026 09:05
@hyperpolymath hyperpolymath merged commit 4cda8d7 into main Jun 2, 2026
18 checks passed
@hyperpolymath hyperpolymath deleted the perf/d-3-compare-schema-drift branch June 2, 2026 09:05