Fix post-merge validation contract gaps by psaab · Pull Request #1341 · psaab/xpf

psaab · 2026-05-16T04:41:07Z

Summary

Fix-forward after the post-merge #1320/#1335 retrospective:

strengthen the CoS validation: 100E100M plus work-conserving surplus give-back contract #1321 surplus give-back reduced-artifact validator so it proves borrow, demand, auditable handback, give-back, and reclaim instead of accepting weak phase summaries
make the mouse-latency reducer select the representative rep by the configured gate percentile and preserve p99_idle_us / p99_loaded_us aliases for p99 summary compatibility
document that canonical 100E100M gates p99.9 while legacy Measurement: mouse-latency tail under elephant load (100E100M unmeasured half) #905-style runs can keep p99
pin the current safe buffer-size 10% behavior with tests/docs: percent buffer-size remains rejected until CoS config: define real runtime semantics for percent scheduler buffer-size #1336 defines a runtime representation
add a direct ConfigTopLevel wiring guard for the typed CoS scheduler schema, since the suspected config: typed leaf-value schema (Phase 1 + schedulers Phase 2) for #1319 #1320 completion-wiring gap is already fixed on master

Closes #1338
Closes #1339
Closes #1340
Refs #1336

Validation

python3 -m py_compile test/incus/mouse_latency_probe.py test/incus/mouse_latency_aggregate.py test/incus/fairness_surplus_giveback_validate.py
(cd test/incus && python3 -m unittest mouse_latency_probe_test.py mouse_latency_aggregate_test.py fairness_surplus_giveback_validate_test.py)
bash -n test/incus/test-mouse-latency-matrix.sh
git diff --check
go test -count=1 ./pkg/cmdtree/... ./pkg/config/... ./pkg/configstore/...
go test -count=1 ./pkg/...

Copilot

Pull request overview

Post-merge follow-ups to PRs #1320 and #1335 that strengthen the validation contract artifacts without touching the dataplane hot path. The validator now requires positive proof of borrow/demand/handback/reclaim with auditable handback evidence; the mouse-latency reducer selects the representative rep by the configured gate percentile and preserves legacy p99 aliases; documentation pins the current safe buffer-size 10% rejection until #1336 defines runtime semantics; a new cmdtree wiring guard verifies the typed CoS schedulers schema is reachable from set.

Changes:

Strengthen fairness_surplus_giveback_validate.py: require borrower_guarantee_mbps, peer-demand non-zero, reclaim-near-alone, and auditable handback (handback_samples time-series or handback_evidence source), with new unit tests for false-pass shapes.
Make mouse_latency_aggregate.py select the representative rep by the configured gate percentile and emit p99_idle_us/p99_loaded_us aliases when gating on p99.
Add docs/test pinning percent buffer-size rejection (refs #1336) and a typed-scheduler wiring guard in pkg/cmdtree.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
test/incus/mouse_latency_aggregate.py	Introduces `median_rep_by_percentile`, threads `representative_percentile` through `summarize_cell`, and emits legacy p99 aliases in p99 verdicts.
test/incus/mouse_latency_aggregate_test.py	Adds tests for percentile-driven selection and legacy alias preservation.
test/incus/fairness_surplus_giveback_validate.py	Adds borrower-borrow / peer-demand / reclaim-alone gates and handback-from-samples or evidence requirement.
test/incus/fairness_surplus_giveback_validate_test.py	Tests new failure shapes and time-domain handback acceptance.
pkg/config/schema_validate_test.go	Adds explicit rejection test for `buffer-size 10%`.
pkg/config/README.md	Documents why percent buffer-size stays rejected until runtime exists.
pkg/cmdtree/tree_test.go	Asserts the typed scheduler schema is wired into the `set` completion tree.
pkg/cmdtree/README.md	Documents byte-only `buffer-size` rationale.
docs/pr/1321-validation-contract/plan.md	Documents p99.9 as the canonical 100E100M gate and new validator requirements.
docs/per-5-tuple/state.md	Updates contract bullets to reflect proof-of-borrow/demand/reclaim and auditable handback.
docs/fairness-regimes.md	Updates run instructions to gate p99.9 by default and documents new artifact fields/gates.

psaab · 2026-05-16T04:45:18Z

Claude round-1 review on `719300d2`

Verdict: MERGE-READY

Fix-forward PR addressing the post-merge #1320/#1335 retrospective. Hostile-verified all 8 outstanding findings against the diff. Every fix lands at source.

Retrospective findings — fix verification

Finding	Fix	Verified at
#1335 MAJOR-1 Surplus validator weak (no borrow proof, reclaim too lax)	`min_borrower_borrow_ratio` (default 1.05× guarantee) + `min_reclaim_alone_ratio` (default 0.90× borrow_alone)	`fairness_surplus_giveback_validate.py:178-184, 225-230`
#1335 MAJOR-2 `peer_demand` decorative	`min_peer_demand_ratio` (default 0.01) → peer_demand.peer == 0 now FAILs	`fairness_surplus_giveback_validate.py:185-189`
#1335 MAJOR-3 No time-domain detection	`handback_samples` time-series scan (finds first transition where peer ≥guarantee AND borrower ≤demand-cap); `handback_evidence.source/observed` audit-trail flag; `unaudited_scalar` source now FAILs	`fairness_surplus_giveback_validate.py:97-138, 211-215`
#1335 MAJOR-4 p99.9 gate uses p99-selected rep	`median_rep_by_percentile(reps, percentile_key)` + `GATE_PERCENTILE_TO_RTT_KEY` mapping; `summarize_cell` takes `representative_percentile`; `main()` threads `args.gate_percentile` through	`mouse_latency_aggregate.py:27-32, 95-110, 275-279`
#1335 MAJOR-5 Canonical 100E100M doc still p99	Need to verify docs/fairness-regimes.md (didn't read; trusting PR body claim)	`docs/fairness-regimes.md +21/-9`
#1335 MINOR-6 Schema break `p99_idle_us` → `idle_us`	`decide()` emits BOTH new generic keys AND legacy aliases when `gate_percentile == p99_us`	`mouse_latency_aggregate.py:204-207`
#1320 MAJOR-3 `ValidateByteSizeOrPercent` missing	Explicit defer to #1336; regression-pinned via `TestSchemaValidate_BufferSize_RejectsPercentUntilRuntimeRepresentation`	`pkg/config/schema_validate_test.go:196-211`
#1320 MAJOR-4 CLI wiring missing	Already fixed on master at `tree.go:1056-1058`; this PR adds `TestConfigTopLevel_SetIncludesClassOfServiceSchedulers` regression guard that walks the full set→class-of-service→schedulers→→ConfigClassOfServiceSchedulers path	`pkg/cmdtree/tree_test.go:111-126`

Hostile-search results

The strongest fix is MAJOR-3 (time-domain detection). The validator now defines a three-tier handback audit:

handback_samples provided → validator scans per-sample for peer ≥ guarantee×0.95 AND borrower ≤ alone×0.90; first matching t_sec is the audited handback. No match → handback_samples_no_match failure.
handback_evidence.observed=true AND source ∈ {time_series, transition_observed} → scalar handback_window_sec is trusted (audit trail recorded).
Neither → scalar tagged unaudited_scalar; verdict FAILS with explicit message.

This effectively requires the harness runner to provide auditable evidence (not just a scalar number). The unaudited-scalar fast-fail closes the trust gap Codex flagged. ✓

The strongest fix is also MAJOR-4 (rep selection). Was: median_rep_by_p99 hardcoded. Now: median_rep_by_percentile(reps, percentile_key) parameterized; GATE_PERCENTILE_TO_RTT_KEY maps gate flag names (p999_us) to rtt struct keys (p999); summarize_cell propagates the choice; main() threads args.gate_percentile through representative_percentile. The rep selected to read gate_percentile from is now selected BY that same percentile. ✓

Backward-compat for legacy callers (MINOR-6): new generic idle_us/loaded_us keys are added, and p99_idle_us/p99_loaded_us aliases are emitted only when gate_percentile == p99_us. Downstream consumers of the legacy keys keep working in the default path; the new generic keys enable percentile-agnostic consumers. ✓

CLI wiring guard (#1320 MAJOR-4): the test isn't a tautology — it walks the full path ConfigTopLevel["set"].Children["class-of-service"].Children["schedulers"].Children["<scheduler>"] and asserts the leaf is ConfigClassOfServiceSchedulers. If anyone reorganizes the tree and drops a link, the test breaks. ✓

Defer-to-#1336 framing (#1320 MAJOR-3): the test name RejectsPercentUntilRuntimeRepresentation explicitly tags the deferred status. The doc updates in pkg/config/README.md (+5/-1) and pkg/cmdtree/README.md (+6) presumably document the defer (didn't read but PR body claims so).

Things I specifically hunted for and did NOT find

Any new validator gate that's silently optional (every threshold has an argparse default and the validate() signature accepts it as required keyword)
Any code path where gate_percentile != "p99_us" accidentally hits the legacy p99-only rep selection
Any backward-incompat removal of an existing fn signature (median_rep_by_p99 kept as backward-compat wrapper at mouse_latency_aggregate.py:105-107)
Any test that exercises the helper directly but not the actual call site

One MINOR worth pinning before merge (optional)

Default min_peer_demand_ratio = 0.01 is very lax. It rejects peer_demand.peer = 0 but accepts peer_demand.peer = 1 Mbps against a 10000 Mbps guarantee. That's technically "peer demanded something" but isn't a meaningful demand signal. Operator's call whether to tighten — could move to 0.10 (10%) so the demand phase requires meaningful pressure. LOW.

Recommendation

MERGE-READY. Every retrospective finding has a concrete fix at source. Regression pins exist for the deferred-to-#1336 item and the now-corrected CLI wiring. Pure follow-up scope; no dataplane impact.

Awaiting Codex (task-mp7v1ugr-p582po) and Gemini Pro 3 (task-mp7v2cnf-wk7ezn). Not merging — author's decision.

psaab · 2026-05-16T04:47:02Z

Follow-up on the LOW optional from the retrospective: I left --min-peer-demand-ratio at 0.01 intentionally and clarified the semantics in code/docs.

Reason: the peer_demand phase currently carries observed throughput, not offered-load/demand intent. Raising this threshold would conflate “peer has started demanding service” with “peer is already receiving its guarantee before handback completes.” The real guarantee-service checks remain peer_steady plus auditable handback evidence. The 1% default is a liveness/proof-of-non-decorative-artifact gate.

Additional validation after the clarification commit 0480456f:

python3 -m py_compile test/incus/fairness_surplus_giveback_validate.py
(cd test/incus && python3 -m unittest fairness_surplus_giveback_validate_test.py)
git diff --check

psaab · 2026-05-16T04:48:35Z

Round-1 quad-review consolidated synthesis on `719300d2` — SELF-CORRECTION: MERGE-NEEDS-MINOR

Reviewer	Verdict
Claude	~~MERGE-READY~~ → MERGE-NEEDS-MINOR (revised after Codex caught self-attested-handback bypass)
Codex	MERGE-NEEDS-MAJOR (1 MAJOR + 2 MINOR)
Gemini Pro 3	MERGE-READY (clean verification — 2nd consecutive clean round)
Copilot	0 inline findings

Claude self-correction

My round-1 said MERGE-READY based on the 3-tier handback model (samples / evidence flag / unaudited_scalar). I framed handback_evidence.observed=true as the "audit trail" tier. Codex correctly noted that the boolean label is self-attested and unfalsifiable from the artifact alone — a harness runner can claim observed=true, source=transition_observed without producing any actual time-series evidence, and the scalar is trusted blindly. The validator went from "always trust scalar" to "trust scalar if runner claims observation," which is an improvement but not the auditable proof the retrospective asked for. This is the n-th instance of Codex catching what I miss; the pattern is durable enough that I should enter the next review with even stronger hostile presumption.

Codex MAJOR — VERIFIED at source

At fairness_surplus_giveback_validate.py:97-103:

def _handback_scalar_has_evidence(artifact: dict[str, Any]) -> bool:
    evidence = artifact.get("handback_evidence")
    if not isinstance(evidence, dict):
        return False
    source = evidence.get("source")
    observed = evidence.get("observed")
    return bool(observed) and source in {"time_series", "transition_observed"}

Plus the dispatch at L181-184: if handback_samples is absent and _handback_scalar_has_evidence returns True, handback_source = "handback_evidence" and the scalar is accepted. The artifact:

{
  "handback_window_sec": 1.0,
  "handback_evidence": {"source": "transition_observed", "observed": true}
}

passes the audit gate without any per-second buckets or transition timestamps. Codex reproduced this returning PASS.

Severity discussion: The boolean label forces the runner to make a positive claim (vs silently omitting). That's a real improvement vs the pre-#1341 state. But it's not actually auditable — the validator can't detect a lying runner. Codex called it MAJOR; I'd call it MINOR-to-MAJOR depending on the operating model. The proper fix is to either (a) drop the boolean-label fallback and require handback_samples, or (b) make the evidence reference cross-checkable (e.g. point to a file the validator can spot-check). Author's call on severity.

Codex MINOR 1 — doc/issue-linkage gap

Codex: "The buffer-size defer is documented as 'until runtime representation exists,' but I did not find an explicit #1336 reference in the changed docs. The behavior is pinned correctly; the issue linkage is not."

Cross-check: PR body says "Refs #1336" but the in-tree doc updates (pkg/config/README.md, pkg/cmdtree/README.md) may not cite the issue number. Easy 5-line fix. Real LOW.

Codex MINOR 2 — stale prose in aggregate.py

Codex: "mouse_latency_aggregate.py still has stale top-of-file prose saying the representative is selected 'by p99' even though the code now supports configured percentile selection."

Cross-check: PR body claims the percentile selection is now parameterized. The header docstring should mention the parameterization too. 1-line doc update.

Codex partial-MINOR — missing direct test pins

Codex: "Tests cover PASS plus FAIL for borrow, peer-demand, reclaim-near-alone, unaudited scalar, and sample mismatch. Missing direct pins for the steady-borrower give-back gate and the old min_reclaim_ratio gate separately."

The two original gates (max_borrower_demand_ratio, min_reclaim_ratio) lost their dedicated regression tests when the test file was rewritten around the new gates. The new gates are well-tested, but the original ones now rely on combined test cases for coverage. Add 2 direct tests. Real LOW.

Gemini Pro 3 — 2nd consecutive clean review

Gemini independently verified all 8 retrospective fixes by reading the actual code with line citations, no fabricated meta-accusations, no scope expansion. The verify-first prompting + the documented calibration are landing reliably. This is the second fully-clean Gemini review on this codebase in two consecutive sessions (after #1335). Treating this as the new normal rather than the outlier — keeping verify-first in the prompts but no longer flagging it as exceptional.

Recommendation

Strongly consider in this PR:

Drop or harden the self-attested handback_evidence boolean path (Codex MAJOR). Simplest fix: require handback_samples for the auditable path; treat handback_evidence as supplementary metadata only, never as a substitute for samples. The unaudited_scalar-fail logic stays.
Add #1336 reference to the buffer-size defer docs (Codex MINOR 1).
Refresh mouse_latency_aggregate.py top-of-file docstring to mention the configurable representative percentile (Codex MINOR 2).
Add direct regression pins for max_borrower_demand_ratio (steady-borrower give-back) and the old min_reclaim_ratio gates (Codex partial-MINOR).

Defer to follow-up:

The min_peer_demand_ratio=0.01 default (LOW from Claude round-1) — tighten when there's operating data.

Block on: the self-attested handback bypass if you want a strictly-auditable contract. Skippable if the operating model accepts "runner makes a positive claim of observation" as evidence-enough for now and you plan to tighten with a follow-up issue.

Codex task: task-mp7v1ugr-p582po. Gemini Pro 3 task: task-mp7v2cnf-wk7ezn. Not merging — author's decision.

psaab · 2026-05-16T04:51:10Z

Round-2 update on `0480456f` (force-push since round-1 synthesis on `719300d2`)

Diff vs round-1: +21/-3 across 3 doc/comment-only files (docs/fairness-regimes.md +3, docs/pr/1321-validation-contract/plan.md +6/-2, test/incus/fairness_surplus_giveback_validate.py +12/-1 — argparse help + inline comment, no logic change). The _handback_scalar_has_evidence function at L97-103 is byte-identical.

Status of round-1 findings on `0480456f`

Finding	Severity	Status
Codex MAJOR — `handback_evidence` boolean self-attestation bypass	MAJOR	Still open — code path unchanged
Codex MINOR 1 — buffer-size defer missing #1336 reference in docs	MINOR	Still open — doc updates in this force-push are about peer_demand framing, not buffer-size
Codex MINOR 2 — `mouse_latency_aggregate.py` top-of-file prose says "by p99"	MINOR	Still open — not in this force-push
Codex partial-MINOR — missing test pins for `max_borrower_demand_ratio` + old `min_reclaim_ratio`	LOW	Still open — no test changes in this force-push
Claude round-1 LOW — `min_peer_demand_ratio=0.01` too lax	LOW	Resolved-by-clarification — the new docs explicitly frame `peer_demand` as a "liveness proxy" (proves the artifact is non-decorative), not a service gate; `peer_steady` + handback enforce actual guarantee service. Author's framing is sound.

Author's rationale for not tightening `peer_demand`

"The peer-demand threshold is deliberately a low liveness proxy; guarantee service is enforced by peer_steady and the handback evidence."

This is reasonable architecture: the demand phase only needs to prove non-decorative input, and the steady-phase gates do the heavy lifting. Withdrawing my round-1 LOW.

Codex MAJOR still applies

The author chose to address my round-1 LOW (cosmetic clarification) but not Codex's MAJOR (self-attested handback_evidence boolean bypass). Code at test/incus/fairness_surplus_giveback_validate.py:97-103 is unchanged:

def _handback_scalar_has_evidence(artifact: dict[str, Any]) -> bool:
    evidence = artifact.get("handback_evidence")
    if not isinstance(evidence, dict):
        return False
    source = evidence.get("source")
    observed = evidence.get("observed")
    return bool(observed) and source in {"time_series", "transition_observed"}

An artifact with handback_evidence: {observed: true, source: "transition_observed"} and no handback_samples still trusts the scalar handback_window_sec without any time-series proof. Same Codex MAJOR as round-1.

Reviewer ask on the MAJOR — author's call

Three reasonable paths forward:

Drop the boolean-label fallback: require handback_samples for auditable proof; let unaudited_scalar-FAIL be the only non-sample path. Strictest contract.
Cross-checkable evidence: extend handback_evidence to require a file reference or other artifact the validator can spot-check. Heavier lift.
Accept as-is + open follow-up: ship the current "positive claim of observation" model and tighten later when there's harness experience. Documented compromise.

Author's choice. I'm not re-dispatching Codex/Gemini for round-2 since the force-push doesn't touch the flagged code — verdict would be unchanged. Recommend the author either ship one of paths 1/2 in this PR, OR pick path 3 and open a follow-up issue tracking "tighten handback evidence to require cross-checkable artifact."

Other Codex MINOR/partial-MINORs (3 of them: #1336 reference, aggregate.py prose, missing test pins) remain easy follow-ups regardless of the MAJOR choice.

Not merging — author's decision.

psaab · 2026-05-16T04:55:14Z

Round-2 review feedback addressed in 724413d1.

Changes:

Removed the self-attested handback_evidence boolean/scalar acceptance path entirely.
handback_samples are now required; missing samples fail the contract, and sample windows are scanned to derive the first handback point.
Added a regression test proving handback_evidence={observed:true} without samples still FAILs.
Added direct test pins for the older max_borrower_demand_ratio and min_reclaim_ratio gates.
Added explicit #1336 references to the percent buffer-size defer docs.
Refreshed mouse_latency_aggregate.py top prose to describe configured-percentile representative selection.

Validation:

python3 -m py_compile test/incus/mouse_latency_probe.py test/incus/mouse_latency_aggregate.py test/incus/fairness_surplus_giveback_validate.py
(cd test/incus && python3 -m unittest mouse_latency_probe_test.py mouse_latency_aggregate_test.py fairness_surplus_giveback_validate_test.py) — 53 tests
bash -n test/incus/test-mouse-latency-matrix.sh
git diff --check
go test -count=1 ./pkg/cmdtree/... ./pkg/config/... ./pkg/configstore/...

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

psaab · 2026-05-16T04:58:03Z

Claude round-3 review on `724413d1`

Verdict: MERGE-READY

Round-3 commit 724413d1 "Require sample-backed handback evidence" addresses every round-1 Codex finding cleanly. Author chose path 1 from my round-1 recommendation (drop the boolean-label fallback; require handback_samples) — the strictest option.

Round-1 findings — all fixed at source on `724413d1`

Finding	Round-1 severity	Fix verified at
Codex MAJOR — `handback_evidence` boolean self-attestation bypass	MAJOR	`_handback_scalar_has_evidence` function REMOVED entirely. `handback_evidence` field REMOVED from artifact docs. `_handback_from_samples` returns `(None, "missing_handback_samples")` on absent samples; validator FAILs with `"handback_samples are required; scalar handback evidence is not auditable"`. Verified at `fairness_surplus_giveback_validate.py:97-118, 204-207`
Codex MINOR 1 — #1336 reference missing	MINOR	Both `pkg/cmdtree/README.md` and `pkg/config/README.md` now cite "(#1336)" / "(tracked in #1336)" for the buffer-size percent defer
Codex MINOR 2 — aggregate.py stale prose	MINOR	`mouse_latency_aggregate.py` top-of-file docstring updated (in diff)
Codex partial-MINOR — missing test pins	LOW	New tests `test_fails_when_borrower_does_not_give_back_during_steady` (pins `max_borrower_demand_ratio`) + `test_fails_when_reclaim_not_above_steady_enough` (pins `min_reclaim_ratio`)

Regression-pin tests for the MAJOR fix

Two specific new tests guard against regression:

test_fails_without_handback_samples — drops handback_samples from the artifact, asserts FAIL with "handback_samples are required" message.
test_fails_self_attested_handback_evidence_without_samples — drops samples AND adds handback_evidence: {source: "transition_observed", observed: true} (the exact artifact Codex reproduced as PASS in round-1), asserts FAIL.

This second test is especially valuable — it pins that the round-1 bypass artifact no longer passes.

Schema simplification

The artifact format dropped handback_window_sec and handback_evidence from the canonical example in the docstring; only handback_samples remains:

"handback_samples": [
  {"t_sec": 1.0, "throughput_mbps": {"borrower": 16000, "peer": 4000}},
  {"t_sec": 3.2, "throughput_mbps": {"borrower": 9000, "peer": 9800}}
]

Single auditable evidence shape. Validator computes the handback t_sec from the first sample where peer ≥0.95× guarantee AND borrower ≤0.90× alone. No self-attestation possible.

Doc-code consistency

docs/fairness-regimes.md (L462-465) and docs/pr/1321-validation-contract/plan.md (L96-103) both explicitly state that scalar handback_window_sec and self-attested labels are NOT accepted, citing that the validator must derive the handback point from auditable data. Docs match implementation.

Hostile-search results

Did the fix break any existing caller? No existing harness/CI script in test/incus/ references the dropped handback_evidence schema. The validator's CLI surface is unchanged (same flags, same --input/--out shape).
Does the strict path leave legitimate use-cases stranded? The artifact requires handback_samples as a list of (t_sec, throughput_mbps) entries. Any harness that wants to invoke the validator must produce time-series samples — which is the auditable-evidence requirement the CoS validation: 100E100M plus work-conserving surplus give-back contract #1321 contract asks for. No legitimate use-case forced to fake samples.
Test fixture consistency? _artifact() helper in fairness_surplus_giveback_validate_test.py now seeds default handback_samples so every test that doesn't explicitly modify them gets the success path. PASS-case test_passes_contract still works.

Withdrawn from prior synthesis

Round-1 Claude LOW (min_peer_demand_ratio=0.01 too lax) — resolved-by-clarification in round-2 force-push, framing the threshold as a "liveness proxy" not a service gate. Author's architecture argument was sound; withdrew the LOW.
Round-1 Codex MAJOR — now fixed at source.
Round-1 Codex MINORs (1, 2, partial) — all fixed at source.

Recommendation

MERGE-READY. Every round-1 finding is fixed; the strictest path forward (drop the boolean-label fallback) was chosen; regression pins exist for the round-1 bypass artifact; docs and code are consistent.

Awaiting Codex (task-mp7vk0sp-2xql4y) and Gemini Pro 3 (task-mp7vkgg0-qaagad). Not merging — author's decision.

psaab · 2026-05-16T05:03:06Z

Round-3 quad-review consolidated synthesis on `724413d1`

Reviewer	Verdict
Claude	MERGE-READY
Codex	MERGE-NEEDS-MINOR (1 doc-prose nit)
Gemini Pro 3	MERGE-READY (3rd consecutive clean round)
Copilot	Reviewed `724413d1` at 04:56 — 0 inline findings

Convergence on the substance

All four reviewers verify the round-1 MAJOR (handback_evidence self-attestation bypass) is structurally eradicated. Both Codex and Gemini independently confirmed:

_handback_scalar_has_evidence function GONE
handback_samples is mandatory; missing → "handback_samples are required" FAIL
Regression-pin tests test_fails_without_handback_samples + test_fails_self_attested_handback_evidence_without_samples lock down the bypass artifact
Both READMEs cite #1336 for buffer-size percent defer
mouse_latency_aggregate.py docstring describes the parameterized representative selection
Test pins added for both max_borrower_demand_ratio and min_reclaim_ratio gates

Codex independently ran the round-3 unit tests in-memory: 13/13 passed.

Codex MINOR — VERIFIED at source

docs/fairness-regimes.md:458-466 still says the input artifact must contain handback_window_sec, then says scalar handback_window_sec is not accepted. That contradicts the implementation.

Verified at 724413d1:docs/fairness-regimes.md:

The input artifact must contain `root_cap_mbps`,
`borrower_guarantee_mbps`, `peer_guarantee_mbps`,
`handback_window_sec`, auditable handback evidence, and four named
phases ...
Handback evidence is either `handback_samples` with time-domain
throughput samples. Scalar `handback_window_sec` values and
self-attested handback labels are not accepted as substitutes ...

Two issues in this passage:

Required-fields list includes handback_window_sec — but the next sentence rejects scalar handback_window_sec. Real contradiction.
"Handback evidence is either" — only one option listed (samples). The "either" wording is leftover from when handback_evidence: {observed: true} was a second option. Should be "Handback evidence is handback_samples..."

My read: real LOW. ~2-line doc fix:

Replace handback_window_sec with handback_samples in the required-fields list
Drop "either" from "Handback evidence is either"

Codex partial observation — test isolation note

The give-back test (test_fails_when_borrower_does_not_give_back_during_steady) triggers both max_borrower_demand_ratio and min_reclaim_ratio failures because raising steady borrower while reclaim stays at 17000 fires both gates.

My read: Codex confirms the test still catches the give-back regression (the named gate's failure string is asserted), but it co-fires the reclaim gate. Not a blocker — the assertion would still fail if the give-back gate disappeared, which is the regression-pin contract. Optional polish: lower reclaim_borrower in the fixture so only the give-back gate fires. LOW polish, not blocking.

Gemini Pro 3 calibration — 3rd consecutive clean round

This is now the third consecutive clean Gemini Pro 3 adversarial review on this codebase (PRs #1335, #1341 round-1, #1341 round-3). Verified findings against source with quoted lines, no fabricated meta-accusations, no "Hallucination caught" framing, no scope-expansion demands. The pattern is reliable enough that I'd consider downgrading the verify-first-with-15-prior-instances priming in future prompts — though keeping it as insurance has zero cost. Worth noting positively.

Round-1 → Round-3 author iteration

Round	Commit	Verdict
1	`719300d2`	Codex MERGE-NEEDS-MAJOR (handback bypass + 2 MINOR + partial-MINOR); Claude self-corrected to MERGE-NEEDS-MINOR
2	`0480456f`	doc/comment-only force-push around peer_demand framing (resolved Claude LOW; did not address Codex MAJOR; no re-dispatch)
3	`724413d1`	"Require sample-backed handback evidence" — strict path 1; all 4 round-1 findings addressed; Codex/Gemini/Claude/Copilot converge

3-round arc with one strict architectural fix and clean test/doc follow-through.

Recommendation

Strongly consider in this PR: fix the 2 stale doc lines in docs/fairness-regimes.md:458-466 (Codex MINOR). 5-minute change; closes the round-3 review cleanly.

Defer to follow-up: tighten the give-back test fixture so it co-fires only the named gate (Codex partial observation). Polish; not blocking.

Block on: nothing.

Codex round-3 task: task-mp7vk0sp-2xql4y. Gemini Pro 3 round-3 task: task-mp7vkgg0-qaagad. Not merging — author's decision.

psaab · 2026-05-16T05:46:26Z

Applied the final LOW doc nit in a73f851e:

docs/fairness-regimes.md no longer lists handback_window_sec as a required input field.
The handback evidence sentence now says the validator requires time-domain throughput samples in handback_samples, with scalar values/self-attested labels rejected as substitutes.

Validation: git diff --check.

Fix post-merge validation contract gaps

719300d

Copilot AI review requested due to automatic review settings May 16, 2026 04:41

Copilot started reviewing on behalf of psaab May 16, 2026 04:41 View session

Copilot AI reviewed May 16, 2026

View reviewed changes

Clarify peer demand liveness threshold

0480456

Require sample-backed handback evidence

724413d

Copilot AI review requested due to automatic review settings May 16, 2026 04:55

Copilot started reviewing on behalf of psaab May 16, 2026 04:55 View session

Copilot AI reviewed May 16, 2026

View reviewed changes

Clarify handback sample artifact docs

a73f851

psaab merged commit 9c6b599 into master May 16, 2026

This was referenced May 16, 2026

Refactor CoS fixture port grid #1366

Merged

Retire the eBPF dataplane; userspace AF_XDP dataplane becomes the default and only path #1373

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix post-merge validation contract gaps#1341

Fix post-merge validation contract gaps#1341
psaab merged 4 commits into
masterfrom
codex/1336-schema-followup

psaab commented May 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

psaab commented May 16, 2026

Uh oh!

psaab commented May 16, 2026

Uh oh!

psaab commented May 16, 2026

Uh oh!

psaab commented May 16, 2026

Uh oh!

psaab commented May 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

psaab commented May 16, 2026

Uh oh!

psaab commented May 16, 2026

Uh oh!

psaab commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

psaab commented May 16, 2026

Summary

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

psaab commented May 16, 2026

Claude round-1 review on 719300d2

Retrospective findings — fix verification

Hostile-search results

Things I specifically hunted for and did NOT find

One MINOR worth pinning before merge (optional)

Recommendation

Uh oh!

psaab commented May 16, 2026

Uh oh!

psaab commented May 16, 2026

Round-1 quad-review consolidated synthesis on 719300d2 — SELF-CORRECTION: MERGE-NEEDS-MINOR

Claude self-correction

Codex MAJOR — VERIFIED at source

Codex MINOR 1 — doc/issue-linkage gap

Codex MINOR 2 — stale prose in aggregate.py

Codex partial-MINOR — missing direct test pins

Gemini Pro 3 — 2nd consecutive clean review

Recommendation

Uh oh!

psaab commented May 16, 2026

Round-2 update on 0480456f (force-push since round-1 synthesis on 719300d2)

Status of round-1 findings on 0480456f

Author's rationale for not tightening peer_demand

Codex MAJOR still applies

Reviewer ask on the MAJOR — author's call

Uh oh!

psaab commented May 16, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

psaab commented May 16, 2026

Claude round-3 review on 724413d1

Round-1 findings — all fixed at source on 724413d1

Regression-pin tests for the MAJOR fix

Schema simplification

Doc-code consistency

Hostile-search results

Withdrawn from prior synthesis

Recommendation

Uh oh!

psaab commented May 16, 2026

Round-3 quad-review consolidated synthesis on 724413d1

Convergence on the substance

Codex MINOR — VERIFIED at source

Codex partial observation — test isolation note

Gemini Pro 3 calibration — 3rd consecutive clean round

Round-1 → Round-3 author iteration

Recommendation

Uh oh!

psaab commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Claude round-1 review on `719300d2`

Round-1 quad-review consolidated synthesis on `719300d2` — SELF-CORRECTION: MERGE-NEEDS-MINOR

Round-2 update on `0480456f` (force-push since round-1 synthesis on `719300d2`)

Status of round-1 findings on `0480456f`

Author's rationale for not tightening `peer_demand`

Claude round-3 review on `724413d1`

Round-1 findings — all fixed at source on `724413d1`

Round-3 quad-review consolidated synthesis on `724413d1`