Dogfood 2026-06-06: gate verdict clarity, legis dev-loop, payload controls, 401 distinction by tachyon-beep · Pull Request #30 · foundryside-dev/wardline

tachyon-beep · 2026-06-06T07:49:39Z

Closes the actionable wardline items from the 2026-06-06 Loom dogfood friction report
(label `dogfood-2026-06-06`). All five concerns from the re-test are addressed and
live-verified against a freshly-spawned MCP server.

⚠️ Deployment note for the federation (read first)

The re-test reported #2/#3/#4 as "not addressed". Root cause: stale long-running
`wardline mcp` processes, not missing code. The install is editable
(`~/wardline/src`), so a fresh spawn already has every fix — but seven long-lived MCP
servers were frozen at their spawn-time source (one was internally inconsistent and
crashed on `GateDecision.reason`). The re-tester tested #1 via the CLI (fresh process →
worked) and #2/#3/#4 via a stale MCP server (→ looked unaddressed).

Action required after merge: partners must restart their `wardline mcp` server
(or session) to pick up the code. No restart ⇒ same "broken" output.

Fixes

| # | Concern | Fix | Issue |
|---|---------|-----|-------|
| 1 (P0) | `--allow-dirty` on `scan --format legis` | unsigned, `dirty:true`-marked dev artifact; signing stays clean-tree-only | wardline-30f3d38fa5 |
| 2 (P1) | gate contradicts its summary | `gate.reason` + `gate.evaluated`; `next_actions` now gate-aware (no "rescan after edits" on a tripped gate) | wardline-be75c6676d |
| 3 (P1) | silent gate-default breaking change | `gate.migration_hint` (CLI stderr + MCP) + `UPGRADING.md` | wardline-5f662e7a4f |
| 4 (P1) | `where` didn't shrink payload; `explain` blew budget | `where` filters the agent_summary; `summary_only`/`max_findings`/`include_suppressed`; default explain cap (10); `truncation` block | wardline-2957009961 |
| 5 (P2) | 401 reported as "could not reach" | `EmitResult.status`/`auth_rejected`; CLI/MCP print "401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN"; stays soft | wardline-53a44a3bb1 |

Live verification (fresh server, the re-tester's exact scenarios)

- `tools/list` exposes the new `summary_only`/`max_findings`/`include_suppressed` args (proves fresh server).
- 34-baselined gate trip → `gate.reason`, `gate.evaluated`, `gate.migration_hint`, and gate-aware `next_actions` all present; no crash.
- `where:{active,CRITICAL}` (0 match) + `explain:true` → 1,585 chars (was 57,639), `suppressed_findings` empty inline, `truncation` present.
- `summary_only` → 0 finding bodies, counts intact.

Tests / quality

Full suite 2482 passing, ruff + mypy + mkdocs-strict clean. Every fix is TDD'd
(red→green); golden legis signature byte-unchanged; CLI↔MCP parity preserved; CLI
`--format agent-summary` output unchanged.

🤖 Generated with Claude Code

…efusing (wardline-30f3d38fa5) Dogfood friction #1: on a dirty tree `scan --format legis` failed exit 2 naming an `allow_dirty` flag that was never exposed on the CLI — presenting identically to "legis is broken." Expose `--allow-dirty` (CLI) / `allow_dirty` (MCP scan). The honest fix: a dirty tree under allow_dirty does NOT sign. The only tree_sha readable is the *committed* one, which does not describe dirty working content — signing it would be false provenance (the `_git_tree_sha` guard). Instead it falls through to the UNSIGNED dev artifact, clearly marked `dirty: true` (legis records it `unverified`). Signing stays clean-tree-only; verification stays clean-tree/CI. The loud refusal without --allow-dirty is unchanged. CLI emits a stderr warning when the artifact is dirty/unsigned; MCP reports `signed:false` + `dirty:true` in legis_artifact_status. legis ignores the unknown `dirty` top-level key on the unverified path, so ingest is unaffected; the golden clean-tree signature is byte-unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
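
The decision described in this commit can be sketched as a three-way branch. This is a minimal illustration, not wardline's code: `LegisArtifact`, `DirtyTreeError`, and `build_legis_artifact` are hypothetical names, and only the `signed`/`dirty` semantics are taken from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegisArtifact:
    # Hypothetical container; only the signed/dirty flags mirror the commit text.
    signed: bool
    dirty: bool


class DirtyTreeError(RuntimeError):
    """Hypothetical error for the loud refusal on a dirty tree."""


def build_legis_artifact(tree_is_dirty: bool, allow_dirty: bool = False) -> LegisArtifact:
    if not tree_is_dirty:
        # Clean tree: the committed tree SHA describes the content, so signing is honest.
        return LegisArtifact(signed=True, dirty=False)
    if not allow_dirty:
        # Dirty tree without the flag: refuse loudly (unchanged behaviour).
        raise DirtyTreeError("working tree is dirty; pass --allow-dirty for an unsigned dev artifact")
    # Dirty tree with the flag: never sign, mark the artifact dirty instead.
    return LegisArtifact(signed=False, dirty=True)
```

The point of the design is that `allow_dirty` never produces a signature; it only swaps the refusal for an unsigned, clearly marked artifact.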

…rdline-be75c6676d) Dogfood friction #2: a scan reporting summary.active:0 AND gate.tripped:true read as a bug: the agent had to run scan twice (with/without trust_suppressions) and read --help to learn the gate evaluates the unsuppressed (baselined-included) population by default. GateDecision now carries `reason` and `evaluated`. `reason` names the count and class that decided the verdict: "1 suppressed ERROR+ defect(s) (baseline/waiver/judged) not cleared; pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)" when the trip is solely from suppressed-but-gated findings, "N active ERROR+ defect(s)" on a genuine trip (no misdirection to the suppression flags), and the mixed form when both. `evaluated` names the population: "unsuppressed (repository baseline/waiver/judged ignored)" by default, "post-suppression … honored" under --trust-suppressions. Counts come from `gate_breakdown` over the ANNOTATED findings so they match what the agent reads in `summary`. Surfaced in the MCP scan gate block, the agent_summary gate block, and on CLI stderr when the gate trips (never a silent exit 1). Both None when no --fail-on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
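
A rough sketch of how such a `reason` string could be assembled from two counts. The function name `gate_reason` and the wording of the mixed form are assumptions; the two single-cause messages follow the commit text.

```python
from typing import Optional

# Escape hatches quoted from the commit message.
SUPPRESSION_HINT = "pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)"


def gate_reason(active: int, suppressed: int) -> Optional[str]:
    """Return a human-readable reason for a tripped gate, or None if nothing gates."""
    if active == 0 and suppressed == 0:
        return None
    active_part = f"{active} active ERROR+ defect(s)"
    suppressed_part = (
        f"{suppressed} suppressed ERROR+ defect(s) (baseline/waiver/judged) not cleared; "
        f"{SUPPRESSION_HINT}"
    )
    if suppressed == 0:
        # Genuine trip: do not point the reader at the suppression flags.
        return active_part
    if active == 0:
        return suppressed_part
    # Mixed form: this exact wording is an assumption, not the shipped message.
    return f"{active_part} and {suppressed_part}"
```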

…dline-5f662e7a4f) Dogfood friction #3: the secure gate-default (gate on the unsuppressed population) is correct, but the rollout was silent — a repo whose committed baseline used to clear --fail-on goes red with no code change, and an agent can't tell whether IT broke scan or HEAD was already red. New `baseline_migration_hint`: fires ONLY in the exact 'my repo went red with no code change' case — a committed .wardline/baseline.yaml exists, the gate trips SOLELY because baselined defects re-enter the unsuppressed population (no genuinely-active defect, no waiver/judged-only trip), and neither --trust-suppressions nor --new-since was passed. It points at both escape hatches and UPGRADING.md. Silent on a genuine active trip, a trusted/PR-scoped run, or no baseline file. Surfaced loudly on CLI stderr and as MCP `scan` gate.migration_hint (None otherwise). New UPGRADING.md documents the secure-default migration; CHANGELOG [Unreleased] gains entries for dogfood #1/#2/#3. Secure default unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
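
The firing condition reads naturally as a single predicate. The sketch below is an interpretation of the commit message, with hypothetical names (`GateBreakdown`, `should_emit_migration_hint`); in particular it assumes "solely baselined" means no waived or judged defect contributes to the trip.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateBreakdown:
    # Hypothetical per-class counts of defects at or above the --fail-on threshold.
    active: int
    baselined: int
    waived: int
    judged: int


def should_emit_migration_hint(
    baseline_file_exists: bool,
    gate_tripped: bool,
    breakdown: GateBreakdown,
    trust_suppressions: bool,
    new_since: Optional[str],
) -> bool:
    if not (baseline_file_exists and gate_tripped):
        return False
    if trust_suppressions or new_since is not None:
        # A trusted or PR-scoped run has already chosen an escape hatch.
        return False
    if breakdown.active > 0:
        # A genuine active trip is not a migration problem.
        return False
    # Assumption: the trip must come from baselined defects only.
    return breakdown.baselined > 0 and breakdown.waived == 0 and breakdown.judged == 0
```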

…y/max_findings/include_suppressed, default explain cap (wardline-2957009961)

Dogfood friction #4: the documented cost lever (`where`) did not control cost and one-shot `explain:true` was unusable on a real repo.

- `where` now filters the agent_summary arrays too (it only filtered the top-level findings list before), so a filter matching 0 findings no longer returns dozens of suppressed findings inline. agent_summary build takes a display_findings view; its summary COUNTS stay whole-project.
- New `summary_only:true` (counts + gate, no bodies; the smallest "did the gate pass?" payload), `include_suppressed:false` (drop suppressed bodies; counts stay), `max_findings:N` (cap returned bodies).
- DEFAULT explain ceiling: `explain:true` inlined provenance for EVERY active defect (56,820 chars on one line over a whole repo). Capped at 25 by default; max_findings tightens it. Findings past the cap are still returned, sans inline explanation.
- New `truncation` block (findings_total/findings_returned/findings_truncated/explanations_truncated/summary_only/include_suppressed/max_findings) so a bounded payload is never mistaken for "covered everything."

CLI --format agent-summary is byte-unchanged (defaults preserve whole-project, uncapped behaviour). Docs (agents.md, legis-handoff.md --allow-dirty) + CHANGELOG updated. Full suite 2476 green; ruff/mypy/mkdocs-strict clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
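
A minimal sketch of how the three payload controls and the `truncation` block might fit together. `bound_payload` and the `suppressed` key on each finding are hypothetical; the `truncation` field names come from the commit message, but their exact types (for example whether `findings_truncated` is a flag or a count) are assumed, and `explanations_truncated` is left out.

```python
from typing import Any, Dict, List, Optional


def bound_payload(
    findings: List[Dict[str, Any]],
    *,
    summary_only: bool = False,
    include_suppressed: bool = True,
    max_findings: Optional[int] = None,
) -> Dict[str, Any]:
    total = len(findings)  # counts stay whole-project regardless of what is shown
    if summary_only:
        shown: List[Dict[str, Any]] = []
    else:
        shown = [f for f in findings if include_suppressed or not f.get("suppressed", False)]
        if max_findings is not None:
            shown = shown[:max_findings]
    return {
        "findings": shown,
        "truncation": {
            "findings_total": total,
            "findings_returned": len(shown),
            "findings_truncated": len(shown) < total,  # assumed to be a boolean flag
            "summary_only": summary_only,
            "include_suppressed": include_suppressed,
            "max_findings": max_findings,
        },
    }
```

Because the block always reports `findings_total` next to `findings_returned`, a caller can tell a bounded response from a complete one without a second scan.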

…wardline-be75c6676d follow-up) The gate reason counted `gate_breakdown(result.findings)` — the annotated population — so under `--new-since` a delta-scoped-out defect (converted to BASELINED by apply_delta_scope) was wrongly counted as "suppressed >= threshold", inflating the count and pointing at `--new-since` (already supplied). _gate_reason now classifies the defects that ACTUALLY gate (the unsuppressed gate population, where out-of-delta defects are BASELINED and so excluded) by their state in the emitted findings. The count is exactly what tripped the gate; the `--new-since` path no longer over-counts. The trust-suppressions branch is unchanged (gate == emitted findings there). Locked by extending the new_since differential to assert 1, not 2. Verified: legis `ScanResultsIn.scan` is typed `dict` (arbitrary mapping), so the new unsigned `dirty:true` marker rides through intake untouched — confirmed the dev artifact stays postable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ow-up) The reported one-shot blowup was 56,820 chars over 34 findings and exceeded the tool token limit; a default of 25 inlined provenances was still uncomfortably close. Lower the default ceiling to 10 — comfortably under the limit, still plenty to triage in one call — and let max_findings RAISE it when the agent explicitly accepts the larger payload (summary_only covers the common "did the gate pass?" case). New test locks that max_findings can lift the count above the default. Docs/CHANGELOG updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
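
In code terms the follow-up amounts to treating `max_findings` as an override of the default ceiling in either direction. A small sketch, with `DEFAULT_EXPLAIN_CAP` and `explained_count` as hypothetical names:

```python
from typing import Optional

DEFAULT_EXPLAIN_CAP = 10  # default ceiling stated in the commit message


def explained_count(active_total: int, max_findings: Optional[int] = None) -> int:
    """How many active defects get an inline explanation under explain:true."""
    cap = DEFAULT_EXPLAIN_CAP if max_findings is None else max_findings
    return min(active_total, cap)
```

With the 34 findings from the report, the default explains 10, `max_findings=5` explains 5, and `max_findings=34` explains all of them because the caller opted in to the larger payload.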

…ped gate (wardline-be75c6676d follow-up) Dogfood re-test, #2 "Worse" half: when the gate trips solely on baselined findings summary.active is 0, so next_actions said "no active defects; rescan after edits" — telling the agent it PASSED while the gate FAILED. _next_actions_for now takes the GateDecision. With 0 active defects but a tripped gate it emits a scan action whose reason names the gate failure + the escape hatches (trust_suppressions / new_since / clear the baseline; see gate.reason / gate.migration_hint) instead of the passive "rescan after edits". The active>0 and genuinely-clean paths are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
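
The three paths can be sketched as below. The function name, the dict shape, and the `fix` action for the active path are illustrative assumptions; only the ordering of the cases and the gist of the messages follow the commit text.

```python
from typing import Dict, Optional


def next_action(active: int, gate_tripped: bool, gate_reason: Optional[str]) -> Dict[str, str]:
    if active > 0:
        # Unchanged path: there is real work to do (action shape is hypothetical).
        return {"action": "fix", "reason": f"{active} active defect(s)"}
    if gate_tripped:
        # New path: zero active defects but the gate failed, so say so.
        return {
            "action": "scan",
            "reason": (
                f"gate failed: {gate_reason}; "
                "use trust_suppressions / new_since, or clear the baseline"
            ),
        }
    # Unchanged path: genuinely clean.
    return {"action": "scan", "reason": "no active defects; rescan after edits"}
```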

…le (wardline-53a44a3bb1) Dogfood #5: a 401 (token absent from the CLI env) was reported as "could not reach Filigree", a wrong diagnosis that sent the agent chasing a broken-bridge / wrong-endpoint theory. The prior seam work deliberately made 401/403 SOFT (auth failure must not crash the scan loop); that is kept, only the MESSAGE changes. EmitResult now carries `status` (the HTTP status when one reached us; None when the transport itself failed) and `auth_rejected` (the 401/403 case). The CLI prints "Filigree returned 401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN" vs a 5xx "server error" vs the genuine "could not reach"; the MCP scan filigree_emit block and agent_summary carry the same discriminated disabled_reason. 401/403 stays reachable=False (non-load-bearing), never exit-2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
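
A sketch of the discrimination, assuming a simplified `EmitResult`. In the real change `auth_rejected` is described as something the result carries; here it is derived from `status` for brevity, and `describe_emit_failure` plus the exact message text are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmitResult:
    # HTTP status when a response reached us; None when the transport itself failed.
    status: Optional[int]

    @property
    def auth_rejected(self) -> bool:
        return self.status in (401, 403)


def describe_emit_failure(result: EmitResult) -> str:
    if result.status is None:
        return "could not reach Filigree"
    if result.auth_rejected:
        # The "…" mirrors the elision in the commit message; the full text is not shown there.
        return f"Filigree returned {result.status} (auth rejected) … set WARDLINE_FILIGREE_TOKEN"
    if 500 <= result.status < 600:
        return f"Filigree returned {result.status} (server error)"
    return f"Filigree returned {result.status}"
```

All three outcomes remain soft failures; the only thing that changes is which diagnosis the agent is given.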

…nreachable (#5) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ince the rebrand) uv.lock still carried the pre-rebrand `clarion` optional-dependency extra; pyproject already renamed it to `loomweave` (Clarion→Loomweave). Regenerated to match — no dependency change (blake3 >=1.0, unchanged), just the extra name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 39b87efd38

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-06T07:54:53Z

+            if self.max_findings is not None:
+                shown_active = shown_active[: self.max_findings]
+                shown_suppressed = shown_suppressed[: self.max_findings]
+                shown_facts = shown_facts[: self.max_findings]


Apply max_findings across all summary arrays

When an MCP scan returns mixed categories, max_findings only slices each agent-summary array independently, so max_findings: 1 can still inline one active defect, one suppressed defect, and one engine fact in agent_summary while the top-level findings list is capped to one. This defeats the payload-control contract for the embedded summary and can still produce oversized responses in repos with many categories; the cap needs to be applied to the combined displayed finding bodies, not per bucket.
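
One way to apply the cap to the combined bodies is to spend a single budget across the buckets in order. This is only a sketch of the suggestion: `cap_combined` is a hypothetical helper, and giving active findings first claim on the budget is an assumption.

```python
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")


def cap_combined(
    active: List[T],
    suppressed: List[T],
    facts: List[T],
    max_findings: Optional[int],
) -> Tuple[List[T], List[T], List[T]]:
    """Cap the total number of bodies across all three buckets."""
    if max_findings is None:
        return active, suppressed, facts
    remaining = max(max_findings, 0)
    capped = []
    for bucket in (active, suppressed, facts):  # priority order is an assumption
        taken = bucket[:remaining]
        remaining -= len(taken)
        capped.append(taken)
    return capped[0], capped[1], capped[2]
```

With `max_findings=1` this returns one body in total instead of one per bucket.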


…bda branch-locality, finding-lifecycle glossary

Resolves three Filigree ready-queue items, built TDD with adversarial review.

PY-WL-110 weft_markers soundness gap (wardline-d62845bb18, P2)

contradictory_trust.py hardcoded `wardline.decorators.*` as the only marker prefix, silently missing contradictory stacks imported from the renamed `weft_markers` shim. Now derives _MARKER_NAMES + _MARKER_MODULE_PREFIXES from BUILTIN_BOUNDARY_TYPES so the rule can't drift from the grammar. +2 tests.

Lambda bindings are branch-local (wardline-36016d26f3, P3)

_CURRENT_LAMBDA_BINDINGS was shared across if/else, try/except, match arms, leaking a lambda bound in one arm into siblings (over-fire). Each arm now walks an arm-local copy. NOTE: the first cut of the merge-out (clear()+full-union with the synthetic fall-through arm last) introduced a *false-negative regression*, verified empirically against HEAD: a lambda rebound in a no-else `if` / no-catch-all `match` and called after the branch resolved EXTERNAL_RAW on HEAD but INTEGRAL after the naive fix. Replaced with a delta merge (layer each arm's net add/changed bindings onto the pre-branch state in source order) that keeps the leak fix AND reproduces HEAD's after-branch bindings, so no new false negative. +3 over-fire guards, +3 no-false-negative guards.

Finding-lifecycle vocabulary glossary (wardline-26e84dbd44, P3)

Audited wardline's own usage: `active` is already the canonical word on every surface except the CLI summary, which printed `N new`. Relabelled to `N active` (text only; no JSON/SARIF/wire field renamed). Added the canonical glossary docs/reference/finding-lifecycle-vocabulary.md (single source of truth for new/active/suppressed/baselined/waived/judged + emitted-active vs gate population) with discipline tests + nav wiring. Cross-tool asks (Filigree first-seen "new", legis active) recorded as coordination context, not renamed.

Full suite 2471 passed, ruff + mypy clean, mkdocs --strict OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8103dff59


chatgpt-codex-connector · 2026-06-06T11:56:21Z

+        for name, lam in arm.items():
+            if pre.get(name) is not lam:
+                parent[name] = lam


Remove stale lambda bindings after branch overwrites

When a name was bound to a lambda before a full branch and every arm reassigns it to a non-lambda, each arm's copied binding map has popped the name, but this merge only iterates arm.items(), so the deletion is never reflected in the parent. A later cb(raw) is still resolved through the stale pre-branch lambda, reintroducing false PY-WL-107/108 reports in the branch-locality path this patch changes; removals need to be merged when all reachable arms drop the binding.
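
A possible shape for a merge that also propagates removals, shown on plain dicts. `merge_arms` and the `exhaustive` flag (true when every path goes through one of the arms, so there is no fall-through) are hypothetical; the first loop mirrors the delta merge in the diff above.

```python
from typing import Dict, List


def merge_arms(pre: Dict[str, object], arms: List[Dict[str, object]], exhaustive: bool) -> Dict[str, object]:
    parent = dict(pre)
    # Layer each arm's added or changed bindings onto the pre-branch state, in source order.
    for arm in arms:
        for name, lam in arm.items():
            if pre.get(name) is not lam:
                parent[name] = lam
    # Propagate a removal only when no path can still carry the old binding.
    if exhaustive and arms:
        for name in pre:
            if all(name not in arm for arm in arms):
                parent.pop(name, None)
    return parent
```

If any arm keeps the binding, or the branch can be skipped entirely, the pre-branch lambda stays visible, which preserves the no-false-negative behaviour the commit describes.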


chatgpt-codex-connector · 2026-06-06T11:56:21Z

+        handler_lambdas = _branch_copy(parent_lambdas)
+        _walk_branch_body(handler.body, function_taint, taint_map, handler_taints, call_site_taints, handler_lambdas)


Keep try-body lambda bindings visible to handlers

For try blocks where a lambda is assigned before a later statement raises, the except handler executes with that assignment still in scope, but this starts every handler from parent_lambdas instead of the try arm state. A real path such as try: cb = lambda x: eval(x); risky() / except: cb(raw) is therefore no longer resolved through the lambda, so the branch-locality change can miss PY-WL-107/108 sinks that are reachable after a partial try-body execution.
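
The runtime behaviour the comment relies on is easy to confirm in plain Python. The snippet below uses a harmless lambda in place of `eval` and a hypothetical `run` function; it shows only that a name bound in the try body before the failing statement is still bound inside the handler.

```python
def run(raw: str) -> str:
    try:
        cb = lambda x: x.upper()  # bound before the statement that raises
        raise ValueError("stand-in for risky() failing")
    except ValueError:
        # The handler runs after a partial try body, so cb is in scope here.
        return cb(raw)
```

Calling `run("abc")` returns `"ABC"`, so an analysis that starts each handler from the pre-try bindings would not see this call go through the lambda.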


tachyon-beep · 2026-06-06T12:42:34Z

Superseded by #31 — the head branch was renamed fix/dogfood-2026-06-06-gate-legis-payload → rc/1.0.0rc2 and a review-hardening pass + the 1.0.0rc2 cut were added. GitHub can't move a PR's head branch, so this PR is continued there. The old branch is being deleted.

John Morrissey and others added 10 commits June 6, 2026 12:44

docs(changelog): record next_actions gate-awareness (#2) and 401-vs-u…

39b87ef

…nreachable (#5) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed Jun 6, 2026


tachyon-beep mentioned this pull request Jun 6, 2026

Dogfood 2026-06-06 + 1.0.0rc2: gate verdict clarity, legis dev-loop, payload controls, 401 distinction, review hardening #31

Closed

tachyon-beep closed this Jun 6, 2026

tachyon-beep deleted the fix/dogfood-2026-06-06-gate-legis-payload branch June 6, 2026 12:42

tachyon-beep wants to merge 11 commits into main from fix/dogfood-2026-06-06-gate-legis-payload
