Dogfood 2026-06-06 + 1.0.0rc3: gate verdict clarity, legis dev-loop, payload controls, 401 distinction, review hardening by tachyon-beep · Pull Request #33 · foundryside-dev/wardline

tachyon-beep · 2026-06-06T14:55:14Z

Closes the actionable wardline items from the 2026-06-06 Loom dogfood friction report
(label `dogfood-2026-06-06`). All five concerns from the re-test are addressed and
live-verified against a freshly-spawned MCP server.

⚠️ Deployment note for the federation (read first)

The re-test reported #2/#3/#4 as "not addressed". Root cause: stale long-running
`wardline mcp` processes, not missing code. The install is editable
(`~/wardline/src`), so a fresh spawn already has every fix — but seven long-lived MCP
servers were frozen at their spawn-time source (one was internally inconsistent and
crashed on `GateDecision.reason`). The re-tester tested #1 via the CLI (fresh process →
worked) and #2/#3/#4 via a stale MCP server (→ looked unaddressed).

Action required after merge: partners must restart their `wardline mcp` server
(or session) to pick up the code. No restart ⇒ same "broken" output.

Fixes

| # | Concern | Fix | Issue |
|---|---------|-----|-------|
| 1 (P0) | `--allow-dirty` on `scan --format legis` | unsigned, `dirty:true`-marked dev artifact; signing stays clean-tree-only | wardline-30f3d38fa5 |
| 2 (P1) | gate contradicts its summary | `gate.reason` + `gate.evaluated`; `next_actions` now gate-aware (no "rescan after edits" on a tripped gate) | wardline-be75c6676d |
| 3 (P1) | silent gate-default breaking change | `gate.migration_hint` (CLI stderr + MCP) + `UPGRADING.md` | wardline-5f662e7a4f |
| 4 (P1) | `where` didn't shrink payload; `explain` blew budget | `where` filters the agent_summary; `summary_only`/`max_findings`/`include_suppressed`; default explain cap (10); `truncation` block | wardline-2957009961 |
| 5 (P2) | 401 reported as "could not reach" | `EmitResult.status`/`auth_rejected`; CLI/MCP print "401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN"; stays soft | wardline-53a44a3bb1 |

Live verification (fresh server, the re-tester's exact scenarios)

- `tools/list` exposes the new `summary_only`/`max_findings`/`include_suppressed` args (proves fresh server).
- 34-baselined gate trip → `gate.reason`, `gate.evaluated`, `gate.migration_hint`, and gate-aware `next_actions` all present; no crash.
- `where:{active,CRITICAL}` (0 match) + `explain:true` → 1,585 chars (was 57,639), `suppressed_findings` empty inline, `truncation` present.
- `summary_only` → 0 finding bodies, counts intact.

Tests / quality

Full suite 2482 passing, ruff + mypy + mkdocs-strict clean. Every fix is TDD'd
(red→green); golden legis signature byte-unchanged; CLI↔MCP parity preserved; CLI
`--format agent-summary` output unchanged.

🤖 Generated with Claude Code

Supersedes #30 (head branch renamed fix/dogfood-2026-06-06-gate-legis-payload → rc/1.0.0rc2; GitHub cannot move a PR's head branch, so this is its continuation). Adds the PR #30 multi-reviewer hardening pass and cuts release candidate 1.0.0rc2.

Continued from #32 (auto-closed by the rc2→rc3 branch rename). Version bumped to 1.0.0rc3.

…efusing (wardline-30f3d38fa5)
Dogfood friction #1: on a dirty tree `scan --format legis` failed with exit 2, naming an `allow_dirty` flag that was never exposed on the CLI, presenting identically to "legis is broken." Expose `--allow-dirty` (CLI) / `allow_dirty` (MCP scan).
The honest fix: a dirty tree under allow_dirty does NOT sign. The only tree_sha readable is the *committed* one, which does not describe dirty working content; signing it would be false provenance (the `_git_tree_sha` guard). Instead it falls through to the UNSIGNED dev artifact, clearly marked `dirty: true` (legis records it `unverified`). Signing stays clean-tree-only; verification stays clean-tree/CI. The loud refusal without --allow-dirty is unchanged.
CLI emits a stderr warning when the artifact is dirty/unsigned; MCP reports `signed:false` + `dirty:true` in legis_artifact_status. legis ignores the unknown `dirty` top-level key on the unverified path, so ingest is unaffected; the golden clean-tree signature is byte-unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
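The signing decision described in this commit can be sketched as a small pure function. This is a minimal illustration, not wardline's code: `decide_legis_artifact`, `LegisArtifactStatus`, and `DirtyTreeError` are hypothetical names, and the sketch assumes signing is otherwise configured on a clean tree.

```python
from dataclasses import dataclass


class DirtyTreeError(RuntimeError):
    """Hypothetical error for a legis scan on a dirty tree without allow_dirty."""


@dataclass(frozen=True)
class LegisArtifactStatus:
    # Field names mirror the `signed` / `dirty` keys the commit message mentions.
    signed: bool
    dirty: bool


def decide_legis_artifact(tree_is_dirty: bool, allow_dirty: bool) -> LegisArtifactStatus:
    """Decide whether the legis artifact is signed, unsigned-dirty, or refused."""
    if not tree_is_dirty:
        # Clean tree: the committed tree_sha describes the scanned content.
        return LegisArtifactStatus(signed=True, dirty=False)
    if not allow_dirty:
        # The loud refusal stays in place when the flag is absent.
        raise DirtyTreeError(
            "working tree is dirty; pass --allow-dirty for an unsigned dev artifact"
        )
    # Dirty tree with allow_dirty: never sign, and mark the artifact dirty.
    return LegisArtifactStatus(signed=False, dirty=True)
```

The point of the shape is that no branch returns `signed=True` together with `dirty=True`, so a signature can never vouch for uncommitted content.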

…rdline-be75c6676d)
Dogfood friction #2: a scan reporting summary.active:0 AND gate.tripped:true read as a bug: the agent had to run scan twice (with/without trust_suppressions) and read --help to learn the gate evaluates the unsuppressed (baselined-included) population by default.
GateDecision now carries `reason` and `evaluated`. `reason` names the count and class that decided the verdict: "1 suppressed ERROR+ defect(s) (baseline/waiver/judged) not cleared; pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)" when the trip is solely from suppressed-but-gated findings, "N active ERROR+ defect(s)" on a genuine trip (no misdirection to the suppression flags), and the mixed form when both. `evaluated` names the population: "unsuppressed (repository baseline/waiver/judged ignored)" by default, "post-suppression … honored" under --trust-suppressions.
Counts come from `gate_breakdown` over the ANNOTATED findings so they match what the agent reads in `summary`. Surfaced in the MCP scan gate block, the agent_summary gate block, and on CLI stderr when the gate trips (never a silent exit 1). Both None when no --fail-on.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
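As a rough sketch of how the three `reason` forms could be chosen from two counts: the function name `gate_reason` is hypothetical, the mixed-form wording is invented for illustration (the commit message does not quote it), and returning `None` when nothing gates is a simplification of this sketch.

```python
from typing import Optional

# Escape-hatch wording quoted from the commit message above.
SUPPRESSION_HINT = (
    "pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)"
)


def gate_reason(active: int, suppressed: int) -> Optional[str]:
    """Name the count and class of defects that tripped the gate."""
    if active == 0 and suppressed == 0:
        return None  # nothing at or above the threshold: no trip to explain
    if active == 0:
        # Trip is solely from suppressed-but-gated findings: point at the flags.
        return (
            f"{suppressed} suppressed ERROR+ defect(s) (baseline/waiver/judged) "
            f"not cleared; {SUPPRESSION_HINT}"
        )
    if suppressed == 0:
        # Genuine trip: do not misdirect the reader to the suppression flags.
        return f"{active} active ERROR+ defect(s)"
    # Mixed case; this exact wording is an assumption of the sketch.
    return f"{active} active and {suppressed} suppressed ERROR+ defect(s)"
```

For example, `gate_reason(3, 0)` yields a message with no mention of `--trust-suppressions`, which is the "no misdirection" property the commit describes.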

…dline-5f662e7a4f)
Dogfood friction #3: the secure gate-default (gate on the unsuppressed population) is correct, but the rollout was silent: a repo whose committed baseline used to clear --fail-on goes red with no code change, and an agent can't tell whether IT broke scan or HEAD was already red.
New `baseline_migration_hint`: fires ONLY in the exact 'my repo went red with no code change' case: a committed .wardline/baseline.yaml exists, the gate trips SOLELY because baselined defects re-enter the unsuppressed population (no genuinely-active defect, no waiver/judged-only trip), and neither --trust-suppressions nor --new-since was passed. It points at both escape hatches and UPGRADING.md. Silent on a genuine active trip, a trusted/PR-scoped run, or no baseline file. Surfaced loudly on CLI stderr and as MCP `scan` gate.migration_hint (None otherwise).
New UPGRADING.md documents the secure-default migration; CHANGELOG [Unreleased] gains entries for dogfood #1/#2/#3. Secure default unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
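The firing conditions listed above amount to a single conjunction. The following is a hedged sketch of that predicate only: `GateContext` and `should_emit_migration_hint` are hypothetical names, and how a trip mixing baselined and waived/judged defects is treated is not stated in the commit message, so the sketch simply requires at least one gating baselined defect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateContext:
    """Hypothetical bundle of the facts the hint depends on."""
    baseline_file_exists: bool   # a committed .wardline/baseline.yaml is present
    gate_tripped: bool
    active_gating: int           # genuinely active defects at/above the threshold
    baselined_gating: int        # baselined defects re-entering the gate population
    trust_suppressions: bool
    new_since: Optional[str]


def should_emit_migration_hint(ctx: GateContext) -> bool:
    """True only for the 'repo went red with no code change' case."""
    return (
        ctx.baseline_file_exists
        and ctx.gate_tripped
        and ctx.active_gating == 0       # silent on a genuine active trip
        and ctx.baselined_gating > 0     # silent on a waiver/judged-only trip
        and not ctx.trust_suppressions   # silent on a trusted run
        and ctx.new_since is None        # silent on a PR-scoped run
    )
```

Keeping the predicate this narrow is what lets the hint stay loud: it only appears when it is almost certainly the right diagnosis.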

…y/max_findings/include_suppressed, default explain cap (wardline-2957009961)
Dogfood friction #4: the documented cost lever (`where`) did not control cost and one-shot `explain:true` was unusable on a real repo.
- `where` now filters the agent_summary arrays too (it only filtered the top-level findings list before), so a filter matching 0 findings no longer returns dozens of suppressed findings inline. agent_summary build takes a display_findings view; its summary COUNTS stay whole-project.
- New `summary_only:true` (counts + gate, no bodies; the smallest "did the gate pass?" payload), `include_suppressed:false` (drop suppressed bodies; counts stay), `max_findings:N` (cap returned bodies).
- DEFAULT explain ceiling: `explain:true` inlined provenance for EVERY active defect (56,820 chars on one line over a whole repo). Capped at 25 by default; max_findings tightens it. Findings past the cap are still returned, sans inline explanation.
- New `truncation` block (findings_total/findings_returned/findings_truncated/explanations_truncated/summary_only/include_suppressed/max_findings) so a bounded payload is never mistaken for "covered everything."
CLI --format agent-summary is byte-unchanged (defaults preserve whole-project, uncapped behaviour). Docs (agents.md, legis-handoff.md --allow-dirty) + CHANGELOG updated.
Full suite 2476 green; ruff/mypy/mkdocs-strict clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
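A minimal sketch of how the three payload controls and the `truncation` block might fit together. Everything here is an assumption made for illustration: `bound_payload` is a hypothetical name, findings are modelled as plain dicts with a `suppressed` flag, `findings_truncated` is assumed to be a boolean, and `explanations_truncated` is left out.

```python
from typing import Any, Dict, List, Optional


def bound_payload(
    findings: List[Dict[str, Any]],
    *,
    summary_only: bool = False,
    include_suppressed: bool = True,
    max_findings: Optional[int] = None,
) -> Dict[str, Any]:
    """Return finding bodies bounded by the controls, plus a truncation record."""
    total = len(findings)
    if summary_only:
        returned: List[Dict[str, Any]] = []  # counts + gate only, no bodies
    else:
        # Drop suppressed bodies on request; whole-project counts are unaffected.
        returned = [
            f for f in findings
            if include_suppressed or not f.get("suppressed", False)
        ]
        if max_findings is not None:
            returned = returned[:max_findings]  # cap the returned bodies
    truncation = {
        "findings_total": total,
        "findings_returned": len(returned),
        "findings_truncated": len(returned) < total,
        "summary_only": summary_only,
        "include_suppressed": include_suppressed,
        "max_findings": max_findings,
    }
    return {"findings": returned, "truncation": truncation}
```

Because `findings_total` is always the whole-project count, a caller can tell a bounded payload from one that covered everything by comparing it with `findings_returned`.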

…wardline-be75c6676d follow-up)
The gate reason counted `gate_breakdown(result.findings)` (the annotated population), so under `--new-since` a delta-scoped-out defect (converted to BASELINED by apply_delta_scope) was wrongly counted as "suppressed >= threshold", inflating the count and pointing at `--new-since` (already supplied).
_gate_reason now classifies the defects that ACTUALLY gate (the unsuppressed gate population, where out-of-delta defects are BASELINED and so excluded) by their state in the emitted findings. The count is exactly what tripped the gate; the `--new-since` path no longer over-counts. The trust-suppressions branch is unchanged (gate == emitted findings there). Locked by extending the new_since differential to assert 1, not 2.
Verified: legis `ScanResultsIn.scan` is typed `dict` (arbitrary mapping), so the new unsigned `dirty:true` marker rides through intake untouched; confirmed the dev artifact stays postable.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ow-up)
The reported one-shot blowup was 56,820 chars over 34 findings and exceeded the tool token limit; a default of 25 inlined provenances was still uncomfortably close. Lower the default ceiling to 10 (comfortably under the limit, still plenty to triage in one call) and let max_findings RAISE it when the agent explicitly accepts the larger payload (summary_only covers the common "did the gate pass?" case). New test locks that max_findings can lift the count above the default. Docs/CHANGELOG updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
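The cap rule in this follow-up reduces to "use `max_findings` when given, else 10". A tiny sketch under that reading, with `explanations_to_inline` as a hypothetical helper name:

```python
from typing import Optional

DEFAULT_EXPLAIN_CAP = 10  # default ceiling named in the commit message


def explanations_to_inline(active_defects: int, max_findings: Optional[int] = None) -> int:
    """How many active defects get inline provenance under explain:true."""
    # An explicit max_findings replaces the default, so it can raise or tighten it.
    cap = DEFAULT_EXPLAIN_CAP if max_findings is None else max_findings
    return min(active_defects, cap)
```

With the 34-finding repo from the report, the default inlines 10 explanations, while an explicit `max_findings=34` opts back in to all of them.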

…ped gate (wardline-be75c6676d follow-up)
Dogfood re-test, #2 "Worse" half: when the gate trips solely on baselined findings, summary.active is 0, so next_actions said "no active defects; rescan after edits", telling the agent it PASSED while the gate FAILED.
_next_actions_for now takes the GateDecision. With 0 active defects but a tripped gate it emits a scan action whose reason names the gate failure + the escape hatches (trust_suppressions / new_since / clear the baseline; see gate.reason / gate.migration_hint) instead of the passive "rescan after edits". The active>0 and genuinely-clean paths are unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
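The three paths described here can be sketched as one function. The dict shape (`action`/`reason` keys), the `fix` action name, and the exact reason strings other than the quoted "rescan after edits" text are assumptions of this sketch, not wardline's wire format.

```python
from typing import Dict


def next_action(active: int, gate_tripped: bool) -> Dict[str, str]:
    """Pick a next action that never reads as 'passed' while the gate failed."""
    if active > 0:
        # Unchanged path: there are active defects to work on (shape assumed).
        return {"action": "fix", "reason": f"{active} active defect(s)"}
    if gate_tripped:
        # 0 active but the gate failed: name the failure and the escape hatches.
        return {
            "action": "scan",
            "reason": (
                "gate FAILED with 0 active defects; use trust_suppressions or "
                "new_since, or clear the baseline "
                "(see gate.reason / gate.migration_hint)"
            ),
        }
    # Genuinely clean: the passive message is correct here.
    return {"action": "scan", "reason": "no active defects; rescan after edits"}
```

The ordering matters: the gate check sits before the passive message, so the "rescan after edits" text is reachable only when the gate did not trip.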

…le (wardline-53a44a3bb1)
Dogfood #5: a 401 (token absent from the CLI env) was reported as "could not reach Filigree", a wrong diagnosis that sent the agent chasing a broken-bridge / wrong-endpoint theory. The prior seam work deliberately made 401/403 SOFT (auth failure must not crash the scan loop); that is kept; only the MESSAGE changes.
EmitResult now carries `status` (the HTTP status when one reached us; None when the transport itself failed) and `auth_rejected` (the 401/403 case). The CLI prints "Filigree returned 401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN" vs a 5xx "server error" vs the genuine "could not reach"; the MCP scan filigree_emit block and agent_summary carry the same discriminated disabled_reason. 401/403 stays reachable=False (non-load-bearing), never exit-2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
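A sketch of the status-to-message discrimination, assuming the message is derived from `status` alone. `describe_emit_failure` is a hypothetical name and the exact strings are illustrative, not the CLI's verbatim output; the separate 403 wording follows the later review-hardening commit in this PR.

```python
from typing import Optional


def describe_emit_failure(status: Optional[int]) -> str:
    """Map a soft Filigree emit failure to a diagnosis the reader can act on."""
    if status is None:
        # No HTTP status reached us: the transport itself failed.
        return "could not reach Filigree"
    if status == 401:
        return "Filigree returned 401 (auth rejected); set WARDLINE_FILIGREE_TOKEN"
    if status == 403:
        # A token exists but lacks access, so "set the token" would mislead.
        return "Filigree returned 403 (forbidden: token lacks access)"
    if 500 <= status <= 599:
        return f"Filigree returned {status} (server error)"
    return f"Filigree returned {status}"
```

All of these remain soft failures in the design described above; only the text shown to the agent differs.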

…nreachable (#5) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ince the rebrand) uv.lock still carried the pre-rebrand `clarion` optional-dependency extra; pyproject already renamed it to `loomweave` (Clarion→Loomweave). Regenerated to match — no dependency change (blake3 >=1.0, unchanged), just the extra name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bda branch-locality, finding-lifecycle glossary
Resolves three Filigree ready-queue items, built TDD with adversarial review.
PY-WL-110 weft_markers soundness gap (wardline-d62845bb18, P2)
contradictory_trust.py hardcoded `wardline.decorators.*` as the only marker prefix, silently missing contradictory stacks imported from the renamed `weft_markers` shim. Now derives _MARKER_NAMES + _MARKER_MODULE_PREFIXES from BUILTIN_BOUNDARY_TYPES so the rule can't drift from the grammar. +2 tests.
Lambda bindings are branch-local (wardline-36016d26f3, P3)
_CURRENT_LAMBDA_BINDINGS was shared across if/else, try/except, match arms, leaking a lambda bound in one arm into siblings (over-fire). Each arm now walks an arm-local copy. NOTE: the first cut of the merge-out (clear()+full-union with the synthetic fall-through arm last) introduced a *false-negative regression*, verified empirically against HEAD: a lambda rebound in a no-else `if` / no-catch-all `match` and called after the branch resolved EXTERNAL_RAW on HEAD but INTEGRAL after the naive fix. Replaced with a delta merge (layer each arm's net add/changed bindings onto the pre-branch state in source order) that keeps the leak fix AND reproduces HEAD's after-branch bindings, so no new false negative. +3 over-fire guards, +3 no-false-negative guards.
Finding-lifecycle vocabulary glossary (wardline-26e84dbd44, P3)
Audited wardline's own usage: `active` is already the canonical word on every surface except the CLI summary, which printed `N new`. Relabelled to `N active` (text only; no JSON/SARIF/wire field renamed). Added the canonical glossary docs/reference/finding-lifecycle-vocabulary.md (single source of truth for new/active/suppressed/baselined/waived/judged + emitted-active vs gate population) with discipline tests + nav wiring. Cross-tool asks (Filigree first-seen "new", legis active) recorded as coordination context, not renamed.
Full suite 2471 passed, ruff + mypy clean, mkdocs --strict OK.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
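The difference between the naive full-union merge and the delta merge for lambda bindings can be shown on plain dicts. This is a toy model only: bindings are represented as name-to-label strings, both function names are hypothetical, and deleted bindings are not handled.

```python
from typing import Dict, List

Bindings = Dict[str, str]


def naive_union_merge(arms: List[Bindings]) -> Bindings:
    """First-cut behaviour: union every arm's full state, last arm wins."""
    merged: Bindings = {}
    for arm in arms:
        merged.update(arm)
    return merged


def delta_merge(pre: Bindings, arms: List[Bindings]) -> Bindings:
    """Layer each arm's net added/changed bindings onto the pre-branch state."""
    merged = dict(pre)
    for arm in arms:  # arms are visited in source order
        for name, value in arm.items():
            if name not in pre or pre[name] != value:
                merged[name] = value  # only net changes are carried out
    return merged


# A no-else `if` that rebinds `f`; the synthetic fall-through arm leaves it alone.
pre_branch = {"f": "INTEGRAL"}
if_arm = {"f": "EXTERNAL_RAW"}
fall_through_arm = dict(pre_branch)

naive_result = naive_union_merge([if_arm, fall_through_arm])
delta_result = delta_merge(pre_branch, [if_arm, fall_through_arm])
```

In this model the naive merge lets the untouched fall-through arm overwrite the rebinding, which corresponds to the false negative described above, while the delta merge keeps the rebinding because the fall-through arm contributes no net change.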

…, MCP legis reason, strict arg validation
Applies the PR #30 multi-reviewer findings (code/tests/errors/comments/types):
- GateDecision.__post_init__ makes "tripped gate that reads as passed" (dogfood #2) unconstructible, not merely avoided by the factory.
- Filigree 403 is now distinguished from 401 across all three render sites (CLI stderr, CLI disabled_reason, MCP): "forbidden (token lacks access)" rather than the misleading "set WARDLINE_FILIGREE_TOKEN".
- MCP dirty-unsigned legis artifact carries a loud `reason` (parity with the CLI "never gate CI on it" warning); agent-first surfaces stay equally loud.
- migration_hint threaded into the agent-summary gate block so the "see gate.migration_hint" pointer in next_actions resolves on that surface too.
- Strict boolean validation for summary_only/include_suppressed/allow_dirty/explain (reject non-bool rather than silently coercing "false"→True) + max_findings JSON schema gains `minimum: 0`.
- CHANGELOG: payload-controls entry corrected to dogfood #4 (verified against the friction report: #4=payload, #5=auth); genuine-trip reason quoted verbatim.
- Glossary file:line anchors tightened to the WAIVED/JUDGED assignment lines.
Quality consolidation (behavior-preserving): shared severity_gates() and filigree_disabled_reason() helpers, enum-identity (`is`) unified.
New tests pin 5xx rendering (CLI+MCP), the MCP legis dirty/signed projection, the mixed active+suppressed gate-reason branch, the GateDecision invariant guard, strict arg validation, and the agent-summary migration_hint.
Suite 2515 passed; ruff/mypy clean; mkdocs --strict builds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
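The strict boolean validation addresses a standard Python pitfall: `bool("false")` is `True`, because any non-empty string is truthy. A minimal sketch of rejecting rather than coercing, with `require_bool` as a hypothetical helper name:

```python
from typing import Any, Mapping


def require_bool(args: Mapping[str, Any], name: str, default: bool) -> bool:
    """Return args[name] only if it is a real bool; never coerce."""
    value = args.get(name, default)
    if not isinstance(value, bool):
        # Reject strings like "false" and ints like 0/1 instead of guessing.
        raise TypeError(
            f"{name} must be a boolean, got {type(value).__name__}: {value!r}"
        )
    return value
```

Note that `isinstance(1, bool)` is `False` even though `bool` subclasses `int`, so integer flags are rejected as well as strings.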

Increment the release candidate (rc1 → rc2) to carry the PR #30 review hardening (gate invariant, 403/5xx distinction, strict MCP arg validation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ust version test
- CHANGELOG: stamp the accumulated [Unreleased] work as [1.0.0rc2] - 2026-06-06 and open a fresh empty [Unreleased]; consolidate the two `### Added` blocks into one (no content change, removes a Keep-a-Changelog duplicate-section smell).
- README: the quick-start scan output said "1 new"; corrected to "1 active", matching the CLI relabel shipped in this same release (and getting-started.md).
- test_package: assert __version__ starts with "1.0.0" (release line) instead of the exact rc suffix, so cutting a new rc no longer breaks the test.
Suite 2515 passed; ruff clean; mkdocs --strict builds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`ruff format --check src tests` (run in CI's Lint+Format job) was red. Reformats 6 test files: two touched in this rc2 work (test_run.py, test_server_query_explain.py), test_variable_level.py (dogfood branch change), and three with pre-existing drift already on main (test_legis_intake_contract.py, test_client.py, test_sei_client_wire.py) — the gate checks the whole tree, so all six must be clean. Formatting only; no behavior change. Suite 2515 passed; ruff check + format + mypy all clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…, FP guard, doc-anchor rot)
Addresses the three Important findings from the PR #32 review panel, each validated with an actual RED->GREEN cycle under debugging discipline.
I-1 EmitResult contradictory states (core/filigree_emit.py):
- auth_rejected is now a derived @property (status in {401,403}), deleting the redundant axis so "auth-rejected (200)" is unrepresentable, not merely unbuilt.
- __post_init__ guard mirrors GateDecision: a reachable/success result carries no error status; a soft-failure created/updated nothing. Rejects reachable+503.
- Docstring corrected (status is the error status; None on transport-fail AND 2xx).
- No wire change: server.py still serializes auth_rejected via the property.
I-2 false-positive guard for PY-WL-110 (test_contradictory_trust.py):
- Empirically: a foreign-only marker stack is filtered at the anchoring gate (provenance "fallback"), never reaching the line-81 prefix check. Added both the system-level test and the isolating test (real trust_boundary anchor + a coincidental foreign `trusted`). Mutation-proven: breaking the prefix check makes the isolating test fire a false PY-WL-110.
I-3 stale file:line anchors (finding-lifecycle-vocabulary.md):
- Re-derived every churned-file anchor from HEAD; corrected ~26 citations.
- Added a two-way content-binding discipline test: each load-bearing anchor's token must be on the cited source line AND the doc must cite that line, so doc and code can never silently diverge again.
Full suite 2520 passed; ruff/format/mypy clean; mkdocs --strict builds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
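The I-1 change (derive `auth_rejected`, guard contradictory states at construction) can be sketched with a frozen dataclass. This is an illustration under assumptions: the real `EmitResult` in core/filigree_emit.py may have different fields, and `created`/`updated` are assumed here to be integer counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmitResult:
    reachable: bool
    status: Optional[int] = None  # error status; None on transport failure AND 2xx
    created: int = 0              # assumed counters for this sketch
    updated: int = 0

    def __post_init__(self) -> None:
        # A reachable/success result carries no error status (rejects reachable+503).
        if self.reachable and self.status is not None:
            raise ValueError("reachable result must not carry an error status")
        # A soft failure created/updated nothing.
        if not self.reachable and (self.created or self.updated):
            raise ValueError("soft failure must not report created/updated items")

    @property
    def auth_rejected(self) -> bool:
        # Derived from status, so it cannot disagree with it.
        return self.status in {401, 403}
```

Because `auth_rejected` is computed rather than stored, there is no constructor argument that could produce an "auth-rejected (200)" object, which is the "unrepresentable, not merely unbuilt" property the commit names.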

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

John Morrissey and others added 17 commits June 6, 2026 12:44

docs(changelog): record next_actions gate-awareness (#2) and 401-vs-u…

39b87ef


chore(release): cut 1.0.0rc2

77e1d8e


chore(release): bump version to 1.0.0rc3

58ab20f


tachyon-beep closed this Jun 6, 2026

tachyon-beep deleted the rc3 branch June 6, 2026 19:31

tachyon-beep mentioned this pull request Jun 6, 2026

1.0.0rc4: federation WEFT_FEDERATION_TOKEN + dogfood 2026-06-06 (gate clarity, legis dev-loop, payload controls, 401 distinction) #34

Merged

tachyon-beep wants to merge 17 commits into main from rc3
