Dogfood 2026-06-06 + 1.0.0rc2: gate verdict clarity, legis dev-loop, payload controls, 401 distinction, review hardening#31
Dogfood 2026-06-06 + 1.0.0rc2: gate verdict clarity, legis dev-loop, payload controls, 401 distinction, review hardening#31tachyon-beep wants to merge 15 commits into
Conversation
…efusing (wardline-30f3d38fa5) Dogfood friction #1: on a dirty tree `scan --format legis` failed exit 2 naming an `allow_dirty` flag that was never exposed on the CLI — presenting identically to "legis is broken." Expose `--allow-dirty` (CLI) / `allow_dirty` (MCP scan). The honest fix: a dirty tree under allow_dirty does NOT sign. The only tree_sha readable is the *committed* one, which does not describe dirty working content — signing it would be false provenance (the `_git_tree_sha` guard). Instead it falls through to the UNSIGNED dev artifact, clearly marked `dirty: true` (legis records it `unverified`). Signing stays clean-tree-only; verification stays clean-tree/CI. The loud refusal without --allow-dirty is unchanged. CLI emits a stderr warning when the artifact is dirty/unsigned; MCP reports `signed:false` + `dirty:true` in legis_artifact_status. legis ignores the unknown `dirty` top-level key on the unverified path, so ingest is unaffected; the golden clean-tree signature is byte-unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rdline-be75c6676d) Dogfood friction #2: a scan reporting summary.active:0 AND gate.tripped:true read as a bug — the agent had to run scan twice (with/without trust_suppressions) and read --help to learn the gate evaluates the unsuppressed (baselined-included) population by default. GateDecision now carries `reason` and `evaluated`. `reason` names the count and class that decided the verdict — "1 suppressed ERROR+ defect(s) (baseline/waiver/ judged) not cleared; pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)" when the trip is solely from suppressed-but-gated findings, "N active ERROR+ defect(s)" on a genuine trip (no misdirection to the suppression flags), and the mixed form when both. `evaluated` names the population: "unsuppressed (repository baseline/waiver/judged ignored)" by default, "post-suppression … honored" under --trust-suppressions. Counts come from `gate_breakdown` over the ANNOTATED findings so they match what the agent reads in `summary`. Surfaced in the MCP scan gate block, the agent_summary gate block, and on CLI stderr when the gate trips (never a silent exit 1). Both None when no --fail-on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dline-5f662e7a4f) Dogfood friction #3: the secure gate-default (gate on the unsuppressed population) is correct, but the rollout was silent — a repo whose committed baseline used to clear --fail-on goes red with no code change, and an agent can't tell whether IT broke scan or HEAD was already red. New `baseline_migration_hint`: fires ONLY in the exact 'my repo went red with no code change' case — a committed .wardline/baseline.yaml exists, the gate trips SOLELY because baselined defects re-enter the unsuppressed population (no genuinely-active defect, no waiver/judged-only trip), and neither --trust-suppressions nor --new-since was passed. It points at both escape hatches and UPGRADING.md. Silent on a genuine active trip, a trusted/PR-scoped run, or no baseline file. Surfaced loudly on CLI stderr and as MCP `scan` gate.migration_hint (None otherwise). New UPGRADING.md documents the secure-default migration; CHANGELOG [Unreleased] gains entries for dogfood #1/#2/#3. Secure default unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y/max_findings/include_suppressed, default explain cap (wardline-2957009961) Dogfood friction #5: the documented cost lever (`where`) did not control cost and one-shot `explain:true` was unusable on a real repo. - `where` now filters the agent_summary arrays too (it only filtered the top-level findings list before) — a filter matching 0 findings no longer returns dozens of suppressed findings inline. agent_summary build takes a display_findings view; its summary COUNTS stay whole-project. - New `summary_only:true` (counts + gate, no bodies — smallest "did the gate pass?" payload), `include_suppressed:false` (drop suppressed bodies; counts stay), `max_findings:N` (cap returned bodies). - DEFAULT explain ceiling: `explain:true` inlined provenance for EVERY active defect (56,820 chars on one line over a whole repo). Capped at 25 by default; max_findings tightens it. Findings past the cap are still returned, sans inline explanation. - New `truncation` block (findings_total/findings_returned/findings_truncated/ explanations_truncated/summary_only/include_suppressed/max_findings) so a bounded payload is never mistaken for "covered everything." CLI --format agent-summary is byte-unchanged (defaults preserve whole-project, uncapped behaviour). Docs (agents.md, legis-handoff.md --allow-dirty) + CHANGELOG updated. Full suite 2476 green; ruff/mypy/mkdocs-strict clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wardline-be75c6676d follow-up) The gate reason counted `gate_breakdown(result.findings)` — the annotated population — so under `--new-since` a delta-scoped-out defect (converted to BASELINED by apply_delta_scope) was wrongly counted as "suppressed >= threshold", inflating the count and pointing at `--new-since` (already supplied). _gate_reason now classifies the defects that ACTUALLY gate (the unsuppressed gate population, where out-of-delta defects are BASELINED and so excluded) by their state in the emitted findings. The count is exactly what tripped the gate; the `--new-since` path no longer over-counts. The trust-suppressions branch is unchanged (gate == emitted findings there). Locked by extending the new_since differential to assert 1, not 2. Verified: legis `ScanResultsIn.scan` is typed `dict` (arbitrary mapping), so the new unsigned `dirty:true` marker rides through intake untouched — confirmed the dev artifact stays postable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ow-up) The reported one-shot blowup was 56,820 chars over 34 findings and exceeded the tool token limit; a default of 25 inlined provenances was still uncomfortably close. Lower the default ceiling to 10 — comfortably under the limit, still plenty to triage in one call — and let max_findings RAISE it when the agent explicitly accepts the larger payload (summary_only covers the common "did the gate pass?" case). New test locks that max_findings can lift the count above the default. Docs/CHANGELOG updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ped gate (wardline-be75c6676d follow-up) Dogfood re-test, #2 "Worse" half: when the gate trips solely on baselined findings summary.active is 0, so next_actions said "no active defects; rescan after edits" — telling the agent it PASSED while the gate FAILED. _next_actions_for now takes the GateDecision. With 0 active defects but a tripped gate it emits a scan action whose reason names the gate failure + the escape hatches (trust_suppressions / new_since / clear the baseline; see gate.reason / gate.migration_hint) instead of the passive "rescan after edits". The active>0 and genuinely-clean paths are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le (wardline-53a44a3bb1) Dogfood #5: a 401 (token absent from the CLI env) was reported as "could not reach Filigree" — a wrong diagnosis that sent the agent chasing a broken-bridge / wrong- endpoint theory. The prior seam work deliberately made 401/403 SOFT (auth failure must not crash the scan loop); that is kept — only the MESSAGE changes. EmitResult now carries `status` (the HTTP status when one reached us; None when the transport itself failed) and `auth_rejected` (the 401/403 case). The CLI prints "Filigree returned 401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN" vs a 5xx "server error" vs the genuine "could not reach"; the MCP scan filigree_emit block and agent_summary carry the same discriminated disabled_reason. 401/403 stays reachable=False (non-load-bearing), never exit-2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nreachable (#5) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ince the rebrand) uv.lock still carried the pre-rebrand `clarion` optional-dependency extra; pyproject already renamed it to `loomweave` (Clarion→Loomweave). Regenerated to match — no dependency change (blake3 >=1.0, unchanged), just the extra name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bda branch-locality, finding-lifecycle glossary Resolves three Filigree ready-queue items, built TDD with adversarial review. PY-WL-110 weft_markers soundness gap (wardline-d62845bb18, P2) contradictory_trust.py hardcoded `wardline.decorators.*` as the only marker prefix, silently missing contradictory stacks imported from the renamed `weft_markers` shim. Now derives _MARKER_NAMES + _MARKER_MODULE_PREFIXES from BUILTIN_BOUNDARY_TYPES so the rule can't drift from the grammar. +2 tests. Lambda bindings are branch-local (wardline-36016d26f3, P3) _CURRENT_LAMBDA_BINDINGS was shared across if/else, try/except, match arms, leaking a lambda bound in one arm into siblings (over-fire). Each arm now walks an arm-local copy. NOTE: the first cut of the merge-out (clear()+full-union with the synthetic fall-through arm last) introduced a *false-negative regression* — verified empirically against HEAD: a lambda rebound in a no-else `if` / no-catch-all `match` and called after the branch resolved EXTERNAL_RAW on HEAD but INTEGRAL after the naive fix. Replaced with a delta merge (layer each arm's net add/changed bindings onto the pre-branch state in source order) that keeps the leak fix AND reproduces HEAD's after-branch bindings, so no new false negative. +3 over-fire guards, +3 no-false-negative guards. Finding-lifecycle vocabulary glossary (wardline-26e84dbd44, P3) Audited wardline's own usage: `active` is already the canonical word on every surface except the CLI summary, which printed `N new`. Relabelled to `N active` (text only; no JSON/SARIF/wire field renamed). Added the canonical glossary docs/reference/finding-lifecycle-vocabulary.md (single source of truth for new/active/suppressed/baselined/waived/judged + emitted-active vs gate population) with discipline tests + nav wiring. Cross-tool asks (Filigree first-seen "new", legis active) recorded as coordination context, not renamed. Full suite 2471 passed, ruff + mypy clean, mkdocs --strict OK. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, MCP legis reason, strict arg validation Applies the PR #30 multi-reviewer findings (code/tests/errors/comments/types): - GateDecision.__post_init__ makes "tripped gate that reads as passed" (dogfood #2) unconstructible, not merely avoided by the factory. - Filigree 403 is now distinguished from 401 across all three render sites (CLI stderr, CLI disabled_reason, MCP) — "forbidden (token lacks access)" rather than the misleading "set WARDLINE_FILIGREE_TOKEN". - MCP dirty-unsigned legis artifact carries a loud `reason` (parity with the CLI "never gate CI on it" warning) — agent-first surfaces stay equally loud. - migration_hint threaded into the agent-summary gate block so the "see gate.migration_hint" pointer in next_actions resolves on that surface too. - Strict boolean validation for summary_only/include_suppressed/allow_dirty/ explain (reject non-bool rather than silently coercing "false"→True) + max_findings JSON schema gains `minimum: 0`. - CHANGELOG: payload-controls entry corrected to dogfood #4 (verified against the friction report: #4=payload, #5=auth); genuine-trip reason quoted verbatim. - Glossary file:line anchors tightened to the WAIVED/JUDGED assignment lines. Quality consolidation (behavior-preserving): shared severity_gates() and filigree_disabled_reason() helpers, enum-identity (`is`) unified. New tests pin 5xx rendering (CLI+MCP), the MCP legis dirty/signed projection, the mixed active+suppressed gate-reason branch, the GateDecision invariant guard, strict arg validation, and the agent-summary migration_hint. Suite 2515 passed; ruff/mypy clean; mkdocs --strict builds. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Increment the release candidate (rc1 → rc2) to carry the PR #30 review hardening (gate invariant, 403/5xx distinction, strict MCP arg validation). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 77e1d8e2eb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for name, lam in arm.items(): | ||
| if pre.get(name) is not lam: | ||
| parent[name] = lam |
There was a problem hiding this comment.
Drop stale lambda bindings removed in all arms
When a name is bound to a lambda before an exhaustive branch and every branch reassigns that name to a non-lambda, the arm copies correctly pop it, but this merge only writes added/changed lambdas back and never records removals. The parent therefore keeps the pre-branch lambda, so a later cb(...) is analyzed as if the stale lambda can still execute, producing false taint findings for code paths that are impossible after the branch.
Useful? React with 👍 / 👎.
| shown_active = shown_active[: self.max_findings] | ||
| shown_suppressed = shown_suppressed[: self.max_findings] | ||
| shown_facts = shown_facts[: self.max_findings] |
There was a problem hiding this comment.
Honor max_findings across agent-summary arrays
With MCP scan using max_findings to shrink payloads, the nested agent_summary can still return up to that many active defects, suppressed defects, and engine facts because the cap is applied independently to each array. In mixed-result scans, max_findings: 10 can therefore include 30 finding bodies in agent_summary while the top-level truncation block reports only 10 returned findings, defeating the payload bound agents requested.
Useful? React with 👍 / 👎.
| if last in {"external_boundary", "trust_boundary", "trusted"} and ( | ||
| fqn.startswith("wardline.decorators.") or fqn.startswith("wardline.decorators.trust.") | ||
| ): | ||
| if last in _MARKER_NAMES and any(fqn.startswith(prefix + ".") for prefix in _MARKER_MODULE_PREFIXES): |
There was a problem hiding this comment.
Restrict weft marker matches to real exports
Because this now accepts any decorator whose FQN starts with a builtin prefix, from weft_markers.custom import trusted is treated as Wardline's trusted marker even though the decorator provider only recognizes the exact public exports (P.<name> and P.trust.<name>). In projects with helper submodules under the weft_markers package, PY-WL-110 can therefore report contradictory Wardline trust stacks for decorators that never seeded Wardline trust at all.
Useful? React with 👍 / 👎.
…ust version test - CHANGELOG: stamp the accumulated [Unreleased] work as [1.0.0rc2] - 2026-06-06 and open a fresh empty [Unreleased]; consolidate the two `### Added` blocks into one (no content change, removes a Keep-a-Changelog duplicate-section smell). - README: the quick-start scan output said "1 new" — corrected to "1 active", matching the CLI relabel shipped in this same release (and getting-started.md). - test_package: assert __version__ starts with "1.0.0" (release line) instead of the exact rc suffix, so cutting a new rc no longer breaks the test. Suite 2515 passed; ruff clean; mkdocs --strict builds. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`ruff format --check src tests` (run in CI's Lint+Format job) was red. Reformats 6 test files: two touched in this rc2 work (test_run.py, test_server_query_explain.py), test_variable_level.py (dogfood branch change), and three with pre-existing drift already on main (test_legis_intake_contract.py, test_client.py, test_sei_client_wire.py) — the gate checks the whole tree, so all six must be clean. Formatting only; no behavior change. Suite 2515 passed; ruff check + format + mypy all clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f2c1d6e45
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| handler_lambdas = _branch_copy(parent_lambdas) | ||
| _walk_branch_body(handler.body, function_taint, taint_map, handler_taints, call_site_taints, handler_lambdas) |
There was a problem hiding this comment.
Preserve try-body lambda bindings for except handlers
When a lambda is assigned before a statement in the try body can raise, that binding is in scope for the except handler at runtime, but this handler copy is seeded from the pre-try bindings instead of the try-body bindings. For example, try: cb = lambda c: sink(c); risky() followed by except Exception: cb(raw) no longer resolves cb(raw) to the lambda, so the sink in the lambda body is analyzed as clean and the tainted exception path is missed.
Useful? React with 👍 / 👎.
|
Consolidated onto the single RC branch |
Closes the actionable wardline items from the 2026-06-06 Loom dogfood friction report
(label `dogfood-2026-06-06`). All five concerns from the re-test are addressed and
live-verified against a freshly-spawned MCP server.
The re-test reported #2/#3/#4 as "not addressed". Root cause: stale long-running
`wardline mcp` processes, not missing code. The install is editable
(`~/wardline/src`), so a fresh spawn already has every fix — but seven long-lived MCP
servers were frozen at their spawn-time source (one was internally inconsistent and
crashed on `GateDecision.reason`). The re-tester tested #1 via the CLI (fresh process →
worked) and #2/#3/#4 via a stale MCP server (→ looked unaddressed).
Action required after merge: partners must restart their `wardline mcp` server
(or session) to pick up the code. No restart ⇒ same "broken" output.
Fixes
Live verification (fresh server, the re-tester's exact scenarios)
Tests / quality
Full suite 2482 passing, ruff + mypy + mkdocs-strict clean. Every fix is TDD'd
(red→green); golden legis signature byte-unchanged; CLI↔MCP parity preserved; CLI
`--format agent-summary` output unchanged.
🤖 Generated with Claude Code
Supersedes #30 (head branch renamed
fix/dogfood-2026-06-06-gate-legis-payload→rc/1.0.0rc2; GitHub cannot move a PR's head branch, so this is its continuation). Adds the PR #30 multi-reviewer hardening pass and cuts release candidate1.0.0rc2.