
Dogfood 2026-06-06: gate verdict clarity, legis dev-loop, payload controls, 401 distinction#30

Closed
tachyon-beep wants to merge 11 commits into `main` from `fix/dogfood-2026-06-06-gate-legis-payload`

@tachyon-beep

Collaborator

Closes the actionable wardline items from the 2026-06-06 Loom dogfood friction report
(label `dogfood-2026-06-06`). All five concerns from the re-test are addressed and
live-verified against a freshly spawned MCP server.

⚠️ Deployment note for the federation (read first)

The re-test reported #2/#3/#4 as "not addressed". Root cause: stale long-running
`wardline mcp` processes, not missing code. The install is editable
(`~/wardline/src`), so a freshly spawned server already has every fix, but seven long-lived MCP
servers were frozen at their spawn-time source (one was internally inconsistent and
crashed on `GateDecision.reason`). The re-tester exercised #1 via the CLI (fresh process →
worked) and #2/#3/#4 via a stale MCP server (→ looked unaddressed).

Action required after merge: partners must restart their `wardline mcp` server (or
session) to pick up the code. Without a restart they will see the same "broken" output.

Fixes

| # | Concern | Fix | Issue |
|---|---------|-----|-------|
| 1 (P0) | `--allow-dirty` on `scan --format legis` | unsigned, `dirty:true`-marked dev artifact; signing stays clean-tree-only | wardline-30f3d38fa5 |
| 2 (P1) | gate contradicts its summary | `gate.reason` + `gate.evaluated`; `next_actions` now gate-aware (no "rescan after edits" on a tripped gate) | wardline-be75c6676d |
| 3 (P1) | silent gate-default breaking change | `gate.migration_hint` (CLI stderr + MCP) + `UPGRADING.md` | wardline-5f662e7a4f |
| 4 (P1) | `where` didn't shrink payload; `explain` blew budget | `where` filters the agent_summary; `summary_only`/`max_findings`/`include_suppressed`; default explain cap (10); `truncation` block | wardline-2957009961 |
| 5 (P2) | 401 reported as "could not reach" | `EmitResult.status`/`auth_rejected`; CLI/MCP print "401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN"; stays soft | wardline-53a44a3bb1 |

Live verification (fresh server, the re-tester's exact scenarios)

  • `tools/list` exposes the new `summary_only`/`max_findings`/`include_suppressed` args (proves fresh server).
  • 34-baselined gate trip → `gate.reason`, `gate.evaluated`, `gate.migration_hint`, and gate-aware `next_actions` all present; no crash.
  • `where:{active,CRITICAL}` (0 matches) + `explain:true` → 1,585 chars (was 57,639), `suppressed_findings` empty inline, `truncation` present.
  • `summary_only` → 0 finding bodies, counts intact.

Tests / quality

Full suite 2482 passing, ruff + mypy + mkdocs-strict clean. Every fix is TDD'd
(red→green); golden legis signature byte-unchanged; CLI↔MCP parity preserved; CLI
`--format agent-summary` output unchanged.

🤖 Generated with Claude Code

John Morrissey and others added 10 commits June 6, 2026 12:44
…efusing (wardline-30f3d38fa5)

Dogfood friction #1: on a dirty tree, `scan --format legis` failed with exit 2, naming
an `allow_dirty` flag that was never exposed on the CLI, which presented identically
to "legis is broken." Expose `--allow-dirty` (CLI) / `allow_dirty` (MCP scan).

The honest fix: a dirty tree under allow_dirty does NOT sign. The only tree_sha
readable is the *committed* one, which does not describe dirty working content —
signing it would be false provenance (the `_git_tree_sha` guard). Instead it
falls through to the UNSIGNED dev artifact, clearly marked `dirty: true` (legis
records it `unverified`). Signing stays clean-tree-only; verification stays
clean-tree/CI. The loud refusal without --allow-dirty is unchanged.

CLI emits a stderr warning when the artifact is dirty/unsigned; MCP reports
`signed:false` + `dirty:true` in legis_artifact_status. legis ignores the unknown
`dirty` top-level key on the unverified path, so ingest is unaffected; the golden
clean-tree signature is byte-unchanged.
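A minimal sketch of the resulting three-way decision. The names `build_legis_artifact`, `LegisArtifact`, and `DirtyTreeError` are hypothetical stand-ins, not wardline's actual API; the point is that "signed" and "dirty" can never both be true.

```python
from dataclasses import dataclass


class DirtyTreeError(Exception):
    """Hypothetical stand-in for the loud exit-2 refusal on a dirty tree."""


@dataclass(frozen=True)
class LegisArtifact:  # hypothetical shape, not wardline's real artifact type
    signed: bool
    dirty: bool


def build_legis_artifact(tree_is_clean: bool, allow_dirty: bool) -> LegisArtifact:
    if tree_is_clean:
        # The committed tree SHA describes the scanned content, so signing is honest.
        return LegisArtifact(signed=True, dirty=False)
    if not allow_dirty:
        # Unchanged default: refuse loudly instead of emitting misleading provenance.
        raise DirtyTreeError("working tree is dirty; commit, or pass --allow-dirty for an unsigned dev artifact")
    # Dirty + allow_dirty: the only readable tree SHA is the committed one, which does not
    # describe the working content, so fall through to an unsigned artifact marked dirty.
    return LegisArtifact(signed=False, dirty=True)
```

Keeping the refusal as the default means `--allow-dirty` is an explicit opt-in to an artifact legis will record as `unverified`.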

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rdline-be75c6676d)

Dogfood friction #2: a scan reporting summary.active:0 AND gate.tripped:true read
as a bug — the agent had to run scan twice (with/without trust_suppressions) and
read --help to learn the gate evaluates the unsuppressed (baselined-included)
population by default.

GateDecision now carries `reason` and `evaluated`. `reason` names the count and
class that decided the verdict — "1 suppressed ERROR+ defect(s) (baseline/waiver/
judged) not cleared; pass --trust-suppressions (trusted checkout) or --new-since
<ref> (PR)" when the trip is solely from suppressed-but-gated findings, "N active
ERROR+ defect(s)" on a genuine trip (no misdirection to the suppression flags),
and the mixed form when both. `evaluated` names the population: "unsuppressed
(repository baseline/waiver/judged ignored)" by default, "post-suppression …
honored" under --trust-suppressions. Counts come from `gate_breakdown` over the
ANNOTATED findings so they match what the agent reads in `summary`.

Surfaced in the MCP scan gate block, the agent_summary gate block, and on CLI
stderr when the gate trips (never a silent exit 1). Both fields are None when no --fail-on is given.
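The reason/evaluated logic described above can be sketched as follows. `decide_gate` and this `GateDecision` are hypothetical mirrors of the fields named in the commit; the suppressed-only and active-only strings follow the commit text, while the wording of the mixed and passing forms is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateDecision:  # hypothetical mirror of the fields named above
    tripped: bool
    reason: Optional[str] = None
    evaluated: Optional[str] = None


def decide_gate(active: int, suppressed: int, fail_on: Optional[str], trust_suppressions: bool) -> GateDecision:
    """active/suppressed: gating defects at or above --fail-on, split by emitted state."""
    if fail_on is None:
        return GateDecision(tripped=False)  # no gate: reason and evaluated stay None
    if trust_suppressions:
        evaluated = "post-suppression (repository baseline/waiver/judged honored)"
    else:
        evaluated = "unsuppressed (repository baseline/waiver/judged ignored)"
    level = f"{fail_on}+"
    if active and suppressed:
        reason = f"{active} active and {suppressed} suppressed {level} defect(s)"  # mixed form (wording assumed)
    elif suppressed:
        reason = (f"{suppressed} suppressed {level} defect(s) (baseline/waiver/judged) not cleared; "
                  "pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR)")
    elif active:
        reason = f"{active} active {level} defect(s)"  # genuine trip: no pointer to suppression flags
    else:
        reason = f"0 {level} defect(s)"  # passing wording assumed
    return GateDecision(tripped=bool(active or suppressed), reason=reason, evaluated=evaluated)
```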

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dline-5f662e7a4f)

Dogfood friction #3: the secure gate-default (gate on the unsuppressed population)
is correct, but the rollout was silent — a repo whose committed baseline used to
clear --fail-on goes red with no code change, and an agent can't tell whether IT
broke scan or HEAD was already red.

New `baseline_migration_hint`: fires ONLY in the exact 'my repo went red with no
code change' case — a committed .wardline/baseline.yaml exists, the gate trips
SOLELY because baselined defects re-enter the unsuppressed population (no
genuinely-active defect, no waiver/judged-only trip), and neither
--trust-suppressions nor --new-since was passed. It points at both escape hatches
and UPGRADING.md. Silent on a genuine active trip, a trusted/PR-scoped run, or no
baseline file.

Surfaced loudly on CLI stderr and as MCP `scan` gate.migration_hint (None
otherwise). New UPGRADING.md documents the secure-default migration; CHANGELOG
[Unreleased] gains entries for dogfood #1/#2/#3. Secure default unchanged.
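The firing condition for the hint is a conjunction of every clause listed above. A sketch as a pure predicate; the function signature and the hint text are hypothetical.

```python
from typing import Optional


def baseline_migration_hint(
    baseline_file_exists: bool,
    gate_tripped: bool,
    active_gating: int,
    baselined_gating: int,
    waived_or_judged_gating: int,
    trust_suppressions: bool,
    new_since: Optional[str],
) -> Optional[str]:
    fires = (
        baseline_file_exists  # a committed .wardline/baseline.yaml exists
        and gate_tripped
        and baselined_gating > 0  # the trip comes from baselined defects...
        and active_gating == 0  # ...with nothing genuinely active
        and waived_or_judged_gating == 0  # ...and no waiver/judged contribution
        and not trust_suppressions
        and new_since is None
    )
    if not fires:
        return None
    # Hint wording is hypothetical; it names both escape hatches and the upgrade doc.
    return ("gate tripped only because baselined defects are now gated by default; "
            "pass --trust-suppressions (trusted checkout) or --new-since <ref> (PR); see UPGRADING.md")
```

Making every clause mandatory is what keeps the hint silent on a genuine active trip, a trusted or PR-scoped run, or a repo without a baseline file.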

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y/max_findings/include_suppressed, default explain cap (wardline-2957009961)

Dogfood friction #4: the documented cost lever (`where`) did not control cost and
one-shot `explain:true` was unusable on a real repo.

- `where` now filters the agent_summary arrays too (it only filtered the top-level
  findings list before) — a filter matching 0 findings no longer returns dozens of
  suppressed findings inline. agent_summary build takes a display_findings view;
  its summary COUNTS stay whole-project.
- New `summary_only:true` (counts + gate, no bodies — smallest "did the gate pass?"
  payload), `include_suppressed:false` (drop suppressed bodies; counts stay),
  `max_findings:N` (cap returned bodies).
- DEFAULT explain ceiling: `explain:true` inlined provenance for EVERY active
  defect (56,820 chars on one line over a whole repo). Capped at 25 by default;
  max_findings tightens it. Findings past the cap are still returned, sans inline
  explanation.
- New `truncation` block (findings_total/findings_returned/findings_truncated/
  explanations_truncated/summary_only/include_suppressed/max_findings) so a bounded
  payload is never mistaken for "covered everything."

CLI --format agent-summary is byte-unchanged (defaults preserve whole-project,
uncapped behaviour). Docs (agents.md, legis-handoff.md --allow-dirty) + CHANGELOG
updated. Full suite 2476 green; ruff/mypy/mkdocs-strict clean.
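A sketch of how the display controls compose, assuming findings are plain dicts and `where` is modelled as a predicate (the real MCP `where` argument is a filter mapping). `bounded_payload` is hypothetical; whether `findings_total` counts the filtered view or the whole project is also an assumption here. `explanations_truncated` is left out to keep the sketch small.

```python
from typing import Callable, Optional

Finding = dict  # hypothetical shape: {"severity": str, "state": "active" | "suppressed"}


def bounded_payload(
    findings: list,
    where: Optional[Callable[[Finding], bool]] = None,
    summary_only: bool = False,
    include_suppressed: bool = True,
    max_findings: Optional[int] = None,
) -> dict:
    # Summary counts stay whole-project regardless of the display controls.
    counts = {
        "active": sum(1 for f in findings if f["state"] == "active"),
        "suppressed": sum(1 for f in findings if f["state"] == "suppressed"),
    }
    # Display view: `where` filters the bodies that get inlined, not just a top-level list.
    shown = [f for f in findings if where is None or where(f)]
    if not include_suppressed:
        shown = [f for f in shown if f["state"] == "active"]
    total = len(shown)
    if summary_only:
        shown = []  # counts + gate only
    elif max_findings is not None:
        shown = shown[:max_findings]
    # Report what was cut so a bounded payload is never read as "covered everything".
    truncation = {
        "findings_total": total,
        "findings_returned": len(shown),
        "findings_truncated": total - len(shown),
        "summary_only": summary_only,
        "include_suppressed": include_suppressed,
        "max_findings": max_findings,
    }
    return {"counts": counts, "findings": shown, "truncation": truncation}
```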

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…wardline-be75c6676d follow-up)

The gate reason counted `gate_breakdown(result.findings)` — the annotated population —
so under `--new-since` a delta-scoped-out defect (converted to BASELINED by
apply_delta_scope) was wrongly counted as "suppressed >= threshold", inflating the
count and pointing at `--new-since` (already supplied).

_gate_reason now classifies the defects that ACTUALLY gate (the unsuppressed gate
population, where out-of-delta defects are BASELINED and so excluded) by their state in
the emitted findings. The count is exactly what tripped the gate; the `--new-since`
path no longer over-counts. The trust-suppressions branch is unchanged (gate == emitted
findings there). Locked by extending the new_since differential to assert 1, not 2.
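A sketch of the corrected counting, assuming each finding is a hypothetical dict carrying its threshold match, whether it falls inside the `--new-since` delta, and its emitted state. Out-of-delta defects are skipped before classification, which is what turns the differential's count from 2 into 1.

```python
def gating_counts(findings: list, new_since: bool) -> tuple:
    """Return (active, suppressed) counts over the defects that actually gate.

    Each finding is a hypothetical dict with:
      at_or_above_threshold: severity meets --fail-on
      in_delta: changed since the --new-since ref
      emitted_state: "active" or a suppression state such as "baselined"
    """
    active = suppressed = 0
    for f in findings:
        if not f["at_or_above_threshold"]:
            continue
        if new_since and not f["in_delta"]:
            # Delta-scoped out: BASELINED in the gate population, so it cannot gate.
            # Counting it as "suppressed" was the over-count this commit removes.
            continue
        if f["emitted_state"] == "active":
            active += 1
        else:
            suppressed += 1
    return active, suppressed
```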

Verified: legis `ScanResultsIn.scan` is typed `dict` (arbitrary mapping), so the new
unsigned `dirty:true` marker rides through intake untouched — confirmed the dev artifact
stays postable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ow-up)

The reported one-shot blowup was 56,820 chars over 34 findings and exceeded the tool
token limit; a default of 25 inlined provenances was still uncomfortably close. Lower
the default ceiling to 10 — comfortably under the limit, still plenty to triage in one
call — and let max_findings RAISE it when the agent explicitly accepts the larger
payload (summary_only covers the common "did the gate pass?" case). New test locks that
max_findings can lift the count above the default. Docs/CHANGELOG updated.
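A sketch of the resulting cap rule: `max_findings`, when given, replaces the default of 10 in either direction, and findings past the cap are still returned without an inline explanation. `attach_explanations` and the `explain` callback are hypothetical; capping the returned bodies themselves is handled separately.

```python
from typing import Callable, Optional

DEFAULT_EXPLAIN_CAP = 10  # default ceiling on inlined provenances, per the commit above


def attach_explanations(findings: list, max_findings: Optional[int], explain: Callable[[dict], str]) -> tuple:
    """Inline explanations for at most `cap` findings; the rest are returned without one."""
    # max_findings overrides the default in either direction (tighten or raise).
    cap = DEFAULT_EXPLAIN_CAP if max_findings is None else max_findings
    out = []
    for i, finding in enumerate(findings):
        item = dict(finding)
        if i < cap:
            item["explanation"] = explain(finding)
        out.append(item)
    explanations_truncated = max(0, len(findings) - cap)
    return out, explanations_truncated
```

With the 34-finding repo from the report, the default run inlines 10 explanations and reports 24 as truncated; passing `max_findings=34` opts into all 34.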

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ped gate (wardline-be75c6676d follow-up)

Dogfood re-test, #2 "Worse" half: when the gate trips solely on baselined findings
summary.active is 0, so next_actions said "no active defects; rescan after edits" —
telling the agent it PASSED while the gate FAILED.

_next_actions_for now takes the GateDecision. With 0 active defects but a tripped
gate it emits a scan action whose reason names the gate failure + the escape hatches
(trust_suppressions / new_since / clear the baseline; see gate.reason /
gate.migration_hint) instead of the passive "rescan after edits". The active>0 and
genuinely-clean paths are unchanged.
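The three branches can be sketched as below. This simplifies the `GateDecision` argument to a `gate_tripped` flag (None when no `--fail-on`); the function name, action shape, and the active>0 wording are hypothetical.

```python
from typing import Optional


def next_actions_for(active: int, gate_tripped: Optional[bool]) -> list:
    """gate_tripped is a hypothetical simplification of GateDecision (None: no --fail-on)."""
    if active > 0:
        return [{"tool": "scan", "reason": f"{active} active defect(s); fix, then rescan"}]
    if gate_tripped:
        # 0 active but the gate FAILED: never tell the agent it passed.
        return [{"tool": "scan", "reason": (
            "gate FAILED on suppressed defects (0 active); pass trust_suppressions or new_since, "
            "or clear the baseline; see gate.reason / gate.migration_hint")}]
    return [{"tool": "scan", "reason": "no active defects; rescan after edits"}]
```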

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le (wardline-53a44a3bb1)

Dogfood #5: a 401 (token absent from the CLI env) was reported as "could not reach
Filigree" — a wrong diagnosis that sent the agent chasing a broken-bridge / wrong-
endpoint theory. The prior seam work deliberately made 401/403 SOFT (auth failure must
not crash the scan loop); that is kept — only the MESSAGE changes.

EmitResult now carries `status` (the HTTP status when one reached us; None when the
transport itself failed) and `auth_rejected` (the 401/403 case). The CLI prints
"Filigree returned 401 (auth rejected) … set WARDLINE_FILIGREE_TOKEN" vs a 5xx
"server error" vs the genuine "could not reach"; the MCP scan filigree_emit block and
agent_summary carry the same discriminated disabled_reason. 401/403 stays
reachable=False (non-load-bearing), never exit-2.
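A sketch of the discriminated message, assuming a pared-down `EmitResult` that exposes `auth_rejected` as a property derived from `status` (the real type may store it as a field). `disabled_reason` and the exact 5xx wording are hypothetical; every branch returns a message rather than raising, matching the "stays soft" behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmitResult:  # hypothetical mirror of the two fields named above
    status: Optional[int]  # HTTP status if a response arrived; None if the transport failed

    @property
    def auth_rejected(self) -> bool:
        return self.status in (401, 403)


def disabled_reason(result: EmitResult) -> Optional[str]:
    """Discriminated, still-soft message: none of these cases aborts the scan."""
    if result.status is None:
        return "could not reach Filigree"
    if result.auth_rejected:
        return f"Filigree returned {result.status} (auth rejected); set WARDLINE_FILIGREE_TOKEN"
    if result.status >= 500:
        return f"Filigree server error ({result.status})"
    return None  # emit succeeded (or another non-fatal status)
```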

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nreachable (#5)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ince the rebrand)

uv.lock still carried the pre-rebrand `clarion` optional-dependency extra; pyproject
already renamed it to `loomweave` (Clarion→Loomweave). Regenerated to match — no
dependency change (blake3 >=1.0, unchanged), just the extra name.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 39b87efd38


Comment on lines +74 to +77
```python
if self.max_findings is not None:
    shown_active = shown_active[: self.max_findings]
    shown_suppressed = shown_suppressed[: self.max_findings]
    shown_facts = shown_facts[: self.max_findings]
```


P2: Apply `max_findings` across all summary arrays

When an MCP scan returns mixed categories, `max_findings` only slices each agent-summary array independently, so `max_findings: 1` can still inline one active defect, one suppressed defect, and one engine fact in `agent_summary` while the top-level findings list is capped to one. This defeats the payload-control contract for the embedded summary and can still produce oversized responses in repos with many categories; the cap needs to be applied to the combined displayed finding bodies, not per bucket.
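One way to act on this suggestion is a single budget shared across the buckets. A sketch, assuming the buckets arrive in priority order (active first); `cap_across_buckets` and the bucket names are hypothetical.

```python
from typing import Optional


def cap_across_buckets(buckets: dict, max_findings: Optional[int]) -> dict:
    """Apply one shared max_findings budget to all summary arrays, in bucket order."""
    if max_findings is None:
        return {name: list(items) for name, items in buckets.items()}
    remaining = max_findings
    capped = {}
    for name, items in buckets.items():  # dicts keep insertion order: earlier buckets get priority
        taken = items[:remaining]
        capped[name] = taken
        remaining -= len(taken)
    return capped
```

With this shape, `max_findings: 1` yields exactly one inlined body across `agent_summary`, matching the top-level list.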


…bda branch-locality, finding-lifecycle glossary

Resolves three Filigree ready-queue items, built TDD with adversarial review.

PY-WL-110 weft_markers soundness gap (wardline-d62845bb18, P2)
  contradictory_trust.py hardcoded `wardline.decorators.*` as the only marker
  prefix, silently missing contradictory stacks imported from the renamed
  `weft_markers` shim. Now derives _MARKER_NAMES + _MARKER_MODULE_PREFIXES from
  BUILTIN_BOUNDARY_TYPES so the rule can't drift from the grammar. +2 tests.
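A sketch of deriving both lookup sets from one registry so the rule cannot drift from the grammar. The `BoundaryType` shape and the marker names are hypothetical; only the module names `wardline.decorators` and `weft_markers` come from the commit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryType:  # hypothetical stand-in for one BUILTIN_BOUNDARY_TYPES entry
    name: str
    modules: tuple[str, ...]


# Hypothetical registry contents; the real grammar defines its own marker names.
BUILTIN_BOUNDARY_TYPES = (
    BoundaryType("trusted", ("wardline.decorators", "weft_markers")),
    BoundaryType("external_boundary", ("wardline.decorators", "weft_markers")),
)

# Derived, not hardcoded: a new shim module or marker only needs a registry entry.
_MARKER_NAMES = frozenset(bt.name for bt in BUILTIN_BOUNDARY_TYPES)
_MARKER_MODULE_PREFIXES = tuple(sorted({m + "." for bt in BUILTIN_BOUNDARY_TYPES for m in bt.modules}))


def is_marker(qualified_name: str) -> bool:
    module, _, name = qualified_name.rpartition(".")
    # Appending "." prevents "weft_markers_evil" from matching the "weft_markers." prefix.
    return name in _MARKER_NAMES and (module + ".").startswith(_MARKER_MODULE_PREFIXES)
```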

Lambda bindings are branch-local (wardline-36016d26f3, P3)
  _CURRENT_LAMBDA_BINDINGS was shared across if/else, try/except, and match arms,
  leaking a lambda bound in one arm into its siblings (over-fire). Each arm now walks
  an arm-local copy.

  NOTE: the first cut of the merge-out (clear()+full-union with the synthetic
  fall-through arm last) introduced a *false-negative regression* — verified
  empirically against HEAD: a lambda rebound in a no-else `if` / no-catch-all
  `match` and called after the branch resolved EXTERNAL_RAW on HEAD but INTEGRAL
  after the naive fix. Replaced with a delta merge (layer each arm's net
  add/changed bindings onto the pre-branch state in source order) that keeps the
  leak fix AND reproduces HEAD's after-branch bindings, so no new false negative.
  +3 over-fire guards, +3 no-false-negative guards.
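A sketch of arm-local walking plus the delta merge, assuming bindings map names to lambda nodes and each arm is modelled as a hypothetical callable that mutates the dict it is given. For a no-else `if`, the caller would pass a synthetic fall-through arm that changes nothing. Like the merge quoted in the Codex review of this commit, it propagates additions and changes only.

```python
def walk_branches(pre: dict, arms: list) -> dict:
    """Walk each arm on an arm-local copy, then layer each arm's net delta onto `pre`.

    `arms` are hypothetical callables that mutate the bindings dict they receive,
    standing in for walking one arm's body.
    """
    arm_states = []
    for walk_arm in arms:
        local = dict(pre)  # arm-local copy: a binding in one arm can't leak into a sibling
        walk_arm(local)
        arm_states.append(local)
    parent = dict(pre)
    for local in arm_states:  # source order: a later arm's binding wins on conflict
        for name, lam in local.items():
            if pre.get(name) is not lam:  # only net added/changed bindings
                parent[name] = lam
    return parent
```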

Finding-lifecycle vocabulary glossary (wardline-26e84dbd44, P3)
  Audited wardline's own usage: `active` is already the canonical word on every
  surface except the CLI summary, which printed `N new`. Relabelled to `N active`
  (text only; no JSON/SARIF/wire field renamed). Added the canonical glossary
  docs/reference/finding-lifecycle-vocabulary.md (single source of truth for
  new/active/suppressed/baselined/waived/judged + emitted-active vs gate
  population) with discipline tests + nav wiring. Cross-tool asks (Filigree
  first-seen "new", legis active) recorded as coordination context, not renamed.

Full suite 2471 passed, ruff + mypy clean, mkdocs --strict OK.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8103dff59


Comment on lines +1175 to +1177
```python
for name, lam in arm.items():
    if pre.get(name) is not lam:
        parent[name] = lam
```


P2: Remove stale lambda bindings after branch overwrites

When a name was bound to a lambda before a full branch and every arm reassigns it to a non-lambda, each arm's copied binding map has popped the name, but this merge only iterates `arm.items()`, so the deletion is never reflected in the parent. A later `cb(raw)` is still resolved through the stale pre-branch lambda, reintroducing false PY-WL-107/108 reports in the branch-locality path this patch changes; removals need to be merged when all reachable arms drop the binding.
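A sketch of a merge that also propagates removals, as the comment asks. `merge_arms` is hypothetical, and it assumes `arm_states` covers every reachable path, including a synthetic fall-through arm (a copy of `pre`) for a no-else `if` or non-exhaustive `match`, so a binding is dropped only when no path keeps it.

```python
def merge_arms(pre: dict, arm_states: list) -> dict:
    """Delta merge that also propagates removals.

    `arm_states` must cover every reachable path, including a synthetic
    fall-through arm (a copy of `pre`) when the branch is not exhaustive.
    """
    parent = dict(pre)
    for local in arm_states:
        for name, lam in local.items():
            if pre.get(name) is not lam:
                parent[name] = lam
    for name in pre:
        # Every reachable arm dropped the binding (reassigned to a non-lambda), so it is stale.
        if arm_states and all(name not in local for local in arm_states):
            del parent[name]
    return parent
```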


Comment on lines +1333 to +1334
```python
handler_lambdas = _branch_copy(parent_lambdas)
_walk_branch_body(handler.body, function_taint, taint_map, handler_taints, call_site_taints, handler_lambdas)
```


P2: Keep try-body lambda bindings visible to handlers

For try blocks where a lambda is assigned before a later statement raises, the except handler executes with that assignment still in scope, but this patch starts every handler from `parent_lambdas` instead of the try arm's state. A real path such as `try: cb = lambda x: eval(x); risky()` / `except: cb(raw)` is therefore no longer resolved through the lambda, so the branch-locality change can miss PY-WL-107/108 sinks that are reachable after a partial try-body execution.
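A conservative sketch of a handler start state: since a raise can occur after any try-body statement, any lambda bound at some point in the try body may be in scope. `handler_start_state` and the `(name, value)` assignment list are hypothetical; this simplification keeps only the latest lambda per name and does not let a later non-lambda reassignment remove it.

```python
def handler_start_state(pre: dict, try_assignments: list, is_lambda) -> dict:
    """May-bound lambdas visible in an except handler.

    `try_assignments` is a hypothetical list of (name, value) pairs in try-body order.
    A raise can happen after any statement, so a lambda bound anywhere in the try
    body may still be in scope; later non-lambda reassignments are ignored here.
    """
    state = dict(pre)
    for name, value in try_assignments:
        if is_lambda(value):
            state[name] = value
    return state
```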


@tachyon-beep

Collaborator Author

Superseded by #31: the head branch was renamed `fix/dogfood-2026-06-06-gate-legis-payload` → `rc/1.0.0rc2`, and a review-hardening pass plus the 1.0.0rc2 cut were added. GitHub can't move a PR's head branch, so this PR is continued there. The old branch is being deleted.

