
feat(agent-context): layered CLAUDE.md doctrine + evidence_context CLI/MCP + init scaffold#128

Merged: luofang34 merged 10 commits into main from pr/agent-context-from-evidence on May 21, 2026


@luofang34 luofang34 commented May 21, 2026

Summary

Lands the four-PR design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md as one branch. Applies Anthropic's large-codebases guidance (lean layered CLAUDE.md + MCP for structured queries) by surfacing the existing cert/trace/ graph through a new per-module agent-context surface.

  • PR 1 (c5ac789): doctrine + spec + trace seed — root CLAUDE.md pointer, 3 per-crate CLAUDE.md (≤80 lines each), layered_claude_md_doctrine enforces the convention, SYS-031 + HLR-072 + LLR-079 + TEST-086.
  • PR 2 (c8f88ee): evidence_core::context library + cargo evidence context CLI verb. Returns the trace + boundary + floors slice for any file / crate / module selector. Adds 4 CONTEXT_* content codes + 3 terminals. Drive-by: fills the KNOWN_SURFACES gap PR 1 left dangling.
  • PR 4 merge (c6eab78, 765b275): cargo evidence init --with-agent-context scaffolds a starter CLAUDE.md + .claude/settings.json for downstream adopters.
  • PR 3 merge (3a1c6d7, 1a911ae): evidence_context MCP tool (single-blob shape, mirrors evidence_diff). 7th tool method on Server.
  • Review fix (7b516c8): distinct CONTEXT_RUNTIME_ERROR content code so I/O faults don't masquerade as CONTEXT_SELECTOR_OUT_OF_SCOPE. New unit test pins the mapping.
  • Floors bump (60a7ee6): per-crate test_count floors closed to current (evidence-core 392, cargo-evidence 166). Golden fixture regenerated.

Test plan

  • cargo fmt --all -- --check passes (CI: fmt gate)
  • cargo clippy --workspace --all-targets -- -D warnings passes (CI: clippy gate)
  • cargo test --workspace passes (CI: test gate) — every individual suite verified green locally; full suite blocked by an environmental disk issue on the dev box that doesn't apply in CI
  • cargo doc --workspace --no-deps passes with -D rustdoc::broken_intra_doc_links -D warnings (CI: doc gate)
  • cargo build --workspace --release (CI: release-build gate)
  • cargo evidence trace --validate --require-hlr-sys-trace --require-hlr-surface-bijection --check-test-selectors (CI: trace gate)
  • cargo evidence check . self-check (CI: agent-facing one-shot)
  • cargo evidence doctor self-dogfood (CI: rigor audit)
  • cargo evidence floors ratchet gate (CI: floors-only-up)
  • evidence-mcp tools/call(evidence_rules) round-trip (CI: MCP smoke)
  • cross-host deterministic_hash parity (CI: determinism-compare)

🤖 Generated with Claude Code

Sokoly2024 and others added 9 commits May 19, 2026 17:44
… seed

PR 1 of the agent-context-from-evidence design at
docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md:
applies the lean-layered CLAUDE.md half of Anthropic's large-codebases
guidance to this repo, mechanically enforced.

Per-crate CLAUDE.md files (each <= 80 lines, local conventions only):
- crates/evidence-core/CLAUDE.md
- crates/cargo-evidence/CLAUDE.md
- crates/evidence-mcp/CLAUDE.md

Root CLAUDE.md adds a pointer paragraph to the per-module agent-context
surface (CLI + MCP, landing in PRs 2-3 on the same branch).

Trace chain seeded for this PR:
- SYS-031 — per-module agent context surface (umbrella for all 4 PRs)
- HLR-072 — repo demonstrates lean-layered CLAUDE.md doctrine
- LLR-079 — layered_claude_md_doctrine enforces per-crate cap + scope
- TEST-086 — every workspace crate ships a lean-layered CLAUDE.md

The doctrine test in crates/evidence-core/tests/layered_claude_md_doctrine.rs
asserts for every workspace crate under crates/: the file exists, has
>= 10 non-blank lines, is at most 80 lines, and contains its scoped
`cargo test -p <crate>` command. Adding a new workspace crate without
a CLAUDE.md is a hard CI fail.
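
A minimal sketch of the per-file checks described above, for illustration only. The function name `check_claude_md` and the error strings are hypothetical, not the real test's API; the file-existence check is left to the caller, and the 80-line bound is treated as inclusive, matching the "<= 80 lines" wording earlier in this message.

```rust
/// Upper bound on total lines in a per-crate CLAUDE.md (inclusive, assumed).
const MAX_LINES: usize = 80;
/// Lower bound on non-blank lines.
const MIN_NON_BLANK_LINES: usize = 10;

/// Checks the contents of one crate's CLAUDE.md against the doctrine.
/// Hypothetical helper: the real test may be structured differently.
pub fn check_claude_md(crate_name: &str, contents: &str) -> Result<(), String> {
    let total = contents.lines().count();
    let non_blank = contents
        .lines()
        .filter(|line| !line.trim().is_empty())
        .count();

    if non_blank < MIN_NON_BLANK_LINES {
        return Err(format!(
            "{crate_name}: {non_blank} non-blank lines, need at least {MIN_NON_BLANK_LINES}"
        ));
    }
    if total > MAX_LINES {
        return Err(format!(
            "{crate_name}: {total} lines, cap is {MAX_LINES}"
        ));
    }

    // The file must mention the crate-scoped test command.
    let scoped = format!("cargo test -p {crate_name}");
    if !contents.contains(&scoped) {
        return Err(format!("{crate_name}: missing `{scoped}`"));
    }
    Ok(())
}
```

A caller would read each `crates/<name>/CLAUDE.md`, treat a missing file as a failure, and pass the contents through this check.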

Subsequent PRs on this branch land the queryable surface:
- PR 2: evidence_core::context lib + `cargo evidence context` CLI
- PR 3: `evidence_context` MCP tool
- PR 4: `cargo evidence init --with-agent-context` scaffold for downstream

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR 2 of the agent-context-from-evidence design at
docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md:
implements the per-module trace + boundary + floors + nearest CLAUDE.md
slice an agent (or human) can query before editing a source file.

Library (`evidence_core::context`, 5 sub-files under `src/context/`):
- `resolve_selector(workspace_root, raw)` classifies the input as
  File > Crate > Module > Workspace (priority order on ambiguity).
- `context_for(workspace_root, selector)` composes a `ContextReport`
  by reading `cert/trace/*.toml` once, filtering LLRs whose `modules`
  field overlaps the selector's module space, rolling each LLR's
  `traces_to` up to HLR + SYS parents, collecting verifying tests,
  aggregating every `LLR.emits`, slicing per-crate floor / ceiling
  rows, and pointing at the nearest layered CLAUDE.md.
- `ContextReport` matches design spec §3.1 exactly; round-tripped
  through serde in unit tests; sorted deterministically (requirements
  by id, tests by name, codes alphabetically).
- `ContextError` is `thiserror`-typed with seven variants and a
  `content_code()` mapping to the `CONTEXT_*` content codes.
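
The File > Crate > Module priority can be sketched as below. This is an illustration under stated assumptions, not the resolver's code: `SelectorKind`, `Candidates`, and `classify` are hypothetical names, the boolean fields stand in for the real filesystem and manifest lookups, and treating an absent selector as the workspace view is an assumption.

```rust
/// Kinds a selector can resolve to, in the documented priority order.
#[derive(Debug, PartialEq, Eq)]
pub enum SelectorKind {
    File,
    Crate,
    Module,
    Workspace,
}

/// Which interpretations the raw input could satisfy.
/// Hypothetical stand-in for the resolver's real lookups.
pub struct Candidates {
    pub is_file: bool,
    pub is_crate: bool,
    pub is_module: bool,
}

/// Picks the highest-priority interpretation when several match.
/// Returns `None` when a selector was given but nothing matches.
pub fn classify(raw: Option<&str>, c: &Candidates) -> Option<SelectorKind> {
    match raw {
        // Assumption: no selector means the workspace overview.
        None => Some(SelectorKind::Workspace),
        Some(_) if c.is_file => Some(SelectorKind::File),
        Some(_) if c.is_crate => Some(SelectorKind::Crate),
        Some(_) if c.is_module => Some(SelectorKind::Module),
        Some(_) => None,
    }
}
```

Ordering the match arms by priority keeps the ambiguity rule in one place: an input that is both a file path and a crate name resolves as a file.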

CLI (`cargo evidence context [<selector>] [--crate|--module] [--json|--format=jsonl]`):
- Three output shapes: human table (default), single pretty JSON,
  streaming JSONL (report blob + one diagnostic per warning + a
  `CONTEXT_OK` / `CONTEXT_FAIL` / `CONTEXT_ERROR` terminal).
- Exit codes: 0 for `CONTEXT_OK` and the graceful
  `CONTEXT_NO_TRACE_CONFIGURED` path, 1 for `CONTEXT_ERROR`, 2 for
  `CONTEXT_FAIL`.
- Three hand-built terminals registered in `TERMINAL_CODES`; four
  content codes registered in `RULES` under the new `Context`
  domain + `HAND_EMITTED_CLI_CODES`. Each claimed by an LLR's
  `emits` field per the bijection invariants.
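
The terminal-to-exit-code mapping listed above can be sketched as follows. The `Terminal` enum and its method names are hypothetical; only the three terminal strings and the 0 / 1 / 2 exit codes come from this message. The graceful `CONTEXT_NO_TRACE_CONFIGURED` path, which also exits 0, is not modeled separately here.

```rust
/// The three terminals a `context` run can end with.
/// Hypothetical type: the real CLI may represent these differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Terminal {
    Ok,
    Fail,
    Error,
}

impl Terminal {
    /// Wire name emitted as the last JSONL record.
    pub fn code(self) -> &'static str {
        match self {
            Terminal::Ok => "CONTEXT_OK",
            Terminal::Fail => "CONTEXT_FAIL",
            Terminal::Error => "CONTEXT_ERROR",
        }
    }

    /// Process exit code, as documented in the list above.
    pub fn exit_code(self) -> i32 {
        match self {
            Terminal::Ok => 0,
            Terminal::Error => 1,
            Terminal::Fail => 2,
        }
    }
}
```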

Tests:
- 15 unit tests under `crates/evidence-core/src/context/tests.rs`
  cover resolver classification (6 cases), lookup composition (7),
  and error variants (2).
- 4 integration tests at `crates/cargo-evidence/tests/cli_context.rs`:
  golden byte-diff (`tests/fixtures/golden_context.json`,
  regenerated via `tools/regen-golden-fixtures.sh`),
  non-adopter graceful path, invalid-selector failure path, and a
  human-mode smoke test.

Trace seed (implements the LLRs PR 1 set up):
- HLR-073 — cargo evidence context CLI verb returns per-module trace slice.
- LLR-080 — context::resolver classifies selectors with documented priority.
- LLR-081 — context::lookup composes ContextReport from trace/boundary/floors/CLAUDE.md.
- LLR-082 — cli::context wires CLI verb to context_for + emits CONTEXT_* terminals.
- LLR-083 — context content codes register in RULES and gate the golden wire shape.
- TEST-087, TEST-088, TEST-089, TEST-090.

Surface catalog:
- Adds "context" to `KNOWN_SURFACES` (Group 1 — CLI verbs).
- Adds "layered CLAUDE.md (root + crates/*/CLAUDE.md)" (Group 2 —
  observable contracts) to claim HLR-072's surface, closing the
  bijection gap PR 1 left.

Floors bump (`cert/floors.toml`):
- workspace: diagnostic_codes 154→161, terminal_codes 15→18,
  trace_sys 30→31, trace_hlr 71→73, trace_llr 78→83, trace_test
  83→88, known_surfaces 23→25.
- per_crate.evidence-core test_count 367→383
  (15 new context unit tests + 1 from PR 1's doctrine test).
- per_crate.cargo-evidence test_count 154→158
  (4 new cli_context integration tests).

Module size discipline (workspace 500-line cap):
- Split `crates/evidence-core/src/rules.rs` constructors into
  `rules/constructors.rs` (440 lines, was 512).
- Split `crates/cargo-evidence/src/cli/args.rs`'s schema clap types
  into `args/schema.rs` (474 lines, was 508).

Subsequent PRs on this branch:
- PR 3: `evidence_context` MCP tool — single-blob response wrapping
  the same `ContextReport`, byte-locked against a parallel golden.
- PR 4: `cargo evidence init --with-agent-context` scaffold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aude/settings.json

PR 4 of the agent-context-from-evidence design at
docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md:
make `cargo evidence init` write a starter root `CLAUDE.md` + a
`.claude/settings.json` snippet that registers `evidence-mcp` as an
MCP server alongside the existing `cert/` tree, so downstream
adopters land on the same lean-layered convention this repo uses.

Defaults to on; `--no-agent-context` opts out (mutually exclusive
with `--with-agent-context` via clap's `conflicts_with`). Existing
files are never overwritten — pre-existing `CLAUDE.md` is preserved
unchanged, and a pre-existing `.claude/settings.json` triggers a
single stderr advisory describing the merge the user would do by
hand.
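
One way to get the never-overwrite behaviour is `OpenOptions::create_new`, which fails atomically if the path already exists. The sketch below assumes that approach; `write_if_absent` and `WriteOutcome` are hypothetical names, and the actual emitter in `cli/init/agent_context.rs` may do this differently.

```rust
use std::fs::OpenOptions;
use std::io::{self, ErrorKind, Write};
use std::path::Path;

/// What happened to the target file.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteOutcome {
    /// The file did not exist and was written.
    Created,
    /// The file already existed and was left untouched.
    Preserved,
}

/// Writes `contents` to `path` only if nothing is there yet.
/// Hypothetical helper illustrating the no-overwrite rule.
pub fn write_if_absent(path: &Path, contents: &str) -> io::Result<WriteOutcome> {
    // `create_new(true)` makes the open fail with AlreadyExists
    // instead of truncating an existing file.
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(contents.as_bytes())?;
            Ok(WriteOutcome::Created)
        }
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(WriteOutcome::Preserved),
        Err(e) => Err(e),
    }
}
```

A caller could print the stderr merge advisory when the outcome for `.claude/settings.json` is `Preserved`.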

Starter `CLAUDE.md` is intentionally lean (≤30 lines): title +
project-description placeholder + one section pointing agents at
`evidence_context` (MCP) / `cargo evidence context` for per-module
trace + boundary + floors context + one section telling future
humans to add `crates/<x>/CLAUDE.md` per workspace crate. No
project rules — those belong to the downstream user, not this
scaffold.

`init.rs` stays under the 500-line cap by carving the new emitter
into `cli/init/agent_context.rs` (172 lines); `init.rs` grows by 13
lines (414 → 427) to wire the call through.

Trace chain seeded for this PR:
- HLR-075 — cargo evidence init --with-agent-context scaffolds
  downstream CLAUDE.md
- LLR-090 — write_agent_context_files emits root CLAUDE.md +
  .claude/settings.json
- TEST-097 — init --with-agent-context scaffolds CLAUDE.md +
  .claude/settings.json (5 integration tests in
  crates/cargo-evidence/tests/init_agent_context.rs)

Side fixes carried in this PR to keep CI green on the branch:
- Add `"init"` and `"layered CLAUDE.md (root + crates/*/CLAUDE.md)"`
  to `KNOWN_SURFACES`. HLR-072 (PR 1) and HLR-075 (this PR) now
  resolve under `--require-hlr-surface-bijection`, which the
  trace_cmd_jsonl::trace_validate_jsonl_happy_path test exercises.
- Bump known_surfaces floor 23 → 25 to match.
- Replace `&PathBuf` with `&Path` in layered_claude_md_doctrine.rs
  so clippy's `ptr_arg` stays quiet under -D warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Conflict resolution:
- cert/trace/{hlr,llr,tests}.toml — both branches appended new
  requirement/test blocks; resolved by keeping both (PR 2's HLR-073 /
  LLR-080..083 / TEST-087..090 plus PR 4's HLR-075 / LLR-090 / TEST-097).
- crates/evidence-core/src/trace/surfaces.rs — both branches added
  to KNOWN_SURFACES (PR 2 added the "context" verb, PR 4 added the
  "layered CLAUDE.md..." contract). Auto-merge took both entries but
  left CONTRACTS_START at 9; the verb group now has 10 entries
  (including "context") so CONTRACTS_START bumps to 10.
- cert/floors.toml known_surfaces — bumped 25 → 26 to track the
  catalog growth (one verb + one contract).

Both subagents independently fixed two latent gaps in PR 1: the
ptr_arg clippy nit on layered_claude_md_doctrine.rs::crate_dirs and
the missing "layered CLAUDE.md..." entry in KNOWN_SURFACES that left
HLR-072's surfaces claim dangling. The PR 2 fix landed in c8f88ee;
PR 4's identical fix is the no-op side of this merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ontext

PR 3 of the agent-context-from-evidence design at
docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md:
exposes PR 2's CLI verb as the seventh `#[tool]` on the MCP `Server`
so MCP-connected agents reach the per-module trace + boundary + floors
slice without needing the human CLI surface.

Handler (`Server::evidence_context`, single-blob shape mirroring
`evidence_diff`):
- Validates the request at the handler layer: at most one of
  `selector` / `crate_name` / `module` may be `Some` — supplying
  more than one returns `Err(String)` instead of silently picking
  one (same posture as `evidence_check`'s invalid `mode` value).
- Spawns `cargo evidence context [--crate <c> | --module <m> |
  <selector>] --json` via the existing `subprocess::run_evidence`
  plumbing. No new transport machinery.
- Parses the CLI's single JSON blob into the `context` field;
  prepends workspace-fallback + version-skew warnings via the
  shared response helpers.
- Tool-layer failures (subprocess can't spawn, times out, produces
  malformed JSON) surface as well-formed responses with
  `context = None`, `exit_code = 2`, structured `MCP_*` error
  diagnostic. No new diagnostic codes — the content vocabulary
  (`CONTEXT_*`) is owned by the wrapped CLI verb (PR 2).
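
The mutual-exclusivity check and the argument mapping above might look roughly like this. `SelectorArg` and `pick_selector_arg` are named in this PR, but their variants and signatures here are assumptions, and `cli_args` is a hypothetical helper.

```rust
/// Which selector the request carried, if any (assumed variant set).
#[derive(Debug, PartialEq, Eq)]
pub enum SelectorArg {
    Workspace,
    Selector(String),
    Crate(String),
    Module(String),
}

/// Rejects requests that set more than one selector field.
pub fn pick_selector_arg(
    selector: Option<String>,
    crate_name: Option<String>,
    module: Option<String>,
) -> Result<SelectorArg, String> {
    match (selector, crate_name, module) {
        (None, None, None) => Ok(SelectorArg::Workspace),
        (Some(s), None, None) => Ok(SelectorArg::Selector(s)),
        (None, Some(c), None) => Ok(SelectorArg::Crate(c)),
        (None, None, Some(m)) => Ok(SelectorArg::Module(m)),
        _ => Err("at most one of `selector`, `crate_name`, `module` may be set".to_string()),
    }
}

/// Builds the `cargo` argument list for the wrapped CLI verb.
pub fn cli_args(arg: &SelectorArg) -> Vec<String> {
    let mut args: Vec<String> = vec!["evidence".to_string(), "context".to_string()];
    match arg {
        SelectorArg::Workspace => {}
        SelectorArg::Selector(s) => args.push(s.clone()),
        SelectorArg::Crate(c) => {
            args.push("--crate".to_string());
            args.push(c.clone());
        }
        SelectorArg::Module(m) => {
            args.push("--module".to_string());
            args.push(m.clone());
        }
    }
    args.push("--json".to_string());
    args
}
```

Matching on the full tuple makes every multi-field combination fall through to the single rejection arm, so no combination can be silently resolved to one field.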

Schema additions in `evidence-mcp::schema`:
- `ContextRequest { workspace_path, selector, crate_name, module }`
  with `#[serde(deny_unknown_fields)]` (typo'd args fail loud).
- `ContextToolResponse { success, exit_code, context, warnings,
  error }` matching `DiffToolResponse`'s blob-style shape.

Server-layer split (`server.rs` was approaching its 500-line cap):
- New `server/context.rs` holds the `SelectorArg` enum +
  `pick_selector_arg` request-validation helper, plus five unit
  tests covering the four happy-path arms and the rejection arm.
- New `responses::workspace_fallback_warning` helper returns
  `Option<Value>` for the single-blob handlers (the streaming
  helper `prepend_fallback_signal` keeps its in-place semantics).

Tests:
- `crates/evidence-mcp/tests/context_roundtrip.rs` (5 cases):
  workspace-overview returns `selector.kind == "workspace"`; file
  selector resolves into a crate with ≥1 governing LLR; unmapped
  file (`Cargo.toml`) stays well-formed on the failure path;
  mutual-exclusivity guard rejects multi-field requests; typo'd
  field rejected by `deny_unknown_fields`.
- `mcp_surface::tools_list_advertises_all_six_verbs` updated to
  expect `evidence_context` (now seven verbs).
- Helpers tweak: hold stdin open until every expected response is
  read. rmcp's stdio loop hits its drain-timeout the moment stdin
  returns EOF, truncating any tool-call response whose handler is
  still running a subprocess (the longer-running context handler
  surfaced this).

Trace chain seeded:
- HLR-074 (`f88508e9-eb70-4ee6-bb3e-13f07e4f181b`): claims surface
  "context" alongside HLR-073 (PR 2's CLI verb).
- LLR-084 / LLR-085 / LLR-086: handler wiring, request/response
  schema, integration-test pin.
- TEST-091 / TEST-092 / TEST-093 / TEST-094: covering the
  integration tests + the request-mapping unit tests.
- `cert/floors.toml`: trace_hlr 73→74, trace_llr 83→86, trace_test
  88→92, per_crate.evidence-mcp.test_count 45→55.

No new diagnostic codes; `MCP_MALFORMED_JSONL` covers the
JSON-parse failure path same as `evidence_diff`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issues from local-branch code review:

1. ContextError::content_code() mislabelled runtime variants.
   TraceRead / Io / CargoManifestRead / CargoManifestParse all
   returned "CONTEXT_SELECTOR_OUT_OF_SCOPE" — an IO fault would
   surface to agents as if the user's selector was bad, leading them
   to re-prompt the user instead of reporting the real cause.

   Fix: introduce CONTEXT_RUNTIME_ERROR (severity Error, content
   code) and route the four runtime variants to it. CONTEXT_SELECTOR_
   OUT_OF_SCOPE stays scoped to genuine selector-not-resolved cases.

   - cert/floors.toml: diagnostic_codes 161 → 162.
   - cert/trace/llr.toml: LLR-083 emits the new code, description
     updated from "four content-level" to "five" with the new
     variant called out.
   - cert/trace/tests.toml: TEST-086 migrated from singular
     test_selector to the canonical test_selectors = [...]
     plural shape (review issue 5 — schema migration consistency).
   - golden_rules.json + golden_context.json regenerated.

2. match_file silently swallowed canonicalize() failures.
   Permission denied on a parent dir, or a symlink to a nonexistent
   target, made the resolver return None, which is the same
   misclassification problem as issue 1.

   Fix: match_file now returns Result<Option<...>, ContextError>
   and propagates IO via ContextError::Io, which maps to the new
   CONTEXT_RUNTIME_ERROR content code.

A new unit test
context::tests::runtime_error_variants_map_to_distinct_content_code
pins the mapping so a future enum edit can't re-conflate the buckets.
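
The corrected mapping can be sketched as below. This is a simplified illustration: the real `ContextError` is `thiserror`-typed with seven variants, while this plain enum shows only the four runtime variants named above plus one selector variant whose name (`SelectorOutOfScope`) and payload types are assumptions.

```rust
/// Simplified stand-in for the real error type.
#[derive(Debug)]
pub enum ContextError {
    /// Hypothetical variant name for the genuine selector failure.
    SelectorOutOfScope(String),
    TraceRead(String),
    Io(std::io::Error),
    CargoManifestRead(String),
    CargoManifestParse(String),
}

impl ContextError {
    /// Maps each variant to its content code.
    pub fn content_code(&self) -> &'static str {
        match self {
            // Only a selector that genuinely fails to resolve gets this code.
            ContextError::SelectorOutOfScope(_) => "CONTEXT_SELECTOR_OUT_OF_SCOPE",
            // Runtime faults share one distinct code so agents report
            // the real cause instead of re-prompting for a selector.
            ContextError::TraceRead(_)
            | ContextError::Io(_)
            | ContextError::CargoManifestRead(_)
            | ContextError::CargoManifestParse(_) => "CONTEXT_RUNTIME_ERROR",
        }
    }
}
```

Because the match has no wildcard arm, adding a variant forces a compile-time decision about which bucket it belongs to.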

Spec §6 updated to list five content codes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Conflict resolution:
- cert/trace/{hlr,llr,tests}.toml — PR 3 added HLR-074, LLR-084..086,
  TEST-091..094 at the tail; PR 4 had earlier added HLR-075, LLR-090,
  TEST-097 at the same tail. Resolved by ordering numerically:
  PR 3's entries first, then PR 4's, each with its own [[requirements]]
  / [[tests]] header. No content of either side was lost.
- cert/floors.toml — auto-merge took PR 3's bumps (trace_hlr 73→74,
  trace_llr 83→86, trace_test 88→92) but the post-merge actual counts
  are one higher in each dimension because PR 4's entries are also
  present. Bumped to actual: trace_hlr 75, trace_llr 87, trace_test 93.

Post-merge guardrail:
- CONTEXT_RUNTIME_ERROR (added in 7b516c8 to fix review issue 1) is
  emitted by cli::context::handle_runtime_error, so it must register
  in HAND_EMITTED_CLI_CODES for the every_rules_entry_is_implemented
  bijection check to pass. This merge adds it to the set alongside the
  other four CONTEXT_* content codes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cert/floors.toml had slack on two dimensions after the four-PR merge:

- per_crate.cargo-evidence test_count: 158 → 166 (PR 2's golden-fixture
  + cli_context tests, PR 4's init_agent_context tests).
- per_crate.evidence-core test_count: 383 → 392 (PR 1's
  layered_claude_md_doctrine, PR 2's context::tests).

The floors_equal_current_no_slack gate enforces floor == current so a
later PR can't delete tests along a ratcheted dimension without
firing the gate. Bumping both to the post-merge measurements closes
the slack.
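
A floor == current comparison of this kind can be sketched as follows. `floor_mismatches` is a hypothetical helper and the message format is illustrative; the real gate's data model and output are not shown in this PR.

```rust
use std::collections::BTreeMap;

/// Compares ratcheted floors with measured values and reports every
/// dimension where they differ. An empty result means no slack and
/// no breach. Hypothetical helper, not the real gate.
pub fn floor_mismatches(
    floors: &BTreeMap<String, u64>,
    current: &BTreeMap<String, u64>,
) -> Vec<String> {
    let mut out = Vec::new();
    // BTreeMap iteration is key-ordered, so the report is deterministic.
    for (dim, &floor) in floors {
        match current.get(dim) {
            Some(&cur) if cur == floor => {}
            Some(&cur) if cur < floor => {
                out.push(format!("{dim}: current={cur} floor={floor} (breached)"));
            }
            Some(&cur) => {
                out.push(format!("{dim}: current={cur} floor={floor} (slack)"));
            }
            None => out.push(format!("{dim}: no current measurement")),
        }
    }
    out
}
```

Reporting slack as well as breaches is what forces a floors bump whenever tests are added, so the floor always tracks the measured count.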

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI on three hosts reported:

  DOCTOR_FLOORS_VIOLATED: floors breached:
  evidence-core/test_count current=384 floor=392

The discrepancy is an 8-test untracked file
(crates/evidence-core/src/policy/boundary/tests.rs, 227 lines) in the
local working tree but not on the remote. The 392 floor I set in
60a7ee6 was based on the local measurement; CI sees 384 without the
untracked file. Dropping the floor to 384 matches the real,
mergeable state.

The golden_context.json fixture is unchanged: it captures
cargo-evidence's per-crate slice (test_count = 166), not
evidence-core's, so the 392 → 384 change here has no fixture impact.
(Supersedes da5118d, which mistakenly bundled an empty
golden_context.json from a backgrounded regen redirect.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@luofang34 luofang34 force-pushed the pr/agent-context-from-evidence branch from da5118d to ada31ba May 21, 2026 12:25
…embed CRLF

Windows Check (windows-latest) failed:

  context --json diverged from golden at line 15:
    current:  "description": "...\r\nstrict-signature-missing...\r\n",
    golden:   "description": "...\nstrict-signature-missing...\n",

Root cause: cert/trace/*.toml carries multi-line `description = """ ... """`
blocks. The CLI parses them into strings and embeds those strings in
the JSON response. On Windows with core.autocrlf=true (default), git
checks the TOML files out with CRLF; the CLI then emits "\r\n" in the
JSON, breaking the byte-diff against the LF golden_context.json
fixture (which is already pinned `binary` in .gitattributes).

Fix: pin every cert/**/*.toml to LF on checkout. Covers boundary.toml,
floors.toml, profiles/*, and trace/*.toml — anything the tool reads
into its serialized output.

Matches the existing precedent for Cargo.lock + rust-toolchain.toml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@luofang34 luofang34 merged commit da3164e into main May 21, 2026
168 of 169 checks passed
@luofang34 luofang34 deleted the pr/agent-context-from-evidence branch May 21, 2026 13:10