feat(agent-context): layered CLAUDE.md doctrine + evidence_context CLI/MCP + init scaffold by luofang34 · Pull Request #128 · luofang34/Evidence

luofang34 · 2026-05-21T12:05:59Z

Summary

Lands the four-PR design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md as one branch. Applies Anthropic's large-codebases guidance (lean layered CLAUDE.md + MCP for structured queries) by surfacing the existing cert/trace/ graph through a new per-module agent-context surface.

PR 1 (c5ac789): doctrine + spec + trace seed — root CLAUDE.md pointer, 3 per-crate CLAUDE.md (≤80 lines each), layered_claude_md_doctrine enforces the convention, SYS-031 + HLR-072 + LLR-079 + TEST-086.
PR 2 (c8f88ee): evidence_core::context library + cargo evidence context CLI verb. Returns the trace + boundary + floors slice for any file / crate / module selector. Adds 4 CONTEXT_* content codes + 3 terminals. Drive-by: fills the KNOWN_SURFACES gap PR 1 left dangling.
PR 4 merge (c6eab78, 765b275): cargo evidence init --with-agent-context scaffolds a starter CLAUDE.md + .claude/settings.json for downstream adopters.
PR 3 merge (3a1c6d7, 1a911ae): evidence_context MCP tool (single-blob shape, mirrors evidence_diff). 7th tool method on Server.
Review fix (7b516c8): distinct CONTEXT_RUNTIME_ERROR content code so I/O faults don't masquerade as CONTEXT_SELECTOR_OUT_OF_SCOPE. New unit test pins the mapping.
Floors bump (60a7ee6): per-crate test_count floors closed to current (evidence-core 392, cargo-evidence 166). Golden fixture regenerated.

Test plan

🤖 Generated with Claude Code

… seed

PR 1 of the agent-context-from-evidence design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md: applies the lean-layered CLAUDE.md half of Anthropic's large-codebases guidance to this repo, mechanically enforced.

Per-crate CLAUDE.md files (each <= 80 lines, local conventions only):
- crates/evidence-core/CLAUDE.md
- crates/cargo-evidence/CLAUDE.md
- crates/evidence-mcp/CLAUDE.md

Root CLAUDE.md adds a pointer paragraph to the per-module agent-context surface (CLI + MCP, landing in PRs 2-3 on the same branch).

Trace chain seeded for this PR:
- SYS-031: per-module agent context surface (umbrella for all 4 PRs)
- HLR-072: repo demonstrates lean-layered CLAUDE.md doctrine
- LLR-079: layered_claude_md_doctrine enforces per-crate cap + scope
- TEST-086: every workspace crate ships a lean-layered CLAUDE.md

The doctrine test in crates/evidence-core/tests/layered_claude_md_doctrine.rs asserts for every workspace crate under crates/: the file exists, has >= 10 non-blank lines, is under 80 lines, and contains its scoped `cargo test -p <crate>` command. Adding a new workspace crate without a CLAUDE.md is a hard CI fail.

Subsequent PRs on this branch land the queryable surface:
- PR 2: evidence_core::context lib + `cargo evidence context` CLI
- PR 3: `evidence_context` MCP tool
- PR 4: `cargo evidence init --with-agent-context` scaffold for downstream

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
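The per-file checks the doctrine test is described as making can be sketched roughly as below. This is an illustration only, not the code in layered_claude_md_doctrine.rs: the function name `check_claude_md`, its signature, and the choice to treat exactly 80 lines as acceptable (the message says both "<= 80" and "under 80") are assumptions.

```rust
/// Hypothetical helper: checks one crate's CLAUDE.md text against the
/// rules listed in the commit message and returns every violated rule,
/// so a failing run can report all problems at once.
pub fn check_claude_md(crate_name: &str, contents: &str) -> Vec<String> {
    const MIN_NON_BLANK: usize = 10; // ">= 10 non-blank lines"
    const MAX_LINES: usize = 80; // per-crate cap (assumed inclusive)

    let mut violations = Vec::new();
    let total = contents.lines().count();
    let non_blank = contents
        .lines()
        .filter(|line| !line.trim().is_empty())
        .count();

    if non_blank < MIN_NON_BLANK {
        violations.push(format!(
            "{crate_name}: only {non_blank} non-blank lines (need {MIN_NON_BLANK})"
        ));
    }
    if total > MAX_LINES {
        violations.push(format!(
            "{crate_name}: {total} lines exceeds the {MAX_LINES}-line cap"
        ));
    }

    // Each file must name its own scoped test command.
    let scoped = format!("cargo test -p {crate_name}");
    if !contents.contains(&scoped) {
        violations.push(format!("{crate_name}: missing `{scoped}`"));
    }
    violations
}
```

A real doctrine test would additionally walk crates/ and fail when a crate directory has no CLAUDE.md at all; that directory walk is omitted here.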

PR 2 of the agent-context-from-evidence design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md: implements the per-module trace + boundary + floors + nearest CLAUDE.md slice an agent (or human) can query before editing a source file.

Library (`evidence_core::context`, 5 sub-files under `src/context/`):
- `resolve_selector(workspace_root, raw)` classifies the input as File > Crate > Module > Workspace (priority order on ambiguity).
- `context_for(workspace_root, selector)` composes a `ContextReport` by reading `cert/trace/*.toml` once, filtering LLRs whose `modules` field overlaps the selector's module space, rolling each LLR's `traces_to` up to HLR + SYS parents, collecting verifying tests, aggregating every `LLR.emits`, slicing per-crate floor / ceiling rows, and pointing at the nearest layered CLAUDE.md.
- `ContextReport` matches design spec §3.1 exactly; round-tripped through serde in unit tests; sorted deterministically (requirements by id, tests by name, codes alphabetically).
- `ContextError` is `thiserror`-typed with seven variants and a `content_code()` mapping to the `CONTEXT_*` content codes.

CLI (`cargo evidence context [<selector>] [--crate|--module] [--json|--format=jsonl]`):
- Three output shapes: human table (default), single pretty JSON, streaming JSONL (report blob + one diagnostic per warning + a `CONTEXT_OK` / `CONTEXT_FAIL` / `CONTEXT_ERROR` terminal).
- Exit codes: 0 for `CONTEXT_OK` and the graceful `CONTEXT_NO_TRACE_CONFIGURED` path, 1 for `CONTEXT_ERROR`, 2 for `CONTEXT_FAIL`.
- Three hand-built terminals registered in `TERMINAL_CODES`; four content codes registered in `RULES` under the new `Context` domain + `HAND_EMITTED_CLI_CODES`. Each claimed by an LLR's `emits` field per the bijection invariants.

Tests:
- 15 unit tests under `crates/evidence-core/src/context/tests.rs` cover resolver classification (6 cases), lookup composition (7), and error variants (2).
- 4 integration tests at `crates/cargo-evidence/tests/cli_context.rs`: golden byte-diff (`tests/fixtures/golden_context.json`, regenerated via `tools/regen-golden-fixtures.sh`), non-adopter graceful path, invalid-selector failure path, and a human-mode smoke test.

Trace seed (implements the LLRs PR 1 set up):
- HLR-073: cargo evidence context CLI verb returns per-module trace slice.
- LLR-080: context::resolver classifies selectors with documented priority.
- LLR-081: context::lookup composes ContextReport from trace/boundary/floors/CLAUDE.md.
- LLR-082: cli::context wires CLI verb to context_for + emits CONTEXT_* terminals.
- LLR-083: context content codes register in RULES and gate the golden wire shape.
- TEST-087, TEST-088, TEST-089, TEST-090.

Surface catalog:
- Adds "context" to `KNOWN_SURFACES` (Group 1: CLI verbs).
- Adds "layered CLAUDE.md (root + crates/*/CLAUDE.md)" (Group 2: observable contracts) to claim HLR-072's surface, closing the bijection gap PR 1 left.

Floors bump (`cert/floors.toml`):
- workspace: diagnostic_codes 154→161, terminal_codes 15→18, trace_sys 30→31, trace_hlr 71→73, trace_llr 78→83, trace_test 83→88, known_surfaces 23→25.
- per_crate.evidence-core test_count 367→383 (15 new context unit tests + 1 from PR 1's doctrine test).
- per_crate.cargo-evidence test_count 154→158 (4 new cli_context integration tests).

Module size discipline (workspace 500-line cap):
- Split `crates/evidence-core/src/rules.rs` constructors into `rules/constructors.rs` (440 lines, was 512).
- Split `crates/cargo-evidence/src/cli/args.rs`'s schema clap types into `args/schema.rs` (474 lines, was 508).

Subsequent PRs on this branch:
- PR 3: `evidence_context` MCP tool, a single-blob response wrapping the same `ContextReport`, byte-locked against a parallel golden.
- PR 4: `cargo evidence init --with-agent-context` scaffold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
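The File > Crate > Module > Workspace priority described for `resolve_selector` could look something like the sketch below. It is not the library's code: `SelectorKind`, `Known`, and `classify` are hypothetical names, the workspace is modelled as plain string lists rather than read from disk, and treating an absent selector as the workspace overview is an assumption.

```rust
/// Hypothetical classification result; the real type lives in
/// `evidence_core::context` and may be shaped differently.
#[derive(Debug, PartialEq, Eq)]
pub enum SelectorKind {
    File,
    Crate,
    Module,
    Workspace,
}

/// Hypothetical stand-in for what a resolver knows about the workspace.
pub struct Known<'a> {
    pub files: &'a [&'a str],
    pub crates: &'a [&'a str],
    pub modules: &'a [&'a str],
}

/// Classifies a raw selector. When a name matches more than one
/// category, the earlier check wins: File, then Crate, then Module.
/// `None` means the selector resolved to nothing in scope.
pub fn classify(known: &Known, raw: Option<&str>) -> Option<SelectorKind> {
    let raw = match raw {
        // Assumption: no selector asks for the workspace overview.
        None => return Some(SelectorKind::Workspace),
        Some(raw) => raw,
    };
    if known.files.contains(&raw) {
        return Some(SelectorKind::File);
    }
    if known.crates.contains(&raw) {
        return Some(SelectorKind::Crate);
    }
    if known.modules.contains(&raw) {
        return Some(SelectorKind::Module);
    }
    None
}
```

Ordering the checks as early returns keeps the priority rule visible in one place, so an ambiguous name such as one shared by a crate and a module always resolves the same way.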

…aude/settings.json

PR 4 of the agent-context-from-evidence design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md: make `cargo evidence init` write a starter root `CLAUDE.md` + a `.claude/settings.json` snippet that registers `evidence-mcp` as an MCP server alongside the existing `cert/` tree, so downstream adopters land on the same lean-layered convention this repo uses.

Defaults to on; `--no-agent-context` opts out (mutually exclusive with `--with-agent-context` via clap's `conflicts_with`). Existing files are never overwritten: a pre-existing `CLAUDE.md` is preserved unchanged, and a pre-existing `.claude/settings.json` triggers a single stderr advisory describing the merge the user would do by hand.

Starter `CLAUDE.md` is intentionally lean (≤30 lines): title + project-description placeholder + one section pointing agents at `evidence_context` (MCP) / `cargo evidence context` for per-module trace + boundary + floors context + one section telling future humans to add `crates/<x>/CLAUDE.md` per workspace crate. No project rules: those belong to the downstream user, not this scaffold.

`init.rs` stays under the 500-line cap by carving the new emitter into `cli/init/agent_context.rs` (172 lines); `init.rs` grows by 13 lines (414 → 427) to wire the call through.

Trace chain seeded for this PR:
- HLR-075: cargo evidence init --with-agent-context scaffolds downstream CLAUDE.md
- LLR-090: write_agent_context_files emits root CLAUDE.md + .claude/settings.json
- TEST-097: init --with-agent-context scaffolds CLAUDE.md + .claude/settings.json (5 integration tests in crates/cargo-evidence/tests/init_agent_context.rs)

Side fixes carried in this PR to keep CI green on the branch:
- Add `"init"` and `"layered CLAUDE.md (root + crates/*/CLAUDE.md)"` to `KNOWN_SURFACES`. HLR-072 (PR 1) and HLR-075 (this PR) now resolve under `--require-hlr-surface-bijection`, which the trace_cmd_jsonl::trace_validate_jsonl_happy_path test exercises.
- Bump known_surfaces floor 23 → 25 to match.
- Replace `&PathBuf` with `&Path` in layered_claude_md_doctrine.rs so clippy's `ptr_arg` stays quiet under -D warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
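One way to get the "existing files are never overwritten" behaviour is to let the filesystem arbitrate with `create_new`, as in this minimal sketch. The helper name `write_if_absent` and its boolean return are assumptions for illustration; how `write_agent_context_files` actually does it is not shown in the PR text.

```rust
use std::fs::OpenOptions;
use std::io::{self, ErrorKind, Write};
use std::path::Path;

/// Hypothetical helper: writes `contents` to `path` only when no file
/// exists there yet. Returns Ok(true) if the file was created and
/// Ok(false) if an existing file was left untouched.
pub fn write_if_absent(path: &Path, contents: &str) -> io::Result<bool> {
    // `create_new` fails with AlreadyExists instead of truncating, so
    // there is no window between an exists() check and the write.
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(contents.as_bytes())?;
            Ok(true)
        }
        Err(err) if err.kind() == ErrorKind::AlreadyExists => Ok(false),
        Err(err) => Err(err),
    }
}
```

A caller could use the `false` result for `.claude/settings.json` as the cue to print the single stderr advisory about merging by hand.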

Conflict resolution:
- cert/trace/{hlr,llr,tests}.toml: both branches appended new requirement/test blocks; resolved by keeping both (PR 2's HLR-073 / LLR-080..083 / TEST-087..090 plus PR 4's HLR-075 / LLR-090 / TEST-097).
- crates/evidence-core/src/trace/surfaces.rs: both branches added to KNOWN_SURFACES (PR 2 added the "context" verb, PR 4 added the "layered CLAUDE.md..." contract). Auto-merge took both entries but left CONTRACTS_START at 9; the verb group now has 10 entries (including "context") so CONTRACTS_START bumps to 10.
- cert/floors.toml known_surfaces: bumped 25 → 26 to track the catalog growth (one verb + one contract).

Both subagents independently fixed two latent gaps in PR 1: the ptr_arg clippy nit on layered_claude_md_doctrine.rs::crate_dirs and the missing "layered CLAUDE.md..." entry in KNOWN_SURFACES that left HLR-072's surfaces claim dangling. The PR 2 fix landed in c8f88ee; PR 4's identical fix is the no-op side of this merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ontext

PR 3 of the agent-context-from-evidence design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md: exposes PR 2's CLI verb as the seventh `#[tool]` on the MCP `Server` so MCP-connected agents reach the per-module trace + boundary + floors slice without needing the human CLI surface.

Handler (`Server::evidence_context`, single-blob shape mirroring `evidence_diff`):
- Validates the request at the handler layer: at most one of `selector` / `crate_name` / `module` may be `Some`. Supplying more than one returns `Err(String)` instead of silently picking one (same posture as `evidence_check`'s invalid `mode` value).
- Spawns `cargo evidence context [--crate <c> | --module <m> | <selector>] --json` via the existing `subprocess::run_evidence` plumbing. No new transport machinery.
- Parses the CLI's single JSON blob into the `context` field; prepends workspace-fallback + version-skew warnings via the shared response helpers.
- Tool-layer failures (subprocess can't spawn, times out, produces malformed JSON) surface as well-formed responses with `context = None`, `exit_code = 2`, structured `MCP_*` error diagnostic.

No new diagnostic codes: the content vocabulary (`CONTEXT_*`) is owned by the wrapped CLI verb (PR 2).

Schema additions in `evidence-mcp::schema`:
- `ContextRequest { workspace_path, selector, crate_name, module }` with `#[serde(deny_unknown_fields)]` (typo'd args fail loud).
- `ContextToolResponse { success, exit_code, context, warnings, error }` matching `DiffToolResponse`'s blob-style shape.

Server-layer split (`server.rs` was approaching its 500-line cap):
- New `server/context.rs` holds the `SelectorArg` enum + `pick_selector_arg` request-validation helper, plus five unit tests covering the four happy-path arms and the rejection arm.
- New `responses::workspace_fallback_warning` helper returns `Option<Value>` for the single-blob handlers (the streaming helper `prepend_fallback_signal` keeps its in-place semantics).

Tests:
- `crates/evidence-mcp/tests/context_roundtrip.rs` (5 cases): workspace-overview returns `selector.kind == "workspace"`; file selector resolves into a crate with ≥1 governing LLR; unmapped file (`Cargo.toml`) stays well-formed on the failure path; mutual-exclusivity guard rejects multi-field requests; typo'd field rejected by `deny_unknown_fields`.
- `mcp_surface::tools_list_advertises_all_six_verbs` updated to expect `evidence_context` (now seven verbs).
- Helpers tweak: hold stdin open until every expected response is read. rmcp's stdio loop hits its drain-timeout the moment stdin returns EOF, truncating any tool-call response whose handler is still running a subprocess (the longer-running context handler surfaced this).

Trace chain seeded:
- HLR-074 (`f88508e9-eb70-4ee6-bb3e-13f07e4f181b`): claims surface "context" alongside HLR-073 (PR 2's CLI verb).
- LLR-084 / LLR-085 / LLR-086: handler wiring, request/response schema, integration-test pin.
- TEST-091 / TEST-092 / TEST-093 / TEST-094: covering the integration tests + the request-mapping unit tests.
- `cert/floors.toml`: trace_hlr 73→74, trace_llr 83→86, trace_test 88→92, per_crate.evidence-mcp.test_count 45→55.

No new diagnostic codes; `MCP_MALFORMED_JSONL` covers the JSON-parse failure path same as `evidence_diff`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
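The "at most one of `selector` / `crate_name` / `module`" guard and the mapping onto CLI arguments might be sketched as follows. The names `SelectorArg` and `pick_selector_arg` come from the commit message, but the variants, the signature, and the `context_args` helper are assumptions, not the contents of `server/context.rs`.

```rust
/// Assumed shape of the validated request: four accepted arms.
#[derive(Debug, PartialEq, Eq)]
pub enum SelectorArg {
    Workspace,
    Selector(String),
    Crate(String),
    Module(String),
}

/// Accepts zero or one populated field; anything else is rejected
/// rather than silently picking a winner.
pub fn pick_selector_arg(
    selector: Option<String>,
    crate_name: Option<String>,
    module: Option<String>,
) -> Result<SelectorArg, String> {
    match (selector, crate_name, module) {
        (None, None, None) => Ok(SelectorArg::Workspace),
        (Some(s), None, None) => Ok(SelectorArg::Selector(s)),
        (None, Some(c), None) => Ok(SelectorArg::Crate(c)),
        (None, None, Some(m)) => Ok(SelectorArg::Module(m)),
        _ => Err("at most one of `selector`, `crate_name`, `module` may be set".to_string()),
    }
}

/// Hypothetical helper: arguments to pass after `cargo`, following
/// `cargo evidence context [--crate <c> | --module <m> | <selector>] --json`.
pub fn context_args(arg: &SelectorArg) -> Vec<String> {
    let mut args = vec!["evidence".to_string(), "context".to_string()];
    match arg {
        SelectorArg::Workspace => {}
        SelectorArg::Selector(s) => args.push(s.clone()),
        SelectorArg::Crate(c) => {
            args.push("--crate".to_string());
            args.push(c.clone());
        }
        SelectorArg::Module(m) => {
            args.push("--module".to_string());
            args.push(m.clone());
        }
    }
    args.push("--json".to_string());
    args
}
```

Matching on the tuple makes the rejection arm the catch-all, so adding a fourth request field would force the match to be revisited instead of quietly widening what is accepted.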

Issues from local-branch code review:

1. ContextError::content_code() mislabelled runtime variants. TraceRead / Io / CargoManifestRead / CargoManifestParse all returned "CONTEXT_SELECTOR_OUT_OF_SCOPE", so an IO fault would surface to agents as if the user's selector was bad, leading them to re-prompt the user instead of reporting the real cause.

   Fix: introduce CONTEXT_RUNTIME_ERROR (severity Error, content code) and route the four runtime variants to it. CONTEXT_SELECTOR_OUT_OF_SCOPE stays scoped to genuine selector-not-resolved cases.
   - cert/floors.toml: diagnostic_codes 161 → 162.
   - cert/trace/llr.toml: LLR-083 emits the new code, description updated from "four content-level" to "five" with the new variant called out.
   - cert/trace/tests.toml: TEST-086 migrated from singular test_selector to the canonical test_selectors = [...] plural shape (review issue 5: schema migration consistency).
   - golden_rules.json + golden_context.json regenerated.

2. match_file silently swallowed canonicalize() failures. Permission denied on a parent dir or a symlink to a nonexistent target made the resolver return None, the same misclassification problem as issue 1.

   Fix: match_file now returns Result<Option<...>, ContextError> and propagates IO via ContextError::Io, which maps to the new CONTEXT_RUNTIME_ERROR content code.

A new unit test context::tests::runtime_error_variants_map_to_distinct_content_code pins the mapping so a future enum edit can't re-conflate the buckets. Spec §6 updated to list five content codes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
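The corrected mapping from review issue 1 can be illustrated with a reduced enum. This is a sketch under assumptions: the real `ContextError` is `thiserror`-typed with seven variants, whereas only five are shown here, the `SelectorOutOfScope` variant name and all payload types are hypothetical, and `thiserror` is left out to keep the example dependency-free.

```rust
/// Reduced, hypothetical view of the error type. The four runtime
/// variant names come from the review notes; everything else is assumed.
#[derive(Debug)]
pub enum ContextError {
    SelectorOutOfScope(String),
    TraceRead(String),
    Io(std::io::Error),
    CargoManifestRead(String),
    CargoManifestParse(String),
}

impl ContextError {
    /// Runtime faults get their own content code so an agent does not
    /// mistake an I/O failure for a bad selector.
    pub fn content_code(&self) -> &'static str {
        match self {
            ContextError::SelectorOutOfScope(_) => "CONTEXT_SELECTOR_OUT_OF_SCOPE",
            ContextError::TraceRead(_)
            | ContextError::Io(_)
            | ContextError::CargoManifestRead(_)
            | ContextError::CargoManifestParse(_) => "CONTEXT_RUNTIME_ERROR",
        }
    }
}
```

Listing every variant explicitly, with no wildcard arm, means a newly added variant fails to compile until someone decides which bucket it belongs in.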

Conflict resolution:
- cert/trace/{hlr,llr,tests}.toml: PR 3 added HLR-074, LLR-084..086, TEST-091..094 at the tail; PR 4 had earlier added HLR-075, LLR-090, TEST-097 at the same tail. Resolved by ordering numerically: PR 3's entries first, then PR 4's, each with its own [[requirements]] / [[tests]] header. No content of either side was lost.
- cert/floors.toml: auto-merge took PR 3's bumps (trace_hlr 73→74, trace_llr 83→86, trace_test 88→92) but the post-merge actual counts are one higher in each dimension because PR 4's entries are also present. Bumped to actual: trace_hlr 75, trace_llr 87, trace_test 93.

Post-merge guardrail:
- CONTEXT_RUNTIME_ERROR (added in 7b516c8 to fix review issue 1) is emitted by cli::context::handle_runtime_error, so it must register in HAND_EMITTED_CLI_CODES for the every_rules_entry_is_implemented bijection check to pass. Adding to the set alongside the other four CONTEXT_* content codes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cert/floors.toml had slack on two dimensions after the four-PR merge:
- per_crate.cargo-evidence test_count: 158 → 166 (PR 2's golden-fixture + cli_context tests, PR 4's init_agent_context tests).
- per_crate.evidence-core test_count: 383 → 392 (PR 1's layered_claude_md_doctrine, PR 2's context::tests).

The floors_equal_current_no_slack gate enforces floor == current so a later PR can't delete tests along a ratcheted dimension without firing the gate. Bumping both to the post-merge measurements closes the slack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI on three hosts reported:

    DOCTOR_FLOORS_VIOLATED: floors breached: evidence-core/test_count current=384 floor=392

The discrepancy is an 8-test untracked file (crates/evidence-core/src/policy/boundary/tests.rs, 227 lines) in the local working tree but not on the remote. The 392 floor I set in 60a7ee6 was based on the local measurement; CI sees 384 without the untracked file. Dropping the floor to 384 matches the real, mergeable state.

The golden_context.json fixture is unchanged: it captures cargo-evidence's per-crate slice (test_count = 166), not evidence-core's, so the 392 → 384 change here has no fixture impact.

(Supersedes da5118d, which mistakenly bundled an empty golden_context.json from a backgrounded regen redirect.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
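The two floor commits above hit both sides of the ratchet: a floor below the measured count (slack) and a floor above it (breach). A rough sketch of that comparison, with hypothetical names `FloorStatus` and `compare_floor` that are not taken from the repository, could be:

```rust
use std::cmp::Ordering;

/// Hypothetical outcome of comparing one measured dimension to its floor.
#[derive(Debug, PartialEq, Eq)]
pub enum FloorStatus {
    /// floor == current: the state the no-slack gate wants.
    Exact,
    /// current is above the floor; the floor should be raised by this much.
    Slack { missing_bump: u64 },
    /// current is below the floor; the measurement fell short by this much.
    Breached { shortfall: u64 },
}

pub fn compare_floor(current: u64, floor: u64) -> FloorStatus {
    match current.cmp(&floor) {
        Ordering::Equal => FloorStatus::Exact,
        Ordering::Greater => FloorStatus::Slack {
            missing_bump: current - floor,
        },
        Ordering::Less => FloorStatus::Breached {
            shortfall: floor - current,
        },
    }
}
```

With the numbers from the messages, a floor of 383 against a local count of 392 is slack of 9, while CI's count of 384 against a floor of 392 is a breach of 8, which matches the 8-test untracked file.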

…embed CRLF

Windows Check (windows-latest) failed:

    context --json diverged from golden at line 15:
    current: "description": "...\r\nstrict-signature-missing...\r\n",
    golden:  "description": "...\nstrict-signature-missing...\n",

Root cause: cert/trace/*.toml carries multi-line `description = """ ... """` blocks. The CLI parses them into strings and embeds those strings in the JSON response. On Windows with core.autocrlf=true (default), git checks the TOML files out with CRLF; the CLI then emits "\r\n" in the JSON, breaking the byte-diff against the LF golden_context.json fixture (which is already pinned `binary` in .gitattributes).

Fix: pin every cert/**/*.toml to LF on checkout. Covers boundary.toml, floors.toml, profiles/*, and trace/*.toml, i.e. anything the tool reads into its serialized output. Matches the existing precedent for Cargo.lock + rust-toolchain.toml.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sokoly2024 and others added 9 commits May 19, 2026 17:44

luofang34 force-pushed the pr/agent-context-from-evidence branch from da5118d to ada31ba Compare May 21, 2026 12:25

luofang34 merged commit da3164e into main May 21, 2026
168 of 169 checks passed

luofang34 deleted the pr/agent-context-from-evidence branch May 21, 2026 13:10

luofang34 merged 10 commits into main from pr/agent-context-from-evidence
