feat(agent-context): layered CLAUDE.md doctrine + evidence_context CLI/MCP + init scaffold #128
Merged
… seed
PR 1 of the agent-context-from-evidence design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md: applies the lean-layered CLAUDE.md half of Anthropic's large-codebases guidance to this repo, mechanically enforced.
Per-crate CLAUDE.md files (each <= 80 lines, local conventions only):
- crates/evidence-core/CLAUDE.md
- crates/cargo-evidence/CLAUDE.md
- crates/evidence-mcp/CLAUDE.md
Root CLAUDE.md adds a pointer paragraph to the per-module agent-context surface (CLI + MCP, landing in PRs 2-3 on the same branch).
Trace chain seeded for this PR:
- SYS-031: per-module agent context surface (umbrella for all 4 PRs)
- HLR-072: repo demonstrates lean-layered CLAUDE.md doctrine
- LLR-079: layered_claude_md_doctrine enforces per-crate cap + scope
- TEST-086: every workspace crate ships a lean-layered CLAUDE.md
The doctrine test in crates/evidence-core/tests/layered_claude_md_doctrine.rs asserts for every workspace crate under crates/: the file exists, has >= 10 non-blank lines, is under 80 lines, and contains its scoped `cargo test -p <crate>` command. Adding a new workspace crate without a CLAUDE.md is a hard CI fail.
Subsequent PRs on this branch land the queryable surface:
- PR 2: evidence_core::context lib + `cargo evidence context` CLI
- PR 3: `evidence_context` MCP tool
- PR 4: `cargo evidence init --with-agent-context` scaffold for downstream
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
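The per-file checks the doctrine test is described as making can be sketched roughly as follows. This is a minimal illustration, not the contents of `layered_claude_md_doctrine.rs`: the function name `doctrine_violations` and both constants are hypothetical, and the cap is assumed to be inclusive (`<= 80`), since the message states it both as "<= 80 lines" and "under 80 lines".

```rust
/// Hypothetical limits, taken from the numbers quoted in the commit message.
const MIN_NON_BLANK_LINES: usize = 10;
const MAX_LINES: usize = 80;

/// Check one crate's CLAUDE.md contents against the doctrine.
/// Returns human-readable violations; an empty vector means the file passes.
fn doctrine_violations(crate_name: &str, contents: &str) -> Vec<String> {
    let mut violations = Vec::new();

    let total_lines = contents.lines().count();
    let non_blank = contents
        .lines()
        .filter(|line| !line.trim().is_empty())
        .count();

    if non_blank < MIN_NON_BLANK_LINES {
        violations.push(format!(
            "{crate_name}: only {non_blank} non-blank lines (need >= {MIN_NON_BLANK_LINES})"
        ));
    }

    // Assumption: the cap is inclusive, so exactly 80 lines still passes.
    if total_lines > MAX_LINES {
        violations.push(format!(
            "{crate_name}: {total_lines} lines exceeds the {MAX_LINES}-line cap"
        ));
    }

    // Each file must mention its own scoped test command.
    let scoped_cmd = format!("cargo test -p {crate_name}");
    if !contents.contains(&scoped_cmd) {
        violations.push(format!("{crate_name}: missing `{scoped_cmd}`"));
    }

    violations
}
```

A real test would presumably read `crates/<name>/CLAUDE.md` for every workspace crate and treat a missing file as a failure before running checks like these.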
PR 2 of the agent-context-from-evidence design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md: implements the per-module trace + boundary + floors + nearest CLAUDE.md slice an agent (or human) can query before editing a source file.
Library (`evidence_core::context`, 5 sub-files under `src/context/`):
- `resolve_selector(workspace_root, raw)` classifies the input as File > Crate > Module > Workspace (priority order on ambiguity).
- `context_for(workspace_root, selector)` composes a `ContextReport` by reading `cert/trace/*.toml` once, filtering LLRs whose `modules` field overlaps the selector's module space, rolling each LLR's `traces_to` up to HLR + SYS parents, collecting verifying tests, aggregating every `LLR.emits`, slicing per-crate floor / ceiling rows, and pointing at the nearest layered CLAUDE.md.
- `ContextReport` matches design spec §3.1 exactly; round-tripped through serde in unit tests; sorted deterministically (requirements by id, tests by name, codes alphabetically).
- `ContextError` is `thiserror`-typed with seven variants and a `content_code()` mapping to the `CONTEXT_*` content codes.
CLI (`cargo evidence context [<selector>] [--crate|--module] [--json|--format=jsonl]`):
- Three output shapes: human table (default), single pretty JSON, streaming JSONL (report blob + one diagnostic per warning + a `CONTEXT_OK` / `CONTEXT_FAIL` / `CONTEXT_ERROR` terminal).
- Exit codes: 0 for `CONTEXT_OK` and the graceful `CONTEXT_NO_TRACE_CONFIGURED` path, 1 for `CONTEXT_ERROR`, 2 for `CONTEXT_FAIL`.
- Three hand-built terminals registered in `TERMINAL_CODES`; four content codes registered in `RULES` under the new `Context` domain + `HAND_EMITTED_CLI_CODES`. Each is claimed by an LLR's `emits` field per the bijection invariants.
Tests:
- 15 unit tests under `crates/evidence-core/src/context/tests.rs` cover resolver classification (6 cases), lookup composition (7), and error variants (2).
- 4 integration tests at `crates/cargo-evidence/tests/cli_context.rs`: golden byte-diff (`tests/fixtures/golden_context.json`, regenerated via `tools/regen-golden-fixtures.sh`), non-adopter graceful path, invalid-selector failure path, and a human-mode smoke test.
Trace seed (implements the LLRs PR 1 set up):
- HLR-073: cargo evidence context CLI verb returns per-module trace slice.
- LLR-080: context::resolver classifies selectors with documented priority.
- LLR-081: context::lookup composes ContextReport from trace/boundary/floors/CLAUDE.md.
- LLR-082: cli::context wires CLI verb to context_for + emits CONTEXT_* terminals.
- LLR-083: context content codes register in RULES and gate the golden wire shape.
- TEST-087, TEST-088, TEST-089, TEST-090.
Surface catalog:
- Adds "context" to `KNOWN_SURFACES` (Group 1: CLI verbs).
- Adds "layered CLAUDE.md (root + crates/*/CLAUDE.md)" (Group 2: observable contracts) to claim HLR-072's surface, closing the bijection gap PR 1 left.
Floors bump (`cert/floors.toml`):
- workspace: diagnostic_codes 154→161, terminal_codes 15→18, trace_sys 30→31, trace_hlr 71→73, trace_llr 78→83, trace_test 83→88, known_surfaces 23→25.
- per_crate.evidence-core test_count 367→383 (15 new context unit tests + 1 from PR 1's doctrine test).
- per_crate.cargo-evidence test_count 154→158 (4 new cli_context integration tests).
Module size discipline (workspace 500-line cap):
- Split `crates/evidence-core/src/rules.rs` constructors into `rules/constructors.rs` (440 lines, was 512).
- Split `crates/cargo-evidence/src/cli/args.rs`'s schema clap types into `args/schema.rs` (474 lines, was 508).
Subsequent PRs on this branch:
- PR 3: `evidence_context` MCP tool, a single-blob response wrapping the same `ContextReport`, byte-locked against a parallel golden.
- PR 4: `cargo evidence init --with-agent-context` scaffold.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
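The selector priority and exit-code mapping described in this commit can be illustrated with a small sketch. The types below (`SelectorKind`, `Candidates`, `Outcome`) and both function names are hypothetical stand-ins, not the real `evidence_core::context` API. The sketch also assumes that an absent selector means the workspace overview, which the message does not state outright.

```rust
/// Selector kinds, listed in the documented priority order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SelectorKind {
    File,
    Crate,
    Module,
    Workspace,
}

/// Hypothetical probe results: everything the raw selector could refer to.
#[derive(Debug, Clone, Copy, Default)]
struct Candidates {
    is_file: bool,
    is_crate: bool,
    is_module: bool,
}

/// Pick the highest-priority kind (File > Crate > Module).
/// `None` means a selector was given but resolved to nothing.
fn classify(raw: Option<&str>, c: Candidates) -> Option<SelectorKind> {
    // Assumption: no selector at all asks for the workspace overview.
    if raw.is_none() {
        return Some(SelectorKind::Workspace);
    }
    if c.is_file {
        Some(SelectorKind::File)
    } else if c.is_crate {
        Some(SelectorKind::Crate)
    } else if c.is_module {
        Some(SelectorKind::Module)
    } else {
        None
    }
}

/// Outcomes named in the commit message, reduced to an enum for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Ok,                // CONTEXT_OK
    NoTraceConfigured, // graceful CONTEXT_NO_TRACE_CONFIGURED path
    Error,             // CONTEXT_ERROR
    Fail,              // CONTEXT_FAIL
}

/// Exit codes as listed above: 0 for the two success paths, 1 for error, 2 for fail.
fn exit_code(outcome: Outcome) -> i32 {
    match outcome {
        Outcome::Ok | Outcome::NoTraceConfigured => 0,
        Outcome::Error => 1,
        Outcome::Fail => 2,
    }
}
```

Keeping the priority in one `if` chain makes the ambiguity rule easy to audit: a string that is both a file path and a crate name always resolves as a file.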
…aude/settings.json
PR 4 of the agent-context-from-evidence design at docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md: make `cargo evidence init` write a starter root `CLAUDE.md` + a `.claude/settings.json` snippet that registers `evidence-mcp` as an MCP server alongside the existing `cert/` tree, so downstream adopters land on the same lean-layered convention this repo uses.
Defaults to on; `--no-agent-context` opts out (mutually exclusive with `--with-agent-context` via clap's `conflicts_with`). Existing files are never overwritten: pre-existing `CLAUDE.md` is preserved unchanged, and a pre-existing `.claude/settings.json` triggers a single stderr advisory describing the merge the user would do by hand.
Starter `CLAUDE.md` is intentionally lean (≤30 lines): title + project-description placeholder + one section pointing agents at `evidence_context` (MCP) / `cargo evidence context` for per-module trace + boundary + floors context + one section telling future humans to add `crates/<x>/CLAUDE.md` per workspace crate. No project rules; those belong to the downstream user, not this scaffold.
`init.rs` stays under the 500-line cap by carving the new emitter into `cli/init/agent_context.rs` (172 lines); `init.rs` grows by 13 lines (414 → 427) to wire the call through.
Trace chain seeded for this PR:
- HLR-075: cargo evidence init --with-agent-context scaffolds downstream CLAUDE.md
- LLR-090: write_agent_context_files emits root CLAUDE.md + .claude/settings.json
- TEST-097: init --with-agent-context scaffolds CLAUDE.md + .claude/settings.json (5 integration tests in crates/cargo-evidence/tests/init_agent_context.rs)
Side fixes carried in this PR to keep CI green on the branch:
- Add `"init"` and `"layered CLAUDE.md (root + crates/*/CLAUDE.md)"` to `KNOWN_SURFACES`. HLR-072 (PR 1) and HLR-075 (this PR) now resolve under `--require-hlr-surface-bijection`, which the trace_cmd_jsonl::trace_validate_jsonl_happy_path test exercises.
- Bump known_surfaces floor 23 → 25 to match.
- Replace `&PathBuf` with `&Path` in layered_claude_md_doctrine.rs so clippy's `ptr_arg` stays quiet under -D warnings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
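The "never overwrite existing files" rule can be sketched with the standard library's `create_new` open mode, which fails atomically if the path already exists. This is an illustrative helper under assumed names (`write_if_absent`, `WriteOutcome`), not the actual body of `write_agent_context_files`.

```rust
use std::fs::OpenOptions;
use std::io::{self, ErrorKind, Write};
use std::path::Path;

/// What happened to a scaffold file (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum WriteOutcome {
    Created,
    PreservedExisting,
}

/// Write `contents` to `path` only if nothing is there yet.
/// `create_new(true)` makes the existence check and the create one atomic step,
/// so a pre-existing file is never truncated.
fn write_if_absent(path: &Path, contents: &str) -> io::Result<WriteOutcome> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(mut file) => {
            file.write_all(contents.as_bytes())?;
            Ok(WriteOutcome::Created)
        }
        // The user's file wins; the caller decides whether to print an advisory.
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Ok(WriteOutcome::PreservedExisting),
        Err(e) => Err(e),
    }
}
```

A caller could map `PreservedExisting` on `.claude/settings.json` to the single stderr advisory the commit describes, and stay silent for a preserved `CLAUDE.md`.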
Conflict resolution:
- cert/trace/{hlr,llr,tests}.toml — both branches appended new
requirement/test blocks; resolved by keeping both (PR 2's HLR-073 /
LLR-080..083 / TEST-087..090 plus PR 4's HLR-075 / LLR-090 / TEST-097).
- crates/evidence-core/src/trace/surfaces.rs — both branches added
to KNOWN_SURFACES (PR 2 added the "context" verb, PR 4 added the
"layered CLAUDE.md..." contract). Auto-merge took both entries but
left CONTRACTS_START at 9; the verb group now has 10 entries
(including "context") so CONTRACTS_START bumps to 10.
- cert/floors.toml known_surfaces — bumped 25 → 26 to track the
catalog growth (one verb + one contract).
Both subagents independently fixed two latent gaps in PR 1: the
ptr_arg clippy nit on layered_claude_md_doctrine.rs::crate_dirs and
the missing "layered CLAUDE.md..." entry in KNOWN_SURFACES that left
HLR-072's surfaces claim dangling. The PR 2 fix landed in c8f88ee;
PR 4's identical fix is the no-op side of this merge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ontext
PR 3 of the agent-context-from-evidence design at
docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md:
exposes PR 2's CLI verb as the seventh `#[tool]` on the MCP `Server`
so MCP-connected agents reach the per-module trace + boundary + floors
slice without needing the human CLI surface.
Handler (`Server::evidence_context`, single-blob shape mirroring
`evidence_diff`):
- Validates the request at the handler layer: at most one of
`selector` / `crate_name` / `module` may be `Some` — supplying
more than one returns `Err(String)` instead of silently picking
one (same posture as `evidence_check`'s invalid `mode` value).
- Spawns `cargo evidence context [--crate <c> | --module <m> |
<selector>] --json` via the existing `subprocess::run_evidence`
plumbing. No new transport machinery.
- Parses the CLI's single JSON blob into the `context` field;
prepends workspace-fallback + version-skew warnings via the
shared response helpers.
- Tool-layer failures (subprocess can't spawn, times out, produces
malformed JSON) surface as well-formed responses with
`context = None`, `exit_code = 2`, structured `MCP_*` error
diagnostic. No new diagnostic codes — the content vocabulary
(`CONTEXT_*`) is owned by the wrapped CLI verb (PR 2).
Schema additions in `evidence-mcp::schema`:
- `ContextRequest { workspace_path, selector, crate_name, module }`
with `#[serde(deny_unknown_fields)]` (typo'd args fail loud).
- `ContextToolResponse { success, exit_code, context, warnings,
error }` matching `DiffToolResponse`'s blob-style shape.
Server-layer split (`server.rs` was approaching its 500-line cap):
- New `server/context.rs` holds the `SelectorArg` enum +
`pick_selector_arg` request-validation helper, plus five unit
tests covering the four happy-path arms and the rejection arm.
- New `responses::workspace_fallback_warning` helper returns
`Option<Value>` for the single-blob handlers (the streaming
helper `prepend_fallback_signal` keeps its in-place semantics).
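The request validation and CLI-argument mapping described above might look roughly like this. `SelectorArg` and `pick_selector_arg` are named in the commit, but the variants, signatures, and the `cli_args` helper shown here are assumptions made for illustration only.

```rust
/// The four accepted request shapes (variant names are assumed).
#[derive(Debug, PartialEq, Eq)]
enum SelectorArg {
    Workspace,
    Selector(String),
    Crate(String),
    Module(String),
}

/// Accept at most one of the three optional fields; reject anything else
/// instead of silently picking one.
fn pick_selector_arg(
    selector: Option<String>,
    crate_name: Option<String>,
    module: Option<String>,
) -> Result<SelectorArg, String> {
    match (selector, crate_name, module) {
        (None, None, None) => Ok(SelectorArg::Workspace),
        (Some(s), None, None) => Ok(SelectorArg::Selector(s)),
        (None, Some(c), None) => Ok(SelectorArg::Crate(c)),
        (None, None, Some(m)) => Ok(SelectorArg::Module(m)),
        _ => Err("at most one of `selector`, `crate_name`, `module` may be set".to_string()),
    }
}

/// Build the argument list for `cargo evidence context ... --json`
/// (hypothetical helper; the real handler goes through `subprocess::run_evidence`).
fn cli_args(arg: &SelectorArg) -> Vec<String> {
    let mut args = vec!["evidence".to_string(), "context".to_string()];
    match arg {
        SelectorArg::Workspace => {}
        SelectorArg::Selector(s) => args.push(s.clone()),
        SelectorArg::Crate(c) => {
            args.push("--crate".to_string());
            args.push(c.clone());
        }
        SelectorArg::Module(m) => {
            args.push("--module".to_string());
            args.push(m.clone());
        }
    }
    args.push("--json".to_string());
    args
}
```

Matching on the full tuple keeps the four happy-path arms and the single rejection arm visible in one place, which lines up with the five unit tests mentioned above.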
Tests:
- `crates/evidence-mcp/tests/context_roundtrip.rs` (5 cases):
workspace-overview returns `selector.kind == "workspace"`; file
selector resolves into a crate with ≥1 governing LLR; unmapped
file (`Cargo.toml`) stays well-formed on the failure path;
mutual-exclusivity guard rejects multi-field requests; typo'd
field rejected by `deny_unknown_fields`.
- `mcp_surface::tools_list_advertises_all_six_verbs` updated to
expect `evidence_context` (now seven verbs).
- Helpers tweak: hold stdin open until every expected response is
read. rmcp's stdio loop hits its drain-timeout the moment stdin
returns EOF, truncating any tool-call response whose handler is
still running a subprocess (the longer-running context handler
surfaced this).
Trace chain seeded:
- HLR-074 (`f88508e9-eb70-4ee6-bb3e-13f07e4f181b`): claims surface
"context" alongside HLR-073 (PR 2's CLI verb).
- LLR-084 / LLR-085 / LLR-086: handler wiring, request/response
schema, integration-test pin.
- TEST-091 / TEST-092 / TEST-093 / TEST-094: covering the
integration tests + the request-mapping unit tests.
- `cert/floors.toml`: trace_hlr 73→74, trace_llr 83→86, trace_test
88→92, per_crate.evidence-mcp.test_count 45→55.
No new diagnostic codes; `MCP_MALFORMED_JSONL` covers the
JSON-parse failure path same as `evidence_diff`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issues from local-branch code review:
1. ContextError::content_code() mislabelled runtime variants.
TraceRead / Io / CargoManifestRead / CargoManifestParse all
returned "CONTEXT_SELECTOR_OUT_OF_SCOPE" — an IO fault would
surface to agents as if the user's selector was bad, leading them
to re-prompt the user instead of reporting the real cause.
Fix: introduce CONTEXT_RUNTIME_ERROR (severity Error, content
code) and route the four runtime variants to it. CONTEXT_SELECTOR_
OUT_OF_SCOPE stays scoped to genuine selector-not-resolved cases.
- cert/floors.toml: diagnostic_codes 161 → 162.
- cert/trace/llr.toml: LLR-083 emits the new code, description
updated from "four content-level" to "five" with the new
variant called out.
- cert/trace/tests.toml: TEST-086 migrated from singular
test_selector to the canonical test_selectors = [...]
plural shape (review issue 5 — schema migration consistency).
- golden_rules.json + golden_context.json regenerated.
2. match_file silently swallowed canonicalize() failures.
Permission denied on a parent dir or a symlink to nonexistent
target made the resolver return None — same misclassification
problem as issue 1.
Fix: match_file now returns Result<Option<...>, ContextError>
and propagates IO via ContextError::Io, which maps to the new
CONTEXT_RUNTIME_ERROR content code.
A new unit test
context::tests::runtime_error_variants_map_to_distinct_content_code
pins the mapping so a future enum edit can't re-conflate the buckets.
Spec §6 updated to list five content codes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
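Both review fixes can be illustrated together in one minimal sketch. The enum below lists only the four runtime variants named in the review plus one hypothetical selector variant (`SelectorOutOfScope`). The real `ContextError` has seven variants and is `thiserror`-typed. The `match_file` signature and its "outside the workspace" rule are likewise assumptions.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Reduced stand-in for the real error type.
#[derive(Debug)]
enum ContextError {
    SelectorOutOfScope(String), // hypothetical name for the selector-not-resolved case
    TraceRead(String),
    Io(io::Error),
    CargoManifestRead(String),
    CargoManifestParse(String),
}

impl ContextError {
    /// Runtime faults get their own code so they are not reported as a bad selector.
    fn content_code(&self) -> &'static str {
        match self {
            Self::SelectorOutOfScope(_) => "CONTEXT_SELECTOR_OUT_OF_SCOPE",
            Self::TraceRead(_)
            | Self::Io(_)
            | Self::CargoManifestRead(_)
            | Self::CargoManifestParse(_) => "CONTEXT_RUNTIME_ERROR",
        }
    }
}

/// Resolve `raw` to a path relative to the workspace root.
/// `Ok(None)` means "a real path, but outside the workspace" (an assumption);
/// any `canonicalize()` failure is propagated rather than swallowed.
fn match_file(workspace_root: &Path, raw: &Path) -> Result<Option<PathBuf>, ContextError> {
    let canonical = raw.canonicalize().map_err(ContextError::Io)?;
    let root = workspace_root.canonicalize().map_err(ContextError::Io)?;
    Ok(canonical.strip_prefix(&root).ok().map(Path::to_path_buf))
}
```

With this shape, a permission error or dangling symlink reaches the caller as `ContextError::Io` and therefore as `CONTEXT_RUNTIME_ERROR`, while `Ok(None)` stays reserved for paths that genuinely do not match.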
Conflict resolution:
- cert/trace/{hlr,llr,tests}.toml — PR 3 added HLR-074, LLR-084..086,
TEST-091..094 at the tail; PR 4 had earlier added HLR-075, LLR-090,
TEST-097 at the same tail. Resolved by ordering numerically:
PR 3's entries first, then PR 4's, each with its own [[requirements]]
/ [[tests]] header. No content of either side was lost.
- cert/floors.toml — auto-merge took PR 3's bumps (trace_hlr 73→74,
trace_llr 83→86, trace_test 88→92) but the post-merge actual counts
are one higher in each dimension because PR 4's entries are also
present. Bumped to actual: trace_hlr 75, trace_llr 87, trace_test 93.
Post-merge guardrail:
- CONTEXT_RUNTIME_ERROR (added in 7b516c8 to fix review issue 1) is
emitted by cli::context::handle_runtime_error, so it must register
in HAND_EMITTED_CLI_CODES for the every_rules_entry_is_implemented
bijection check to pass. Adding to the set alongside the other four
CONTEXT_* content codes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cert/floors.toml had slack on two dimensions after the four-PR merge:
- per_crate.cargo-evidence test_count: 158 → 166 (PR 2's golden-fixture + cli_context tests, PR 4's init_agent_context tests).
- per_crate.evidence-core test_count: 383 → 392 (PR 1's layered_claude_md_doctrine, PR 2's context::tests).
The floors_equal_current_no_slack gate enforces floor == current so a later PR can't delete tests along a ratcheted dimension without firing the gate. Bumping both to the post-merge measurements closes the slack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
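A floor == current comparison of the kind this gate is said to enforce could be sketched as below. The function name `slack_violations`, the flat dimension keys, and the message wording are all hypothetical; the real gate reads `cert/floors.toml` and measured counts.

```rust
use std::collections::BTreeMap;

/// Compare recorded floors with current measurements.
/// Any dimension where floor != current is reported: a higher current value
/// is slack, a lower one is a breach.
fn slack_violations(
    floors: &BTreeMap<String, u64>,
    current: &BTreeMap<String, u64>,
) -> Vec<String> {
    let mut out = Vec::new();
    // BTreeMap iterates in key order, so the report is deterministic.
    for (dim, floor) in floors {
        match current.get(dim) {
            Some(cur) if cur == floor => {}
            Some(cur) if cur > floor => {
                out.push(format!("{dim}: slack, floor={floor} current={cur}"))
            }
            Some(cur) => out.push(format!("{dim}: breached, floor={floor} current={cur}")),
            None => out.push(format!("{dim}: no current measurement")),
        }
    }
    out
}
```

Treating slack as a failure is what makes the ratchet tight: if the floor lagged behind the current count, tests could be deleted down to the stale floor without any gate firing.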
CI on three hosts reported:
DOCTOR_FLOORS_VIOLATED: floors breached: evidence-core/test_count current=384 floor=392
The discrepancy is an 8-test untracked file (crates/evidence-core/src/policy/boundary/tests.rs, 227 lines) in the local working tree but not on the remote. The 392 floor I set in 60a7ee6 was based on the local measurement; CI sees 384 without the untracked file. Dropping the floor to 384 matches the real, mergeable state.
The golden_context.json fixture is unchanged: it captures cargo-evidence's per-crate slice (test_count = 166), not evidence-core's, so the 392 → 384 change here has no fixture impact.
(Supersedes da5118d, which mistakenly bundled an empty golden_context.json from a backgrounded regen redirect.)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
da5118d to
ada31ba
…embed CRLF
Windows Check (windows-latest) failed:
context --json diverged from golden at line 15:
current: "description": "...\r\nstrict-signature-missing...\r\n",
golden: "description": "...\nstrict-signature-missing...\n",
Root cause: cert/trace/*.toml carries multi-line `description = """ ... """`
blocks. The CLI parses them into strings and embeds those strings in
the JSON response. On Windows with core.autocrlf=true (default), git
checks the TOML files out with CRLF; the CLI then emits "\r\n" in the
JSON, breaking the byte-diff against the LF golden_context.json
fixture (which is already pinned `binary` in .gitattributes).
Fix: pin every cert/**/*.toml to LF on checkout. Covers boundary.toml,
floors.toml, profiles/*, and trace/*.toml — anything the tool reads
into its serialized output.
Matches the existing precedent for Cargo.lock + rust-toolchain.toml.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Lands the four-PR design at `docs/superpowers/specs/2026-05-19-agent-context-from-evidence-design.md` as one branch. Applies Anthropic's large-codebases guidance (lean layered CLAUDE.md + MCP for structured queries) by surfacing the existing `cert/trace/` graph through a new per-module agent-context surface.
- (`c5ac789`): doctrine + spec + trace seed; root `CLAUDE.md` pointer, 3 per-crate `CLAUDE.md` (≤80 lines each), `layered_claude_md_doctrine` enforces the convention, SYS-031 + HLR-072 + LLR-079 + TEST-086.
- (`c8f88ee`): `evidence_core::context` library + `cargo evidence context` CLI verb. Returns the trace + boundary + floors slice for any file / crate / module selector. Adds 4 `CONTEXT_*` content codes + 3 terminals. Drive-by: fills the `KNOWN_SURFACES` gap PR 1 left dangling.
- (`c6eab78`, `765b275`): `cargo evidence init --with-agent-context` scaffolds a starter `CLAUDE.md` + `.claude/settings.json` for downstream adopters.
- (`3a1c6d7`, `1a911ae`): `evidence_context` MCP tool (single-blob shape, mirrors `evidence_diff`). 7th tool method on `Server`.
- (`7b516c8`): distinct `CONTEXT_RUNTIME_ERROR` content code so I/O faults don't masquerade as `CONTEXT_SELECTOR_OUT_OF_SCOPE`. New unit test pins the mapping.
- (`60a7ee6`): per-crate `test_count` floors closed to current (evidence-core 392, cargo-evidence 166). Golden fixture regenerated.
Test plan
- `cargo fmt --all -- --check` passes (CI: fmt gate)
- `cargo clippy --workspace --all-targets -- -D warnings` passes (CI: clippy gate)
- `cargo test --workspace` passes (CI: test gate); every individual suite verified green locally; full suite blocked by an environmental disk issue on the dev box that doesn't apply in CI
- `cargo doc --workspace --no-deps` passes with `-D rustdoc::broken_intra_doc_links -D warnings` (CI: doc gate)
- `cargo build --workspace --release` (CI: release-build gate)
- `cargo evidence trace --validate --require-hlr-sys-trace --require-hlr-surface-bijection --check-test-selectors` (CI: trace gate)
- `cargo evidence check .` self-check (CI: agent-facing one-shot)
- `cargo evidence doctor` self-dogfood (CI: rigor audit)
- `cargo evidence floors` ratchet gate (CI: floors-only-up)
- `evidence-mcp` `tools/call(evidence_rules)` round-trip (CI: MCP smoke)
- `deterministic_hash` parity (CI: determinism-compare)
🤖 Generated with Claude Code