
[P1] Add evidence blocks to Madar MCP responses #348

Merged: mohanagy merged 1 commit into next from issue-339-madar-response-contract on May 27, 2026


@mohanagy (Owner) commented May 27, 2026

Closes #339

Summary

  • add deterministic top-level evidence blocks across the relevant Madar MCP response surfaces
  • update install guidance/templates to gate exploration on evidence.agent_directive and its three literal branches
  • record agent_directive_seen in compare traces and document the response-shape mapping
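The third item, recording agent_directive_seen, can be pictured as a small per-turn extraction pass. This is a sketch under stated assumptions, not the helpers in src/infrastructure/compare.ts: it assumes Anthropic-style transcripts where tool_use blocks carry id/name and tool_result blocks carry tool_use_id, that entries expose blocks either as content or as message.content, and that tool-result content is either a JSON string or a list of text parts. directivesSeenInTurn and isMadarTool are hypothetical names.

```typescript
// Hypothetical sketch; not the actual helpers in src/infrastructure/compare.ts.
type AgentDirective = 'answer_from_pack' | 'verify_one_targeted_file' | 'explore_with_caution'

const DIRECTIVES = new Set<string>(['answer_from_pack', 'verify_one_targeted_file', 'explore_with_caution'])

// Transcript entries may expose blocks as `content` or as `message.content`.
interface TraceMessage {
  content?: unknown
  message?: { content?: unknown }
}

type Block = Record<string, unknown>

function contentBlocks(msg: TraceMessage): Block[] {
  const content = msg.content ?? msg.message?.content
  if (!Array.isArray(content)) return []
  return content.filter((b): b is Block => typeof b === 'object' && b !== null)
}

// Tool-result content may be a JSON string or a list of { type: 'text', text } parts.
function parseToolResult(content: unknown): unknown {
  if (typeof content === 'string') {
    try {
      return JSON.parse(content)
    } catch {
      return undefined
    }
  }
  if (Array.isArray(content)) {
    for (const part of content) {
      const text = typeof part === 'object' && part !== null ? (part as Block).text : undefined
      if (typeof text === 'string') {
        const parsed = parseToolResult(text)
        if (parsed !== undefined) return parsed
      }
    }
  }
  return undefined
}

// Collect evidence.agent_directive values from Madar tool results in one turn,
// deduplicated and kept in first-seen order.
function directivesSeenInTurn(
  messages: TraceMessage[],
  isMadarTool: (name: string) => boolean,
): AgentDirective[] {
  const toolNames = new Map<string, string>() // tool_use id -> tool name
  const seen = new Set<AgentDirective>()
  for (const msg of messages) {
    for (const block of contentBlocks(msg)) {
      const { type, id, name, tool_use_id: toolUseId } = block
      if (type === 'tool_use' && typeof id === 'string' && typeof name === 'string') {
        toolNames.set(id, name)
        continue
      }
      if (type !== 'tool_result' || typeof toolUseId !== 'string') continue
      const toolName = toolNames.get(toolUseId)
      if (toolName === undefined || !isMadarTool(toolName)) continue
      const payload = parseToolResult(block.content) as
        | { evidence?: { agent_directive?: unknown } }
        | null
        | undefined
      const directive = payload?.evidence?.agent_directive
      if (typeof directive === 'string' && DIRECTIVES.has(directive)) {
        seen.add(directive as AgentDirective)
      }
    }
  }
  return Array.from(seen)
}
```

Mapping tool_use_id back to the tool name is what lets the pass ignore directives that happen to appear in non-Madar tool output.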

Testing

  • npm run typecheck
  • npm run build
  • CI=1 npm run test:run

Summary by CodeRabbit

  • New Features

    • MCP responses now include evidence blocks containing pack confidence, coverage status, and agent directives (answer_from_pack, verify_one_targeted_file, explore_with_caution) to guide exploration.
    • Response shapes extended with context-pack data (claims, expandable references, coverage, missing context).
    • Agent directives tracked during tool execution traces.
  • Documentation

    • Added MCP response shape documentation defining evidence fields and mapping rules.
  • Tests

    • Added tests validating evidence blocks in MCP responses and documentation content.
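The three directives correspond to concrete exploration budgets in the strict install rules: answer_from_pack (use the pack, read at most one file, no search), verify_one_targeted_file (use the pack, read one specific file, no search), and explore_with_caution (one targeted Glob/Grep per directory). A hypothetical client-side guard enforcing those budgets might look like the sketch below; ExplorationGuard and BUDGETS are illustrative names, and leaving file reads uncapped under explore_with_caution is an assumption, since the rules as summarized only cap searches for that branch.

```typescript
// Hypothetical guard derived from the install-rule wording; not part of Madar.
type AgentDirective = 'answer_from_pack' | 'verify_one_targeted_file' | 'explore_with_caution'

interface ExplorationBudget {
  maxFileReads: number // files the agent may open after receiving the pack
  maxSearchesPerDirectory: number // targeted Glob/Grep calls allowed per directory
}

const BUDGETS: Record<AgentDirective, ExplorationBudget> = {
  answer_from_pack: { maxFileReads: 1, maxSearchesPerDirectory: 0 },
  verify_one_targeted_file: { maxFileReads: 1, maxSearchesPerDirectory: 0 },
  // Assumption: reads are not capped here; only searches are limited.
  explore_with_caution: { maxFileReads: Number.POSITIVE_INFINITY, maxSearchesPerDirectory: 1 },
}

class ExplorationGuard {
  private readonly budget: ExplorationBudget
  private reads = 0
  private readonly searches = new Map<string, number>()

  constructor(directive: AgentDirective) {
    this.budget = BUDGETS[directive]
  }

  // Returns true and records the read if the budget still allows it.
  tryRead(): boolean {
    if (this.reads >= this.budget.maxFileReads) return false
    this.reads += 1
    return true
  }

  // Searches are budgeted per directory, so a Grep in src/ does not use up tests/.
  trySearch(directory: string): boolean {
    const used = this.searches.get(directory) ?? 0
    if (used >= this.budget.maxSearchesPerDirectory) return false
    this.searches.set(directory, used + 1)
    return true
  }
}
```

Keeping the budgets in a single table makes the directive-to-behavior mapping as deterministic on the client side as the evidence block is on the server side.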


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai (Bot) commented May 27, 2026

📝 Walkthrough

This PR implements the MCP response evidence shape from issue #339: every Madar MCP tool response now carries a deterministic top-level evidence block that exposes pack_confidence, coverage, agent_directive, missing_phases, and covered_workflow_owners fields. Install rules and agent routing templates are updated to explicitly gate exploration scope on evidence.agent_directive instead of internal confidence floats, making the rules mechanically enforceable by the agent.
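The evidence fields and the deterministic mapping can be sketched as follows. The three union type names and the five evidence fields come from the PR; the CoverageInput shape (lists of missing required and semantic items) is an assumption, and pack confidence is taken as an input because the penalty weights used for scoring are not described. deriveCoverage and selectDirective are hypothetical names.

```typescript
// Type names mirror those listed for src/runtime/mcp-response-evidence.ts;
// CoverageInput and the helper names are hypothetical.
type MadarResponsePackConfidence = 'high' | 'medium' | 'low'
type MadarResponseCoverage = 'complete' | 'partial' | 'unknown'
type MadarResponseAgentDirective =
  | 'answer_from_pack'
  | 'verify_one_targeted_file'
  | 'explore_with_caution'

interface MadarResponseEvidence {
  pack_confidence: MadarResponsePackConfidence
  coverage: MadarResponseCoverage
  agent_directive: MadarResponseAgentDirective
  missing_phases: string[]
  covered_workflow_owners: string[]
}

// Assumed input: items the pack failed to cover.
interface CoverageInput {
  missing_required: string[]
  missing_semantic: string[]
}

// unknown when there is no coverage data, complete when nothing is missing,
// partial when at least one gap remains.
function deriveCoverage(coverage: CoverageInput | undefined): MadarResponseCoverage {
  if (coverage === undefined) return 'unknown'
  const gaps = coverage.missing_required.length + coverage.missing_semantic.length
  return gaps === 0 ? 'complete' : 'partial'
}

// complete + high -> answer from the pack; otherwise known coverage with
// non-low confidence -> verify one file; everything else -> explore with caution.
function selectDirective(
  coverage: MadarResponseCoverage,
  confidence: MadarResponsePackConfidence,
): MadarResponseAgentDirective {
  if (coverage === 'complete' && confidence === 'high') return 'answer_from_pack'
  if (coverage !== 'unknown' && confidence !== 'low') return 'verify_one_targeted_file'
  return 'explore_with_caution'
}
```

Because the directive depends only on two small enums, a client can branch on evidence.agent_directive without ever reading the internal confidence floats.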

Changes

MCP Response Evidence Shape and Agent Directives

Layer / File(s) / Summary
Evidence type system and scoring logic
src/runtime/mcp-response-evidence.ts, docs/mcp-response-shape.md
Defines MadarResponsePackConfidence (high/medium/low), MadarResponseCoverage (complete/partial/unknown), and MadarResponseAgentDirective (answer_from_pack/verify_one_targeted_file/explore_with_caution) types. Implements confidence scoring from coverage data with penalties for missing required/semantic items; coverage status derivation (unknown when coverage data is absent, complete when coverage is full, partial when some gap remains); and directive selection (complete coverage with high confidence → answer_from_pack; otherwise, coverage other than unknown with confidence other than low → verify_one_targeted_file; otherwise → explore_with_caution). The buildMadarResponseEvidence factory deduplicates and normalizes inputs, caps workflow owners at 5, and returns a fully populated evidence block. The documentation specifies the deterministic mapping and the operational meaning of each directive.
Madar trace directive extraction and recording
src/infrastructure/compare.ts, tests/unit/compare.test.ts
CompareMadarTraceTurnSummary adds optional agent_directive_seen array field. New trace-parsing helpers normalize Anthropic message shapes (content vs message.content), parse tool-result payloads (including JSON-string payloads), and extract deduplicated agent directives from tool-result evidence when the tool is recognized as a Madar trace tool. extractMadarTrace refactored to accumulate directives per turn and merge into existing per-turn entries with deduplication. Test verifies directive extraction and persistence in madar_trace.per_turn.
MCP tool response evidence wiring
src/runtime/stdio/tools.ts
Imports buildMadarResponseEvidence and ContextPackExecutionPhase. Implements collectWorkflowOwners (deduplicated owner extraction, capped to 5), missingPhasesFromPayload (derives phases from answer_contract or execution_slice shapes), and evidence builders for retrieve, path-based, impact, and graph-summary payloads. Attaches evidence to all tool responses: context_pack (delta and full), retrieve, context_prompt, relevant_files, feature_map, risk_map, implementation_checklist, graph_summary, impact, and pr_impact. Context pack responses include evidence from coverage plus derived missing phases and workflow owners.
Result shape extensions for context-pack data
src/runtime/implementation-checklist.ts, src/runtime/risk-map.ts
ImplementationChecklistResult and RiskMapResult extended to include claims, expandable, and optional coverage/missing_context. Implementations updated to populate these fields from feature and retrieve results, conditionally spreading coverage data when present. Type-only imports added for context-pack structures.
Install rule and template evidence-driven guidance
src/infrastructure/install.ts, src/infrastructure/install-skill-templates.ts
Strict instruction rule builders (strictNonMadarMcpRule, strictSkillOverrideRule, strictContextPackStopRule, strictContextPackExpandRule) updated to reference evidence.agent_directive with explicit meanings: answer_from_pack (use pack, read ≤1 file, no search), verify_one_targeted_file (use pack, read 1 specific file, no search), explore_with_caution (pack partial, allow 1 targeted Glob/Grep per directory). Codex template updated to enforce stricter restrictions on broad searches after high/medium-confidence packs and tie deeper exploration to explore_with_caution with missing_context/missing_semantic constraints.
Install rule and documentation test coverage
tests/unit/install.test.ts, tests/unit/install-templates.test.ts, tests/unit/mcp-response-shape-doc.test.ts, tests/unit/stdio-server.test.ts
Strict instruction test constants updated to assert evidence.agent_directive wording and revised exploration/expansion rules. Codex template test verifies directive branches and all three directive identifiers present. MCP response shape doc test verifies required headings, field identifiers, and coverage-to-confidence mappings. Comprehensive tool integration test invokes all MCP tools and asserts each includes evidence with expected field types and formats.
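The review's suggested patch for context_pack impact responses calls collectWorkflowOwners with several lists at once (target file, affected files, direct and transitive dependents). A minimal sketch of the dedupe-and-cap behavior described for src/runtime/stdio/tools.ts, assuming owners are plain path strings and that trimming whitespace is the only normalization; the real helper may normalize differently.

```typescript
// Hypothetical sketch of collectWorkflowOwners: merge several owner lists,
// drop blanks and duplicates, keep first-seen order, and cap the result at 5.
const MAX_WORKFLOW_OWNERS = 5

function collectWorkflowOwners(...groups: Array<readonly string[]>): string[] {
  const owners: string[] = []
  const seen = new Set<string>()
  for (const group of groups) {
    for (const raw of group) {
      const owner = raw.trim() // assumed normalization
      if (owner === '' || seen.has(owner)) continue
      seen.add(owner)
      owners.push(owner)
      if (owners.length === MAX_WORKFLOW_OWNERS) return owners // stop early at the cap
    }
  }
  return owners
}
```

Listing the most specific group first (for example, the target file) means the cap drops the more distant transitive dependents rather than the file the agent asked about.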

🎯 Estimated code review effort: 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • mohanagy/madar#347: Refactors Madar trace extraction in src/infrastructure/compare.ts with new per-turn/tool-result classification logic that feeds the directive extraction added by this PR.
  • mohanagy/madar#345: Adds snippet and budget controls to retrieve MCP tool response construction in src/runtime/stdio/tools.ts, which is wired together with the evidence block added here.
  • mohanagy/madar#321: Updates "when to stop vs expand exploration" behavior in src/infrastructure/install.ts and trace reporting in src/infrastructure/compare.ts, directly complementing this PR's directive-driven install rules.

Poem

🐰 The packs now speak in signals clear,
"High confidence, answer without fear!"
Or "One deep read to verify,"
Or "Careful search, don't run too dry."
Three simple directives set the way,
Agents know exactly what to say!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check (✅ Passed): The PR title clearly summarizes the main change: adding evidence blocks to Madar MCP responses, which is the core objective of the PR.
  • Description check (✅ Passed): The PR description covers the summary section and testing section with validation steps, but is missing explicit checkboxes for the required testing items in the template.
  • Linked Issues check (✅ Passed): The PR fully implements the acceptance criteria from issue #339: evidence blocks with pack_confidence/coverage/agent_directive are added to all specified MCP responses, deterministic mapping is documented, schema tests verify the evidence field, install templates reference evidence.agent_directive with three branches, and madar_trace records agent_directive_seen.
  • Out of Scope Changes check (✅ Passed): All changes are directly aligned with issue #339 objectives: documentation of response shape, evidence block implementation across MCP tools, install rule updates, trace recording, and test coverage; no out-of-scope changes detected.



Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.




@coderabbitai (Bot) left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/runtime/stdio/tools.ts (1)

1206-1211: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add evidence to context_pack review/impact responses.

Line 1211 and Line 1241 return context_pack payloads without top-level evidence. That leaves strict clients without evidence.agent_directive for these two task modes.

Suggested patch
-        const payload = {
+        const payload = {
           ...contextPackBasePayload(task, prompt, resolvedBudget, graphPath, plan),
           pack: compactPack,
           ...reviewMetadata,
+          evidence: buildMadarResponseEvidence({
+            coverage: prResult.review_bundle.coverage,
+            missingPhases: missingPhasesFromPayload(prResult.review_bundle),
+            coveredWorkflowOwners: collectWorkflowOwners(prResult.changed_files),
+          }),
         }
         return helpers.ok(id, helpers.textToolResult(JSON.stringify(payload)))
@@
         return helpers.ok(id, helpers.textToolResult(JSON.stringify({
           ...contextPackBasePayload(task, prompt, resolvedBudget, graphPath, initialPlan),
           target: impactTarget,
           pack: impactPack,
           ...metadata,
+          evidence: buildMadarResponseEvidence({
+            coverage: metadata.coverage,
+            coveredWorkflowOwners: collectWorkflowOwners(
+              impactResult.target_file ? [impactResult.target_file] : [],
+              impactResult.affected_files,
+              impactResult.direct_dependents.map((entry) => entry.source_file),
+              impactResult.transitive_dependents.map((entry) => entry.source_file),
+            ),
+          }),
         })))

Also applies to: 1236-1241

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 1206 - 1211, The payload returned
for the context_pack review/impact responses (the local variable payload built
from contextPackBasePayload, compactPack, and reviewMetadata and then returned
via helpers.ok(id, helpers.textToolResult(JSON.stringify(payload)))) must
include a top-level evidence field; update those return sites (the one that
builds payload with compactPack and the other similar return around lines
1236-1241) to merge in an evidence object that at minimum contains
agent_directive (e.g., evidence: { agent_directive: <appropriate directive
string> }) so strict clients receive evidence.agent_directive; keep using the
same payload variable and ensure the modified payload is JSON-stringified in the
helpers.textToolResult call.
🧹 Nitpick comments (1)
tests/unit/stdio-server.test.ts (1)

827-913: ⚡ Quick win

Expand this contract test to include pr_impact and context_prompt evidence.

Nice coverage for most tools, but this shared evidence assertion currently skips two evidence-enabled surfaces from the PR contract, so regressions there can slip through.

Suggested patch
       const calls = {
@@
         impact: await Promise.resolve(handleStdioRequest(graphPath, {
           id: 107,
           method: 'tools/call',
           params: {
             name: 'impact',
             arguments: {
               label: 'AuthService',
               depth: 3,
             },
           },
         })),
+        context_prompt: await Promise.resolve(handleStdioRequest(graphPath, {
+          id: 108,
+          method: 'tools/call',
+          params: {
+            name: 'context_prompt',
+            arguments: {
+              prompt: 'How does auth reach transport?',
+              provider: 'gemini',
+            },
+          },
+        })),
       } as const

Then add the same evidence assertions for a pr_impact response in the existing pr_impact test block (that fixture already sets up git state).
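One way to keep the new context_prompt and pr_impact assertions identical to the existing ones is a shared helper. This sketch assumes each handleStdioRequest result is a JSON-RPC envelope whose result.content[0].text holds the JSON-stringified payload, which is what helpers.textToolResult(JSON.stringify(payload)) suggests; expectEvidence and ToolCallResponse are hypothetical names.

```typescript
// Hypothetical shared assertion for evidence-enabled tool responses.
// Assumes result.content[0].text is the JSON-stringified tool payload.
const PACK_CONFIDENCE = ['high', 'medium', 'low']
const COVERAGE = ['complete', 'partial', 'unknown']
const AGENT_DIRECTIVES = ['answer_from_pack', 'verify_one_targeted_file', 'explore_with_caution']

interface ToolCallResponse {
  result?: { content?: Array<{ type: string; text?: string }> }
}

function expectEvidence(tool: string, response: ToolCallResponse): void {
  const text = response.result?.content?.[0]?.text
  if (typeof text !== 'string') throw new Error(`${tool}: missing text tool result`)
  const parsed = JSON.parse(text) as { evidence?: unknown } | null
  const evidence = parsed?.evidence
  if (typeof evidence !== 'object' || evidence === null) {
    throw new Error(`${tool}: missing top-level evidence`)
  }
  const ev = evidence as Record<string, unknown>
  const owners = ev.covered_workflow_owners
  const checks: Array<[string, boolean]> = [
    ['pack_confidence', PACK_CONFIDENCE.indexOf(String(ev.pack_confidence)) !== -1],
    ['coverage', COVERAGE.indexOf(String(ev.coverage)) !== -1],
    ['agent_directive', AGENT_DIRECTIVES.indexOf(String(ev.agent_directive)) !== -1],
    ['missing_phases', Array.isArray(ev.missing_phases)],
    // The PR caps covered workflow owners at 5.
    ['covered_workflow_owners', Array.isArray(owners) && owners.length <= 5],
  ]
  for (const [field, ok] of checks) {
    if (!ok) throw new Error(`${tool}: invalid evidence.${field}`)
  }
}
```

In the contract test it could run over every entry in calls, and once more on the pr_impact response inside its git-backed test block, so all evidence-enabled surfaces share one definition of a valid block.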

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/stdio-server.test.ts` around lines 827 - 913, Add two missing tool
calls to the test's calls fixture and assert their evidence fields: update the
calls object (where handleStdioRequest is invoked for context_pack/retrieve/...)
to include entries for 'context_prompt' and 'pr_impact' (use the same pattern:
id, method 'tools/call', params.name set to 'context_prompt' or 'pr_impact' and
appropriate params.arguments); then, in the existing pr_impact test block (the
test that already sets up git state), add the same evidence assertions used for
other evidence-enabled tools to validate the 'evidence' structure on the
pr_impact response. Ensure you reference the same response shapes used for
'context_pack'/'retrieve' assertions so coverage matches the other tools.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d5a0d69b-ba15-4b10-b401-99b52c6f8624

📥 Commits

Reviewing files that changed from the base of the PR and between ef71158 and 79d968b.

📒 Files selected for processing (13)
  • docs/mcp-response-shape.md
  • src/infrastructure/compare.ts
  • src/infrastructure/install-skill-templates.ts
  • src/infrastructure/install.ts
  • src/runtime/implementation-checklist.ts
  • src/runtime/mcp-response-evidence.ts
  • src/runtime/risk-map.ts
  • src/runtime/stdio/tools.ts
  • tests/unit/compare.test.ts
  • tests/unit/install-templates.test.ts
  • tests/unit/install.test.ts
  • tests/unit/mcp-response-shape-doc.test.ts
  • tests/unit/stdio-server.test.ts


Labels

None yet

Projects

None yet


1 participant