[P1] Add evidence blocks to Madar MCP responses by mohanagy · Pull Request #348 · mohanagy/madar

mohanagy · 2026-05-27T12:38:22Z

Closes #339

Summary

- add deterministic top-level evidence blocks across the relevant Madar MCP response surfaces
- update install guidance/templates to gate exploration on evidence.agent_directive and its three literal branches
- record agent_directive_seen in compare traces and document the response-shape mapping

Testing

npm run typecheck
npm run build
CI=1 npm run test:run

Summary by CodeRabbit

New Features
- MCP responses now include evidence blocks containing pack confidence, coverage status, and agent directives (answer_from_pack, verify_one_targeted_file, explore_with_caution) to guide exploration.
- Response shapes extended with context-pack data (claims, expandable references, coverage, missing context).
- Agent directives tracked during tool execution traces.
Documentation
- Added MCP response shape documentation defining evidence fields and mapping rules.
Tests
- Added tests validating evidence blocks in MCP responses and documentation content.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-05-27T12:38:37Z

📝 Walkthrough

This PR implements the MCP response evidence shape from issue #339: every Madar MCP tool response now carries a deterministic top-level evidence block that exposes pack_confidence, coverage, agent_directive, missing_phases, and covered_workflow_owners fields. Install rules and agent routing templates are updated to explicitly gate exploration scope on evidence.agent_directive instead of internal confidence floats, making the rules mechanically enforceable by the agent.
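
As an illustration of what gating on `evidence.agent_directive` could look like on the client side, here is a minimal sketch. The field names and the three directive literals come from this walkthrough; the `EvidenceBlock`, `ExplorationPolicy`, `readDirective`, and `policyFor` names are hypothetical and not part of Madar, and the limits mirror the directive meanings listed in the install-rule row of the Changes table.

```typescript
// Literal unions are taken from the walkthrough; every other name is hypothetical.
type PackConfidence = "high" | "medium" | "low";
type Coverage = "complete" | "partial" | "unknown";
type AgentDirective =
  | "answer_from_pack"
  | "verify_one_targeted_file"
  | "explore_with_caution";

interface EvidenceBlock {
  pack_confidence: PackConfidence;
  coverage: Coverage;
  agent_directive: AgentDirective;
  missing_phases: string[];
  covered_workflow_owners: string[];
}

interface ExplorationPolicy {
  // Upper bound on file reads; null where the walkthrough states no bound.
  maxFileReads: number | null;
  // True when the single read is required rather than merely allowed.
  mustReadTargetFile: boolean;
  // Targeted Glob/Grep calls allowed per directory.
  searchesPerDirectory: number;
}

const DIRECTIVES: readonly AgentDirective[] = [
  "answer_from_pack",
  "verify_one_targeted_file",
  "explore_with_caution",
];

// Pull the directive out of a JSON-stringified tool response, if present and valid.
function readDirective(responseText: string): AgentDirective | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(responseText);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const evidence = (
    parsed as { evidence?: Partial<Record<keyof EvidenceBlock, unknown>> | null }
  ).evidence;
  const candidate = evidence?.agent_directive;
  return DIRECTIVES.find((directive) => directive === candidate);
}

// Map each directive to the exploration limits described in the install rules.
function policyFor(directive: AgentDirective): ExplorationPolicy {
  switch (directive) {
    case "answer_from_pack":
      return { maxFileReads: 1, mustReadTargetFile: false, searchesPerDirectory: 0 };
    case "verify_one_targeted_file":
      return { maxFileReads: 1, mustReadTargetFile: true, searchesPerDirectory: 0 };
    case "explore_with_caution":
      return { maxFileReads: null, mustReadTargetFile: false, searchesPerDirectory: 1 };
  }
}
```

A caller would typically run `readDirective` on the tool result text and then look up `policyFor`. How to treat a missing directive is left open here, since the walkthrough does not specify it.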

Changes

MCP Response Evidence Shape and Agent Directives

| Layer / File(s) | Summary |
| --- | --- |
| Evidence type system and scoring logic `src/runtime/mcp-response-evidence.ts`, `docs/mcp-response-shape.md` | Defines `MadarResponsePackConfidence` (`high`/`medium`/`low`), `MadarResponseCoverage` (`complete`/`partial`/`unknown`), and `MadarResponseAgentDirective` (`answer_from_pack`/`verify_one_targeted_file`/`explore_with_caution`) types. Implements confidence scoring from coverage data with penalties for missing required/semantic items, coverage status derivation (unknown→absent, complete→full coverage, partial→some gap), and directive selection logic (complete+high→answer, else not-unknown+not-low→verify, else→caution). `buildMadarResponseEvidence` factory deduplicates and normalizes inputs, caps workflow owners to 5, and returns fully populated evidence. Documentation specifies the deterministic mapping and operational meaning of each directive. |
| Madar trace directive extraction and recording `src/infrastructure/compare.ts`, `tests/unit/compare.test.ts` | `CompareMadarTraceTurnSummary` adds optional `agent_directive_seen` array field. New trace-parsing helpers normalize Anthropic message shapes (`content` vs `message.content`), parse tool-result payloads (including JSON-string payloads), and extract deduplicated agent directives from tool-result evidence when the tool is recognized as a Madar trace tool. `extractMadarTrace` refactored to accumulate directives per turn and merge into existing per-turn entries with deduplication. Test verifies directive extraction and persistence in `madar_trace.per_turn`. |
| MCP tool response evidence wiring `src/runtime/stdio/tools.ts` | Imports `buildMadarResponseEvidence` and `ContextPackExecutionPhase`. Implements `collectWorkflowOwners` (deduplicated owner extraction, capped to 5), `missingPhasesFromPayload` (derives phases from answer_contract or execution_slice shapes), and evidence builders for retrieve, path-based, impact, and graph-summary payloads. Attaches `evidence` to all tool responses: `context_pack` (delta and full), `retrieve`, `context_prompt`, `relevant_files`, `feature_map`, `risk_map`, `implementation_checklist`, `graph_summary`, `impact`, and `pr_impact`. Context pack responses include evidence from coverage plus derived missing phases and workflow owners. |
| Result shape extensions for context-pack data `src/runtime/implementation-checklist.ts`, `src/runtime/risk-map.ts` | `ImplementationChecklistResult` and `RiskMapResult` extended to include `claims`, `expandable`, and optional `coverage`/`missing_context`. Implementations updated to populate these fields from `feature` and `retrieve` results, conditionally spreading coverage data when present. Type-only imports added for context-pack structures. |
| Install rule and template evidence-driven guidance `src/infrastructure/install.ts`, `src/infrastructure/install-skill-templates.ts` | Strict instruction rule builders (`strictNonMadarMcpRule`, `strictSkillOverrideRule`, `strictContextPackStopRule`, `strictContextPackExpandRule`) updated to reference `evidence.agent_directive` with explicit meanings: `answer_from_pack` (use pack, read ≤1 file, no search), `verify_one_targeted_file` (use pack, read 1 specific file, no search), `explore_with_caution` (pack partial, allow 1 targeted Glob/Grep per directory). Codex template updated to enforce stricter restrictions on broad searches after high/medium-confidence packs and tie deeper exploration to `explore_with_caution` with `missing_context`/`missing_semantic` constraints. |
| Install rule and documentation test coverage `tests/unit/install.test.ts`, `tests/unit/install-templates.test.ts`, `tests/unit/mcp-response-shape-doc.test.ts`, `tests/unit/stdio-server.test.ts` | Strict instruction test constants updated to assert `evidence.agent_directive` wording and revised exploration/expansion rules. Codex template test verifies directive branches and all three directive identifiers present. MCP response shape doc test verifies required headings, field identifiers, and coverage-to-confidence mappings. Comprehensive tool integration test invokes all MCP tools and asserts each includes `evidence` with expected field types and formats. |
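
The directive selection and owner capping described in the first row can be sketched as follows. This is an illustration only, not the code in `src/runtime/mcp-response-evidence.ts`: the function names `selectDirective` and `capWorkflowOwners` are hypothetical, and trimming whitespace is an assumed form of "normalization". Confidence scoring is left out because the walkthrough does not give its penalties or thresholds.

```typescript
type PackConfidence = "high" | "medium" | "low";
type Coverage = "complete" | "partial" | "unknown";
type AgentDirective =
  | "answer_from_pack"
  | "verify_one_targeted_file"
  | "explore_with_caution";

// Mapping as stated in the table:
//   complete + high            -> answer_from_pack
//   else not-unknown + not-low -> verify_one_targeted_file
//   else                       -> explore_with_caution
function selectDirective(coverage: Coverage, confidence: PackConfidence): AgentDirective {
  if (coverage === "complete" && confidence === "high") {
    return "answer_from_pack";
  }
  if (coverage !== "unknown" && confidence !== "low") {
    return "verify_one_targeted_file";
  }
  return "explore_with_caution";
}

// Deduplicate owners in first-seen order and cap the list (5 per the table).
// Trimming and dropping empty strings are assumptions made for this sketch.
function capWorkflowOwners(owners: readonly string[], limit = 5): string[] {
  const seen = new Set<string>();
  for (const owner of owners) {
    const trimmed = owner.trim();
    if (trimmed.length > 0) {
      seen.add(trimmed);
    }
  }
  return Array.from(seen).slice(0, limit);
}
```

Because the mapping is a pure function of two small enums, all nine input combinations can be enumerated in a unit test, which is one way to keep it deterministic.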

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

mohanagy/madar#347: Refactors Madar trace extraction in src/infrastructure/compare.ts with new per-turn/tool-result classification logic that feeds the directive extraction added by this PR.
mohanagy/madar#345: Adds snippet and budget controls to retrieve MCP tool response construction in src/runtime/stdio/tools.ts, which is wired together with the evidence block added here.
mohanagy/madar#321: Updates "when to stop vs expand exploration" behavior in src/infrastructure/install.ts and trace reporting in src/infrastructure/compare.ts, directly complementing this PR's directive-driven install rules.
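
To make the trace side concrete, the sketch below shows one way deduplicated directives could be collected from tool-result blocks, covering both message shapes (`content` and `message.content`) and JSON-string payloads. It is not the code in `src/infrastructure/compare.ts`: all names are hypothetical, and the check that a result belongs to a Madar tool is omitted because the walkthrough does not describe how tools are recognized.

```typescript
// Hypothetical event shape: blocks live either at `content` or under `message.content`.
type TraceEvent = { content?: unknown; message?: { content?: unknown } };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Normalize the two message shapes into one list of content blocks.
function contentBlocks(event: TraceEvent): unknown[] {
  const content = event.content ?? event.message?.content;
  return Array.isArray(content) ? content : [];
}

// A tool_result's content may be a plain string or a list of text parts.
function payloadTexts(content: unknown): string[] {
  if (typeof content === "string") return [content];
  if (!Array.isArray(content)) return [];
  const texts: string[] = [];
  for (const part of content) {
    if (isRecord(part)) {
      const text = part["text"];
      if (typeof text === "string") texts.push(text);
    }
  }
  return texts;
}

// Parse a JSON-string payload and read evidence.agent_directive, if any.
function directiveFrom(text: string): string | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return undefined;
  }
  if (!isRecord(parsed)) return undefined;
  const evidence = parsed["evidence"];
  if (!isRecord(evidence)) return undefined;
  const directive = evidence["agent_directive"];
  return typeof directive === "string" ? directive : undefined;
}

// Collect directives in first-seen order, without duplicates.
function directivesSeen(events: readonly TraceEvent[]): string[] {
  const seen = new Set<string>();
  for (const event of events) {
    for (const block of contentBlocks(event)) {
      if (!isRecord(block) || block["type"] !== "tool_result") continue;
      for (const text of payloadTexts(block["content"])) {
        const directive = directiveFrom(text);
        if (directive !== undefined) seen.add(directive);
      }
    }
  }
  return Array.from(seen);
}
```

The returned array is the kind of value that could populate a per-turn `agent_directive_seen` field.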

Poem

🐰 The packs now speak in signals clear,
"High confidence, answer without fear!"
Or "One deep read to verify,"
Or "Careful search, don't run too dry."
Three simple directives set the way,
Agents know exactly what to say!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly summarizes the main change: adding evidence blocks to Madar MCP responses, which is the core objective of the PR. |
| Description check | ✅ Passed | The PR description covers the summary section and testing section with validation steps, but is missing explicit checkboxes for the required testing items in the template. |
| Linked Issues check | ✅ Passed | The PR fully implements the acceptance criteria from issue `#339`: evidence blocks with pack_confidence/coverage/agent_directive are added to all specified MCP responses, deterministic mapping is documented, schema tests verify the evidence field, install templates reference evidence.agent_directive with three branches, and madar_trace records agent_directive_seen. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue `#339` objectives: documentation of response shape, evidence block implementation across MCP tools, install rule updates, trace recording, and test coverage; no out-of-scope changes detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/runtime/stdio/tools.ts (1)

1206-1211: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add evidence to context_pack review/impact responses.

Line 1211 and Line 1241 return context_pack payloads without top-level evidence. That leaves strict clients without evidence.agent_directive for these two task modes.

Suggested patch

-        const payload = {
+        const payload = {
           ...contextPackBasePayload(task, prompt, resolvedBudget, graphPath, plan),
           pack: compactPack,
           ...reviewMetadata,
+          evidence: buildMadarResponseEvidence({
+            coverage: prResult.review_bundle.coverage,
+            missingPhases: missingPhasesFromPayload(prResult.review_bundle),
+            coveredWorkflowOwners: collectWorkflowOwners(prResult.changed_files),
+          }),
         }
         return helpers.ok(id, helpers.textToolResult(JSON.stringify(payload)))
@@
         return helpers.ok(id, helpers.textToolResult(JSON.stringify({
           ...contextPackBasePayload(task, prompt, resolvedBudget, graphPath, initialPlan),
           target: impactTarget,
           pack: impactPack,
           ...metadata,
+          evidence: buildMadarResponseEvidence({
+            coverage: metadata.coverage,
+            coveredWorkflowOwners: collectWorkflowOwners(
+              impactResult.target_file ? [impactResult.target_file] : [],
+              impactResult.affected_files,
+              impactResult.direct_dependents.map((entry) => entry.source_file),
+              impactResult.transitive_dependents.map((entry) => entry.source_file),
+            ),
+          }),
         })))

Also applies to: 1236-1241

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/runtime/stdio/tools.ts` around lines 1206 - 1211, The payload returned
for the context_pack review/impact responses (the local variable payload built
from contextPackBasePayload, compactPack, and reviewMetadata and then returned
via helpers.ok(id, helpers.textToolResult(JSON.stringify(payload)))) must
include a top-level evidence field; update those return sites (the one that
builds payload with compactPack and the other similar return around lines
1236-1241) to merge in an evidence object that at minimum contains
agent_directive (e.g., evidence: { agent_directive: <appropriate directive
string> }) so strict clients receive evidence.agent_directive; keep using the
same payload variable and ensure the modified payload is JSON-stringified in the
helpers.textToolResult call.

🧹 Nitpick comments (1)

tests/unit/stdio-server.test.ts (1)

827-913: ⚡ Quick win

Expand this contract test to include pr_impact and context_prompt evidence.

Nice coverage for most tools, but this shared evidence assertion currently skips two evidence-enabled surfaces from the PR contract, so regressions there can slip through.

Suggested patch

       const calls = {
@@
         impact: await Promise.resolve(handleStdioRequest(graphPath, {
           id: 107,
           method: 'tools/call',
           params: {
             name: 'impact',
             arguments: {
               label: 'AuthService',
               depth: 3,
             },
           },
         })),
+        context_prompt: await Promise.resolve(handleStdioRequest(graphPath, {
+          id: 108,
+          method: 'tools/call',
+          params: {
+            name: 'context_prompt',
+            arguments: {
+              prompt: 'How does auth reach transport?',
+              provider: 'gemini',
+            },
+          },
+        })),
       } as const

Then add the same evidence assertions for a pr_impact response in the existing pr_impact test block (that fixture already sets up git state).
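
One way to keep these assertions identical across tools is a small shared checker. The sketch below is framework-agnostic and hypothetical (the name `evidenceProblems` is not from the PR); it checks only the field names and literal values listed in the walkthrough, plus the workflow-owner cap of 5, and assumes the tool result text has already been parsed from JSON.

```typescript
const CONFIDENCES: readonly string[] = ["high", "medium", "low"];
const COVERAGES: readonly string[] = ["complete", "partial", "unknown"];
const DIRECTIVES: readonly string[] = [
  "answer_from_pack",
  "verify_one_targeted_file",
  "explore_with_caution",
];

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns a list of human-readable problems; an empty list means the
// top-level evidence block has the expected shape.
function evidenceProblems(payload: unknown): string[] {
  if (typeof payload !== "object" || payload === null) {
    return ["payload is not an object"];
  }
  const evidence = (payload as { evidence?: unknown }).evidence;
  if (typeof evidence !== "object" || evidence === null) {
    return ["evidence is missing"];
  }
  const fields = evidence as Record<string, unknown>;
  const problems: string[] = [];

  const checkEnum = (key: string, allowed: readonly string[]): void => {
    const value = fields[key];
    if (typeof value !== "string" || allowed.indexOf(value) === -1) {
      problems.push(`${key} must be one of ${allowed.join(", ")}`);
    }
  };
  checkEnum("pack_confidence", CONFIDENCES);
  checkEnum("coverage", COVERAGES);
  checkEnum("agent_directive", DIRECTIVES);

  for (const key of ["missing_phases", "covered_workflow_owners"]) {
    if (!isStringArray(fields[key])) {
      problems.push(`${key} must be a string array`);
    }
  }
  const owners = fields["covered_workflow_owners"];
  if (isStringArray(owners) && owners.length > 5) {
    problems.push("covered_workflow_owners exceeds the cap of 5");
  }
  return problems;
}
```

In a test, each tool response (including `pr_impact` and `context_prompt`) could be passed through the same function and the result compared against an empty array.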

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/stdio-server.test.ts` around lines 827 - 913, Add two missing tool
calls to the test's calls fixture and assert their evidence fields: update the
calls object (where handleStdioRequest is invoked for context_pack/retrieve/...)
to include entries for 'context_prompt' and 'pr_impact' (use the same pattern:
id, method 'tools/call', params.name set to 'context_prompt' or 'pr_impact' and
appropriate params.arguments); then, in the existing pr_impact test block (the
test that already sets up git state), add the same evidence assertions used for
other evidence-enabled tools to validate the 'evidence' structure on the
pr_impact response. Ensure you reference the same response shapes used for
'context_pack'/'retrieve' assertions so coverage matches the other tools.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: d5a0d69b-ba15-4b10-b401-99b52c6f8624

📥 Commits

Reviewing files that changed from the base of the PR and between ef71158 and 79d968b.

📒 Files selected for processing (13)

docs/mcp-response-shape.md
src/infrastructure/compare.ts
src/infrastructure/install-skill-templates.ts
src/infrastructure/install.ts
src/runtime/implementation-checklist.ts
src/runtime/mcp-response-evidence.ts
src/runtime/risk-map.ts
src/runtime/stdio/tools.ts
tests/unit/compare.test.ts
tests/unit/install-templates.test.ts
tests/unit/install.test.ts
tests/unit/mcp-response-shape-doc.test.ts
tests/unit/stdio-server.test.ts

Add MCP evidence blocks for Madar responses

79d968b

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai Bot reviewed May 27, 2026

mohanagy merged commit 43bdad7 into next May 27, 2026
7 checks passed

coderabbitai Bot mentioned this pull request May 27, 2026

Fix runtime evidence wiring for issue 353 #355

Merged

mohanagy mentioned this pull request May 27, 2026

[Release] 0.27.0 stable — CHANGELOG and README punchlist #358

Closed

7 tasks

This was referenced May 28, 2026

Tighten runtime pack confidence gating #393

Merged

feat: add pack governance receipts #448

Merged

Implement scoped graph freshness surfaces #478

Merged

mohanagy merged 1 commit into next from issue-339-madar-response-contract
