Prevent agent turn explosion after Madar context injection by mohanagy · Pull Request #321 · mohanagy/madar

mohanagy · 2026-05-25T23:23:21Z

Summary

- Tighten strict install guidance so agents answer from high/medium-confidence context packs before broad exploration.
- Classify compare/native-agent trace artifacts as "reduced exploration" vs. "added context only" and surface that classification in summaries.
- Add FounderCommandCenter auth-flow contrast notes plus regression coverage for the new guidance and trace behavior.

Testing

npm run typecheck
npm run build
CI=1 npm run test:run
npm pack --dry-run

Closes #314

Summary by CodeRabbit

New Features
- Enhanced MADAR trace reporting with exploration outcome classifications, tracking whether agents reduced exploration, added context only, or performed broad searches.
- Updated strict profile guidance to clarify when agents answer from initial context packs versus when they expand based on missing context or diagnostic findings.
Documentation
- New benchmark documentation for authentication flow comparison scenarios with interpretation guidance.
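
The strict-profile rule summarized above (answer from the first pack when confidence is sufficient, expand only on missing context or a user request) could be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `PackSummary`, `decideNextStep`, and the field names are hypothetical, and the real `madar pack` output may be shaped differently.

```typescript
// Hypothetical shapes for illustration only; not taken from the madar codebase.
type Confidence = "high" | "medium" | "low";

interface PackSummary {
  confidence: Confidence;
  // Explicit gaps reported by the pack's diagnostics (assumed field).
  missingContext: string[];
}

type NextStep = "answer" | "expand";

// Answer from the first context pack when confidence is high or medium,
// no gaps are reported, and the user has not asked for more exploration.
function decideNextStep(pack: PackSummary, userAskedForMore: boolean): NextStep {
  const confident = pack.confidence === "high" || pack.confidence === "medium";
  const hasGaps = pack.missingContext.length > 0;
  if (confident && !hasGaps && !userAskedForMore) {
    return "answer";
  }
  return "expand";
}

// Example: a medium-confidence pack with no reported gaps.
const step = decideNextStep({ confidence: "medium", missingContext: [] }, false);
console.log(step);
```

Keeping the rule as a pure function makes it easy to cover with regression tests, which is the kind of coverage this PR describes adding for the guidance text.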

Closes #314 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-05-25T23:23:32Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: abd8a8c3-f6c5-44a9-bba7-e8a48e6e671d

📥 Commits

Reviewing files that changed from the base of the PR and between 0dcf344 and b5c50df.

📒 Files selected for processing (10)

README.md
docs/benchmarks/2026-05-25-founder-command-center-auth-flow/README.md
src/infrastructure/compare.ts
src/infrastructure/install-skill-templates.ts
src/infrastructure/install.ts
tests/unit/compare-native-agent.test.ts
tests/unit/compare.test.ts
tests/unit/install-templates.test.ts
tests/unit/install.test.ts
tests/unit/why-madar-doc.test.ts

📝 Walkthrough


This PR instruments MADAR trace reporting to classify exploration outcomes, generates consistent strict context-pack guidance across installed agents, and documents benchmark evidence of turn-count differences. The changes tie together trace enrichment, guidance generation, installer integration, and validation to address agent turn explosion after context injection.
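
One way to keep guidance consistent across installed agents is to define each rule once and render it in both markdown and plain-text form. The sketch below assumes that approach; the rule wording and the helper names (`rulesAsMarkdown`, `rulesAsPlainText`) are hypothetical and are not copied from `src/infrastructure/install.ts`.

```typescript
// Hypothetical rule text; the wording in the actual PR may differ.
const STOP_WHEN_CONFIDENT =
  "Answer from the first context pack when its confidence is high or medium.";
const EXPAND_ONLY_ON_MISSING_CONTEXT =
  "Expand only when the pack reports missing context or the user asks for more.";

const STRICT_RULES: readonly string[] = [
  STOP_WHEN_CONFIDENT,
  EXPAND_ONLY_ON_MISSING_CONTEXT,
];

// Markdown variant, e.g. for agent instruction files.
function rulesAsMarkdown(rules: readonly string[]): string {
  return rules.map((rule) => `- ${rule}`).join("\n");
}

// Plain-text variant, e.g. for console announcements.
function rulesAsPlainText(rules: readonly string[]): string {
  return rules.map((rule, index) => `${index + 1}. ${rule}`).join("\n");
}

console.log(rulesAsMarkdown(STRICT_RULES));
console.log(rulesAsPlainText(STRICT_RULES));
```

Because both variants derive from the same constants, a test can assert that every generated agent file contains each rule string, which mirrors the centralized rule constants described for the test suite.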

Changes

MADAR Exploration Outcome & Agent Guidance

| Layer / File(s) | Summary |
| --- | --- |
| MADAR Trace enrichment with exploration-outcome classification `src/infrastructure/compare.ts`, `tests/unit/compare.test.ts`, `tests/unit/compare-native-agent.test.ts` | `CompareMadarTrace` gains context-pack/focused-follow-up/broad-exploration call counters and classification logic. Tool calls are canonicalized and categorized into three exploration types. Traces now record outcome distribution, and native-agent reports persist MADAR trace details with exploration summaries. |
| Strict context-pack guidance generation and installation `src/infrastructure/install.ts`, `tests/unit/install.test.ts` | Helper functions generate markdown and plain-text variants of stop-when-confident and expand-only-on-missing-context rules. Rules are integrated into `STRICT_CONTEXT_PACK_MESSAGE` and installer outputs for Claude, Gemini, and Cursor. Console announcements reference generated rule strings. Test suite centralizes rule constants to validate all agent outputs. |
| Codex SKILL template context-pack decision rule `src/infrastructure/install-skill-templates.ts`, `tests/unit/install-templates.test.ts` | Codex profile gains conditional rule: answer from first `madar pack` if quality/diagnostics thresholds are met; expand only on explicit missing context or user request. Tests validate expected guidance phrases. |
| Benchmark documentation and validation `README.md`, `docs/benchmarks/2026-05-25-founder-command-center-auth-flow/README.md`, `tests/unit/why-madar-doc.test.ts` | README links to FounderCommandCenter auth-flow benchmark documenting good run (19→4 turns) and bad run (2→19 turns), with validation commands, manual checks, and safe/unsafe claim boundaries. Documentation honesty tests verify benchmark presence and content correctness. |
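
The trace enrichment row above describes canonicalizing tool calls and counting them in three exploration categories. A minimal sketch of that idea follows, under stated assumptions: the tool names, the `BROAD_TOOLS` set, and the mapping rules are hypothetical and do not reflect the actual logic in `src/infrastructure/compare.ts`.

```typescript
type ExplorationKind = "context-pack" | "focused-follow-up" | "broad-exploration";

interface ToolCall {
  tool: string;
}

// Normalize tool names so "Madar Pack", "madar_pack" and "madar-pack" compare equal.
function canonicalize(tool: string): string {
  return tool.trim().toLowerCase().replace(/[\s_]+/g, "-");
}

// Assumed set of tools that scan the repository broadly (hypothetical names).
const BROAD_TOOLS = new Set(["grep", "glob", "search", "list-dir"]);

function classify(call: ToolCall): ExplorationKind {
  const name = canonicalize(call.tool);
  if (name.startsWith("madar")) return "context-pack";
  if (BROAD_TOOLS.has(name)) return "broad-exploration";
  // Everything else (e.g. reading one named file) is treated as a focused follow-up.
  return "focused-follow-up";
}

// Count calls per category, similar in spirit to the counters added to the trace.
function countByKind(calls: readonly ToolCall[]): Record<ExplorationKind, number> {
  const counts: Record<ExplorationKind, number> = {
    "context-pack": 0,
    "focused-follow-up": 0,
    "broad-exploration": 0,
  };
  for (const call of calls) {
    counts[classify(call)] += 1;
  }
  return counts;
}

console.log(countByKind([{ tool: "madar_pack" }, { tool: "Grep" }, { tool: "read_file" }]));
```

A summary label such as "reduced exploration" or "added context only" could then be derived by comparing these counts between a baseline run and a MADAR-assisted run; how the PR actually derives that label is not shown on this page.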

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

mohanagy/madar#249: Prior "madar rebrand" PR that restructured compare artifacts and trace types, which this PR builds upon by extending CompareMadarTrace with exploration-outcome fields.
mohanagy/madar#246: Earlier PR modifying compare and template infrastructure that may share overlapping context-pack guidance patterns.

Poem

🐰 Turns once were many, context packs were few,
Now traces whisper: "Stopped here, knew what to do."
Guidance flows like carrots through installed homes,
Agent stops exploring, rests its weary bones. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main objective (preventing agent turn explosion after Madar context injection), which aligns with the primary focus of all changes in the PR. |
| Description check | ✅ Passed | The description covers the three main changes (guidance tightening, trace classification, benchmark notes), includes testing commands matching the template, and references the linked issue (`#314`). All required sections are present and complete. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue `#314` are addressed: guidance for Claude/Codex/Copilot/Cursor updated [install.ts, install-skill-templates.ts], trace artifacts classify exploration outcomes [compare.ts, compare-native-agent.test.ts], tests added [compare.test.ts], and FounderCommandCenter benchmark documented [docs/benchmarks/README.md]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue `#314` objectives: guidance updates, trace classification, benchmark documentation, and regression tests. No unrelated modifications detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Comment @coderabbitai help to get the list of available commands and usage tips.

Prevent turn explosion after context packs

b5c50df


mohanagy merged commit 15b739f into next May 25, 2026
7 checks passed

mohanagy mentioned this pull request May 25, 2026

[P0] Prevent agent turn explosion after Madar context injection #314

Closed

mohanagy merged 1 commit into next from issue-314-agent-turn-explosion
