
Prevent agent turn explosion after Madar context injection #321

Merged

mohanagy merged 1 commit into next from issue-314-agent-turn-explosion on May 25, 2026


@mohanagy (Owner) commented May 25, 2026

Summary

  • tighten strict install guidance so agents answer from high/medium-confidence context packs before broad exploration
  • classify compare/native-agent trace artifacts as "reduced exploration" vs "added context only", and surface that classification in summaries
  • add FounderCommandCenter auth-flow contrast notes plus regression coverage for the new guidance and trace behavior
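The tightened guidance amounts to a decision rule: answer from the first context pack when it is high or medium confidence, and expand only when something explicitly justifies it. A minimal sketch of that rule, assuming a pack exposes a confidence level, a count of blocking diagnostics, and a list of missing context; the `ContextPack` shape and `decideNextStep` are hypothetical illustrations, not Madar's actual types or API.

```typescript
// Hypothetical pack shape: the real Madar context-pack type may use different fields.
type Confidence = "high" | "medium" | "low";

interface ContextPack {
  confidence: Confidence;
  blockingDiagnostics: number; // assumed: diagnostics that should block answering
  missingContext: string[]; // assumed: files/symbols the pack says it could not include
}

type NextStep = "answer-from-pack" | "expand";

// Answer from the first pack unless something explicitly justifies exploring further.
function decideNextStep(pack: ContextPack, userAskedForMore = false): NextStep {
  if (userAskedForMore) return "expand"; // explicit user request
  if (pack.missingContext.length > 0) return "expand"; // pack reports a gap
  if (pack.blockingDiagnostics > 0) return "expand"; // diagnostics threshold not met
  if (pack.confidence === "low") return "expand"; // only high/medium packs qualify
  return "answer-from-pack";
}

const firstPack: ContextPack = { confidence: "medium", blockingDiagnostics: 0, missingContext: [] };
console.log(decideNextStep(firstPack)); // prints: answer-from-pack
```

The ordering of checks is a design choice in this sketch: explicit reasons to expand are tested first, so a confident pack never suppresses a user's request for more context.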

Testing

  • npm run typecheck
  • npm run build
  • CI=1 npm run test:run
  • npm pack --dry-run

Closes #314

Summary by CodeRabbit

  • New Features

    • Enhanced MADAR trace reporting with exploration outcome classifications, tracking whether agents reduced exploration, added context only, or performed broad searches.
    • Updated strict profile guidance to clarify when agents answer from initial context packs versus when they expand based on missing context or diagnostic findings.
  • Documentation

    • New benchmark documentation for authentication flow comparison scenarios with interpretation guidance.
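One way the three exploration outcomes could be assigned, assuming each compare run records how many broad-exploration calls it made and that a baseline run of the same task without Madar is available. The outcome identifiers and `classifyOutcome` are illustrative assumptions; the PR's actual classification logic in `src/infrastructure/compare.ts` may weigh different signals.

```typescript
// Labels mirror the three outcomes named in the summary; the identifiers are assumptions.
type ExplorationOutcome =
  | "reduced_exploration"
  | "added_context_only"
  | "broad_exploration";

// Hypothetical heuristic: compare broad-exploration calls (repo-wide search, directory
// listing, and similar) in a Madar run against a baseline run without Madar.
function classifyOutcome(baselineBroadCalls: number, madarBroadCalls: number): ExplorationOutcome {
  if (madarBroadCalls < baselineBroadCalls) {
    return "reduced_exploration"; // the pack replaced some searching
  }
  if (madarBroadCalls === baselineBroadCalls) {
    return "added_context_only"; // the pack was read, but exploration did not change
  }
  return "broad_exploration"; // the agent searched more after the pack was injected
}

console.log(classifyOutcome(12, 3)); // prints: reduced_exploration
console.log(classifyOutcome(4, 9)); // prints: broad_exploration
```

A fuller version might also consider turn counts or focused follow-up reads; this sketch looks at a single counter so the three buckets stay easy to see.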


Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@coderabbitai (Bot) commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: abd8a8c3-f6c5-44a9-bba7-e8a48e6e671d

📥 Commits

Reviewing files that changed from the base of the PR and between 0dcf344 and b5c50df.

📒 Files selected for processing (10)
  • README.md
  • docs/benchmarks/2026-05-25-founder-command-center-auth-flow/README.md
  • src/infrastructure/compare.ts
  • src/infrastructure/install-skill-templates.ts
  • src/infrastructure/install.ts
  • tests/unit/compare-native-agent.test.ts
  • tests/unit/compare.test.ts
  • tests/unit/install-templates.test.ts
  • tests/unit/install.test.ts
  • tests/unit/why-madar-doc.test.ts

📝 Walkthrough


This PR instruments MADAR trace reporting to classify exploration outcomes, generates consistent strict context-pack guidance across installed agents, and documents benchmark evidence of turn-count differences. The changes tie together trace enrichment, guidance generation, installer integration, and validation to address agent turn explosion after context injection.
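To make the trace instrumentation concrete, here is a minimal sketch of canonicalizing an agent's tool-call names and counting them into the three exploration categories (context pack, focused follow-up, broad exploration). It assumes tool names arrive as strings and that an MCP-style `server__tool` prefix should be stripped; the tool-name sets and the functions `canonicalizeToolName`, `categorize`, and `countCalls` are hypothetical, not taken from `compare.ts`.

```typescript
// Hypothetical counters mirroring the call counters described for CompareMadarTrace.
interface ExplorationCounters {
  contextPackCalls: number;
  focusedFollowUpCalls: number;
  broadExplorationCalls: number;
}

type Category = keyof ExplorationCounters;

// Assumed tool-name sets; the real lists may differ per agent.
const CONTEXT_PACK_TOOLS = new Set(["context_pack"]);
const FOCUSED_TOOLS = new Set(["read", "read_file", "view"]);
const BROAD_TOOLS = new Set(["grep", "glob", "ls", "list_dir", "search"]);

// Normalize names such as "mcp__madar__Context_Pack" or " Grep " to a bare lowercase tool name.
function canonicalizeToolName(name: string): string {
  const parts = name.trim().toLowerCase().split("__");
  return parts[parts.length - 1] ?? "";
}

function categorize(name: string): Category | undefined {
  const tool = canonicalizeToolName(name);
  if (CONTEXT_PACK_TOOLS.has(tool)) return "contextPackCalls";
  if (FOCUSED_TOOLS.has(tool)) return "focusedFollowUpCalls";
  if (BROAD_TOOLS.has(tool)) return "broadExplorationCalls";
  return undefined; // edits, shell commands, etc. are not counted
}

function countCalls(toolNames: string[]): ExplorationCounters {
  const counters: ExplorationCounters = {
    contextPackCalls: 0,
    focusedFollowUpCalls: 0,
    broadExplorationCalls: 0,
  };
  for (const name of toolNames) {
    const category = categorize(name);
    if (category) counters[category] += 1;
  }
  return counters;
}

const counts = countCalls(["mcp__madar__context_pack", "Read", "Grep", "Glob", "Edit"]);
console.log(counts.broadExplorationCalls); // prints: 2
```

Canonicalizing before lookup lets one set of category lists cover several agents whose tool names differ only in case or prefix.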

Changes

MADAR Exploration Outcome & Agent Guidance

| Layer | File(s) | Summary |
|---|---|---|
| MADAR trace enrichment with exploration-outcome classification | src/infrastructure/compare.ts, tests/unit/compare.test.ts, tests/unit/compare-native-agent.test.ts | CompareMadarTrace gains context-pack/focused-follow-up/broad-exploration call counters and classification logic. Tool calls are canonicalized and categorized into three exploration types. Traces now record outcome distribution, and native-agent reports persist MADAR trace details with exploration summaries. |
| Strict context-pack guidance generation and installation | src/infrastructure/install.ts, tests/unit/install.test.ts | Helper functions generate markdown and plain-text variants of stop-when-confident and expand-only-on-missing-context rules. Rules are integrated into STRICT_CONTEXT_PACK_MESSAGE and installer outputs for Claude, Gemini, and Cursor. Console announcements reference generated rule strings. Test suite centralizes rule constants to validate all agent outputs. |
| Codex SKILL template context-pack decision rule | src/infrastructure/install-skill-templates.ts, tests/unit/install-templates.test.ts | Codex profile gains conditional rule: answer from first madar pack if quality/diagnostics thresholds are met; expand only on explicit missing context or user request. Tests validate expected guidance phrases. |
| Benchmark documentation and validation | README.md, docs/benchmarks/2026-05-25-founder-command-center-auth-flow/README.md, tests/unit/why-madar-doc.test.ts | README links to FounderCommandCenter auth-flow benchmark documenting good run (19→4 turns) and bad run (2→19 turns), with validation commands, manual checks, and safe/unsafe claim boundaries. Documentation honesty tests verify benchmark presence and content correctness. |
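Generating the markdown and plain-text variants from a single rule list is one way to keep wording identical across installed agents. A minimal sketch under that assumption; the rule text paraphrases the summary above rather than reproducing what ships in `install.ts`, and `GuidanceRule`, `STRICT_RULES`, `renderMarkdown`, and `renderPlainText` are hypothetical names.

```typescript
// Hypothetical rule text: paraphrased from the PR summary, not the wording in install.ts.
interface GuidanceRule {
  title: string;
  body: string;
}

const STRICT_RULES: readonly GuidanceRule[] = [
  {
    title: "Stop when confident",
    body: "If the first Madar context pack is high or medium confidence with no blocking diagnostics, answer from it before exploring.",
  },
  {
    title: "Expand only on missing context",
    body: "Run broader searches only when the pack reports missing context or the user explicitly asks for more.",
  },
];

// Markdown variant, e.g. for a generated agent instruction file.
function renderMarkdown(rules: readonly GuidanceRule[]): string {
  return rules.map((rule) => `- **${rule.title}:** ${rule.body}`).join("\n");
}

// Plain-text variant, e.g. for console announcements after install.
function renderPlainText(rules: readonly GuidanceRule[]): string {
  return rules.map((rule) => `${rule.title}: ${rule.body}`).join("\n");
}

console.log(renderPlainText(STRICT_RULES));
```

Because both renderers read the same constant, a test can assert against `STRICT_RULES` once instead of duplicating expected strings for each agent's output.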

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • mohanagy/madar#249: Prior "madar rebrand" PR that restructured compare artifacts and trace types, which this PR builds upon by extending CompareMadarTrace with exploration-outcome fields.
  • mohanagy/madar#246: Earlier PR modifying compare and template infrastructure that may share overlapping context-pack guidance patterns.

Poem

🐰 Turns once were many, context packs were few,
Now traces whisper: "Stopped here, knew what to do."
Guidance flows like carrots through installed homes,
Agent stops exploring, rests its weary bones. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main objective (preventing agent turn explosion after Madar context injection), which aligns with the primary focus of all changes in the PR. |
| Description check | ✅ Passed | The description covers the three main changes (guidance tightening, trace classification, benchmark notes), includes testing commands matching the template, and references the linked issue (#314). All required sections are present and complete. |
| Linked Issues check | ✅ Passed | All acceptance criteria from issue #314 are addressed: guidance for Claude/Codex/Copilot/Cursor updated [install.ts, install-skill-templates.ts], trace artifacts classify exploration outcomes [compare.ts, compare-native-agent.test.ts], tests added [compare.test.ts], and FounderCommandCenter benchmark documented [docs/benchmarks/2026-05-25-founder-command-center-auth-flow/README.md]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #314 objectives: guidance updates, trace classification, benchmark documentation, and regression tests. No unrelated modifications detected. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

