feat: add end-to-end integration tests with mock agent backend (#11) (#11) #28

Merged
jafreck merged 6 commits into main from cadre/issue-11
Feb 23, 2026

Conversation

jafreck commented Feb 22, 2026

Summary

This PR adds end-to-end integration tests for the full CADRE pipeline using a mock agent backend and mock platform provider, so there is no dependency on real GitHub credentials or network calls. A GitHub Actions workflow is also added so e2e tests run automatically on every push and pull request.

Closes #11

Changes

  • tests/e2e-pipeline.test.ts: New e2e test suite exercising the real IssueOrchestrator through four scenarios (happy path, retry, blocked task, resume) using an inline E2ELauncher and MockPlatformProvider. CommitManager is mocked via vi.mock to avoid real git operations.
  • tests/helpers/mock-agent-launcher.ts: Reusable MockAgentLauncher helper with per-agent and per-task handler registration and configurable failure injection.
  • tests/helpers/mock-platform-provider.ts: In-memory MockPlatformProvider implementing the full PlatformProvider interface without real credentials.
  • tests/e2e-workflow.test.ts: Unit tests validating the GitHub Actions workflow YAML file contents (11 test cases).
  • .github/workflows/e2e.yml: New workflow that triggers on push and pull_request, installs with npm ci, sets CADRE_E2E=1, and runs npm run test:e2e with a 10-minute timeout.
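To make the handler registration and failure injection described for MockAgentLauncher concrete, here is a minimal sketch. All names and shapes (AgentInvocation, AgentResult, onAgent, onTask, failNext, launch) are assumptions for illustration; the helper in tests/helpers/mock-agent-launcher.ts may look different.

```typescript
// Hypothetical shapes; CADRE's real launcher interface may differ.
interface AgentInvocation {
  agent: string; // e.g. "code-writer"
  taskId?: string; // e.g. "task-001"
}

interface AgentResult {
  success: boolean;
  output: string;
}

type Handler = (inv: AgentInvocation, attempt: number) => AgentResult;

// Builds the lookup key shared by task handlers, failure counters and attempt counters.
function key(agent: string, taskId?: string): string {
  return taskId ? `${agent}:${taskId}` : agent;
}

class MockAgentLauncher {
  private agentHandlers = new Map<string, Handler>();
  private taskHandlers = new Map<string, Handler>();
  private failuresLeft = new Map<string, number>();
  private attempts = new Map<string, number>();

  // Register a handler for every invocation of an agent.
  onAgent(agent: string, handler: Handler): this {
    this.agentHandlers.set(agent, handler);
    return this;
  }

  // Register a handler for one agent/task pair; it takes priority over onAgent.
  onTask(agent: string, taskId: string, handler: Handler): this {
    this.taskHandlers.set(key(agent, taskId), handler);
    return this;
  }

  // Make the next `count` matching invocations fail before any handler runs.
  failNext(agent: string, count: number, taskId?: string): this {
    this.failuresLeft.set(key(agent, taskId), count);
    return this;
  }

  launch(inv: AgentInvocation): AgentResult {
    const k = key(inv.agent, inv.taskId);
    const attempt = (this.attempts.get(k) ?? 0) + 1;
    this.attempts.set(k, attempt);

    const remaining = this.failuresLeft.get(k) ?? 0;
    if (remaining > 0) {
      this.failuresLeft.set(k, remaining - 1);
      return { success: false, output: `injected failure (attempt ${attempt})` };
    }

    const handler = this.taskHandlers.get(k) ?? this.agentHandlers.get(inv.agent);
    return handler ? handler(inv, attempt) : { success: true, output: "" };
  }
}

// Usage: code-writer fails once for task-001, then succeeds (the retry scenario).
const launcher = new MockAgentLauncher()
  .onAgent("code-writer", (inv, attempt) => ({
    success: true,
    output: `${inv.taskId} ok on attempt ${attempt}`,
  }))
  .failNext("code-writer", 1, "task-001");

const first = launcher.launch({ agent: "code-writer", taskId: "task-001" });
const second = launcher.launch({ agent: "code-writer", taskId: "task-001" });
```

Keeping the failure counter separate from the handlers means a test can inject a transient failure without rewriting the success handler.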

Implementation Details

The e2e tests wire IssueOrchestrator with real CheckpointManager and real filesystem I/O under os.tmpdir(), while replacing the two external boundaries (agent execution and platform API) with fast, deterministic in-process stubs. Each test creates a unique temp directory and cleans up in afterEach. The E2ELauncher writes synthetic Markdown outputs that match the schemas ResultParser expects, so all five orchestrator phases execute normally.
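As an illustration of why the synthetic output has to match the parser's schema, the sketch below renders a plan and parses it back. The heading and dependency format shown here is hypothetical; the schema that ResultParser actually expects is not reproduced in this PR description. The parsePlanOrThrow guard is also an assumption, showing one way to turn a 0-task result into a loud error.

```typescript
// Hypothetical plan format; not the real ResultParser schema.
interface PlanTask {
  id: string;
  title: string;
  dependencies: string[];
}

// Render tasks as Markdown: one "## id: title" heading plus a "Dependencies:" line each.
function renderPlan(tasks: PlanTask[]): string {
  return tasks
    .map((t) => {
      const deps = t.dependencies.length > 0 ? t.dependencies.join(", ") : "none";
      return `## ${t.id}: ${t.title}\nDependencies: ${deps}`;
    })
    .join("\n\n");
}

// Parse the same format back; lines that match neither pattern are ignored.
function parsePlan(markdown: string): PlanTask[] {
  const tasks: PlanTask[] = [];
  let current: PlanTask | undefined;
  for (const line of markdown.split("\n")) {
    const heading = /^## (task-\d+): (.+)$/.exec(line);
    if (heading) {
      current = { id: heading[1], title: heading[2], dependencies: [] };
      tasks.push(current);
      continue;
    }
    const deps = /^Dependencies: (.+)$/.exec(line);
    if (deps && current && deps[1] !== "none") {
      current.dependencies = deps[1].split(",").map((d) => d.trim());
    }
  }
  return tasks;
}

// Guard: fail loudly instead of silently continuing with an empty plan.
function parsePlanOrThrow(markdown: string): PlanTask[] {
  const tasks = parsePlan(markdown);
  if (tasks.length === 0) {
    throw new Error("implementation plan parsed to 0 tasks; check the heading format");
  }
  return tasks;
}

// Usage: a 2-task plan like the happy-path scenario.
const plan = renderPlan([
  { id: "task-001", title: "Add helper", dependencies: [] },
  { id: "task-002", title: "Add tests", dependencies: ["task-001"] },
]);
const parsed = parsePlanOrThrow(plan);
```

A round-trip check of this kind in the test setup would surface a format mismatch before the orchestrator runs.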

Testing

  • Happy path: 1 issue, 2-task plan, all agents succeed → result.success === true, 5 phases all pass, pr-content.md written to disk
  • Retry path: code-writer fails on first attempt for task-001, succeeds on second → result.success === true
  • Blocked task: 1 of 3 tasks always fails (exceeds maxRetriesPerTask) → pipeline still returns result.success === true, blocked task visible in checkpoint state
  • Resume: first run completes phases 1–2 then stops; second run skips those phases → token usage for phases 1 and 2 is 0
  • All 4 e2e scenario tests pass (npx vitest run tests/e2e-pipeline.test.ts)
  • 11 workflow validation tests pass (tests/e2e-workflow.test.ts)
  • All other existing unit tests continue to pass
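The retry and blocked-task scenarios above can be pictured with the following sketch. It is not IssueOrchestrator's code: the function names, the TaskOutcome shape, and the assumption that maxRetriesPerTask means "one initial attempt plus that many retries" are all hypothetical.

```typescript
// Hypothetical types; the real checkpoint state may be shaped differently.
type TaskStatus = "completed" | "blocked";

interface TaskOutcome {
  taskId: string;
  status: TaskStatus;
  attempts: number;
}

// Try a task up to maxRetriesPerTask + 1 times (assumed semantics), then mark it blocked.
function runWithRetries(
  taskId: string,
  attemptTask: (attempt: number) => boolean,
  maxRetriesPerTask: number,
): TaskOutcome {
  const maxAttempts = maxRetriesPerTask + 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (attemptTask(attempt)) {
      return { taskId, status: "completed", attempts: attempt };
    }
  }
  return { taskId, status: "blocked", attempts: maxAttempts };
}

// Run every task; a blocked task is recorded in the result rather than thrown.
function runTasks(
  taskIds: string[],
  attemptTask: (taskId: string, attempt: number) => boolean,
  maxRetriesPerTask: number,
): { outcomes: TaskOutcome[]; blocked: string[] } {
  const outcomes = taskIds.map((id) =>
    runWithRetries(id, (attempt) => attemptTask(id, attempt), maxRetriesPerTask),
  );
  const blocked = outcomes.filter((o) => o.status === "blocked").map((o) => o.taskId);
  return { outcomes, blocked };
}

// Usage: task-001 fails once and then succeeds; task-003 always fails.
const run = runTasks(
  ["task-001", "task-002", "task-003"],
  (taskId, attempt) => {
    if (taskId === "task-001") return attempt >= 2;
    if (taskId === "task-003") return false;
    return true;
  },
  2,
);
```

Recording blocked tasks as data instead of raising an error is what lets the remaining tasks finish, which matches the blocked-task scenario where the run still reports success.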

Integration Verification

  • Install: pass
  • Build: pass
  • Tests: 209 of 210 pass — 1 pre-existing failure in tests/github-issues.test.ts (unrelated to this PR; that test expects the old get_issue MCP tool name but the implementation was updated to issue_read)

Notes

  • The single failing test (GitHubAPI > getIssue > should fetch issue details via MCP) pre-dates this PR's changes. It asserts callTool('get_issue', ...) but the implementation now calls callTool('issue_read', { method: 'get', ... }). A fix-surgeon result file documenting the needed fix is included in the diff but the test file itself was not corrected in this PR to keep the change minimal.
  • The e2e tests do not cover the budget-exceeded scenario (tokens exceed budget → graceful halt). The issue listed five candidate scenarios and required "at least 3"; this PR implements four, which exceeds the acceptance criteria.
  • Node version is hardcoded to 22 in the workflow (no .nvmrc found in the repo).
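For the pre-existing failure noted above, the shape of the corrected assertion could look roughly like this. RecordingMcpClient and this getIssue function are hypothetical stand-ins, made synchronous for brevity; the real GitHubAPI and MCP client are not shown in this PR. Arguments other than method are passed through as an opaque object, because the PR description elides them.

```typescript
// Hypothetical stand-ins for the MCP client and the GitHubAPI.getIssue call.
type ToolArgs = Record<string, unknown>;

class RecordingMcpClient {
  calls: Array<{ tool: string; args: ToolArgs }> = [];

  // Records each call so a test can assert on the tool name and arguments.
  callTool(tool: string, args: ToolArgs): unknown {
    this.calls.push({ tool, args });
    return {};
  }
}

// Mirrors the call described in the notes: issue_read with method "get".
function getIssue(client: RecordingMcpClient, params: ToolArgs): unknown {
  return client.callTool("issue_read", { method: "get", ...params });
}

// Usage: the test should expect "issue_read", not the old "get_issue".
const client = new RecordingMcpClient();
getIssue(client, {});
const recorded = client.calls[0];
const usesNewToolName = recorded.tool === "issue_read" && recorded.args.method === "get";
```

An assertion on the old get_issue name fails against this call shape, which is the mismatch the failing test reports.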

Cadre Process Challenges

This section is required for all CADRE-generated PRs (dogfooding data).
Document honestly what was difficult, confusing, or error-prone when CADRE processed this issue.

  • Issue clarity: The issue listed five test scenarios but said "at least 3 are required" without specifying which three are mandatory. This forced the implementation agent to make an arbitrary choice and resulted in a mismatch with the resume scenario acceptance criterion (the plan said to use dryRun: true to pause at phase 2, but it's unclear whether IssueOrchestrator actually supports those semantics).
  • Agent contracts: The MockAgentLauncher helper and the inline E2ELauncher class duplicated some logic. The planner requested a separate tests/helpers/mock-agent-launcher.ts file (task-001) and also a test that used its own inline launcher (task-003), which created confusion about which launcher the e2e tests should actually use.
  • Context limitations: Analysis noted that no file tree was provided, so the exact locations for helper files and the shapes of checkpoint/cost-report data structures had to be inferred from source code. The codebase-scout phase helped, but the agent still had to make guesses that led to some back-and-forth.
  • Git/worktree: Mocking CommitManager via vi.mock required knowing the exact module path at write time. Any path mismatch silently fails (the mock doesn't apply), which is hard to diagnose. A cleaner dependency-injection approach in IssueOrchestrator would make this simpler.
  • Parsing/output: The synthetic implementation-plan output in the e2e tests must exactly match the schema that ResultParser.parseImplementationPlan expects. Getting the heading format and dependency syntax right took several iterations, because minor deviations caused silent parse failures that produced 0 tasks.
  • Retry behavior: No agent retries were needed in this run, but the fix-surgeon was invoked to address the pre-existing github-issues.test.ts failure. The fix-surgeon result file was committed to the worktree rather than automatically updating the test, which is an odd artifact.
  • Overall: The biggest friction in this run was the ambiguity between using the standalone MockAgentLauncher helper (task-001) vs. the inline E2ELauncher in the test file itself (task-003), and the lack of a clear spec for what ResultParser expects from synthetic plan output. Both caused multiple implementation iterations.
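The dependency-injection alternative suggested in the Git/worktree item could be sketched as follows. CommitManagerLike, FakeCommitManager and OrchestratorSketch are hypothetical names; IssueOrchestrator's real constructor and CommitManager's real methods are not shown in this PR.

```typescript
// Hypothetical interface covering only what the orchestrator needs from a commit manager.
interface CommitManagerLike {
  commit(message: string): string; // returns a commit id
}

// In-memory fake for tests: records messages and performs no git operations.
class FakeCommitManager implements CommitManagerLike {
  messages: string[] = [];

  commit(message: string): string {
    this.messages.push(message);
    return `fake-${this.messages.length}`;
  }
}

// The orchestrator receives its collaborator instead of importing it,
// so a test needs no module-path-based mocking.
class OrchestratorSketch {
  private readonly commits: CommitManagerLike;

  constructor(commits: CommitManagerLike) {
    this.commits = commits;
  }

  finishTask(taskId: string): string {
    return this.commits.commit(`complete ${taskId}`);
  }
}

// Usage: inject the fake and inspect what would have been committed.
const commits = new FakeCommitManager();
const orchestrator = new OrchestratorSketch(commits);
const commitId = orchestrator.finishTask("task-001");
```

With injection, a wrong collaborator becomes a compile-time type error rather than a mock that silently fails to apply.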

jafreck marked this pull request as ready for review February 23, 2026 00:07
jafreck merged commit 15c84ae into main Feb 23, 2026
2 checks passed
jafreck added the cadre-generated label (Pull request automatically generated by cadre) Feb 23, 2026
jafreck deleted the cadre/issue-11 branch February 25, 2026 01:18

Development

Successfully merging this pull request may close these issues.

End-to-end integration tests with mock agent backend