Problem
In the last 14 days, 24 Copilot PRs were opened and then closed while still in [WIP] state, meaning the agent started work and opened a PR but could not complete it. WIP PRs are the highest-cost session failure mode: the agent consumes a full runner session and opens a draft PR, yet delivers zero merged value. The likely root cause is that agents run long exploration phases before their first build or validation step. This triggers the documented 5-minute MCP inactivity timeout ("context canceled" on the final validation call).
Evidence
- Analysis window: 2026-05-11 to 2026-05-25
- Sessions analyzed: 50 workflow runs (822 Copilot PRs in scope)
- Key metrics and examples:
  - 24 [WIP]-prefixed PRs closed without merging in the 14-day window (15.4% of 156 total closed-unmerged PRs)
  - Representative WIP stalls:
    - PR #34288, "Add 5-minute MCP keepalive heartbeat": the fix for this exact timeout pattern itself became a WIP stall
    - PR #34639, "Fix failing GitHub Actions job 'agent'": stalled; an earlier identical attempt (#34119, 2026-05-22) also stalled
    - PR #34286, "Fix copilot-harness post-step set_output ENOENT error": stalled
    - PR #33817, "Refactor to extract progressive disclosure guidelines": stalled
- AGENTS.md already warns about the MCP inactivity timeout, but agents continue to violate that guidance
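The 15.4% headline figure follows from the title-prefix method described under Notes (24 / 156 ≈ 0.154). Below is a minimal Python sketch of that classification. It assumes the closed-unmerged PRs are available as a list of title strings. The names is_wip and wip_share are hypothetical helpers. Treating a case-insensitive leading "[WIP]" as the marker is an assumption about the exact prefix patterns used.

```python
def is_wip(title: str) -> bool:
    """Assumption: a PR is WIP if its title starts with "[WIP]" (case-insensitive)."""
    return title.lstrip().upper().startswith("[WIP]")


def wip_share(closed_unmerged_titles: list[str]) -> tuple[int, float]:
    """Return (WIP count, WIP percentage of all closed-unmerged PRs, 1 decimal)."""
    total = len(closed_unmerged_titles)
    if total == 0:
        return 0, 0.0
    wip = sum(1 for title in closed_unmerged_titles if is_wip(title))
    return wip, round(100 * wip / total, 1)


if __name__ == "__main__":
    # Synthetic titles shaped like the analysis window: 24 WIP out of 156.
    titles = ["[WIP] example"] * 24 + ["example"] * 132
    print(wip_share(titles))  # (24, 15.4)
```

A prefix match only sees PRs whose title still carries the marker when they are closed. It will miss stalls that were retitled, which is one reason the count should be read as a lower bound.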
Proposed Change
- Checkpoint 1 in workflow prompts: add an explicit early-validation gate to all agentic workflow prompts, requiring agents to run make build && make fmt after their first code edit and before any further exploration. The intent is to complete a validation pass before the MCP transport can hit its inactivity timeout, rather than leaving all validation to the final call.
- Emit a warning step in the "Running Copilot cloud agent" workflow that flags a timeout risk whenever no build step has run within the first 15 minutes of a session.
- Close WIP PRs faster: add an automatic workflow that posts a comment on any PR that has stayed in [WIP] state for more than 4 hours. The comment contains a checklist of the most common root causes (MCP timeout, missing build step, context size exceeded).
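The warning step in the second proposed change comes down to a simple check over a session's command history. Below is a minimal Python sketch. It assumes the step can see each shell command together with its offset from session start. The Event shape is hypothetical, since no events.jsonl logs were available for this analysis. It also assumes that any command containing "make build" counts as the build step. The flag is surfaced with the standard GitHub Actions ::warning:: workflow command.

```python
from dataclasses import dataclass

BUILD_DEADLINE_S = 15 * 60  # 15-minute window from the proposal


@dataclass
class Event:
    """One shell command seen in an agent session (hypothetical log shape)."""
    offset_s: float  # seconds since the session started
    command: str


def is_build(command: str) -> bool:
    # Assumption: any invocation of "make build" counts as the build step.
    return "make build" in command


def timeout_risk(events: list[Event], elapsed_s: float) -> bool:
    """True once the 15-minute window has passed with no build step inside it."""
    if elapsed_s < BUILD_DEADLINE_S:
        return False  # window still open, too early to judge
    return not any(
        is_build(e.command) and e.offset_s <= BUILD_DEADLINE_S for e in events
    )


if __name__ == "__main__":
    explored_only = [Event(30, "grep -r mcp ."), Event(600, "cat AGENTS.md")]
    built_early = explored_only + [Event(720, "make build && make fmt")]
    if timeout_risk(explored_only, elapsed_s=1000):
        # GitHub Actions turns this stdout line into a warning annotation.
        print("::warning::No build step in the first 15 minutes; MCP timeout risk")
    print(timeout_risk(built_early, elapsed_s=1000))  # False
```

A build that runs after the deadline still leaves the session flagged. This is deliberate: the goal is to catch the long-exploration pattern itself, not only sessions that never build.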
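The stale-WIP reminder in the last proposed change needs logic to select which PRs get the checklist comment. Below is a minimal Python sketch under several assumptions:

- PRs arrive as dicts whose number, state, title and created_at keys mirror fields of the GitHub REST API pull request object.
- created_at has already been parsed into a timezone-aware datetime.
- "Time in [WIP] state" is measured from PR creation, which is a simplification.

The names stale_wip and CHECKLIST are hypothetical. Posting the comment itself is left out.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=4)  # threshold from the proposal

# Checklist text built from the three root causes named in the proposal.
CHECKLIST = "\n".join([
    "This PR has been in [WIP] state for more than 4 hours. Common root causes:",
    "- [ ] MCP timeout (long exploration before the first build or validation step)",
    "- [ ] Missing build step (make build && make fmt never ran)",
    "- [ ] Context size exceeded",
])


def stale_wip(prs: list[dict], now: datetime) -> list[int]:
    """Numbers of open PRs still titled [WIP] and created more than 4 hours ago."""
    return [
        pr["number"]
        for pr in prs
        if pr["state"] == "open"
        and pr["title"].startswith("[WIP]")
        and now - pr["created_at"] > STALE_AFTER
    ]


if __name__ == "__main__":
    now = datetime(2026, 5, 25, 12, tzinfo=timezone.utc)
    prs = [  # synthetic PRs for illustration only
        {"number": 1, "state": "open", "title": "[WIP] a", "created_at": now - timedelta(hours=5)},
        {"number": 2, "state": "open", "title": "[WIP] b", "created_at": now - timedelta(hours=1)},
        {"number": 3, "state": "open", "title": "c", "created_at": now - timedelta(hours=9)},
    ]
    for number in stale_wip(prs, now):
        print(f"PR #{number}:\n{CHECKLIST}")
```

Measuring from creation time can flag a PR that was only retitled to [WIP] recently. Tracking title-edit events would be more precise, at the cost of extra API lookups per PR.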
Expected Impact
- Reduce WIP PR rate from 15.4% toward <5% of closed-unmerged PRs
- Recover approximately 10–15 agent sessions per 14-day cycle that currently produce zero output
- Reduce reviewer noise from orphaned WIP PRs
Notes
- Distinct root cause category: late validation / MCP inactivity timeout causing incomplete sessions
- Data quality: no events.jsonl logs were available; WIP counts are derived from PR title prefix patterns
Generated by ⚡ Copilot Opt · sonnet46 2.5M