[copilot-opt] 21.5% PR closure rate dominated by fix: PRs — pre-flight validation gaps causing wasted sessions #30202

@github-actions

Problem

A significant fraction of Copilot-generated PRs are closed without merging, and the dominant category is fix: PRs. This points to a systemic gap in pre-flight validation: agent sessions complete and produce a PR, but the implementation fails CI or does not meet reviewer standards. Each wasted PR represents a full agent session (typically 20–60 minutes) consumed with no delivered value.

Evidence

  • Analysis window: 2026-04-21 to 2026-05-04
  • Sessions analyzed: 50 (metadata only; no event logs available)
  • Key metrics and examples:
    • 1000 Copilot PRs analyzed in total; 215 closed without merging (21.5% closure rate)
    • Of the 215 closed-unmerged PRs, 165 carry the fix: prefix (77% of all closures)
    • The overall merge success rate is 78.4% (779 merged / 994 non-open PRs; the remaining 6 of the 1000 PRs were still open)
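    • Second-largest closed category by title scan: feat: (104 closures). The 165 fix: closures leave only 50 of the 215 for all other prefixes, so this figure cannot be a subset of the same 215 and needs to be reconciled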
    • fix: PRs are the highest-volume PR type and the highest-volume closure type, suggesting that quick fix tasks are particularly prone to insufficient validation before a PR is opened
    • Zero session logs were available to diagnose which validation steps (lint, tests, build, fmt) were failing most often
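
The headline percentages can be rechecked directly from the counts quoted above. This is a small arithmetic sketch, not part of the original analysis; the variable names are illustrative.

```python
# Counts quoted in the Evidence list.
total_prs = 1000
closed_unmerged = 215
merged = 779
fix_closed = 165

non_open = merged + closed_unmerged        # PRs that reached a final state
still_open = total_prs - non_open          # PRs not yet merged or closed

closure_rate = closed_unmerged / total_prs    # closures as a share of all PRs
merge_rate = merged / non_open                # merges as a share of resolved PRs
fix_share = fix_closed / closed_unmerged      # fix: closures as a share of closures

print(f"non-open PRs:  {non_open} ({still_open} still open)")
print(f"closure rate:  {closure_rate:.1%}")   # 21.5%
print(f"merge rate:    {merge_rate:.1%}")     # 78.4%
print(f"fix: share:    {fix_share:.1%}")      # 76.7%
```

Note that the closure rate uses all 1000 PRs as its denominator while the merge rate uses the 994 non-open PRs, so the two do not sum to exactly 100%.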

Proposed Change

  1. Enforce make agent-finish as a mandatory pre-PR gate in the Copilot coding agent workflow instructions. AGENTS.md already documents this step, but sessions may be skipping it under time pressure. Add a runtime enforcement step in the agent harness that runs make fmt && make test-unit and blocks create_pull_request if either command fails.
  2. Add a lightweight pre-flight check step in the workflow that runs make build && make fmt immediately after the first code edit (not just at the end), so compile errors surface early rather than after long exploration phases.
  3. Instrument closure reasons by adding a tag/label system to closed PRs (e.g., closed:ci-failure, closed:reviewer-rejected) so future optimization runs can triage the 215 closures by actual root cause rather than title prefix heuristics.
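
The gate in item 1 could look roughly like the following. This is a minimal sketch under stated assumptions: the make targets come from the proposal, but `first_failure` and `guarded_create_pull_request` are hypothetical names, and the real harness hook for create_pull_request is not described in this issue.

```python
import subprocess
import sys

# Make targets named in the proposal; everything else here is an assumption.
PREFLIGHT_COMMANDS = [
    ["make", "fmt"],
    ["make", "test-unit"],
]

def first_failure(commands):
    """Run commands in order (like `&&`) and return the first one that fails, or None."""
    for cmd in commands:
        try:
            ok = subprocess.run(list(cmd), capture_output=True).returncode == 0
        except OSError:
            # Command missing or not executable counts as a failed check.
            ok = False
        if not ok:
            return " ".join(cmd)
    return None

def guarded_create_pull_request(create_pr, commands=PREFLIGHT_COMMANDS):
    """Call create_pr() only if every pre-flight command succeeds."""
    failed = first_failure(commands)
    if failed is not None:
        raise RuntimeError(f"pre-flight check failed: {failed}; pull request not created")
    return create_pr()

if __name__ == "__main__":
    # Stand-in commands so the sketch runs without a Makefile.
    passing = [sys.executable, "-c", "raise SystemExit(0)"]
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(first_failure([passing, passing]))
    print(first_failure([passing, failing]))
```

The same helper could serve item 2 by calling `first_failure([["make", "build"], ["make", "fmt"]])` right after the first code edit, so the early check and the final gate share one code path.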

Expected Impact

  • Reduce the fix: PR closure rate by catching the most common failure modes (formatting, build, failing tests) before a PR is opened
  • Save 1–3 agent sessions per week currently wasted on PRs that fail CI immediately after opening
  • Enable data-driven triage of future closure events with labeled root causes
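
Once closed PRs carry closure labels, triage reduces to a tally. The sketch below assumes each PR is available as a dict with a `labels` list, which is a hypothetical record shape; the `closed:` label names are the examples from the proposal, and the sample records are invented for illustration only.

```python
from collections import Counter

CLOSURE_PREFIX = "closed:"  # label namespace suggested in the proposal

def closure_reason(labels):
    """Return the reason from the first `closed:` label, or 'unlabeled'."""
    for label in labels:
        if label.startswith(CLOSURE_PREFIX):
            return label[len(CLOSURE_PREFIX):]
    return "unlabeled"

def tally_closure_reasons(prs):
    """Count closed PRs by labeled closure reason."""
    return Counter(closure_reason(pr["labels"]) for pr in prs)

# Invented sample data, not taken from the analyzed PRs.
sample = [
    {"labels": ["closed:ci-failure", "copilot"]},
    {"labels": ["closed:reviewer-rejected"]},
    {"labels": ["closed:ci-failure"]},
    {"labels": []},
]

for reason, count in tally_closure_reasons(sample).most_common():
    print(f"{reason}: {count}")
```

Keeping an explicit "unlabeled" bucket makes it visible how much of the closure set still lacks a recorded root cause.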

Notes

  • Distinct root cause category: late/missing validation strategy
  • Data quality caveats: session logs directory (/tmp/gh-aw/session-data/logs/) was empty; close reasons for individual PRs were not fetched (gh CLI unavailable in this session). The 215-closure count and category breakdown are derived from title-prefix analysis of the full PR dataset. Actual failure mode distribution requires per-PR inspection.
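
For reference, a title-prefix analysis of the kind described in the caveat can be sketched as follows. The record fields (`title`, `state`, `merged`) and the regular expression are assumptions; the exact heuristic used to produce the counts above is not stated in this issue.

```python
import re
from collections import Counter

# Matches conventional-commit style prefixes such as "fix:", "feat(cli):", "feat!:".
PREFIX_RE = re.compile(r"^([a-z]+)(?:\([^)]*\))?!?:", re.IGNORECASE)

def title_prefix(title):
    """Return the normalized prefix (e.g. 'fix:') or '(none)' if the title has none."""
    match = PREFIX_RE.match(title.strip())
    return match.group(1).lower() + ":" if match else "(none)"

def closed_unmerged_by_prefix(prs):
    """Count closed-without-merge PRs per title prefix."""
    return Counter(
        title_prefix(pr["title"])
        for pr in prs
        if pr["state"] == "closed" and not pr["merged"]
    )

# Invented sample records for illustration only.
sample_prs = [
    {"title": "fix: handle empty input", "state": "closed", "merged": False},
    {"title": "fix(parser): trim whitespace", "state": "closed", "merged": True},
    {"title": "feat: add export command", "state": "closed", "merged": False},
    {"title": "Update README", "state": "open", "merged": False},
]

print(closed_unmerged_by_prefix(sample_prs))
```

A prefix only describes what the PR claimed to do, not why it was closed, which is why the per-PR inspection mentioned above is still needed.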

Generated by Copilot Opt · ● 2.5M ·
