[prompt-clustering] Copilot PR Prompt Clustering Analysis — Apr 11–May 1, 2026 (1,000 PRs, 9 clusters) #29476
This discussion was automatically closed because it expired on 2026-05-02T08:52:32.146Z.
Overview
TF-IDF + K-Means clustering analysis of 1,000 copilot-created PRs from Apr 11 – May 1, 2026. The analysis surfaces 9 distinct task categories, identifies which work types merge most reliably, and highlights where copilot agents struggle.
Overall merge rate: 77.6% (776 merged / 215 closed-without-merge / 9 still open)
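The TF-IDF + K-Means step named above can be illustrated with a minimal pure-Python sketch. It is not the pipeline that produced this report: the stopword list, the unigram-only tokenizer, the first-k centroid seeding, and the helper names (`tokenize`, `tfidf_vectors`, `kmeans`, `top_terms`) are all assumptions made for illustration. The report's own top terms include bigrams such as "mcp gateway", which this sketch would not produce. The four titles are taken from the representative PRs listed in this report.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"and", "for", "from", "in", "of", "the", "to"}

def tokenize(title):
    """Lowercase a title and keep alphabetic tokens of two or more letters."""
    return [t for t in re.findall(r"[a-z]{2,}", title.lower()) if t not in STOPWORDS]

def tfidf_vectors(titles):
    """Return one L2-normalized sparse TF-IDF vector (a dict) per title."""
    docs = [Counter(tokenize(t)) for t in titles]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # document frequency
    vectors = []
    for doc in docs:
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))

def kmeans(vectors, k, iterations=10):
    """Lloyd's algorithm, seeded with the first k vectors so the result is deterministic."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if not members:
                continue  # keep the previous centroid if a cluster empties out
            sums = defaultdict(float)
            for v in members:
                for t, w in v.items():
                    sums[t] += w
            centroids[c] = {t: w / len(members) for t, w in sums.items()}
    return labels, centroids

def top_terms(centroid, n=3):
    """Highest-weighted terms of a centroid; ties are broken alphabetically."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]

titles = [
    "chore: bump AWF from v0.25.18 to v0.25.20",
    "fix: correct cache-memory paths for named caches",
    "Bump default AWF to v0.25.25 and MCP Gateway to v0.2.25",
    "fix(q): persist cache state to end 100% cache miss streak",
]
labels, centroids = kmeans(tfidf_vectors(titles), k=2)
for cluster_id, centroid in enumerate(centroids):
    print(cluster_id, top_terms(centroid))
```

On this toy input the two version-bump titles should share one cluster and the two cache titles the other. A run over 1,000 PRs would more plausibly use a library implementation with randomized restarts and n-gram features.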
Cluster Summary
Cluster Deep-Dives
C7 — Workflow fixes & docs (264 PRs · 78% merged) — Largest cluster
The dominant work category: general workflow YAML corrections, documentation updates, validation logic, and refactors that don't fit a more specific bucket.
Top terms: workflow, changes, fix, workflows, docs, validation
Characteristics:
Representative PRs:
docs: add workflow_run trigger examples for DevOps monitoring workflow
Refactor duplicated SHA/workflow-spec helpers into shared parser
fix: replace deprecated {{#import}} with {{#runtime-import}} in workflow templates
C2 — Agent / engine integration (171 PRs · 74% merged) — Most complex tasks
Work that adds, modifies, or fixes agentic engine support — new engine integrations (OpenCode, Codex, Crush, Gemini), pre-agent steps, copilot workflow wiring.
Top terms: agent, copilot, step, engine, pre, run
Characteristics:
Representative PRs:
Add missing pre-agent step to contribution-check workflow
Add support for pre-agent-steps before agent execution
feat(workflow-health-manager): reduce token usage ~42% via pre-agent step
C6 — MCP gateway & tools (109 PRs · 81% merged) — High discussion volume
Fixes and features for the MCP gateway layer: transport migrations, container config, OIDC env forwarding, CLI bridge, tool definitions.
Top terms: mcp, gateway, tools, tool, mcp gateway, cli
Characteristics:
Representative PRs:
fix: migrate mempalace MCP server to HTTP transport for MCP Gateway v0.x
fix: add missing container property to HTTP MCP servers for MCP Gateway
Prevent Codex MCP gateway startup failures from config.toml self-copy
C0 — Daily audit workflows (100 PRs · 75% merged)
Operational workflows: daily audit loops, log downloads, shared reporting utilities, workflow migration from legacy patterns.
Top terms: daily, audit, workflows, logs, shared, report
Characteristics:
Representative PRs:
Normalize report formatting guidelines across daily workflow prompts
Migrate 24 workflows from daily-audit-discussion + reporting to daily-report
Refactor daily audit import stack into shared daily-audit-base component
C4 — Safe outputs / upload (97 PRs · 77% merged)
Safe output infrastructure: artifact upload improvements, auto-inject features, temporary ID tracking, staging diagnostics.
Top terms: safe, safe outputs, outputs, safe output, output, upload
Characteristics:
Representative PRs:
feat: auto-inject create-issue safe output when no safe outputs are expressed
feat: store temporary ID maps to file and include in safe outputs artifact
Prevent false safe_outputs failures in Multi-Device Docs Tester
C5 — Test & lint fixes (82 PRs · 85% merged) — Fastest to merge
Targeted test additions, integration test corrections, lint violations, test quality improvements.
Top terms: test, integration, tests, job, lint, failing
Characteristics:
Representative PRs:
Fix lint-go failure from testifylint violations in spec tests
fix: use exact YAML line matching for agent job detection in integration tests
C3 — Version bumps / AWF constants (67 PRs · 58% merged) ⚠️ Lowest merge rate
Dependency version bumps, AWF firewall version updates, constant files, CLI version pinning.
Top terms: version, awf, constants, firewall, bump, cli
Characteristics:
Representative PRs:
Bump MCPG to v0.3.1 and AWF firewall to v0.25.29
chore: bump AWF from v0.25.18 to v0.25.20
Bump default AWF to v0.25.25 and MCP Gateway to v0.2.25
C8 — PR creation & branching (61 PRs · 89% merged) ⭐ Highest merge rate
Improvements to the create_pull_request safe-output action itself: recreate-ref option, cross-repo checkout, branch management, dot-folder protection.
Top terms: pull, pull request, request, branch, create pull, create
Characteristics:
Representative PRs:
Add recreate-ref option to create_pull_request for reusing an existing ref
fix: cross-repo create-pull-request checkout ignores triggering branch
C1 — Cache / memory (49 PRs · 88% merged)
Cache-memory persistence, path corrections, Q cache state, named cache support.
Top terms: memory, cache, cache memory, comment memory, run, comment
Characteristics:
Representative PRs:
fix(q): persist cache state to end 100% cache miss streak
fix: correct cache-memory paths for named caches
Key Findings
Test and cache/memory tasks are the sweet spot (85–88% merge rate, fewest commits, minimal review friction). These are well-scoped, low-ambiguity tasks where the expected output is mechanically verifiable. Investing in better test scaffolding pays off.
Version bump tasks underperform (58% merge rate, 74.9 avg files). The high file count comes from auto-generated constant regeneration — PRs frequently conflict or get superseded. Consider tightening bump workflows to minimize constant churn, or adding a merge-conflict check before opening the PR.
MCP gateway work attracts the most discussion (5.6 avg comments) but still achieves an 81% merge rate, so copilot agents handle complex review feedback on MCP changes reliably. The review overhead may be worthwhile for correctness.
Agent / engine integration is the riskiest category: broadest blast radius (32.1 files), lowest merge rate among non-version clusters (74%). These tasks benefit from more granular scoping before dispatch.
PR creation & branching tasks are high-iteration but high-success (89% merged, 5.4 commits). More commits = more feedback loops, but agents resolve them. This suggests copilot handles well-specified behavioral contracts effectively even when the path to solution requires iteration.
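The per-cluster figures behind these findings can be cross-checked against the overall merge rate with a few lines of arithmetic. The sketch below copies the PR counts and rounded merge percentages from the cluster headings in this report. Because the percentages are rounded, the merged total it derives is an estimate, not an exact recount.

```python
# (cluster id, label, PR count, merge rate in %), copied from the cluster headings.
CLUSTERS = [
    ("C7", "Workflow fixes & docs", 264, 78),
    ("C2", "Agent / engine integration", 171, 74),
    ("C6", "MCP gateway & tools", 109, 81),
    ("C0", "Daily audit workflows", 100, 75),
    ("C4", "Safe outputs / upload", 97, 77),
    ("C5", "Test & lint fixes", 82, 85),
    ("C3", "Version bumps / AWF constants", 67, 58),
    ("C8", "PR creation & branching", 61, 89),
    ("C1", "Cache / memory", 49, 88),
]

# Cluster sizes should add up to the 1,000 PRs analysed.
total_prs = sum(count for _, _, count, _ in CLUSTERS)

# Size-weighted estimate of merged PRs, using the rounded percentages.
estimated_merged = sum(count * rate / 100 for _, _, count, rate in CLUSTERS)
overall_rate = 100 * estimated_merged / total_prs

# Clusters ordered from highest to lowest merge rate.
ranked = sorted(CLUSTERS, key=lambda c: c[3], reverse=True)

print(f"total PRs: {total_prs}")
print(f"estimated merged: {estimated_merged:.1f} ({overall_rate:.1f}%)")
for cluster_id, label, count, rate in ranked:
    print(f"{cluster_id} {label}: {rate}% of {count} PRs")
```

The cluster sizes sum to 1,000 and the weighted estimate comes out at roughly 776 merged PRs, which agrees with the 77.6% overall rate reported in the overview.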
Recommendations
Prefer test-fix and cache-fix task shapes when possible: they have the most predictable outcomes (85–88% merge, ~2 commits). Break larger tasks into test-targeted subtasks.
Add a conflict-detection gate to version bump PRs before opening: the 42% of C3 PRs that did not merge likely include many superseded or conflicting bumps. A pre-open check (git merge-base --is-ancestor) could filter these out early.
Scope engine integration tasks more narrowly (C2). Instead of "add OpenCode engine support," dispatch as: (a) add workflow YAML, (b) wire pre-agent steps, (c) add smoke test, each as a separate PR with a narrower surface area.
MCP gateway review load is expected — don't optimize it away. The high comment volume (5.6) correlates with correct, mergeable output (81%). The discussion is doing its job.
Cache/memory tasks are a strong ROI signal: 88% merge rate on 49 PRs over 20 days. Cache reliability improvements are landing consistently — this infrastructure investment is working.
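The pre-open check suggested for version bump PRs could look roughly like the sketch below. It wraps `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second and status 1 when it is not. The function name, the ref names in the usage note, and the injectable `run` parameter are hypothetical. Note that this is only a staleness test: it reports whether the base branch has moved since the bump branch was cut, not whether the two would actually conflict.

```python
import subprocess

def base_is_ancestor(base_ref, head_ref, run=subprocess.run):
    """Return True if base_ref is an ancestor of head_ref.

    `run` is injectable so the check can be exercised without a repository
    (an assumption of this sketch, not part of any existing workflow).
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", base_ref, head_ref],
        check=False,
    )
    if result.returncode == 0:
        return True   # head_ref already contains the tip of base_ref
    if result.returncode == 1:
        return False  # base has moved on; the bump may be stale or superseded
    # Any other status signals a git error (for example, an unknown ref).
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")
```

A bump workflow might call `base_is_ancestor("origin/main", "HEAD")` after fetching and skip or rebase the PR when it returns `False`. The ref names here are placeholders.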
Methodology
app/copilot-swe-agent, Apr 11 – May 1, 2026