[prompt-clustering] Copilot PR Prompt Clustering Analysis — Apr 11–May 1, 2026 (1,000 PRs, 9 clusters) #29476

2026-05-01T08:52:32Z

github-actions[bot]
Bot May 1, 2026

Overview

TF-IDF + K-Means clustering analysis of 1,000 copilot-created PRs from Apr 11 – May 1, 2026. The analysis surfaces 9 distinct task categories, identifies which work types merge most reliably, and highlights where copilot agents struggle.

Overall merge rate: 77.6% (776 merged / 215 closed-without-merge / 9 still open)

Cluster Summary

Cluster	Theme	PRs	Share	Merged	Merge Rate	Avg Files	Avg Commits	Avg Comments
C7	Workflow fixes & docs	264	26.4%	205	78%	14.1	3.3	2.7
C2	Agent / engine integration	171	17.1%	127	74%	32.1	3.6	2.8
C6	MCP gateway & tools	109	10.9%	88	81%	34.1	4.5	5.6
C0	Daily audit workflows	100	10.0%	75	75%	24.2	3.0	2.6
C4	Safe outputs / upload	97	9.7%	75	77%	22.4	3.3	2.7
C5	Test & lint fixes	82	8.2%	70	85%	11.3	2.0	1.1
C3	Version bumps / AWF	67	6.7%	39	58%	74.9	3.2	4.5
C8	PR creation & branching	61	6.1%	54	89%	11.6	5.4	4.8
C1	Cache / memory	49	4.9%	43	88%	16.5	4.2	3.2
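Each merge rate in the table is merged ÷ PRs for that cluster, rounded to the nearest percent. The short Python check below uses only the counts copied from the table. It recomputes the rates and confirms that the clusters add up to the 1,000 PRs and 776 merges given in the overview.

```python
# (cluster, PRs, merged) copied from the Cluster Summary table.
CLUSTERS = [
    ("C7", 264, 205),
    ("C2", 171, 127),
    ("C6", 109, 88),
    ("C0", 100, 75),
    ("C4", 97, 75),
    ("C5", 82, 70),
    ("C3", 67, 39),
    ("C8", 61, 54),
    ("C1", 49, 43),
]


def merge_rates(clusters):
    """Return {cluster: merge rate in whole percent}, rounding half up."""
    return {name: int(100 * merged / prs + 0.5) for name, prs, merged in clusters}


total_prs = sum(prs for _, prs, _ in CLUSTERS)
total_merged = sum(merged for _, _, merged in CLUSTERS)
overall = 100 * total_merged / total_prs

if __name__ == "__main__":
    print(total_prs, total_merged, f"{overall:.1f}%")  # 1000 776 77.6%
    for name, rate in merge_rates(CLUSTERS).items():
        print(name, f"{rate}%")
```

The 224 non-merged PRs (1,000 − 776) combine the 215 closed-without-merge PRs and the 9 still open. Each cluster's "not merged" count therefore mixes both groups as well.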

Cluster Deep-Dives

C7 — Workflow fixes & docs (264 PRs · 78% merged) — Largest cluster

The dominant work category: general workflow YAML corrections, documentation updates, validation logic, and refactors that don't fit a more specific bucket.

Top terms: workflow, changes, fix, workflows, docs, validation

Characteristics:

Widest breadth — catch-all for single-file YAML tweaks through multi-package refactors
Moderate complexity (avg 14.1 files, 3.3 commits)
Low comment volume (2.7) → reviewers rarely push back hard

Representative PRs:

#29060 docs: add workflow_run trigger examples for DevOps monitoring workflow
#27108 Refactor duplicated SHA/workflow-spec helpers into shared parser
#29399 fix: replace deprecated {{#import}} with {{#runtime-import}} in workflow templates

C2 — Agent / engine integration (171 PRs · 74% merged) — Most complex tasks

Work that adds, modifies, or fixes agentic engine support — new engine integrations (OpenCode, Codex, Crush, Gemini), pre-agent steps, copilot workflow wiring.

Top terms: agent, copilot, step, engine, pre, run

Characteristics:

Third-highest avg file count (32.1), behind version bumps (74.9) and MCP gateway (34.1) → broad surface area
Lowest merge rate among the five largest clusters (74%) → suggests reviewers find more issues with engine changes
3.6 commits avg, more than C7, C0, and C4 (3.0–3.3) → agents iterate more on these

Representative PRs:

#25968 Add missing pre-agent step to contribution-check workflow
#26666 Add support for pre-agent-steps before agent execution
#28968 feat(workflow-health-manager): reduce token usage ~42% via pre-agent step

C6 — MCP gateway & tools (109 PRs · 81% merged) — High discussion volume

Fixes and features for the MCP gateway layer: transport migrations, container config, OIDC env forwarding, CLI bridge, tool definitions.

Top terms: mcp, gateway, tools, tool, mcp gateway, cli

Characteristics:

Highest avg comments (5.6) — MCP changes attract the most review discussion
Highest avg file count among non-version clusters (34.1)
High merge rate (81%) despite complexity → copilot agents handle MCP work reliably

Representative PRs:

#28288 fix: migrate mempalace MCP server to HTTP transport for MCP Gateway v0.x
#28324 fix: add missing container property to HTTP MCP servers for MCP Gateway
#27582 Prevent Codex MCP gateway startup failures from config.toml self-copy

C0 — Daily audit workflows (100 PRs · 75% merged)

Operational workflows: daily audit loops, log downloads, shared reporting utilities, workflow migration from legacy patterns.

Top terms: daily, audit, workflows, logs, shared, report

Characteristics:

Mid-range complexity (24.2 files, 3.0 commits)
75% merge rate — slightly below average; daily workflow refactors sometimes need multiple passes

Representative PRs:

#28649 Normalize report formatting guidelines across daily workflow prompts
#28151 Migrate 24 workflows from daily-audit-discussion + reporting to daily-report
#26654 Refactor daily audit import stack into shared daily-audit-base component

C4 — Safe outputs / upload (97 PRs · 77% merged)

Safe output infrastructure: artifact upload improvements, auto-inject features, temporary ID tracking, staging diagnostics.

Top terms: safe, safe outputs, outputs, safe output, output, upload

Characteristics:

Consistent complexity (22.4 files, 3.3 commits)
Merge rate matches the overall average (77%)

Representative PRs:

#29270 feat: auto-inject create-issue safe output when no safe outputs are expressed
#25828 feat: store temporary ID maps to file and include in safe outputs artifact
#27195 Prevent false safe_outputs failures in Multi-Device Docs Tester

C5 — Test & lint fixes (82 PRs · 85% merged) — Fastest to merge

Targeted test additions, integration test corrections, lint violations, test quality improvements.

Top terms: test, integration, tests, job, lint, failing

Characteristics:

Smallest scope (11.3 files, 2.0 commits — fewest of any cluster)
Lowest comment volume (1.1) → reviewers accept quickly
85% merge rate → high signal: well-scoped test tasks succeed reliably
Near-zero review friction — the ideal copilot task shape

Representative PRs:

#26686 Fix lint-go failure from testifylint violations in spec tests
#25780 fix: use exact YAML line matching for agent job detection in integration tests

C3 — Version bumps / AWF constants (67 PRs · 58% merged) ⚠️ Lowest merge rate

Dependency version bumps, AWF firewall version updates, constant files, CLI version pinning.

Top terms: version, awf, constants, firewall, bump, cli

Characteristics:

By far the most files touched (avg 74.9 files — regenerated constant files dominate)
Lowest merge rate: 58% — only cluster below 70%
Third-highest comment volume (4.5): human reviewers often intervene on timing/correctness
Many PRs in this cluster represent attempted automated bumps that were superseded or conflicted

Representative PRs:

#28952 Bump MCPG to v0.3.1 and AWF firewall to v0.25.29
#26041 chore: bump AWF from v0.25.18 to v0.25.20
#27102 Bump default AWF to v0.25.25 and MCP Gateway to v0.2.25

C8 — PR creation & branching (61 PRs · 89% merged) ⭐ Highest merge rate

Improvements to the create_pull_request safe-output action itself: recreate-ref option, cross-repo checkout, branch management, dot-folder protection.

Top terms: pull, pull request, request, branch, create pull, create

Characteristics:

Highest merge rate: 89%
Most commits per PR (5.4) — these tasks require careful iteration
Second-highest review discussion (4.8 comments), yet agents resolve the feedback and 89% merge
Tight file scope (11.6 files) keeps blast radius manageable despite iteration depth

Representative PRs:

#29153 Add recreate-ref option to create_pull_request for reusing an existing ref
#28395 fix: cross-repo create-pull-request checkout ignores triggering branch

C1 — Cache / memory (49 PRs · 88% merged)

Cache-memory persistence, path corrections, Q cache state, named cache support.

Top terms: memory, cache, cache memory, comment memory, run, comment

Characteristics:

Second-highest merge rate (88%) — well-understood problem space
Moderate complexity (16.5 files, 4.2 commits)
More review comments than test tasks (3.2 vs 1.1), but agents incorporate the feedback successfully

Representative PRs:

#28473 fix(q): persist cache state to end 100% cache miss streak
#28439 fix: correct cache-memory paths for named caches

Key Findings

Test and cache/memory tasks are the sweet spot (85–88% merge rate). Test fixes in particular need the fewest commits (2.0) and draw the fewest comments (1.1). These are well-scoped, low-ambiguity tasks where the expected output is mechanically verifiable, so investing in better test scaffolding likely pays off.
Version bump tasks underperform (58% merge rate, 74.9 avg files). The high file count comes from auto-generated constant regeneration — PRs frequently conflict or get superseded. Consider tightening bump workflows to minimize constant churn, or adding a merge-conflict check before opening the PR.
MCP gateway work attracts the most discussion (5.6 avg comments) but still achieves an 81% merge rate: copilot agents handle complex review feedback on MCP changes reliably. The review overhead may be worthwhile for correctness.
Agent / engine integration is the riskiest category: a broad blast radius (32.1 files, third-highest) and the lowest merge rate among non-version clusters (74%). These tasks would likely benefit from more granular scoping before dispatch.
PR creation & branching tasks are high-iteration but high-success (89% merged, 5.4 commits). More commits = more feedback loops, but agents resolve them. This suggests copilot handles well-specified behavioral contracts effectively even when the path to solution requires iteration.

Recommendations

Prefer test-fix and cache-fix task shapes when possible. They have the most predictable outcomes (85–88% merge; test fixes average 2.0 commits). Break larger tasks into test-targeted subtasks.
Add a staleness/conflict gate to version bump PRs before opening. The 42% non-merge rate in C3 (28 of 67 PRs) likely includes many superseded or conflicting bumps. A pre-open check such as git merge-base --is-ancestor <base> HEAD (does the branch already contain the latest base commit?) could filter stale bumps early, though on its own it does not prove the merge is conflict-free.
Scope engine integration tasks more narrowly (C2). Instead of "add OpenCode engine support," dispatch as: (a) add workflow YAML, (b) wire pre-agent steps, (c) add smoke test — each as separate PRs with narrower surface area.
MCP gateway review load is expected; don't optimize it away. The high comment volume (5.6) coincides with a high merge rate (81%), which suggests the discussion is doing its job.
Cache/memory tasks are a strong ROI signal: 88% merge rate on 49 PRs over 20 days. Cache reliability improvements are landing consistently — this infrastructure investment is working.
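The pre-open gate for C3 bumps can be sketched in Python. `git merge-base --is-ancestor <A> <B>` exits 0 when A is an ancestor of B and 1 when it is not. Any other non-zero status signals an error, so the check reduces to reading an exit code. This is a minimal sketch under assumptions. The function name and the injectable `run` parameter are hypothetical; `run` exists only so the logic can be tested without a repository. The check confirms only that the bump branch already contains the latest base commit. It is a stale-branch proxy, not proof that the merge is conflict-free.

```python
import subprocess
from typing import Callable


def branch_contains_base(
    base: str,
    head: str = "HEAD",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if `base` is an ancestor of `head` (the branch is up to date).

    Hypothetical helper. Exit status 0 means "is ancestor" and 1 means "is not".
    Any other status signals a git error and raises RuntimeError.
    """
    result = run(["git", "merge-base", "--is-ancestor", base, head])
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(
        f"git merge-base --is-ancestor failed with exit code {result.returncode}"
    )
```

A bump workflow could call `branch_contains_base("origin/main")` just before opening the PR (the base ref here is an assumption). On False, it would fetch and regenerate the constants instead of opening a PR that is likely to be superseded. For an actual conflict test, Git 2.38+ also provides `git merge-tree --write-tree`, which exits 1 when the merge has conflicts.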

Methodology

Data: 1,000 PRs created by app/copilot-swe-agent, Apr 11 – May 1, 2026
Text: PR title + first 300 chars of cleaned body text
Vectorization: TF-IDF, 300 features, unigrams + bigrams, min_df=3, sublinear TF
Clustering: K-Means, k=9 (selected by silhouette score sweep k=3–9)
Silhouette score: 0.031 (low, meaning clusters overlap heavily; this is common for sparse, high-dimensional TF-IDF vectors, so cluster coherence was checked by manual review of representative PRs)
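The pipeline listed above can be approximated with the Python standard library alone. This sketch follows the stated settings: title plus 300 body characters, unigrams plus bigrams, min_df=3, sublinear TF, 300 features, K-Means, and a silhouette sweep over k=3–9. Where the report is silent, it borrows scikit-learn's TfidfVectorizer defaults: the token pattern, smoothed IDF, L2 normalization, and max_features ranked by corpus frequency. Clustering uses plain Lloyd's K-Means with seeded random initialization. Those choices, every function name, and reading "Top terms" as the highest-weighted centroid terms are all assumptions, not the workflow's actual code. The silhouette of point i is s(i) = (b(i) − a(i)) / max(a(i), b(i)). Here a(i) is the mean distance to i's own cluster and b(i) is the mean distance to the nearest other cluster. A mean near 0, like 0.031, means most PRs sit almost as close to a neighbouring cluster as to their own.

```python
import math
import random
import re
from collections import Counter

# Assumption: scikit-learn's default token pattern (2+ word characters).
TOKEN = re.compile(r"\b\w\w+\b")


def prepare(title, body, body_chars=300):
    """PR title + first `body_chars` characters of the (already cleaned) body."""
    return f"{title} {body[:body_chars]}".lower()


def ngrams(text):
    """Unigrams plus space-joined bigrams."""
    toks = TOKEN.findall(text)
    return toks + [f"{a} {b}" for a, b in zip(toks, toks[1:])]


def tfidf(docs, max_features=300, min_df=3):
    """Sublinear TF x smoothed IDF, L2-normalized; returns (vocab, dense vectors)."""
    counts = [Counter(ngrams(d)) for d in docs]
    df = Counter(t for c in counts for t in c)  # document frequency
    corpus_tf = sum(counts, Counter())  # total occurrences across all docs
    kept = [t for t in df if df[t] >= min_df]  # min_df filter first
    vocab = sorted(kept, key=lambda t: (-corpus_tf[t], t))[:max_features]
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vecs = []
    for c in counts:
        v = [(1 + math.log(c[t])) * idf[t] if c[t] else 0.0 for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs.append([x / norm for x in v])
    return vocab, vecs


def kmeans(vecs, k, iters=100, seed=0):
    """Lloyd's algorithm with k randomly sampled documents as initial centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vecs, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vecs]
        if new == labels:  # converged: no point changed cluster
            break
        labels = new
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def silhouette(vecs, labels):
    """Mean silhouette coefficient (Euclidean); singleton clusters count as 0."""
    groups = {}
    for v, lab in zip(vecs, labels):
        groups.setdefault(lab, []).append(v)
    total = 0.0
    for v, lab in zip(vecs, labels):
        own = groups[lab]
        others = [g for other, g in groups.items() if other != lab]
        if len(own) < 2 or not others:
            continue
        a = sum(math.dist(v, w) for w in own) / (len(own) - 1)  # self-distance is 0
        b = min(sum(math.dist(v, w) for w in g) / len(g) for g in others)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(vecs)


def choose_k(vecs, ks=range(3, 10), seed=0):
    """Silhouette sweep over k = 3..9; returns (k, score, labels, centroids)."""
    best = None
    for k in ks:
        labels, centroids = kmeans(vecs, k, seed=seed)
        score = silhouette(vecs, labels)
        if best is None or score > best[1]:
            best = (k, score, labels, centroids)
    return best


def top_terms(vocab, centroid, n=6):
    """Highest-weighted vocabulary terms in a cluster centroid."""
    return [t for _, t in sorted(zip(centroid, vocab), reverse=True)[:n]]
```

Typical use would be `vocab, vecs = tfidf([prepare(t, b) for t, b in prs])`, then `k, score, labels, centroids = choose_k(vecs)`, then `top_terms(vocab, centroids[c])` per cluster. Here `prs` is a hypothetical list of (title, body) pairs. The quadratic silhouette loop is acceptable at this scale (about a million distance computations per k). Running K-Means from several seeds and keeping the best, as scikit-learn's `n_init` does, would make the sweep less sensitive to initialization.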

References:

§25208010776

Generated by Copilot Agent Prompt Clustering Analysis · ● 308.6K · ◷

expires on May 2, 2026, 8:52 AM UTC

2026-05-01T09:43:20Z

github-actions[bot]
Bot May 1, 2026
Author

💥 WHOOSH! 🦸‍♂️

KABLAM! The Claude Smoke Test Agent has descended upon this discussion like a caped crusader from the GitHub Actions cosmos! 🌟

POW! Run §25209689971 complete!

"With great MCP power comes great smoke test responsibility!"

✅ All systems nominal! Claude engine roars to life! WHAM! 💫

— The Smoke Test Agent, swooping through your CI pipeline since 2026

Note

🔒 Integrity filter blocked 1 item

The following item was blocked because it doesn't meet the GitHub integrity level.

chore: bump CLI/MCP tool versions (Claude Code 2.1.126, Copilot 1.0.39, Codex 0.128.0, Playwright MCP 0.0.72, MCP Gateway v0.3.3) #29484 pull_request_read: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

💥 [THE END] — Illustrated by Smoke Claude · ● 334.1K · ◷


2026-05-02T08:59:32Z

github-actions[bot]
Bot May 2, 2026
Author

This discussion was automatically closed because it expired on 2026-05-02T08:52:32.146Z.

Closed by Workflow
