[prompt-clustering] Copilot Agent Prompt Clustering — 2026-06-24 (1,000 PRs, 8 clusters) #41210

2026-06-24T11:02:43Z

github-actions[bot]
Bot Jun 24, 2026

Summary

Analysis Period: last 30 days (PRs analyzed were created 2026-06-03 → 2026-06-24, a 21-day span)
Tasks Analyzed: 1,000 Copilot coding-agent PRs (app/copilot-swe-agent)
Clusters Identified: 8 (TF-IDF + K-means, title weighted 3×)
Overall Merge (Success) Rate: 80.0% (790 merged · 198 closed · 12 open)

Copilot agent work in github/gh-aw splits into one large general-engineering bucket plus seven tighter clusters, most of which track gh-aw's own subsystems (safe-outputs, firewall, AI-credit guardrails, Copilot SDK, model resolution). Merge rates are remarkably uniform (78–82%) across seven of the eight themes, so outcome appears to be driven less by what the task is than by execution. The one clear underperformer is WIP "fix failing Actions job" tasks at 68%.

Key Findings

Throughput is healthy and consistent. 80% of resolved agent PRs merge (790 of 988 merged or closed), and seven of eight clusters fall in a narrow 78–82% band. Task category is a weak predictor of success: the pipeline handles refactors, infra plumbing, and reporting workflows about equally well.
One catch-all dominates volume. Cluster C4 (general refactors / schemas / tests / docs) is 36% of all tasks. Its keywords (schema, docs, test, string, remove, coverage) show heavy investment in code hygiene and in cleaning up reinvented helpers (e.g. [WIP] Refactor inline string-truncation reinventions and file-existence idioms #41191, a truncation/os.Stat consolidation).
The rest of the work mirrors gh-aw's architecture. Distinct, coherent clusters map 1:1 to subsystems: agent-context/prompt tuning (22%), AI-credit & token guardrails (13%), safe-outputs/MCP (9%), Copilot SDK/driver (8%), firewall/AWF network-isolation (6%), and model-resolution audits (4%).
"Fix the red build" is the weak spot. C0, the WIP tasks targeting failing GitHub Actions / integration jobs, is the smallest cluster (19 tasks, 2%) and the only one below the pack, at a 68% merge rate on that small sample. Several titles repeat near-identically (e.g. [WIP] Fix failing GitHub Actions job 'Integration: Workflow Features' #41153 and [WIP] Fix failing GitHub Actions job for CLI Docker build #37886, both "Fix failing job Integration: Workflow Features"). Diagnostic/CI-repair tasks are the least often merged and appear to be the most retried.

Cluster Analysis (8 clusters, full detail)

Merge rate by cluster (sorted)

Cluster	Theme	Tasks	Share	Merge rate	Closed	Open
C4	General code refactors, schemas, tests & docs	355	36%	82%	62	8
C5	Model resolution & experiments	39	4%	82%	7	1
C3	Firewall / network-isolation (AWF)	63	6%	81%	12	1
C7	AI-credit / token usage guardrails	133	13%	80%	27	0
C6	Agent context, prompts & reporting workflows	220	22%	79%	47	0
C2	Copilot SDK / driver & permissions	79	8%	78%	17	0
C1	Safe-outputs & MCP plumbing	92	9%	78%	20	2
C0	WIP: failing Actions / integration jobs	19	2%	68%	6	0

Themes, keywords & representative PRs

C4 — General code refactors, schemas, tests & docs (355, 82% merged): schema, docs, test, repo, string, remove, coverage · ex docs: spec audit — add github README, update fileutil/constants/timeutil/tty specs #38848, docs(cost-management): replace all tables with headers and lists #38224, Replace non-idiomatic len(string) == 0 checks flagged by lenstringzero #38015
C6 — Agent context, prompts & reporting workflows (220, 79% merged): context, failure, guidance, prompt, agent, step, report · ex Remove redundant python-dataviz imports from daily reporting workflows #41158, Guard startup terminal probing on Windows when stderr is redirected #37823, Align mcp/secrets unknown-subcommand behavior with root CLI errors #37935
C7 — AI-credit / token usage guardrails (133, 80% merged): aic, credits, usage, token, daily, guardrail, max · ex Backfill firewall activity reports from usage artifact domain aggregates #41046, Lower daily credit-limit guardrail test to 1 AI credit #37631, Preserve agent AIC in create-issue footer breakdown #37464
C1 — Safe-outputs & MCP plumbing (92, 78% merged): safe outputs, safe output, outputs, mcp, tool · ex Add logging to publish-safe-outputs-node scripts #39085, [aw] Make spec-librarian reliably emit a safe output #37321, Tighten safe-outputs noop contract for prompt-omission scenarios #37122
C2 — Copilot SDK / driver & permissions (79, 78% merged): copilot, sdk, driver, permission, auth · ex Add multi-language Copilot SDK driver samples and wire daily workflows to exercise runtime installs #36734, Fix Copilot SDK sample driver BYOK session configuration in Daily Model Inventory workflow #37454, feat: add-wizard prompts Copilot users to choose copilot-requests (org billing) vs PAT #38449
C3 — Firewall / network-isolation (AWF) (63, 81% merged): awf, firewall, domains, bump, blocked · ex chore: bump CLI tool versions (Claude 2.1.178, Copilot 1.0.63, Codex 0.140.0, Pi 0.79.4, GH MCP Server v1.3.0, Playwright v1.61.0) #39624, chore: bump Claude Code 2.1.178→2.1.179, Pi 0.79.4→0.79.6 #39772, Fix AIC usage cache always empty in activation job #39130
C5 — Model resolution & experiments (39, 82% merged): model, experiment, alias, models, sub · ex Add summary_detail A/B experiment to dependabot campaign and support guardrail direction metadata #37563, Switch model refresh to models.dev catalog schema, consume native cost fields, and add dispatch-based refresh PR workflow #37055, Fix workflow test expectations for strict-mode deprecations and daily model inventory prompt text #37128
C0 — WIP: failing Actions / integration jobs (19, 68% merged): failing actions, actions job, wip failing, integration · ex [WIP] Fix failing GitHub Actions job 'Integration: Workflow Features' #41153, [WIP] Fix failing GitHub Actions job Integration: Workflow Misc Part 2 #38265, [WIP] Fix the failing GitHub Actions job build #37674

Methodology & limitations

Source: copilot-prs.json — 1,000 PRs authored by app/copilot-swe-agent, created 2026-06-03 → 2026-06-24. All had non-empty bodies (avg 1,562 chars).
Text: PR title (conventional-commit/[WIP] prefixes stripped, weighted 3×) + body with code fences, inline code, URLs, HTML, markdown and checkboxes removed; letters only, 1–2 char tokens dropped.
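The text preparation described above can be sketched with the standard library. This is an illustration under assumptions, not the report's actual script: the regular expressions, the list of conventional-commit types, and the function names (`clean_title`, `clean_body`, `pr_tokens`) are all hypothetical. Remaining markdown punctuation is discarded implicitly by the letters-only tokenizer.

```python
import re

# Patterns for the markup the report says is removed from PR bodies.
FENCE = re.compile(r"```.*?```", re.DOTALL)   # fenced code blocks
INLINE_CODE = re.compile(r"`[^`\n]*`")        # inline code spans
URL = re.compile(r"https?://\S+")
HTML_TAG = re.compile(r"<[^>]+>")
CHECKBOX = re.compile(r"\[[ xX]\]")           # markdown task-list boxes

# Title prefixes: "[WIP]" and conventional-commit types such as "fix(scope):".
# The list of commit types is an assumption.
TITLE_PREFIX = re.compile(
    r"^\s*(?:\[wip\]\s*)?"
    r"(?:(?:feat|fix|docs|chore|refactor|test|ci|build|perf|style)"
    r"(?:\([^)]*\))?!?:\s*)?",
    re.IGNORECASE,
)
TOKEN = re.compile(r"[a-z]+")                 # letters only


def clean_title(title: str) -> str:
    """Strip a leading [WIP] marker and conventional-commit prefix."""
    return TITLE_PREFIX.sub("", title, count=1)


def clean_body(body: str) -> str:
    """Blank out code, URLs, HTML tags and checkboxes (fences first)."""
    for pattern in (FENCE, INLINE_CODE, URL, HTML_TAG, CHECKBOX):
        body = pattern.sub(" ", body)
    return body


def tokenize(text: str) -> list[str]:
    """Lowercase, keep letter runs, drop 1-2 character tokens."""
    return [t for t in TOKEN.findall(text.lower()) if len(t) > 2]


def pr_tokens(title: str, body: str, title_weight: int = 3) -> list[str]:
    """Repeat the title tokens title_weight times, then append body tokens."""
    return tokenize(clean_title(title)) * title_weight + tokenize(clean_body(body))
```

Repeating the title tokens is one simple way to realize a 3× weight before vectorization; scaling the title's term counts afterwards would be an equivalent alternative.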
Vectorization: TF-IDF, 1–2 grams, max_features=600, min_df=3, max_df=0.5, English + domain stop-words (github, workflow, gh, aw, pkg, fix, add, ...).
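To make the vectorizer settings concrete, here is a small pure-Python sketch of TF-IDF over 1–2 grams with `min_df`, `max_df` and `max_features` filtering. The report does not name the library it used; the smoothed IDF formula and L2 normalization below follow common defaults and are assumptions, as is the function name `tfidf`.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=2):
    """Return all 1..n_max-grams of a token list as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, stop_words=frozenset(), min_df=3, max_df=0.5, max_features=600):
    """docs: list of token lists. Returns (vocabulary, list of {term: weight})."""
    # Stop words are dropped before n-grams are formed.
    grams = [ngrams([t for t in d if t not in stop_words]) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter(term for g in grams for term in set(g))
    # Corpus-wide term counts, used to rank terms for the max_features cap.
    tf_total = Counter(term for g in grams for term in g)

    # min_df is an absolute document count, max_df a fraction of documents.
    kept = [t for t in df if df[t] >= min_df and df[t] / n_docs <= max_df]
    kept.sort(key=lambda t: (-tf_total[t], t))
    vocab = sorted(kept[:max_features])

    # Smoothed IDF (assumed formula): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    vectors = []
    for g in grams:
        counts = Counter(t for t in g if t in idf)
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # L2-normalize so document length does not dominate distances.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vocab, vectors
```

With `max_df=0.5`, any term appearing in more than half of the PRs is discarded, which together with the domain stop-words keeps ubiquitous repository vocabulary from dominating the clusters.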
Clustering: K-means, k chosen by silhouette over k∈[3,8] → k=8. Note that k=8 is the top of the search range, so larger values were not tested. Silhouette scores are low (0.01–0.03), which is expected for short engineering text: clusters are thematically coherent (see keywords) but not geometrically well-separated, so treat boundaries as soft.
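Choosing k by silhouette can be illustrated with a minimal, dependency-free sketch. It assumes dense vectors (lists of floats) and uses deterministic farthest-point seeding so the example is reproducible; the report's actual K-means initialization and library are not stated, and the names `kmeans`, `silhouette` and `choose_k` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centers(points, k):
    """Farthest-point seeding (an assumption, chosen for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for m, other in clusters.items() if m != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, k_min=3, k_max=8):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores
```

A score near 1 means points sit much closer to their own cluster than to the next one; scores near 0, like the 0.01–0.03 reported here, mean clusters overlap heavily, which is why the boundaries should be read as soft.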
Success metric: "merge rate" = merged ÷ (merged+closed) per cluster, excluding still-open PRs.
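The metric can be recomputed from the cluster table as a sanity check. In this sketch the merged count per cluster is derived as tasks − closed − open, which is an assumption (the table does not list merged counts directly, though the totals match the 790 merged reported in the summary); the `Cluster` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    tasks: int
    closed: int
    still_open: int

    @property
    def merged(self) -> int:
        # Derived, not reported: everything neither closed nor open.
        return self.tasks - self.closed - self.still_open

    @property
    def merge_rate(self) -> float:
        # merged / (merged + closed); still-open PRs are excluded.
        resolved = self.merged + self.closed
        return self.merged / resolved if resolved else 0.0


# Tasks, closed and open counts copied from the cluster table.
CLUSTERS = [
    Cluster("C4", 355, 62, 8),
    Cluster("C5", 39, 7, 1),
    Cluster("C3", 63, 12, 1),
    Cluster("C7", 133, 27, 0),
    Cluster("C6", 220, 47, 0),
    Cluster("C2", 79, 17, 0),
    Cluster("C1", 92, 20, 2),
    Cluster("C0", 19, 6, 0),
]

for c in sorted(CLUSTERS, key=lambda c: c.merge_rate, reverse=True):
    print(f"{c.name}: {c.merge_rate:.0%} ({c.merged} of {c.merged + c.closed} resolved)")

total_merged = sum(c.merged for c in CLUSTERS)
total_closed = sum(c.closed for c in CLUSTERS)
overall = total_merged / (total_merged + total_closed)
print(f"overall: {overall:.1%}")  # 790 / 988, i.e. 80.0%
```

Excluding open PRs matters most for small clusters: C0 has only 19 resolved PRs, so a single additional merge or close moves its rate by about five percentage points.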
Limitation — no turn/cost metrics: these are GitHub Copilot coding-agent PRs, not gh-aw workflow runs, so per-task turn counts, duration, and AIC cost from aw_info.json do not map to individual PRs and were intentionally omitted rather than estimated. The stale pr-full-data/ cache (May, different PR numbers) was not used.

Recommendations

Target CI-repair tasks (C0). This is the only cluster below the pack (68%), and its prompts are visibly repetitive. Give "fix failing job X" prompts the failing run's logs and a reproduction command up front instead of just the job name; these tasks appear to fail for lack of diagnostic context rather than capability.
Sub-segment the C4 catch-all. At 36% it hides distinct work types (truncation/idiom refactors vs. schema vs. test coverage). A future run could re-cluster C4 alone for sharper prompt-pattern insights.
Keep the domain workflows as-is. Safe-outputs, firewall, AIC-guardrail and model-resolution clusters all sit at 78–82% — no category-level intervention warranted; gains there are about reducing the ~20% close rate via clearer acceptance criteria, not retargeting.
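As an illustration of the first recommendation, a CI-repair prompt could be assembled so that the log tail and a reproduction command travel with the job name. This is a hypothetical sketch: `build_ci_repair_prompt`, its parameters, and the prompt wording are assumptions, and how the log and command are obtained is left to the caller.

```python
def build_ci_repair_prompt(job_name: str, run_log: str, repro_command: str,
                           max_log_lines: int = 40) -> str:
    """Compose a task prompt that carries diagnostic context up front."""
    # Keep only the tail of the log, where the failure usually surfaces.
    tail = run_log.splitlines()[-max_log_lines:]
    return "\n".join([
        f"Fix the failing GitHub Actions job '{job_name}'.",
        "",
        "Reproduce locally with:",
        f"    {repro_command}",
        "",
        f"Last {len(tail)} lines of the failing run's log:",
        *tail,
    ])
```

Capping the log at a fixed number of lines keeps the prompt short while still showing the failing step; the right cap is a judgment call and would need tuning against real runs.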

References: §28092507871

Generated by 📊 Copilot Agent Prompt Clustering Analysis · 131.1 AIC · ⌖ 18.3 AIC · ⊞ 13.2K · ◷

expires on Jun 25, 2026, 3:02 AM UTC-08:00
