[prompt-clustering] Copilot Agent Prompt Clustering — 1,000 PRs (May 29–Jun 18) #40038

2026-06-18T11:34:49Z

github-actions[bot]
Bot Jun 18, 2026

Summary

Analysis period: 2026-05-29 → 2026-06-18 (~3 weeks) · PRs analyzed: 1,000 · Clusters: 8 · Overall merge rate: 79.8%

Clustering of 1,000 Copilot-agent PR descriptions (TF-IDF + K-means, k = 8) surfaces 8 coherent task families mapped to gh-aw subsystems. Merge rates are healthy across seven of the eight clusters (73–93%), with one clear weak spot: auto-triggered "[WIP] Fix failing GitHub Actions job" PRs merge only 61%, the lowest of any group and ~19 points below the overall average.

Full analysis report

Method & data notes

Source: copilot-prs.json — 1,000 PRs authored by app/copilot-swe-agent in github/gh-aw.
Text: PR title + body (the agent's own task description), cleaned of code blocks, URLs, issue refs, and snake_case identifiers.
Model: TfidfVectorizer(1–2 grams, min_df=3, max_df=0.6, sublinear_tf) → KMeans(k=8). The silhouette score is low (0.040), which is expected for short, topically overlapping engineering text, so the clusters are thematic tendencies rather than hard partitions.
⚠️ Result cap: the search returned exactly 1,000 PRs (the ceiling), so the window is compressed to ~3 weeks rather than a full 30 days. Volume peaked at 471 PRs in the week of Jun 1.
⚠️ No turn/cost/duration metrics: the cached full-PR data (pr-full-data/, PRs 30577–31xxx) is stale and has zero overlap with the current window (35747–40007). Per-PR comment/review/commit counts and workflow turn counts were therefore unavailable; success is measured by merge outcome only.

General insights

Most common work: code correctness, linters & refactors (28%) and schema/compiler/docs (23%). Together, roughly half (51%) of all agent PRs are core engineering hygiene.
Highest success: firewall/infra & path-mount fixes — 93.3% merged. Small, well-scoped, deterministic patches.
Lowest success: CI-failure auto-fix [WIP] PRs — 61.1% merged. These are reactive, broadly-scoped "make the red job green" tasks and are the clearest optimization target.
Weekly merge rate drifted down as volume rose: 85% (wk May 25) → 80% (wk Jun 1) → 78% (wk Jun 8) → 77% (wk Jun 15).
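Bucketing PRs by the Monday that starts their week reproduces labels such as "wk May 25" and "wk Jun 15". The sketch below assumes a hypothetical `PR` record holding only a creation date and a merged flag; the real copilot-prs.json fields may differ. Because the window opens on Friday, May 29, the "wk May 25" bucket covers only three days of the window (May 29–31).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PR:
    # Hypothetical minimal record; the real copilot-prs.json schema may differ.
    created: date
    merged: bool


def week_start(d: date) -> date:
    """Monday of the week containing d (matches 'wk May 25', 'wk Jun 1')."""
    return d - timedelta(days=d.weekday())


def weekly_merge_rates(prs: list[PR]) -> dict[date, tuple[int, float]]:
    """Map week start -> (PR count, merge rate), ordered by week."""
    counts: dict[date, list[int]] = defaultdict(lambda: [0, 0])
    for pr in prs:
        bucket = counts[week_start(pr.created)]
        bucket[0] += 1          # total PRs opened that week
        bucket[1] += pr.merged  # bool counts as 0 or 1
    return {
        week: (total, merged / total)
        for week, (total, merged) in sorted(counts.items())
    }


sample = [
    PR(date(2026, 5, 29), True),   # Friday  -> wk May 25
    PR(date(2026, 5, 31), False),  # Sunday  -> wk May 25
    PR(date(2026, 6, 1), True),    # Monday  -> wk Jun 1
]
print(weekly_merge_rates(sample))
```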

Clusters (largest first)

Cluster	Theme	Size (share)	Merge rate	Top keywords
C7	Code correctness, linters & refactors	280 (28.0%)	80.7%	error, behavior, path, coverage, ci, package
C6	Schema / compiler / docs & validation	232 (23.2%)	78.9%	schema, field, compiler, validation, frontmatter
C1	Prompts, safe-outputs & guidance tuning	195 (19.5%)	77.4%	prompt, output, safe, guidance, tool
C0	AI-credit / token budgeting & reporting	128 (12.8%)	83.6%	ai credits, aic, token, budget, effective
C2	Copilot SDK driver & permissions	60 (6.0%)	73.3%	copilot sdk, driver, permission, session, mode
C4	Firewall / infra & path-mount fixes	45 (4.5%)	93.3%	domains, firewall, smoke, bind mount, paths
C5	Sous-chef / agent-config & length refactors	42 (4.2%)	81.0%	sous chef, gpt mini, timeout, function-length
C3	CI-failure auto-fix (WIP)	18 (1.8%)	61.1%	failing github actions job, wip, plan
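The table reports percentages only, so as a consistency check the sketch below back-calculates merged counts by rounding size × rate (an assumption; the underlying counts are not in the report). The pooled result reproduces the 79.8% overall merge rate, and pooling the two hygiene clusters (C7 + C6) gives 512 PRs at about 79.9%.

```python
# Cluster sizes and merge rates transcribed from the clusters table.
CLUSTERS = {
    "C7": (280, 0.807),
    "C6": (232, 0.789),
    "C1": (195, 0.774),
    "C0": (128, 0.836),
    "C2": (60, 0.733),
    "C4": (45, 0.933),
    "C5": (42, 0.810),
    "C3": (18, 0.611),
}


def implied_merged(size: int, rate: float) -> int:
    """Back-calculate a merged-PR count from a rounded percentage."""
    return round(size * rate)


def pooled_rate(ids: list[str]) -> tuple[int, int, float]:
    """Return (PRs, merged PRs, pooled merge rate) for the given clusters."""
    total = sum(CLUSTERS[c][0] for c in ids)
    merged = sum(implied_merged(*CLUSTERS[c]) for c in ids)
    return total, merged, merged / total


print(pooled_rate(list(CLUSTERS)))  # (1000, 798, 0.798)
print(pooled_rate(["C7", "C6"]))    # (512, 409, 0.798828125)
```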

Representative PRs

PR	Cluster	Title
#40007	CI-failure auto-fix (WIP)	[WIP] Fix failing GitHub Actions job Integration: Workflow Features
#40001	Code correctness & refactor	safe-outputs: validate extracted base branch with git check-ref-format
#40000	AI-credit/token budgeting	feat: add disclosure-footer shorthand for AI authorship disclosure
#39999	Schema/compiler/docs	fix: correct stale 1 MB default for safe-outputs max-patch-size
#39990	Code correctness & refactor	rawloginlib: resolve log package identity via types to fix shadowing
#39966	Prompts & safe-outputs	Reduce ambient context: compress workflow prompts to cut token overhead
#39959	Copilot SDK driver	Improve Copilot harness classification for opaque exitCode=1 failures
#39950	Firewall/infra & paths	fix: add /tmp/gh-aw bind mount to safeoutputs MCP container
#39940	Copilot SDK driver	isolate copilot_sdk_driver test session writes (false-positive denials)
#39933	Sous-chef & config refactor	fix: increase sendAndWait timeout in sample SDK drivers 60s→10min
#39123	AI-credit/token budgeting	Skip daily AI-credit guardrail for user-initiated/command runs
#38397	CI-failure auto-fix (WIP)	[WIP] Fix failing GitHub Actions job 'js-typecheck'

Key findings

Scoped > reactive. The two extremes tell one story: tightly scoped, deterministic patches (firewall/path/mount, 93%) succeed, while broadly scoped, reactive "fix the red CI job" tasks (61%) fail most often. Task specificity predicts merge.
Half the fleet is engineering hygiene. 51% of agent PRs are code-correctness, linter, refactor, schema, and docs work. The agent is being used heavily as a maintenance multiplier, and it lands those PRs reliably (~80% merged).
Merge rate erodes under volume. Weekly merge rate slipped from 85% to 77% over the window, in which weekly volume peaked at 471 PRs (week of Jun 1). This hints at reviewer saturation or a thinner-quality tail at high volume.

Recommendations

Tighten the CI-auto-fix prompt (C3). Have the WIP "fix failing job" workflow emit a diagnosis and a minimal-scope plan before editing, and constrain it to the failing package or job rather than open-ended repair. This is the single highest-leverage fix (61% → target 80%+).
Templatize the high-success pattern (C4). Firewall/path/mount fixes merge at 93% because they're small and deterministic. Encourage that shape elsewhere: smaller diffs, explicit acceptance criteria in the prompt.
Restore turn/cost telemetry. Re-point the full-PR cache at the current PR window (or match PRs to workflow runs by branch and timestamp) so future runs can correlate merge outcome with turn count and cost, the most actionable dimension missing from this run.
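Matching PRs to workflow runs by branch and timestamp can be done as a per-branch nearest-neighbour lookup. The sketch below assumes hypothetical `PullRequest` and `WorkflowRun` records (GitHub's workflow-run objects do expose a `head_branch` field, but the other field names here are illustrative), picks the same-branch run whose start time is closest to PR creation, and uses an arbitrary 2-hour tolerance. Whether runs should precede or follow PR creation is itself an assumption to validate against real data.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PullRequest:
    number: int
    head_branch: str
    created_at: datetime


@dataclass(frozen=True)
class WorkflowRun:
    run_id: int
    head_branch: str
    started_at: datetime


def match_runs(
    prs: list[PullRequest],
    runs: list[WorkflowRun],
    tolerance: timedelta = timedelta(hours=2),  # hypothetical cut-off
) -> dict[int, int]:
    """Map PR number -> run_id of the same-branch run nearest in start time."""
    by_branch: dict[str, list[WorkflowRun]] = defaultdict(list)
    for run in runs:
        by_branch[run.head_branch].append(run)
    for branch_runs in by_branch.values():
        branch_runs.sort(key=lambda r: r.started_at)
    starts = {b: [r.started_at for r in rs] for b, rs in by_branch.items()}

    matches: dict[int, int] = {}
    for pr in prs:
        candidates = by_branch.get(pr.head_branch, [])
        i = bisect_left(starts.get(pr.head_branch, []), pr.created_at)
        # The nearest runs are the ones just before and just after creation.
        nearby = candidates[max(i - 1, 0): i + 1]
        best = min(nearby, key=lambda r: abs(r.started_at - pr.created_at), default=None)
        if best is not None and abs(best.started_at - pr.created_at) <= tolerance:
            matches[pr.number] = best.run_id
    return matches
```

Unmatched PRs are simply left out of the result, so coverage of the join can be reported as `len(matches) / len(prs)` before correlating turns or cost with merge outcome.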

Generated by Prompt Clustering Analysis — Run §27755055618


expires on Jun 19, 2026, 3:34 AM UTC-08:00

2026-06-19T11:40:42Z

github-actions[bot]
Bot Jun 19, 2026
Author

This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis.

A newer discussion is available at Discussion #40288.
