Summary
Analysis window: 2026-04-22 → 2026-05-11 (996 of 1000 Copilot agent PRs after cleaning)
Clusters: 12 (TF-IDF + KMeans, silhouette=0.020)
Overall merge rate: 78.7%
Median PR size: 4 files / +98/-17 lines
WIP/draft PRs: 28
Key findings
Workflow YAML & lock-file edits dominate volume — Cluster C2 (Workflow YAML & lock-file changes) is the largest at 195 PRs (20% of all work), driven by routine .github/workflows/*.yml and lock-file regeneration tasks.
AWF config tasks are the hardest: C3 (AWF config emission) has a merge rate of only 47% vs. the 79% baseline. These PRs are also larger than the typical PR (median +205 added lines across 7 files, vs. +98 across 4 files overall), suggesting they're cross-cutting and harder for the agent to land cleanly.
Experiments & cache-memory work lands reliably: C5 (A/B experiments framework) reaches a 90% merge rate and C4 (cache-memory subsystem) 89%; these are well-scoped, single-subsystem changes.
High-quality plumbing changes ship — Go-internals (C7, 89%) and cache-memory (C4, 89%) merge at well above baseline despite being technical, indicating the agent handles narrowly-scoped engineering tasks well.
Version bumps churn — Version-bump / model-inventory work (C9) merges at only 71% despite being mechanical, likely because of overlapping/superseded bump PRs.
Cluster overview
| Cluster | Theme | # PRs | Share | Merge | Closed | Med. files | Med. +adds | Top keywords |
|---|---|---|---|---|---|---|---|---|
| C2 | Workflow YAML & lock-file changes | 195 | 20% | 73% | 26% | 3 | 68 | github, workflows, run, yml |
| C7 | Go package internals (pkg/*) | 135 | 14% | 89% | 11% | 3 | 61 | pkg, string, engine, test |
| C6 | Safe-outputs .cjs handlers | 113 | 11% | 86% | 13% | 4 | 100 | cjs, safe, safe outputs, outputs |
| C10 | MCP servers & gateway | 94 | 9% | 83% | 17% | 5 | 74 | mcp, cli, tool, gateway |
| C9 | Version bumps & model inventory | 82 | 8% | 71% | 29% | 10 | 216 | com, version, copilot, constants |
| C0 | Agent token & turn optimization | 67 | 7% | 79% | 21% | 3 | 95 | turns, agent, turn, tokens |
| C4 | Cache-memory subsystem | 61 | 6% | 89% | 11% | 2 | 50 | cache, cache-memory, memory, gh-aw |
| C8 | Shared workflow imports | 61 | 6% | 72% | 28% | 5 | 267 | shared, workflows, import, apm |
| C3 | AWF config emission | 51 | 5% | 47% | 53% | 7 | 205 | awf, awf config, config, models |
| C11 | Docs site (docs/src) | 49 | 5% | 78% | 22% | 3 | 82 | docs, src, docs src, reference |
| C1 | PR branch / checkout plumbing | 48 | 5% | 83% | 17% | 3 | 167 | pull request, pull, request, branch |
| C5 | A/B experiments framework | 40 | 4% | 90% | 10% | 4 | 200 | experiment, experiments, variant, state |
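As a quick sanity check on the table, the per-cluster counts and merge rates can be recombined into an overall figure. The sketch below copies the `# PRs` and `Merge` columns by hand and assumes the rounded percentages are close enough for a rough check; the function name is illustrative and not part of the original analysis.

```python
# (cluster, PR count, merge rate in %) copied from the cluster overview table.
CLUSTERS = [
    ("C2", 195, 73), ("C7", 135, 89), ("C6", 113, 86), ("C10", 94, 83),
    ("C9", 82, 71), ("C0", 67, 79), ("C4", 61, 89), ("C8", 61, 72),
    ("C3", 51, 47), ("C11", 49, 78), ("C1", 48, 83), ("C5", 40, 90),
]


def weighted_merge_rate(rows):
    """Return (total PRs, PR-weighted merge rate as a fraction)."""
    total = sum(count for _, count, _ in rows)
    # Rates are rounded percentages, so the result is only approximate.
    merged = sum(count * pct for _, count, pct in rows) / 100
    return total, merged / total


total, rate = weighted_merge_rate(CLUSTERS)
print(total, f"{rate:.1%}")
```

The total should match the 996 PRs kept after cleaning, and the weighted rate should land within a fraction of a point of the reported 78.7%; a small gap is expected because the per-cluster rates are rounded.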
Daily volume by cluster
Cluster space (2D projection)
Per-cluster detail (themes, top terms, representative PRs)
Methodology
Dataset: 1,000 most recent PRs authored by app/copilot-swe-agent in github/gh-aw (the search cap), spanning 2026-04-22 → 2026-05-11.
Prompt proxy: PR title + cleaned body. Where present, the explicit Original prompt block is extracted; otherwise body content is used, which describes the resulting work and is semantically correlated with the original task.
Cleaning: stripped Copilot suffix metadata, the firewall-block <details> warning (otherwise its /usr/bin, goinsecure, gomodcache, triggering command tokens dominate the vocabulary), code blocks, HTML tags, URLs, markdown checkboxes, and quoted lines.
Features: TF-IDF over 1–2 grams with min_df=4, max_df=0.6, English stopwords + a small custom list (fix, add, update, wip, etc.).
Clustering: KMeans across k=3..15; chose k=12 as the largest k still within 0.001 of the maximum silhouette in the [3..12] range — favors interpretable splits without over-fragmenting. Note that silhouette values are small (≈0.02) because TF-IDF vectors are very high-dimensional and sparse — silhouette underestimates separation in that regime; the chosen clusters are nevertheless semantically coherent, as the representative PRs show.
Cluster labels: assigned manually after inspecting top TF-IDF terms and the five PRs closest to each centroid.
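The rule used to choose k (the largest k whose silhouette is within 0.001 of the best score in the [3..12] range) can be sketched as follows. The function name is hypothetical and the silhouette values in the example are invented purely to illustrate the rule; they are not the scores from this run.

```python
def pick_k(silhouette_by_k, k_min=3, k_max=12, tol=0.001):
    """Pick the largest k in [k_min, k_max] within `tol` of the best silhouette."""
    in_range = {k: s for k, s in silhouette_by_k.items() if k_min <= k <= k_max}
    best = max(in_range.values())
    # Prefer the finest split that is still essentially tied with the best one.
    return max(k for k, s in in_range.items() if best - s <= tol)


# Made-up scores for illustration only (NOT the values from this analysis).
example_scores = {3: 0.012, 5: 0.0205, 8: 0.019, 12: 0.020, 15: 0.030}
print(pick_k(example_scores))
```

With these made-up scores, k=15 is ignored because it falls outside the selection range, k=8 is too far below the in-range maximum, and k=12 wins over k=5 because it is the larger of the two near-tied candidates.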
Limitations:
Workflow metrics not joined: no per-PR turn counts / cost / duration are available in this run — those would require correlating workflow runs to PRs, which is fragile without an explicit join key. The agentic-workflows MCP server's logs tool could supply them in a future run for engine=copilot workflows that produce PRs.
The 1,000-PR search cap, not the date range, is the binding limit: the window actually covered (~19 days) is shorter than the requested 30-day target.
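The cleaning step described in the methodology above could look roughly like this. It is a minimal sketch using only the standard library, not the pipeline's actual code: the patterns are assumptions, it drops every `<details>` block rather than only the firewall warning, and it strips just the checkbox marker rather than the whole item.

```python
import re

# Order matters: remove whole blocks first, then inline markup.
_PATTERNS = [
    re.compile(r"<details>.*?</details>", re.DOTALL | re.IGNORECASE),  # collapsed blocks (e.g. firewall warning)
    re.compile(r"```.*?```", re.DOTALL),                               # fenced code blocks
    re.compile(r"<[^>]+>"),                                            # remaining HTML tags
    re.compile(r"https?://\S+"),                                       # URLs
    re.compile(r"^[ \t]*[-*][ \t]*\[[ xX]\][ \t]*", re.MULTILINE),     # markdown checkbox markers
    re.compile(r"^[ \t]*>.*$", re.MULTILINE),                          # quoted lines
]


def clean_body(body: str) -> str:
    """Strip non-prose content from a PR body and collapse whitespace."""
    for pattern in _PATTERNS:
        body = pattern.sub(" ", body)
    return re.sub(r"\s+", " ", body).strip()


sample = (
    "Fix cache key\n"
    "<details><summary>Firewall</summary>/usr/bin gomodcache</details>\n"
    "```go\nx := 1\n```\n"
    "- [x] done item\n"
    "> quoted\n"
    "See https://example.com/a <b>now</b>"
)
print(clean_body(sample))
```

Removing the firewall block before tokenizing matters because, as noted above, its tokens would otherwise dominate the TF-IDF vocabulary.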
Recommendations
Workflow YAML changes (C2, 195 PRs, 73% merge) are the biggest lever: as the largest cluster, each percentage point of merge rate gained here is worth about two PRs, more than in any other cluster. The security-fix sub-theme (curl | bash replacements) is over-represented in the closed PRs there.
Version-bump churn (C9, 71% merge) — most of the failures here look like superseded bumps. Worth adding a check that flags or auto-closes prior open bump PRs when a newer one supersedes them.
Keep doing what works for C4/C5/C7 — narrow, single-subsystem tasks (cache-memory, experiments, Go internals) all sit at ≥85% merge and should be the template for prompt patterns elsewhere.
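The superseded-bump check suggested for C9 could be prototyped along these lines. This is a sketch under stated assumptions: the `BumpPR` record, its fields, and the target names are hypothetical, and a real check would need to parse the bump target and version out of PR titles or diffs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BumpPR:
    number: int                # PR number
    target: str                # what is being bumped (hypothetical key)
    version: tuple[int, ...]   # parsed target version, e.g. (1, 0, 4)


def superseded(open_prs):
    """Return numbers of open bump PRs that a newer bump of the same target replaces."""
    newest = {}
    for pr in open_prs:
        current = newest.get(pr.target)
        # Higher version wins; ties go to the more recent (higher-numbered) PR.
        if current is None or (pr.version, pr.number) > (current.version, current.number):
            newest[pr.target] = pr
    return sorted(pr.number for pr in open_prs if newest[pr.target] is not pr)


# Hypothetical open PRs for illustration.
open_prs = [
    BumpPR(101, "copilot-cli", (1, 0, 2)),
    BumpPR(105, "copilot-cli", (1, 0, 4)),
    BumpPR(103, "node", (20, 1, 0)),
]
print(superseded(open_prs))
```

The returned PR numbers are the candidates to flag or auto-close; whether to close them automatically or only comment is a policy choice the report leaves open.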
Per-cluster detail (themes, top terms, representative PRs)
C2: Workflow YAML & lock-file changes
C7: Go package internals (pkg/*)
any → []string inline conversions, remove toStringSlice, consolidate filename sanitization
C6: Safe-outputs .cjs handlers
update_issue safe output
C10: MCP servers & gateway
engine.mcp.session-timeout
C9: Version bumps & model inventory
C0: Agent token & turn optimization
C4: Cache-memory subsystem
C8: Shared workflow imports
shared/daily-audit-charts composite import
C3: AWF config emission
apiProxy.maxEffectiveTokens in generated AWF config
C11: Docs site (docs/src)
C1: PR branch / checkout plumbing
push_to_pull_request_branch by scoping git ops to target repo cwd
C5: A/B experiments framework
experiments command to read experiment state from storage repo branches
Generated by Copilot Agent Prompt Clustering Analysis