[prompt-clustering] Copilot Agent Prompt Clustering — 2026-06-30 (1,000 PRs, 7 clusters, 80.7% merged) #42465
Replies: 2 comments
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #42710.
Smoke bot grunt. Run 28513491479 done poke.
Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
Summary
Analysis Period: Last 30 days (2026-06-08 → 2026-06-30) · Tasks Analyzed: 1,000 Copilot agent PRs · Clusters: 7 · Overall Merge Rate: 80.7%
NLP clustering (TF-IDF + K-means, k selected by silhouette) of the task prompts in 1,000 PRs authored by `app/copilot-swe-agent` surfaces 7 coherent task families. Volume is dominated by feature/schema additions and agentic-workflow tuning; the lowest merge rate sits in engine/harness runtime fixes (71.8%), the hardest, most iteration-heavy category. Dependency bumps and firewall/smoke-test PRs are nearly auto-merged (96.4%).
Full clustering report: themes, success rates, examples, data table, recommendations
Methodology
TF-IDF settings: `max_features=600`, `min_df=3`, `max_df=0.6`.
`k=7` chosen by silhouette over k∈[3,7] (monotonic: 0.022→0.041). Silhouette is low in absolute terms (expected for short, vocabulary-overlapping engineering prompts), but term/example inspection confirms the clusters are thematically coherent.
`aw_info.json` metrics don't map to them. The cached full-PR dataset (`pr-full-data/`) is stale (May, PR #30xxx) and does not cover the current range, from "Improve `lenstringzero` precision for `len(string)` aliases in zero-comparisons" (#37750) to "Allow `branding` field in `aw.yml` package manifests" (#42454), so comment/review/file-change counts were not enriched. Analysis is therefore prompt-text + outcome + temporal only.
Cluster Analysis
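The cluster count used below (k=7) comes from the silhouette-based selection described under Methodology. As a rough illustration of that selection step only, the sketch below runs a small pure-Python K-means over a range of k values and keeps the k with the highest mean silhouette. It is not the report's pipeline: it uses toy 2-D points instead of TF-IDF vectors, and the names `kmeans`, `silhouette`, and `select_k`, along with the farthest-point initialisation, are assumptions introduced for the example.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); singleton clusters score 0."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two non-empty clusters")
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # contributes 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # mean intra-cluster distance
        b = min(sum(math.dist(p, q) for q in other) / len(other)  # nearest other cluster
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def select_k(points, k_range):
    """Return (best_k, {k: silhouette}) over the candidate range."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_range}
    return max(scores, key=scores.get), scores

# Toy data (not from the report): three well-separated 2x2 blobs.
blobs = [(0, 0), (10, 0), (0, 10)]
points = [(cx + dx, cy + dy) for cx, cy in blobs for dx in (0, 1) for dy in (0, 1)]
best_k, scores = select_k(points, range(2, 6))
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("selected k:", best_k)
```

On this toy data the score peaks inside the candidate range. In the report the score rose monotonically across k∈[3,7], so the selected k=7 sits at the upper edge of the range that was searched.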
1. Schema / Manifest / Feature additions: 327 tasks (32.7%), 81.7% merged
Largest family. Adding fields/properties to JSON schemas, manifest support, new CLI commands, dashboard features, docs.
Keywords: schema, adds, changes, updated, workflow, docs, files, test, command.
Bread-and-butter additive work with a solid, near-average merge rate.

2. Agentic workflow & prompt tuning: 218 tasks (21.8%), 79.4% merged
Authoring/editing `.github/workflows` agentic specs, prompt guidance, turn budgets, model selection, reviewer path-gating.
Keywords: workflow, prompt, agent, guidance, tool, output.
Meta-work on the agentic system itself; slightly below-average merge rate (prompt/policy choices are more contested → more closures).

3. Engine / harness runtime fixes: 181 tasks (18.1%), 71.8% merged (lowest)
The hardest category: API routing (Responses API, provider mapping), harness retry loops, sandbox EACCES, HTTP 400 surfacing, TPM exhaustion.
Keywords: step, failure, job, copilot, env, detection, awf.
Lowest merge rate and the most open PRs (4); these are deep, stateful runtime bugs that need real reproduction.

4. Refactor / linters / code quality: 141 tasks (14.1%), 84.4% merged
Deduplication, custom linters/analyzers, function relocation, largefunc cleanup, panic→error.
Keywords: error, analyzer, sites, flagged, function, helper.
High merge rate: mechanical, well-scoped, low-risk.

5. Sous-chef generated PRs: 59 tasks (5.9%), 84.7% merged
PRs produced by the "sous-chef" agentic workflow (long, structured bodies; avg ~2,100 chars).
Keywords: sous chef, chef, pr, aic, chef run.
Self-generated improvements with above-average acceptance.

6. Dependency bumps & firewall/smoke-test: 56 tasks (5.6%), 96.4% merged (highest)
Version bumps (firewall, Claude Code, Codex, mcpg), firewall/smoke-test changesets. Longest bodies (avg ~3,800 chars, mostly verbose smoke-test summaries).
Keywords: claude, smoke, domains, firewall, blocked.
Near-automatic acceptance: routine, low-judgment, high-confidence.

7. CI job auto-fixes (WIP): 18 tasks (1.8%), 77.8% merged
Auto-generated "[WIP] Fix failing GitHub Actions job ..." PRs. Shortest bodies (avg ~410 chars).
Keywords: actions, fix, job, github actions, logs, root cause.
Small reactive category.

Merge Rate by Cluster
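The per-cluster task counts and merge rates quoted in the cluster headings can be cross-checked against the headline figures (1,000 PRs, 80.7% merged). The sketch below does that arithmetic. The merged counts are estimates obtained by rounding `tasks * rate`, since exact merged/closed/open counts per cluster are not shown here, and `summarize` is a hypothetical helper name.

```python
# Per-cluster figures copied from the cluster headings: (name, tasks, merge rate %).
CLUSTERS = [
    ("Schema / manifest / feature additions", 327, 81.7),
    ("Agentic workflow & prompt tuning", 218, 79.4),
    ("Engine / harness runtime fixes", 181, 71.8),
    ("Refactor / linters / code quality", 141, 84.4),
    ("Sous-chef generated PRs", 59, 84.7),
    ("Dependency bumps & firewall/smoke-test", 56, 96.4),
    ("CI job auto-fixes (WIP)", 18, 77.8),
]

def summarize(clusters):
    """Return (total tasks, estimated merged count per cluster, overall merge rate %)."""
    total = sum(n for _, n, _ in clusters)
    # Estimated, not reported: nearest whole number of PRs implied by each rate.
    merged = [round(n * rate / 100) for _, n, rate in clusters]
    overall = 100 * sum(merged) / total
    return total, merged, overall

total_tasks, merged_counts, overall_rate = summarize(CLUSTERS)

for (name, n, rate), m in zip(CLUSTERS, merged_counts):
    print(f"{name:<40} {n:>4} tasks  ~{m:>3} merged  {rate:5.1f}%")
print(f"{'Overall':<40} {total_tasks:>4} tasks  ~{sum(merged_counts):>3} merged  {overall_rate:5.1f}%")
```

The cluster sizes sum to 1,000 and the implied merged total is about 807, which matches the reported overall merge rate of 80.7%.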
Temporal Trend
Merge rate is stable across the window (78–85% per 5-day bucket), with a mild peak (85.0%) around 2026-06-21. No degradation or improvement trend; volume is heaviest in the final week.
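A per-bucket merge rate of this kind can be computed by grouping PRs into fixed-width date windows. The sketch below is one way to do it, assuming each PR is reduced to a `(created_on, merged)` pair. The function name `merge_rate_by_bucket` and the sample records are hypothetical and are not taken from the report's dataset.

```python
from collections import defaultdict
from datetime import date, timedelta

def merge_rate_by_bucket(prs, window_start, bucket_days=5):
    """Group (created_on, merged) records into fixed-width buckets.

    Returns {bucket_start_date: merge_rate_percent}, ordered by date.
    """
    tallies = defaultdict(lambda: [0, 0])  # bucket index -> [merged, total]
    for created_on, merged in prs:
        idx = (created_on - window_start).days // bucket_days
        tallies[idx][0] += int(merged)
        tallies[idx][1] += 1
    return {
        window_start + timedelta(days=idx * bucket_days): 100 * m / n
        for idx, (m, n) in sorted(tallies.items())
    }

# Hypothetical toy records for illustration only.
sample_prs = [
    (date(2026, 6, 8), True),
    (date(2026, 6, 9), False),
    (date(2026, 6, 10), True),
    (date(2026, 6, 12), True),
    (date(2026, 6, 13), True),
    (date(2026, 6, 17), False),
]
rates = merge_rate_by_bucket(sample_prs, window_start=date(2026, 6, 8))
for bucket_start, rate in rates.items():
    print(bucket_start.isoformat(), f"{rate:.1f}%")
```

Bucketing by creation date is an assumption; bucketing by merge or close date would shift PRs that stay open across a window boundary.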
Representative PRs (3 per cluster)
[Table not recovered; surviving title fragments: "... `branding` field in `aw.yml` package manifests", "... `skills` support with activation-time `gh s..."]
Key Findings
Recommendations
Generated by Prompt Clustering Analysis (Run: 28438505290)