[prompt-clustering] Copilot Agent Prompt Clustering Analysis — 2026-06-10 #38340
This discussion was automatically closed because it expired on 2026-06-11T11:26:58.016Z.
Summary
Analysis period: last 30 days (2026-05-24 → 2026-06-10)
Tasks analyzed: 1,000 Copilot coding-agent PRs (app/copilot-swe-agent)
Clusters identified: 8 (KMeans on TF-IDF, 1–2 grams)
Overall success rate: 80% (731 merged / 915 decided; 85 still open)
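The headline rate uses decided PRs (merged, or closed without merging) as the denominator, so the 85 still-open PRs neither help nor hurt it. Below is a minimal sketch of that bookkeeping. It assumes each PR has been reduced to a hypothetical (state, merged) pair and does not follow any particular GitHub API response shape:

```python
from collections import Counter

def outcome(state: str, merged: bool) -> str:
    """Classify a PR as 'open', 'merged', or 'closed' (closed without merging)."""
    if state == "open":
        return "open"
    return "merged" if merged else "closed"

def success_rate(prs: list[tuple[str, bool]]) -> tuple[float, Counter]:
    """Return merged / decided, where decided = merged + closed-unmerged."""
    counts = Counter(outcome(state, merged) for state, merged in prs)
    decided = counts["merged"] + counts["closed"]
    rate = counts["merged"] / decided if decided else 0.0
    return rate, counts

# Reproduce the headline numbers: 731 merged, 184 closed unmerged, 85 open.
prs = [("closed", True)] * 731 + [("closed", False)] * 184 + [("open", False)] * 85
rate, counts = success_rate(prs)
print(f"{rate:.1%} of {counts['merged'] + counts['closed']} decided")  # 79.9% of 915 decided
```

731 / 915 is 79.9%, which rounds to the reported 80%.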
Eight coherent task themes emerge. A single bucket, workflow infrastructure & JS (.cjs) test work, accounts for ~39% of all agent tasks. Docs/skills and AI-credit guardrail work are among the lowest-effort themes and share the highest success rate (85%). The genuinely hard work concentrates in two small clusters: firewall/domain allowlist PRs (avg 117 files touched, ~20 comments each) and Copilot SDK driver/harness PRs (most commits per PR, heavy iteration).
Cluster overview
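Cluster | Theme | PRs | Share | Success
--- | --- | --- | --- | ---
C1 | Workflow infra & JS tests (.cjs) | 389 | 39% | 77%
C5 | Docs, skills & markdown workflows | 207 | 21% | 85%
C4 | Go pkg/ CLI / linters refactors | 126 | 13% | 77%
C0 | AI-credit / cost guardrails | 86 | 9% | 85%
C6 | Safe-outputs features | 70 | 7% | 77%
C3 | Copilot SDK driver & harness | 56 | 6% | 78%
C2 | Firewall / domain allowlists | 44 | 4% | 84%
C7 | CI failure fixes (WIP) | 22 | 2% | 73%
Cluster details, top keywords & representative PRs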
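The per-cluster counts and rates can be sanity-checked against the summary. The sketch below recomputes each cluster's share of the 1,000 PRs and a PR-count-weighted mean of the cluster success rates. That weighting is an approximation chosen for this check, not the report's method. The rates are computed over decided PRs, and per-cluster decided counts are not reported, so the result should land near 80% but not exactly on it.

```python
# (cluster, PR count, success rate) as listed in the cluster details.
CLUSTERS = [
    ("C1", 389, 0.77), ("C5", 207, 0.85), ("C4", 126, 0.77), ("C0", 86, 0.85),
    ("C6", 70, 0.77), ("C3", 56, 0.78), ("C2", 44, 0.84), ("C7", 22, 0.73),
]

total = sum(n for _, n, _ in CLUSTERS)
for name, n, _ in CLUSTERS:
    print(f"{name}: {n / total:.0%}")

# Approximate overall success: weight each cluster's rate by its PR count.
# (Exact weights would be per-cluster decided counts, which are not reported.)
weighted = sum(n * rate for _, n, rate in CLUSTERS) / total
print(f"total={total}, weighted success ≈ {weighted:.1%}")
```

With these figures the shares round to the percentages above, and the weighted rate is about 79.6%, which is consistent with the headline 80%.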
C1: Workflow infra & JS tests (.cjs) · 389 PRs (39%) · 77% success
The dominant catch-all: GitHub Actions setup, the .cjs runtime scripts, model/engine wiring, and their tests.
Representative PRs: … authHeader in sandbox.agent.targets · #38152 fix assertTrustedCheckoutRuntime for bot/app actors
C5: Docs, skills & markdown workflows · 207 PRs (21%) · 85% success
Documentation, .md workflow definitions, skills, and daily-workflow prompts. Low discussion (1.4 comments per PR) and joint-highest success (85%, tied with C0).
Representative PRs: … agentic-workflow-designer skill · #34874 inline skill extraction/runtime · #34941 body hash in lock metadata
C4: Go pkg/ CLI / linters refactors · 126 PRs (13%) · 77% success
Core Go code: pkg/cli, pkg/workflow, analyzers, linters, dedup/refactors. High file counts (avg 64 per PR) but low discussion; mechanical refactors land cleanly.
Representative PRs: … ParseWorkflowFile into phases
C0: AI-credit / cost guardrails · 86 PRs (9%) · 85% success
aic, AI-credit resolution, max-daily-ai-credits guardrails, effective-multiplier work.
Representative PRs: … max-daily-ai-credits guardrail · #37936 detect guardrail exhaustion from firewall log
C6: Safe-outputs features · 70 PRs (7%) · 77% success
The safe-outputs subsystem: normalization, targeting, temporary_id, noop.
Representative PRs: … create_check_run · #37469 enforce required temporary_id
C3: Copilot SDK driver & harness · 56 PRs (6%) · 78% success
The experimental copilot-sdk driver, harness, stdin wiring, permission/threat-detection. Most commits per PR of any theme (5.4 commits and 5.7 comments per PR).
C2: Firewall / domain allowlists · 44 PRs (4%) · 84% success
Network egress: firewall config, domain allowlists (googleapis.com, etc.), proxy/cache-mount handling. By far the heaviest PRs: avg 117 files and ~20 comments each.
C7: CI failure fixes (WIP) · 22 PRs (2%) · 73% success
Narrow "fix the failing GitHub Actions job" tasks, almost all titled [WIP]. Lowest success (73%) and almost no discussion (0.5 comments per PR): fast, single-purpose, sometimes abandoned.
Key findings
… [WIP]-tagged; a non-trivial fraction get superseded or abandoned rather than merged.
Recommendations
… [WIP] so abandoned drafts are distinguishable.
Methodology & limitations
… <summary>Original prompt</summary> blockquote) → TF-IDF (max 600 features, 1–2 grams, min_df=3) → KMeans.
k=8 chosen by silhouette sweep (k=3–8); silhouette is low (~0.04, expected for sparse text), so clusters were validated by thematic coherence of top terms and representative PRs, not the score alone.
… Original prompt block; for the rest, the PR description stands in for the task prompt, so clustering reflects task topic, not verbatim user wording.
… copilot-swe-agent, which does not emit aw_info.json; gh-aw logs would not map onto them, so comments and commits are used as iteration proxies. Turn-level analysis remains a gap.
… analyzed-prs.txt for incremental future runs.
References: §27271629188
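The feature-extraction step carries most of the pipeline's parameters. The report does not name the library it used. The stdlib-only sketch below assumes semantics loosely modeled on scikit-learn's TfidfVectorizer: lowercasing, word tokens of two or more characters, unigrams plus bigrams, a min_df cut, max_features chosen by corpus term frequency, smoothed IDF, and L2 normalization. The function names and the toy documents are illustrative and are not taken from the real PR text.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"\b\w\w+\b")  # words of 2+ characters, as in scikit-learn's default

def ngrams(text: str, n_max: int = 2) -> list[str]:
    """Lowercased unigrams and bigrams (the '1-2 grams' setting)."""
    words = TOKEN.findall(text.lower())
    grams = list(words)
    for n in range(2, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams

def tfidf(docs: list[str], max_features: int = 600, min_df: int = 3):
    """Return (vocabulary, L2-normalized sparse vectors) for the given documents."""
    counts = [Counter(ngrams(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # documents containing each term
    total_tf = Counter()
    for c in counts:
        total_tf.update(c)
    # Drop rare terms, then keep the max_features most frequent across the corpus.
    kept = [t for t in df if df[t] >= min_df]
    kept.sort(key=lambda t: (-total_tf[t], t))
    vocab = sorted(kept[:max_features])
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}  # smoothed IDF
    vectors = []
    for c in counts:
        vec = {t: c[t] * idf[t] for t in vocab if t in c}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vocab, vectors

docs = [
    "fix firewall domain allowlist",
    "add domain allowlist for firewall",
    "firewall domain allowlist update",
    "docs: update skill markdown",
]
vocab, vecs = tfidf(docs)
print(vocab)  # ['allowlist', 'domain', 'domain allowlist', 'firewall']
```

On real data, these vectors would then go to KMeans, with k swept from 3 to 8 and each run scored by silhouette. In scikit-learn those steps are KMeans and silhouette_score; they are not reproduced here. The toy corpus also shows how aggressive min_df=3 is on small inputs: the docs-style document keeps no features at all.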