[prompt-clustering] Copilot Agent Prompt Clustering — Daily Analysis (2026-06-13) #39060

2026-06-13T10:58:06Z

github-actions[bot]
Bot Jun 13, 2026

Summary

NLP clustering of 991 Copilot-agent task prompts (PRs from the last 30 days, window 2026-05-26 → 2026-06-13). The agent's overall merge (success) rate held steady at 78.5%, in line with the trailing two-week average of ~79%.

Metric	Value
Tasks analyzed	991
Merged / Closed / Open	778 / 206 / 7
Overall success (merge) rate	78.5%
Clusters (k, silhouette)	10 (sil=0.0604)
Avg commits / comments per PR	3.88 / 3.44
Avg files changed / additions	35.75 / 2062

Key Findings

Most common work is Workflow & Prompt Engineering: 177 tasks (17.9% of all prompts) at an 82% merge rate. The agent's workload is dominated by workflow/prompt iteration and internal refactors rather than net-new features.
Highest success: Firewall & Network Domains at 84% (41/49 merged). Tightly scoped, well-specified tasks (e.g. firewall-domain allowlisting) merge most reliably.
Lowest success: Root-Cause Bug Fixes (failing CI) at 70% (38/54 merged). This is the riskiest category, likely because the failure is under-specified at task-creation time.
Most iteration-heavy: Firewall & Network Domains averages 6.18 commits/PR and 153.51 files touched vs the 3.88-commit / 35.75-file global average — version-bump PRs that recompile large numbers of generated .lock.yml outputs drive the most churn.

Success & Effort by Cluster

#	Theme	Tasks	% of tasks	Merge rate	Avg commits/PR	Avg files/PR	Top terms
1	Workflow & Prompt Engineering	177	17.9%	82%	3.53	16.78	workflow, prompt, guidance, workflows
2	Code Refactor & Shared Helpers	143	14.4%	76%	3.68	59.52	package, function, string, helpers
3	Token Budgeting & Cost (AIC)	125	12.6%	81%	3.97	30.91	credits, aic, token, effective
4	CI Steps & Job Configuration	124	12.5%	81%	3.61	43.74	step, job, workflow, env
5	Safe-Outputs Tooling	102	10.3%	75%	3.58	16.26	safe, safe output, output, safe outputs
6	Model Aliases & Regression Coverage	93	9.4%	80%	3.39	12.73	model, alias, coverage, regression
7	Engine SDK / Driver & Permissions	79	8.0%	75%	4.7	18.39	sdk, driver, permission, mode
8	Root-Cause Bug Fixes (failing CI)	54	5.4%	70%	2.56	19.94	fix, root cause, actions, root
9	Firewall & Network Domains	49	4.9%	84%	6.18	153.51	domains, firewall, blocked, version
10	Sous-Chef Generated Workflows	45	4.5%	78%	5.76	39.18	sous, chef, sous chef, generated sous
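As a sanity check on the table above, the per-cluster averages can be rolled back up into the global figures quoted in the Summary. This is a minimal sketch, not part of the original pipeline: the tuples are copied from the table, and the variable names are illustrative only.

```python
# (tasks, avg_commits, avg_files) per cluster, copied from the table above.
clusters = [
    (177, 3.53, 16.78),   # 1 Workflow & Prompt Engineering
    (143, 3.68, 59.52),   # 2 Code Refactor & Shared Helpers
    (125, 3.97, 30.91),   # 3 Token Budgeting & Cost (AIC)
    (124, 3.61, 43.74),   # 4 CI Steps & Job Configuration
    (102, 3.58, 16.26),   # 5 Safe-Outputs Tooling
    (93, 3.39, 12.73),    # 6 Model Aliases & Regression Coverage
    (79, 4.7, 18.39),     # 7 Engine SDK / Driver & Permissions
    (54, 2.56, 19.94),    # 8 Root-Cause Bug Fixes (failing CI)
    (49, 6.18, 153.51),   # 9 Firewall & Network Domains
    (45, 5.76, 39.18),    # 10 Sous-Chef Generated Workflows
]

total_tasks = sum(tasks for tasks, _, _ in clusters)

# Task-weighted means: each cluster contributes in proportion to its size.
weighted_commits = sum(tasks * commits for tasks, commits, _ in clusters) / total_tasks
weighted_files = sum(tasks * files for tasks, _, files in clusters) / total_tasks

print(total_tasks, round(weighted_commits, 2), round(weighted_files, 2))
```

The task-weighted means should land on the 3.88 commits and 35.75 files per PR reported in the Summary, which suggests the cluster rows and the global averages were derived from the same 991 tasks.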

Trend (overall success rate)

Date	Tasks	k	Success	Avg commits
2026-06-02	999	8	80.3%	–
2026-06-05	999	8	75.4%	–
2026-06-06	999	9	78.4%	–
2026-06-07	999	9	79.0%	–
2026-06-08	999	10	79.7%	3.99
2026-06-12	1000	9	78.8%	3.9
2026-06-13	991	10	78.5%	3.88

Success rate has stayed between 75.4% and 80.3% across the seven runs since 2026-06-02; this run shows no regression or improvement signal.
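The spread of the trend table can be checked with a few lines. This is an illustrative sketch only: the values are copied from the table, and the dictionary layout is an assumption, not the report's own data format.

```python
# Overall success rate (%) per run, copied from the trend table.
trend = {
    "2026-06-02": 80.3,
    "2026-06-05": 75.4,
    "2026-06-06": 78.4,
    "2026-06-07": 79.0,
    "2026-06-08": 79.7,
    "2026-06-12": 78.8,
    "2026-06-13": 78.5,
}

rates = list(trend.values())
low, high = min(rates), max(rates)
mean_rate = sum(rates) / len(rates)

# Distance of the latest run from the mean of all listed runs.
latest_delta = trend["2026-06-13"] - mean_rate

print(f"range {low}-{high}%, mean {mean_rate:.1f}%, latest delta {latest_delta:+.1f} pts")
```

A latest-run delta well inside the run-to-run spread is consistent with the "no signal" reading.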

Per-cluster detail & representative PRs

1. Workflow & Prompt Engineering — 146/177 merged (82%), 3.53 commits/PR. Terms: workflow, prompt, guidance, workflows, skill, removed. Ex: #36748 Add portable agentic-workflow-designer skill, route wor...; #36727, #37326.

2. Code Refactor & Shared Helpers — 108/143 merged (76%), 3.68 commits/PR. Terms: package, function, string, helpers, line, behavior. Ex: #36012 Refactor ParseWorkflowFile orchestration into focused h...; #36177, #36144.

3. Token Budgeting & Cost (AIC) — 101/125 merged (81%), 3.97 commits/PR. Terms: credits, aic, token, effective, budget, cost. Ex: #37265 Update safe-output health failure messaging to AI Credits; #37101, #36042.

4. CI Steps & Job Configuration — 100/124 merged (81%), 3.61 commits/PR. Terms: step, job, workflow, env, conclusion, artifact. Ex: #37976 Derive omitted GitHub App owners from effective checkout...; #35270, #37408.

5. Safe-Outputs Tooling — 76/102 merged (75%), 3.58 commits/PR. Terms: safe, safe output, output, safe outputs, outputs, tool. Ex: #37122 Tighten safe-outputs noop contract for prompt-omission sc...; #36901, #36963.

6. Model Aliases & Regression Coverage — 74/93 merged (80%), 3.39 commits/PR. Terms: model, alias, coverage, regression, entries, regression coverage. Ex: #36388 Update 2026-06-02 model inventory: add missing Gemini pre...; #35826, #36226.

7. Engine SDK / Driver & Permissions — 59/79 merged (75%), 4.7 commits/PR. Terms: sdk, driver, permission, mode, command, behavior. Ex: #37322 Fix Copilot SDK headless auth/driver path and tool-permis...; #36538, #36731.

8. Root-Cause Bug Fixes (failing CI) — 38/54 merged (70%), 2.56 commits/PR. Terms: fix, root cause, actions, root, cause, job. Ex: #38397 [WIP] Fix failing GitHub Actions job 'js-typecheck'; #37674, #36647.

9. Firewall & Network Domains — 41/49 merged (84%), 6.18 commits/PR. Terms: domains, firewall, blocked, version, smoke, network. Ex: #37708 [awf] Bump firewall images to v0.25.66 and MCPG to v0.3.24; #35973, #35117.

10. Sous-Chef Generated Workflows — 35/45 merged (78%), 5.76 commits/PR. Terms: sous, chef, sous chef, generated sous, generated sous chef, generated. Ex: #35573 chore: update changeset workflow model to gpt-5.4; #37273, #37162.
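The percentages in the per-cluster detail can be reproduced from the merged/total counts. The sketch below assumes nothing beyond those counts; the dictionary and variable names are illustrative.

```python
# (merged, total) per cluster, copied from the per-cluster detail above.
counts = {
    "Workflow & Prompt Engineering": (146, 177),
    "Code Refactor & Shared Helpers": (108, 143),
    "Token Budgeting & Cost (AIC)": (101, 125),
    "CI Steps & Job Configuration": (100, 124),
    "Safe-Outputs Tooling": (76, 102),
    "Model Aliases & Regression Coverage": (74, 93),
    "Engine SDK / Driver & Permissions": (59, 79),
    "Root-Cause Bug Fixes (failing CI)": (38, 54),
    "Firewall & Network Domains": (41, 49),
    "Sous-Chef Generated Workflows": (35, 45),
}

# Merge rate per cluster, rounded to a whole percent as in the report.
rates = {name: round(100 * merged / total) for name, (merged, total) in counts.items()}

total_merged = sum(merged for merged, _ in counts.values())
total_tasks = sum(total for _, total in counts.values())
overall = round(100 * total_merged / total_tasks, 1)

for name, pct in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pct:>3}%  {name}")
print(f"overall: {total_merged}/{total_tasks} = {overall}%")
```

The cluster counts sum to the 778 merged PRs out of 991 tasks given in the Summary.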

Methodology & data quality

Source: pre-fetched Copilot PR data (1000 PRs) + per-PR full data (comments, reviews, commits, files).
Prompt extraction: parsed <issue_title>/<issue_description>, stripped agent preamble, code, URLs, HTML. 991/1000 PRs had a usable prompt (≥40 chars); 9 empty bodies dropped.
Vectorization: TF-IDF, unigrams–trigrams, min_df=3, max_df=0.6, 400 features, extended stop-list.
Clustering: spherical K-means (cosine), k∈[6,10]; k=10 chosen by best sampled silhouette (0.0604). Silhouette by k: k6=0.0317, k7=0.0533, k8=0.0464, k9=0.0596, k10=0.0604. All scores are low, as expected for short, jargon-heavy text, so treat cluster boundaries as soft. Iteration is proxied by commits/PR because turn logs were unavailable.
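For readers unfamiliar with the clustering step, here is a minimal pure-Python sketch of spherical K-means and of picking k by silhouette. It is not the pipeline's actual code: the toy rows are made-up numbers standing in for TF-IDF vectors, the first-k initialization is a simplification chosen for determinism, and only the silhouette scores are taken from this report.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(rows, k, iters=20):
    """Cluster rows by cosine similarity; returns (labels, centroids)."""
    points = [normalize(r) for r in rows]
    # Assumption: deterministic init from the first k points (a real run
    # would more likely use random or k-means++ style seeding).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid = highest cosine similarity.
        labels = [max(range(k), key=lambda c: dot(p, centroids[c])) for p in points]
        # Update step: mean of members, re-projected onto the unit sphere.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels, centroids

# Toy stand-in for TF-IDF rows (not data from the report).
rows = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.9, 0.2, 0.0], [0.1, 0.8, 0.3]]
labels, centroids = spherical_kmeans(rows, k=2)

# Model selection: keep the k with the highest sampled silhouette.
# These scores are the ones reported in the methodology above.
silhouette_by_k = {6: 0.0317, 7: 0.0533, 8: 0.0464, 9: 0.0596, 10: 0.0604}
best_k = max(silhouette_by_k, key=silhouette_by_k.get)

print(labels, best_k)
```

Note how close k=9 (0.0596) and k=10 (0.0604) are: with margins this thin the chosen k can flip between runs, which matches the k values drifting between 8 and 10 in the trend table.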

Recommendations

Tighten root-cause / failing-CI task specs. The lowest-merge cluster (Root-Cause Bug Fixes, 70%) correlates with under-specified failures. Auto-attaching the failing job log, the suspected file, and a reproduction step to these tasks should lift merge rates.
Split large generated/multi-file tasks. The highest-commit clusters (Sous-Chef Generated Workflows at 5.76 and Firewall & Network Domains at 6.18 commits/PR) churn most. Decomposing them into smaller, independently mergeable units would cut iteration cost.
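Lean into well-scoped allowlist/config tasks. Narrow, declarative tasks (firewall domains, model aliases) merge at 80–84%. Model-alias tasks are also cheap (3.39 commits and 12.73 files per PR), while firewall tasks merge reliably despite the highest commit and file counts. Both remain good candidates to keep delegating to the agent.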
Workload is maintenance-heavy. ~75% of prompts are refactors, CI plumbing, and workflow/prompt tuning. If feature throughput is a goal, consider rebalancing the issue backlog the agent is fed.

References: §27464239047

Generated by Prompt Clustering Analysis (Run: 27464239047)

Generated by 📊 Copilot Agent Prompt Clustering Analysis · 255.7 AIC · ⌖ 19 AIC · ⊞ 14K · ◷

expires on Jun 14, 2026, 2:58 AM UTC-08:00

2026-06-14T11:02:59Z

github-actions[bot]
Bot Jun 14, 2026
Author

This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis.

A newer discussion is available at Discussion #39212.
