[prompt-clustering] Copilot Agent Prompt Clustering Analysis #33980
Closed
This discussion has been marked as outdated by Copilot Agent Prompt Clustering Analysis. A newer discussion is available at Discussion #34200.
Summary
Analyzed 1108 copilot-swe-agent PRs opened between 2026-05-02 and 2026-05-22 (881 merged, 203 closed unmerged, 24 open). TF-IDF on cleaned PR bodies followed by K-means (k=7, picked by silhouette over k∈[4,9]) identified 7 prompt clusters: general refactors, agent self-improvement, bug fixes, AWF compiler-output regeneration, safe-outputs plumbing, OpenTelemetry spans, and CI failure auto-fixes.
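The pipeline named above (TF-IDF vectors fed into K-means) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the code behind this analysis: the toy PR bodies, the whitespace tokenizer, the smoothed IDF formula, and the fixed `seeds` used for centroid initialization are all hypothetical choices made so the example is small and deterministic.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return L2-normalised TF-IDF vectors (as dicts) for whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokenised for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))

def kmeans(vectors, seeds, iters=20):
    """Lloyd's algorithm; `seeds` are indices of the points used as initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = {}
                for v in members:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
                centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels

# Hypothetical PR bodies, two about CI fixes and two about OpenTelemetry spans.
docs = [
    "fix failing ci job after lint error",
    "fix stale assertion in failing ci test",
    "add span attributes to otel conclusion span",
    "set otel span attributes from env",
]
labels = kmeans(tfidf(docs), seeds=[0, 2])
print(labels)
```

With real PR bodies the vocabulary overlaps far more than in this toy set, which is consistent with the weak separation reported below.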
Key findings
Clustering quality
Silhouette scores were low and nearly flat across k=4–9: k=4: 0.030, k=5: 0.032, k=6: 0.030, k=7: 0.035, k=8: 0.035, k=9: 0.037. k=7 was chosen for interpretability: k=9 scores only 0.002 higher and fragments into clusters of 26–34 PRs without meaningful gain. The general-refactor cluster (C5, 479 PRs ≈ 43% of total) is genuinely heterogeneous; TF-IDF can't separate it further without dropping more generic terms.
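To make the numbers above concrete, here is a small sketch of how a silhouette score is computed, plus one possible way to formalize the choice of k. The `silhouette` function follows the standard definition (per point, (b − a) / max(a, b), averaged over all points) on toy 2D points with Euclidean distance. The `pick_k` rule (smallest k within a tolerance of the best score) and its `tolerance` value are assumptions for illustration only; the report says k=7 was chosen for interpretability, not by this rule.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance."""
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, lj) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(lj, []).append(math.dist(p, q))
        own = by_cluster.pop(li, None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = sum(own) / len(own)                                   # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())     # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Scores as reported in this analysis.
REPORTED = {4: 0.030, 5: 0.032, 6: 0.030, 7: 0.035, 8: 0.035, 9: 0.037}

def pick_k(scores, tolerance=0.0025):
    """Hypothetical rule: smallest k whose score is within `tolerance` of the best."""
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= best - tolerance)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
well_separated = silhouette(points, [0, 0, 1, 1])   # clusters match the geometry
badly_assigned = silhouette(points, [0, 1, 0, 1])   # clusters cut across the geometry

print(round(well_separated, 3), round(badly_assigned, 3))
print(max(REPORTED, key=REPORTED.get), pick_k(REPORTED))
```

Scores near 1 indicate compact, well-separated clusters and scores near 0 indicate overlapping ones, so values around 0.03 describe clusters that barely separate in TF-IDF space.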
Cluster summary table
Per-cluster deep dive (top terms, sample PRs, metrics)
C5 · General refactors, doc updates, small fixes
Top terms: updated, behavior, workflow, existing, docs, error, shared, coverage
Sample PR: ... fmt.Errorf("%s", str) anti-pattern with errors.New(str) in pkg/cli (Merged)
C0 · Agent self-improvement (prompts, models, sub-agents)
Top terms: prompt, workflow, run, analysis, model, issue, experiment, report
C1 · Bug fixes (CI failures, stale assertions, lint)
Top terms: bug, did, bug bug, workflow, job, failed, ci, output
Sample PR: ... $INSTRUCTION assertion in TestEngineArgsIntegrationCodex (Merged)
C4 · AWF compiler / golden file regeneration
Top terms: awf, workflow, golden, lock, generated, updated, compiled, recompiled
C3 · Safe-outputs / review / comment-workflow plumbing
Top terms: comment, review, run, commit, validation, workflow, sous, sous chef
Sample PR: ... actions: read permission to smoke-water.yml (#investigate-smoke-water-failure) (Merged)
C6 · OpenTelemetry / spans / attributes
Top terms: span, conclusion, spans, attributes, conclusion span, setup, attribute, env
C2 · Auto-generated 'Fix failing GitHub Actions job' PRs
Top terms: job, analyze logs, failure implement, progress failing, job url, implement check, logs identify, identify root
Daily PR volume by cluster (last 20 days)
2D projection of prompts (TF-IDF → SVD)
Each point is one PR; color = cluster assignment. Overlap is real: the prompt vocabulary is shared across themes, and the low silhouette scores confirm only weak separation.
Sample PR table (80 most recent across clusters)
Title fragments:
... sub_agent_strategy A/B experiment to smoke-gemini workflow
... gemini-3.5-flash ET multiplier to model inventory
... prompt_compression A/B experiment and caveman prompt variant to agent...
... raptor-mini alias coverage and missing GPT-...
... spec-enforcer prompt with inline small-model sub-agents
... output_format A/B experiment to daily-code-metrics workflow
... .lock.yml ownership, and uni...
... upload-asset and enforce text-onl...
... prompt_style A/B experiment to ci-coach with concise vs detailed prom...
... detail_level A/B experiment to daily architecture diagram workflow output
Full per-PR cluster assignments are in pr-clusters.csv (1108 rows).
Recommendations
Methodology
... pr-full-data/pr-*.json, opened 2026-05-02 – 2026-05-22.
... agent, workflow, fix, update, claude, copilot, etc.) to surface real topical signal.
... gh-aw logs) wasn't run for this analysis; cycle time stands in as a proxy for iteration cost.
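The effect of dropping domain-generic terms can be shown with a short sketch. The stop-word set below contains only the six terms named in this methodology (the original list continues with "etc." and is not reproduced here); the sample bodies and the `top_terms` helper are hypothetical and use raw term counts rather than TF-IDF weights to keep the example short.

```python
from collections import Counter

# The six terms named in the methodology; the full list is longer and not shown here.
DOMAIN_STOP_WORDS = {"agent", "workflow", "fix", "update", "claude", "copilot"}

def top_terms(bodies, stop_words=frozenset(), n=3):
    """Return the n most frequent whitespace tokens, skipping any in stop_words."""
    counts = Counter(
        tok
        for body in bodies
        for tok in body.lower().split()
        if tok not in stop_words
    )
    return [term for term, _ in counts.most_common(n)]

# Hypothetical PR bodies for illustration.
bodies = [
    "fix workflow span attributes",
    "update workflow golden lock",
    "copilot agent workflow fix span attributes",
]

print(top_terms(bodies, n=1))                       # dominated by a generic term
print(top_terms(bodies, DOMAIN_STOP_WORDS, n=2))    # topical terms surface
```

Without the filter, a word that appears in nearly every PR body dominates the ranking; with it, the terms that actually distinguish one theme from another rise to the top.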