[prompt-clustering] Copilot agent prompt clustering — daily report (2026-05-11) #31482

2026-05-11T11:14:51Z

github-actions[bot]
Bot May 11, 2026

Summary

Analysis window: 2026-04-22 → 2026-05-11 (996 of 1000 Copilot agent PRs after cleaning)
Clusters: 12 (TF-IDF + KMeans, silhouette=0.020)
Overall merge rate: 78.7%
Median PR size: 4 files / +98/-17 lines
WIP/draft PRs: 28

Key findings

Workflow YAML & lock-file edits dominate volume — Cluster C2 (Workflow YAML & lock-file changes) is the largest at 195 PRs (20% of all work), driven by routine .github/workflows/*.yml and lock-file regeneration tasks.
AWF config tasks are the hardest: C3 (AWF config emission) has a merge rate of only 47% vs. the 79% baseline. These PRs are also among the largest (median 7 files and +205 lines; only C9 touches more files, and only C8 and C9 add more lines), suggesting they're cross-cutting and harder for the agent to land cleanly.
Experiments work lands reliably: C5 (A/B experiments framework) reaches a 90% merge rate, the highest of any cluster; these are well-scoped, single-subsystem changes.
High-quality plumbing changes ship — Go-internals (C7, 89%) and cache-memory (C4, 89%) merge at well above baseline despite being technical, indicating the agent handles narrowly-scoped engineering tasks well.
Version bumps churn — Version-bump / model-inventory work (C9) merges at only 71% despite being mechanical, likely because of overlapping/superseded bump PRs.

Cluster overview

Cluster	Theme	# PRs	Share	Merge	Closed	Med. files	Med. +adds	Top keywords
C2	Workflow YAML & lock-file changes	195	20%	73%	26%	3	68	github, workflows, run, yml
C7	Go package internals (pkg/*)	135	14%	89%	11%	3	61	pkg, string, engine, test
C6	Safe-outputs .cjs handlers	113	11%	86%	13%	4	100	cjs, safe, safe outputs, outputs
C10	MCP servers & gateway	94	9%	83%	17%	5	74	mcp, cli, tool, gateway
C9	Version bumps & model inventory	82	8%	71%	29%	10	216	com, version, copilot, constants
C0	Agent token & turn optimization	67	7%	79%	21%	3	95	turns, agent, turn, tokens
C4	Cache-memory subsystem	61	6%	89%	11%	2	50	cache, cache-memory, memory, gh-aw
C8	Shared workflow imports	61	6%	72%	28%	5	267	shared, workflows, import, apm
C3	AWF config emission	51	5%	47%	53%	7	205	awf, awf config, config, models
C11	Docs site (docs/src)	49	5%	78%	22%	3	82	docs, src, docs src, reference
C1	PR branch / checkout plumbing	48	5%	83%	17%	3	167	pull request, pull, request, branch
C5	A/B experiments framework	40	4%	90%	10%	4	200	experiment, experiments, variant, state
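
As a quick consistency check on the table above, the per-cluster counts and merge rates can be recombined into the headline figures. The sketch below hardcodes the rounded values from the table, so the weighted merge rate only approximates the reported 78.7%; the variable names are illustrative and not taken from the analysis code.

```python
# (cluster, PR count, merge rate as a rounded percentage), copied from the table
clusters = [
    ("C2", 195, 73), ("C7", 135, 89), ("C6", 113, 86), ("C10", 94, 83),
    ("C9", 82, 71), ("C0", 67, 79), ("C4", 61, 89), ("C8", 61, 72),
    ("C3", 51, 47), ("C11", 49, 78), ("C1", 48, 83), ("C5", 40, 90),
]

total_prs = sum(n for _, n, _ in clusters)

# Weight each cluster's merge rate by its PR count to approximate the overall rate.
# The inputs are rounded to whole percents, so a small rounding error is expected.
weighted_merge_pct = sum(n * pct for _, n, pct in clusters) / total_prs

# Share of all PRs per cluster, in percent.
shares = {name: 100 * n / total_prs for name, n, _ in clusters}

print(total_prs)                     # 996
print(round(weighted_merge_pct, 1))  # 78.8
print(round(shares["C2"], 1))        # 19.6
```

The counts sum to the 996 PRs kept after cleaning, and the count-weighted merge rate lands within rounding distance of the reported overall figure.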

Daily volume by cluster

Cluster space (2D projection)

Per-cluster detail (themes, top terms, representative PRs)

C2 — Workflow YAML & lock-file changes

Size: 195 PRs (19.6%)
Merge rate: 73% — Closed: 26%
Median PR: 3 files / +68/-14 / 2 commits / 0 comments
Top keywords: github, workflows, run, yml, lock, github workflows, step, lock yml, files, job
Representative PRs:
- #29842 (✅ merged) — feat: add smoke-pi workflow and switch smoke-crush/smoke-opencode to water label
- #29209 (❌ not merged) — fix(security): replace curl-pipe-bash with actions/setup-cli (RGS-006)
- #29482 (❌ not merged) — fix(security): replace curl | bash installer pattern with secure alternatives (RGS-018)
- #30788 (✅ merged) — Align workflow step names to Title Case in source and compiled workflows
- #28838 (✅ merged) — fix: move schema-demo files to non-protected schema-demos/ folder

C7 — Go package internals (pkg/*)

Size: 135 PRs (13.6%)
Merge rate: 89% — Closed: 11%
Median PR: 3 files / +61/-19 / 3 commits / 1 comment
Top keywords: pkg, string, engine, test, yaml, pkg workflow, validation, error, strings, fmt
Representative PRs:
- #29409 (✅ merged) — perf: optimize YAML generation hot paths (-7% latency, -10% allocations)
- #29197 (✅ merged) — fix(errors): replace fmt.Errorf("%s") with errors.New and preserve error chains in pkg/workflow + pkg/parser
- #28407 (✅ merged) — perf: fix BenchmarkParseWorkflow regression — ~31% faster, ~40% fewer allocations
- #29098 (✅ merged) — refactor(workflow): eliminate scattered any→[]string inline conversions, remove toStringSlice, consolidate filename sanitization
- #28745 (✅ merged) — refactor: consolidate duplicate provider extraction and remove redundant engine overrides

C6 — Safe-outputs .cjs handlers

Size: 113 PRs (11.3%)
Merge rate: 86% — Closed: 13%
Median PR: 4 files / +100/-12 / 3 commits / 2 comments
Top keywords: cjs, safe, safe outputs, outputs, test, create, tests, harness, setup, issue
Representative PRs:
- #29648 (✅ merged) — feat: add body-header message type to safe outputs
- #28331 (✅ merged) — fix: add render_template.cjs and is_truthy.cjs to SAFE_OUTPUTS_FILES
- #31005 (❌ not merged) — Add issue field updates to update_issue safe output
- #29662 (✅ merged) — fix: add messages_header.cjs to SAFE_OUTPUTS_FILES in setup.sh
- #28053 (✅ merged) — fix: apply sanitizeContent to body in create_discussion and create_pull_request handlers

C10 — MCP servers & gateway

Size: 94 PRs (9.4%)
Merge rate: 83% — Closed: 17%
Median PR: 5 files / +74/-15 / 3 commits / 2 comments
Top keywords: mcp, cli, tool, gateway, mode, tools, server, logs, bash, test
Representative PRs:
- #30400 (❌ not merged) — feat: add web-fetch MCP server for Codex engine
- #28291 (✅ merged) — fix(mcp): audit/audit-diff return graceful JSON errors instead of IsError=true
- #30158 (✅ merged) — fix: 4 CLI consistency issues in mcp, logs, and init commands
- #30223 (✅ merged) — Add MCP server unit tests using InMemoryTransport (no subprocess)
- #29354 (✅ merged) — Support configurable MCP gateway session timeout via engine.mcp.session-timeout

C9 — Version bumps & model inventory

Size: 82 PRs (8.2%)
Merge rate: 71% — Closed: 29%
Median PR: 10 files / +216/-38 / 3 commits / 3 comments
Top keywords: com, version, copilot, constants, changeset, api, gemini, gpt-, engine, cli
Representative PRs:
- #29484 (✅ merged) — chore: bump CLI/MCP tool versions (Claude Code 2.1.126, Copilot 1.0.39, Codex 0.128.0, Playwright MCP 0.0.72, MCP Gateway v0.3.3)
- #28401 (✅ merged) — chore: bump Copilot CLI → 1.0.36, Codex CLI → 0.125.0, GitHub MCP Server → v1.0.3
- #30957 (✅ merged) — chore: bump default Claude/Copilot/Codex/Playwright MCP versions and refresh generated workflows
- #28385 (✅ merged) — bump Gemini CLI default to 0.39.1 to fix API_KEY_INVALID smoke failures
- #29819 (❌ not merged) — chore: upgrade gh-aw-firewall to v0.25.35

C0 — Agent token & turn optimization

Size: 67 PRs (6.7%)
Merge rate: 79% — Closed: 21%
Median PR: 3 files / +95/-31 / 3 commits / 0 comments
Top keywords: turns, agent, turn, tokens, phase, prompt, max-turns, bash, token, json
Representative PRs:
- #29792 (✅ merged) — Add inline sub-agents and deterministic pre-agent steps to unbloat-docs workflow
- #28828 (✅ merged) — optimize spec-extractor: pre-agent-steps + toolset trim + prompt slim (~1.6M tokens/run)
- #28968 (✅ merged) — feat(workflow-health-manager): reduce token usage ~42% via pre-agent steps and prompt trimming
- #28839 (✅ merged) — Optimize daily-syntax-error-quality: add deterministic pre-step, tighten turn budget
- #29407 (✅ merged) — optimize: reduce repository-quality-improver token usage ~800K/run

C4 — Cache-memory subsystem

Size: 61 PRs (6.1%)
Merge rate: 89% — Closed: 11%
Median PR: 2 files / +50/-13 / 3 commits / 0 comments
Top keywords: cache, cache-memory, memory, gh-aw, agent, state, tmp, tmp gh-aw, gh-aw cache-memory, cache memory
Representative PRs:
- #28516 (✅ merged) — feat: cache-memory cache_memory_miss detection and conclusion handler
- #29485 (✅ merged) — fix(go-fan): specify explicit cache file path to suppress false cache_memory_miss alerts
- #28473 (✅ merged) — fix(q): persist cache state to end 100% cache miss streak
- #29479 (✅ merged) — fix(cache): persist meaningful state and add hit-history tracking in Smoke Codex
- #30466 (✅ merged) — fix: don't report cache_memory_miss as failure on first run of daily-caveman-optimizer

C8 — Shared workflow imports

Size: 61 PRs (6.1%)
Merge rate: 72% — Closed: 28%
Median PR: 5 files / +267/-71 / 2 commits / 0 comments
Top keywords: shared, workflows, import, apm, reporting, imports, headers, github, workflows shared, otel
Representative PRs:
- #28079 (✅ merged) — Refactor audit workflows with new shared/daily-audit-charts composite import
- #28834 (✅ merged) — Fix gh aw add not recursively downloading transitive shared imports
- #29162 (✅ merged) — feat: add shared/otel.md OpenTelemetry shared import and instrument 6 daily workflows
- #28147 (❌ not merged) — feat: create shared/go-daily-audit.md and migrate 5 workflows
- #29517 (❌ not merged) — Centralize workflow formatting/validation imports and add compiler lint check for prompt drift

C3 — AWF config emission

Size: 51 PRs (5.1%)
Merge rate: 47% — Closed: 53%
Median PR: 7 files / +205/-19 / 3 commits / 2 comments
Top keywords: awf, awf config, config, models, apiproxy, json, node, container, binary, engine
Representative PRs:
- #31214 (❌ not merged) — feat: move max-runs constraint from engine-specific flags to AWF config
- #30280 (❌ not merged) — feat: emit models section in AWF config JSON (under apiProxy)
- #31117 (✅ merged) — Stop emitting unsupported apiProxy.maxEffectiveTokens in generated AWF config
- #29222 (✅ merged) — feat: compiler emits AWF JSON config file instead of CLI flag soup
- #29431 (❌ not merged) — Fix compiler to pass --enable-opencode to AWF when engine is opencode

C11 — Docs site (docs/src)

Size: 49 PRs (4.9%)
Merge rate: 78% — Closed: 22%
Median PR: 3 files / +82/-3 / 2 commits / 0 comments
Top keywords: docs, src, docs src, reference, page, link, astro, content, src content, content docs
Representative PRs:
- #31260 (✅ merged) — feat(docs): improve GEO scores — robots.txt AI crawlers, homepage stats, JSON-LD sameAs/dateModified
- #30688 (✅ merged) — Add missing Agentic Ops pattern page
- #30256 (✅ merged) — docs: generate model alias & multiplier reference tables from JSON data
- #28280 (✅ merged) — docs: add build-time table scroll wrapper as no-JS fallback
- #30490 (✅ merged) — Remove experiments.owner field from front matter, JSON, and docs

C1 — PR branch / checkout plumbing

Size: 48 PRs (4.8%)
Merge rate: 83% — Closed: 17%
Median PR: 3 files / +167/-10 / 3 commits / 2 comments
Top keywords: pull request, pull, request, branch, checkout, push, git, target, ref, base
Representative PRs:
- #28377 (✅ merged) — fix: resolve target repo checkout path in push_to_pull_request_branch handlers
- #30071 (✅ merged) — refactor: decouple safe-outputs checkout from event trigger context
- #30718 (✅ merged) — docs(instructions): never suggest pull_request_target over pull_request
- #27894 (✅ merged) — Handle side-repo checkouts in push_to_pull_request_branch by scoping git ops to target repo cwd
- #29433 (✅ merged) — Add pull_request_target security validation (pwn request detection)

C5 — A/B experiments framework

Size: 40 PRs (4.0%)
Merge rate: 90% — Closed: 10%
Median PR: 4 files / +200/-16 / 3 commits / 1 comment
Top keywords: experiment, experiments, variant, state, prompt, variants, owner, experiment state, schema, cjs
Representative PRs:
- #29534 (✅ merged) — feat: extend frontmatter with A/B experiments section
- #30020 (✅ merged) — feat: add hidden experiments command to read experiment state from storage repo branches
- #29601 (✅ merged) — Add .github/aw/experiments.md instruction file for A/B testing experiments
- #30046 (✅ merged) — Analysis: branch storage supports multiple experiments per workflow ID; simplification to single-experiment API not recommended
- #29996 (✅ merged) — feat: add storage option to experiments (cache | repo, default repo)

Methodology

Dataset: 1,000 most recent PRs authored by app/copilot-swe-agent in github/gh-aw (the search cap), spanning 2026-04-22 → 2026-05-11.
Prompt proxy: PR title + cleaned body. Where present, the explicit Original prompt block is extracted; otherwise body content is used, which describes the resulting work and is semantically correlated with the original task.
Cleaning: stripped Copilot suffix metadata, the firewall-block <details> warning (otherwise its /usr/bin, goinsecure, gomodcache, triggering command tokens dominate the vocabulary), code blocks, HTML tags, URLs, markdown checkboxes, and quoted lines.
Features: TF-IDF over 1–2 grams with min_df=4, max_df=0.6, English stopwords + a small custom list (fix, add, update, wip, etc.).
Clustering: KMeans across k=3..15; chose k=12 as the largest k still within 0.001 of the maximum silhouette in the [3..12] range, which favors interpretable splits without over-fragmenting. Silhouette values are small (≈0.02), which is common for very high-dimensional, sparse TF-IDF vectors where pairwise distances concentrate. A score this low still means the clusters overlap substantially, so cluster boundaries should be read as approximate. The chosen clusters are nevertheless semantically coherent, as the representative PRs show.
Cluster labels: assigned manually after inspecting top TF-IDF terms and the five PRs closest to each centroid.
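
The k-selection rule described under Clustering can be sketched as a small helper. This is a minimal illustration, not the analysis code: the function name `pick_k`, its parameters, and the per-k scores are all hypothetical, since the report does not list silhouette values for individual k.

```python
def pick_k(silhouette_by_k, k_min, k_max, tolerance=0.001):
    """Return the largest k in [k_min, k_max] whose silhouette score is
    within `tolerance` of the best score found in that same range."""
    candidates = {k: s for k, s in silhouette_by_k.items() if k_min <= k <= k_max}
    if not candidates:
        raise ValueError("no silhouette scores in the requested range")
    best = max(candidates.values())
    # Among near-optimal values of k, prefer the largest (finest-grained) one.
    return max(k for k, s in candidates.items() if best - s <= tolerance)


# Hypothetical scores for illustration only; not values from this run.
scores = {3: 0.012, 6: 0.018, 9: 0.0205, 12: 0.020, 15: 0.016}
print(pick_k(scores, k_min=3, k_max=12))  # 12
```

Here k=9 has the best score, but k=12 is within the 0.001 tolerance and is larger, so it wins. With silhouettes this flat, the tolerance largely decides the outcome, which is worth keeping in mind when comparing runs.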

Limitations:

Workflow metrics not joined: no per-PR turn counts / cost / duration are available in this run — those would require correlating workflow runs to PRs, which is fragile without an explicit join key. The agentic-workflows MCP server's logs tool could supply them in a future run for engine=copilot workflows that produce PRs.
The 1,000-PR search cap is the binding limit: the 1,000 most recent PRs reach back only ~19 days, so the analysis covers less than the requested 30-day target.

Recommendations

Investigate why AWF config (C3) lands so rarely: at a 47% merge rate this cluster is the clearest outlier. Worth pulling the closed PRs in this cluster (e.g. feat: move max-runs constraint from engine-specific flags to AWF config #31214, feat: emit models section in AWF config JSON (under apiProxy) #30280) to see whether prompts need more architectural framing or a smaller scope, or whether the work should wait until the AWF schema stabilizes.
Workflow YAML changes (C2, 195 PRs, 73% merge) are the biggest lever: because C2 is the largest cluster, each point of merge-rate lift there is worth about two more merged PRs, more than the same lift in any other cluster. The security-fix sub-theme (curl | bash replacements) is over-represented in the closed PRs there.
Version-bump churn (C9, 71% merge) — most of the failures here look like superseded bumps. Worth adding a check that flags or auto-closes prior open bump PRs when a newer one supersedes them.
Keep doing what works for C4/C5/C7 — narrow, single-subsystem tasks (cache-memory, experiments, Go internals) all sit at ≥85% merge and should be the template for prompt patterns elsewhere.
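
To make the "biggest lever" comparison concrete, the arithmetic can be sketched as below. The 3-point lift is an assumed example value and the helper names are hypothetical; the PR counts come from the cluster overview.

```python
def extra_merges(pr_count, lift_points):
    """Expected additional merged PRs if the merge rate rises by
    `lift_points` percentage points."""
    return pr_count * lift_points / 100


def matching_lift(target_extra, pr_count):
    """Percentage-point lift a cluster of `pr_count` PRs would need
    to gain `target_extra` additional merges."""
    return 100 * target_extra / pr_count


c2_gain = extra_merges(195, 3)          # assumed 3-point lift on C2 (195 PRs)
c3_needed = matching_lift(c2_gain, 51)  # lift C3 (51 PRs) would need to match it
print(round(c2_gain, 2), round(c3_needed, 1))  # 5.85 11.5
```

Under that assumption, a 3-point lift on C2 yields roughly six more merged PRs per window, while C3 would need about an 11.5-point lift to match. C3 remains the clearest outlier in rate, but C2 moves the overall total more per point gained.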

References:

Workflow run: §25665481020
Cluster sizes / merge-rate chart: see top of report
2D scatter & daily volume charts: see sections above

Generated by Copilot Agent Prompt Clustering Analysis


expires on May 12, 2026, 11:14 AM UTC
