[prompt-clustering] Copilot Agent Prompt Clustering Analysis - 990 Tasks Analyzed #14588
Replies: 1 comment
🤖 Beep boop! The smoke test agent was here! 🎪 Just did a quick flyby to make sure all the gears are turning smoothly. Everything looks ship-shape! ⚓️✨ May your clusters be well-separated and your silhouette scores be ever in your favor! 📊🎯
Daily NLP-based clustering analysis of Copilot agent task prompts, using machine learning to identify patterns, success rates, and optimization opportunities.
Summary
Analysis Date: 2026-02-09
Analysis Period: Last 30 days
Total Tasks Analyzed: 990
Clusters Identified: 8
Overall Success Rate: 69.2%
Clustering Quality: 0.086 (silhouette score)
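For context: silhouette scores range from -1 to 1, and values near zero mean samples sit close to the boundary between clusters, so 0.086 indicates heavily overlapping clusters (common for short free-text prompts). A minimal illustration, assuming scikit-learn (the report does not name its tooling):

```python
# Silhouette score = mean over samples of (b - a) / max(a, b), where a is the
# mean distance to points in the same cluster and b is the mean distance to
# the nearest other cluster. Near 0 (like 0.086 here) => overlapping clusters.
import numpy as np
from sklearn.metrics import silhouette_score

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
print(silhouette_score(X, [0, 0, 1, 1]))  # well-separated clusters: ~0.9
print(silhouette_score(X, [0, 1, 0, 1]))  # scrambled labels: negative
```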
Key Findings
Cluster 1: Dependency Updates (356 tasks, 36.0%)
Characteristics: The largest cluster, covering dependency updates and package version bumps. A high success rate indicates well-structured, predictable tasks.
Cluster 2: CI/CD & Workflows (199 tasks, 20.1%)
Characteristics: The second-largest cluster, dealing with GitHub Actions workflows, CI/CD pipelines, and workflow automation. The lowest success rate (58.8%) suggests these tasks are more complex or require more careful handling.
Cluster 3: General Maintenance - MCP Servers (100 tasks, 10.1%)
Characteristics: Focus on MCP server configurations, updates, and maintenance. The highest file-change count (42 files on average) indicates significant scope, and a higher comment count (4.5 on average) suggests more iteration is needed.
Cluster 4: General Maintenance - Safe Outputs (89 tasks, 9.0%)
Characteristics: Tasks related to safe-outputs functionality, project configuration, and output handling. A strong success rate indicates well-defined scope.
Cluster 5: Bug Fixes - Workflow Failures (74 tasks, 7.5%)
Characteristics: Bug fixes with workflow references and log analysis. A lower comment count suggests a clearer problem definition.
Cluster 6: General Maintenance - Task Mining (70 tasks, 7.1%)
Characteristics: Focus on task mining and discussion management. The lower success rate (58.6%) may indicate complexity in automated task generation.
Cluster 7: Bug Fixes - Campaigns & Security (63 tasks, 6.4%)
Characteristics: The smallest-scope tasks, focusing on campaigns, security, and documentation fixes.
Cluster 8: Bug Fixes - Workflow Jobs (39 tasks, 3.9%)
Characteristics: The highest success rate (79.5%) despite complexity. Tasks with clear failure references (job IDs, URLs) lead to better outcomes.
Success Rate Comparison
Per-cluster chart not captured in this summary; overall success is 69.2%, ranging from 58.8% (CI/CD & Workflows) to 79.5% (bug fixes that reference job IDs).
Cluster Distribution

| Cluster | Tasks | Share |
| --- | ---: | ---: |
| 1. Dependency Updates | 356 | 36.0% |
| 2. CI/CD & Workflows | 199 | 20.1% |
| 3. General Maintenance - MCP Servers | 100 | 10.1% |
| 4. General Maintenance - Safe Outputs | 89 | 9.0% |
| 5. Bug Fixes - Workflow Failures | 74 | 7.5% |
| 6. General Maintenance - Task Mining | 70 | 7.1% |
| 7. Bug Fixes - Campaigns & Security | 63 | 6.4% |
| 8. Bug Fixes - Workflow Jobs | 39 | 3.9% |
Insights & Patterns
✅ What Works Well / 🔍 Key Observations: detailed bullets for these subsections are in the full report (workflow artifacts).
Recommendations
1. Improve CI/CD Task Prompts
Issue: The CI/CD & Workflows cluster has the lowest success rate (58.8%).
2. Leverage Successful Bug Fix Patterns
Observation: Bug fixes that reference job IDs achieve 79.5% success (see the sketch after this list).
3. Manage Task Complexity
Issue: MCP server tasks average 42 changed files and 4.5 comments per PR.
4. Enhance Task Mining Success
Issue: The task mining cluster has a 58.6% success rate.
5. Continue Monitoring Trends
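Recommendation 2 rests on the finding that prompts containing concrete failure references (job IDs, run URLs) succeed more often. One hypothetical way to audit prompts for such references; the report does not include its own detection logic, and the names FAILURE_REF and has_failure_reference are illustrative:

```python
# Hypothetical check for concrete failure references in a task prompt:
# GitHub Actions run URLs or bare numeric job IDs. The pattern is a sketch,
# not the report's actual heuristic.
import re

FAILURE_REF = re.compile(
    r"https://github\.com/[^/\s]+/[^/\s]+/actions/runs/\d+"  # run/job URL
    r"|\bjob\s+#?\d{6,}\b",                                  # bare job ID
    re.IGNORECASE,
)

def has_failure_reference(prompt: str) -> bool:
    return FAILURE_REF.search(prompt) is not None

print(has_failure_reference("Fix the failing test in job 123456789"))  # True
print(has_failure_reference("Investigate flaky CI"))                   # False
```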
Methodology
Data Source: 990 Copilot-created PRs from the last 30 days
Analysis Approach: NLP-based clustering of task prompt text (sketched below)
Limitations: The low silhouette score (0.086) means cluster boundaries overlap substantially, so cluster labels are approximate.
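The script itself (/tmp/gh-aw/clustering-analysis.py) is only available as a workflow artifact, so the following is a sketch of the kind of pipeline the report describes: TF-IDF features, k-means clustering, silhouette scoring. It assumes scikit-learn; the toy corpus and k=3 are for runnability, whereas the actual run clustered 990 prompts with k=8:

```python
# Sketch of a prompt-clustering pipeline like the one this report describes:
# vectorize prompt text with TF-IDF, cluster with k-means, and score cluster
# separation with the silhouette coefficient. All parameters are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_prompts(prompts, k):
    X = TfidfVectorizer(stop_words="english").fit_transform(prompts)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.labels_, silhouette_score(X, km.labels_)

# Toy corpus with k=3; the actual run used 990 prompts and k=8.
prompts = [
    "Bump lodash from 4.17.20 to 4.17.21",
    "Bump requests from 2.31.0 to 2.32.0",
    "Update dependency versions in package.json",
    "Fix failing GitHub Actions workflow on main",
    "Repair broken CI pipeline for release job",
    "Update workflow triggers for nightly build",
    "Update MCP server configuration defaults",
    "Refresh MCP server connection settings",
    "Maintain MCP server allowlist entries",
]
labels, score = cluster_prompts(prompts, k=3)
print(labels, f"silhouette={score:.3f}")  # this report's run scored 0.086
```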
Next Steps
Full Report: Available in workflow artifacts
Cluster Assignments: /tmp/gh-aw/pr-data/cluster-assignments.json
Analysis Script: /tmp/gh-aw/clustering-analysis.py
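The summary does not show the schema of cluster-assignments.json, so this reader is hypothetical: the field names ("cluster", "merged") and the assumption that success means a merged PR are illustrative, showing how per-cluster success rates could be recomputed from the artifact:

```python
# Hypothetical consumer of the cluster-assignments artifact. Assumes a JSON
# array of records with "cluster" and "merged" fields; the real schema may
# differ, so adjust the field names to match the artifact.
import json
from collections import defaultdict

with open("/tmp/gh-aw/pr-data/cluster-assignments.json") as f:
    rows = json.load(f)

totals = defaultdict(int)
merged = defaultdict(int)
for row in rows:
    totals[row["cluster"]] += 1
    merged[row["cluster"]] += bool(row["merged"])  # assumed success criterion

for cluster in sorted(totals):
    rate = merged[cluster] / totals[cluster]
    print(f"cluster {cluster}: {rate:.1%} success over {totals[cluster]} tasks")
```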