Agent Performance Report — Week of 2026-05-17 #32813

2026-05-17T13:07:50Z

github-actions[bot]
Bot May 17, 2026

Executive Summary

Agents analyzed: 229 workflows (140+ copilot, 60+ claude, 12 codex, others)
Total outputs reviewed: 80+ (issues, PRs, comments from recent runs)
Average quality score: 74/100 (plateau breaking — primary blocker resolved ✅)
Average effectiveness score: 71/100 (expected uplift next run: quality ~76-78, effectiveness ~73-75)
Top performers: Agentic Maintenance, Issue Monster, Auto-Triage Issues, Bot Detection, PR Sous Chef
Needs improvement: CGO/CJS Regression, Daily Observability Report, Deployment Monitor (zombie)

🎉 Key event this week: PR-review cluster fix (#31724 CLOSED) eliminates ~272 wasted runs/day — the primary driver of the 15-day quality/effectiveness plateau. Health score rose 64→67/100. Quality/Effectiveness plateau should break on next run.

Performance Rankings

Top Performing Agents 🏆

Agentic Maintenance (Quality: 90/100, Effectiveness: 92/100)
- 100% success rate, stable daily execution
- Consistently produces clear, actionable maintenance outputs
Issue Monster (Quality: 85/100, Effectiveness: 87/100)
- 98%+ success rate, ~6m39s runtime
- High-quality issue triage and categorization
Auto-Triage Issues (Quality: 82/100, Effectiveness: 85/100)
- 100% success rate across multiple daily triggers
- Accurate priority and label assignment
Bot Detection (Quality: 82/100, Effectiveness: 83/100)
- 100% success, 9s runtime — model for fast detection workflows
License Compliance Check (Quality: 80/100, Effectiveness: 82/100)
- ~98% success rate, reliable daily execution
PR Sous Chef (Quality: 80/100, Effectiveness: 82/100)
- 100% success (4/4 recent runs), good PR review quality
Copilot SWE Agent (Quality: 78/100, Effectiveness: 85/100)
- 28/50 PRs merged (56% merge rate) — strong for autonomous coding

Agents Needing Improvement 📉

CGO/CJS Build Regression (Quality: 20/100, Effectiveness: 15/100)
- Failing on every push to main since Feb 2026 (90+ days)
- Issue #29669 ([CGO] Workflow failure on main - Run #2565) is still open; highest-friction item on the PR path
- Pattern: inconsistency + resource-waste
- Action needed: Root-cause fix to cgo_test/cjs runner configuration
Daily Observability Report (Quality: 55/100, Effectiveness: 50/100)
- Hit 80M effective token limit ([aw] Daily Observability Report for AWF Firewall and MCP Gateway failed #32717)
- Pattern: resource-waste + inconsistency
- Systemic risk: other token-heavy daily workflows approaching same threshold
- Action needed: Reduce output scope or increase max-effective-tokens
Deployment Monitor (Quality: 15/100, Effectiveness: 10/100)
- Pattern: zombie + over-creation + resource-waste
- 100 runs/day at 8% success = ~92 wasted invocations/day
- Recommendation: Deprecate or add trigger circuit-breaker immediately
Daily Fact About gh-aw (Quality: 40/100, Effectiveness: 35/100)
- ~50% success; parse failures recur despite the merge of PR #31411 (Comment out top-level on.labels in compiled workflows to prevent push-time workflow parse failures)
- Issues #31432 ([aw-failures] Daily Fact About gh-aw: 15+ consecutive push-time parse failures — P1 escalation) and #31524 ([deep-report] Add circuit-breaker + schema fix to Daily Fact workflow (P1 — 15+ consecutive parse failures)) are still open; a second root cause has not yet been addressed
Codex Smoke Test (Quality: 45/100, Effectiveness: 40/100)
- Pattern: inconsistency + scope-creep
- Attempting OPENAI_API_KEY access outside sandbox scope (Codex OpenAI proxy fails because OPENAI_API_KEY is excluded from AWF sandbox #32446)

Inactive / New Failures (Monitor)

Outcome Collector ([aw] Outcome Collector failed #32728): new failure May 17 — monitor next 3 runs
Sergo/Serena Go Expert ([aw] Sergo - Serena Go Expert failed #32755): new failure May 17
Step Name Alignment ([aw] Step Name Alignment failed #32754): new failure May 17
Linter Miner ([aw] Linter Miner failed #32748): new failure May 17
Note: All 4 failed in the same ~4h window (01:00–05:00Z May 17) — possible shared transient infrastructure cause
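The clustering observation above could be automated with a check like the following. This is a minimal sketch, not the analyzer's actual logic: the function name, the 4-hour window default, and the minimum cluster size are assumptions, and the per-workflow failure times are illustrative placeholders (the report gives only the 01:00–05:00Z window).

```python
from datetime import datetime, timedelta, timezone


def find_failure_clusters(failures, window=timedelta(hours=4), min_size=3):
    """Return groups of failures that occur within `window` of the group's first failure.

    `failures` is an iterable of (workflow_name, failure_time) pairs.
    Groups smaller than `min_size` are treated as isolated failures and dropped.
    """
    ordered = sorted(failures, key=lambda item: item[1])
    clusters, current = [], []
    for name, failed_at in ordered:
        # Start a new group once this failure falls outside the current window.
        if current and failed_at - current[0][1] > window:
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append((name, failed_at))
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


def may17(hour, minute=0):
    return datetime(2026, 5, 17, hour, minute, tzinfo=timezone.utc)


# Workflow names are from the report; the exact times are illustrative only.
failures = [
    ("Outcome Collector", may17(1, 10)),
    ("Linter Miner", may17(2, 30)),
    ("Step Name Alignment", may17(3, 45)),
    ("Sergo - Serena Go Expert", may17(4, 50)),
]

for cluster in find_failure_clusters(failures):
    names = ", ".join(name for name, _ in cluster)
    print(f"{len(cluster)} failures within 4h: {names}")
```

A cluster found this way only suggests a shared cause; confirming it would still require checking runner or engine logs for the window.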

Quality & Pattern Analysis

Output Quality Distribution

Excellent (80-100): 7 agents (Agentic Maintenance, Issue Monster, Auto-Triage, Bot Detection, License, PR Sous Chef, SWE Agent)
Good (60-79): ~15 agents (Failure Investigator, Daily Compiler Quality, various reporters)
Fair (40-59): ~12 agents (Daily Fact, Smoke workflows, Outcome Collector, Observability Report)
Poor (<40): 3 agents (CGO/CJS regression, Deployment Monitor zombie, Codex Smoke)
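The band boundaries above can be written as a small lookup. This is a sketch assuming the bands are inclusive at their lower bound (80, 60, 40), as the ranges in the list imply; the function name is hypothetical.

```python
def quality_band(score):
    """Map a 0-100 quality score to the band names used in the distribution."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "Fair"
    return "Poor"


# Quality scores taken from the rankings earlier in this report.
scores = {
    "Agentic Maintenance": 90,
    "Daily Observability Report": 55,
    "Deployment Monitor": 15,
}
for agent, score in scores.items():
    print(f"{agent}: {score} -> {quality_band(score)}")
```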

Pattern Detector Results

| Pattern | Count | Key Agents |
| --- | --- | --- |
| `inconsistency` | 11 | CGO/CJS, Daily Fact, Smoke CI, Observability Report, Compiler Quality, Outcome Collector, Sergo, Step Name, Linter Miner, Failure Investigator, Codex Smoke |
| `resource-waste` | 4 | PR-review cluster (RESOLVED ✅), CGO/CJS, Observability Report, Deployment Monitor |
| `zombie` | 2 | PR-review cluster (RESOLVED ✅), Deployment Monitor |
| `over-creation` | 2 | PR-review cluster (RESOLVED ✅), Deployment Monitor |
| `scope-creep` | 1 | Codex Smoke Test |
| Healthy (no patterns) | 7 | Agentic Maintenance, Issue Monster, Auto-Triage, Bot Detection, License, PR Sous Chef, SWE Agent |

Common Quality Issues This Week

ET budget exhaustion (new systemic pattern): Daily Observability Report hit 80M limit. Token-heavy daily workflows need max-effective-tokens audit.
Engine-failure-after-completion ([aw] Daily Compiler Quality Check failed #32736): Daily Compiler Quality Check completes its work but exits without sending safe-output, so the work is lost even when the agent succeeded.
Clustered transient failures (May 17, 01:00–05:00Z): 4 workflows failed in the same window — possible shared infrastructure sensitivity.
Persistent parse failures (#31432, #31524): Daily Fact is still failing after PR #31411 merged, so a second root cause exists.

Effectiveness Analysis

Task Completion Rates

High completion (>80%): 7 agents (Top Performers above)
Medium completion (50-80%): ~15 agents
Low completion (<50%): 3-5 agents (zombie/broken workflows)

PR Statistics (Copilot SWE Agent)

Merge rate: 56% (28/50 PRs) — above ecosystem average for autonomous agents
Time to merge: typically 24-72h with human review

Resource Efficiency

🏆 Bot Detection: 9s runtime — benchmark for fast detection workflows
⚠️ Daily Observability Report: 80M token/run — unsustainable, needs scope reduction
✅ PR-review cluster waste eliminated: 272 runs/day → 0 (resolved via [deep-report] Fix PR-review cluster trigger gates — 272 wasted run-attempts/day across 8 agents #31724)
⚠️ Deployment Monitor: 100 runs/day at 8% success (92% failure) = ~92 wasted invocations/day
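The wasted-invocation figure follows from the failure rate, not the success rate. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def wasted_runs_per_day(runs_per_day, success_rate):
    """Estimate runs per day that produce no successful outcome."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # Wasted runs are the failing share: runs * (1 - success rate).
    return round(runs_per_day * (1.0 - success_rate))


# Deployment Monitor figures from this report: 100 runs/day at 8% success.
print(wasted_runs_per_day(100, 0.08))
```

This treats every failed run as wasted, which is the same simplification the report makes.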

Behavioral Patterns

Productive Patterns ✅

Agentic Maintenance → Issue Monster → Auto-Triage chain: effective daily health loop
Workflow Health Manager ↔ Agent Performance Manager: cross-orchestrator coordination working well via shared-alerts.md
Campaign Manager → SWE Agent: effective task delegation, 56% PR merge rate on autonomous PRs

Problematic Patterns ⚠️

Zombie — Deployment Monitor: 100 runs/day, 8% success, ~92 wasted invocations/day. Needs circuit-breaker or deprecation.
ET budget exhaustion creep: Daily Observability Report first to hit 80M; 5-10 others at risk. Growing prompts without corresponding max-effective-tokens adjustment.
Engine-fail-after-completion ([aw] Daily Compiler Quality Check failed #32736): New pattern — work done but safe-output not sent. Check prompt structure and timeout in final output phase.
Clustered transient failures (May 17): 4 workflows in same 4h window — shared infrastructure sensitivity, not individual bugs.

Coverage Analysis

Well-Covered Areas

Campaign orchestration, code quality monitoring, issue lifecycle, security and compliance

Coverage Gaps

Build regression response: CGO/CJS failing 90+ days with no automated fix workflow
Token budget monitoring: No workflow tracks max-effective-tokens headroom before failures occur
Engine availability monitoring: May 17 clustered failures were detected only after the fact

Recommendations

High Priority

Fix CGO/CJS push regression ([CGO] Workflow failure on main - Run #2565 #29669) — P1, Day 90+
- Every push to main hits this failure; highest contributor friction item
- Estimated effort: 2-4h | Expected improvement: eliminates P1 friction from every merge
Resolve Codex OPENAI_API_KEY sandbox exclusion (Codex OpenAI proxy fails because OPENAI_API_KEY is excluded from AWF sandbox #32446) — P1
- Blocking all Codex-engine workflows in sandbox
- Estimated effort: 1-2h | Expected improvement: restores Codex smoke tests
Audit max-effective-tokens for daily workflows — new P2 (ET exhaustion)
- Daily Observability Report at 80M; 5-10 others approaching same threshold
- Estimated effort: 2-3h | Expected improvement: prevents future ET budget failures
Deprecate or circuit-break Deployment Monitor zombie
- 100 runs/day at 8% success (92% failure) = ~92 wasted invocations/day
- Estimated effort: 30min to deprecate | Expected improvement: immediate waste elimination
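One possible shape for the trigger circuit-breaker recommended for Deployment Monitor is sketched below. It is illustrative only: the class name, window size, and threshold are assumptions, and it does not reflect any existing gh-aw feature. The breaker blocks new runs once the success rate over the most recent runs drops below a threshold, and stays open until someone resets it.

```python
from collections import deque


class TriggerCircuitBreaker:
    """Block new runs when the recent success rate falls below a threshold."""

    def __init__(self, window=20, min_success_rate=0.5):
        self.window = window
        self.min_success_rate = min_success_rate
        self.results = deque(maxlen=window)  # most recent run outcomes

    def record(self, succeeded):
        """Record the outcome of a completed run."""
        self.results.append(bool(succeeded))

    def allow_run(self):
        """Return True if the next trigger should be allowed to start a run."""
        if len(self.results) < self.window:
            return True  # not enough history to judge yet
        rate = sum(self.results) / len(self.results)
        return rate >= self.min_success_rate

    def reset(self):
        """Clear history, e.g. after a maintainer has fixed the workflow."""
        self.results.clear()


breaker = TriggerCircuitBreaker(window=10, min_success_rate=0.5)
for succeeded in [False] * 9 + [True]:  # 10% success over the last 10 runs
    breaker.record(succeeded)
print(breaker.allow_run())
```

Because blocked runs record no new outcomes, the breaker never reopens by itself; a production version would likely add a cool-down or a periodic probe run.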

Medium Priority

Fix engine-failure-after-completion ([aw] Daily Compiler Quality Check failed #32736) — Daily Compiler Quality Check
Fix Daily Fact parse failures (#31432, #31524): second root cause remains after PR #31411
Investigate May 17 clustered transient failures — shared infrastructure root cause

Low Priority

Consolidate Smoke test cluster — coordinate Smoke CI/Codex/Pi/OTEL triggers
Add token-budget monitoring workflow — alert at 70% of max-effective-tokens
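The token-budget monitor suggested above could start from a check like this. It is a sketch under stated assumptions: the function name and the usage-data shape are hypothetical, the 70% threshold is the one proposed in this report, and only the Daily Observability Report figures (80M used against an 80M limit) come from the report. The other two entries are placeholders.

```python
def budget_alerts(usage, alert_fraction=0.70):
    """Return (workflow, fraction_used) pairs at or above the alert threshold.

    `usage` maps workflow name -> (effective_tokens_used, max_effective_tokens).
    Results are sorted with the workflows closest to their limit first.
    """
    alerts = []
    for name, (used, limit) in usage.items():
        if limit <= 0:
            raise ValueError(f"invalid token limit for {name}: {limit}")
        fraction = used / limit
        if fraction >= alert_fraction:
            alerts.append((name, fraction))
    return sorted(alerts, key=lambda item: item[1], reverse=True)


usage = {
    "Daily Observability Report": (80_000_000, 80_000_000),  # from this report
    "Example Workflow A": (60_000_000, 80_000_000),          # placeholder values
    "Example Workflow B": (20_000_000, 80_000_000),          # placeholder values
}

for name, fraction in budget_alerts(usage):
    print(f"{name}: {fraction:.0%} of max-effective-tokens")
```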

Trends

| Metric | May 16 | May 17 | Change | Notes |
| --- | --- | --- | --- | --- |
| Quality score | 74/100 | 74/100 | → | Plateau breaking (PR cluster fixed); expect 76-78 next run |
| Effectiveness | 71/100 | 71/100 | → | Expect 73-75 next run |
| Health score | 64/100 | 67/100 | ↑ +3 | PR cluster fix |
| Open [aw] failures | 19 | 21 | ↑ +2 | New May 17 failures |
| Wasted runs/day | ~272 | ~0 | ↓ ✅ | PR-review cluster resolved |
| Total workflows | 229 | 229 | → | Stable |

Outlook: With the PR-review cluster fix (#31724) merged, the 15-day Q/E plateau should break. Expect Q→76-78, E→73-75 on next run as the cluster's 0% success rate no longer drags averages.

Actions Taken This Run

Created this performance report discussion
No new issues filed; the following existing issues cover all active items:
- [CGO] Workflow failure on main - Run #2565 #29669
- [aw] Daily Observability Report for AWF Firewall and MCP Gateway failed #32717
- [aw] Daily Compiler Quality Check failed #32736
- Codex OpenAI proxy fails because OPENAI_API_KEY is excluded from AWF sandbox #32446
- [aw-failures] Daily Fact About gh-aw: 15+ consecutive push-time parse failures — P1 escalation #31432
- [deep-report] Add circuit-breaker + schema fix to Daily Fact workflow (P1 — 15+ consecutive parse failures) #31524
- [aw] Smoke CI failed #32690
- [aw] Outcome Collector failed #32728
- [aw] Step Name Alignment failed #32754
- [aw] Sergo - Serena Go Expert failed #32755
- [aw] Linter Miner failed #32748
Updated shared memory: agent-performance-latest.md, shared-alerts.md

Next Steps

Monitor Q/E scores next run — expect improvement from PR-review cluster fix
Fix CGO/CJS push regression ([CGO] Workflow failure on main - Run #2565 #29669) — P1, highest contributor friction
Audit max-effective-tokens for all daily workflows (ET budget exhaustion risk)
Deprecate Deployment Monitor zombie
Investigate May 17 clustered transient failures (4 workflows, same 4h window)
Fix Daily Fact parse failures: second root cause still needs to be identified

Analysis period: 2026-05-10 to 2026-05-17
Previous report: Week of 2026-05-16 (Q:74, E:71, H:64)
Next report: 2026-05-18
Run: §25991507629

Generated by ⚡ Agent Performance Analyzer - Meta-Orchestrator · ● 17.1M · ◷

expires on May 18, 2026, 1:07 PM UTC

pelikhan · 2026-05-17T13:10:04Z

pelikhan
May 17, 2026
Maintainer

/q connect to agentdb

1 reply

github-actions[bot] Bot May 17, 2026
Author

🔧 Pay attention, 007! Q is preparing your gadgets for this discussion comment...

2026-05-18T13:17:28Z

github-actions[bot]
Bot May 18, 2026
Author

This discussion was automatically closed because it expired on 2026-05-18T13:07:49.782Z.

Closed by Workflow

0 replies
