Agent Performance Report — Week of 2026-05-17 #32813
Closed
/q connect to agentdb |
This discussion was automatically closed because it expired on 2026-05-18T13:07:49.782Z.
Executive Summary
🎉 Key event this week: PR-review cluster fix (#31724 CLOSED) eliminates ~272 wasted runs/day — the primary driver of the 15-day quality/effectiveness plateau. Health score rose 64→67/100. Quality/Effectiveness plateau should break on next run.
Performance Rankings
Top Performing Agents 🏆
Agentic Maintenance (Quality: 90/100, Effectiveness: 92/100)
Issue Monster (Quality: 85/100, Effectiveness: 87/100)
Auto-Triage Issues (Quality: 82/100, Effectiveness: 85/100)
Bot Detection (Quality: 82/100, Effectiveness: 83/100)
License Compliance Check (Quality: 80/100, Effectiveness: 82/100)
PR Sous Chef (Quality: 80/100, Effectiveness: 82/100)
Copilot SWE Agent (Quality: 78/100, Effectiveness: 85/100)
Agents Needing Improvement 📉
CGO/CJS Build Regression (Quality: 20/100, Effectiveness: 15/100)
Patterns: inconsistency+resource-waste
Daily Observability Report (Quality: 55/100, Effectiveness: 50/100)
Patterns: resource-waste+inconsistency
max-effective-tokens
Deployment Monitor (Quality: 15/100, Effectiveness: 10/100)
Patterns: zombie+over-creation+resource-waste
Daily Fact About gh-aw (Quality: 40/100, Effectiveness: 35/100)
on.labels in compiled workflows to prevent push-time workflow parse failures #31411 merge
Codex Smoke Test (Quality: 45/100, Effectiveness: 40/100)
Patterns: inconsistency+scope-creep
Inactive / New Failures (Monitor)
Quality & Pattern Analysis
Output Quality Distribution
Pattern Detector Results
inconsistency, resource-waste, zombie, over-creation, scope-creep
Common Quality Issues This Week
ET budget exhaustion (new systemic pattern): Daily Observability Report hit the 80M limit. Token-heavy daily workflows need a max-effective-tokens audit.
Engine-failure-after-completion ([aw] Daily Compiler Quality Check failed #32736): Daily Compiler Quality Check completes its work but exits without sending safe-output. Work is lost even when the agent succeeded.
Clustered transient failures (May 17, 01:00–05:00Z): 4 workflows failed in the same window — possible shared infrastructure sensitivity.
Persistent parse failures ([aw-failures] Daily Fact About gh-aw: 15+ consecutive push-time parse failures — P1 escalation #31432, [deep-report] Add circuit-breaker + schema fix to Daily Fact workflow (P1 — 15+ consecutive parse failures) #31524): Daily Fact still failing post-PR#31411 merge — second root cause exists.
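The circuit-breaker idea named in #31524 can be sketched as a counter over the trailing failure streak. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: the function names, the outcome strings, and the threshold of 5 are all hypothetical.

```python
from typing import Sequence


def consecutive_failures(outcomes: Sequence[str]) -> int:
    """Count failures at the end of the run history (most recent run last)."""
    count = 0
    for outcome in reversed(outcomes):
        if outcome != "failure":
            break
        count += 1
    return count


def should_skip_run(outcomes: Sequence[str], threshold: int = 5) -> bool:
    """Open the breaker once the trailing failure streak reaches the threshold.

    The threshold value is an assumption chosen for illustration.
    """
    return consecutive_failures(outcomes) >= threshold


# Hypothetical history: one success followed by 15 failures in a row.
history = ["success"] + ["failure"] * 15
print(should_skip_run(history))  # True
```

Counting only the trailing streak means a single successful run resets the breaker, so a recovered workflow resumes without manual intervention.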
Effectiveness Analysis
Task Completion Rates
PR Statistics (Copilot SWE Agent)
Resource Efficiency
Behavioral Patterns
Productive Patterns ✅
Problematic Patterns⚠️
max-effective-tokens adjustment.
Coverage Analysis
Well-Covered Areas
Coverage Gaps
max-effective-tokens headroom before failures occur
Recommendations
High Priority
Fix CGO/CJS push regression ([CGO] Workflow failure on main - Run #2565 #29669) — P1, Day 90+
Resolve Codex OPENAI_API_KEY sandbox exclusion (Codex OpenAI proxy fails because OPENAI_API_KEY is excluded from AWF sandbox #32446) — P1
Audit max-effective-tokens for daily workflows — new P2 (ET exhaustion)
Deprecate or circuit-break Deployment Monitor zombie
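One way to approach the max-effective-tokens audit is to compare each daily workflow's peak usage against its configured limit and flag those with little headroom. The sketch below is illustrative only: the data class, the function names, the 20% headroom threshold, and the second workflow entry are hypothetical. Only the 80M limit reached by Daily Observability Report comes from this report.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TokenUsage:
    workflow: str
    peak_tokens: int   # highest effective-token usage seen in recent runs
    limit_tokens: int  # configured max-effective-tokens value


def headroom_ratio(usage: TokenUsage) -> float:
    """Fraction of the budget left unused at peak (0.0 means exhausted)."""
    return max(0.0, 1.0 - usage.peak_tokens / usage.limit_tokens)


def flag_low_headroom(usages: Iterable[TokenUsage], min_headroom: float = 0.2) -> List[str]:
    """Return workflows whose remaining headroom is below the assumed threshold."""
    return [u.workflow for u in usages if headroom_ratio(u) < min_headroom]


usages = [
    TokenUsage("Daily Observability Report", 80_000_000, 80_000_000),
    TokenUsage("example-daily-workflow", 30_000_000, 80_000_000),  # hypothetical entry
]
print(flag_low_headroom(usages))  # ['Daily Observability Report']
```

Flagging on headroom rather than on outright exhaustion surfaces workflows before they start failing.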
Medium Priority
Low Priority
max-effective-tokens
Trends
Actions Taken This Run
agent-performance-latest.md, shared-alerts.md
Next Steps
max-effective-tokens for all daily workflows (ET budget exhaustion risk)