Agent Performance Report — Week of 2026-07-01 #42767
Closed
Replies: 2 comments
Thanks for sharing this detailed report. The model-version lifecycle issue and retry behavior around non-retryable 400/401/403 errors seem especially important to address, since both can create recurring failures and unnecessary credit usage. A pre-flight model availability check plus a fast-fail retry guard would likely improve overall workflow reliability quite a bit.
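As a rough illustration of the pre-flight model availability check suggested here, the sketch below compares each workflow's configured model against a set of models known to be reachable, before any run is started. Everything in it is an assumption made for illustration: the `PreflightResult` type, the `preflight_models` and `unavailable` helpers, and the workflow and model names are hypothetical, and the report does not say how the available-model list would be obtained.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PreflightResult:
    """Outcome of checking one workflow's configured model (hypothetical type)."""
    workflow: str
    model: str
    available: bool


def preflight_models(
    workflow_models: Dict[str, str],
    available_models: Iterable[str],
) -> List[PreflightResult]:
    """Check every workflow's model against the set of reachable models.

    Assumption: the caller has already fetched `available_models` from
    whatever source of truth the platform exposes.
    """
    available = set(available_models)
    return [
        PreflightResult(workflow, model, model in available)
        for workflow, model in sorted(workflow_models.items())
    ]


def unavailable(results: Iterable[PreflightResult]) -> List[PreflightResult]:
    """Return only the checks that failed, so a harness can skip or flag them."""
    return [r for r in results if not r.available]


# Placeholder names only; none of these are real workflow or model identifiers.
configured = {
    "example-workflow-a": "example-model-stable",
    "example-workflow-b": "example-model-alpha-snapshot",
}
reachable = ["example-model-stable"]

failed = unavailable(preflight_models(configured, reachable))
for result in failed:
    print(f"skip {result.workflow}: model {result.model} is not available")
```

Running the check before dispatch means a retired or tier-unsupported model is reported once, up front, instead of failing on every scheduled run.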
This discussion was automatically closed because it expired on 2026-07-02T13:41:41.262Z.
Executive Summary
Top performers: Copilot SWE Agent, Issue Monster, PR Triage, Auto-Triage Issues, Avenger
Needs improvement: PR Sous Chef, Sub-Agent Model Resolution Audit, PR Code Quality Reviewer, Daily Safe Output Integrator
New this run: #aw_model_lifecycle — Systemic model version lifecycle management issue filed.
Performance Rankings
Top Performing Agents
Copilot SWE Agent (Q: 92/100, E: 91/100)
skills: frontmatter for workflow skill installation in instructions #42756, Document skills: frontmatter with pinned refs, per-skill auth, and Matt Pocock example #42747, Add regression coverage for Copilot AWF chroot-home cleanup #42736, eslint-factory: add empty-string suggestion for null/undefined in no-core-setoutput-non-string #42723, Standardize two linters on Cursor traversal and add shared astutil.Root #42719
Issue Monster (Q: 88/100, E: 87/100)
PR Triage (Q: 88/100, E: 86/100)
Auto-Triage Issues (Q: 84/100, E: 82/100)
Avenger (Q: 83/100, E: 82/100)
Team Status (Q: 82/100, E: 81/100)
Static Analysis (Q: 81/100, E: 80/100)
AB Advisor (Q: 78/100, E: 76/100)
AIC Consumption Report (Q: 75/100, E: 75/100)
Content Moderation (Q: 74/100, E: 72/100)
Agents Needing Improvement
PR Sous Chef (Q: 38/100, E: 30/100)
gpt-5.5, SDK returns 400 "not accessible via /chat/completio [Content truncated due to length] #42444 closed Jun 30 22:50 but problem persists on Jul 1
pi (feat: switch pr-sous-chef to pi engine #42730 merged) is the active mitigation; monitor closely
Sub-Agent Model Resolution Audit (Q: 30/100, E: 25/100)
gpt-5-codex-alpha-2025-11-07 404s (same alpha-snapshot d [Content truncated due to length] #42033 OPEN)
PR Code Quality Reviewer (Q: 35/100, E: 30/100)
general-purpose subagent requests tier-unsupported model → SDK 400 `model [Content truncated due to length] #42095 OPEN). Every run fails.
Daily Safe Output Integrator (Q: 40/100, E: 35/100)
Daily BYOK Ollama (Q: 35/100, E: 30/100)
AI Moderator (Q: 55/100, E: 45/100)
Agentic Commands (Q: 60/100, E: 52/100)
Inactive / Zero-Output Agents
message input ([aw-failures] Smoke Copilot safe_outputs red - dispatch_workflow to haiku-printer omits required input message #41988 OPEN)
EACCES mkdir /tmp/gh-aw/sandbox/firewall/logs, agent never invoked (rootless left [Content truncated due to length] #42398 OPEN)
Quality Analysis
Output Quality Distribution
Common Quality Issues
Effectiveness Analysis
PR Merge Rate (copilot-swe-agent)
Task Completion Rates (Jul 1)
*Q and AI Moderator show action_required/skipped — likely deployment gates or event-filtering, not hard failures.
Behavioral Patterns
Productive Patterns
Problematic Patterns
gpt-5.5, SDK returns 400 "not accessible via /chat/completio [Content truncated due to length] #42444 closed Jun 30; [aw] PR Sous Chef hit HTTP 400 bad request #42652 opened Jul 1. Root cause likely misdiagnosed.
Coverage Analysis
Well-Covered
Coverage Gaps
Engine Distribution (257 workflows)
Note: codex (6%) carries disproportionate failure risk. Consider migrating low-priority codex workflows to copilot.
Recommendations
High Priority
Systemic model version lifecycle management — #aw_model_lifecycle (filed this run)
Fix PR Sous Chef root cause ([aw] PR Sous Chef hit HTTP 400 bad request #42652)
pi (feat: switch pr-sous-chef to pi engine #42730) is live; validate next run
Harness retry guard for non-retryable errors
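A minimal sketch of what a harness retry guard for non-retryable errors could look like, assuming the harness can see the HTTP status of a failed call. The `HttpError` class, `call_with_retry_guard`, and the default attempt and backoff values are hypothetical and are not taken from the actual harness. The status set 400/401/403 follows the errors named in this discussion.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: client errors that a retry cannot fix (bad request, auth, forbidden).
NON_RETRYABLE_STATUSES = frozenset({400, 401, 403})


class HttpError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


def call_with_retry_guard(
    request: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with exponential backoff; fail fast otherwise."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except HttpError as err:
            # A non-retryable status will fail the same way again, so re-raise
            # at once instead of spending more credits. Also give up when the
            # attempt budget is exhausted.
            if err.status in NON_RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            # Backoff doubles each attempt: base, 2*base, 4*base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable: loop always returns or raises")
```

With this shape, a 400 such as a model that is not accessible surfaces after a single attempt, while a transient 5xx still gets the full retry budget.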
Medium Priority
gpt-5-codex-alpha-2025-11-07 404s (same alpha-snapshot d [Content truncated due to length] #42033, [aw-failures] PR Code Quality Reviewer red - Copilot general-purpose subagent requests tier-unsupported model → SDK 400 `model [Content truncated due to length] #42095) - reduces P1 count by 2 immediately
Low Priority
Trends
Overall trajectory: mild decline driven by PR Sous Chef recurrence and 5 new P2s. Copilot SWE Agent and core monitoring agents remain stable anchors.
Actions Taken This Run
agent-performance-latest.md and shared-alerts.md in shared memory
Next Steps
References: