[audit-workflows] [aw-audit] Daily Audit 2026-05-28 — 81.0% success, 4 new copilot-CLI silent failures, Avenger/Changeset still stuck #35583
Closed
This discussion has been marked as outdated by Agentic Workflow Audit Agent. A newer discussion is available at Discussion #35800.
Summary
Headline
Mixed day. Success rate dipped to 81.0% on heavier volume, driven by three concurrent issue families:
- New copilot-CLI silent failures: 4 runs exhausted all retries with no error category. Three of the four are on `gpt-5-mini`; the fourth is on `gpt-5.3-codex`.
- Still stuck: Avenger `max-turns` (5 days) and Changeset Generator `gpt-5.3-codex` → 404 (9 days) each cost another day of failures.
- The safe-output `target=*` pattern is spreading: it now affects 3 distinct safe-output tools (up from 1 yesterday).

Firewall block rate normalized (17.2% vs. yesterday's 24.9% spike); the Playwright FFmpeg / `azureedge.net` blocks did not recur. copilot CLI 1.0.52 remains stable with respect to its original ENOENT / `anthropic-beta` failure modes (day 5).

Trends
Workflow Health (last 30 days)
The success-rate line (blue, right axis) recovered after the May 23 cliff (41.6%) and has oscillated between 81% and 95% since. Today's 81.0% sits at the low end of that range. The run-volume stack shows that today ranks second-highest in the window by failure count (red), with 11 failures; only May 23 (43 failures) was worse, and that incident was caused by a CLI bug that has since been resolved.
Token & Cost (last 30 days)
Tokens rebounded from yesterday's partial-day low (17M) to 49M today. Cost (purple line) hit $29.35, the highest single-day value in the window, driven primarily by Daily Safe Output Tool Optimizer at $11.33 (its third consecutive cost record). The 3-day moving average is still rising.
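The report does not say exactly how the dashboard computes its 3-day moving average. A minimal sketch, assuming a simple trailing average over daily totals (one value per day, oldest first) and treating "still rising" as "the latest average is above the previous one". The function names are hypothetical and not part of any audit tooling:

```python
from collections.abc import Sequence


def trailing_moving_average(values: Sequence[float], window: int = 3) -> list[float]:
    """Simple trailing moving average (assumed windowing, not the dashboard's confirmed method).

    Entry j of the result averages the `window` values ending at
    values[j + window - 1]; days without a full window are skipped.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def is_rising(ma: Sequence[float]) -> bool:
    """True when the latest moving-average point exceeds the previous one."""
    return len(ma) >= 2 and ma[-1] > ma[-2]
```

Feeding it the daily cost series behind the chart would reproduce the purple-line trend only if the dashboard uses the same trailing window; a centered or exponentially weighted average would smooth the May 23 spike differently.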
Critical Issues
1. NEW: copilot-CLI 'all retries exhausted' with no error category (4 runs)
Affected: Smoke Copilot §26594967220 (gpt-5.3-codex), PR Sous Chef §26595665041, PR Sous Chef §26598471276, Auto-Triage Issues §26596900064 (last 3 all gpt-5-mini).
All four runs share an identical signature:
`copilot-harness` attempts 1–4 fail with `exitCode=1`, and every known error-category flag is `false` (`isCAPIError400=false isMCPPolicyError=false isModelNotSupportedError=false isNullTypeToolCallError=false isAuthError=false isAuthenticationFailedError=false`). Output was produced (`hasOutput=true`), which means the actual error is in the captured copilot CLI stdout/stderr but is not being classified by the harness.

PR Sous Chef went from 100% success yesterday (3/3) to 50% today (2/4) on the same `gpt-5-mini` model. Combined with the Auto-Triage failure, three of the four failing runs are on `gpt-5-mini`.

Recommendation: Capture and post the actual `copilot` CLI stderr from the failing attempts, then either extend the harness error classifier with a named category or temporarily route `gpt-5-mini` workflows to a more reliable model.

2. STUCK 5 DAYS: Avenger isMaxTurnsExit (3 failures × 100% fail rate)
Affected: §26596698302, §26599358074, §26602359656.
All three hit `[claude-harness] attempt 1 failed: isMaxTurnsExit=true`. The Claude CLI is invoked with `--max-turns 25`, but the CI-failure-investigation task burns 16+ turns just to reach the failure-detail grep. Each failure also posts a comment on cascade-tracker issue #35532, which is itself becoming noisy.

This recommendation has been active for 5 days without action. Cumulative wasted cost > $5. One-line fix: add `max-turns: 50` to the `.github/workflows/avenger.md` frontmatter and recompile.

3. STUCK 9 DAYS: Changeset Generator gpt-5.3-codex → 404 alpha

Affected: §26594967273. Codex CLI sent `gpt-5-codex-alpha-2025-11-07` to the proxy, which returned 404. 4 retry attempts × 5 reconnects each, all failed. The same recommendation has been pinned for 9 days. One-line fix: change `model: gpt-5.3-codex` → `model: gpt-5.4` in `.github/workflows/changeset-generator.md`.

Smoke Copilot also runs `gpt-5.3-codex` and failed in a related way today; it is worth reviewing whether that workflow should also be migrated.

4. Safe-output target=* pattern now spreading to 3 tools

Yesterday this was confined to `add_comment` (Contribution Check, 2 occurrences). Today it appears in:
- `add_comment` failed (`target=*`, no `item_number`) × 1
- `create_pull_request_review_comment` failed (`target=*`, no `pull_request_number`) × 2
- `set_issue_field` failed (no issue number available) × 1, same family

Tool-by-tool prompt fixes won't scale. Recommendation: add a single MCP-boundary validator that rejects any safe-output call whose target is `*` and whose workflow context cannot auto-resolve a concrete target.

Other failures and minor issues
Daily Safe Output Tool Optimizer — escalating cost
3-day rising trend: $6.05 → $9.90 → $11.33 (+87%). The workflow succeeded, but token consumption keeps climbing. Likely sources of the growth: a larger prompt, expanding scope, or accumulated history loaded into context. At this trajectory, this single workflow will exceed $15/day within a week. A prompt-diff investigation is worth doing before the next audit.
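The report does not state how the "within a week" projection was made. A minimal sketch, assuming a least-squares linear trend over the three daily costs quoted above; `percent_change` and `days_until_exceeds` are hypothetical helpers, not part of the audit tooling:

```python
from typing import Optional


def percent_change(first: float, last: float) -> float:
    """Relative change from first to last, in percent."""
    return (last - first) / first * 100.0


def days_until_exceeds(costs: list[float], threshold: float, horizon: int = 7) -> Optional[int]:
    """Fit a least-squares line to daily costs (day 0 = oldest) and return how
    many days after the last observation the fitted line first exceeds
    `threshold`, or None if it does not within `horizon` days."""
    n = len(costs)
    if n < 2:
        raise ValueError("need at least two days of data")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(costs) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, costs)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    for ahead in range(1, horizon + 1):
        if intercept + slope * (n - 1 + ahead) > threshold:
            return ahead
    return None


costs = [6.05, 9.90, 11.33]  # Daily Safe Output Tool Optimizer cost, oldest first
print(f"{percent_change(costs[0], costs[-1]):.0f}%")  # 87%
print(days_until_exceeds(costs, 15.0))  # 2
```

Under this linear fit ($2.64/day slope) the $15/day mark is crossed about two days out, consistent with "within a week"; a compounding growth model would cross it sooner, so the linear projection is the conservative one.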
Resolved this cycle
Top spenders today
By cost / tokens / minutes
Engine breakdown
Recommendations queue (top 4)
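1. Capture the copilot CLI stderr for the silent failures; `gpt-5-mini` workflows should be routed elsewhere short-term. ([copilot-cli-all-retries-exhausted-no-category])
2. Avenger: add `max-turns: 50` in `.github/workflows/avenger.md`. (5 days stuck.)
3. Changeset Generator: change `model: gpt-5.3-codex` → `gpt-5.4`. Review Smoke Copilot's model choice. (9 days stuck.)
4. Safe outputs: add a single MCP-boundary validator for `target: "*"` and missing item/PR number across all affected tools.

References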