[audit-workflows] [aw-audit] Daily Audit 2026-05-28 — 81.0% success, 4 new copilot-CLI silent failures, Avenger/Changeset still stuck #35583

2026-05-28T22:05:07Z

github-actions[bot]
Bot May 28, 2026

Summary

Metric	Today	Yesterday	Δ
Runs (completed / in-progress)	64 (58 / 6)	37 (33 / 4)	+27
Success rate	81.0%	90.9%	-9.9 pp
Failures	11	3	+8
Total tokens	48.9 M	17.3 M	+183%
Effective tokens	331.8 M	152.1 M	+118%
Cost (partial)	$29.35	$12.00	+145%
Action minutes	488	290	+68%
Firewall block rate	17.2%	24.9%	-7.7 pp
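
The Δ column mixes two kinds of change: relative percent change for volumes and costs, and percentage points (pp) for rates. The sketch below reproduces a few of the figures above. It assumes the success rate is computed as (completed - failures) / completed, which matches both days' numbers but is not stated in the audit; the function names are illustrative.

```python
def pct_change(today: float, yesterday: float) -> float:
    """Relative change in percent, used for tokens, cost, and minutes."""
    return (today / yesterday - 1) * 100


def pp_change(today_pct: float, yesterday_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return today_pct - yesterday_pct


# Assumption: success rate = (completed - failures) / completed.
completed, failures = 58, 11
success_rate = (completed - failures) / completed * 100

print(round(success_rate, 1))                # success rate today
print(round(pct_change(48.9, 17.3)))         # total tokens, in percent
print(round(pp_change(81.0, 90.9), 1))       # success rate, in pp
```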

Headline

Mixed day. Success rate dipped to 81.0% on heavier volume, driven by three concurrent issue families:

NEW: silent copilot-CLI failure pattern — 4 runs exhausted all retries with no specific error category set. 3 of the 4 are gpt-5-mini; the fourth is gpt-5.3-codex.
Stuck criticals unactioned: Avenger max-turns (5 days) and Changeset Generator gpt-5.3-codex → 404 (9 days) each cost another day of failures.
Safe-output target=* pattern is spreading — now affects 3 distinct safe-output tools (was 1 yesterday).

Firewall block rate normalized (17.2% vs yesterday's 24.9% spike); the Playwright FFmpeg / azureedge.net blocks did not recur. Copilot CLI 1.0.52 remains stable on day 5, with no recurrence of the ENOENT or anthropic-beta failure modes.

Trends

Workflow Health (last 30 days)

The success-rate line (blue, right axis) recovered after the May 23 cliff (41.6%) and has oscillated 81–95% since. Today's 81.0% is the low end of that range. The run-volume stack shows today is the second-highest by failure count (red) in the window, with 11 failures — only May 23 (43 fails) was worse, and that incident was a CLI bug we have since resolved.

Token & Cost (last 30 days)

Tokens rebounded from yesterday's partial-day low (17 M) to 49 M today. Cost (purple line) hit $29.35, the highest single-day value in the window, driven primarily by Daily Safe Output Tool Optimizer at $11.33 (its third consecutive cost record). The 3-day moving average is still rising.

Critical Issues

1. NEW: copilot-CLI 'all retries exhausted' with no error category (4 runs)

Affected: Smoke Copilot §26594967220 (gpt-5.3-codex), PR Sous Chef §26595665041, PR Sous Chef §26598471276, Auto-Triage Issues §26596900064 (last 3 all gpt-5-mini).

All four runs share an identical signature: copilot-harness attempts 1–4 fail with exitCode=1, and every known error-category flag is false (isCAPIError400=false isMCPPolicyError=false isModelNotSupportedError=false isNullTypeToolCallError=false isAuthError=false isAuthenticationFailedError=false). Output was produced (hasOutput=true) — meaning the actual error is in the captured copilot CLI stdout/stderr but isn't being classified by the harness.

PR Sous Chef went from 100% success yesterday (3/3) to 50% today (2/4) on the same gpt-5-mini model. Combined with the Auto-Triage failure, three of four failing runs are on gpt-5-mini.

Recommendation: Capture and post the actual copilot CLI stderr from the failing attempts, then either extend the harness error classifier with a named category or route gpt-5-mini workflows to a more reliable model temporarily.
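
A minimal sketch of what a named fallback category could look like, so that attempts like these stop appearing with every flag false. The flag names are the ones quoted above; the function name, its signature, and the returned category strings are hypothetical and not taken from the actual harness.

```python
from typing import Mapping

# Flag names as quoted in the audit; everything else here is illustrative.
KNOWN_FLAGS = (
    "isCAPIError400",
    "isMCPPolicyError",
    "isModelNotSupportedError",
    "isNullTypeToolCallError",
    "isAuthError",
    "isAuthenticationFailedError",
)


def classify_attempt(exit_code: int, flags: Mapping[str, bool], has_output: bool) -> str:
    """Return a category name for a single harness attempt."""
    if exit_code == 0:
        return "success"
    # The first known flag that is set wins.
    for name in KNOWN_FLAGS:
        if flags.get(name, False):
            return name
    # No known flag matched: name the gap explicitly instead of leaving it blank.
    if has_output:
        return "unclassified_with_output"
    return "unclassified_no_output"


# The signature shared by today's four runs: exit code 1, all flags false, output present.
silent = {name: False for name in KNOWN_FLAGS}
print(classify_attempt(1, silent, has_output=True))
```

A category like this would not explain the failures, but it would make the pattern countable in future audits while the captured stdout/stderr is being reviewed.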

2. STUCK 5 DAYS: Avenger isMaxTurnsExit (3 failures × 100% fail rate)

Affected: §26596698302, §26599358074, §26602359656.

All three hit [claude-harness] attempt 1 failed: isMaxTurnsExit=true. The Claude CLI is invoked with --max-turns 25, but the CI-failure-investigation task burns 16+ turns just to reach the failure-detail grep. Each failure also posts a comment on cascade-tracker issue #35532, which is itself becoming noisy.

This recommendation has gone unactioned for 5 days, with cumulative wasted cost above $5. One-line fix: add max-turns: 50 to .github/workflows/avenger.md frontmatter and recompile.

3. STUCK 9 DAYS: Changeset Generator `gpt-5.3-codex` → 404 alpha

§26594967273. Codex CLI sent gpt-5-codex-alpha-2025-11-07 to proxy → 404. 4 retry attempts × 5 reconnects each, all failed. Same recommendation pinned for 9 days. One-line fix: change model: gpt-5.3-codex → model: gpt-5.4 in .github/workflows/changeset-generator.md.

Smoke Copilot also runs gpt-5.3-codex and failed today with the unclassified retry-exhaustion pattern described in issue 1, so it is worth reviewing whether that workflow should also be migrated.

4. Safe-output `target=*` pattern now spreading to 3 tools

Yesterday this was confined to add_comment (Contribution Check, 2 occurrences). Today it appears in:

Contribution Check §26603360040: add_comment failed (target=*, no item_number) × 1
Smoke Claude §26594967360: create_pull_request_review_comment failed (target=*, no pull_request_number) × 2
Smoke Codex §26594967309: set_issue_field failed (no issue number available) × 1 — same family

Tool-by-tool prompt fixes won't scale. Recommendation: add a single MCP-boundary validator that rejects any safe-output call whose target is * and whose workflow context cannot auto-resolve.
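
The recommended validator could be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the `SafeOutputCall` type, its field names, and the `validate` function are hypothetical, and the real tools use different parameter names for the number (item_number, pull_request_number, and so on).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SafeOutputCall:
    tool: str                           # e.g. "add_comment"
    target: str                         # "*" means "any item"
    item_number: Optional[int] = None   # explicit issue/PR number, if supplied


def validate(call: SafeOutputCall, context_item_number: Optional[int]) -> Optional[str]:
    """Return an error message if the call should be rejected, else None.

    context_item_number is the issue/PR the workflow context can resolve
    automatically, or None when nothing can be resolved.
    """
    if call.target != "*":
        return None   # specific target: nothing to check here
    if call.item_number is not None:
        return None   # the caller supplied the number explicitly
    if context_item_number is not None:
        return None   # the workflow context can auto-resolve it
    return f"{call.tool}: target=* requires an explicit item number in this context"


# Same shape as today's Contribution Check failure: target=*, no number, no context.
print(validate(SafeOutputCall("add_comment", "*"), context_item_number=None))
```

Because the check depends only on the target and the resolvable context, the same function would cover add_comment, create_pull_request_review_comment, and set_issue_field without per-tool prompt changes.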

Other failures and minor issues

Daily Safe Output Tool Optimizer — escalating cost

3-day rising trend: $6.05 → $9.90 → $11.33 (+87%). Workflow succeeded but token consumption keeps climbing. Likely candidates for the growth source: prompt size increase, expanding scope, or accumulated history loaded into context. At this trajectory the single workflow exceeds $15/day within a week. Worth a prompt-diff investigation before next audit.
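
As a rough check on that projection, the sketch below extrapolates the three data points linearly. The audit does not say how the trajectory was estimated, so both step sizes used here (the average daily increase and the most recent daily increase) are assumptions.

```python
import math

costs = [6.05, 9.90, 11.33]   # daily cost in USD, oldest first (from the audit)
threshold = 15.00

# Growth over the 3-day window, in percent.
growth_pct = (costs[-1] / costs[0] - 1) * 100

# Two simple step estimates: average daily increase and latest daily increase.
avg_step = (costs[-1] - costs[0]) / (len(costs) - 1)
last_step = costs[-1] - costs[-2]


def days_until_exceeds(current: float, step: float, limit: float) -> int:
    """Whole days until current + n * step is strictly above limit."""
    return math.floor((limit - current) / step) + 1


days_avg = days_until_exceeds(costs[-1], avg_step, threshold)
days_last = days_until_exceeds(costs[-1], last_step, threshold)

print(round(growth_pct), days_avg, days_last)
```

Under either assumption the workflow passes $15/day in two to three days, which is consistent with the "within a week" estimate.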

Resolved this cycle

Smoke Claude Playwright FFmpeg / azureedge.net timeout: did not recur today. No firewall blocks on Playwright CDN hosts.
Firewall block rate: normalized to 17.2% from yesterday's 24.9% spike.
Copilot CLI 1.0.52 stability: day 5 with 0 ENOENT and 0 anthropic-beta header errors across 41 copilot runs.

Top spenders today

By cost / tokens / minutes

Rank	Workflow	Cost	Eff Tokens	Action Min	Status
1	Daily Safe Output Tool Optimizer	$11.33	54.6 M	8	success
2	Design Decision Gate 🏗️	$4.41	17.9 M	27 (5 runs)	5/5 success
3	Daily Code Metrics and Trend Tracking Agent	$3.65	—	—	success
4	Smoke Claude	$2.59	—	18	fail
5	Daily Team Evolution Insights	$1.97	—	—	success
6	Copilot Agent PR Analysis	$1.86	—	—	success

Workflow	Eff Tokens	Runs
Daily Safe Output Tool Optimizer	54.6 M	1
Daily Testify Uber Super Expert	23.3 M	1
PR Code Quality Reviewer	22.9 M	5
Test Quality Sentinel	22.6 M	5
Matt Pocock Skills Reviewer	22.5 M	5
PR Sous Chef	21.0 M	4 (2 fail)
Linter Miner	19.6 M	1
Design Decision Gate 🏗️	17.9 M	5

Workflow	Minutes	Runs
Avenger	57	3 (3 fail)
PR Sous Chef	32	4 (2 fail)
Matt Pocock Skills Reviewer	30	5
PR Code Quality Reviewer	30	5
Test Quality Sentinel	28	5
Design Decision Gate 🏗️	27	5
Smoke CI	21	6
AI Moderator	19	1 (success on gpt-5.4)

Engine breakdown

Engine	Runs	Notes
copilot	41	1.0.52 stable; new "all retries exhausted no category" pattern in 4 runs
claude	17	3 Avenger max-turns fails; Smoke Claude safe-output target=* fail
codex	3	Changeset Generator still gpt-5.3-codex → 404; Smoke Codex set_issue_field fail; AI Moderator success on gpt-5.4
antigravity	1	stable
gemini	1	stable
pi	1	stable

Recommendations queue (top 4)

1. CRITICAL: Investigate copilot-CLI "all retries exhausted no category" by reading the captured stdout/stderr from the 4 failing runs and adding a named harness classifier. Decide whether gpt-5-mini workflows should be routed elsewhere short-term. ([copilot-cli-all-retries-exhausted-no-category])
2. CRITICAL: Set max-turns: 50 for Avenger in .github/workflows/avenger.md. (5 days stuck.)
3. CRITICAL: Change Changeset Generator model: gpt-5.3-codex → gpt-5.4. Review Smoke Copilot's model choice. (9 days stuck.)
4. HIGH: Add a single MCP-boundary validator for safe-output calls with target: "*" and a missing item/PR number, covering all affected tools.

References

This run: §26604152503
Avenger cascade tracker: #35532
Daily Safe Output Tool Optimizer cost spike: §26597857310

Generated by 🔍 Agentic Workflow Audit Agent · opus47 30.8M · ◷

expires on May 29, 2026, 10:05 PM UTC

2026-05-29T21:58:25Z

github-actions[bot]
Bot May 29, 2026
Author

This discussion has been marked as outdated by Agentic Workflow Audit Agent.

A newer discussion is available at Discussion #35800.
