[audit-workflows] Agentic Workflow Audit — 2026-05-31 (89.8% completion, 6 failures, 2 recurrences strengthening) #36085

2026-05-31T08:09:51Z

github-actions[bot]
Bot May 31, 2026

Overview

Full 24h window (2026-05-31, 02:05–07:54Z): 62 runs — 59 completed, 3 still in-progress at audit time. Completion rate 89.8% (53 success / 6 failure), holding the recent high-80s/low-90s band. The six failures are five distinct classes — no single systemic regression — but two recurrence classes strengthened today and one is now the clear top priority.

safe-output partial-failure intolerance (HIGH) hit two more workflows today (Sub-Issue Closer, LintMonster) — the class now spans 4 workflows since 05-26. One failed safe-output item red-fails the entire job even when others succeeded.
token-budget 429 (#35661, HIGH) recurred on a 2nd consecutive day with a 2nd workflow (Daily Firewall Logs Collector), same 25M effective-token-cap signature as 05-30's Linter Miner.
1 NEW infra class: threat-detection prompt.txt missing (Code Simplifier).
2 transient failures from a single PR-branch-deleted race (external cause).

Key Metrics

Metric	Value
Runs (completed / in-progress)	59 / 3
Completion rate	89.8% (53/59)
Failures	6 (5 classes)
Tokens (raw / effective)	68.8M / 476.6M
Cost (claude-measured)¹	$31.63
Turns	1,199
Firewall blocks	702 / 4,204 (16.7%)
Missing tools / data	0 / 0
Engines	copilot 44 · claude 17 · codex 1

¹ Only the claude engine reports EstimatedCost; copilot/codex report $0, so total cost is claude-biased.
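The two headline percentages can be re-derived from the raw counts in the table. The sketch below does only that arithmetic; the variable names are illustrative, and reading 4,204 as the total number of firewall-evaluated requests is an assumption, since the table gives only "702 / 4,204".

```python
# Figures copied from the Key Metrics table.
completed, in_progress = 59, 3
successes, failures = 53, 6
blocked, total_requests = 702, 4204  # assumption: 4,204 = all firewall-evaluated requests

# Sanity check: every completed run is either a success or a failure.
assert successes + failures == completed

total_runs = completed + in_progress
completion_rate = round(100 * successes / completed, 1)
block_rate = round(100 * blocked / total_requests, 1)

print(total_runs, completion_rate, block_rate)  # 62 89.8 16.7
```

Note that the completion rate is computed over completed runs only (53/59); the 3 in-progress runs are excluded from the denominator.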

Critical Findings

🔴 1. Safe-output partial-failure intolerance — ESCALATING (top priority)

One invalid/failed safe-output item red-fails the whole safe_outputs job even when other items in the same batch succeeded. Two workflows today:

Sub-Issue Closer (26706621748): 11 add_comment items had target=* with no item_number on a schedule event → all 11 failed → entire job red despite 11 successful update_issue items.
LintMonster (26702419759): assign_to_agent for #36048/#36049 returned "GitHub API Request failed" → 2 failed items red-failed the job despite 3 successful create_issue items (#36048–#36050) + 1 create_discussion.

Fix (HIGH): make Process Safe Outputs treat an individual failed item as skipped-with-warning when ≥1 item in the batch succeeded. Plus: (a) validate target=* with no resolvable number at the MCP emit boundary so the agent self-corrects in-loop (prompt-only guardrails have repeatedly failed); (b) investigate why assign_to_agent to copilot is failing via the API (endpoint/permission/eventual-consistency right after create_issue).
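A minimal sketch of the proposed behavior, assuming a simplified item model. ItemResult, validate_item and summarize_batch are hypothetical names, not the actual Process Safe Outputs code; only the target and item_number fields and the item type names come from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemResult:
    item_type: str               # e.g. "add_comment", "create_issue"
    ok: bool
    error: Optional[str] = None


def validate_item(item: dict) -> Optional[str]:
    """Emit-boundary check: return an error message, or None if the item is valid.

    Hypothetical rule: target='*' must carry an explicit item_number,
    because a schedule event has no triggering issue/PR to fall back on.
    """
    if item.get("target") == "*" and item.get("item_number") is None:
        return "target=* requires an explicit item_number"
    return None


def summarize_batch(results: list[ItemResult]) -> tuple[str, list[str]]:
    """Return (job_status, warnings) for one safe-output batch.

    Failed items become warnings; the job only fails when items failed
    and none in the batch succeeded.
    """
    warnings = [f"skipped {r.item_type}: {r.error}" for r in results if not r.ok]
    any_ok = any(r.ok for r in results)
    if warnings and not any_ok:
        return "failure", warnings
    return "success", warnings


# Shape of today's LintMonster batch: 2 failed + 4 successful items.
batch = (
    [ItemResult("assign_to_agent", False, "GitHub API Request failed")] * 2
    + [ItemResult("create_issue", True)] * 3
    + [ItemResult("create_discussion", True)]
)
status, warnings = summarize_batch(batch)
print(status, len(warnings))  # success 2
```

Under this rule the Sub-Issue Closer batch (11 failed add_comment, 11 successful update_issue) would also finish green with 11 warnings, and validate_item would have rejected the 11 add_comment items before they reached the job.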

🔴 2. Token-budget 429 (#35661) — recurring, 2nd day / 2nd workflow

Daily Firewall Logs Collector (26702042593, copilot/sonnet): after 60 turns it hit CAPIError: 429 Maximum effective tokens exceeded (25.41M / 25M), retried 4× with --continue (each retry re-hitting the cap), and exhausted all retries → exitCode=1 after 18m25s. Identical signature to 05-30's Linter Miner (25.13M/25M). Documentation Noob Tester reached 25.03M eff_tok and barely passed, right at the cliff edge.

Fix (HIGH, #35661): (a) chunk/reduce scope of heavy daily-aggregation workflows to stay under 25M eff_tok; (b) harness should fail-fast on a budget-429 — retrying with --continue cannot recover a hard cap, it just burns ~90s ×4.
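Part (b) could be sketched as a retry predicate that treats the budget-429 as non-retryable. This is an illustration, not the harness's actual retry logic: should_retry is a hypothetical name, the matched text is taken from the error quoted above, and max_retries=4 mirrors the four retries observed.

```python
import re

# Matches the hard-cap error quoted in the report, e.g.
# "CAPIError: 429 Maximum effective tokens exceeded (25.41M / 25M)".
BUDGET_429 = re.compile(r"429 Maximum effective tokens exceeded")


def should_retry(error_message: str, attempt: int, max_retries: int = 4) -> bool:
    """Decide whether to retry after a failed attempt (attempt is 0-based)."""
    if BUDGET_429.search(error_message):
        # A hard budget cap cannot be recovered by continuing the session.
        return False
    return attempt < max_retries


print(should_retry("CAPIError: 429 Maximum effective tokens exceeded (25.41M / 25M)", 0))  # False
print(should_retry("connection reset by peer", 0))  # True
```

Other errors still get the normal retry budget; only the budget-cap signature short-circuits.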

🟠 3. NEW — threat-detection prompt.txt missing

Code Simplifier (26703540010): the agent succeeded and produced a valid PR patch, but the detection job failed — the setup step copies prompt.txt with cp ... 2>/dev/null || true, which silently masked a missing source. The detection agent then hit Prompt file not found, exited 1, produced no THREAT_DETECTION_RESULT, and the parse step red-failed the run. Shared harness → any create_pull_request workflow with a missing prompt at detection time is exposed.

Fix: don't mask the cp failure; verify the prompt exists (non-empty) before invoking the detection agent and fail with a clear, actionable error or regenerate it.
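The check could look like the sketch below, written in Python for illustration although the real setup step is a shell command. stage_prompt and both paths are hypothetical; the point is only that a missing or empty source raises a clear error instead of being swallowed the way `|| true` swallows it.

```python
import shutil
from pathlib import Path


def stage_prompt(src: Path, dst: Path) -> Path:
    """Copy the detection prompt into place, failing loudly if it is unusable."""
    if not src.is_file() or src.stat().st_size == 0:
        raise FileNotFoundError(
            f"threat-detection prompt missing or empty: {src}; "
            "regenerate it before running the detection agent"
        )
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

Failing here moves the error to the step that caused it, rather than surfacing later as a missing THREAT_DETECTION_RESULT in the parse step.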

🟡 Transient & lower-priority failures (2)

PR-branch-deleted race (external cause). Two PR-event runs fired on head branch copilot/update-agentic-workflows-and-skill-file which no longer existed on the remote:

Test Quality Sentinel (26700870332): fatal: couldn't find remote ref ... → git fetch exit 22, 0 turns.
Design Decision Gate (26700870336, claude): same missing branch → push_to_pull_request_branch failed at 21 turns.

Root cause is external — the Copilot PR branch was merged/deleted while these workflows were queued. Optional hardening: a checkout guard that exits neutral/skip on a deleted PR head ref instead of red-failing.
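One way to sketch the optional guard is to probe the remote for the head ref before fetching. `git ls-remote --exit-code` exits with status 2 when no matching ref exists, which distinguishes "branch deleted" from other git errors. head_ref_exists and the injectable run parameter are hypothetical, and how a workflow would then report a neutral/skipped result is left open here.

```python
import subprocess
from typing import Callable, Sequence


def _run_git(args: Sequence[str]) -> int:
    """Run git with the given arguments and return its exit code."""
    return subprocess.run(["git", *args], capture_output=True).returncode


def head_ref_exists(
    branch: str,
    remote: str = "origin",
    run: Callable[[Sequence[str]], int] = _run_git,
) -> bool:
    """Return True if refs/heads/<branch> exists on the remote."""
    code = run(["ls-remote", "--exit-code", remote, f"refs/heads/{branch}"])
    if code == 0:
        return True
    if code == 2:  # --exit-code: no matching refs found on the remote
        return False
    raise RuntimeError(f"git ls-remote failed with exit code {code}")
```

A caller would skip the run when this returns False and proceed to checkout otherwise; any other git failure still surfaces as an error rather than being treated as a deleted branch.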

Trend Charts (30-day / 13-window)

Workflow Health

Completion rate sits at 89.8%, essentially flat vs 05-30's 91.1% and well above the 05-23 trough (41.6%). The trend has been stable in the high-80s/low-90s for eight consecutive windows — today's 6 failures are spread across 5 distinct classes rather than a single failing workflow, so the dip is noise, not a regression.

Token & Cost

Daily tokens rose to 68.8M (highest since 05-17) and claude-measured cost to $31.63, both pulled up by a single outlier: Go Logger Enhancement ($8.30 / 41.9M eff_tok / 117 turns) — the most expensive run of the day and a new daily high, but it succeeded, so this is value-for-cost rather than waste. The 7-day moving-average cost line stays in its usual ~$22–25 band.

Cost & token leaderboard

Top cost (claude-measured): Go Logger Enhancement $8.30 (success, new high) · Safe Output Health Monitor $5.24 · Sergo $4.29 · Static Analysis Report $3.11 · Design Decision Gate $1.62 (failed).

Top effective tokens: Go Logger Enhancement 41.9M · Daily Firewall Logs 25.4M (429-failed) · Documentation Noob Tester 25.0M (at-cap success) · Safe Output Health Monitor 23.8M · Copilot CLI Deep Research 23.1M · Daily Compiler Quality Check 22.9M.

Note the cluster of heavy daily-aggregation workflows brushing the 25M effective-token cap — the same population at risk of the #35661 budget-429.

Network / firewall

Block rate 16.7% (702/4,204), flat vs 05-30's 16.6% — stable band. Highest pressure: Documentation Noob Tester 125/247 (51%), dominated by browser/Playwright + Google telemetry (accounts.google.com, content-autofill.googleapis.com, etc.) — not workflow-affecting. No firewall block caused any of the 6 failures.

Carried / watch items

Avenger absent a 3rd consecutive window — --max-turns 25 fix still unverified (raise to ~50 recommendation stays open).
golangci-lint download flake (05-30) did not recur; keep the harden-download (verify+retry+checksum, Makefile:403) as a preventive.
cache-memory git setup fatal — 2nd consecutive clean window; resolved-for-now.
[aw] Failure Investigator early-exit-on-clean-window concern: this window's run was still in-progress at audit time; re-check next window.
Daily Safe Output Tool Optimizer cost watch — not in this window.

Recommendations (priority order)

(HIGH) Make Process Safe Outputs tolerate per-item failures when ≥1 item succeeded + validate target=* at the emit boundary + investigate assign_to_agent API failures. (safe-output partial-failure intolerance — 4 workflows since 05-26)
(HIGH, #35661) Chunk heavy daily-aggregation workflows under the 25M eff_tok cap and make the harness fail-fast on a budget-429 instead of retrying with --continue.
(MED) Fix the threat-detection harness: don't mask the prompt cp with || true; verify the prompt exists before running the detection agent.
(LOW) Optional checkout guard to skip-neutral on a deleted PR head ref; watch Go Logger Enhancement's cost tail; keep golangci-lint download hardening + Avenger max-turns open.

References:

§26702042593 — Daily Firewall Logs (token-budget 429)
§26706621748 — Sub-Issue Closer (safe-output partial-failure)
§26703540010 — Code Simplifier (threat-detection prompt missing)

Generated by 🔍 Agentic Workflow Audit Agent · opus48 4M

expires on Jun 1, 2026, 8:09 AM UTC
