[audit-workflows] Agentic Workflow Audit — 2026-06-13 (success 68.4%, lowest since 05-23; Avenger ×6 + PR Sous Chef ×6) #39155

2026-06-13T21:58:16Z

github-actions[bot]
Bot Jun 13, 2026

Agentic Workflow Audit — 2026-06-13

Audited the last ~7h of agentic runs (window 14:52–21:35Z; the logs MCP tool timed out at 120s, so analysis ran against 99 locally-downloaded agentic run directories out of 207 total). Overall success was 67/98 = 68.4% — the lowest single-window rate since 2026-05-23 (41.6%), with prod-main at 66.7%. The drop is almost entirely two NEW failure clusters: Avenger ×6 and PR Sous Chef ×6, which together account for ~12 of the ~14 genuine main-branch failures. Encouragingly, maintainers already have fix branches in flight for both (copilot/aw-avenger-failed-fix, copilot/aw-fix-pr-sous-chef-fail, copilot/replace-opaque-copilot-error). All PR-event failures were by-design smoke-probe noise.

Summary

Metric	Value
Agentic runs (terminal)	98 (+1 in-progress)
Success / Failure	67 / 31 — 68.4% ⚠️
Prod-main success	42 / 63 — 66.7%
Tokens	43.9M
Cost (claude-measured)	$22.62
Action minutes / Turns	1,126 / 1,157
missing_tools / missing_data / mcp_failures	0 / 0 / 0
Firewall blocked	373 / 6,453 (all smoke-probe, by-design)

Engine health: copilot 48/61 (78.7%) · claude 11/21 (52.4%, dragged down by Avenger) · codex 4/6 · pi 2/2 · antigravity 2/2 · gemini 0/2.
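The percentages in the summary and the engine-health line can be re-derived from the raw counts. The following is a minimal Python sketch that uses only the numbers quoted above. Treating 6,453 as the total number of firewall-observed requests is an assumption, since the report gives only the ratio.

```python
# Counts copied from the Summary table and the engine-health line of this audit.
overall = (67, 98)        # (successes, terminal runs)
prod_main = (42, 63)
engines = {
    "copilot": (48, 61),
    "claude": (11, 21),
    "codex": (4, 6),
    "pi": (2, 2),
    "antigravity": (2, 2),
    "gemini": (0, 2),
}
firewall = (373, 6453)    # (blocked, total) -- "total" is an assumed reading


def rate(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100 * part / total, 1)


overall_rate = rate(*overall)
prod_main_rate = rate(*prod_main)
engine_rates = {name: rate(*counts) for name, counts in engines.items()}
firewall_block_rate = rate(*firewall)
engine_run_total = sum(total for _, total in engines.values())

print(f"overall {overall_rate}% | prod-main {prod_main_rate}%")
for name, pct in engine_rates.items():
    print(f"{name:12s} {pct}%")
print(f"firewall blocked {firewall_block_rate}% | engine runs {engine_run_total}")
```

Each computed rate matches the figure printed in the report. The per-engine denominators sum to 94 rather than 98; the report does not say how the remaining four terminal runs are attributed.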

Critical Findings (NEW / escalating)

1. Avenger reddens ×6 — ERR_CONFIG: no structured log entries were produced (claude, HIGH)
The agent completes real work and emits a valid safe-output, then a follow-up engine invocation in the same job returns no structured logs, raising ERR_CONFIG that hard-fails the agent job and reddens the run. Token counts vary (34k/69k/0), consistent with partial completion. First appearance; this is the single biggest contributor to today's trough.

2. PR Sous Chef reddens ×6 (copilot/gpt-5-mini, MEDIUM) — two distinct modes:

×2 safe_outputs failure: update_pull_request(update_branch: true) → "update pull request branch from base failed" on PR #38911, "[ARC/DinD] Emit chroot.binariesSourcePath and chroot.identity in AWF stdin-config" (non-fast-forward / conflict). The agent itself succeeded; the safe_outputs job reddened.
×4 agent-startup failure: exit 1, 0 tokens, fails within seconds (HTTP 502 / no structured logs) — likely the same no-structured-logs class as Avenger.
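The two PR Sous Chef modes can be told apart from a handful of per-run fields. The sketch below is illustrative only: `FailedRun`, its field names, and the label strings are hypothetical and are not the audit tool's schema. The sample data simply mirrors the ×2 / ×4 split reported above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailedRun:
    # Hypothetical fields, chosen only to illustrate the two modes.
    failed_job: str   # "agent" or "safe_outputs"
    exit_code: int
    tokens: int


def classify(run: FailedRun) -> str:
    """Bucket a failed run into one of the two modes described above."""
    if run.failed_job == "safe_outputs":
        # The agent finished; the post-processing job failed (e.g. update_branch).
        return "safe_outputs-failure"
    if run.failed_job == "agent" and run.tokens == 0 and run.exit_code != 0:
        # The engine died before consuming any tokens.
        return "agent-startup-failure"
    return "other"


# Sample data shaped like today's six PR Sous Chef failures (token value is made up).
runs = [FailedRun("safe_outputs", 0, 350_000)] * 2 + [FailedRun("agent", 1, 0)] * 4
mode_counts = Counter(classify(r) for r in runs)
print(dict(mode_counts))
```

Keeping the two modes in separate buckets matters because they call for different fixes: one is a merge-conflict tolerance issue, the other an engine-startup issue.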

Other prod-main failures (recurring / known)

Workflow	Engine	Class	Detail
Daily Formal Spec Verifier	copilot	`copilot-sdk-driver-failures` (day 7)	tool-perm-lockout, permissionDeniedCount=11
Daily SPDD Spec Planner	copilot	`copilot-sdk-driver-failures` (day 7)	permissionDeniedCount=13
Daily Safe Output Integrator	copilot	`copilot-sdk-driver-failures` (day 7)	denied read `.md` + `git checkout -b` + `git status` + `git diff`; called `missing_tool`
Daily Cache Strategy Analyzer	codex	`model-param-config-incompatibility` (recur ×4)	28× "Model not found" `gpt-5-codex-alpha` — same wf/signature as 06-11 & 06-05
Daily Issues Report Generator	copilot	`chroot-node-not-available` (recur ×2)	exit 127 command-not-found
Documentation Unbloat	claude	`doc-unbloat-empty-output`	exit 1, empty output / "MCP gateway logs missing" — was tolerated/green on 06-12, reddened today (inconsistent tolerance)

The copilot-sdk tool-permission-lockout continues into day 7 (3× today, down from 4× on 06-12) and remains the dominant known class. Nothing new there — the standing fix recommendation still applies.
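For context, the standing recommendation (listed under Recommendations) is to relax the 5-denial hard-abort to soft-skip-and-continue. A rough sketch of the difference between the two policies follows. `DenialPolicy` and its fields are hypothetical names, the sample tool calls are illustrative, and whether the real driver aborts on or after the fifth denial is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DenialPolicy:
    max_denials: int = 5      # threshold quoted in the recommendation
    hard_abort: bool = True   # False models "soft-skip-and-continue"
    denied: List[str] = field(default_factory=list)

    def record(self, tool_call: str) -> str:
        """Record one denied tool call and return 'abort' or 'skip'."""
        self.denied.append(tool_call)
        if self.hard_abort and len(self.denied) >= self.max_denials:
            return "abort"    # the whole agent job fails here
        return "skip"         # the denied call is dropped, the run continues


# Illustrative denied calls, loosely based on the Safe Output Integrator row.
calls = ["read notes.md", "git checkout -b", "git status", "git diff", "read spec.md"]

strict = DenialPolicy(hard_abort=True)
lenient = DenialPolicy(hard_abort=False)
strict_outcomes = [strict.record(c) for c in calls]
lenient_outcomes = [lenient.record(c) for c in calls]

print(strict_outcomes)
print(lenient_outcomes)
```

Under the lenient policy the denials are still recorded in `denied`, so they can be surfaced as warnings instead of failing the run.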

PR-event / smoke failures (all by-design probe noise)

Smoke Gemini ×2 (gemini pricing missing, recur), Smoke Claude ×2, Smoke Codex ×1, Smoke Copilot ×1, Smoke Copilot AOAI apikey+Entra ×2 (transient-error loop, recur), PR Code Quality Reviewer ×3 (copilot-sdk, 0-tok), Design Decision Gate ×1. Firewall blocks (373/6,453) were 100% smoke-probe telemetry — by-design.

Trend Charts

Workflow Health (30d)

Success rate held a healthy 84–96% band through early June but dropped sharply to 68.4% today — the deepest trough since the 41.6% day on 05-23. The decline is concentrated rather than systemic: two workflows (Avenger, PR Sous Chef) caused the bulk of the failures, so a rebound toward ~90% is likely once their in-flight fix branches land.

Token Usage (30d)

Daily tokens landed at 43.9M, near the 7-day moving average and well below the 06-12 spike (~127M). Note today is a partial ~7h window, so the true daily figure is higher; even so, consumption is in normal range. Top consumers: Matt Pocock Skills Reviewer (5.5M/5 runs), Daily Code Metrics (4.8M), Test Quality Sentinel (4.7M/5), PR Sous Chef (4.1M/11 — much of it wasted on the failing runs).
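The top-consumer figures are easier to compare per run. A small Python sketch, using only the numbers quoted above (token values in millions; Daily Code Metrics has no run count in the report, so it is left out of the per-run figures):

```python
total_tokens_m = 43.9

# (tokens in millions, run count or None when the report gives none)
top_consumers = {
    "Matt Pocock Skills Reviewer": (5.5, 5),
    "Daily Code Metrics": (4.8, None),
    "Test Quality Sentinel": (4.7, 5),
    "PR Sous Chef": (4.1, 11),
}

# Average tokens per run, in millions, for workflows with a known run count.
per_run = {
    name: round(tokens / runs, 2)
    for name, (tokens, runs) in top_consumers.items()
    if runs
}

# Share of the window's tokens consumed by the four workflows listed.
top_share = round(100 * sum(t for t, _ in top_consumers.values()) / total_tokens_m, 1)

for name, avg in per_run.items():
    print(f"{name}: {avg}M per run")
print(f"top four share: {top_share}%")
```

On these numbers PR Sous Chef has the lowest per-run average of the three, which fits the report's observation that several of its runs failed at startup with 0 tokens.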

Recommendations

(HIGH) Avenger no-structured-logs follow-up — treat a follow-up engine invocation that produces no structured logs after a successful prior pass as a success/no-op rather than ERR_CONFIG; or gate the 2nd invocation behind a "work remaining" check. Verify copilot/aw-avenger-failed-fix covers this case.
(MEDIUM) PR Sous Chef: make update_pull_request(update_branch: true) tolerate non-fast-forward/conflict (skip-and-warn instead of reddening the safe_outputs job); the 0-token startup mode likely shares the Avenger no-structured-logs root. Confirm copilot/aw-fix-pr-sous-chef-fail handles both modes.
(HIGH, day 7) copilot-sdk tool-permission-lockout — standing recommendation unchanged: broaden the sdk-driver default allow-list for git read/branch ops + repo source/markdown reads, or relax the 5-denial hard-abort to soft-skip-and-continue (the claude engine tolerates the same denials).
(LOW) Documentation Unbloat — make empty-output handling deterministic (both green or both fail); today's inconsistency vs 06-12 makes the signal unreliable.
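The first recommendation can be sketched as a status-resolution rule. This is a minimal illustration under stated assumptions, not the actual engine code: `PassResult`, `resolve_job_status`, and the status strings other than ERR_CONFIG are hypothetical, and "successful prior pass" is assumed to mean a pass that produced structured logs and emitted a safe-output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PassResult:
    # Hypothetical per-invocation summary.
    structured_log_entries: int
    safe_output_emitted: bool


def resolve_job_status(passes: List[PassResult]) -> str:
    """Return 'success', 'noop-followup', or 'ERR_CONFIG' for an agent job."""
    if not passes:
        return "ERR_CONFIG"
    prior_success = False
    status = "success"
    for p in passes:
        if p.structured_log_entries == 0:
            if prior_success:
                # Follow-up pass had nothing to log after real work: tolerate it.
                status = "noop-followup"
                continue
            # No logs and no earlier successful pass: still a hard failure.
            return "ERR_CONFIG"
        if p.safe_output_emitted:
            prior_success = True
    return status


avenger_like = [PassResult(12, True), PassResult(0, False)]
startup_failure = [PassResult(0, False)]
print(resolve_job_status(avenger_like))
print(resolve_job_status(startup_failure))
```

The rule keeps ERR_CONFIG for runs that never produced logs at all, so the 0-token startup failures would still surface as failures rather than being masked.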

Repo memory updated (audit-history, known-issues +4 new IDs, recommendations, anomalies, metrics-summary) and pushed.

References: §27470367219 (Avenger) · §27471203716 (PR Sous Chef) · §27476058710 (Safe Output Integrator)

Generated by 🔍 Agentic Workflow Audit Agent · 337.7 AIC · ⌖ 31.8 AIC · ⊞ 8K · ◷

expires on Jun 14, 2026, 1:58 PM UTC-08:00

2026-06-13T22:58:36Z

github-actions[bot]
Bot Jun 13, 2026
Author

Smoke test ping: Copilot reached this thread, left a breadcrumb, and kept the lights green. ✅

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

accounts.google.com
android.clients.google.com
clients2.google.com
contentautofill.googleapis.com
safebrowsingohttpgateway.googleapis.com
www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

📰 BREAKING: Report filed by Smoke Copilot · 218.9 AIC · ⌖ 15.8 AIC · ⊞ 20.4K · ◷


2026-06-14T21:56:12Z

github-actions[bot]
Bot Jun 14, 2026
Author

This discussion has been marked as outdated by Agentic Workflow Audit Agent.

A newer discussion is available at Discussion #39289.

