[audit-workflows] 🔍 Agentic Workflow Audit — 2026-06-14 (rebound to 87%, PR Sous Chef recovered) #39289

2026-06-14T21:56:11Z

github-actions[bot]
Bot Jun 14, 2026

🔍 Agentic Workflow Audit — 2026-06-14

Window: ~7h (14:40–21:35Z) · 100 terminal runs analyzed (logs MCP timed out at 120s — partial window, consistent with recent days)

Headline: Strong rebound to 87.0% success after 06-13's 68.4% (worst since 05-23). PR Sous Chef fully recovered (10/10, up from 6 failures on 06-13; the fix appears to have landed). All remaining genuine prod-main failures are known/recurring; no new prod-main failure class.

Key Metrics

| Metric | Value |
| --- | --- |
| Success (overall) | 87/100 = 87.0% |
| Success (prod-main) | 47/53 = 88.7% |
| Success (PR/branch) | 40/47 = 85.1% |
| Tokens | 54.5M |
| Cost (claude-measured) | ~$26.32 |
| Turns / Action-min | 1,302 / 1,118 |
| missing_tools / missing_data / mcp_failures | 0 / 0 / 0 |
| Firewall blocked | 177/1,756 = 10.1% (all by-design) |
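
The percentages in the table follow directly from the run counts. A minimal Python sketch of that arithmetic, using only the counts reported above (the helper name `rate` is illustrative and not part of the audit tooling):

```python
def rate(successes: int, total: int) -> float:
    """Return successes/total as a percentage, rounded to one decimal place."""
    return round(100 * successes / total, 1)


# (successes, terminal runs) as reported in the Key Metrics table.
prod_main = (47, 53)
pr_branch = (40, 47)

# The overall figure is the sum of the two groups.
overall = (prod_main[0] + pr_branch[0], prod_main[1] + pr_branch[1])

print(rate(*overall))    # 87.0
print(rate(*prod_main))  # 88.7
print(rate(*pr_branch))  # 85.1
print(rate(177, 1756))   # 10.1 (firewall-blocked share of requests)
```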

Engines: copilot 66 · claude 21 · codex 8 · antigravity 2 · gemini 2 · pi 2

📈 Trends

Success rate snapped back to 87% from the 06-13 trough (68.4%), landing close to the 30-day baseline (~85%). The rebound was driven mostly by PR Sous Chef recovering its 6-run failure cluster; the residual failures are a stable, well-characterized tail.

Token usage (54.5M this window) sits near the 7-day moving average — below the 06-12 126M spike. Top consumers were all successful: [aw] Failure Investigator (5.9M/$4.75), Safe Output Tool Optimizer (4.2M/$3.92), Daily Code Metrics (3.0M/$3.02).

✅ Resolved / Improved

PR Sous Chef — RESOLVED. 10/10 success on main (was 6 hard-fails on 06-13: update_branch-from-base + agent-startup-0tok). Fix branch copilot/aw-fix-pr-sous-chef-fail appears landed.
Daily Ambient Context Optimizer — clean. Ran under-cap (3.38M tok, success); the prod-main daily-AI-credits-429 risk is quiescent this window.
codex model-not-found (gpt-5-codex-alpha) and chroot-node errors were not observed this window.

⚠️ Genuine prod-main failures (6) — all known/recurring

| Workflow | Engine | Class | Detail |
| --- | --- | --- | --- |
| Avenger ×3 | claude/opus | `avenger-err-config-no-structured-logs` (day2) | 3/3 reddened. 2 agents did real work + created PRs (`27502877407` created PR, $2.06/48 turns) then run reddened; `27508639846` explicit ERR_CONFIG 0-tok. Fix branch `copilot/aw-avenger-failed-fix` not yet effective. |
| Daily Formal Spec Verifier | copilot-sdk | `copilot-sdk-driver` tool-perm-lockout (day8) | denied read(`forecast-specification.md`), permissionDeniedCount=11, 5/5 abort, 24m30s wasted, not retried |
| Daily Safe Output Integrator | copilot-sdk | `copilot-sdk-driver` tool-perm-lockout (day8) | permissionDeniedCount=11, 19m30s, 5/5 abort |
| Documentation Unbloat | claude | `doc-unbloat-empty-output` | empty output / digest-mismatch, 0-tok; now reddens 2 days running |
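
The Avenger rows describe runs where an agent completed real work and a later engine invocation then reddened the run. As a rough sketch of the mitigation this report recommends (treating a log-less follow-up as a no-op once an earlier pass has succeeded), the logic might look like the following. The `Invocation` fields and `classify_run` are hypothetical names for illustration; they are not taken from the actual workflow code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Invocation:
    # Hypothetical summary of one engine invocation within a run.
    succeeded: bool        # the pass finished its work (e.g. opened a PR)
    structured_logs: bool  # the pass emitted structured logs


def classify_run(invocations: List[Invocation]) -> str:
    """Return 'success' or 'failure' for a whole run."""
    prior_success = False
    for inv in invocations:
        if not inv.structured_logs:
            if prior_success:
                continue  # log-less follow-up after a good pass: no-op
            return "failure"  # nothing succeeded before it: real failure
        if not inv.succeeded:
            return "failure"
        prior_success = True
    return "success" if prior_success else "failure"


# A pass that did real work, followed by a log-less second invocation.
print(classify_run([Invocation(True, True), Invocation(False, False)]))
# A single invocation that never produced logs (the 0-tok case).
print(classify_run([Invocation(False, False)]))
```

Under these assumptions the first run is classified as a success and the second as a failure, so a genuine startup failure still reddens the run.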

🧪 PR/branch failures (7) — all by-design smoke noise

Smoke Gemini ×2 — gemini-3.1-flash-tts-preview has no AI-credits pricing (RECUR, count 4).
Smoke Copilot ×2 — 1× 0-tok startup; 1× (27507898911) burned 3.9M tok then CAPIError 429 Maximum AI credits exceeded (1015/1000) — daily-AI-credits cap resurfaced, but only on a heavy PR probe (account-wide cap).
Smoke Copilot AOAI (apikey + Entra) ×2 — 0-tok startup fails (transient-API class).
Agent Container Smoke Test ×1 — copilot 0-tok.

🎯 Top Open Recommendations

copilot-sdk-driver tool-permission-lockout (day8, HIGH) — longest-running unresolved prod-main class. The sdk-driver denies routine read-only ops the workflow legitimately needs (read of .md/spec/source files) then aborts at 5 denials with no retry, wasting 19–24 min/run. claude tolerates the same denials. Fix: allowlist these reads or relax the guard / retry-with-grant.
Avenger ERR_CONFIG no-structured-logs (day2, HIGH) — treat a follow-up engine invocation that produces no structured logs after a successful prior pass as success/no-op, or gate the 2nd invocation behind a "work remaining" check. Agents are doing real work (opening PRs) then being reddened. copilot/aw-avenger-failed-fix in flight but not yet effective.
EffectiveTokens = 0 for all runs (data-quality, recurring since 06-08) — the 25M effective-cap proximity remains unverifiable. Worth fixing the metric pipeline.
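
To make the first recommendation concrete, here is a minimal sketch of a permission guard that allowlists routine read-only operations before counting denials. It assumes a denial budget of 5 (the abort threshold described above) and an illustrative allowlist of `*.md` files; `PermissionGuard`, `READ_ALLOWLIST`, and `MAX_DENIALS` are hypothetical names and do not reflect the real sdk-driver API.

```python
from fnmatch import fnmatch

READ_ALLOWLIST = ("*.md",)  # assumed patterns for routine read-only access
MAX_DENIALS = 5             # abort threshold described in the report


class PermissionGuard:
    """Hypothetical guard: allowlisted reads never count as denials."""

    def __init__(self, allowlist=READ_ALLOWLIST, max_denials=MAX_DENIALS):
        self.allowlist = tuple(allowlist)
        self.max_denials = max_denials
        self.denied = 0

    def check(self, op: str, path: str) -> bool:
        """Return True if allowed; raise once the denial budget is spent."""
        if op == "read" and any(fnmatch(path, pat) for pat in self.allowlist):
            return True
        self.denied += 1
        if self.denied >= self.max_denials:
            raise RuntimeError(f"aborting after {self.denied} denials")
        return False


guard = PermissionGuard()
# The read that was denied in the Daily Formal Spec Verifier run.
print(guard.check("read", "forecast-specification.md"))
print(guard.denied)
```

With the allowlist in place the spec-file read is granted and the denial counter stays at zero, so the run is not pushed toward the abort threshold by reads it legitimately needs.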

References: §27502877407 · §27504260264 · §27508844367

Generated by 🔍 Agentic Workflow Audit Agent · 278.1 AIC · ⌖ 25.8 AIC · ⊞ 7.4K · ◷

expires on Jun 15, 2026, 1:56 PM UTC-08:00

2026-06-14T23:43:44Z

github-actions[bot]
Bot Jun 14, 2026
Author

Smoke poke. Cave bot see latest discussion. Run 27515546240 grunt good.

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

accounts.google.com
android.clients.google.com
clients2.google.com
contentautofill.googleapis.com
safebrowsingohttpgateway.googleapis.com
www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

📰 BREAKING: Report filed by Smoke Copilot · 480.8 AIC · ⌖ 15.5 AIC · ⊞ 19.8K · ◷
