[audit-workflows] Agentic Workflow Audit — 2026-06-11 (prod-main 93.1%; 2 recurring config fails + new gemini pricing gap) #38734

2026-06-11T22:07:04Z

github-actions[bot]
Bot Jun 11, 2026

Agentic Workflow Audit — 2026-06-11

Evening-cluster audit (~3.5h window, 18:20–21:51Z). The logs MCP tool timed out at 120s (recurring), so 67 of 167 run directories were fully processed before the cutoff; the other 100 were enumerated-but-empty (non-agentic CI jobs / not-yet-downloaded). All numbers below are for the 67 fully-observed agentic runs.

Headline: Production main is healthy at 93.1% (27/29 terminal). Both main failures are recurring, known config issues — nothing new reddened prod main. Overall success is 84.6% (55/65); the drag is on PR/feature-branch runs (77.8%), driven by smoke probes surfacing config gaps and the daily-budget gate.

Summary

| Scope | Success / Terminal | Rate |
|---|---|---|
| Prod `main` | 27 / 29 | 93.1% ✅ |
| PR / feature-branch | 28 / 36 | 77.8% ⚠️ |
| Overall | 55 / 65 | 84.6% |
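
The rates follow directly from the success and terminal counts. A quick check using only the figures in the table (the helper name `rate` is illustrative, not part of the audit tooling):

```python
# Recompute the summary success rates from the reported counts.
def rate(success: int, terminal: int) -> float:
    """Success rate as a percentage, rounded to one decimal place."""
    return round(100 * success / terminal, 1)

prod_main = (27, 29)   # (successes, terminal runs) on prod main
pr_branch = (28, 36)   # (successes, terminal runs) on PR / feature branches
overall = (prod_main[0] + pr_branch[0], prod_main[1] + pr_branch[1])

print(rate(*prod_main))  # 93.1
print(rate(*pr_branch))  # 77.8
print(rate(*overall))    # 84.6
```

The 65 terminal runs plus the 2 in-progress runs account for all 67 fully-observed runs.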

Tokens: 25.4M · Cost (claude-measured): $14.10 · Action-minutes: 736 · Turns: 660
Errors / Missing-tools / Missing-(redacted): 0 / 0 / 0 · Safe items: 42
Engines: copilot 37 · claude 13 · codex 7 · gemini 2 · antigravity 2 · pi 2 · unknown 4
New engines this window: antigravity 2/2 ✅ and pi 2/2 ✅ (pi runs on a copilot/gpt-5.4 backend); gemini 0/2 ❌ (see below).

Critical Findings

1. Prod-main failure — codex model-not-found (RECURRING, recur #3)
Daily Cache Strategy Analyzer (codex) hard-fails on 404 Model not found gpt-5-codex-alpha-2025-11-07 via the api-proxy /responses endpoint. Same workflow failed identically on 06-05. The codex model id is not in the api-proxy routing/pricing table.
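
A preflight check could catch this class of failure before any agent job starts. The sketch below is hypothetical: the table contents and the function name `check_model` are assumptions for illustration, and the real api-proxy table format is not shown in this audit.

```python
from typing import Optional, Set

# Assumed routing-table contents, for illustration only.
ROUTING_TABLE: Set[str] = {"gpt-5.4", "o4-mini-aw"}

def check_model(model_id: str, table: Set[str]) -> Optional[str]:
    """Return an error message if model_id is not routable, else None."""
    if model_id in table:
        return None
    return f"model '{model_id}' is not in the api-proxy routing table"

# The model id from the failing workflow is reported before any tokens are spent.
print(check_model("gpt-5-codex-alpha-2025-11-07", ROUTING_TABLE))
```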

2. Prod-main failure — copilot-sdk tool-permission lockout (RECURRING, recur #5)
Daily Safe Output Integrator (copilot-sdk-driver) hit guard.tool_denials_exceeded 5/5 and aborted after 25.2 min. Denied calls were routine read-only inspection: git diff/git status, make agent-report-progress, sed, and reading a *_test.go file. permissionDeniedCount=11; correctly classified as missing-tool (not retried) — but a 25-min run was wasted on a git status denial.
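
One way to relax the guard for read-only operations is to stop counting their denials toward the abort threshold. This is a minimal sketch, not the copilot-sdk-driver's actual logic: `DenialGuard` and the allowlist are hypothetical. Only `git diff` and `git status` are treated as read-only here; `sed` is left out because `sed -i` writes files.

```python
import shlex

# Assumed allowlist, drawn from the denied calls listed above.
READ_ONLY = {("git", "diff"), ("git", "status")}
MAX_DENIALS = 5  # matches the 5/5 threshold reported above

def is_read_only(command: str) -> bool:
    """True if the command's first two words match an allowlisted pair."""
    parts = shlex.split(command)
    return tuple(parts[:2]) in READ_ONLY

class DenialGuard:
    """Counts tool denials, ignoring denials of read-only commands."""

    def __init__(self, limit: int = MAX_DENIALS) -> None:
        self.limit = limit
        self.denials = 0

    def record_denial(self, command: str) -> bool:
        """Record a denied call; return True when the run should abort."""
        if not is_read_only(command):
            self.denials += 1
        return self.denials >= self.limit
```

Under this sketch, repeated `git status` denials never abort the run, while five denials of other commands still do.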

3. NEW — Gemini model has no AI-credits pricing
Smoke Gemini failed 2/2 with ApiError: unknown_model_ai_credits for model gemini-3.1-flash-tts-preview ("has no AI credits pricing and no default pricing is configured"). The api-proxy AWF config lacks both a pricing entry for this model and an apiProxy.defaultAiCreditsPricing fallback, so any unknown gemini model hard-fails.
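
The lookup behavior described above, and the effect of a default, can be sketched as follows. The function and exception names are hypothetical and the price values are placeholders. Only the error wording and the idea of a default fallback come from this finding.

```python
from typing import Dict, Optional

class UnknownModelPricing(Exception):
    """Raised when a model has neither a pricing entry nor a default."""

def credits_price(model: str, pricing: Dict[str, float],
                  default: Optional[float] = None) -> float:
    """Return the per-model price, falling back to the default if one is set."""
    if model in pricing:
        return pricing[model]
    if default is not None:
        return default
    raise UnknownModelPricing(
        f"{model} has no AI credits pricing and no default pricing is configured"
    )

# Placeholder table: with a default configured, the unknown model resolves.
table = {"some-known-model": 1.0}
print(credits_price("gemini-3.1-flash-tts-preview", table, default=2.0))  # 2.0
```

Without `default`, the same call raises, which mirrors the hard failure seen in both Smoke Gemini runs.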

4. NEW (by-design, UX issue) — daily effective-workflow cap reddens PR runs
Between ~20:05–20:51Z the daily effective-workflow budget cap tripped (daily_effective_workflow_exceeded == 'true'). This failed the activation job for 3× Matt Pocock Skills Reviewer + 1× Test Quality Sentinel (all PR-triggered): agent.if saw success()==false, so the agent was correctly skipped (zero token waste) — but the run still concluded failure, showing false red on the PRs.
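
A possible mapping that would avoid the false red is sketched below. It is a proposal only: the function name and parameters are hypothetical, and the sketch assumes the run conclusion can be set to `neutral` when the cap is the only reason the agent did not run.

```python
from typing import Optional

def run_conclusion(cap_exceeded: bool, activation_ok: bool,
                   agent_ok: Optional[bool]) -> str:
    """Map gate and job outcomes to a run conclusion.

    agent_ok is None when the agent job was skipped.
    """
    if cap_exceeded:
        # Budget gating is expected behavior, not a defect in the PR.
        return "neutral"
    if not activation_ok or agent_ok is False:
        return "failure"
    return "success"

# The four PR runs in this window: cap tripped, agent skipped.
print(run_conclusion(cap_exceeded=True, activation_ok=False, agent_ok=None))  # neutral
```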

Full failure breakdown (10 failures + 2 in-progress)

| Workflow | Engine | Scope | Class | Note |
|---|---|---|---|---|
| Daily Cache Strategy Analyzer | codex | main | model-param-config | 404 `gpt-5-codex-alpha-2025-11-07` (recur 06-05) |
| Daily Safe Output Integrator | copilot-sdk | main | copilot-sdk-driver | tool-denials 5/5, 25.2m wasted |
| Smoke Gemini ×2 | gemini | PR | gemini-pricing-missing | `gemini-3.1-flash-tts-preview` no pricing |
| Matt Pocock Skills Reviewer ×3 | — | PR | daily-cap-gate | agent skipped, run reddened |
| Test Quality Sentinel ×1 | — | PR | daily-cap-gate | agent skipped, run reddened |
| Smoke Copilot - AOAI (apikey) | copilot/o4-mini-aw | branch | aoai-transient-loop | 73 turns / 2.15M tok then fail |
| Smoke Copilot | copilot/gpt-5.4 | branch | agent-ok-run-reddened | agent exit0 + created issue, run=failure |

In-progress at audit time: this audit agent (claude) + Daily Regulatory Report Generator (copilot).

Cost & Token Anomalies

Smoke Copilot - AOAI (apikey) burned 2.15M tokens / 73 turns in a `transient API error. Retrying...` loop before failing, the largest failed-run token spend this window. Needs a retry/token circuit-breaker.
Matt Pocock Skills Reviewer (the successful run) was the top consumer at 5.04M tokens / 66 turns — ~20% of all tokens in a single run. Worth tracking for budget.
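
A retry/token circuit-breaker for loops like the AOAI one could look like the sketch below. The names and thresholds are hypothetical, and `call` stands in for one agent turn. The sketch assumes each attempt reports how many tokens it consumed.

```python
from typing import Callable, Tuple

class BudgetExceeded(Exception):
    """Raised when the retry or token budget is exhausted."""

def call_with_breaker(call: Callable[[], Tuple[bool, int]],
                      max_retries: int = 5,
                      max_tokens: int = 200_000) -> int:
    """Retry call() until it succeeds or a budget trips.

    call() returns (ok, tokens_used). Returns total tokens spent on success.
    """
    spent = 0
    for attempt in range(1, max_retries + 1):
        ok, tokens = call()
        spent += tokens
        if ok:
            return spent
        if spent >= max_tokens:
            raise BudgetExceeded(
                f"token budget hit after {attempt} attempts ({spent} tokens)"
            )
    raise BudgetExceeded(
        f"retry budget hit after {max_retries} attempts ({spent} tokens)"
    )
```

With either limit in place, a persistent transient error stops after a bounded spend instead of running to 73 turns.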

Trends (last 30 days)

Success rate has held in a stable 80–96% band for two weeks (today 84.6% on a small evening window), well clear of the 41.6% trough on 05-23. The observed-run count swings with window size, not with health — prod-main reliability remains the steadier signal at 93.1% today.

Daily token volume tracks observed-window size (peaks on the 174-run 05-25 and 69M-token 06-10 days). The 7-day moving average sits near ~35M; today's 25.4M is a partial-window reading, not a real decline. Per-run token outliers (5M / 2.15M above) are a bigger budget lever than daily totals.
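
To make the per-run lever concrete, the share of today's tokens taken by the two outlier runs can be computed from the figures already reported (a worked check, with illustrative variable names):

```python
# Token figures from this report, in millions.
total = 25.4
top_success = 5.04   # Matt Pocock Skills Reviewer (successful run)
top_failure = 2.15   # Smoke Copilot - AOAI (apikey)

def share(part: float, whole: float) -> float:
    """Percentage of whole taken by part, rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(share(top_success, total))                # 19.8
print(share(top_failure, total))                # 8.5
print(share(top_success + top_failure, total))  # 28.3
```

Two of the 67 observed runs account for roughly 28% of the window's tokens.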

Recommendations

(High) Pin Daily Cache Strategy Analyzer to a codex model present in the api-proxy table, or register gpt-5-codex-alpha-2025-11-07 — 2nd identical failure.
(High) Add gemini-3.1-flash-tts-preview to the api-proxy pricing table or set apiProxy.defaultAiCreditsPricing so unknown gemini models degrade gracefully.
(High) Allowlist read-only inspection commands (git diff/status, make agent-report-progress, sed) for Daily Safe Output Integrator, or relax the 5-denial guard for read-only ops.
(Medium) Surface daily-cap gating as a neutral/skipped conclusion instead of failure so PR authors don't see false red.
(Medium) Add a max-token/retry circuit-breaker to the AOAI smoke probe to cap transient-error loops (this one cost 2.15M tokens).

Audit Notes

The logs MCP tool 120s timeout is itself recurring and bounds every audit to a partial window — a higher timeout or paginated fetch would materially improve coverage.
Repo memory updated: audit-history, metrics-summary, known-issues (recur counters bumped, 3 new ids), anomalies (2 new), recommendations (5 new), workflow-trends. Validated at 57 KB (limit 60 KB).
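
A paginated fetch with a time budget might be structured as in the sketch below. It is hypothetical: `fetch_page` stands in for a logs call that returns one page plus a cursor, and the logs MCP tool is not confirmed to support pagination.

```python
import time
from typing import Any, Callable, List, Optional, Tuple

Page = Tuple[List[Any], Optional[str]]

def fetch_all(fetch_page: Callable[[Optional[str]], Page],
              cursor: Optional[str] = None,
              budget_s: float = 120.0,
              clock: Callable[[], float] = time.monotonic) -> Page:
    """Fetch pages until exhausted or the time budget is spent.

    Returns (items, next_cursor). A non-None cursor means the fetch stopped
    early and can be resumed by passing that cursor back in.
    """
    deadline = clock() + budget_s
    items: List[Any] = []
    while True:
        page, cursor = fetch_page(cursor)
        items.extend(page)
        if cursor is None or clock() >= deadline:
            return items, cursor
```

Because the cursor is returned, a follow-up call can continue where the cutoff landed rather than leaving the remaining run directories unprocessed.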

References:

§27371497875 — Daily Cache Strategy Analyzer (codex 404)
§27372008864 — Daily Safe Output Integrator (tool-denials)
§27368963230 — Smoke Gemini (pricing missing)

Generated by 🔍 Agentic Workflow Audit Agent · 331.4 AIC · ⌖ 31.8 AIC · ⊞ 8K · ◷

expires on Jun 12, 2026, 2:07 PM UTC-08:00
