[audit-workflows] Agentic Workflow Audit — 2026-06-11 (prod-main 93.1%; 2 recurring config fails + new gemini pricing gap) #38734
Agentic Workflow Audit — 2026-06-11
Evening-cluster audit (~3.5h window, 18:20–21:51Z). The `logs` MCP tool timed out at 120s (recurring), so 67 of 167 run directories were fully processed before the cutoff; the other 100 were enumerated but empty (non-agentic CI jobs or not yet downloaded). All numbers below are for the 67 fully observed agentic runs.

Headline: Production `main` is healthy at 93.1% (27/29 terminal). Both `main` failures are recurring, known config issues; nothing new reddened prod main. Overall success is 84.6% (55/65); the drag is on PR/feature-branch runs (77.8%), driven by smoke probes surfacing config gaps and the daily-budget gate.

Summary
[…] `main` antigravity 2/2 ✅ and pi 2/2 ✅ (pi runs on a `copilot/gpt-5.4` backend); gemini 0/2 ❌ (see below).

Critical Findings
1. Prod-main failure: codex model-not-found (RECURRING, recur #3)
`Daily Cache Strategy Analyzer` (codex) hard-fails on `404 Model not found gpt-5-codex-alpha-2025-11-07` via the api-proxy `/responses` endpoint. The same workflow failed identically on 06-05. The codex model id is not in the api-proxy routing/pricing table.

2. Prod-main failure: copilot-sdk tool-permission lockout (RECURRING, recur #5)
`Daily Safe Output Integrator` (copilot-sdk-driver) hit `guard.tool_denials_exceeded` 5/5 and aborted after 25.2 min. The denied calls were routine read-only inspection: `git diff` / `git status`, `make agent-report-progress`, `sed`, and reading a `*_test.go` file. `permissionDeniedCount=11`; the failure was correctly classified as missing-tool (not retried), but a 25-min run was wasted on a `git status` denial.

3. NEW: Gemini model has no AI-credits pricing
`Smoke Gemini` failed 2/2 with `ApiError: unknown_model_ai_credits` for model `gemini-3.1-flash-tts-preview` ("has no AI credits pricing and no default pricing is configured"). The api-proxy AWF config lacks both a pricing entry for this model and an `apiProxy.defaultAiCreditsPricing` fallback, so any unknown gemini model hard-fails.

4. NEW (by-design, UX issue): daily effective-workflow cap reddens PR runs
Between roughly 20:05 and 20:51Z the daily effective-workflow budget cap tripped (`daily_effective_workflow_exceeded == 'true'`). This failed the activation job for 3× `Matt Pocock Skills Reviewer` + 1× `Test Quality Sentinel` (all PR-triggered): `agent.if` saw `success()==false`, so the agent was correctly skipped (zero token waste), but the run still concluded `failure`, showing false red on the PRs.

Full failure breakdown (10 failures + 2 in-progress)
[Failure table not recoverable; surviving cells: `gpt-5-codex-alpha-2025-11-07` (recur 06-05); `gemini-3.1-flash-tts-preview` no pricing.]

In-progress at audit time: this audit agent (claude) + Daily Regulatory Report Generator (copilot).
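
Finding 3 turns on a lookup that has neither a per-model entry nor a default. The sketch below illustrates that behaviour and how a default would let unknown models degrade gracefully. It is not the api-proxy's actual code: the function name, the table shape, the exception class, and the credit values are all assumptions made for illustration.

```python
from typing import Dict, Optional


class UnknownModelPricing(Exception):
    """Hypothetical error: no pricing entry and no default configured."""


def credits_per_1k(model: str,
                   pricing: Dict[str, float],
                   default: Optional[float] = None) -> float:
    """Return AI credits per 1K tokens for `model`.

    Falls back to `default` when the model is missing from `pricing`;
    raises only when there is no entry and no default.
    """
    if model in pricing:
        return pricing[model]
    if default is not None:
        return default
    raise UnknownModelPricing(
        f"{model} has no AI credits pricing and no default pricing is configured"
    )


# Placeholder table and values, not real prices.
PRICING = {"example-model-a": 1.0}

# With a default, an unlisted model is priced instead of hard-failing.
fallback = credits_per_1k("gemini-3.1-flash-tts-preview", PRICING, default=2.5)
```

Whether a silent default is acceptable is a policy choice: it keeps smoke runs green but can misprice a new model until someone adds a real entry.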
Cost & Token Anomalies
- `Smoke Copilot - AOAI (apikey)` burned 2.15M tokens / 73 turns in a `transient API error. Retrying...` loop before failing, the largest failed-run token spend this window. Needs a retry/token circuit-breaker.
- `Matt Pocock Skills Reviewer` (the successful run) was the top consumer at 5.04M tokens / 66 turns, ~20% of all tokens in a single run. Worth tracking for budget.

Trends (last 30 days)
Success rate has held in a stable 80–96% band for two weeks (today 84.6% on a small evening window), well clear of the 41.6% trough on 05-23. The observed-run count swings with window size, not with health — prod-main reliability remains the steadier signal at 93.1% today.
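
The percentages quoted in this audit follow directly from the terminal-run counts. A small worked check is sketched below; the helper name is hypothetical, and the PR/feature-branch counts (28/36) are not stated in the report but derived here by subtracting the `main` counts from the overall ones.

```python
def success_rate(succeeded: int, terminal: int) -> float:
    """Percentage of terminal runs that succeeded, rounded to one decimal."""
    if terminal <= 0:
        raise ValueError("terminal must be positive")
    return round(100 * succeeded / terminal, 1)


main_rate = success_rate(27, 29)                # prod main: 27/29
overall_rate = success_rate(55, 65)             # all terminal runs: 55/65
pr_rate = success_rate(55 - 27, 65 - 29)        # derived PR/feature-branch: 28/36

print(main_rate, overall_rate, pr_rate)  # 93.1 84.6 77.8
```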
Daily token volume tracks observed-window size (peaks on the 174-run 05-25 and 69M-token 06-10 days). The 7-day moving average sits near 35M; today's 25.4M is a partial-window reading, not a real decline. Per-run token outliers (5M / 2.15M above) are a bigger budget lever than daily totals.
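
One way to cap per-run outliers such as the 2.15M-token retry loop is a circuit breaker that counts both attempts and tokens spent. The following is a minimal sketch under assumed interfaces, not the engine's real retry logic: `TransientError`, `CircuitOpen`, `call_with_breaker`, and the default limits are all hypothetical.

```python
class TransientError(Exception):
    """Hypothetical retryable failure that still consumed tokens."""

    def __init__(self, tokens_used: int):
        super().__init__("transient API error")
        self.tokens_used = tokens_used


class CircuitOpen(Exception):
    """Raised when the retry count or token budget is exhausted."""


def call_with_breaker(attempt, max_retries: int = 5, max_tokens: int = 200_000):
    """Call `attempt()` until it succeeds or a budget trips.

    `attempt` returns (result, tokens_used) on success and raises
    TransientError(tokens_used) on a retryable failure.
    Returns (result, total_tokens_spent).
    """
    spent = 0
    for n in range(1, max_retries + 1):
        try:
            result, used = attempt()
            return result, spent + used
        except TransientError as err:
            spent += err.tokens_used
            if spent >= max_tokens:
                raise CircuitOpen(
                    f"token budget hit after {n} attempts ({spent} tokens)"
                ) from err
    raise CircuitOpen(f"gave up after {max_retries} attempts ({spent} tokens)")
```

Tracking tokens as well as attempts matters here because a loop of cheap-looking retries can still accumulate a large spend before any attempt limit is reached.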
Recommendations
- Switch `Daily Cache Strategy Analyzer` to a codex model present in the api-proxy table, or register `gpt-5-codex-alpha-2025-11-07` (2nd identical failure).
- Add `gemini-3.1-flash-tts-preview` to the api-proxy pricing table, or set `apiProxy.defaultAiCreditsPricing` so unknown gemini models degrade gracefully.
- Allow the read-only commands (`git diff`/`status`, `make agent-report-progress`, `sed`) for `Daily Safe Output Integrator`, or relax the 5-denial guard for read-only ops.
- Conclude budget-capped runs as something other than `failure` so PR authors don't see false red.

Audit Notes
- The `logs` MCP tool's 120s timeout is itself recurring and bounds every audit to a partial window; a higher timeout or paginated fetch would materially improve coverage.
- Updated: `audit-history`, `metrics-summary`, `known-issues` (recur counters bumped, 3 new ids), `anomalies` (2 new), `recommendations` (5 new), `workflow-trends`. Validated at 57 KB (limit 60 KB).
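
The paginated-fetch idea from the first note could look roughly like the sketch below: pull pages until a time budget runs out, then return a cursor so a later call can resume rather than start over. Everything here is an assumption for illustration; `fetch_page`, its cursor semantics, and `fetch_within_budget` are hypothetical and do not describe the `logs` MCP tool's real interface.

```python
import time
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[int], Optional[int]]  # (items, next_cursor or None when done)


def fetch_within_budget(fetch_page: Callable[[Optional[int]], Page],
                        budget_s: float,
                        clock: Callable[[], float] = time.monotonic):
    """Fetch pages until done or until `budget_s` seconds have elapsed.

    Returns (items, resume_cursor, complete). When `complete` is False,
    `resume_cursor` is the cursor of the first page not yet fetched.
    """
    deadline = clock() + budget_s
    items: List[int] = []
    cursor: Optional[int] = None  # None means "start from the first page"
    while True:
        if clock() >= deadline:
            return items, cursor, False
        page, cursor = fetch_page(cursor)
        items.extend(page)
        if cursor is None:
            return items, None, True
```

The deadline is only checked between pages, so a single slow page can still overrun the budget; a real implementation would also need a per-request timeout.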