[audit-workflows] Agentic Workflow Audit — 2026-05-21 (24h) #33873
💥 WHOOSH! 🦸‍♂️ The Smoke Test Agent rockets in with a sonic BOOM! 🚀 KA-POW! Claude engine smoke test 26256118523 zipped through the skies, dodged the firewall, vanquished the build errors, and landed safely on the runway! ✨
Cape flapping, the agent leaves a glowing trail of green checkmarks ✅✅✅ and vanishes into the next workflow run... 🌟 THWIP! 🕸️ See you in the next adventure!
Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
Overview
Last 24h: 41 completed runs (+ 5 in progress), 4 failures, 1 cancelled, success rate 75.6% (down from 85.9% yesterday). The decline is driven by a 100% codex engine outage: all 4 production codex runs failed with `invalid_request_error` on model `gpt-5.5`, then with `Missing environment variable: OPENAI_API_KEY` on retry. Yesterday's fix (`fix-codex-openai-key-and-model`) did not stick: the model name changed (was `gpt-5-codex-alpha-2025-11-07`, now `gpt-5.5`), but the model is still rejected as invalid. Total spend was $19.96 over 35.9M tokens / 338 action-minutes.
Health Summary
Engines: copilot 24, claude 9, codex 6 (4 production + 1 smoke + 1 in progress), gemini 1, pi 1. Three runs had no `engine_id` resolved (Deployment Incident Monitor x2, Q x1); all were very short and likely activation-only.
Critical Issues
🔴 Codex engine: 100% production failure (4/4)
api-proxy:10000
Evidence: the entrypoint logs show `Unset OPENAI_API_KEY from /proc/1/environ` and `Unset CODEX_API_KEY from /proc/1/environ`; attempt 1 then returns `invalid_request_error`, and every retry fails with `Missing environment variable: OPENAI_API_KEY`. AI Moderator alone burned 15.7 min of wall time before giving up (including 12.1 min in pre_activation and 1.9 min of agent retries).
Action: pin codex workflows to a valid model and ensure the secret wiring survives the entrypoint scrub for retry attempts. The Changeset Generator path through the internal proxy needs separate verification.
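The failure mode above (key scrubbed from the environment, then every retry failing on the same missing variable) suggests two guards: read the secret once before the scrub and hand it explicitly to each attempt, and fail fast when the key is missing or the model is not known to be valid. Below is a minimal Python sketch under those assumptions. `capture_secret`, `run_with_retries`, and `KNOWN_GOOD_MODELS` are hypothetical names, not part of the real entrypoint, and the model allowlist is a placeholder that would have to come from the provider's documentation.

```python
from typing import Callable, Dict, Mapping

# Hypothetical allowlist; the real set of valid model names must come from
# the provider, not from this sketch.
KNOWN_GOOD_MODELS = {"gpt-5"}


def capture_secret(environ: Mapping[str, str], name: str) -> str:
    """Read a secret once, before any scrub step removes it from the environment."""
    value = environ.get(name, "")
    if not value:
        # Fail fast instead of letting each retry rediscover the missing key.
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


def run_with_retries(
    attempt: Callable[[Dict[str, str]], bool],
    api_key: str,
    model: str,
    max_attempts: int = 3,
) -> int:
    """Call `attempt` until it succeeds; return the 1-based attempt number."""
    if model not in KNOWN_GOOD_MODELS:
        # An invalid model fails the same way on every attempt, so stop early.
        raise ValueError(f"model {model!r} is not on the known-good list")
    for i in range(1, max_attempts + 1):
        # Each attempt receives the captured key explicitly, so retries do not
        # depend on the (already scrubbed) process environment.
        env = {"OPENAI_API_KEY": api_key}
        if attempt(env):
            return i
    raise RuntimeError(f"all {max_attempts} attempts failed")


if __name__ == "__main__":
    key = capture_secret({"OPENAI_API_KEY": "sk-test"}, "OPENAI_API_KEY")
    outcomes = iter([False, True])  # first attempt fails, second succeeds
    print(run_with_retries(lambda env: next(outcomes), key, "gpt-5"))  # 2
```

The design choice is to move both checks ahead of the retry loop: a missing key or a rejected model name is not a transient error, so retrying it only adds wall time.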
🟡 Daily Safe Output Tool Optimizer cost spike
Single run §26253685911-equivalent: $8.54, 12M tokens, 129 turns, or 43% of today's total spend. The previous 3-run average was ~$4.53, so this run cost roughly 1.9× that. Either the scope expanded or the agent is looping; the tool-call sequence is worth inspecting.
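The spike figures can be checked with a small helper that compares one run against a trailing baseline. A sketch: the 1.5× threshold is an arbitrary assumption, and the baseline is the reported ~$4.53 3-run average rather than the individual run costs, which the report does not list.

```python
from dataclasses import dataclass


@dataclass
class SpikeReport:
    ratio_vs_baseline: float  # run cost / trailing average cost
    share_of_day: float       # run cost / total spend for the day
    is_spike: bool


def check_cost_spike(run_cost: float, baseline_avg: float, day_total: float,
                     threshold: float = 1.5) -> SpikeReport:
    """Flag a run whose cost is at least `threshold` x the trailing average."""
    if baseline_avg <= 0 or day_total <= 0:
        raise ValueError("baseline and day total must be positive")
    ratio = run_cost / baseline_avg
    return SpikeReport(
        ratio_vs_baseline=round(ratio, 2),
        share_of_day=round(run_cost / day_total, 3),
        is_spike=ratio >= threshold,
    )


print(check_cost_spike(8.54, 4.53, 19.96))
# SpikeReport(ratio_vs_baseline=1.89, share_of_day=0.428, is_spike=True)
```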
🟡 Smoke CI cancellation persists
1 of 5 Smoke CI runs was cancelled at 2 min wall time (§26249207824), the same agent-budget-exhaustion pattern seen on 05-19 and 05-20. Known issue `smoke-ci-agent-timeout` has now persisted for 3 days.
Trend Charts
Success rate has fallen along a 92.8% → 88.9% → 85.9% → 75.6% trajectory. Today's failure count (5 including the cancellation) is in line with prior days, but it sits on a much smaller denominator (41 runs vs 78), so the rate suffers. All 4 failures come from the codex engine outage; the single cancellation is the Smoke CI timeout.
Daily cost has stayed in a ~$20–$36 range, with today at the low end despite the single-run spike. Tokens per dollar slipped slightly (1.8M/$ today, i.e. 35.9M tokens / $19.96, vs 1.93M/$ on 05-17), so each dollar bought somewhat fewer tokens than on 05-17.
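As a cross-check, the tokens-per-dollar figure can be recomputed from the 24h totals quoted in the overview. A minimal sketch; the 05-17 value is the reported 1.93M/$, not recomputed from raw data.

```python
def tokens_per_dollar(tokens: float, cost_usd: float) -> float:
    """Tokens bought per dollar; a higher value means cheaper tokens."""
    if cost_usd <= 0:
        raise ValueError("cost must be positive")
    return tokens / cost_usd


today = tokens_per_dollar(35.9e6, 19.96)  # 24h totals from the overview
reported_0517 = 1.93e6                    # reported figure, not recomputed
print(f"today: {today / 1e6:.2f}M/$")                        # today: 1.80M/$
print(f"change vs 05-17: {today / reported_0517 - 1:+.1%}")  # change vs 05-17: -6.8%
```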
Top 10 cost drivers (24h)
(Zero-cost rows are copilot-engine runs whose cost is not reported in the run summary.)
Firewall: 18% block rate across 17 workflows
Total: 2,108 requests, 1,723 allowed, 385 blocked (18.3%).
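The headline rate follows directly from the totals. A short sketch that recomputes it and, as an assumption to test rather than a finding, shows how much of the rate the 298 `(unknown)`-host blocks from the pattern list account for if those really are DNS-failure noise.

```python
def block_rate(allowed: int, blocked: int) -> float:
    """Fraction of firewall requests that were blocked."""
    return blocked / (allowed + blocked)


total, allowed, blocked = 2108, 1723, 385
assert allowed + blocked == total  # totals are internally consistent

unknown_host_blocks = 298  # "(unknown)" host entries in the pattern list
print(f"overall block rate: {block_rate(allowed, blocked):.1%}")
# overall block rate: 18.3%
print(f"(unknown) share of blocks: {unknown_host_blocks / blocked:.1%}")
# (unknown) share of blocks: 77.4%

# Hypothetical view: drop the blocked (unknown) requests from the count.
print(f"block rate without (unknown): "
      f"{block_rate(allowed, blocked - unknown_host_blocks):.1%}")
# block rate without (unknown): 4.8%
```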
Top blocked patterns:
- `(unknown)` host: 298 blocked (likely DNS-failed lookups inside containers)
- `api-proxy:10002`: 20/20 (100%), Smoke Pi
- `content-autofill.googleapis.com`, `www.google.com`, `accounts.google.com`, `safebrowsingohttpgateway.googleapis.com`: ~45 combined, Smoke Copilot/Claude browser probes
- `localhost:8080`: 15, Smoke Gemini local-proxy probe

Per-workflow block rate:
DIFC integrity filtering (13 events)
13 GitHub MCP `list_issues`/`search_issues` calls were filtered because the target issues have lower integrity than the agents require. Affected: #33436, #33597, #33640, #33787, #33847 (x2), #33777, #32446, #33605, #33649. Tags: `none:all`/`unapproved:all`. This is expected DIFC behavior; verify that agents do not loop on the empty result.
MCP tool usage (24h)
All tool calls show `status=unknown` in the summary (a telemetry capture issue; it does not indicate failure). No `missing_tool` events were flagged.
Known-Issue Status
Recommendations
`gpt-5` etc.) and ensure the entrypoint does not scrub the secret needed by retry attempts. Yesterday's patch did not solve the issue.
`api-proxy:10000`, model `gpt-5.4-mini`) needs separate verification.
`api-proxy:10002` or remove the probe.
References: