[audit-workflows] Agentic Workflow Audit — 2026-06-20: Skillet 100% failure spike (fleet 94.2% ex-Skillet) #40516
Replies: 1 comment
Smoke tap discussion rock. Run 27885948833 say hi.
Warning: Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"
See Network Configuration for more information.
Agentic Workflow Audit — 2026-06-20
Window: 2026-06-19 21:56Z → 2026-06-20 21:16Z (~23.3h) · 395 runs · github/gh-aw
🍳 Skillet: fleet-wide 100% failure (NEW, P0)
Skillet (private: true, merged in HEAD 894b94c) is a slash-command PR-review workflow. In 24h it produced 103 runs, all failures, every one 0-token / null-turn, meaning the run fails at pre-activation/setup, before the agent ever executes.
- […] slash_command trigger (strategy: centralized, PR comment events), and the compiled lock at HEAD carries only workflow_dispatch. Yet every observed run carries event: push and spans every branch, including direct pushes to main. The trigger/dispatcher wiring does not match the declared intent.
- […] copilot/* branches (e.g. copilot/fail-avenger-engine-resolution, copilot/update-conclusion-job-aggregate-data).
Recommended actions (P0)
- […] main).
- […] main until stable, so in-progress dev iteration stops generating fleet-wide red runs.
Owner appears to be actively iterating (many copilot/* branches in flight).
Other failures (17 total, all known-recurring or by-design)
Breakdown
- avenger-err-config-no-structured-logs
- copilot-sdk-tool-perm-lockout
- copilot-agent-artifact-missing-0tok
- chroot-node / 0-tok
Avenger is now the #1 non-Skillet prod-main offender: 100% of its scheduled runs fail (0-tok config error), escalating day-over-day for three days. The 13-day-old fix branch remains ineffective and needs owner attention.
Capability signals
[…] close_discussion (legit gap: no such safe-output; Daily Project Performance Summary wants to close prior daily discussions) and tool/permission (the Daily Safe Output Integrator sdk-lockout). missing_data = 0, mcp_failures = 0.
📊 Trends (30 days)
The 06-20 dip to 69.6% is entirely a Skillet artifact — the dashed line marks the true ex-Skillet rate of 94.2%, squarely within the 30-day 85–95% band. Failure counts spike because Skillet emits ~100 red runs, but genuine prod reliability is unchanged from prior days.
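
The two rates quoted above can be reproduced from the counts in this report. The following is a minimal sketch, assuming every run in the window ended as either a success or a failure (no cancelled or skipped runs); the function and variable names are illustrative and not part of any gh-aw tooling.

```python
def success_rate(total_runs: int, failed_runs: int) -> float:
    """Return the success rate as a percentage of total_runs."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return 100.0 * (total_runs - failed_runs) / total_runs


# Counts taken from this report.
TOTAL_RUNS = 395
SKILLET_RUNS = 103      # all of them failures
OTHER_FAILURES = 17     # non-Skillet failures

# Fleet-wide rate: Skillet failures count against the fleet.
fleet_rate = success_rate(TOTAL_RUNS, SKILLET_RUNS + OTHER_FAILURES)

# Ex-Skillet rate: drop Skillet runs from both numerator and denominator.
ex_skillet_rate = success_rate(TOTAL_RUNS - SKILLET_RUNS, OTHER_FAILURES)

print(f"fleet: {fleet_rate:.1f}%")            # fleet: 69.6%
print(f"ex-Skillet: {ex_skillet_rate:.1f}%")  # ex-Skillet: 94.2%
```

Both figures come from the same 275 successful runs; only the denominator changes (395 versus 292), which is why the dip disappears once Skillet is excluded.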
Token usage for 06-20 is partial/undercounted: the logs tool's 120s timeout forced paginated count-limited fetches, so not all successful-run usage artifacts were downloaded, and the 103 Skillet failures contribute 0 tokens each. Treat today's bar as a floor, not a true daily total.
Next actions
- […] main until stable.
- […] ERR_CONFIG (recur 13, escalating to x8/day, 100% fail).
Data caveats
- logs call times out at the server-side 120s limit (recurring tooling friction) → used 6 paginated count≤80 batches (deduped by run_id).
- engine_counts unavailable (aw_info.json artifacts not fetched); engines noted from source (Skillet=copilot, Avenger=claude).
References: §27878145653 (Skillet/main) · §27883508007 (Avenger) · §27880811834 (Safe Output Integrator)
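
The batch merge mentioned under Data caveats (several paginated fetches, deduplicated by run_id) could look roughly like the sketch below. The record shape (a dict with `run_id` and `conclusion` keys) and the sample IDs are assumptions made for illustration; they are not the actual output format of the logs tool.

```python
from typing import Iterable


def merge_batches(batches: Iterable[list[dict]]) -> list[dict]:
    """Merge paginated batches, keeping the first record seen per run_id."""
    seen: dict[int, dict] = {}
    for batch in batches:
        for run in batch:
            # setdefault leaves an existing entry untouched, so a run that
            # appears in two overlapping pages is only counted once.
            seen.setdefault(run["run_id"], run)
    return list(seen.values())


# Hypothetical overlapping pages: run 2 appears in both.
batch_a = [
    {"run_id": 1, "conclusion": "success"},
    {"run_id": 2, "conclusion": "failure"},
]
batch_b = [
    {"run_id": 2, "conclusion": "failure"},
    {"run_id": 3, "conclusion": "success"},
]

runs = merge_batches([batch_a, batch_b])
print([r["run_id"] for r in runs])  # [1, 2, 3]
```

Keying on run_id keeps the total run count honest when page boundaries overlap, which matters here because the failure rates are computed against that total.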