[workflow-analysis] Weekly Workflow Analysis (2026-05-11 → 2026-05-18) #33009
Closed
This discussion was automatically closed because it expired on 2026-05-19T10:51:03.892Z.
Summary
Analysis of the 52 workflow runs captured in github/gh-aw for the past 7 days. The capture actually covers only about the last 25 hours of that window: the older end was truncated by the logs tool's 73s timeout (before_run_id=26016351895 is the continuation cursor).
Critical Issues
1.
Daily News: exit 127, Node.js missing in AWF chroot (§26026325226)
The Copilot harness failed before the agent even started.
The entrypoint hint says "verify the install path is present and bind-mounted into /host". This is a sandbox bootstrap regression, not user error, and it will recur until the chroot mounts Node from setup-node (/opt/hostedtoolcache). This is the 100% failure-rate workflow flagged by the reliability hotspot insight.
2.
Multi-Device Docs Tester: error_max_turns at 80 turns (§26017866628)
Claude burnt all 80 turns and $3.82 driving playwright-cli one micro-step at a time (26× run-code, 11× goto, 6× resize). Seven Bash calls to start the Astro dev server were permission-denied, including a literal mkdir -p /tmp/gh-aw/agent, because nohup ... & and run_in_background aren't on the allowlist. The agent never recovered. Root cause: dev-server bring-up was blocked, so the agent kept retrying. Add nohup/background-shell or a dedicated start-dev-server script to the workflow's allowed commands, or raise max_turns for this workflow (at $0.048/turn, its cost per turn is currently the lowest of the failures).
3.
Daily Rendering Scripts Verifier: detection job parse_error (§26023308972)
The agent succeeded (posted a clean no-op message). The post-agent detection step failed with GH_AW_DETECTION_REASON=parse_error and GH_AW_EFFECTIVE_TOKENS_RATE_LIMIT_ERROR=true. The run cost $5.73, the most expensive of the week, and was still marked as a failure. Worth investigating whether parse_error is misclassifying a successful no-op as a failure.
4.
Static Analysis Report: step exit 1 (§26016351895)
Two Docker images don't support the --version flag the workflow expects.
That is not fatal on its own (the workflow only logs Warning:), but the run also surfaces real lint findings that the static analyzer flags as errors (copilot-requests is not a valid permission scope, shellcheck SC2016/SC2086, unknown queue key in concurrency). The exit 1 is the analyzer's intended behavior; these findings are lint debt to fix in .github/workflows/*.lock.yml.
Observability: OTLP exports are dead
Every run with OTEL configured logs an export failure (a 401, retried three times).
The failure appears in Daily Rendering Scripts Verifier, Static Analysis Report, and others. The four shared secrets (GH_AW_OTEL_SENTRY_*, GH_AW_OTEL_GRAFANA_*) are either rotated, scoped wrong, or invalid. No traces are reaching Sentry/Grafana, so observability is blind for the entire fleet. Fix the credentials or remove the OTLP step until they're sorted; the 3-retry loop is just adding latency to every conclusion job.
Network friction: top firewall hotspots
Total: 2,019 requests, 611 blocked (30.3%) across 13 blocked domains.
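The 30.3% figure is blocked requests divided by total requests. A minimal Python sketch of that arithmetic follows; block_rate is a hypothetical helper name used only for illustration and is not part of any gh-aw tooling.

```python
def block_rate(total_requests: int, blocked_requests: int) -> float:
    """Return the blocked share of requests as a percentage, rounded to one decimal."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return round(100 * blocked_requests / total_requests, 1)


# Totals taken from the line above: 2,019 requests, 611 blocked.
print(block_rate(2019, 611))  # 30.3
```

The same helper can be applied per workflow to find which ones sit far above the fleet-wide rate.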
Noteworthy blocked domains:
- proxy.golang.org:443 (Go module proxy): 7 blocks, relevant for Go Fan, Schema Consistency Checker, etc. Likely the cause of Dev's 91% block rate. Either add it to the firewall allowlist or pre-vendor Go modules.
- productionresultssa*.blob.core.windows.net:443: 14 blocks across 7 Azure blob hosts. These are GitHub Actions artifact upload endpoints; the egress firewall is stripping them. Pattern-allowlist *.blob.core.windows.net for Actions runners.
- api-proxy:10000, api-proxy:10002: 20 blocks to the internal Copilot proxy. Looks like attempts on the wrong port.
- mtalk.google.com:5228 (Google Cloud Messaging push): 1 block. Harmless background leak from Chromium/Playwright.
- traces.example.com:4317: placeholder still hard-coded somewhere in the OTEL config.
- invalid.example.invalid:443: looks like a test/negative-control entry.
Cost & performance: top 10 by spend
All spend is on Claude; Copilot runs report $0.00. The sample (52 runs) shows $30.15 across 15 Claude runs, an average of about $2.01/run, but the distribution is heavily skewed.
Longest wall-clock runs: daily-experiment-report 20.0m, Daily Syntax Error Quality Check 18.4m, Copilot Session Insights 18.1m, Daily Rendering Scripts Verifier 16.9m. All four use heavy MCP-tool chains (Read/Grep/safeoutputs/playwright). Top tool: Read with 59 calls across 10 runs.
Drift & anomalies
From observability_insights:
- PR Sous Chef varied 39 → 50 turns across 3 runs (avg 43.3): "changing task shape or unstable prompts." Worth fingerprinting the prompts to see if a recent template change is widening the variance.
- tool_result clusters. Likely benign for new tools, but warrants a look if any are in failure paths.
Recommendations (prioritized)
1. Mount Node.js into the AWF chroot so Daily News (and any other workflow compiled to Copilot+chroot) can run. This is a sandbox infrastructure bug, not a workflow bug.
2. Fix the OTLP credentials (GH_AW_OTEL_SENTRY_*, GH_AW_OTEL_GRAFANA_*). A 401 on every run means zero observability and ~1s of wasted retry latency per job.
3. Allow nohup/background-shell or add a start-dev-server helper to Multi-Device Docs Tester, or bump its max_turns past 80. Seven permission-denied Bash calls killed an entire $3.82 run.
4. Allowlist proxy.golang.org and *.blob.core.windows.net in the egress firewall. These are legitimate (Go modules, Actions artifacts) and account for 21 of the 611 blocks (7 + 14).
5. Investigate parse_error in the detection step. Daily Rendering Scripts Verifier succeeded at the agent level, then failed in post-processing. If the detection logic is misreading no-op outputs as failures, other workflows may be silently affected.
6. Fix the lint debt: copilot-requests is not a valid permission scope; concurrency.queue is not a valid key; multiple SC2016/SC2086 shellcheck issues in ab-testing-advisor.lock.yml (and likely all the lock.yml templates).
7. Investigate PR Sous Chef turn-count drift. A 28% spread between best and worst run suggests a prompt or task-shape regression.
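The egress-firewall recommendation above combines an exact host with a wildcard pattern. The sketch below shows how such an allowlist could be evaluated, assuming shell-style glob semantics via Python's fnmatch module. The real firewall's pattern syntax is not described in this report, and ALLOWLIST, is_allowed and the sample blob hostname are hypothetical names used only for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist: the two entries come from the recommendation above.
ALLOWLIST = ["proxy.golang.org", "*.blob.core.windows.net"]


def is_allowed(host: str, allowlist=ALLOWLIST) -> bool:
    """Return True if host matches any allowlist entry (glob-style, case-insensitive)."""
    # Normalize case and drop a trailing root dot before matching.
    normalized = host.lower().rstrip(".")
    return any(fnmatchcase(normalized, pattern) for pattern in allowlist)


# The blob hostname below is a made-up example, not one of the 7 hosts from the logs.
for candidate in (
    "proxy.golang.org",
    "productionresultssa-example.blob.core.windows.net",
    "traces.example.com",
):
    print(candidate, is_allowed(candidate))
```

Because the pattern is anchored at both ends, a host such as blob.core.windows.net.evil.example does not match. Note that a glob * also spans dots, so the pattern admits multi-label subdomains; a stricter firewall may want single-label matching instead.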