[workflow-analysis] Weekly Workflow Analysis (2026-05-11 → 2026-05-18) #33009

2026-05-18T10:51:04Z

github-actions[bot]
Bot May 18, 2026

Summary

Analysis of the 52 workflow runs captured from github/gh-aw for the past 7 days. The capture actually covers only about the most recent 25 hours: the logs tool hit its 73s timeout and truncated the older end of the window. before_run_id=26016351895 is the continuation cursor for fetching the remainder.

Metric	Value
Total runs	52
Failed runs	4 (7.7%)
Total errors recorded	4
Total warnings	0
Total compute	6.5h wall / 421 action-minutes
Total cost (Claude only)	$30.15 across 15 Claude runs
Total tokens	66.6M raw / 424.8M effective
Total turns	1,167
GitHub API calls	253
Engine mix	copilot 32 · claude 15 · codex 4 · pi 1

Critical Issues

1. Daily News — exit 127, Node.js missing in AWF chroot (§26026325226)
The Copilot harness failed before the agent even started:

[entrypoint][ERROR] Copilot CLI requires Node.js, but 'node' is not available inside AWF chroot.
[entrypoint][ERROR] Ensure Node.js is installed on the runner and reachable from PATH inside the chroot.

The entrypoint hint says "verify the install path is present and bind-mounted into /host" — this is a sandbox bootstrap regression, not user error, and will recur until the chroot mounts Node from setup-node / /opt/hostedtoolcache. This is the 100% failure-rate workflow flagged by the reliability hotspot insight.
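A preflight check along these lines could surface the problem before the harness starts. This is a minimal Python sketch, not the AWF entrypoint's actual logic: `missing_tools` is a hypothetical helper, and it only reports whether a name resolves on PATH in the environment where it runs, so it would have to execute inside the chroot to be meaningful.

```python
import shutil


def missing_tools(required, path=None):
    """Return the names in `required` that do not resolve to an executable.

    `path` defaults to the current PATH; pass an explicit search path to
    probe a different environment. Hypothetical helper, not part of gh-aw.
    """
    return [name for name in required if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    absent = missing_tools(["node"])
    if absent:
        print(f"missing from PATH: {', '.join(absent)}")
    else:
        print("all required tools found")
```

Failing the job on a non-empty result would turn the exit 127 into an explicit bootstrap error rather than a harness crash.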

2. Multi-Device Docs Tester — error_max_turns at 80 turns (§26017866628)
Claude used all 80 turns and $3.82 driving playwright-cli one micro-step at a time (26× run-code, 11× goto, 6× resize). Seven Bash calls made while trying to start the Astro dev server were permission-denied, including a literal mkdir -p /tmp/gh-aw/agent, because nohup ... & and run_in_background aren't on the allowlist. The agent never recovered. Root cause: dev-server bring-up was blocked, so the agent kept retrying until it ran out of turns. Add nohup/background-shell or a dedicated start-dev-server script to the workflow's allowed commands. Raising max_turns is the fallback: at about $0.048/turn ($3.82 / 80) this run had the lowest per-turn cost of the failures, but extra turns are unlikely to help while the dev server cannot start.

3. Daily Rendering Scripts Verifier — detection job parse_error (§26023308972)
The agent succeeded (posted a clean no-op message). The post-agent detection step then failed with GH_AW_DETECTION_REASON=parse_error and GH_AW_EFFECTIVE_TOKENS_RATE_LIMIT_ERROR=true. The run cost $5.73, the most expensive of the week, and was marked as a failure even though the agent phase completed. Worth investigating whether parse_error is misclassifying a successful no-op as a failure.

4. Static Analysis Report — step exit 1 (§26016351895)
Two Docker images don't support the --version flag the workflow expects:

[poutine]      Error: unknown flag: --version
[runner-guard] Error: unknown flag: --version

Not fatal on its own (the workflow only logs Warning:), but the run also surfaces real lint findings the static analyzer flags as errors (copilot-requests is not a valid permission scope, shellcheck SC2016/SC2086, unknown queue key in concurrency). The exit 1 is from the analyzer's intended behavior — these are lint debt to fix in .github/workflows/*.lock.yml.

Observability: OTLP exports are dead

Every run with OTEL configured logs:

OTLP export attempt 1/3 failed: HTTP 401 Unauthorized
OTLP export attempt 2/3 failed: HTTP 401 Unauthorized
OTLP export failed after 3 attempts: HTTP 401 Unauthorized

This appears in Daily Rendering Scripts Verifier, Static Analysis Report, and others. The four shared secrets (GH_AW_OTEL_SENTRY_*, GH_AW_OTEL_GRAFANA_*) are either rotated, scoped wrong, or invalid. No traces are reaching Sentry/Grafana — observability is blind for the entire fleet. Fix the credentials or remove the OTLP step until they're sorted; the 3-retry loop is just adding latency to every conclusion job.
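The retry latency could also be avoided by not retrying authentication failures at all. The sketch below is an assumption-laden illustration, not the exporter used by these workflows: `send` is a hypothetical callable that performs one export and returns the HTTP status code, backoff sleeps are omitted, and the retryable set (429, 502, 503, 504) follows the OTLP/HTTP specification as I understand it.

```python
# Statuses treated as transient; assumed from the OTLP/HTTP specification.
RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})


def export_with_retry(send, max_attempts=3):
    """Call `send` until it succeeds, fails permanently, or attempts run out.

    Returns (last_status, attempts_used). A 401 is not in the retryable set,
    so it stops after the first attempt instead of burning all three.
    """
    attempt = 0
    status = None
    while attempt < max_attempts:
        attempt += 1
        status = send()
        if 200 <= status < 300:
            break  # exported successfully
        if status not in RETRYABLE_STATUSES:
            break  # permanent failure such as 401: retrying cannot help
    return status, attempt


if __name__ == "__main__":
    print(export_with_retry(lambda: 401))
    print(export_with_retry(lambda: 503))
```

With this policy a misconfigured secret costs one request per job, while genuinely transient failures still get the full retry budget.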

Network friction — top firewall hotspots

Total: 2,019 requests, 611 blocked (30.3%) across 13 blocked domains.

Workflow	Total	Allowed	Blocked	Block %
Dev	22	2	20	91%
daily-experiment-report	188	88	100	53%
Daily Syntax Error Quality Check	193	113	80	41%
Organization Health Report	54	24	30	56%
Daily CLI Tools Exploratory Tester	53	24	29	55%
Layout Specification Maintainer	55	31	24	44%
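The Block % column can be recomputed from the other three columns as a sanity check. The sketch below simply re-derives the percentages from the rows above; `block_rates` is an illustrative helper name, and rounding to the nearest whole percent is an assumption about how the report formats the column.

```python
# (workflow, total, allowed, blocked), copied from the table above.
ROWS = [
    ("Dev", 22, 2, 20),
    ("daily-experiment-report", 188, 88, 100),
    ("Daily Syntax Error Quality Check", 193, 113, 80),
    ("Organization Health Report", 54, 24, 30),
    ("Daily CLI Tools Exploratory Tester", 53, 24, 29),
    ("Layout Specification Maintainer", 55, 31, 24),
]


def block_rates(rows):
    """Map each workflow to its blocked share, as a whole-number percent."""
    rates = {}
    for name, total, allowed, blocked in rows:
        if allowed + blocked != total:
            raise ValueError(f"{name}: allowed + blocked != total")
        rates[name] = round(100 * blocked / total)
    return rates


if __name__ == "__main__":
    ranked = sorted(block_rates(ROWS).items(), key=lambda kv: kv[1], reverse=True)
    for name, pct in ranked:
        print(f"{name}: {pct}%")
    print("blocked in listed rows:", sum(row[3] for row in ROWS))
```

The six listed workflows account for 283 of the 611 blocked requests, so more than half of the blocks come from workflows not shown in this table.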

Noteworthy blocked domains:

proxy.golang.org:443 (Go module proxy, 7 blocks): relevant for Go Fan, Schema Consistency Checker, etc. It may contribute to the Dev 91% block rate, but 7 blocks cannot account for all 20 of Dev's, so the remainder needs separate attribution. Either add it to the firewall allowlist or pre-vendor Go modules.
productionresultssa*.blob.core.windows.net:443 (14 blocks across 7 Azure blob hosts): these are GitHub Actions artifact upload endpoints, and the egress firewall is blocking them. Pattern-allowlist *.blob.core.windows.net for actions runners.
api-proxy:10000, api-proxy:10002 — 20 blocks to the internal Copilot proxy. Looks like attempts on the wrong port.
mtalk.google.com:5228 — Google Cloud Messaging push (1 block) — harmless background leak from Chromium/Playwright.
traces.example.com:4317 — placeholder still hard-coded somewhere in OTEL config.
invalid.example.invalid:443 — looks like a test/negative-control entry.
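To illustrate what a pattern allowlist would and would not admit, here is a small sketch using shell-style globbing. The firewall's real pattern syntax is not shown in this report, so the use of `fnmatch` is an assumption, and `is_allowed`, `ALLOWLIST`, and the `example-account` host are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatchcase

# Candidate entries taken from the domains discussed above.
ALLOWLIST = ["proxy.golang.org", "*.blob.core.windows.net"]


def is_allowed(host, patterns=ALLOWLIST):
    """Return True if `host` (optionally 'name:port') matches any pattern.

    Hostnames are compared case-insensitively. IPv6 literals are not handled.
    """
    hostname = host.partition(":")[0].lower().rstrip(".")
    return any(fnmatchcase(hostname, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "proxy.golang.org:443",
        "example-account.blob.core.windows.net:443",  # hypothetical host
        "mtalk.google.com:5228",
    ):
        print(candidate, is_allowed(candidate))
```

Note that a glob like this admits every storage account under blob.core.windows.net, not only the Actions artifact hosts, which is worth weighing before widening the egress rules.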

Cost & performance — top 10 by spend

All spend is on Claude; Copilot runs report $0.00. Sample (52 runs) shows $30.15 across 15 Claude runs ≈ $2.01/run average, but distribution is heavily skewed:

Workflow	Cost	Tokens (raw)	Duration	Engine
Daily Rendering Scripts Verifier	$5.73	8.78M	16.9m	claude
Daily AgentRx Trace Optimizer	$5.25	6.66M	13.2m	claude
Multi-Device Docs Tester	$3.82	5.56M	9.5m	claude
Copilot Session Insights	$2.74	2.75M	18.1m	claude
Instructions Janitor	$2.74	3.29M	8.1m	claude
Go Fan	$2.59	2.93M	9.4m	claude
Schema Consistency Checker	$2.29	2.01M	9.0m	claude
CLI Version Checker	$2.01	1.24M	6.7m	claude
[aw] Failure Investigator (6h)	$1.90	1.58M	11.3m	claude
Design Decision Gate 🏗️	$0.59	0.23M	5.1m	claude
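The skew is easier to see with two derived figures: cost per million raw tokens, and the share of total Claude spend covered by these ten rows. The sketch below only re-derives numbers from the table and the $30.15 total; the helper name `cost_per_million` is illustrative.

```python
# (workflow, cost in USD, raw tokens in millions), copied from the table above.
TOP_SPEND = [
    ("Daily Rendering Scripts Verifier", 5.73, 8.78),
    ("Daily AgentRx Trace Optimizer", 5.25, 6.66),
    ("Multi-Device Docs Tester", 3.82, 5.56),
    ("Copilot Session Insights", 2.74, 2.75),
    ("Instructions Janitor", 2.74, 3.29),
    ("Go Fan", 2.59, 2.93),
    ("Schema Consistency Checker", 2.29, 2.01),
    ("CLI Version Checker", 2.01, 1.24),
    ("[aw] Failure Investigator (6h)", 1.90, 1.58),
    ("Design Decision Gate 🏗️", 0.59, 0.23),
]
TOTAL_CLAUDE_SPEND = 30.15  # from the Summary table


def cost_per_million(rows):
    """Map each workflow to USD per million raw tokens."""
    return {name: cost / tokens_m for name, cost, tokens_m in rows}


top10_total = sum(cost for _, cost, _ in TOP_SPEND)
top10_share = top10_total / TOTAL_CLAUDE_SPEND

if __name__ == "__main__":
    for name, rate in sorted(cost_per_million(TOP_SPEND).items(), key=lambda kv: kv[1]):
        print(f"{name}: ${rate:.2f} per 1M raw tokens")
    print(f"top 10: ${top10_total:.2f} ({top10_share:.1%} of Claude spend)")
```

On these figures the ten listed runs cover about $29.66 of the $30.15, leaving roughly $0.49 for the other five Claude runs. The cheapest run in the table is also the most expensive per raw token, so ranking by total cost and ranking by unit cost point at different workflows.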

Longest wall-clock runs: daily-experiment-report 20.0m, Daily Syntax Error Quality Check 18.4m, Copilot Session Insights 18.1m, Daily Rendering Scripts Verifier 16.9m. All four use heavy MCP-tool chains (Read/Grep/safeoutputs/playwright). Top tool: Read with 59 calls across 10 runs.

Drift & anomalies

From observability_insights:

PR Sous Chef varied 39 → 50 turns across 3 runs (avg 43.3) — "changing task shape or unstable prompts." Worth fingerprinting the prompts to see if a recent template change is widening the variance.
13 high-anomaly events (score > 0.6) flagged across 52 runs — mostly new log templates in tool_result clusters. Likely benign for new tools but warrants a look if any are in failure paths.
Episode classifier: 44 baseline · 5 normal · 3 risky (no escalation-eligible episodes).
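For the PR Sous Chef drift, the spread can be reproduced from the reported extremes. In the sketch below the middle value of 41 is not in the report: it is inferred from the stated average of 43.3 over 3 runs (39 + 41 + 50 = 130), and `relative_spread` is an illustrative helper name.

```python
from statistics import mean


def relative_spread(values):
    """Return (max - min) / min, the gap between worst and best run."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest


# 39 and 50 are reported; 41 is inferred from the 43.3 average (assumption).
turns = [39, 41, 50]

if __name__ == "__main__":
    print(f"mean turns: {mean(turns):.1f}")
    print(f"spread: {relative_spread(turns):.0%}")
```

With only three runs this is a weak signal on its own, which is why fingerprinting the prompts is a more reliable next step than tuning a threshold on the spread.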

Recommendations (prioritized)

Fix Node.js bootstrap in the AWF chroot so Daily News (and any other workflow that compiled to Copilot+chroot) can run. This is a sandbox infrastructure bug, not a workflow bug.
Rotate or re-scope the OTLP secrets (GH_AW_OTEL_SENTRY_*, GH_AW_OTEL_GRAFANA_*) — 401 on every run means zero observability and ~1s of wasted retry latency per job.
Allowlist nohup/background-shell or add a start-dev-server helper to Multi-Device Docs Tester, or bump its max_turns past 80. 7 permission-denied bash calls killed an entire $3.82 run.
Allowlist proxy.golang.org and *.blob.core.windows.net in the egress firewall. Both are legitimate (Go modules, Actions artifacts) and together account for 21 of the 611 blocks (7 + 14).
Investigate parse_error in the detection step — Daily Rendering Scripts Verifier succeeded at the agent level then failed in post-processing. If the detection logic is misreading no-op outputs as failures, others may be silently affected.
Fix the static-analysis warnings the SAR run surfaces: copilot-requests is not a valid permission scope; concurrency.queue is not a valid key; multiple SC2016/SC2086 shellcheck issues in ab-testing-advisor.lock.yml (and likely all the lock.yml templates).
Investigate PR Sous Chef turn-count drift — 28% spread between best and worst run suggests prompt or task-shape regression.

References:

§26026325226 Daily News — chroot Node bootstrap failure
§26017866628 Multi-Device Docs Tester — max-turns + permission-denied dev-server start
§26023308972 Daily Rendering Scripts Verifier — detection parse_error after successful agent run
§26016351895 Static Analysis Report: step exit 1 from lint findings

Generated by 🔍 Weekly Workflow Analysis · ● 9.1M · ◷

expires on May 19, 2026, 10:51 AM UTC

2026-05-19T11:41:08Z

github-actions[bot]
Bot May 19, 2026
Author

This discussion was automatically closed because it expired on 2026-05-19T10:51:03.892Z.

Closed by Workflow
