Problem
During a transient GitHub infrastructure outage (16:28–16:49 UTC on 2026-04-23), the conclusion job in multiple workflows failed with a two-stage cascade:
- Primary error: git checkout fails with `HTTP 500` or `DNS: Could not resolve host: github.com`
- Cascade error: `Error: Cannot find module '/home/runner/work/_temp/gh-aw/actions/setup_globals.cjs'`
The cascade happens because handle_agent_failure.cjs imports setup_globals.cjs from the checked-out repo. When the checkout fails, that module is unavailable — turning a transient infra blip into a hard, unrecoverable conclusion job failure.
Affected Runs (last 6h)
| Workflow | Run ID | Time (UTC) | Error |
|---|---|---|---|
| Smoke CI | §24846689722 | 16:29 | HTTP 500 + setup_globals.cjs not found |
| Smoke CI | §24846759388 | 16:31 | HTTP 500 + setup_globals.cjs not found |
| Smoke CI | §24846939615 | 16:35 | HTTP 500 + setup_globals.cjs not found |
| Smoke CI | §24847116637 | 16:39 | HTTP 500 + setup_globals.cjs not found |
| Smoke CI | §24852137215 | 18:32 | DNS failure + setup_globals.cjs not found |
| Test Quality Sentinel | §24846641687 | 16:28 | HTTP 500 + setup_globals.cjs not found |
| Design Decision Gate 🏗️ | §24847007622 | 16:37 | HTTP 500 + setup_globals.cjs not found |
| Slide Deck Maintainer | §24847273895 | 16:43 | HTTP 500 + setup_globals.cjs not found |
Note: in most of these runs, the agent job succeeded (noop or valid output) — only the conclusion cleanup step failed.
Root Cause
The conclusion job performs a fresh git checkout to access AWF action helpers (e.g., setup_globals.cjs, handle_agent_failure.cjs). When github.com is transiently unreachable (HTTP 500 or DNS failure), this checkout fails. The error cascades because handle_agent_failure.cjs cannot load its dependency, preventing graceful failure reporting.
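One way to keep the cascade from blocking failure reporting is to load the helper defensively. The TypeScript below is a minimal sketch under stated assumptions: `tryLoad` is a hypothetical helper, and `load` stands in for whatever `require` call handle_agent_failure.cjs actually makes. It relies only on the fact that Node's CommonJS loader marks a missing file with the error code `MODULE_NOT_FOUND`.

```typescript
// Hypothetical sketch: load a helper module so that a missing file does not
// abort failure reporting. `load` stands in for a call such as
// () => require("/home/runner/work/_temp/gh-aw/actions/setup_globals.cjs").

function tryLoad<T>(load: () => T): T | undefined {
  try {
    return load();
  } catch (err) {
    // Node's CommonJS loader tags a missing file with code "MODULE_NOT_FOUND".
    // Any other error (e.g. a syntax error inside the module) is a real bug,
    // so it is re-thrown instead of being swallowed.
    const code = (err as { code?: unknown } | null)?.code;
    if (code === "MODULE_NOT_FOUND") {
      return undefined;
    }
    throw err;
  }
}
```

If `tryLoad` returns `undefined`, the failure handler could fall back to a minimal reporter that does not depend on setup_globals.cjs; that fallback is not shown and is an assumption about how the handler might be restructured.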
Proposed Remediation
Option A (Preferred): Add retry with backoff to the conclusion job's git checkout
- Up to 3 retries with increasing backoff (e.g., 5s, 15s, 30s)
- Covers transient HTTP 500 and DNS hiccups without false positives
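A minimal sketch of Option A in TypeScript, assuming the checkout can be wrapped as an async operation whose error message carries git's output. `withRetry` and `isTransient` are hypothetical names, the error patterns are inferred from the two failure modes in the table, and the example delays are read as waits before each retry (one initial attempt plus one retry per delay). `sleep` is injected so the logic can be tested without real waits.

```typescript
// Hypothetical sketch of Option A: retry an operation (e.g. the conclusion
// job's checkout) with increasing backoff, but only for errors that look
// transient. The patterns and default delays are assumptions.

type Sleep = (ms: number) => Promise<void>;

// Matches the two failure modes seen in this incident: HTTP 5xx from
// github.com and DNS resolution failures. Anything else is a real failure.
function isTransient(message: string): boolean {
  return (
    /(?:HTTP|returned error:)\s*5\d\d\b/i.test(message) ||
    /Could not resolve host/i.test(message)
  );
}

async function withRetry<T>(
  op: () => Promise<T>,
  sleep: Sleep,
  delaysMs: number[] = [5000, 15000, 30000],
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Re-throw immediately for non-transient errors, or once every
      // backoff delay has been used.
      if (!isTransient(message) || attempt >= delaysMs.length) {
        throw err;
      }
      await sleep(delaysMs[attempt]);
    }
  }
}
```

In a real job, `sleep` could be `(ms) => new Promise<void>((resolve) => setTimeout(resolve, ms))`. Because retries are limited to errors matching these patterns, a non-transient failure such as an authentication error still fails on the first attempt.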
Option B: Pre-bundle AWF action helpers
- Bundle setup_globals.cjs and related helpers as part of the AWF harness artifacts
- Conclusion job reads from bundle instead of live checkout
- Eliminates the checkout dependency entirely
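For Option B, the conclusion job needs to locate helpers without relying on a checkout. This sketch assumes a hypothetical bundle directory (`BUNDLE_DIR`) and keeps the live checkout path from the error message as a fallback; `resolveHelper` is an illustrative name, and the existence check is injected so the lookup order can be tested.

```typescript
// Hypothetical sketch of Option B: resolve an AWF helper such as
// setup_globals.cjs from a pre-bundled directory first, falling back to the
// live checkout only if needed.

// Assumed bundle location; not an existing gh-aw path.
const BUNDLE_DIR = "/opt/gh-aw/bundle/actions";
// Checkout location taken from the "Cannot find module" error above.
const CHECKOUT_DIR = "/home/runner/work/_temp/gh-aw/actions";

function resolveHelper(
  name: string,
  exists: (path: string) => boolean,
  searchDirs: string[] = [BUNDLE_DIR, CHECKOUT_DIR],
): string {
  // Return the first directory (in priority order) that contains the helper.
  for (const dir of searchDirs) {
    const candidate = `${dir}/${name}`;
    if (exists(candidate)) {
      return candidate;
    }
  }
  throw new Error(`AWF helper ${name} not found in: ${searchDirs.join(", ")}`);
}
```

In Node, `exists` could be `fs.existsSync`. Dropping `CHECKOUT_DIR` from `searchDirs` would remove the checkout dependency entirely, as Option B proposes.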
Option C: Graceful degradation
- If checkout fails, conclusion job exits with a specific known error code
- Downstream reporting treats "checkout failure" distinctly from "agent failure"
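A sketch of Option C, assuming the conclusion job can inspect the error text before it exits. The category names, the pattern list, and the exit codes are all hypothetical; the `Cannot find module ... setup_globals.cjs` pattern is included because, in this incident, that error is a consequence of the failed checkout rather than of the agent.

```typescript
// Hypothetical sketch of Option C: classify a conclusion-job error so that a
// checkout/infrastructure failure exits with its own code instead of looking
// like an agent failure. Names and exit codes are assumptions.

type FailureKind = "checkout_failure" | "agent_failure";

// 75 mirrors EX_TEMPFAIL from BSD sysexits.h ("temp failure; user is invited
// to retry"); using it here is an illustrative choice, not a gh-aw convention.
const EXIT_CODES: Record<FailureKind, number> = {
  checkout_failure: 75,
  agent_failure: 1,
};

// Patterns taken from the errors observed in this incident.
const CHECKOUT_FAILURE_PATTERNS: RegExp[] = [
  /(?:HTTP|returned error:)\s*5\d\d\b/i,
  /Could not resolve host/i,
  /Cannot find module '[^']*setup_globals\.cjs'/,
];

function classifyFailure(message: string): FailureKind {
  return CHECKOUT_FAILURE_PATTERNS.some((re) => re.test(message))
    ? "checkout_failure"
    : "agent_failure";
}

function exitCodeFor(message: string): number {
  return EXIT_CODES[classifyFailure(message)];
}
```

Downstream reporting could then branch on the exit code rather than parsing job logs to tell the two cases apart.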
Success Criteria
- … github.com returns HTTP 500 or DNS errors
Parent Issue
Part of failure investigation report: #27730
References:
Generated by [aw] Failure Investigator (6h)