Summary
Our Copilot-engine workflows have been failing since the evening of April 9 across two different gh-aw versions with two different failure modes. We upgraded to v0.67.4 expecting it to fix the first problem, but hit a second one instead. Neither version currently works.
| gh-aw version | Copilot CLI | Failure mode |
|---|---|---|
| v0.67.2 | v1.0.22 ("latest") | Hangs indefinitely → workflow timeout |
| v0.67.4 | v1.0.20 (pinned) | Silent crash: exitCode=1, 0B output, ~1s |
Key evidence: Our last successful runs (April 9, v0.67.2) used Copilot CLI v1.0.21 — the exact version v0.67.4's release notes blamed for crashes — and they worked fine. This suggests:
- Bug A: Copilot CLI v1.0.22 introduced a hang/freeze on startup (broke v0.67.2 when "latest" rolled forward on the evening of April 9)
- Bug B: The v0.67.4 runtime environment itself prevents v1.0.20 from starting (new `copilot_driver.cjs` wrapper, updated sandbox, or chroot changes)
Evidence
Bug A: Copilot CLI v1.0.22 hangs on startup (v0.67.2, no code changes on our side)
| Run | Workflow | CLI Version | Failure |
|---|---|---|---|
| (run ID available on request) | Copilot-engine workflow | v1.0.22 ("latest") | Timeout after 5m |
| (run ID available on request) | Copilot-engine workflow | v1.0.22 ("latest") | Timeout after 15m |
These runs were on v0.67.2 with no changes to our workflows. The only difference from our successful runs earlier that day: the `latest` tag for Copilot CLI rolled from v1.0.21 → v1.0.22 sometime on the evening of April 9.
Bug B: v0.67.4 runtime crashes v1.0.20 on startup
| Run | Workflow | CLI Version | Failure |
|---|---|---|---|
| (run ID available on request) | Copilot-engine workflow | v1.0.20 (pinned) | exitCode=1, 1s, 0B output |
| (run ID available on request) | Copilot-engine workflow | v1.0.20 (pinned) | exitCode=1, 1s, 0B output |
copilot-driver output (identical in both):
[copilot-driver] attempt 1: process started (pid=156)
[copilot-driver] attempt 1: process closed exitCode=1 duration=1s stdout=0B stderr=0B hasOutput=false
[copilot-driver] attempt 1 failed: exitCode=1 isCAPIError400=false hasOutput=false retriesRemaining=3
[copilot-driver] attempt 1: no output produced — not retrying
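To check many run logs for the same signature, the "process closed" line can be parsed mechanically. The following is a minimal sketch, assuming the log format is exactly what appears in the driver output quoted above; the regular expression, the `Attempt` class, and `parse_attempts` are our own illustrative names and are not part of gh-aw or `copilot_driver.cjs`.

```python
import re
from dataclasses import dataclass

# Pattern inferred from the "process closed" line quoted in this report.
# Assumption: other gh-aw versions may log a different format.
CLOSED_RE = re.compile(
    r"\[copilot-driver\] attempt (?P<attempt>\d+): process closed "
    r"exitCode=(?P<exit_code>-?\d+) duration=(?P<duration>\d+)s "
    r"stdout=(?P<stdout>\d+)B stderr=(?P<stderr>\d+)B"
)


@dataclass
class Attempt:
    """One driver attempt, as reported by its 'process closed' log line."""
    attempt: int
    exit_code: int
    duration_s: int
    stdout_bytes: int
    stderr_bytes: int

    @property
    def is_silent_crash(self) -> bool:
        # Bug B signature: non-zero exit and nothing written to either stream.
        return (
            self.exit_code != 0
            and self.stdout_bytes == 0
            and self.stderr_bytes == 0
        )


def parse_attempts(log_text: str) -> list[Attempt]:
    """Return every 'process closed' record found in the given log text."""
    return [
        Attempt(
            attempt=int(m["attempt"]),
            exit_code=int(m["exit_code"]),
            duration_s=int(m["duration"]),
            stdout_bytes=int(m["stdout"]),
            stderr_bytes=int(m["stderr"]),
        )
        for m in CLOSED_RE.finditer(log_text)
    ]
```

Feeding the agent step's log through `parse_attempts` and filtering on `is_silent_crash` separates Bug B runs from failures that produced some diagnostic output.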
Last known good: v0.67.2 + Copilot CLI v1.0.21
| Run | Workflow | CLI Version | Result |
|---|---|---|---|
| (run ID available on request) | Copilot-engine workflow | v1.0.21 ("latest") | ✅ Success |
| (run ID available on request) | Copilot-engine workflow | v1.0.21 ("latest") | ✅ Success |
Timeline
| When | gh-aw | Copilot CLI | Result | Notes |
|---|---|---|---|---|
| Apr 9 16:15 | v0.67.2 | v1.0.21 (latest) | ✅ Success | Last known good |
| Apr 9 17:40 | v0.67.2 | v1.0.21 (latest) | ✅ Success | Last known good |
| Apr 9 22:00 | v0.67.2 | v1.0.22 (latest) | ❌ Timeout | Bug A: "latest" rolled to v1.0.22, broke everything |
| Apr 10 16:07 | v0.67.4 | v1.0.20 (pinned) | ❌ Silent crash | Bug B: upgraded to fix Bug A, hit new failure |
| Apr 10 16:10 | v0.67.4 | v1.0.20 (pinned) | ❌ Silent crash | Bug B: confirmed reproducible |
Additional observations
- The entrypoint logs a warning in v0.67.4 runs that doesn't appear in v0.67.2 runs:
  `[entrypoint][WARN] Failed to transfer /host/home/runner/work/_temp/gh-aw/safeoutputs ownership to chroot user`
- The `permission-discussions` input warning on `create-github-app-token` is present in both passing and failing runs, so likely unrelated.
- Checksum verification passes for v1.0.20, so the binary is intact.
- v0.67.4's release notes attributed the crash to Copilot CLI v1.0.21, but our evidence shows v1.0.21 was the last version that worked. The real break came from v1.0.22 (Bug A) and the v0.67.4 runtime itself (Bug B).
What we think is happening
Bug A — Copilot CLI v1.0.22 hang
- The `latest` tag rolled from v1.0.21 → v1.0.22 on the evening of April 9
- v1.0.22 hangs on startup (no crash, no output, just freezes until the workflow timeout fires)
- Affects any gh-aw version that installs `latest` (which was every version before v0.67.4 pinned to v1.0.20)
Bug B — v0.67.4 runtime silent crash
- v0.67.4 pins Copilot CLI to v1.0.20 to avoid v1.0.22, but the v0.67.4 runtime itself prevents v1.0.20 from starting
- The CLI exits in ~1 second with code 1 and zero output
- Possible culprits in v0.67.4:
  - New `copilot_driver.cjs` wrapper: the CLI is no longer invoked directly; it goes through a Node.js driver. On v0.67.2, `copilot` was called directly.
  - Updated AWF sandbox: Firewall v0.25.18, MCP Gateway v0.2.17.
  - Chroot entrypoint changes: the safeoutputs ownership transfer failure suggests filesystem permission changes in the sandbox.
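One way to narrow down these culprits is to run the same command in different ways (directly, through the driver, inside and outside the sandbox) and classify each outcome with the same rules. The sketch below is a hypothetical helper, not part of gh-aw: `classify_run` and its labels are our own names, and the command you pass in (for example the direct CLI invocation versus the wrapper invocation) is left to the caller because we have not confirmed the exact command lines used by either version.

```python
import subprocess
import sys


def classify_run(cmd: list[str], timeout_s: float) -> str:
    """Run cmd and label the outcome using the two signatures in this report.

    "hang"              -> still running when timeout_s elapsed (Bug A)
    "silent-crash"      -> non-zero exit, nothing on stdout or stderr (Bug B)
    "error-with-output" -> non-zero exit, but some diagnostic output
    "ok"                -> exit code 0
    """
    try:
        # subprocess.run kills the child and raises if the timeout expires.
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    if proc.returncode == 0:
        return "ok"
    if not proc.stdout and not proc.stderr:
        return "silent-crash"
    return "error-with-output"


if __name__ == "__main__":
    # Stand-in commands that imitate each outcome with the Python interpreter;
    # replace them with the real invocations under test.
    demos = {
        "exits cleanly": [sys.executable, "-c", "pass"],
        "exits 1 silently": [sys.executable, "-c", "import sys; sys.exit(1)"],
        "never returns": [sys.executable, "-c", "import time; time.sleep(30)"],
    }
    for label, cmd in demos.items():
        print(label, "->", classify_run(cmd, timeout_s=2))
```

If the direct invocation reports "ok" while the wrapped one reports "silent-crash" in the same environment, that would point at the driver; if both fail identically, the sandbox or chroot changes become the more likely cause.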
Impact
- 22 Copilot-engine workflows are completely blocked
- No workaround available — v0.67.2 + latest (v1.0.22) hangs, v0.67.4 + v1.0.20 crashes
- We'd need a gh-aw version that either (a) pins to v1.0.21 or (b) fixes the v0.67.4 runtime to work with v1.0.20
Reproduction
- Bug A: Compile any Copilot-engine workflow with v0.67.2 (which installs `latest` = v1.0.22). The agent step will hang until workflow timeout.
- Bug B: Compile any Copilot-engine workflow with v0.67.4. The agent step will fail immediately with exitCode=1.
Environment
- Runner: `ubuntu-latest` (GitHub-hosted)
- gh-aw tested: v0.67.2 (`03e31e064a68e8d5ad890c92f303cfb5a3536006`), v0.67.4 (`9d6ae06250fc0ec536a0e5f35de313b35bad7246`)
- Copilot CLI versions tested: v1.0.20 (pinned), v1.0.21 (latest, worked), v1.0.22 (latest, hangs)
- Run IDs and repository details available on request