From v0.68.4 copilot engine broken when sandbox: agent: false with strict: false #30632

@zijianzz

Summary

Workflows using engine: copilot with sandbox: agent: false and strict: false are completely broken in v0.68.4. The compiled lock file uses copilot_harness.cjs which requires http://api-proxy:10000/reflect at runtime, but no api-proxy container is started when the agent sandbox is disabled. The agent exits immediately with code 1 and 0 bytes of output.

This is a regression from v0.68.x where copilot_driver.cjs worked correctly in the same configuration.

Environment

  • gh-aw compiler: v0.71.6 (also tested v0.71.5)
  • Copilot CLI: 1.0.40 (installed via install_copilot_cli.sh)
  • gh-aw-actions SHA: a9daf37d190509f3592fe62338fc75430bbc640f (v0.71.6)
  • Runner: GitHub-hosted ubuntu-latest
  • Node.js: v24.14.1
  • Working version: gh-aw before v0.68.4 (uses copilot_driver.cjs + CLI 1.0.21)

Workflow frontmatter (relevant parts)

---
strict: false
engine:
  id: copilot
sandbox:
  agent: false
# ...
---

sandbox: agent: false is required because the workflow starts external Docker Compose services (postgres, rabbitmq) for functional tests. Per PR #29483, this key is still valid when strict: false.

Expected behavior

The copilot engine should work the same as v0.68.x: copilot_driver.cjs spawns the CLI binary which authenticates directly using COPILOT_GITHUB_TOKEN without needing an api-proxy sidecar.

Actual behavior

With v0.71.x, every run does the following:

  1. The compiler injects COPILOT_API_KEY: dummy-byok-key-for-offline-mode into the step env
  2. The compiler emits copilot_harness.cjs instead of copilot_driver.cjs
  3. The harness tries to reach http://api-proxy:10000/reflect → fails (fetch failed)
  4. The harness spawns the copilot CLI, which exits with code 1 after 2 seconds and 0 bytes of output
  5. The harness reports "no output produced — not retrying"

Logs from CI (v0.71.6)

[copilot-harness] starting: command=copilot maxRetries=3 initialDelayMs=5000 backoffMultiplier=2 maxDelayMs=60000 nodeVersion=v24.14.1 platform=linux
[copilot-harness] pre-flight: command not found: copilot (F_OK check failed — binary does not exist at this path)
[copilot-harness] resolved --prompt-file: path=/tmp/gh-aw/aw-prompts/prompt.txt size=36959B
[copilot-harness] awf-reflect: fetching http://api-proxy:10000/reflect (timeout=5000ms)
[copilot-harness] awf-reflect: request failed: fetch failed
[copilot-harness] attempt 1: spawning: copilot --add-dir ... --prompt-file ...
[copilot-harness] attempt 1: process started (pid=5273)
[copilot-harness] attempt 1: process exit event exitCode=1
[copilot-harness] attempt 1: process closed exitCode=1 duration=2s stdout=0B stderr=0B hasOutput=false
[copilot-harness] attempt 1 failed: exitCode=1 isCAPIError400=false isMCPPolicyError=false isModelNotSupportedError=false isNullTypeToolCallError=false isAuthError=false hasOutput=false retriesRemaining=3
[copilot-harness] attempt 1: no output produced — not retrying (possible causes: binary not found, permission denied, auth failure, or silent startup crash)
[copilot-harness] awf-reflect: fetching http://api-proxy:10000/reflect (timeout=5000ms)
[copilot-harness] awf-reflect: request failed: fetch failed
[copilot-harness] done: exitCode=1 totalDuration=2s

Note: the "F_OK check failed" message is only a warning (it also appeared in working v0.68.3 runs). The binary is installed at /usr/local/bin/copilot and verified (GitHub Copilot CLI 1.0.40). The real failure is the silent exit with code 1.

Comparison: v0.68.3 (working) vs v0.71.6 (broken)

| Aspect | v0.68.3 (working) | v0.71.6 (broken) |
|---|---|---|
| Runner script | copilot_driver.cjs | copilot_harness.cjs |
| CLI version | 1.0.21 | 1.0.40 |
| COPILOT_API_KEY | Not present | dummy-byok-key-for-offline-mode |
| api-proxy:10000 call | Not made | Hardcoded in harness, fails |
| Auth method | COPILOT_GITHUB_TOKEN (direct) | Broken, CLI exits silently |
| Result | exit 0, 14m 31s, 14011B output | exit 1, 2s, 0B output |

What we tested (all failed with v0.71.6)

  1. Remove COPILOT_API_KEY from lock file — harness still calls api-proxy and CLI still exits code 1
  2. Recompile with v0.71.6 — same result
  3. --action-tag v0.68.3 — compiler still emits harness + BYOK key (ignores old action tag for runtime selection)

Root cause analysis

The v0.71.x compiler unconditionally:

  1. Emits copilot_harness.cjs as the runtime launcher (replacing copilot_driver.cjs)
  2. Injects COPILOT_API_KEY: dummy-byok-key-for-offline-mode (BYOK mode activation)

The harness, in turn, expects api-proxy:10000 to be available for token reflection.

But when sandbox: agent: false:

  • No AWF agent container is spawned
  • No api-proxy sidecar is started
  • The CLI runs directly on the host runner
  • There is nothing listening on api-proxy:10000

The compiler should detect that sandbox: agent: false means no api-proxy and do one of the following:

  • Fall back to copilot_driver.cjs (v0.68 behavior)
  • Not inject COPILOT_API_KEY
  • Make the harness support direct COPILOT_GITHUB_TOKEN auth when api-proxy is unreachable
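
To illustrate the third option, here is a minimal sketch of how a launcher could probe the proxy once and decide on an auth mode before spawning the CLI. This is not code from copilot_harness.cjs: `resolveAuthMode`, its parameters, and the mode names are hypothetical. Only the reflect URL, the 5000 ms timeout, and the COPILOT_GITHUB_TOKEN variable come from the logs and description above. Whether the CLI then authenticates correctly in "direct-token" mode is not something this sketch verifies.

```javascript
// Hypothetical sketch, not the real harness. Requires Node.js 18+ (global fetch).
const REFLECT_URL = "http://api-proxy:10000/reflect"; // URL as seen in the CI logs

// Decide how the CLI should authenticate.
// fetchImpl and env are injectable so the logic can be exercised without a network.
async function resolveAuthMode({ fetchImpl = fetch, env = process.env, timeoutMs = 5000 } = {}) {
  try {
    const res = await fetchImpl(REFLECT_URL, { signal: AbortSignal.timeout(timeoutMs) });
    if (res.ok) return { mode: "api-proxy" };
  } catch {
    // Proxy unreachable (e.g. "fetch failed" when sandbox: agent: false); fall through.
  }
  if (env.COPILOT_GITHUB_TOKEN) {
    return { mode: "direct-token" };
  }
  return {
    mode: "unavailable",
    reason: "api-proxy unreachable and COPILOT_GITHUB_TOKEN not set",
  };
}

// Example use (commented out so that loading this file makes no network call):
// resolveAuthMode().then((r) => console.log("[sketch] auth mode:", r.mode));
```

The point of the "unavailable" branch is that the launcher could fail with an explicit message instead of letting the CLI exit with code 1 and no output.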

Current workaround

Pin gh-aw to v0.68.7:

gh extension remove aw
gh extension install github/gh-aw --pin v0.68.7
gh aw compile

This produces a lock file with copilot_driver.cjs + CLI 1.0.21 + no COPILOT_API_KEY, which works correctly.
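
To confirm which runtime a compiled lock file actually uses, a quick text scan for the three markers discussed in this issue is enough. The helper below is a hypothetical sketch (`inspectLockFile` is not part of gh-aw). It only looks for the literal strings copilot_harness.cjs, copilot_driver.cjs, and dummy-byok-key-for-offline-mode, and assumes they appear verbatim in the lock file.

```javascript
// Hypothetical helper: reports which launcher and key markers a lock file contains.
function inspectLockFile(text) {
  return {
    usesHarness: text.includes("copilot_harness.cjs"),
    usesDriver: text.includes("copilot_driver.cjs"),
    injectsDummyKey: text.includes("dummy-byok-key-for-offline-mode"),
  };
}

// Example use from a script (the path is a placeholder, pass your own lock file):
// const fs = require("node:fs");
// console.log(inspectLockFile(fs.readFileSync(process.argv[2], "utf8")));
```

After pinning and recompiling, the expectation from this report is usesDriver true, with usesHarness and injectsDummyKey both false.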

Suggested fix

When sandbox: agent: false is set in a strict: false workflow, the compiler should:

  1. Use copilot_driver.cjs instead of copilot_harness.cjs (or make the harness gracefully fall back)
  2. NOT inject COPILOT_API_KEY: dummy-byok-key-for-offline-mode
  3. Let the CLI authenticate directly via COPILOT_GITHUB_TOKEN

Alternatively, if copilot_harness.cjs is the intended path forward, it should detect that api-proxy is unreachable and fall back to direct COPILOT_GITHUB_TOKEN authentication instead of silently crashing.
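
The compiler-side rule suggested above can be sketched as a small decision function. This is illustrative only and written in JavaScript for consistency with the harness; it is not the gh-aw compiler's code, and `selectCopilotRuntime` and the shape of its return value are hypothetical. It assumes the frontmatter has already been parsed into an object.

```javascript
// Hypothetical sketch of the suggested rule, not the actual compiler logic.
function selectCopilotRuntime(frontmatter = {}) {
  // sandbox: agent: false means no AWF agent container and no api-proxy sidecar.
  const agentSandboxDisabled =
    frontmatter.sandbox != null && frontmatter.sandbox.agent === false;

  if (agentSandboxDisabled) {
    return {
      launcher: "copilot_driver.cjs", // v0.68 behavior
      injectByokKey: false,           // no COPILOT_API_KEY in the step env
      auth: "COPILOT_GITHUB_TOKEN",   // CLI authenticates directly
    };
  }
  return {
    launcher: "copilot_harness.cjs",
    injectByokKey: true,
    auth: "api-proxy",
  };
}

// For the frontmatter in this issue:
// selectCopilotRuntime({ strict: false, engine: { id: "copilot" }, sandbox: { agent: false } })
```

Keying the decision on the sandbox setting alone keeps the sandboxed path unchanged and only restores the older launcher where no api-proxy can exist.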
