
bug: sandbox agent fails with Argument list too long (E2BIG) when prompt + env exceed ARG_MAX #26045

@bbonafed

Description


Problem

When using the AWF sandbox with --env-all and a workflow that has many imports: (or inlined-imports: true), the agent step fails with:

/bin/bash: line 1: /usr/local/bin/node: Argument list too long

Exit code 126. The Copilot CLI never starts — the kernel rejects the execve call before node can even launch.

Root Cause

The Linux kernel caps the combined size of the argv and envp strings passed to execve at ARG_MAX (roughly 2 MiB on typical systems, one quarter of the default 8 MiB stack limit). Two things combine to exceed this:

  1. --prompt "$(cat /tmp/gh-aw/aw-prompts/prompt.txt)": the entire assembled prompt (which can be 100–200+ KB for workflows with many imported skill/reference files) is shell-expanded into a single argv element. Linux additionally caps any single argv string at MAX_ARG_STRLEN (32 pages, i.e. 128 KiB with 4 KiB pages), so a prompt this size can trip E2BIG even before the environment is counted.

  2. --env-all — the full GitHub Actions runner environment is forwarded into the container. On hosted runners this can be 1.5–2 MB of envp (hundreds of GITHUB_*, tool-cache, matrix, and runner variables).

Together they exceed the kernel limit. The failure happens inside the AWF container when the entrypoint's script file runs node copilot_driver.cjs ... --prompt "<145KB of text>" and execve returns E2BIG.
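To see how close a given runner environment is to the budget, the envp contribution can be estimated by summing the lengths of the `KEY=VALUE` strings. The sketch below is a hypothetical diagnostic; `envpSize` is not part of AWF or gh-aw:

```go
package main

import (
	"fmt"
	"os"
)

// envpSize approximates the envp share of the ARG_MAX budget:
// each "KEY=VALUE" string plus its terminating NUL byte.
// (Per-entry pointer overhead is ignored for simplicity.)
func envpSize(env []string) int {
	total := 0
	for _, kv := range env {
		total += len(kv) + 1
	}
	return total
}

func main() {
	fmt.Printf("envp uses ~%d bytes of the ~2 MiB ARG_MAX budget\n", envpSize(os.Environ()))
}
```

Running this inside the container (before the agent step) would show whether --env-all alone is already consuming most of the limit.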

Reproduction

  1. Create a reusable workflow (.md) with inlined-imports: true and 10+ imported files totaling >100 KB of markdown content
  2. Use the sandbox (--env-all is always added by the compiler)
  3. Trigger from a repo on a hosted runner with a typical environment
  4. The agent step fails with exit code 126 and the error above

The issue is more likely to surface when:

  • The workflow has many imported skill/reference files
  • The calling workflow passes extra_context with large content (e.g., an Actions Importer draft)
  • The runner has many custom environment variables

Existing Precedent

This was already fixed for the threat detection job in a previous release:

Fix threat detection CLI overflow by using file access instead of inlining agent output

The threat detection job was passing the entire agent output to the detection agent via environment variables, which could cause CLI argument overflow errors when the agent output was large. Modified the threat detection system to use a file-based approach where the agent reads the output file directly using bash tools instead of inlining the full content into the prompt.

The same pattern should be applied to the main agent prompt.

Proposal

Option A: Pass prompt via file path (preferred)

Instead of:

copilotCommand = fmt.Sprintf(`%s %s --prompt "$(cat /tmp/gh-aw/aw-prompts/prompt.txt)"`, ...)

Have the Copilot CLI accept a file path:

copilotCommand = fmt.Sprintf(`%s %s --prompt-file /tmp/gh-aw/aw-prompts/prompt.txt`, ...)

The prompt stays on disk and is read by the node process after execve succeeds, completely bypassing ARG_MAX. This is the cleanest fix and eliminates the issue for any prompt size.
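A minimal sketch of the compiler-side change, assuming a --prompt-file flag is added to the Copilot CLI (buildCopilotCommand is an illustrative name, not the actual gh-aw function):

```go
package main

import "fmt"

// buildCopilotCommand emits the shell command the entrypoint script runs.
// Passing the prompt as a path keeps argv a few dozen bytes long no
// matter how large the assembled prompt file grows.
func buildCopilotCommand(bin, extraArgs, promptPath string) string {
	return fmt.Sprintf("%s %s --prompt-file %s", bin, extraArgs, promptPath)
}

func main() {
	fmt.Println(buildCopilotCommand(
		"/usr/local/bin/node copilot_driver.cjs",
		"--log-level debug", // illustrative extra args
		"/tmp/gh-aw/aw-prompts/prompt.txt"))
}
```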

Option B: Read prompt via stdin

copilotCommand = fmt.Sprintf(`cat /tmp/gh-aw/aw-prompts/prompt.txt | %s %s --prompt -`, ...)

Pipe the prompt via stdin instead of argv. This also avoids ARG_MAX.

Option C: Reduce environment footprint

As a complementary measure, --env-all could be smarter about filtering out large, non-essential runner variables (e.g., GITHUB_EVENT_PATH content, tool-cache metadata, matrix variables that the agent doesn't need). The --exclude-env flag exists but only targets secrets today.

Workaround

Currently the only workaround is to reduce the total size of imported content in the .md file, which limits how much reference material can be provided to the agent.

Environment

  • AWF version: v0.25.18
  • gh-aw CLI: v0.68.1
  • Runner: GitHub-hosted ubuntu-latest
  • Trigger: workflow_call with inlined-imports: true
