[ARC-DinD] ARC/DinD support in v0.75.4 still requires workflow-level workarounds #34896

## Summary

[#30840](https://github.com/github/gh-aw/issues/30840) was closed COMPLETED with the v0.75.0 release, and [#33777](https://github.com/github/gh-aw/issues/33777) is tracking one specific follow-up (unix-socket DOCKER_HOST from a sibling DinD pod). On v0.75.4 the closed-issue scenario (`tcp://`-shaped DOCKER_HOST from a runner pod with a DinD sidecar in the same ARC RunnerScaleSet pod) is still not first-class: getting a working agent run on Copilot + AWF chroot mode required six distinct workflow/infra-level workarounds.

This issue enumerates those remaining gaps in v0.75.4 with concrete reproduction details.

## Repro

- gh-aw v0.75.4 (AWF v0.25.53), Copilot engine.
- ARC scale-set on EKS. `RunnerScaleSet` template has the runner container and a `dind` sidecar in the same pod.
- Runner env: `DOCKER_HOST=tcp://localhost:2375` (matches AWF's TCP-detection regex, so `--docker-host-path-prefix /tmp/gh-aw` is engaged correctly — this part works).
- Stock helm: DinD sidecar uses `docker:dind` (Alpine).
- Shared `gh-aw-tmp` emptyDir mounted at `/tmp` on both containers (standard gh-aw recommendation for ARC).
- Workflow: stock Copilot agent, `sandbox.agent.id: awf`, `tools: { github: {toolsets: [all]}, bash: true }`.

The full workaround set we currently ship is a composite action (first block) and the `prepare-dind-dirs.sh` script it invokes (second block):
```yaml
name: 'ARC GAW Bootstrap (workaround)'
description: |
  Stages /etc/passwd, /etc/group, /etc/hosts overrides into the shared DinD
  /tmp volume and copies the runner-installed copilot binary into the DinD
  daemon's /usr/local/bin so AWF's /usr:/host/usr:ro system mount exposes it
  to the chrooted agent. Node is baked into the DinD image.

  Workaround for upstream gh-aw bugs on Kubernetes/ARC runners:
    - https://github.com/github/gh-aw/issues/30838
    - https://github.com/github/gh-aw/issues/30840

  Once both issues are fixed upstream:
    1. Delete this whole workflows/arc-gaw-bootstrap/ directory upstream
       (or .github/workflows/arc-gaw-bootstrap/ in consumer repos).
    2. In each workflow .md, delete every block between
       `# WORKAROUND-START` and `# WORKAROUND-END` (grep-friendly markers).
    3. Drop the `resources:` list from each workflow .md.
    4. Run `gh aw compile` to regenerate the lock files.
runs:
  using: composite
  steps:
    - name: Prepare ARC DinD temp directories and stage copilot into the daemon
      shell: bash
      env:
        DOCKER_HOST: tcp://localhost:2375
      run: bash -eo pipefail "${GITHUB_ACTION_PATH}/prepare-dind-dirs.sh"
    # gh-aw v0.75.4 still hardcodes `github` as an "internal" MCP server in
    # mount_mcp_as_cli.cjs, so it never gets mounted as a CLI shim and
    # workers invoked with `copilot --disable-builtin-mcps` can't see
    # github_* tools. Remove it from INTERNAL_SERVERS so the agent can
    # read issues, list PRs, etc. via the github MCP server.
    - name: Patch mount_mcp_as_cli.cjs to expose github MCP as a CLI tool
      shell: bash
      run: |
        set -euo pipefail
        script="${RUNNER_TEMP}/gh-aw/actions/mount_mcp_as_cli.cjs"
        if [ ! -f "$script" ]; then
          echo "mount_mcp_as_cli.cjs not found at $script — gh-aw version drift?" >&2
          exit 0
        fi
        if grep -q 'INTERNAL_SERVERS = new Set(\["github"\])' "$script"; then
          sed -i 's|INTERNAL_SERVERS = new Set(\["github"\])|INTERNAL_SERVERS = new Set([])|' "$script"
          echo "Patched: removed github from INTERNAL_SERVERS"
        elif grep -q 'INTERNAL_SERVERS = new Set(\[\])' "$script"; then
          echo "Patch already applied or not needed"
        else
          echo "WARN: unrecognized INTERNAL_SERVERS pattern in $script — gh-aw upstream may have changed" >&2
        fi
```
```bash
mkdir -p /tmp/gh-aw/.cache /tmp/gh-aw/.config /tmp/gh-aw/.local/state /tmp/gh-aw/home
chmod -R 0777 /tmp/gh-aw/.cache /tmp/gh-aw/.config /tmp/gh-aw/.local /tmp/gh-aw/home
docker run --rm --user 0:0 --entrypoint /bin/sh \
  -v /tmp:/host-tmp:rw \
  ghcr.io/github/gh-aw-mcpg:v0.3.6@sha256:2bb8eef86006a4c5963c55616a9c51c32f27bfdecb023b8aa6f91f6718d9171c \
  -c 'mkdir -p /host-tmp/gh-aw/.cache /host-tmp/gh-aw/.config /host-tmp/gh-aw/.local/state /host-tmp/gh-aw/home /host-tmp/gh-aw/mcp-logs /host-tmp/gh-aw/mcp-payloads /host-tmp/gh-aw/sandbox/firewall/logs /host-tmp/gh-aw/sandbox/firewall/logs/api-proxy-logs /host-tmp/gh-aw/sandbox/firewall/logs/cli-proxy-logs && chmod -R 0777 /host-tmp/gh-aw'
mkdir -p /tmp/gh-aw/arc-etc
printf '%s\n' 'runner:x:1001:1001:GitHub Actions Runner:/tmp/gh-aw/home:/bin/bash' 'nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin' > /tmp/gh-aw/arc-etc/passwd
printf '%s\n' 'runner:x:1001:' 'nobody:x:65534:' > /tmp/gh-aw/arc-etc/group
printf '%s\n' '127.0.0.1 localhost' '::1 localhost ip6-localhost ip6-loopback' '172.30.0.1 host.docker.internal' > /tmp/gh-aw/arc-etc/hosts
chmod a+r /tmp/gh-aw/arc-etc/passwd /tmp/gh-aw/arc-etc/group /tmp/gh-aw/arc-etc/hosts
# Stage the runner's installed copilot CLI into the DinD daemon's
# /usr/local/bin/copilot so AWF's /usr:/host/usr:ro system mount exposes it
# inside the chrooted agent. Bind-mounting copilot directly into the chroot
# from the runner fails: the daemon doesn't see the runner's filesystem, and
# /host/usr is read-only so a file mountpoint can't be created there.
#
# Node is already pre-installed in the DinD image at /usr/local/bin/node.
# Copilot is staged at runtime rather than baked into the image so its
# version stays bound to whatever gh-aw installs on the runner.
COPILOT_BIN="$(command -v copilot 2>/dev/null || true)"
if [ -n "$COPILOT_BIN" ] && [ -x "$COPILOT_BIN" ]; then
  mkdir -p /tmp/gh-aw/copilot-stage
  cp -Lf "$COPILOT_BIN" /tmp/gh-aw/copilot-stage/copilot.real
  chmod a+rx /tmp/gh-aw/copilot-stage/copilot.real
  # AWF chroot mode passes the AWF agent container's HOME=/home/runner,
  # USER=root, LOGNAME=root through to the agent exec regardless of
  # engine.env (the XDG_* overrides come through; the identity vars do not).
  # With HOME=/home/runner copilot can't write to ~/.copilot in the chrooted
  # DinD filesystem and exits silently with status 1. The shim forces the
  # identity vars to the writable /tmp/gh-aw/home tree before exec'ing the
  # real binary.
  cat > /tmp/gh-aw/copilot-stage/copilot <<'SHIM_EOF'
#!/bin/bash
exec env HOME=/tmp/gh-aw/home USER=runner LOGNAME=runner /usr/local/bin/copilot.real "$@"
SHIM_EOF
  chmod a+rx /tmp/gh-aw/copilot-stage/copilot
  docker run --rm --user 0:0 --entrypoint /bin/sh \
    -v /usr/local/bin:/daemon-usr-local-bin:rw \
    -v /tmp:/host-tmp:ro \
    ghcr.io/github/gh-aw-mcpg:v0.3.6@sha256:2bb8eef86006a4c5963c55616a9c51c32f27bfdecb023b8aa6f91f6718d9171c \
    -c '
      cp /host-tmp/gh-aw/copilot-stage/copilot.real /daemon-usr-local-bin/copilot.real
      chmod 0755 /daemon-usr-local-bin/copilot.real
      cp /host-tmp/gh-aw/copilot-stage/copilot /daemon-usr-local-bin/copilot
      chmod 0755 /daemon-usr-local-bin/copilot
    '
fi
tar -C /tmp/gh-aw -cf - arc-etc | docker run --rm -i --user 0:0 --entrypoint /bin/sh \
  -v /tmp:/host-tmp:rw \
  ghcr.io/github/gh-aw-mcpg:v0.3.6@sha256:2bb8eef86006a4c5963c55616a9c51c32f27bfdecb023b8aa6f91f6718d9171c \
  -c 'mkdir -p /host-tmp/gh-aw && tar -C /host-tmp/gh-aw -xf - && chmod -R a+rX /host-tmp/gh-aw/arc-etc'
if [ -d /tmp/gh-aw/aw-prompts ]; then
  tar -C /tmp/gh-aw -cf - aw-prompts | docker run --rm -i --user 0:0 --entrypoint /bin/sh \
    -v /tmp:/host-tmp:rw \
    ghcr.io/github/gh-aw-mcpg:v0.3.6@sha256:2bb8eef86006a4c5963c55616a9c51c32f27bfdecb023b8aa6f91f6718d9171c \
    -c 'mkdir -p /host-tmp/gh-aw && tar -C /host-tmp/gh-aw -xf - && chmod -R a+rX /host-tmp/gh-aw/aw-prompts'
fi
```

## Remaining gaps in v0.75.4

### Gap 1 — AWF chroot rejects Alpine/musl daemon hosts but the official ARC DinD is Alpine

**Symptom (with stock `docker:dind`):**

```
[entrypoint][WARN] one-shot-token.so failed to load on host dynamic linker (host libc incompatibility, e.g. musl/Alpine)
chroot: failed to run command '/bin/sh': No such file or directory
[entrypoint][ERROR] capsh not found on host system
```

**What we had to do:** Build a custom Ubuntu-22.04 DinD image with `docker-ce`, `libcap2-bin` (with `/usr/sbin/capsh` symlinked into `/usr/bin`), Node.js installed at `/usr/local/bin/node`.

**Expected:** Either gh-aw ships a glibc DinD image as a documented companion, or AWF stages `capsh` / `node` / `/bin/sh` into the daemon filesystem itself from a known agent-image bundle (rather than requiring them to be pre-present in the daemon's rootfs).
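
The three prerequisites named above (glibc rather than musl, `capsh` under `/usr/bin`, and `/bin/sh`) can be checked before AWF starts. The following is a minimal sketch, assuming the daemon's root filesystem is reachable at some local path; `probe_daemon_rootfs` and its argument are hypothetical names for illustration and are not part of gh-aw or AWF.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight probe; not part of gh-aw or AWF.
# Usage: probe_daemon_rootfs <path-to-daemon-rootfs>
# Prints one finding per line and returns non-zero if any check fails.
probe_daemon_rootfs() {
  local root="${1%/}" rc=0 loader

  # musl-based images such as Alpine ship a loader named ld-musl-<arch>.so.1.
  # If the glob matches nothing, the literal pattern fails the -e test.
  for loader in "$root"/lib/ld-musl-*.so.1; do
    if [ -e "$loader" ]; then
      echo "musl loader present: ${loader#"$root"}"
      rc=1
    fi
  done

  # -L also accepts absolute symlinks, which cannot be resolved from
  # outside the rootfs (e.g. /bin/sh -> /bin/busybox).
  if [ ! -e "$root/bin/sh" ] && [ ! -L "$root/bin/sh" ]; then
    echo "missing: /bin/sh"
    rc=1
  fi

  # Our custom image had to symlink /usr/sbin/capsh into /usr/bin,
  # so only /usr/bin is checked here.
  if [ ! -e "$root/usr/bin/capsh" ] && [ ! -L "$root/usr/bin/capsh" ]; then
    echo "missing: /usr/bin/capsh"
    rc=1
  fi

  return "$rc"
}
```

Run against a stock `docker:dind` rootfs, we would expect this probe to report at least the musl loader; it does not replace the fix requested above.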

### Gap 2 — engine.env `HOME` / `USER` / `LOGNAME` are silently overridden

**Symptom:** Copilot exits with status 1 after 9s, producing zero bytes on stdout and stderr. A diagnostic shim placed in `/usr/local/bin/copilot` captured the actual exec env:

```
HOME=/home/runner       # we set HOME=/tmp/gh-aw/home in engine.env
USER=root               # we set USER=runner in engine.env
LOGNAME=root            # we set LOGNAME=runner in engine.env
XDG_CACHE_HOME=/tmp/gh-aw/.cache   # this DID come from engine.env
```

So engine.env partially propagates (`XDG_*` survives) but the identity triple is clobbered to the AWF agent container's pre-capsh values. Copilot then can't write `~/.copilot/state` and dies.
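
For anyone reproducing this, a diagnostic shim of the kind mentioned above can be as small as the sketch below. It is illustrative rather than the exact shim we used: the `COPILOT_REAL` and `COPILOT_ENV_LOG` variables and the default log path are assumptions, not gh-aw or AWF conventions.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic shim. COPILOT_REAL, COPILOT_ENV_LOG and the
# default log path are illustrative assumptions.

# Append the identity and XDG variables the process actually received.
dump_exec_env() {
  local log="$1" name value
  for name in HOME USER LOGNAME XDG_CACHE_HOME XDG_CONFIG_HOME XDG_STATE_HOME; do
    # printenv exits non-zero when the variable is not in the environment.
    value="$(printenv "$name")" || value='<unset>'
    printf '%s=%s\n' "$name" "$value" >> "$log"
  done
  printf 'uid=%s gid=%s\n' "$(id -u)" "$(id -g)" >> "$log"
}

# When installed as the shim, log the environment and hand off to the
# real binary.
if [ -n "${COPILOT_REAL:-}" ]; then
  dump_exec_env "${COPILOT_ENV_LOG:-/tmp/gh-aw/copilot-env.log}"
  exec "$COPILOT_REAL" "$@"
fi
```

Writing to a file instead of stderr matters here, because the failing run produced no output on either stream.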

**Workaround:** A one-line shim at `/usr/local/bin/copilot` on the daemon: `exec env HOME=/tmp/gh-aw/home USER=runner LOGNAME=runner /usr/local/bin/copilot.real "$@"`.

**Expected:** `engine.env` should be the authoritative source for `HOME`/`USER`/`LOGNAME` of the agent exec — applied AFTER capsh's user-switch, not before.

### Gap 3 — AWF chroot needs a unix-socket DOCKER_HOST that lives on the daemon's own filesystem

**Symptom (with `engine.env.DOCKER_HOST` left at the runner-pod default `tcp://localhost:2375`):** AWF can't locate the daemon's filesystem to bind-mount as `/host`, falls back to the awf-agent container's own (Alpine) rootfs, and the chroot probe fails as in Gap 1.

**Workaround:**
1. Add a second listener on the DinD daemon: `--host=unix:///dind-sock/docker.sock`
2. Mount a shared `dind-sock` emptyDir at `/dind-sock` on both the runner container and the DinD sidecar.
3. Override the engine env: `engine.env.DOCKER_HOST=unix:///dind-sock/docker.sock`

**Expected:** Either AWF should be able to chroot correctly with a TCP DOCKER_HOST (since the daemon container ID is discoverable from the Docker API), or the chroot setup should accept a configuration knob like `awf.chroot.daemon_filesystem_path` instead of inferring it from DOCKER_HOST's URL scheme.
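
To make the scheme dependence concrete, the sketch below classifies a `DOCKER_HOST` value the way this gap describes the inference: a `unix://` value yields a socket directory that can double as a daemon-visible path, while a `tcp://` value yields nothing to mount. This is not AWF's actual detection logic, which we have not quoted here; `classify_docker_host` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of scheme-based inference; not AWF's code.
# Prints "tcp", "unix <socket-dir>", "default" (empty value) or "other".
classify_docker_host() {
  local host="${1:-}"
  case "$host" in
    tcp://*)
      # A TCP endpoint carries no filesystem location at all.
      echo "tcp"
      ;;
    unix://*)
      # The socket's parent directory is the only path hint available.
      echo "unix $(dirname "${host#unix://}")"
      ;;
    "")
      echo "default"
      ;;
    *)
      echo "other"
      ;;
  esac
}
```

With the workaround above, `classify_docker_host unix:///dind-sock/docker.sock` reports the shared `/dind-sock` directory, whereas the runner-pod default `tcp://localhost:2375` gives AWF no path to work from.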

### Gap 4 — Copilot CLI is installed on the runner pod, but the chroot only sees the daemon pod's `/usr/local/bin`

The `Install GitHub Copilot CLI` step in the gh-aw lock writes `copilot` to `/home/runner/.npm-global/bin/copilot` on the **runner**. The AWF chroot mounts the **daemon's** `/usr` read-only as `/host/usr` — so the runner-installed copilot binary is not visible inside chroot.

**Workaround:** A pre-agent composite action that copies the runner-installed copilot binary into the DinD daemon's `/usr/local/bin/copilot.real` via a helper container with `-v /usr/local/bin:/daemon-usr-local-bin:rw -v /tmp:/host-tmp:ro`.

**Expected:** The `Install GitHub Copilot CLI` step should be aware of ARC/DinD and install into the daemon's filesystem when chroot mode is active. Or the agent image bundle should ship copilot already.

### Gap 5 — `mount_mcp_as_cli.cjs` hardcodes `github` as INTERNAL_SERVERS, so `--disable-builtin-mcps` hides github_* tools

**Symptom:** When the agent harness invokes copilot with `--disable-builtin-mcps` (the AWF default for chroot mode), the github MCP server doesn't get mounted as a CLI shim because it's in the hardcoded `INTERNAL_SERVERS = new Set(["github"])` set in `${RUNNER_TEMP}/gh-aw/actions/mount_mcp_as_cli.cjs`. The agent can't list issues, read PRs, etc.

**Workaround:** A pre-agent `sed -i` patch:

```bash
sed -i 's|INTERNAL_SERVERS = new Set(\["github"\])|INTERNAL_SERVERS = new Set([])|' \
  "${RUNNER_TEMP}/gh-aw/actions/mount_mcp_as_cli.cjs"
```

**Expected:** This is a straight bug — either `INTERNAL_SERVERS` should be empty when `--disable-builtin-mcps` is set, or `github` should be made an opt-in entry rather than hardcoded.

### Gap 6 — `/etc/passwd`, `/etc/group`, `/etc/hosts` overrides still need workflow-level mounts

The daemon's DinD image rootfs has no UID 1001 in `/etc/passwd` (`runner` doesn't exist), no `runner` group, and no `host.docker.internal` in `/etc/hosts`. As a result, capsh's user switch, HOME resolution, and MCP gateway resolution all break.

**Workaround:** `sandbox.agent.mounts` with daemon-staged files:

```yaml
sandbox:
  agent:
    mounts:
      - /tmp/gh-aw/arc-etc/passwd:/etc/passwd:ro
      - /tmp/gh-aw/arc-etc/group:/etc/group:ro
      - /tmp/gh-aw/arc-etc/hosts:/etc/hosts:ro
```

with a pre-agent step synthesizing those files via a helper container into a daemon-visible path.

**Expected:** AWF should synthesize a minimal `/etc/passwd` / `/etc/group` containing the `awfuser`/`runner` UID it's about to switch to, and an `/etc/hosts` entry for `host.docker.internal` (the host-gateway IP), without requiring the workflow to ship them.

### Gap 7 — `safe-outputs.threat-detection` job runs without the workflow's pre-agent-steps and re-hits Gap 4

**Symptom:** Even with all of the above workarounds applied, the auto-generated `detection` job fails inside AWF chroot:

```
[copilot-harness] attempt 1: spawning: /usr/local/bin/copilot ...
[copilot-harness] attempt 1: failed to start process '/usr/local/bin/copilot':
  spawn /usr/local/bin/copilot ENOENT (code=ENOENT syscall=spawn /usr/local/bin/copilot)
[copilot-harness] attempt 1: process closed exitCode=-2 duration=0s stdout=0B stderr=0B hasOutput=false
...
📄 No lines containing THREAT_DETECTION_RESULT found in 132 lines
##[error]ERR_PARSE: ❌ No THREAT_DETECTION_RESULT found in detection log.
```

The detection job:

1. Runs gh-aw's `Install GitHub Copilot CLI` step, which installs to `/home/runner/.npm-global/bin/copilot` on the **runner** pod.
2. Starts AWF chroot — the daemon-level workarounds (Ubuntu DinD, `capsh`, `node` in image, DOCKER_HOST unix socket) all carry over because they're cluster/helm-level.
3. AWF chroots into the daemon's filesystem and tries to spawn `/usr/local/bin/copilot` — never staged there for the detection job → `ENOENT`.

The job is **silently marked successful** because `GH_AW_DETECTION_CONTINUE_ON_ERROR !== 'false'` (gh-aw's default), which means the overall workflow goes green with no threat detection having actually run. That's a security regression: a workflow author who has correctly configured `safe-outputs.threat-detection` will believe their outputs were screened, when in fact the detector no-op'd.

Per `docs/src/content/docs/reference/steps-jobs.md`, gh-aw exposes `pre-agent-steps`, `post-steps`, and `jobs.<id>.pre-steps`, but NO `pre-detection-steps` or `safe-outputs.threat-detection.pre-steps` hook. The detection job is an auto-generated job with no public injection point, so the same workaround we applied for the agent job cannot be applied here from the workflow level.

**Expected:** The fix for Gap 4 (install copilot CLI into the chroot-runtime overlay when chroot mode is active) is sufficient *if* it covers both the agent and detection jobs. As a contingency, expose a `safe-outputs.threat-detection.pre-steps` hook so users can apply ARC-specific staging until the runtime overlay ships.

Files to touch:

- `pkg/workflow/compiler_threat_detection.go` (or wherever the detection job YAML is emitted) — inject the same staging logic used by the agent job, OR add a frontmatter `pre-steps` field to the threat-detection block.
- `pkg/parser/schemas/frontmatter.json` — schema entry for the new `pre-steps` field if the `pre-steps` option is taken.
- `pkg/workflow/threat_detection_test.go` — new test asserting that on ARC/DinD the detection job successfully invokes copilot.

Default `GH_AW_DETECTION_CONTINUE_ON_ERROR` should also be reconsidered: the current behavior masks setup failures as successful no-op detections.

## Root cause summary

The v0.75.0 fix (referenced in #30840's closing comments) addressed the bind-mount-source split-filesystem problem and the squid log-dir ownership issue. It did **not** address:

- Daemon filesystem libc requirements (Gap 1)
- engine.env identity-var propagation through capsh (Gap 2)
- Daemon-filesystem discovery from a TCP DOCKER_HOST (Gap 3)
- Runner-installed agent binaries not being visible in chroot (Gap 4)
- INTERNAL_SERVERS hardcoding (Gap 5; independent of chroot, but blocks the same use case)
- Minimal identity/hosts synthesis in chroot (Gap 6)
- Threat-detection job runs without `pre-agent-steps` and silently no-ops on chroot setup failures (Gap 7)

## Proposed implementation plan

### 1. Bundle an AWF chroot runtime tarball in the agent image

In `gh-aw-firewall`, build a small "chroot-runtime" tarball at image build time containing:

- `capsh` (static or with vendored libcap)
- `/bin/sh`, `/bin/bash`, busybox applets used by AWF entrypoint (`mkdir`, `chmod`, `cat`, `head`, `tee`)
- `libutil.so.1` (already mentioned in #30840)
- A registered location for the engine binary (e.g. `/awf/engine/bin`) that the agent harness exec's through

Stage this tarball from the AWF agent image into the daemon's filesystem at startup via a Docker API helper-container pattern (daemon-visible path → extract into `/awf/runtime` → chroot reads from there). This removes the requirement that users provide a glibc daemon image, and removes Gap 1.

Files to touch:

- `gh-aw-firewall/containers/agent/Dockerfile` — assemble tarball
- `gh-aw-firewall/containers/agent/entrypoint.sh` — extract on startup, chroot into `/awf/runtime` overlay
- `gh-aw-firewall/src/services/agent-volumes.ts` — add the staging mount

### 2. Preserve engine.env identity vars across capsh

`gh-aw-firewall/containers/agent/entrypoint.sh` performs the capsh user switch. After the switch, the environment is rebuilt from `/etc/passwd` (HOME), and the AWF agent container's pre-switch USER/LOGNAME survive, so `engine.env`'s overrides are lost. Apply `HOME`/`USER`/`LOGNAME` from engine.env (passed as `AWF_ENGINE_ENV_*` docker env vars by gh-aw) **after** the capsh exec, before exec'ing the engine binary.

Files to touch:

- `gh-aw-firewall/containers/agent/entrypoint.sh`
- `pkg/workflow/engine_env.go` (gh-aw side: ensure engine.env is forwarded as `AWF_ENGINE_ENV_*`)
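
A minimal sketch of the ordering proposed in this step, assuming the overrides arrive as `AWF_ENGINE_ENV_HOME`, `AWF_ENGINE_ENV_USER` and `AWF_ENGINE_ENV_LOGNAME` (the prefix is the one proposed above; the function name is hypothetical). It would run as the command launched after the user switch, so its exports are the last word before the engine starts.

```bash
#!/usr/bin/env bash
# Hypothetical helper; assumes identity overrides are forwarded as
# AWF_ENGINE_ENV_<NAME> environment variables, as proposed in this step.
# Usage: exec_with_engine_identity <engine-binary> [args...]
exec_with_engine_identity() {
  local name forwarded
  for name in HOME USER LOGNAME; do
    # Override only when the forwarded variable exists in the environment;
    # otherwise keep whatever the user switch left in place.
    if forwarded="$(printenv "AWF_ENGINE_ENV_${name}")"; then
      export "${name}=${forwarded}"
    fi
  done
  # Replace this process with the engine so no wrapper shell lingers.
  exec "$@"
}
```

This has the same effect as the `copilot` shim from Gap 2, except that the values come from `engine.env` instead of being hardcoded on the daemon.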

### 3. Daemon-filesystem discovery without DOCKER_HOST coupling

Today AWF infers the daemon's bind-mountable filesystem from the DOCKER_HOST URL scheme (`unix://` path's parent dir → daemon-visible mount root). Instead, use the Docker API to inspect the daemon container itself and discover its `MergedDir`/`UpperDir` (overlay2) or equivalent. This makes chroot work with TCP DOCKER_HOST and removes the need for the extra unix-socket listener + shared emptyDir (Gap 3).

Files to touch:

- `gh-aw-firewall/src/host-env.ts` — add `resolveDaemonFilesystem()`
- `gh-aw-firewall/src/services/agent-service.ts` — use the resolved path instead of inferring from DOCKER_HOST
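
As a rough illustration of the discovery call, `docker inspect` exposes the overlay2 merged directory of a container through its Go template output. The sketch below is a shell stand-in for the proposed `resolveDaemonFilesystem()`. Which container to inspect, and whether the returned path is mountable from AWF's vantage point, are left open here; the function name is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical shell stand-in for the proposed resolveDaemonFilesystem().
# Prints the overlay2 MergedDir of the given container, or fails.
resolve_daemon_filesystem() {
  local container="$1" merged
  merged="$(docker inspect \
    --format '{{ .GraphDriver.Data.MergedDir }}' "$container")" || return 1
  # Storage drivers other than overlay2 may not report a MergedDir;
  # Go templates render a missing map key as "<no value>".
  if [ -z "$merged" ] || [ "$merged" = "<no value>" ]; then
    return 1
  fi
  printf '%s\n' "$merged"
}
```

The returned path is only meaningful in the filesystem namespace of the daemon that owns the container, so an implementation would still need a daemon-side helper container to use it.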

### 4. Install Copilot CLI into the chroot-runtime overlay

The gh-aw `Install GitHub Copilot CLI` step currently `npm i -g`s on the runner. When `runs-on` selects an ARC runner where AWF chroot mode will be enabled, install instead into the chroot-runtime overlay (the bundle from Step 1), via a Docker helper container that writes into the daemon's filesystem. The detection signal is identical to what `--docker-host-path-prefix` already detects.

Files to touch:

- `pkg/workflow/agent_install.go` (or equivalent — the step is generated by gh-aw compile, lookup needed)
- `gh-aw-firewall/src/services/agent-volumes.ts` — expose the chroot-runtime mountpoint to the install step

### 5. Fix INTERNAL_SERVERS hardcoding

In `pkg/workflow/mcp_internal.go` (or wherever the `INTERNAL_SERVERS = new Set(["github"])` literal is generated into `mount_mcp_as_cli.cjs`), remove the hardcoded entry. With `--disable-builtin-mcps`, all MCP servers should be eligible for CLI mounting, including github.

Files to touch:

- `pkg/workflow/mcp_internal.go` (or the template that emits `mount_mcp_as_cli.cjs`)
- `pkg/workflow/mcp_internal_test.go`

### 6. Synthesize minimal `/etc/passwd`, `/etc/group`, `/etc/hosts` in chroot

`gh-aw-firewall/containers/agent/entrypoint.sh` already writes `/host/etc/resolv.conf`; extend the same machinery to write minimal identity files containing only the AWF user (UID `AWF_USER_UID`, group name matching `awfuser`) and an `/etc/hosts` containing `host.docker.internal` (resolved from the `extra_hosts` host-gateway).

Files to touch:

- `gh-aw-firewall/containers/agent/entrypoint.sh`
- `gh-aw-firewall/src/services/agent-service.ts` (drop the requirement that workflows mount these themselves)
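
The synthesis itself is small. The sketch below parameterizes what our `prepare-dind-dirs.sh` workaround hardcodes; the function name and argument order are hypothetical, and how AWF would obtain the host-gateway IP is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the printf lines in prepare-dind-dirs.sh.
# Usage: synthesize_identity_files <dest-dir> <user> <uid> <gid> <home> <gateway-ip>
synthesize_identity_files() {
  local dest="$1" user="$2" uid="$3" gid="$4" home="$5" gateway_ip="$6"
  mkdir -p "$dest"

  # passwd: the agent user plus nobody, which some tools expect to exist.
  printf '%s\n' \
    "${user}:x:${uid}:${gid}::${home}:/bin/bash" \
    'nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin' \
    > "$dest/passwd"

  # group: a primary group for the agent user, plus nobody.
  printf '%s\n' "${user}:x:${gid}:" 'nobody:x:65534:' > "$dest/group"

  # hosts: loopback names plus the Docker host-gateway alias.
  printf '%s\n' \
    '127.0.0.1 localhost' \
    '::1 localhost ip6-localhost ip6-loopback' \
    "${gateway_ip} host.docker.internal" \
    > "$dest/hosts"

  chmod a+r "$dest/passwd" "$dest/group" "$dest/hosts"
}
```

Calling it with `runner 1001 1001 /tmp/gh-aw/home 172.30.0.1` reproduces the values our workaround stages today.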

### 7. Make threat-detection ARC-aware and fail loud on setup errors

Two independent changes:

a. The auto-generated `detection` job must apply the same chroot prerequisites as the `agent` job. With the chroot-runtime overlay (Step 1) and chroot-routed Copilot install (Step 4), the detection job inherits the same setup for free — confirm with an integration test.

b. Change `GH_AW_DETECTION_CONTINUE_ON_ERROR` semantics so that AWF chroot setup failures (`spawn ENOENT`, capsh missing, etc.) propagate to the job conclusion. A model-output-format mismatch is one thing; a copilot binary not being spawnable is another. Differentiate "model produced unparseable output" from "the detection engine never started" and only continue-on-error for the former.

Files to touch:

- `pkg/workflow/compiler_threat_detection.go` — emit the staging prerequisites (or rely on Step 4's overlay).
- `pkg/workflow/threat_detection_parser.go` (or similar) — distinguish parse failure from spawn failure.
- `pkg/workflow/threat_detection_test.go`.
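
The distinction in (b) can be drawn from the engine output alone. The sketch below is a hypothetical classifier, not gh-aw's parser: it matches the copilot-harness messages quoted in Gap 7 and assumes its input is the engine's own output, since the job-level log also contains the `No THREAT_DETECTION_RESULT found` error text.

```bash
#!/usr/bin/env bash
# Hypothetical classifier for the detection engine's output log.
# Exit status: 0 = a result marker is present,
#              1 = the engine ran but produced no result,
#              2 = the engine never started (should fail the job).
classify_detection_log() {
  local log="$1"

  # Check for a result first: an early failed attempt followed by a
  # successful retry should still count as success.
  if grep -q 'THREAT_DETECTION_RESULT' "$log"; then
    echo "result-present"
    return 0
  fi

  # Patterns taken from the harness lines quoted in Gap 7.
  if grep -Eq 'failed to start process|spawn [^ ]+ ENOENT' "$log"; then
    echo "engine-not-started"
    return 2
  fi

  echo "no-result"
  return 1
}
```

Under this split, only exit status 1 would remain eligible for `GH_AW_DETECTION_CONTINUE_ON_ERROR`; status 2 would always fail the detection job.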

## Test plan

Following the patterns in `gh-aw-firewall/tests/integration/chroot-*.test.ts`:

1. **Glibc-free daemon integration test:** spin up an Alpine `docker:dind` as the test daemon, point AWF at it, and confirm the agent step runs to completion (chroot-runtime overlay supplies capsh / sh / etc.).
2. **engine.env propagation unit test:** assert that engine.env `HOME=/foo`, `USER=u`, `LOGNAME=u` reach the engine exec inside chroot (verified via a probe binary that prints `id`, `$HOME`, `$USER`, `$LOGNAME`).
3. **TCP DOCKER_HOST integration test:** identical to the existing ARC integration but with `DOCKER_HOST=tcp://docker-daemon:2375` and no shared filesystem mount; the agent step must still succeed.
4. **Copilot install routing test:** in chroot mode, the `Install GitHub Copilot CLI` step writes to the chroot-runtime overlay and `/usr/local/bin/copilot` is discoverable from inside chroot.
5. **github MCP CLI mount test:** with `--disable-builtin-mcps`, `mount_mcp_as_cli` emits a github shim.
6. **Synthesized identity-files test:** chroot has `id runner` → UID matching `AWF_USER_UID`, and `getent hosts host.docker.internal` returns the host-gateway IP, without any user-supplied `sandbox.agent.mounts`.
7. **Threat-detection ARC test:** with `safe-outputs.threat-detection` enabled and `runs-on: arc-gaw`, the detection job successfully spawns copilot, produces a parseable `THREAT_DETECTION_RESULT`, and a deliberate spawn failure (e.g. unstaged copilot) causes the detection job to **fail** rather than silently no-op.

`make agent-finish` must pass.

## Acceptance criteria

A consumer can write the workflow below and have it run on an ARC RunnerScaleSet with a stock `docker:dind` sidecar, with no `sandbox.agent.mounts`, no `engine.env` overrides, no `pre-agent-steps`, and no custom DinD image:

```yaml
---
on: { workflow_dispatch: }
runs-on: arc-gaw
engine:
  id: copilot
  model: gpt-5.4
safe-outputs:
  threat-detection:
    runs-on: arc-gaw
tools:
  github: { toolsets: [all] }
  bash: true
---
# Hello agent
```

Both the agent job AND the threat-detection job must run end-to-end.
A workflow that successfully runs the agent but silently no-ops
threat-detection (today's behavior) does not satisfy this criterion.

## Labels

`area:awf`, `area:arc-dind`, `area:engine-copilot`, `type:bug`, `scope:cross-cutting`.

## Related
- #30838, #30840 — closed, fixed in v0.75.0 (bind-source split, squid log dir ownership)
- #33777 — open, unix-socket DOCKER_HOST from sibling daemon pod
- #28888 — closed, MCP gateway on ARC with DinD sidecar

The current issue is scoped to remaining gaps **after** the v0.75.0 fixes, with `tcp://` DOCKER_HOST + DinD-sidecar topology.

