fix(docker): apply docker-host-path-prefix to all compose service mounts #3218

Merged
lpcox merged 1 commit into main from fix/iptables-init-docker-host-path-prefix
May 15, 2026

@salmanmkc
Collaborator

@salmanmkc salmanmkc commented May 15, 2026

Bug Fix

What was the bug?

On split runner/Docker daemon filesystems (notably ARC + DinD), enabling --docker-host-path-prefix produced two related failure classes:

1. iptables-init handshake timeout (the reported symptom):

[entrypoint][ERROR] Timed out waiting for iptables init container after 30s
[entrypoint] No init container output log found

The agent and awf-iptables-init containers coordinate via a shared bind-mounted init-signal directory:

  • The init container runs setup-iptables.sh > /tmp/awf-init/output.log 2>&1 && touch /tmp/awf-init/ready.
  • The agent's containers/agent/entrypoint.sh polls for /tmp/awf-init/ready (timeout 30s) and falls back to printing /tmp/awf-init/output.log on failure.

buildAgentVolumes() runs every agent-side mount through translateBindMountHostPath(mount, dockerHostPathPrefix) so the init-signal source becomes daemon-resolvable (e.g. /host/<workDir>/init-signal). buildIptablesInitService() did not apply the same translation, so once --docker-host-path-prefix was set the two containers bound to two different daemon-side directories. The init container could complete setup-iptables.sh successfully and the agent would still time out.
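
The mismatch can be reproduced in a few lines. This is a minimal sketch only: `translate` is a hypothetical stand-in for translateBindMountHostPath (its real body is not shown in this thread), and the work directory `/tmp/awf-work` is an invented example path.

```typescript
// Hypothetical stand-in for translateBindMountHostPath; assumes "source:target[:mode]" strings.
function translate(mount: string, prefix?: string): string {
  const p = (prefix ?? "").trim().replace(/\/+$/, "");
  // No usable prefix, or not an absolute host path: leave the mount untouched.
  if (p === "" || !mount.startsWith("/")) return mount;
  return p + mount;
}

// Illustrative init-signal mount; the host path is an assumption.
const initSignal = "/tmp/awf-work/init-signal:/tmp/awf-init:rw";
const prefix = "/host";

// Agent side: always translated (before and after the fix).
const agentMount = translate(initSignal, prefix);
// iptables-init side before the fix: translation skipped.
const initMountBefore = initSignal;
// iptables-init side after the fix: same translation as the agent.
const initMountAfter = translate(initSignal, prefix);

// Extract the host-side source of a bind mount.
const sourceOf = (m: string): string => m.split(":")[0] ?? "";

console.log("agent source:      ", sourceOf(agentMount));
console.log("init source before:", sourceOf(initMountBefore));
console.log("init source after: ", sourceOf(initMountAfter));
console.log("symmetric before fix?", sourceOf(agentMount) === sourceOf(initMountBefore));
console.log("symmetric after fix? ", sourceOf(agentMount) === sourceOf(initMountAfter));
```

With different sources the daemon binds two unrelated directories, so the ready file written by the init container never appears in the agent's /tmp/awf-init.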

2. The same gap existed in every other compose service builder:

  • buildSquidService — squid logs, SSL cert, SSL key, SSL DB
  • buildApiProxyService — api-proxy logs
  • buildCliProxyService — cli-proxy logs and the optional DIFC CA cert mount

Their bind-mount sources were never run through the prefix translation, so on ARC + DinD their logs would silently land in daemon-local directories and the optional file mounts could fail when Docker auto-creates a directory at the unstaged source path.

How did you fix it?

  • Shared host-path-prefix module

    • Extracted normalizeDockerHostPathPrefix, translateBindMountHostPath, and a new applyHostPathPrefixToVolumes(volumes, prefix) array helper into src/services/host-path-prefix.ts. The translation logic is unchanged.
    • agent-volumes.ts re-exports the helpers for backwards compatibility and delegates its existing prefix block to the shared helper (no behavior change for agent volumes).
  • Symmetric translation across the compose stack

    • buildIptablesInitService, buildSquidService, buildApiProxyService, and buildCliProxyService now call applyHostPathPrefixToVolumes(volumes, config.dockerHostPathPrefix) at the end of their volume list construction.
    • buildDohProxyService has no bind mounts and is unchanged.
    • The agent and iptables-init containers now always bind the same daemon-side init-signal directory (the original reported regression).
    • Sibling-service log directories and optional file mounts are now daemon-resolvable when --docker-host-path-prefix is set.
  • Behavioral coverage

    • should mount init-signal dir without translation when dockerHostPathPrefix is unset — confirms the default unprefixed behavior is preserved.
    • should apply dockerHostPathPrefix to the iptables-init init-signal volume — regression test for the original bug, including a cross-check that the agent and iptables-init mount sources stay symmetric.
    • should normalize trailing slash in dockerHostPathPrefix for iptables-init mount — mirrors the existing trailing-slash test for agent volumes.
    • Parameterized symmetric invariant test — walks every bind mount on every compose service and asserts the prefix is applied uniformly when set (/host, /host/) and skipped otherwise (unset, empty, whitespace). This is what protects future service builders from the same class of asymmetric translation bug.
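
For readers without the diff open, the shared module might look roughly like the sketch below. The three function names come from this PR, but the bodies are assumptions: the sketch assumes compose short-syntax volume strings ("source:target[:mode]"), treats only absolute sources as bind mounts, and uses invented example paths.

```typescript
// Hypothetical sketch of src/services/host-path-prefix.ts; bodies are assumed, not copied.

/** Trim whitespace and trailing slashes; blank or missing input means "no prefix". */
export function normalizeDockerHostPathPrefix(prefix?: string): string | undefined {
  const trimmed = (prefix ?? "").trim().replace(/\/+$/, "");
  return trimmed.length > 0 ? trimmed : undefined;
}

/** Prefix the host-side source of a single "source:target[:mode]" bind mount. */
export function translateBindMountHostPath(mount: string, prefix?: string): string {
  const normalized = normalizeDockerHostPathPrefix(prefix);
  if (!normalized) return mount;
  const sep = mount.indexOf(":");
  if (sep <= 0) return mount; // not a source:target pair
  const source = mount.slice(0, sep);
  // Assumption: named volumes (non-absolute sources) are never translated.
  if (!source.startsWith("/")) return mount;
  return `${normalized}${source}${mount.slice(sep)}`;
}

/** Array helper each service builder calls at the end of volume construction. */
export function applyHostPathPrefixToVolumes(volumes: string[], prefix?: string): string[] {
  const normalized = normalizeDockerHostPathPrefix(prefix);
  if (!normalized) return volumes; // early return keeps default behavior untouched
  return volumes.map((v) => translateBindMountHostPath(v, normalized));
}

// Usage with invented paths: a bind mount and a named volume.
const example = ["/tmp/awf-work/squid-logs:/var/log/squid:rw", "cache-vol:/cache"];
console.log(applyHostPathPrefixToVolumes(example, "/host/"));
console.log(applyHostPathPrefixToVolumes(example, "   ") === example);
```

Normalizing once inside the array helper means "/host" and "/host/" produce identical mounts, which is the property the trailing-slash tests check.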

What is the impact?

  • Unblocks ARC + DinD users running with --docker-host-path-prefix /host: they no longer hit the 30s init handshake timeout, and squid/api-proxy/cli-proxy bind-mount sources now resolve in the daemon namespace.
  • Behavior is unchanged when dockerHostPathPrefix is unset (preserved by the early return in applyHostPathPrefixToVolumes).
  • Default unix-socket DOCKER_HOST users see no functional change.

Testing

Local:

  • npx tsc --noEmit — clean.
  • npx eslint on every touched file — 0 errors (only pre-existing warnings that match nearby code style).
  • npx jest src/services/ — 295 passed.
  • Full unit suite — 1848 passed (5 new param cases).

CI (post-push): 62 of 65 checks green. The 3 failures all reproduce on main and are unrelated to this change:

  • Audit Docs Site Package — npm audit CVEs in docs-site/node_modules (mermaid/postcss/uuid/vite/yaml).
  • Test Coverage Report: scripts/ci/workflow-engine-install-security.test.ts "Claude Code CLI installs must include --ignore-scripts".
  • Smoke OpenCode — "No safe outputs were invoked"; failing on main for 4+ consecutive scheduled runs.

@salmanmkc salmanmkc changed the title from "fix(agent): apply docker-host-path-prefix to iptables-init init-signal volume" to "fix(docker): apply docker-host-path-prefix to iptables-init init-signal volume" May 15, 2026
@salmanmkc salmanmkc marked this pull request as ready for review May 15, 2026 14:18
Copilot AI review requested due to automatic review settings May 15, 2026 14:18
@salmanmkc salmanmkc requested a review from Mossaka as a code owner May 15, 2026 14:18
Contributor

Copilot AI left a comment

Pull request overview

Fixes a DinD/split-filesystem bind-mount mismatch where --docker-host-path-prefix was applied to the agent’s init-signal mount but not to the iptables-init service, causing the agent to time out waiting for the /tmp/awf-init/ready handshake file.

Changes:

  • Exported normalizeDockerHostPathPrefix / translateBindMountHostPath from agent-volumes.ts for reuse.
  • Updated buildIptablesInitService() to apply the same host-path prefix translation to the init-signal bind mount as buildAgentVolumes() does.
  • Wired dockerHostPathPrefix through compose generation and added regression tests ensuring agent + init container mount-source symmetry (including trailing-slash normalization).

File | Description
src/services/agent-volumes.ts | Exports the existing docker-host-path-prefix normalization/translation helpers for reuse.
src/services/agent-service.ts | Applies docker-host-path-prefix translation to the iptables-init init-signal volume so it matches the agent’s mount source.
src/services/agent-service.test.ts | Adds coverage for default behavior, prefixed behavior, and trailing-slash normalization for the iptables-init init-signal volume.
src/compose-generator.ts | Passes config.dockerHostPathPrefix into buildIptablesInitService().

Copilot's findings

  • Files reviewed: 4/4 changed files
  • Comments generated: 0

@lpcox
Collaborator

lpcox commented May 15, 2026

@copilot address review feedback

Contributor

Copilot AI commented May 15, 2026

@copilot address review feedback

The Copilot reviewer found 0 issues across all 4 changed files.

All CI failures are pre-existing and unrelated to this change:

  • Test Coverage: workflow-engine-install-security.test.ts failure (Claude CLI install missing --ignore-scripts); reproduces on main.
  • Dependency Vulnerability Audit — 23 vulnerabilities in the docs-site packages; pre-existing.
  • Smoke OpenCode — "No safe outputs were invoked"; agentic workflow smoke test, unrelated to the bind-mount fix.
  • Smoke Copilot BYOK: getaddrinfo EAI_AGAIN DNS failure during artifact upload; transient infrastructure issue.

No code changes are needed.

@salmanmkc
Collaborator Author

Same gap elsewhere: buildSquidService, buildApiProxyService, and buildCliProxyService all create unprefixed bind mounts (logs, SSL cert/key, DIFC CA cert). Under ARC/DinD their outputs silently go to daemon-local dirs, and optional file mounts can fail when Docker auto-creates a directory where a file is expected. My PR fixes only the iptables-init handshake; the next layer of pain after a user gets past the timeout will be missing logs and possibly broken SSL bump / DIFC.

Wondering if I should do this in this PR.

Copilot finished work on behalf of lpcox May 15, 2026 14:33
Copilot AI requested a review from lpcox May 15, 2026 14:33
@salmanmkc
Collaborator Author

i'm gonna expand scope

@salmanmkc salmanmkc force-pushed the fix/iptables-init-docker-host-path-prefix branch from 306d670 to b72cd31 Compare May 15, 2026 14:38
@salmanmkc salmanmkc changed the title from "fix(docker): apply docker-host-path-prefix to iptables-init init-signal volume" to "fix(docker): apply docker-host-path-prefix to all compose service mounts" May 15, 2026

The agent and iptables-init containers coordinate via a shared bind-mounted
init-signal directory at /tmp/awf-init. The iptables-init container writes
ready/output.log there after running setup-iptables.sh, and the agent's
entrypoint waits for those files before continuing.

buildAgentVolumes() applies dockerHostPathPrefix to its mount sources so the
agent's /tmp/awf-init bind is daemon-resolvable on split runner/Docker daemon
filesystems (e.g. ARC + DinD). buildIptablesInitService() did not, so once
--docker-host-path-prefix was set the two containers bound to two different
daemon-side directories. The init container could complete successfully and
the agent would still time out after 30s with 'No init container output log
found' because its bind target stayed empty.

The same gap existed in the squid, api-proxy, and cli-proxy service builders:
their bind-mount sources (squid logs, SSL cert/key/db, api-proxy logs,
cli-proxy logs, optional DIFC CA cert) were never run through the prefix
translation, so on ARC/DinD their logs would land in daemon-local directories
and optional file mounts could fail when Docker auto-creates a directory at
the unstaged source path.

Extract normalize/translate/applyHostPathPrefixToVolumes into a shared
host-path-prefix module and call applyHostPathPrefixToVolumes() at the end
of every service builder's volume list construction. agent-volumes.ts
delegates to the shared helper and re-exports the helpers for backwards
compatibility. doh-proxy has no bind mounts and is unchanged.

Add a parameterized symmetric invariant test that walks every bind mount
on every compose service and asserts the prefix is applied uniformly when
set (and skipped otherwise), so any future service builder is protected
against the same class of asymmetric translation bug.
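
The invariant described in the last paragraph can be sketched without a test framework. Everything below is illustrative: `applyPrefix` and `buildServices` are hypothetical miniatures of the shared helper and the compose generator, and the host paths are invented. The real test is parameterized in the repository's unit suite.

```typescript
type ComposeService = { volumes?: string[] };
type ComposeServices = Record<string, ComposeService>;

// Hypothetical stand-in for applyHostPathPrefixToVolumes.
function applyPrefix(volumes: string[], prefix?: string): string[] {
  const p = (prefix ?? "").trim().replace(/\/+$/, "");
  if (p === "") return volumes; // unset, empty, or whitespace: skip translation
  return volumes.map((v) => (v.startsWith("/") ? p + v : v));
}

// Hypothetical miniature of the compose generator; paths are assumptions.
function buildServices(prefix?: string): ComposeServices {
  const signal = "/tmp/awf-work/init-signal:/tmp/awf-init:rw";
  return {
    agent: { volumes: applyPrefix([signal], prefix) },
    "iptables-init": { volumes: applyPrefix([signal], prefix) },
    squid: { volumes: applyPrefix(["/tmp/awf-work/squid-logs:/var/log/squid:rw"], prefix) },
    "doh-proxy": {}, // no bind mounts
  };
}

// Walk every bind mount on every service and collect sources missing the prefix.
function findUntranslatedMounts(services: ComposeServices, expectedPrefix: string): string[] {
  const offenders: string[] = [];
  for (const name of Object.keys(services)) {
    for (const volume of services[name]?.volumes ?? []) {
      const source = volume.split(":")[0] ?? "";
      if (!source.startsWith("/")) continue; // named volume, not a bind mount
      if (!source.startsWith(expectedPrefix + "/")) offenders.push(`${name}: ${volume}`);
    }
  }
  return offenders;
}

// Prefix set: no service may keep an untranslated source.
for (const prefix of ["/host", "/host/"]) {
  console.log(JSON.stringify(prefix), findUntranslatedMounts(buildServices(prefix), "/host"));
}

// Prefix unset, empty, or whitespace: output must equal the unprefixed baseline.
const baseline = JSON.stringify(buildServices(undefined));
for (const prefix of [undefined, "", "   "]) {
  console.log(JSON.stringify(prefix), JSON.stringify(buildServices(prefix)) === baseline);
}
```

Because the check iterates over whatever services the generator returns, a newly added builder that forgets the helper would show up as an offender without anyone writing a dedicated test for it.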

@salmanmkc salmanmkc force-pushed the fix/iptables-init-docker-host-path-prefix branch from b72cd31 to 4144266 Compare May 15, 2026 14:41
@salmanmkc
Collaborator Author

done, ready for review again

@github-actions
Contributor

Smoke Test: Copilot BYOK (Offline) Mode

Test Result
GitHub MCP connectivity ❌ (401 - GitHub MCP not authenticated in this environment)
GitHub.com HTTP connectivity ✅ (pre-step passed)
File write/read (/tmp/gh-aw/agent/smoke-test-copilot-byok-25923911456.txt)
BYOK inference (agent → api-proxy → api.githubcopilot.com)

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com.

Overall: PASS (GitHub MCP auth limitation is environment infra, not BYOK path)

Author: @salmanmkc

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions
Contributor

Smoke Test Results

Test Result
GitHub API (merged PRs) ❌ FAILED
Playwright (github.com title) ✅ PASSED
File verification ✅ PASSED

Overall: FAILED (2/3 passed)

Details:

  • GitHub API: Authentication error (HTTP 401) — gh CLI credentials unavailable in sandbox
  • Playwright: Successfully navigated to github.com, confirmed page title contains 'GitHub'
  • File: Verified smoke-test file exists at expected path with correct content

💥 [THE END] — Illustrated by Smoke Claude

@github-actions
Contributor

🔍 Smoke Test Results

Test Status
GitHub MCP connectivity ❌ 401 Bad credentials
GitHub.com HTTP connectivity ❌ Template vars not expanded
File write/read ❌ Template vars not expanded

Overall: FAIL

The workflow did not expand ${{ steps.smoke-data.outputs.* }} template variables before passing them to the agent. The pre-computed test data was unavailable.

/cc @salmanmkc

📰 BREAKING: Report filed by Smoke Copilot

@github-actions
Contributor

Smoke Test Results

  • GitHub MCP Testing: ❌ (Tools missing)
  • GitHub.com Connectivity: ❌ (SSL error 35)
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions
Contributor

refactor: [Export Audit] Remove test-only re-exports from barrel modules
feat: auto-forward OTEL_* env vars with one-shot token protection for headers
GitHub PR review: ✅
SafeInputs GH CLI: ❌
Playwright title: ✅
Tavily search: ❌
File/bash/build: ✅
Discussion: ❌
Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions
Contributor

Chroot Version Comparison Results

Runtime Host Version Chroot Version Match?
Python Python 3.12.13 Python 3.12.3
Node.js v24.15.0 v20.20.2
Go go1.22.12 go1.22.12

Result: Not all tests passed — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

@github-actions
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #3218 · ● 4.1M ·

@github-actions
Contributor

Smoke Test Results

  • Redis PING: ❌ (timeout — no response on host.docker.internal:6379)
  • PostgreSQL pg_isready: ❌ (no response on host.docker.internal:5432)
  • PostgreSQL SELECT 1: ❌ (skipped — host unreachable)

Overall: FAIL — service containers not reachable from this runner environment.

🔌 Service connectivity validated by Smoke Services

@lpcox
Collaborator

lpcox commented May 15, 2026

@salmanmkc thank you for all the bug fixes, but it's preferable that you file issues rather than PRs. we try to be very responsive and schedule them as soon as we can. thanks!

@salmanmkc
Collaborator Author

@salmanmkc thank you for all the bug fixes, but it's preferable that you file issues rather than PRs. we try to be very responsive and schedule them as soon as we can. thanks!

no worries, but even the code review agent didn't catch that this issue would happen later, so unsure if an issue is really useful in this case.

@salmanmkc
Collaborator Author

specifically, the need to expand the scope

@lpcox lpcox merged commit 385ec4d into main May 15, 2026
65 of 68 checks passed
@lpcox lpcox deleted the fix/iptables-init-docker-host-path-prefix branch May 15, 2026 14:59