Skip to content

fix(squid): chown bind-mounted log dirs to proxy user on startup#3532

Merged
lpcox merged 2 commits into
mainfrom
salmanmkc/fix-squid-daemon-side-ownership
May 21, 2026
Merged

fix(squid): chown bind-mounted log dirs to proxy user on startup#3532
lpcox merged 2 commits into
mainfrom
salmanmkc/fix-squid-daemon-side-ownership

Conversation

@salmanmkc
Copy link
Copy Markdown
Collaborator

@salmanmkc salmanmkc commented May 21, 2026

Bug Fix

What was the bug?

On split runner/Docker daemon filesystems (notably ARC + DinD), awf-squid exits with code 1 immediately on startup:

Container awf-squid Error  dependency squid-proxy failed to start
dependency failed to start: container awf-squid exited (1)

This blocks any agent run on those runners, even with gh-aw v0.74.7 (which pins this repo at v0.25.49 — already containing the path-prefix translation fix from #3218).

#3218 fixed the first half of this class of bug: making the runner and the daemon agree on which path the bind mount comes from. But the path-translation alone leaves a second symmetry gap: even when both sides resolve the same source path, the daemon-side directory has to be owned by the right user before the consuming container starts.

Why the existing fix is incomplete

config-writer.ts:56-75 creates the squid logs dir and chowns it to the proxy UID (13), then sets a chmod 0o777 fallback if chown fails. But this all runs against the runner's view of the filesystem. On a DinD setup, the runner sets /$workDir/squid-logs to 13:13, then docker compose up starts the daemon-side bind mount. The daemon's filesystem view is independent — the source path doesn't exist on the daemon side, so the daemon auto-creates it as root-owned (per the documented Docker bind-mount behavior: If you bind-mount a file or directory that does not yet exist on the Docker host, -v creates the endpoint for you. It is always created as a directory.).

The container then sees a root:root directory mounted over /var/log/squid, even though the Dockerfile pre-chowned that path to proxy. Squid (UID 13) tries to open /var/log/squid/access.logEACCES → exit 1. No stderr surfaces in the wrapper output because squid dies before completing its log subsystem init.

This is the same bug class as #3218 — asymmetric handling between the runner and the daemon — but one layer further down the stack. The two halves are independent: #3218 makes the path correct, this PR makes the ownership correct.

How did you fix it?

The compose squid service now runs its entrypoint as root (user: '0:0'), performs a chown preflight, then drops to the proxy user before the image's own entrypoint script runs. The preflight is:

chown proxy:proxy /var/log/squid && \
if [ -d /var/spool/squid_ssl_db ]; then chown proxy:proxy /var/spool/squid_ssl_db; fi && \
exec su -s /bin/bash proxy -c '... (decode config if injected) ... && exec /usr/local/bin/entrypoint.sh'

The image's existing entrypoint.sh (which handles the IPv6 listener strip and execs squid) is unchanged and still runs as the proxy user. Squid itself never runs as root; the root window is bounded to the two chown syscalls and the su exec.

A few subtleties:

  • Non-recursive chown. Only the bind-mount source dir's own ownership needs repair, not its (potentially user-supplied) contents. This keeps the preflight bounded and fast even when proxyLogsDir points at a large host directory.
  • su -s /bin/bash proxy rather than runuser or gosu. su is in util-linux and ships in the plain ubuntu base of the ubuntu/squid image; using it means this PR requires no rebuild of the squid container. Crucially, that keeps the change compatible with the older pinned squid=sha256:… shas that the integration and smoke tests use to validate the runtime stack against historical container artifacts.
  • SSL DB chown is guarded with if [ -d ... ] so it is a no-op when SSL Bump is disabled but engages automatically when it is enabled.
  • The chown preflight applies uniformly whether or not squid config content is injected, since the daemon-side ownership problem is independent of the config-injection mechanism.

Why this lives in the wrapper, not the squid image

A previous draft moved the chown + decode + privilege drop into containers/squid/entrypoint.sh (with gosu installed in the Dockerfile). That is the cleaner long-term architecture, but in this repo the integration and smoke tests pin a specific older squid SHA — a wrapper-only PR like this one can ship without bumping that pin. Once a new squid image is pinned, a follow-up PR can move this logic into the image's entrypoint and remove the wrapper-side override.

What is the impact?

  • Unblocks ARC + DinD users running gh-aw v0.74.7 who have --docker-host-path-prefix engaged. The awf-squid container starts cleanly; no exit code 1; no dependency squid-proxy failed to start cascade.
  • Behavior is unchanged on shared-filesystem runners — the chown is a no-op on dirs already owned by 13:13, and su -s /bin/bash proxy is a thin process hop.
  • The squid image is unchanged. No new pin to bump, no new image to build/push.

Testing

  • npx tsc --noEmit — clean.
  • npx eslint src/services/squid-service.ts src/services/squid-service.test.ts — 0 errors. (6 pre-existing any warnings in surrounding code, matching repo style — not introduced by this change.)
  • npx jest src/services/squid-service.test.ts — 5 passed, including 2 new regression tests:
    • should run squid container as root with a chown preflight that drops privileges — asserts user: '0:0', non-recursive chown proxy:proxy of both /var/log/squid and the conditional SSL DB, the privilege-drop via su -s /bin/bash proxy -c, the ordering (chown before su), and that chown -R is not used anywhere in the entrypoint.
    • should apply the chown preflight even when no squid config content is provided — covers the no-config branch.
  • npx jest (full unit suite) — 2005 of 2005 passed.

Caveats

  • This PR addresses the squid-specific failure mode. The same class of bug would in principle affect api-proxy and cli-proxy if they ever bind-mount directories that need to be owned by their own (non-root) users — but their containers use Alpine addgroup -S which assigns dynamic UIDs, so a future similar report would need the same treatment with the right UID per service. Out of scope here.
  • Does not cover the case where the bind-mount source path on the daemon side is on a read-only filesystem. If a future ARC variant runs the daemon with --read-only, the chown will fail with EROFS and squid will still exit 1 — but with a different log line that would make the new failure mode obvious.

Related

Copilot AI review requested due to automatic review settings May 21, 2026 16:14
@salmanmkc salmanmkc marked this pull request as draft May 21, 2026 16:15
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 21, 2026

✅ Coverage Check Passed

Overall Coverage

Metric Base PR Delta
Lines 95.98% 96.06% 📈 +0.08%
Statements 95.81% 95.88% 📈 +0.07%
Functions 97.36% 97.36% ➡️ +0.00%
Branches 89.48% 89.52% 📈 +0.04%
📁 Per-file Coverage Changes (1 files)
File Lines (Before → After) Statements (Before → After)
src/config-writer.ts 83.0% → 85.6% (+2.54%) 83.0% → 85.6% (+2.54%)

Coverage comparison generated by scripts/ci/compare-coverage.ts

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses Squid startup failures on split runner/Docker-daemon filesystems (e.g., ARC + DinD) by ensuring bind-mounted log (and optional SSL DB) directories are owned by Squid’s non-root proxy user before Squid starts.

Changes:

  • Start the squid-proxy compose service as root, run a chown preflight on bind-mounted directories, then drop privileges to proxy via runuser before invoking the image entrypoint.
  • Apply the same preflight path both when squid config is injected (via AWF_SQUID_CONFIG_B64) and when it is not.
  • Add regression tests asserting user: 0:0, preflight ordering, and uniform application of the privilege-drop wrapper.
Show a summary per file
File Description
src/services/squid-service.ts Adds root-run chown preflight + runuser privilege drop wrapper around Squid startup to fix DinD bind-mount ownership issues.
src/services/squid-service.test.ts Adds regression tests to validate the new root preflight + privilege-drop behavior in both config and no-config branches.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment thread src/services/squid-service.ts Outdated
Comment on lines +96 to +113
// Pre-flight chown as root: ensure the bind-mounted log directory (and SSL
// database, when SSL Bump is enabled) are writable by the proxy user (UID 13).
// On split runner/Docker daemon filesystems (e.g. ARC + DinD), Docker
// auto-creates missing bind-mount source paths on the daemon side as
// root-owned, even though the runner pre-chowned its own view of the source
// to UID 13. The bind-mounted (root-owned) directory then overrides the
// Dockerfile-baked /var/log/squid (proxy-owned), and squid exits 1 the first
// time it tries to open access.log. This preflight runs as root, fixes the
// ownership in place, then drops to the proxy user before anything else
// (config decode, sed, squid itself) runs. On shared-filesystem runners this
// is a no-op because the runner already chowned the source dirs to 13:13 in
// config-writer.ts. runuser preserves env vars by default so AWF_SQUID_CONFIG_B64
// remains visible to the dropped-privilege shell.
const SQUID_PROXY_UID = 13;
const SQUID_PROXY_GID = 13;
const chownPreflight =
`chown -R ${SQUID_PROXY_UID}:${SQUID_PROXY_GID} /var/log/squid && ` +
`if [ -d /var/spool/squid_ssl_db ]; then chown -R ${SQUID_PROXY_UID}:${SQUID_PROXY_GID} /var/spool/squid_ssl_db; fi`;
@github-actions

This comment has been minimized.

@lpcox
Copy link
Copy Markdown
Collaborator

lpcox commented May 21, 2026

@copilot address the review feedback

On split runner/Docker daemon filesystems (notably ARC + DinD), Docker
auto-creates missing bind-mount source paths on the daemon side as
root-owned. That bind-mount then overrides the Dockerfile-baked
/var/log/squid (proxy-owned), and squid (UID 13) exits 1 the first
time it tries to open access.log. The wrapper's runner-side mkdir +
chown to UID 13 in config-writer.ts only applies to the runner's view
of the source path — the daemon's view starts empty and is auto-
created with root ownership.

#3218 fixed the path-translation half of this class of bug. This change
covers the second half: even when the path is correct, the daemon-side
ownership has to be repaired before squid starts.

The compose service builder now runs the squid container's entrypoint
as root (user: '0:0'), performs a non-recursive chown of /var/log/squid
(and /var/spool/squid_ssl_db when present) back to the proxy user, and
drops to the proxy user via 'su -s /bin/bash proxy -c ...' before the
image's own entrypoint runs. The image's entrypoint then handles the
existing IPv6 listener strip and execs squid itself, all as proxy.
Squid never runs as root; the root window is bounded to the two chown
syscalls and the su exec.

The chown is non-recursive: only the bind-mount source dir's own
ownership needs repair, not its (potentially user-supplied) contents.
This keeps the preflight bounded and fast even when proxyLogsDir
points at a large host directory.

su was chosen over runuser/gosu because the squid base image is plain
ubuntu (util-linux ships in the base), so su is universally available
without requiring an image rebuild. Doing the fix in the wrapper
instead of in the squid image keeps it compatible with the older pinned
squid shas that the integration and smoke tests use to validate the
runtime stack against historical container artifacts.

Tests:
- Existing 'inject squid config via base64 env var' test still passes:
  the new entrypoint still emits the base64-decode, just wrapped after
  the chown and inside the su privilege drop.
- New 'chown preflight + drop privileges' test asserts user '0:0',
  non-recursive chown of /var/log/squid and the conditional SSL DB
  chown, the ordering (chown before su), and that 'chown -R' is not
  used anywhere in the entrypoint.
- New 'chown preflight applies without config' test covers the
  no-config branch.
- All 2005 unit tests across the repo pass.
@salmanmkc salmanmkc force-pushed the salmanmkc/fix-squid-daemon-side-ownership branch from 9bedf2c to fad4b95 Compare May 21, 2026 16:55
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI commented May 21, 2026

@copilot address the review feedback

Done in the latest commit. chownPreflight now uses chown ... 2>/dev/null || chmod 0777 ... for both the log dir and the SSL DB dir, matching the fallback pattern already in config-writer.ts. The chown was already non-recursive (no -R). The test assertion for the SSL DB line was updated to match the new tolerant pattern.

@github-actions

This comment has been minimized.

Copilot finished work on behalf of lpcox May 21, 2026 17:01
Copilot AI requested a review from lpcox May 21, 2026 17:01
@lpcox lpcox marked this pull request as ready for review May 21, 2026 17:21
@github-actions
Copy link
Copy Markdown
Contributor

✅ Smoke Test Results

  • ✅ GitHub API: 2 recent PRs verified
  • ✅ Playwright check: PASS
  • ✅ File verification: Smoke test passed

Overall: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions
Copy link
Copy Markdown
Contributor

🔬 Smoke Test: Copilot BYOK (Offline) Mode

Test Result
GitHub MCP connectivity
GitHub.com HTTP ⚠️ pre-step vars unresolved
File write/read ⚠️ pre-step vars unresolved
BYOK inference (agent → api-proxy → api.githubcopilot.com)

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com.

PR by @salmanmkc · assignees: @lpcox, @Copilot

Overall: PARTIAL — BYOK inference path confirmed working; pre-step smoke data vars were not substituted at runtime.

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions
Copy link
Copy Markdown
Contributor

🔥 Smoke Test Results

Test Status
GitHub MCP (list PRs)
GitHub.com connectivity
File write/read

Overall: PASS

PR: fix(squid): chown bind-mounted log dirs to proxy user on startup — author @salmanmkc, assignees @lpcox @Copilot

📰 BREAKING: Report filed by Smoke Copilot

@github-actions
Copy link
Copy Markdown
Contributor

Smoke Codex: FAIL
PRs: Optimize Smoke Claude workflow token footprint by removing Playwright and constraining turns; Reduce test-coverage-improver token burn by narrowing file access and preloading target sources
✅ GitHub PR review; ❌ safeinputs-gh (tool missing)
✅ Playwright title; ❌ Tavily search (tool unavailable)
✅ File/bash; ❌ discussion query (tool missing)
✅ npm ci && npm run build
Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions
Copy link
Copy Markdown
Contributor

Smoke Test Report

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@github-actions
Copy link
Copy Markdown
Contributor

Smoke Test Results

Check Result
Redis PING ❌ Timeout/no response
PostgreSQL pg_isready ❌ No response
PostgreSQL SELECT 1 ❌ No response

Overall: FAILhost.docker.internal is unreachable from this runner environment. Service containers may not be running or the host alias is not resolvable.

🔌 Service connectivity validated by Smoke Services

@github-actions
Copy link
Copy Markdown
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color ok ✅ PASS
Go env ok ✅ PASS
Go uuid ok ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx passed ✅ PASS
Node.js execa passed ✅ PASS
Node.js p-limit passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Note (Java): ~/.m2 directory was root-owned, preventing Maven from creating the default local repository. Worked around by using -s with a custom settings.xml pointing to /tmp/gh-aw/agent/m2-repo as the local repository path.

Generated by Build Test Suite for issue #3532 · ● 5.7M ·

@github-actions
Copy link
Copy Markdown
Contributor

Chroot Smoke Test Results

Runtime Host Version Chroot Version Match?
Python Python 3.12.13 Python 3.12.3
Node.js v24.15.0 v22.22.3
Go go1.22.12 go1.22.12

Result: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

@lpcox lpcox merged commit 111aa0d into main May 21, 2026
66 of 68 checks passed
@lpcox lpcox deleted the salmanmkc/fix-squid-daemon-side-ownership branch May 21, 2026 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants