feat(dind): gate docker run --privileged / --cap-add behind config #74

Open
luthermonson wants to merge 4 commits into main from feat/dind-privileged-gate

Summary

Adds [dind] allow_privileged to control whether sibling containers created through the fake docker daemon may opt into the full elevation stack (Privileged=true, all caps, all devices, seccomp+apparmor unconfined, writable sysfs/cgroupfs). Requests with HostConfig.Privileged=true or HostConfig.CapAdd are rejected with HTTP 403 before any image pull when the gate is closed.

Threat model

A privileged sibling container is effectively root on whatever host runs the containerd that backs dind:

  • Windows / macOS — that backing containerd lives in a managed Linux VM (WSL2 / Hyper-V on Windows, Vz on macOS). A privileged escape stays inside the VM, not the bare metal.
  • Linux host — ephemerd runs directly on the host with no VM fence. A privileged escape is bare-metal-host compromise.

The runner container itself (pkg/runtime/runtime.go) was already locked down to a minimum capability set + seccomp default. This PR closes the same hole for the dind-spawned siblings.

Default policy

The TOML key is a *bool so missing vs. explicit-false are distinguishable. ResolvedAllowPrivileged() picks the default:

| Host GOOS | Default | Reasoning                      |
|-----------|---------|--------------------------------|
| windows   | true    | VM fence present (WSL2/Hyper-V) |
| darwin    | true    | VM fence present (Vz)          |
| linux     | false   | No fence; opt-in only          |

Operators can override explicitly:

[dind]
allow_privileged = false  # deny on every host (e.g. shared CI)
# or
allow_privileged = true   # allow on Linux when workloads are trusted (KIND testing, etc.)
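
The resolution rule above can be sketched in Go as follows. This is a minimal illustration, not the PR's actual code: `resolveFor`, `defaultAllowPrivileged`, and `boolPtr` are hypothetical helpers, the `toml` struct tag is an assumption, and denying on any GOOS other than windows/darwin is also an assumption (the table only covers three platforms).

```go
package main

import (
	"fmt"
	"runtime"
)

// DindConfig mirrors the [dind] table. AllowPrivileged is a pointer so a
// missing key (nil) can be told apart from an explicit false.
// The struct tag is an assumption about how the key is decoded.
type DindConfig struct {
	AllowPrivileged *bool `toml:"allow_privileged"`
}

// defaultAllowPrivileged (hypothetical helper) encodes the per-platform
// defaults from the table. Any GOOS not listed there is denied here,
// which is an assumption of this sketch.
func defaultAllowPrivileged(goos string) bool {
	switch goos {
	case "windows", "darwin":
		return true // backing containerd sits inside a managed VM
	default:
		return false // no VM fence: opt-in only
	}
}

// resolveFor (hypothetical helper) takes the GOOS as a parameter so the
// rule can be exercised for every platform from a single host.
func (c DindConfig) resolveFor(goos string) bool {
	if c.AllowPrivileged != nil {
		return *c.AllowPrivileged // explicit operator choice always wins
	}
	return defaultAllowPrivileged(goos)
}

// ResolvedAllowPrivileged applies the rule to the host the binary runs on.
func (c DindConfig) ResolvedAllowPrivileged() bool {
	return c.resolveFor(runtime.GOOS)
}

// boolPtr (hypothetical helper) builds a *bool for explicit settings.
func boolPtr(b bool) *bool { return &b }

func main() {
	fmt.Println(DindConfig{}.resolveFor("linux"))   // false: key omitted, Linux default
	fmt.Println(DindConfig{}.resolveFor("windows")) // true: key omitted, Windows default
	// Explicit false overrides the permissive Windows default.
	fmt.Println(DindConfig{AllowPrivileged: boolPtr(false)}.resolveFor("windows")) // false
}
```

Passing the GOOS in as an argument is only a testing convenience in this sketch; it lets one host check all three rows of the table.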

Implementation

  • pkg/config/config.go: new DindConfig.AllowPrivileged *bool + platform-aware ResolvedAllowPrivileged().
  • pkg/dind/dind.go: Server.allowPrivileged field + Config.AllowPrivileged plumbing; fixed misleading package doc that previously claimed dind never spawns privileged containers.
  • pkg/dind/containers.go: early 403 short-circuit in handleContainerCreate for both Privileged=true and CapAdd != nil. Runs before image pull so we don't burn bandwidth on rejected requests.
  • pkg/runtime/runtime.go: Config.DindAllowPrivileged forwarded into each per-job dind.Server.
  • cmd/ephemerd/main.go: passes cfg.Dind.ResolvedAllowPrivileged() at both runtime.New sites.
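
As a rough illustration of the early 403 short-circuit, the sketch below rejects gated requests before a stand-in image pull runs. It is not the code in pkg/dind/containers.go: the `server`, `hostConfig`, and `createRequest` types, the `pulls` counter, the `create` helper, and the error text are all hypothetical and model only the fields the gate inspects.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// hostConfig and createRequest (hypothetical) model only what the gate reads.
type hostConfig struct {
	Privileged bool
	CapAdd     []string
}

type createRequest struct {
	Image      string
	HostConfig *hostConfig
}

// server is a stand-in for the per-job dind server. pulls counts how many
// times the placeholder image pull ran.
type server struct {
	allowPrivileged bool
	pulls           int
}

func (s *server) handleContainerCreate(w http.ResponseWriter, r *http.Request) {
	var req createRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}
	// Gate first: a closed gate rejects Privileged=true or any CapAdd
	// before the image pull is attempted.
	hc := req.HostConfig
	if !s.allowPrivileged && hc != nil && (hc.Privileged || hc.CapAdd != nil) {
		http.Error(w, "privileged containers are disabled; set [dind] allow_privileged = true",
			http.StatusForbidden)
		return
	}
	s.pulls++ // placeholder for the real image pull
	w.WriteHeader(http.StatusCreated)
}

// create posts body to the handler in-process and returns the HTTP status.
func create(s *server, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/containers/create", strings.NewReader(body))
	rec := httptest.NewRecorder()
	s.handleContainerCreate(rec, req)
	return rec.Code
}

func main() {
	s := &server{allowPrivileged: false}
	fmt.Println(create(s, `{"Image":"alpine","HostConfig":{"Privileged":true}}`), s.pulls)     // 403 0
	fmt.Println(create(s, `{"Image":"alpine","HostConfig":{"CapAdd":["NET_ADMIN"]}}`), s.pulls) // 403 0
	fmt.Println(create(s, `{"Image":"alpine"}`), s.pulls)                                      // 201 1
}
```

The pull counter stays at zero for both rejected requests, which is the property the short-circuit is meant to guarantee.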

Test plan

  • go test ./pkg/config/... — 5 new tests: platform default, explicit-true/false override, omitted TOML key falls through to platform default, explicit-false on a non-Linux host overrides the default.
  • go test ./pkg/dind/... (compile-checked under GOOS=linux) — 5 new tests: gate-closed rejects Privileged=true with 403 and useful body, gate-closed rejects CapAdd, non-privileged request not gated, nil HostConfig not gated, gate-open passes Privileged through.
  • GOOS=linux ./bin/golangci-lint run ./... → 0 issues.
  • mage build:windows produces a clean 700 MB binary.
  • After merge: confirm the existing KIND-on-Windows-host e2e (which depends on the gate defaulting to true on Windows) still passes — should be a no-op behavior change on this host.

Local CI notes

mage test on Windows hosts trips the documented miekg/pkcs11 cgo failure for packages that transitively import it (pkg/dind is one). GOOS=linux go test -run xxx ./pkg/dind/... compiles clean — the GHA Linux runner will execute the new tests normally.

Adds [dind] allow_privileged to gate the elevation stack a sibling
container can request via the fake docker daemon (Privileged=true,
all caps, all devices, seccomp/apparmor off, writable sysfs/cgroupfs).
Requests carrying HostConfig.Privileged=true or HostConfig.CapAdd are
short-circuited with HTTP 403 when the gate is closed, before any
image pull.

Default policy is platform-aware:

  * Windows / macOS host  → allowed. The dind backing containerd
    runs inside a managed VM (WSL2 / Hyper-V / Vz), so an escape
    stays inside that VM. KIND clusters and other workloads that
    need real privileged continue to work without config changes.
  * Linux host            → denied. ephemerd runs directly on the
    host with no VM fence; a privileged escape is bare-metal-host
    compromise. Operators that trust their workloads can opt in
    with `allow_privileged = true`.

The TOML key uses a *bool so an empty config block is distinguishable
from an explicit `allow_privileged = false`. Both runtime.Config and
the per-job dind.Server gain an AllowPrivileged field; main.go threads
cfg.Dind.ResolvedAllowPrivileged() through at both runtime.New sites.

Also corrects the package doc comment that previously claimed dind
never produces privileged containers — that was true once, isn't now.

Local CI: lint and tests pass cross-compiled for GOOS=linux. `mage
test` on Windows still hits the documented pkcs11/ocicrypt cgo build
failure for pkg/dind on Windows hosts; GOOS=linux go test -run xxx
compiles the package clean.
…ests

The previous tests built a Server with `&client.Client{}` (zero-value)
to satisfy the early nil-check on handleContainerCreate's client.
That works for the 403 short-circuit cases — the gate fires before
the client is touched — but the "not gated" tests proceed past the
gate and call GetImage on the zero-value client, which has no gRPC
ClientConn and panics. CI caught this in
TestHandleContainerCreate_NonPrivilegedRequestNotGated.

Extract the gate decision into a pure function checkPrivilegedGate
that takes the flag and the parsed HostConfig and returns the
rejection message + blocked flag. The handler becomes a one-liner;
the tests for the "not gated" cases call the pure function directly
and don't need a Server or client at all. The 403 tests still
exercise the full handler since that path is self-contained.

Reordering: the gate is a request-shape validation, the nil-client check
is a runtime-state check; check shape before state. CI caught this —
the 403 tests construct a Server without a client (because the gate
doesn't need one) and the original ordering returned 500 first.

Also makes the operator-facing behavior more useful: a misconfigured
deploy that lost its containerd client still rejects privileged
requests with the actionable 403 rather than a generic 500.

After moving the request-shape validation ahead of the runtime
nil-client check (so the privileged gate runs without a client), a missing
Image field correctly returns 400 from the new ordering instead of
500 from the old nil-client-first ordering. The test name was always
"NoImage" — 400 is the right answer; the old 500 assertion was an
implementation-detail artifact.
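
The pure gate function and the shape-before-state ordering described in the commit messages above might look roughly like this. It is a sketch under assumptions, not the PR's code: only the name checkPrivilegedGate and its inputs and outputs come from the commit text. The types, the messages, the `containerdClient` stand-in, the `statusFor` helper, and the relative order of the gate and the missing-Image check are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type hostConfig struct {
	Privileged bool
	CapAdd     []string
}

type createRequest struct {
	Image      string
	HostConfig *hostConfig
}

// checkPrivilegedGate is pure: it needs no Server or client. It returns
// the rejection message and whether the request is blocked.
func checkPrivilegedGate(allow bool, hc *hostConfig) (string, bool) {
	if allow || hc == nil {
		return "", false
	}
	if hc.Privileged {
		return "Privileged=true is disabled; set [dind] allow_privileged = true", true
	}
	if hc.CapAdd != nil {
		return "CapAdd is disabled; set [dind] allow_privileged = true", true
	}
	return "", false
}

// containerdClient (hypothetical) stands in for the real client; a nil
// pointer models a deploy that lost its client.
type containerdClient struct{}

type server struct {
	allowPrivileged bool
	client          *containerdClient
}

func (s *server) handleContainerCreate(w http.ResponseWriter, r *http.Request) {
	var req createRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid request body", http.StatusBadRequest)
		return
	}
	// Request-shape checks first: they need no runtime state.
	if msg, blocked := checkPrivilegedGate(s.allowPrivileged, req.HostConfig); blocked {
		http.Error(w, msg, http.StatusForbidden)
		return
	}
	if req.Image == "" {
		http.Error(w, "Image is required", http.StatusBadRequest)
		return
	}
	// Runtime-state check last.
	if s.client == nil {
		http.Error(w, "containerd client unavailable", http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusCreated) // placeholder for pull + create
}

// statusFor runs one request through the handler and returns the status.
func statusFor(s *server, body string) int {
	req := httptest.NewRequest(http.MethodPost, "/containers/create", strings.NewReader(body))
	rec := httptest.NewRecorder()
	s.handleContainerCreate(rec, req)
	return rec.Code
}

func main() {
	s := &server{} // gate closed, no client
	fmt.Println(statusFor(s, `{"Image":"alpine","HostConfig":{"Privileged":true}}`)) // 403
	fmt.Println(statusFor(s, `{}`))                                                   // 400
	fmt.Println(statusFor(s, `{"Image":"alpine"}`))                                  // 500
}
```

Because the "not gated" cases can call checkPrivilegedGate directly, they never reach the client, while the 403 and 400 paths can still be exercised through the handler with a nil client.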