fix(docker): fail-closed on tcp+tls:// without cert material#646
Conversation
✅ Mutation Testing ResultsMutation Score: 100.00% (threshold: 60%)
What is mutation testing?Mutation testing measures test quality by introducing small changes (mutations) to the code and checking if tests detect them. A higher score means better test effectiveness.
|
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
There was a problem hiding this comment.
Code Review
This pull request introduces a fail-closed mechanism for the Docker SDK adapter when using the tcp+tls:// scheme without TLS material, preventing silent downgrades to unauthenticated connections. Changes include updated client logic, a new error type, unit tests, and documentation. Feedback suggests that the implementation should favor logging warnings and falling back to unauthenticated connections rather than failing fast, citing guidelines for handling configuration errors with safe fallbacks.
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
There was a problem hiding this comment.
Pull request overview
This PR hardens the Docker SDK adapter’s explicit tcp+tls:// transport by failing closed when the operator selects tcp+tls:// but provides no certificate material, preventing a silent “mTLS intended → unauthenticated TLS” downgrade.
Changes:
- Add a typed sentinel error (
ErrTCPTLSRequiresCertMaterial) and a constructor-time gate inNewClientWithConfigfortcp+tls://without cert material. - Document the new fail-closed behavior and remediation steps in
docs/TROUBLESHOOTING.md, and record the behavior change in the changelog. - Add focused unit tests covering the failure mode and the two positive “cert material present” paths (config and env).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/TROUBLESHOOTING.md | Adds a troubleshooting section for tcp+tls:// with missing TLS material and the new error message. |
| core/adapters/docker/errors.go | Introduces ErrTCPTLSRequiresCertMaterial sentinel error for branching via errors.Is. |
| core/adapters/docker/client.go | Enforces fail-closed behavior for tcp+tls:// when no cert material is configured. |
| core/adapters/docker/client_tcptls_test.go | Adds unit tests covering the new gate and positive cases when cert material exists. |
| CHANGELOG.md | Notes the new startup failure behavior and upgrade impact for misconfigured tcp+tls://. |
Comments suppressed due to low confidence (1)
docs/TROUBLESHOOTING.md:345
- In the env-based resolution snippet, DOCKER_TLS_VERIFY=1 is presented as part of “cert material”, but the gate is satisfied by DOCKER_CERT_PATH alone (TLS_VERIFY just toggles verification semantics). If the intent is “mTLS with verification”, clarify it’s recommended/typical rather than required to avoid the startup error.
1. Environment (matches the standard `docker` CLI workflow):
```bash
export DOCKER_CERT_PATH=/etc/docker/certs.d/myhost
export DOCKER_TLS_VERIFY=1
</details>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #646 +/- ##
==========================================
+ Coverage 87.80% 87.83% +0.03%
==========================================
Files 88 88
Lines 10907 10918 +11
==========================================
+ Hits 9577 9590 +13
+ Misses 1088 1085 -3
- Partials 242 243 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
PR review (`Code review`, `Security review`) flagged that deleting `hasTLSMaterial` creates a build break for two in-flight sibling PRs: - #646 (fix/issue-627): fail-closed gate on `tcp+tls://` without cert material — calls `hasTLSMaterial(config)` from `NewClientWithConfig`. - #647 (fix/issue-634): rewrite `tcp://` → `https://` when TLS material is set — calls `hasTLSMaterial(cfg)` from `upgradeTCPToHTTPSIfTLSMaterial`. Restore the helper with a `//nolint:unused` annotation explaining why it is HEAD-unused. The dead `applyDockerTLS` call is still removed from `applyTCPTransport` (the actual reconciliation #628 was filed for); the helper itself is retained so the sibling PRs rebase cleanly. Refs #628. Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
#646 security review found that the original gate was bypassable: it checked only string presence via hasTLSMaterial, so DOCKER_CERT_PATH pointing at a non-existent or empty directory passed the gate. Downstream applyDockerTLS would then warn-and-continue, falling back to default TLS without a client cert — exactly the silent-mTLS-downgrade the gate was supposed to prevent. Invoke resolveTLSConfig from the gate too so unloadable / unreadable cert material fails closed at startup with a typed error referencing docs/TROUBLESHOOTING.md. Also tighten the error string and CHANGELOG to reflect that hasTLSMaterial only consults TLSCertPath / DOCKER_CERT_PATH — DOCKER_TLS_VERIFY alone does not satisfy the gate (correct behavior: verify-only cannot do mTLS), so the docs no longer claim it does. Add TestNewClientWithConfig_TCPPlusTLSRejectsUnreadableCertPath as the regression guard for the bypass. Refs #627. Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
tcp+tls:// is an *explicit* TLS opt-in scheme. Until now, if an operator set DOCKER_HOST=tcp+tls://daemon:2376 but forgot DOCKER_CERT_PATH / DOCKER_TLS_VERIFY (or the equivalent ClientConfig.TLSCertPath / TLSVerify overrides), resolveTLSConfig returned (nil, nil), applyDockerTLS left transport.TLSClientConfig nil, and the SDK silently dialed TLS using Go's stdlib defaults -- system CA bundle, NO client certificate. Operators who declared mTLS were getting unauthenticated TLS handshakes against any daemon that did not strictly require client auth. This is the analog, for tcp+tls://, of the silent plain-TCP downgrade closed by #612 / #625. Surfaced during the security review of #625; tracked in #627. Fix: in NewClientWithConfig, after resolveDockerHost succeeds, if scheme == tcp+tls and hasTLSMaterial(config) == false, wrap and return the new typed sentinel ErrTCPTLSRequiresCertMaterial. The error message points operators at docs/TROUBLESHOOTING.md, which gains a new section explaining the fix. tcp:// remains fail-open (it is ambiguous -- operators may rely on stdlib defaults). https:// remains fail-open with a warning, matching the upstream SDK's documented posture. TDD: TestNewClientWithConfig_TCPPlusTLSRequiresCertMaterial drives the gate; two positive counterparts cover both ClientConfig.TLSCertPath and DOCKER_CERT_PATH branches of hasTLSMaterial. Existing TestCreateHTTPClient_TCPPlusTLSEnablesTLS and TestNewClientWithConfig_TCPPlusTLSScheme remain green. Refs: #627, #625 Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
#646 security review found that the original gate was bypassable: it checked only string presence via hasTLSMaterial, so DOCKER_CERT_PATH pointing at a non-existent or empty directory passed the gate. Downstream applyDockerTLS would then warn-and-continue, falling back to default TLS without a client cert — exactly the silent-mTLS-downgrade the gate was supposed to prevent. Invoke resolveTLSConfig from the gate too so unloadable / unreadable cert material fails closed at startup with a typed error referencing docs/TROUBLESHOOTING.md. Also tighten the error string and CHANGELOG to reflect that hasTLSMaterial only consults TLSCertPath / DOCKER_CERT_PATH — DOCKER_TLS_VERIFY alone does not satisfy the gate (correct behavior: verify-only cannot do mTLS), so the docs no longer claim it does. Add TestNewClientWithConfig_TCPPlusTLSRejectsUnreadableCertPath as the regression guard for the bypass. Refs #627. Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
After #644 landed (which removed the dead applyDockerTLS call but kept hasTLSMaterial alive per sibling-PR review), the helper is now used here in #646's NewClientWithConfig gate. The //nolint annotation explaining the cross-PR debt is obsolete. Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
|
There was a problem hiding this comment.
Automated approval for maintainer PR
All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.
…RTTLS, webhook egress) Sibling-hunt findings from PR netresearch#646 review. Three additional defense-in-depth gaps where operator intent (TLS, encryption, egress restriction) was silently not honored. 1. Docker https:// with broken cert material (netresearch#653 vector 1): applyDockerTLS warned-and-continued when resolveTLSConfig failed; the SDK then dialed with default TLS — system CA, no client cert. NewClientWithConfig now gates https:// when material IS configured but unloadable, returning ErrHTTPSRequiresUsableCertMaterial. Asymmetry vs tcp+tls://: https:// without any material remains fail-open (system CA bundle is a legitimate posture). 2. SMTP OpportunisticStartTLS default (netresearch#653 vector 2): middlewares.NewMail inherited go-mail's OpportunisticStartTLS, sending credentials and message body in cleartext when STARTTLS was not advertised. New smtp-tls-policy INI key (mandatory|opportunistic|none); default mandatory. Unknown values normalize to mandatory with a WARN log line so a typo cannot weaken transport security. BREAKING for plaintext-only SMTP — see docs/TROUBLESHOOTING.md migration recipes. 3. Webhook empty-allowlist silent allow-all (netresearch#653 vector 3): SetGlobalSecurityConfig now emits a single startup-time slog.Warn when the resolved AllowedHosts admits all hosts. A typo in webhook-allowed-hosts previously yielded wide-open egress with no operator-visible signal. Includes: - Defensive-enum SMTPTLSPolicy.Valid()/Validate() with ErrInvalidSMTPTLSPolicy - Per-job inheritance for SMTPTLSPolicy in cli/config.go - smtp-tls-policy added to docker-labels allow-list - Documentation updates in CONFIGURATION.md and TROUBLESHOOTING.md - TDD tests for all three fixes (red→green verified) Fixes netresearch#653 Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
…tresearch#660) ## Summary Sibling-hunt findings from PR netresearch#646 review. Three additional silent-downgrade vectors where operator intent (TLS, encryption, egress restriction) was silently not honored. All share the same operator-trust failure mode: operator believes a security feature is on; it silently is not. - **Fix 1** — Docker `https://` with broken cert material: `applyDockerTLS` warned-and-continued when `resolveTLSConfig` failed; the SDK then dialed with default TLS (system CA, no client cert). `NewClientWithConfig` now gates `https://` when material IS configured but unloadable, returning `ErrHTTPSRequiresUsableCertMaterial`. Asymmetry vs `tcp+tls://`: `https://` *without* any material remains fail-open (system CA bundle is a legitimate posture). - **Fix 2** — SMTP `OpportunisticStartTLS` default: `middlewares.NewMail` inherited go-mail's `OpportunisticStartTLS`, sending credentials and message body in cleartext when STARTTLS was not advertised. New `smtp-tls-policy` INI key (`mandatory` | `opportunistic` | `none`); default `mandatory`. Unknown values normalize to `mandatory` with a `WARN` log line so a typo cannot weaken transport security. **BREAKING** for plaintext-only SMTP — see `docs/TROUBLESHOOTING.md` migration recipes. - **Fix 3** — Webhook empty-allowlist silent allow-all: `SetGlobalSecurityConfig` now emits a single startup-time `slog.Warn` when the resolved `AllowedHosts` admits all hosts. A typo in `webhook-allowed-hosts` previously yielded wide-open egress with no operator-visible signal. Also includes: - Defensive-enum `SMTPTLSPolicy.Valid()` / `Validate()` with `ErrInvalidSMTPTLSPolicy` - Per-job inheritance for `SMTPTLSPolicy` in `cli/config.go` - `smtp-tls-policy` added to docker-labels allow-list - Documentation updates in `CONFIGURATION.md` and `TROUBLESHOOTING.md` - TDD tests for all three fixes (red → green verified) Fixes netresearch#653 ## Test plan - [x] `go test ./... -count=1 -short -race` — all packages green - [x] `golangci-lint run ./...` — 0 issues - [x] New tests (TDD red → green): - `TestNewClientWithConfig_HTTPSRejectsUnreadableCertPath` - `TestNewClientWithConfig_HTTPSAllowsNoCertMaterial` - `TestNewClientWithConfig_HTTPSAcceptsValidCertMaterial` - `TestSMTPTLSPolicy_Valid` (table-driven) - `TestSMTPTLSPolicy_Validate` - `TestNewMail_RejectsInvalidTLSPolicy` - `TestMail_DefaultPolicyIsMandatory` - `TestMail_PolicyNoneAllowsPlaintext` - `TestSMTPTLSPolicy_ParsedFromINI` (4 sub-tests for INI hydration) - `TestSMTPTLSPolicy_JobInheritsFromGlobal` - `TestSMTPTLSPolicy_JobOverridesGlobal` - `TestSetGlobalSecurityConfig_WarnsOnAllowAll` - `TestSetGlobalSecurityConfig_WarnsOnEmpty` - `TestSetGlobalSecurityConfig_NoWarnOnExplicitAllowList` - `TestSetGlobalSecurityConfig_NoWarnOnNil` - [x] Existing mail tests updated to use `SMTPTLSPolicy: SMTPTLSPolicyNone` against the plaintext test SMTP fixture (5 tests) - [x] Docs drift test (`TestConfigGlobalKeysAreDocumented`) green for new `smtp-tls-policy` key
## Summary Cut the existing `[Unreleased]` CHANGELOG block at [fad5239](netresearch@fad5239) into a versioned `[0.25.0] - 2026-05-14` heading, and fill in two entries for PRs that landed after the previous `[Unreleased]` writeups: - **### Security** — Go toolchain `1.26.2` → `1.26.3` ([netresearch#662](netresearch#662)). Clears six stdlib advisories reachable from this codebase (`net/mail`, `html/template`, `net`, `net/http`); refreshes direct deps (`docker/cli`, `golang.org/x/{crypto,term,text}`) and the full indirect graph. Post-bump `govulncheck` is down to the two unfixable upstream moby advisories on `docker/docker` v28.5.2. - **### Fixed** — `MaxRuntime` cancellation now stops *and removes* the container/service ([netresearch#659](netresearch#659), fixes [netresearch#655](netresearch#655)). Completes [netresearch#651](netresearch#651 deadline wiring with a fresh `context.WithTimeout(context.Background(), jobCleanupTimeout)` cleanup context so stop/remove still runs after the parent deadline fires. Mirrored into `RunServiceJob`. ## Headline changes since v0.24.0 - **Security**: Go 1.26.3 toolchain, three silent-downgrade vectors closed (`https://` mTLS, SMTP STARTTLS default, webhook allow-list typo), fail-closed on `tcp+tls://` without cert material ([netresearch#660](netresearch#660), [netresearch#646](netresearch#646), [netresearch#662](netresearch#662)) - **New**: `tcp+tls://` `DOCKER_HOST` scheme re-enabled ([netresearch#625](netresearch#625)); `DOCKER_TLS_VERIFY` / `DOCKER_CERT_PATH` honored ([netresearch#613](netresearch#613)) - **Correctness**: bounded contexts in scheduler / health / Docker pings ([netresearch#636](netresearch#636), [netresearch#651](netresearch#651)); orphan-container cleanup on MaxRuntime ([netresearch#659](netresearch#659)); pervasive nil-guard pass across the Docker adapter ([netresearch#626](netresearch#626), [netresearch#639](netresearch#639), [netresearch#648](netresearch#648), [netresearch#658](netresearch#658)) - **Refactor / DX**: unified Docker host / scheme resolution ([netresearch#629](netresearch#629)); webhook global config dual-store collapsed ([netresearch#637](netresearch#637)); `[global]` label handling unified across all subsystems ([netresearch#661](netresearch#661)) ## Version bump rationale Pre-1.0 semver — minor bump because the range includes one `feat:` ([netresearch#625](netresearch#625) — `tcp+tls://` scheme re-enabled), several `fix(security):` PRs that surface previously-silent downgrades, and the `[global]` label-handling rework. The webhook key rename in [netresearch#620](netresearch#620) / [netresearch#637](netresearch#637) is shipped under `### Deprecated` (legacy `ofelia.webhooks` form keeps working with a one-shot warning), not as a breaking change. ## Notes - This PR touches **only `CHANGELOG.md`** — matches the v0.24.0 prep pattern ([netresearch#602](netresearch#602)). The Release workflow injects the version into `cli.Version` via ldflags from the tag, so no `cli/version.go` edit is needed. - After merge: signed annotated tag `v0.25.0` will be pushed to the merge commit, triggering [`release-go-app.yml`](https://github.com/netresearch/.github/blob/main/.github/workflows/release-go-app.yml) for binaries, container image, cosign `--bundle` signatures, and SLSA attestations. - Contributor thanks will be added directly to the GitHub release description (not the CHANGELOG, per project convention from v0.24.0). ## Test plan - [x] `go build ./...` clean - [ ] CI green on this PR - [ ] CHANGELOG renders correctly on GitHub Files Changed tab - [ ] After merge: signed annotated tag `v0.25.0` created on the merge commit (`git tag -s v0.25.0 -m "v0.25.0"`) and pushed - [ ] Release workflow run succeeds end-to-end



Symptom
tcp+tls://is an explicit TLS opt-in scheme in the Docker SDK adapter. Until now, if an operator setDOCKER_HOST=tcp+tls://daemon:2376but forgotDOCKER_CERT_PATH/DOCKER_TLS_VERIFY(or the equivalentClientConfig.TLSCertPath/TLSVerifyoverrides), the chainresolveTLSConfig->(nil, nil)->applyDockerTLSno-op lefttransport.TLSClientConfig == nil, and the SDK silently dialed TLS using Go's stdlib defaults: system CA bundle, no client certificate.Operators who declared mTLS were getting unauthenticated TLS handshakes against any daemon that did not strictly require client auth — the analog, for
tcp+tls://, of the silent plain-TCP downgrade closed by #612 / #625. Surfaced during the security review of #625; tracked in #627.Fix
In
NewClientWithConfig, afterresolveDockerHostsucceeds, ifscheme == tcp+tlsandhasTLSMaterial(config) == false, wrap and return the new typed sentinelErrTCPTLSRequiresCertMaterial. The error message points operators atdocs/TROUBLESHOOTING.md, which gains a dedicated section.tcp://remains fail-open (ambiguous scheme; operators may rely on stdlib defaults).https://remains fail-open with a warning, matching the upstream SDK's documented posture.New error
Branchable via
errors.Is(err, docker.ErrTCPTLSRequiresCertMaterial).Tests (TDD)
New file
core/adapters/docker/client_tcptls_test.go:TestNewClientWithConfig_TCPPlusTLSRequiresCertMaterial— drives the gate (RED first, then GREEN).TestNewClientWithConfig_TCPPlusTLSAcceptsCertMaterial— positive:ClientConfig.TLSCertPathset -> no error from the gate.TestNewClientWithConfig_TCPPlusTLSAcceptsEnvCertPath— positive:DOCKER_CERT_PATHset -> no error from the gate.Existing
TestCreateHTTPClient_TCPPlusTLSEnablesTLSandTestNewClientWithConfig_TCPPlusTLSSchemeremain green. Fullgo test ./... -count=1 -shortandgolangci-lint run ./...are clean.Refs
tcp+tls://)