Skip to content

fix(docker): fail-closed on tcp+tls:// without cert material#646

Merged
CybotTM merged 3 commits into
mainfrom
fix/issue-627
May 14, 2026
Merged

fix(docker): fail-closed on tcp+tls:// without cert material#646
CybotTM merged 3 commits into
mainfrom
fix/issue-627

Conversation

@CybotTM
Copy link
Copy Markdown
Member

@CybotTM CybotTM commented May 13, 2026

Symptom

tcp+tls:// is an explicit TLS opt-in scheme in the Docker SDK adapter. Until now, if an operator set DOCKER_HOST=tcp+tls://daemon:2376 but forgot DOCKER_CERT_PATH / DOCKER_TLS_VERIFY (or the equivalent ClientConfig.TLSCertPath / TLSVerify overrides), the chain resolveTLSConfig -> (nil, nil) -> applyDockerTLS no-op left transport.TLSClientConfig == nil, and the SDK silently dialed TLS using Go's stdlib defaults: system CA bundle, no client certificate.

Operators who declared mTLS were getting unauthenticated TLS handshakes against any daemon that did not strictly require client auth — the analog, for tcp+tls://, of the silent plain-TCP downgrade closed by #612 / #625. Surfaced during the security review of #625; tracked in #627.

Fix

In NewClientWithConfig, after resolveDockerHost succeeds, if scheme == tcp+tls and hasTLSMaterial(config) == false, wrap and return the new typed sentinel ErrTCPTLSRequiresCertMaterial. The error message points operators at docs/TROUBLESHOOTING.md, which gains a dedicated section.

  • tcp:// remains fail-open (ambiguous scheme; operators may rely on stdlib defaults).
  • https:// remains fail-open with a warning, matching the upstream SDK's documented posture.

New error

Error: creating docker client: tcp+tls:// requires TLS material: set DOCKER_CERT_PATH/DOCKER_TLS_VERIFY or ClientConfig.TLSCertPath/TLSVerify; see docs/TROUBLESHOOTING.md

Branchable via errors.Is(err, docker.ErrTCPTLSRequiresCertMaterial).

Tests (TDD)

New file core/adapters/docker/client_tcptls_test.go:

  • TestNewClientWithConfig_TCPPlusTLSRequiresCertMaterial — drives the gate (RED first, then GREEN).
  • TestNewClientWithConfig_TCPPlusTLSAcceptsCertMaterial — positive: ClientConfig.TLSCertPath set -> no error from the gate.
  • TestNewClientWithConfig_TCPPlusTLSAcceptsEnvCertPath — positive: DOCKER_CERT_PATH set -> no error from the gate.

Existing TestCreateHTTPClient_TCPPlusTLSEnablesTLS and TestNewClientWithConfig_TCPPlusTLSScheme remain green. Full go test ./... -count=1 -short and golangci-lint run ./... are clean.

Refs

  • Fixes #627
  • Triggered by review of #625 (re-enabled tcp+tls://)
  • TLS plumbing context: #613

Copilot AI review requested due to automatic review settings May 13, 2026 21:51
@github-actions github-actions Bot added documentation Improvements or additions to documentation tests labels May 13, 2026
@github-actions
Copy link
Copy Markdown

✅ Mutation Testing Results

Mutation Score: 100.00% (threshold: 60%)

✨ Good job! Mutation score meets the threshold.

What is mutation testing?

Mutation testing measures test quality by introducing small changes (mutations) to the code and checking if tests detect them. A higher score means better test effectiveness.

  • Killed mutants: Tests caught the mutation (good!)
  • Survived mutants: Tests missed the mutation (needs improvement)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fail-closed mechanism for the Docker SDK adapter when using the tcp+tls:// scheme without TLS material, preventing silent downgrades to unauthenticated connections. Changes include updated client logic, a new error type, unit tests, and documentation. Feedback suggests that the implementation should favor logging warnings and falling back to unauthenticated connections rather than failing fast, citing guidelines for handling configuration errors with safe fallbacks.

Comment thread core/adapters/docker/client.go Outdated
github-actions[bot]
github-actions Bot previously approved these changes May 13, 2026
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Docker SDK adapter’s explicit tcp+tls:// transport by failing closed when the operator selects tcp+tls:// but provides no certificate material, preventing a silent “mTLS intended → unauthenticated TLS” downgrade.

Changes:

  • Add a typed sentinel error (ErrTCPTLSRequiresCertMaterial) and a constructor-time gate in NewClientWithConfig for tcp+tls:// without cert material.
  • Document the new fail-closed behavior and remediation steps in docs/TROUBLESHOOTING.md, and record the behavior change in the changelog.
  • Add focused unit tests covering the failure mode and the two positive “cert material present” paths (config and env).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
docs/TROUBLESHOOTING.md Adds a troubleshooting section for tcp+tls:// with missing TLS material and the new error message.
core/adapters/docker/errors.go Introduces ErrTCPTLSRequiresCertMaterial sentinel error for branching via errors.Is.
core/adapters/docker/client.go Enforces fail-closed behavior for tcp+tls:// when no cert material is configured.
core/adapters/docker/client_tcptls_test.go Adds unit tests covering the new gate and positive cases when cert material exists.
CHANGELOG.md Notes the new startup failure behavior and upgrade impact for misconfigured tcp+tls://.
Comments suppressed due to low confidence (1)

docs/TROUBLESHOOTING.md:345

  • In the env-based resolution snippet, DOCKER_TLS_VERIFY=1 is presented as part of “cert material”, but the gate is satisfied by DOCKER_CERT_PATH alone (TLS_VERIFY just toggles verification semantics). If the intent is “mTLS with verification”, clarify it’s recommended/typical rather than required to avoid the startup error.
1. Environment (matches the standard `docker` CLI workflow):
   ```bash
   export DOCKER_CERT_PATH=/etc/docker/certs.d/myhost
   export DOCKER_TLS_VERIFY=1
</details>

Comment thread core/adapters/docker/client.go Outdated
Comment thread core/adapters/docker/errors.go
Comment thread docs/TROUBLESHOOTING.md
Comment thread CHANGELOG.md
@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.83%. Comparing base (798c41f) to head (2d73d9a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #646      +/-   ##
==========================================
+ Coverage   87.80%   87.83%   +0.03%     
==========================================
  Files          88       88              
  Lines       10907    10918      +11     
==========================================
+ Hits         9577     9590      +13     
+ Misses       1088     1085       -3     
- Partials      242      243       +1     
Flag Coverage Δ
integration 87.83% <100.00%> (+0.03%) ⬆️
unittests 84.92% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

CybotTM added a commit that referenced this pull request May 13, 2026
PR review (`Code review`, `Security review`) flagged that deleting
`hasTLSMaterial` creates a build break for two in-flight sibling PRs:

- #646 (fix/issue-627): fail-closed gate on `tcp+tls://` without cert
  material — calls `hasTLSMaterial(config)` from `NewClientWithConfig`.
- #647 (fix/issue-634): rewrite `tcp://` → `https://` when TLS material
  is set — calls `hasTLSMaterial(cfg)` from `upgradeTCPToHTTPSIfTLSMaterial`.

Restore the helper with a `//nolint:unused` annotation explaining why
it is HEAD-unused. The dead `applyDockerTLS` call is still removed from
`applyTCPTransport` (the actual reconciliation #628 was filed for); the
helper itself is retained so the sibling PRs rebase cleanly.

Refs #628.

Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
CybotTM added a commit that referenced this pull request May 13, 2026
#646 security review found that the original gate was bypassable: it
checked only string presence via hasTLSMaterial, so DOCKER_CERT_PATH
pointing at a non-existent or empty directory passed the gate.
Downstream applyDockerTLS would then warn-and-continue, falling back
to default TLS without a client cert — exactly the silent-mTLS-downgrade
the gate was supposed to prevent.

Invoke resolveTLSConfig from the gate too so unloadable / unreadable
cert material fails closed at startup with a typed error referencing
docs/TROUBLESHOOTING.md.

Also tighten the error string and CHANGELOG to reflect that
hasTLSMaterial only consults TLSCertPath / DOCKER_CERT_PATH —
DOCKER_TLS_VERIFY alone does not satisfy the gate (correct behavior:
verify-only cannot do mTLS), so the docs no longer claim it does.

Add TestNewClientWithConfig_TCPPlusTLSRejectsUnreadableCertPath as
the regression guard for the bypass.

Refs #627.

Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
github-actions[bot]
github-actions Bot previously approved these changes May 13, 2026
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

@CybotTM CybotTM added this pull request to the merge queue May 14, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 14, 2026
CybotTM added 3 commits May 14, 2026 07:39
tcp+tls:// is an *explicit* TLS opt-in scheme. Until now, if an
operator set DOCKER_HOST=tcp+tls://daemon:2376 but forgot
DOCKER_CERT_PATH / DOCKER_TLS_VERIFY (or the equivalent
ClientConfig.TLSCertPath / TLSVerify overrides), resolveTLSConfig
returned (nil, nil), applyDockerTLS left transport.TLSClientConfig
nil, and the SDK silently dialed TLS using Go's stdlib defaults --
system CA bundle, NO client certificate. Operators who declared
mTLS were getting unauthenticated TLS handshakes against any daemon
that did not strictly require client auth.

This is the analog, for tcp+tls://, of the silent plain-TCP
downgrade closed by #612 / #625. Surfaced during the security review
of #625; tracked in #627.

Fix: in NewClientWithConfig, after resolveDockerHost succeeds, if
scheme == tcp+tls and hasTLSMaterial(config) == false, wrap and
return the new typed sentinel ErrTCPTLSRequiresCertMaterial. The
error message points operators at docs/TROUBLESHOOTING.md, which
gains a new section explaining the fix.

tcp:// remains fail-open (it is ambiguous -- operators may rely on
stdlib defaults). https:// remains fail-open with a warning, matching
the upstream SDK's documented posture.

TDD: TestNewClientWithConfig_TCPPlusTLSRequiresCertMaterial drives
the gate; two positive counterparts cover both
ClientConfig.TLSCertPath and DOCKER_CERT_PATH branches of
hasTLSMaterial. Existing TestCreateHTTPClient_TCPPlusTLSEnablesTLS
and TestNewClientWithConfig_TCPPlusTLSScheme remain green.

Refs: #627, #625
Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
#646 security review found that the original gate was bypassable: it
checked only string presence via hasTLSMaterial, so DOCKER_CERT_PATH
pointing at a non-existent or empty directory passed the gate.
Downstream applyDockerTLS would then warn-and-continue, falling back
to default TLS without a client cert — exactly the silent-mTLS-downgrade
the gate was supposed to prevent.

Invoke resolveTLSConfig from the gate too so unloadable / unreadable
cert material fails closed at startup with a typed error referencing
docs/TROUBLESHOOTING.md.

Also tighten the error string and CHANGELOG to reflect that
hasTLSMaterial only consults TLSCertPath / DOCKER_CERT_PATH —
DOCKER_TLS_VERIFY alone does not satisfy the gate (correct behavior:
verify-only cannot do mTLS), so the docs no longer claim it does.

Add TestNewClientWithConfig_TCPPlusTLSRejectsUnreadableCertPath as
the regression guard for the bypass.

Refs #627.

Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
After #644 landed (which removed the dead applyDockerTLS call but kept hasTLSMaterial alive per sibling-PR review), the helper is now used here in #646's NewClientWithConfig gate. The //nolint annotation explaining the cross-PR debt is obsolete.

Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated approval for maintainer PR

All automated quality gates passed. See SECURITY_CONTROLS.md for compensating controls.

@CybotTM CybotTM added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit bcf01b8 May 14, 2026
27 checks passed
@CybotTM CybotTM deleted the fix/issue-627 branch May 14, 2026 05:50
techsolo12 pushed a commit to techsolo12/ofelia that referenced this pull request May 14, 2026
…RTTLS, webhook egress)

Sibling-hunt findings from PR netresearch#646 review. Three additional defense-in-depth
gaps where operator intent (TLS, encryption, egress restriction) was silently
not honored.

1. Docker https:// with broken cert material (netresearch#653 vector 1):
   applyDockerTLS warned-and-continued when resolveTLSConfig failed; the SDK
   then dialed with default TLS — system CA, no client cert. NewClientWithConfig
   now gates https:// when material IS configured but unloadable, returning
   ErrHTTPSRequiresUsableCertMaterial. Asymmetry vs tcp+tls://: https:// without
   any material remains fail-open (system CA bundle is a legitimate posture).

2. SMTP OpportunisticStartTLS default (netresearch#653 vector 2):
   middlewares.NewMail inherited go-mail's OpportunisticStartTLS, sending
   credentials and message body in cleartext when STARTTLS was not advertised.
   New smtp-tls-policy INI key (mandatory|opportunistic|none); default mandatory.
   Unknown values normalize to mandatory with a WARN log line so a typo cannot
   weaken transport security. BREAKING for plaintext-only SMTP — see
   docs/TROUBLESHOOTING.md migration recipes.

3. Webhook empty-allowlist silent allow-all (netresearch#653 vector 3):
   SetGlobalSecurityConfig now emits a single startup-time slog.Warn when the
   resolved AllowedHosts admits all hosts. A typo in webhook-allowed-hosts
   previously yielded wide-open egress with no operator-visible signal.

Includes:
- Defensive-enum SMTPTLSPolicy.Valid()/Validate() with ErrInvalidSMTPTLSPolicy
- Per-job inheritance for SMTPTLSPolicy in cli/config.go
- smtp-tls-policy added to docker-labels allow-list
- Documentation updates in CONFIGURATION.md and TROUBLESHOOTING.md
- TDD tests for all three fixes (red→green verified)

Fixes netresearch#653

Signed-off-by: Sebastian Mendel <info@sebastianmendel.de>
techsolo12 pushed a commit to techsolo12/ofelia that referenced this pull request May 14, 2026
…tresearch#660)

## Summary

Sibling-hunt findings from PR netresearch#646 review. Three additional
silent-downgrade vectors where operator intent (TLS, encryption, egress
restriction) was silently not honored. All share the same operator-trust
failure mode: operator believes a security feature is on; it silently is
not.

- **Fix 1** — Docker `https://` with broken cert material:
`applyDockerTLS` warned-and-continued when `resolveTLSConfig` failed;
the SDK then dialed with default TLS (system CA, no client cert).
`NewClientWithConfig` now gates `https://` when material IS configured
but unloadable, returning `ErrHTTPSRequiresUsableCertMaterial`.
Asymmetry vs `tcp+tls://`: `https://` *without* any material remains
fail-open (system CA bundle is a legitimate posture).
- **Fix 2** — SMTP `OpportunisticStartTLS` default:
`middlewares.NewMail` inherited go-mail's `OpportunisticStartTLS`,
sending credentials and message body in cleartext when STARTTLS was not
advertised. New `smtp-tls-policy` INI key (`mandatory` | `opportunistic`
| `none`); default `mandatory`. Unknown values normalize to `mandatory`
with a `WARN` log line so a typo cannot weaken transport security.
**BREAKING** for plaintext-only SMTP — see `docs/TROUBLESHOOTING.md`
migration recipes.
- **Fix 3** — Webhook empty-allowlist silent allow-all:
`SetGlobalSecurityConfig` now emits a single startup-time `slog.Warn`
when the resolved `AllowedHosts` admits all hosts. A typo in
`webhook-allowed-hosts` previously yielded wide-open egress with no
operator-visible signal.

Also includes:
- Defensive-enum `SMTPTLSPolicy.Valid()` / `Validate()` with
`ErrInvalidSMTPTLSPolicy`
- Per-job inheritance for `SMTPTLSPolicy` in `cli/config.go`
- `smtp-tls-policy` added to docker-labels allow-list
- Documentation updates in `CONFIGURATION.md` and `TROUBLESHOOTING.md`
- TDD tests for all three fixes (red → green verified)

Fixes netresearch#653

## Test plan

- [x] `go test ./... -count=1 -short -race` — all packages green
- [x] `golangci-lint run ./...` — 0 issues
- [x] New tests (TDD red → green):
  - `TestNewClientWithConfig_HTTPSRejectsUnreadableCertPath`
  - `TestNewClientWithConfig_HTTPSAllowsNoCertMaterial`
  - `TestNewClientWithConfig_HTTPSAcceptsValidCertMaterial`
  - `TestSMTPTLSPolicy_Valid` (table-driven)
  - `TestSMTPTLSPolicy_Validate`
  - `TestNewMail_RejectsInvalidTLSPolicy`
  - `TestMail_DefaultPolicyIsMandatory`
  - `TestMail_PolicyNoneAllowsPlaintext`
  - `TestSMTPTLSPolicy_ParsedFromINI` (4 sub-tests for INI hydration)
  - `TestSMTPTLSPolicy_JobInheritsFromGlobal`
  - `TestSMTPTLSPolicy_JobOverridesGlobal`
  - `TestSetGlobalSecurityConfig_WarnsOnAllowAll`
  - `TestSetGlobalSecurityConfig_WarnsOnEmpty`
  - `TestSetGlobalSecurityConfig_NoWarnOnExplicitAllowList`
  - `TestSetGlobalSecurityConfig_NoWarnOnNil`
- [x] Existing mail tests updated to use `SMTPTLSPolicy:
SMTPTLSPolicyNone` against the plaintext test SMTP fixture (5 tests)
- [x] Docs drift test (`TestConfigGlobalKeysAreDocumented`) green for
new `smtp-tls-policy` key
techsolo12 pushed a commit to techsolo12/ofelia that referenced this pull request May 14, 2026
## Summary

Cut the existing `[Unreleased]` CHANGELOG block at
[fad5239](netresearch@fad5239)
into a versioned `[0.25.0] - 2026-05-14` heading, and fill in two
entries for PRs that landed after the previous `[Unreleased]` writeups:

- **### Security** — Go toolchain `1.26.2` → `1.26.3`
([netresearch#662](netresearch#662)). Clears six
stdlib advisories reachable from this codebase (`net/mail`,
`html/template`, `net`, `net/http`); refreshes direct deps
(`docker/cli`, `golang.org/x/{crypto,term,text}`) and the full indirect
graph. Post-bump `govulncheck` is down to the two unfixable upstream
moby advisories on `docker/docker` v28.5.2.
- **### Fixed** — `MaxRuntime` cancellation now stops *and removes* the
container/service
([netresearch#659](netresearch#659), fixes
[netresearch#655](netresearch#655)). Completes
[netresearch#651](netresearch#651 deadline wiring
with a fresh `context.WithTimeout(context.Background(),
jobCleanupTimeout)` cleanup context so stop/remove still runs after the
parent deadline fires. Mirrored into `RunServiceJob`.

## Headline changes since v0.24.0

- **Security**: Go 1.26.3 toolchain, three silent-downgrade vectors
closed (`https://` mTLS, SMTP STARTTLS default, webhook allow-list
typo), fail-closed on `tcp+tls://` without cert material
([netresearch#660](netresearch#660),
[netresearch#646](netresearch#646),
[netresearch#662](netresearch#662))
- **New**: `tcp+tls://` `DOCKER_HOST` scheme re-enabled
([netresearch#625](netresearch#625));
`DOCKER_TLS_VERIFY` / `DOCKER_CERT_PATH` honored
([netresearch#613](netresearch#613))
- **Correctness**: bounded contexts in scheduler / health / Docker pings
([netresearch#636](netresearch#636),
[netresearch#651](netresearch#651));
orphan-container cleanup on MaxRuntime
([netresearch#659](netresearch#659)); pervasive
nil-guard pass across the Docker adapter
([netresearch#626](netresearch#626),
[netresearch#639](netresearch#639),
[netresearch#648](netresearch#648),
[netresearch#658](netresearch#658))
- **Refactor / DX**: unified Docker host / scheme resolution
([netresearch#629](netresearch#629)); webhook global
config dual-store collapsed
([netresearch#637](netresearch#637)); `[global]`
label handling unified across all subsystems
([netresearch#661](netresearch#661))

## Version bump rationale

Pre-1.0 semver — minor bump because the range includes one `feat:`
([netresearch#625](netresearch#625) — `tcp+tls://`
scheme re-enabled), several `fix(security):` PRs that surface
previously-silent downgrades, and the `[global]` label-handling rework.
The webhook key rename in
[netresearch#620](netresearch#620) /
[netresearch#637](netresearch#637) is shipped under
`### Deprecated` (legacy `ofelia.webhooks` form keeps working with a
one-shot warning), not as a breaking change.

## Notes

- This PR touches **only `CHANGELOG.md`** — matches the v0.24.0 prep
pattern ([netresearch#602](netresearch#602)). The
Release workflow injects the version into `cli.Version` via ldflags from
the tag, so no `cli/version.go` edit is needed.
- After merge: signed annotated tag `v0.25.0` will be pushed to the
merge commit, triggering
[`release-go-app.yml`](https://github.com/netresearch/.github/blob/main/.github/workflows/release-go-app.yml)
for binaries, container image, cosign `--bundle` signatures, and SLSA
attestations.
- Contributor thanks will be added directly to the GitHub release
description (not the CHANGELOG, per project convention from v0.24.0).

## Test plan

- [x] `go build ./...` clean
- [ ] CI green on this PR
- [ ] CHANGELOG renders correctly on GitHub Files Changed tab
- [ ] After merge: signed annotated tag `v0.25.0` created on the merge
commit (`git tag -s v0.25.0 -m "v0.25.0"`) and pushed
- [ ] Release workflow run succeeds end-to-end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Docker SDK: tcp+tls:// without cert material silently uses system CA / no client cert (should fail-closed)

2 participants