Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows DNS resolver forwarding #47584

Merged
merged 2 commits into from Apr 19, 2024
Merged

Conversation

robmry
Copy link
Contributor

@robmry robmry commented Mar 18, 2024

- What I did

Make the internal DNS resolver for Windows containers forward requests to upsteam DNS servers when it cannot respond itself, rather than returning SERVFAIL.

Windows containers are normally configured with the internal resolver first for service discovery (container name lookup), then external resolvers from '--dns' or the host's networking configuration.

When a tool like ping gets a SERVFAIL from the internal resolver, it tries the other nameservers. But, nslookup does not, and with this change it does not need to.

- How I did it

The internal resolver learns external server addresses from the container's HNSEndpoint configuration, so it will use the same DNS servers as processes in the container.

The internal resolver for Windows containers listens on the network's gateway address, and each container may have a different set of external DNS servers. So, the resolver uses the source address of the DNS request to select external resolvers.

A feature option can be used to enable forwarding from the internal resolver. In daemon.json ...

  "features": {
	  "windows-dns-proxy": true
  },

The plan is for the default in release 26.1 to be no-proxying (no change in behaviour), then to enable it by-default in 27.0.

- How to verify it

New unit and integration tests. (Although, the integration test can only check that external DNS requests are not forwarded, until the default is changed to enable forwarding.)

Before the change, with no features config, or with "features": { "windows-dns-proxy": false } in daemon.json ...

...
docker run -ti --rm --network mynat --name c1 mcr.microsoft.com/windows/servercore:ltsc2022 powershell.exe
...
PS C:\> nslookup google.com
Server:  UnKnown
Address:  172.31.144.1

Now, with "features": { "windows-dns-proxy": true } in daemon.json ...

.\dockerd.exe
...
docker run -ti --rm --network mynat --name c1 mcr.microsoft.com/windows/servercore:ltsc2022 powershell.exe
...
PS C:\> nslookup google.com
Server:  UnKnown
Address:  172.31.144.1

Non-authoritative answer:
Name:    google.com
Addresses:  2a00:1450:4009:817::200e
          172.217.169.14

- Description for the changelog

Native Windows containers are configured with an internal DNS server for container name
resolution, and external DNS servers for other lookups. Not all resolvers, including `nslookup`
fall back to the external resolvers when they get a SERVFAIL from the internal server. So, the
internal DNS server can now be configured to forward requests to the external resolvers, by
using a feature option in the `daemon.json` file: `"features": { "windows-dns-proxy": true }`.
This will become the default behaviour in a future release.

@robmry robmry changed the title Upstream dns windows Windows DNS resolver forwarding Mar 18, 2024
@robmry robmry force-pushed the upstream_dns_windows branch 2 times, most recently from 5f8c383 to 5266bbe Compare March 19, 2024 20:50
@robmry robmry self-assigned this Mar 20, 2024
@robmry robmry added this to the 27.0.0 milestone Mar 20, 2024
@robmry robmry marked this pull request as ready for review March 20, 2024 09:27
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I have some nits to pick with the implementation details, but the design looks good to me.

daemon/container_operations.go Outdated Show resolved Hide resolved
libnetwork/network_unix.go Outdated Show resolved Hide resolved
libnetwork/network_windows.go Outdated Show resolved Hide resolved
libnetwork/resolver.go Outdated Show resolved Hide resolved
libnetwork/resolver.go Show resolved Hide resolved
libnetwork/network.go Outdated Show resolved Hide resolved
libnetwork/network_windows.go Outdated Show resolved Hide resolved
libnetwork/network_windows.go Show resolved Hide resolved
libnetwork/network_windows.go Outdated Show resolved Hide resolved
@robmry robmry force-pushed the upstream_dns_windows branch 2 times, most recently from f157918 to 327c901 Compare March 21, 2024 15:50
Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am only requesting changes because I spotted a (potential) bug with OptionDNSNoProxy under Linux. All my other review comments this round are optional nitpicks.

libnetwork/network_windows.go Outdated Show resolved Hide resolved
libnetwork/resolver.go Outdated Show resolved Hide resolved
libnetwork/resolver.go Outdated Show resolved Hide resolved
libnetwork/sandbox.go Outdated Show resolved Hide resolved
libnetwork/sandbox_dns_unix.go Outdated Show resolved Hide resolved
@robmry
Copy link
Contributor Author

robmry commented Mar 22, 2024

@akerouanton - suggested replacing "--dns-no-proxy" with a feature-flag ... as it's only intended as an emergency off-switch for this change, and we'll want to get rid of it.

I've made it Windows-only, as we don't really have a use-case for it on Linux. So, removed it from the regression test. But I left in Sandbox.RefreshResolverPolicy() as it seems a bit neater anyway.

To disable this change, it's now something like this in daemon.json ...

  "features": {
	  "windows-no-dns-proxy": true
  },

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing Windows-only about OptionDNSNoProxy is the conditions under which the daemon could pass the option. It is violation of the principle of least surprise for constructing sandboxes with OptionDNSNoProxy() to be a no-op on some platforms. IMO it's a footgun: someone might try to contribute a change which relies on passing OptionDNSNoProxy on non-Windows platforms and nobody notices until it's a CVE. Either make OptionDNSNoProxy() disable DNS proxying irrespective of platform, or make it so passing OptionDNSNoProxy() on platforms where it is ineffective would fail fast and loudly. Compile error would be ideal since the bugged code would never make it past CI, but I would also be happy with func (*Controller) NewSandbox returning an error.

daemon/container_operations.go Outdated Show resolved Hide resolved
libnetwork/sandbox.go Outdated Show resolved Hide resolved
libnetwork/sandbox_dns_unix.go Show resolved Hide resolved
libnetwork/endpoint.go Outdated Show resolved Hide resolved
libnetwork/sandbox.go Outdated Show resolved Hide resolved
libnetwork/sandbox.go Outdated Show resolved Hide resolved
Copy link
Member

@akerouanton akerouanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -514,6 +515,12 @@ func getConflictFreeConfiguration(configFile string, flags *pflag.FlagSet) (*Con
return nil, err
}

if runtime.GOOS != "windows" {
if _, ok := config.Features["windows-no-dns-proxy"]; ok {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bike-shedding; would (windows)-disable-dns-proxy be clearer?

Also (question); is this an option we could see becoming opt-in/opt-out for non-Windows? And in that case, do we need the windows- prefix? (we can still document - and error- when used on other platforms)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the name ... they look quite equivalent to me (and the current name is less typing!), but happy to change it if you like?

It started off as a Windows and Linux option - but there's no known use-case for disabling it on Linux, and we don't really need more options to maintain. It's really only meant as an emergency off-switch ... hopefully nobody will need it but, if there's something we didn't think of, being able to restore the old behaviour might save us from having to make a patch in a hurry. So, Albin suggested making it a Windows-only feature flag - which will make it easier to remove in future if it's not useful.

Comment on lines 57 to 58
// Add OS-specific Sandbox options.
if err := buildSandboxOptionsOS(container, cfg, &sboxOptions); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More bike-shedding; I know we have some other "options" / platform-specific code for which we use platform as term for this, e.g. ValidatePlatformConfig and setPlatformDefaults in the daemon/config package. Perhaps it would be clear to follow that convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes - I'll change it.

@@ -514,6 +515,12 @@ func getConflictFreeConfiguration(configFile string, flags *pflag.FlagSet) (*Con
return nil, err
}

if runtime.GOOS != "windows" {
if _, ok := config.Features["windows-no-dns-proxy"]; ok {
return nil, errors.New("feature option 'windows-no-dns-proxy' is only available on Windows")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Other comment just made me realise we have a Config.ValidatePlatformConfig function with implementations for Linux and Windows, so wondering if this should belong there;

// ValidatePlatformConfig checks if any platform-specific configuration settings are invalid.
func (conf *Config) ValidatePlatformConfig() error {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd missed that - thank you, I'll move it.

@vvoland vvoland modified the milestones: 26.1.0, 27.0.0 Apr 10, 2024
Co-authored-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatting with @corhere - we should turn it into an opt-in for 26.1, and then turn into opt-out in 27.0.

@robmry robmry modified the milestones: 27.0.0, 26.1.0 Apr 16, 2024
// By default, the Windows internal resolver does not forward requests to
// external resolvers - but forwarding can be enabled using feature flag
// "windows-dns-proxy":true.
if doproxy, exists := cfg.Features["windows-dns-proxy"]; !exists || !doproxy {
Copy link
Contributor

@vvoland vvoland Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we should leave the old name windows-dns-no-proxy and turn the conditional here into just value.

The reason why I think it could be better is that it's a bit confusing to set feature to false to opt-in into a new behavior and IMO it's more intuitive if the missing value is equivalent to explicit false rather than true.

The alternative would be to change the windows-dns-proxy into windows-dns-no-proxy later, but that would mean that users that opt-in now, will have to adjust their config again.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it because it got a bit double-negative-y, I think this naming is a bit simpler. Disabling it, if needed, in 27.0 will look like this - I think its meaning is clear? ...

  "features": {
	  "windows-dns-proxy": false
  },

Docs also have an example of an enabled-by-default feature that can be disabled ... https://docs.docker.com/reference/cli/dockerd/#feature-options

So, I prefer it this way - but can change it back if needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Either way - I'll raise a new PR to change the default, and mark it for milestone 27.0, as soon as this one's merged.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion on my side, but maybe @thaJeztah has some preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... yeah, the ones where defaults change are always a bit tricky I guess. So the new default will become "disabled by default" (but currently it's "enabled by default"?)

From some perspective windows-dns-no-proxy or windows-disable-dns-proxy or windows-dns-proxy-disabled would be more friendly, but also considering what it would look like if we would be showing the features as part of docker info

I don't think we currently expose the features, but I think that's something we could consider, and if we do, for those to reflect the defaults (in this case, that would mean windows-dns-proxy: false?)

Copy link
Contributor Author

@robmry robmry Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for release 26.1.0 the default will be windows-dns-proxy: false.

In 27.0.0, we'll change the default to windows-dns-proxy: true.

If those were displayed in docker info, that's what they'd look like ... clear, I think?

(I reckon our minds are corrupted by Go's zero-values ... we think a default needs to be false, that's why I originally used windows-no-dns-proxy. But, for normal people (!), I think getting rid of the double-negative in windows-no-dns-proxy: false is a good thing.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I think getting rid of the double-negative is great. IMHO it feels more natural to have to turn a feature-flag 'true' to get it enabled and 'false' to opt-out.

Make the internal DNS resolver for Windows containers forward requests
to upsteam DNS servers when it cannot respond itself, rather than
returning SERVFAIL.

Windows containers are normally configured with the internal resolver
first for service discovery (container name lookup), then external
resolvers from '--dns' or the host's networking configuration.

When a tool like ping gets a SERVFAIL from the internal resolver, it
tries the other nameservers. But, nslookup does not, and with this
change it does not need to.

The internal resolver learns external server addresses from the
container's HNSEndpoint configuration, so it will use the same DNS
servers as processes in the container.

The internal resolver for Windows containers listens on the network's
gateway address, and each container may have a different set of external
DNS servers. So, the resolver uses the source address of the DNS request
to select external resolvers.

On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used
to prevent the internal resolver from forwarding requests (restoring the
old behaviour).

Signed-off-by: Rob Murray <rob.murray@docker.com>
Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine as long as we don't mutate the Features (which this PR doesn't do) - so if we ever want to expose the features (for example in docker info), we don't pollute it by default with features that are opt-out and are considered a "normal" code path.

@akerouanton
Copy link
Member

I'm fine as long as we don't mutate the Features (which this PR doesn't do) - so if we ever want to expose the features (for example in docker info), we don't pollute it by default with features that are opt-out and are considered a "normal" code path.

Then, how would you show that a specific flag is opt-out? I think something like that would be the best:

Feature:
    windows-dns-proxy: enabled (default)
    foobar: disabled (default)
    somethingsomething: enabled (user-defined)

@vvoland
Copy link
Contributor

vvoland commented Apr 18, 2024

Hmm, I think I wouldn't mind that output for the actual features like:

Feature:
    containerd-snapshotter: disabled (default)
    buildkit: enabled (default)

But, I don't like showing windows-dns-proxy: enabled (default) because it's not really a feature, but an escape hatch out of a new, default behavior, which most users don't even care about because it's not really relevant outside of WCOW (which is a niche on its own).

@vvoland vvoland merged commit e829cca into moby:master Apr 19, 2024
126 checks passed
renovate bot added a commit to earthly/dind that referenced this pull request Apr 29, 2024
[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker/docker](https://togithub.com/docker/docker) | minor | `26.0.2`
-> `26.1.0` |

---

### Release Notes

<details>
<summary>docker/docker (docker/docker)</summary>

### [`v26.1.0`](https://togithub.com/moby/moby/releases/tag/v26.1.0)

[Compare
Source](https://togithub.com/docker/docker/compare/v26.0.2...v26.1.0)

#### 26.1.0

For a full list of pull requests and changes in this release, refer to
the relevant GitHub milestones:

- [docker/cli, 26.1.0
milestone](https://togithub.com/docker/cli/issues?q=is%3Aclosed+milestone%3A26.1.0)
- [moby/moby, 26.1.0
milestone](https://togithub.com/moby/moby/issues?q=is%3Aclosed+milestone%3A26.1.0)
- Deprecated and removed features, see [Deprecated
Features](https://togithub.com/docker/cli/blob/v26.1.0/docs/deprecated.md).
- Changes to the Engine API, see [API version
history](https://togithub.com/moby/moby/blob/v26.1.0/docs/api/version-history.md).

##### New

- Add configurable OpenTelemetry utilities and basic instrumentation to
commands.
For more information, see [OpenTelemetry for the Docker
CLI](https://docs.docker.com/config/otel).
[docker/cli#4889](https://togithub.com/docker/cli/pull/4889)

##### Bug fixes and enhancements

- Native Windows containers are configured with an internal DNS server
for container name resolution, and external DNS servers for other
lookups. Not all resolvers, including `nslookup`, fall back to the
external resolvers when they get a `SERVFAIL` answer from the internal
server. So, the internal DNS server can now be configured to forward
requests to the external resolvers, by setting `"features":
{"windows-dns-proxy": true }` in the `daemon.json` file.
[moby/moby#47584](https://togithub.com/moby/moby/pull/47584)

> \[!NOTE]
> This will be the new default behavior in Docker Engine 27.0.

> \[!WARNING]
> The `windows-dns-proxy` feature flag will be removed in a future
release.

- Swarm: Fix `Subpath` not being passed to the container config.
[moby/moby#47711](https://togithub.com/moby/moby/pull/47711)
- Classic builder: Fix cache miss on `WORKDIR <directory>/` build step
(directory with a trailing slash).
[moby/moby#47723](https://togithub.com/moby/moby/pull/47723)
- containerd image store: Fix `docker images` failing when any image in
the store has unexpected target.
[moby/moby#47738](https://togithub.com/moby/moby/pull/47738)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "after 6am on monday" (UTC), Automerge
- At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/earthly/dind).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zMjEuMiIsInVwZGF0ZWRJblZlciI6IjM3LjMyMS4yIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZSJdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: idodod <ido@earthly.dev>
@robmry robmry deleted the upstream_dns_windows branch May 1, 2024 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

External DNS forwarding is broken on Windows container NAT networks
5 participants