Intermittent faults of docker-internal DNS with IPv6 #2492
Comments
I noticed the same with only IPv4:
Not quite the same: this issue is totally random. About 1 in 10 lookups delivers only one of the two addresses (either IPv4 or IPv6), and every once in a while it fails completely, delivering no address at all.
I've switched to podman, so I do not care anymore. Closing it now.
We are experiencing the same issue in a dual-stack NAT environment. From time to time the resolution fails. We work around it by using the v4 address (v6 also works). Right now I have the impression that nginx is part of the problem: the nginx resolver is 127.0.0.11, while the v6 resolver is the v6 resolver of the Docker host, which knows nothing about the internal service names of Docker. Can this be the reason?
Yes, by removing the IPv6 resolver (the Docker host resolver) from nginx, the frequent 502 errors disappeared.
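For reference, a minimal sketch of what that nginx fix could look like, assuming a proxied upstream named `influxdb` (the service name and port are placeholders): pin the resolver to Docker's embedded DNS and disable IPv6 answers so nginx never consults the host's v6 resolver.

```nginx
server {
    listen 80;

    # Only ask Docker's embedded DNS; ipv6=off keeps nginx from also
    # querying via the host's IPv6 resolver.
    resolver 127.0.0.11 valid=10s ipv6=off;

    location / {
        # Using a variable makes nginx resolve the name at request time
        # instead of once at startup.
        set $backend http://influxdb:8086;
        proxy_pass $backend;
    }
}
```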
I am experiencing the same issue with IPv6 enabled. However, I think that @belfo is right and it also affects IPv4. The resolv.conf of my containers looks as follows:
The 127.0.0.11 nameserver is the Docker-internal one; it responds with the IPv4 and IPv6 addresses of external hosts and of other containers in the same network. The second entry is my router's IPv6 DNS (see moby/moby#41651), whose answer to service names is naturally always NXDOMAIN. However, since I am seeing NXDOMAIN from time to time for other Docker services in the same network, I would conclude, with my limited understanding, that 127.0.0.11 sometimes fails for BOTH IPv4 and IPv6. Because either the IPv4 or the IPv6 DNS answer should be enough for my services to connect, right? docker info:
I don't think so. Both name servers are requested in parallel. If the response from your router arrives first, DNS resolution of the internal Docker names fails. It's a race condition.
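As an illustration of that race, a container's resolv.conf in such a setup contains two nameservers, roughly like this (the fd00::1 address is a made-up placeholder, not taken from the report above):

```
# Illustrative only; fd00::1 stands in for the router's IPv6 DNS.
nameserver 127.0.0.11   # Docker's embedded DNS, knows the service names
nameserver fd00::1      # external DNS, answers NXDOMAIN for service names
```

If the NXDOMAIN from the second entry wins the race, the lookup of an internal service name fails even though the embedded DNS would have answered correctly.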
Are they really queried in parallel? From the manpage:
This is how I understand this comment: moby/moby#41651 (comment)
Oh wow, that explains why only a subset of services is affected. I will check whether those are Alpine-based. musl then behaves differently from glibc.
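A quick way to verify that (a sketch; the container name is a placeholder) is to check which libc the affected images ship:

```sh
# "node-red" is a placeholder container name
docker exec node-red cat /etc/os-release   # Alpine-based images say "Alpine Linux"
docker exec node-red ls /lib               # an ld-musl-*.so.1 file indicates musl
```

musl queries all configured nameservers in parallel and takes the first answer, while glibc by default tries them one after the other, which would explain why only some services hit the race.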
Background / Observation
I’m running a small Docker environment on a Raspberry Pi and I’ve recently enabled IPv6 (because it's nearly 2020).
For various reasons (isolation, a daily changing IPv6 prefix from my ISP…) I took the route of disabling the userland proxy (via daemon.json) and adding the ipv6nat container. The IPv6 addresses for the containers are chosen from a randomly selected ULA prefix (fd00::/8).
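For context, the relevant bits of that setup could look roughly like this (the ULA subnet and network name are made-up examples; "ipv6nat" refers to robbertkl's docker-ipv6nat image):

```sh
# /etc/docker/daemon.json (excerpt): disable the userland proxy
# {
#   "userland-proxy": false
# }

# an IPv6-enabled user-defined network with an example ULA subnet
docker network create --ipv6 --subnet fd00:1234:5678::/64 mynet

# the NAT helper container
docker run -d --name ipv6nat --restart unless-stopped --privileged \
  --network host -v /var/run/docker.sock:/var/run/docker.sock:ro \
  robbertkl/ipv6nat
```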
After convincing a number of containers (node-red, influx…) to actually use IPv6 (because IPv6 is such a new technology, only 20 years old), they can reach each other and can be reached from outside via either the IPv4 or the IPv6 address of the host machine. DNS resolution in my home network also works as expected.
However, a somewhat fishy problem surfaced: the node-red container complains roughly every 10 minutes that it couldn’t reach the influx container due to a DNS resolution error (getaddrinfo ENOTFOUND). It accesses influx every minute, with about 10 concurrent requests each time. It doesn’t always fail, but in about 1 % of the attempts DNS resolution fails, randomly.
Neither the system itself nor the network is under any significant load.
Debugging
I tried two things to debug this: log into the container and run nslookup manually, and at the same time watch the Docker daemon logs after putting the daemon into debug mode.
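The two steps boil down to something like the following (a sketch; container and service names are placeholders, and the debug flag goes into daemon.json):

```sh
# enable daemon debug logging by adding "debug": true to /etc/docker/daemon.json, then:
sudo systemctl restart docker
sudo journalctl -fu docker                    # follow the daemon's debug output

# in a second terminal, query the embedded DNS from inside a container
docker exec -it node-red nslookup influx 127.0.0.11
```

Here are the results: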
Normal, expected case
Command line:
Logs:
Half-broken case
Every now and then one of the following happens, i.e. either IPv4 or IPv6 is missing:
Logs:
Error case
The worst case is the following; it happens less frequently, but still in about 1 % of the cases, and causes a failed connection between containers:
Logs:
Causes / Interpretation
Well, I'm no expert on Docker networking, but if things fail rarely and randomly, it feels like a race condition to me.
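To put a number on how often it happens, a small loop like this can count failed lookups (a sketch; container and service names are placeholders, and the exit codes of busybox vs. bind nslookup may differ slightly):

```sh
# run 200 lookups against Docker's embedded DNS and count the failures
fail=0
for i in $(seq 1 200); do
  docker exec node-red nslookup influx 127.0.0.11 > /dev/null 2>&1 || fail=$((fail + 1))
done
echo "failed lookups: $fail / 200"
```

With the roughly 1 % failure rate described above, a run of 200 should show a couple of failures.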