Go-lang "netgo" DNS resolver bug with catch-all DNS server entries #10863
Comments
While this is triggered through the angle of distribution stuff (index and registry), it certainly affects any other network operation depending on the same code. |
Ping @MalteJ |
ping @estesp too :) |
@dmp42 I think you shouldn't use a search-domain that resolves a wildcard subdomain. By the way, curl always uses IPv4 unless you do |
@MalteJ maybe. Now:
"Search" should be used as a fallback in case no valid record is found for the fqdn (IIRC). |
hmm, it looks like the DNS will be queried for an AAAA record for |
From tcpdump, here is what is apparently resolved in order:
|
And answers:
|
I think they describe a similar problem: |
My DNS competence really stops right here. I'll leave it to the super savvy to figure out :) |
I am not sure if we can do something about that. To me it sounds more like a Kernel or Golang issue. |
@dmp42 Could you provide exact steps to reproduce, so we can write a script for |
@LK4D4 I think you need a DNS that resolves
|
@LK4D4 not entirely trivial but:
Also @chmanie from the original bug report is very friendly and probably willing to help testing since he has a "not working" setup. |
Based on the description this change comes to mind fdd2abe? Although this change supposedly was made in 1.4.0. |
@dmcgowan beat me to it, but yeah..also was in 1.4.0.. this has to be related to registry DNS lookup, not the fact that IPv6 was added to the container network model in 1.5.0 |
Maybe both? |
Given @dmp42's description of the repro scenario, @MalteJ's pointer to the OpenDNS blog seems useful unless I misunderstand the problem: #10863 (comment) - <-- the blog notes the lookup issue for |
Could the move from go1.3.3 -> go1.4.1 be a factor here? That did happen between the two Docker releases, and a quick look at dnsclient_unix.go Note that it's not fun to dig for differences as |
That is my first guess. |
This is the set of changes that I found most interesting--the actual DNS "client" implementation that is called from the code you linked to ( Two things are interesting that I haven't had time to dig through thoroughly--but the |
I'm actually confused with how the DNS server setup gets the client into the state reported.. on either go1.3.3 or go1.4.2 compiled binary doing IP lookups, the following flow happens for index.docker.io: go1.4.2
go 1.3.3
I would have to misspell index.docker.io to get the client to start adding search terms (or mess with looking for index.docker.iioo:
This is with a bind server running in a container, acting as an authoritative master for "testdocker.org" and a Maybe the original reporter (@chmanie) can give some more detail on the DNS setup so I can better reproduce the exact scenario? |
I'm not exactly a dev-op, but happy to help! Could you help me on getting the information you need? |
Sure @chmanie , thanks! What would be most helpful is the exact |
@estesp I believe we are currently experiencing the same issue. Our setup is as follows. Let me know if I can provide anything to help. |
In general a (stub) resolver library, like the one in libc, appends the search domain, tries to resolve that, and falls back to the name without the search domain second. BUT since this is a bit expensive, it usually exempts names with at least 'ndots' dots (one by default), which are tried as-is first. See man resolv.conf under ndots:
The golang resolver seems to have the same defaults and supports reading ndots from /etc/resolv.conf as well: https://golang.org/src/net/dnsconfig_unix.go |
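To make the ndots rule above concrete, here is a minimal sketch of how a stub resolver could order the names it tries. It is not the code from libc or from Go's net package: `resolvConf`, `parseResolvConf`, and `candidateNames` are hypothetical names, the nameserver address is a placeholder, and only the `search` and `options ndots:N` settings are handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// resolvConf holds the two settings discussed in this thread.
// Hypothetical type for illustration, not the one used by Go's net package.
type resolvConf struct {
	search []string
	ndots  int
}

// parseResolvConf reads "search" and "options ndots:N" lines from
// resolv.conf-style text and ignores everything else.
func parseResolvConf(text string) resolvConf {
	conf := resolvConf{ndots: 1} // resolv.conf(5) documents 1 as the default
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "search":
			conf.search = fields[1:]
		case "options":
			for _, opt := range fields[1:] {
				if strings.HasPrefix(opt, "ndots:") {
					n, err := strconv.Atoi(strings.TrimPrefix(opt, "ndots:"))
					if err == nil && n >= 0 {
						conf.ndots = n
					}
				}
			}
		}
	}
	return conf
}

// candidateNames returns the fully qualified names to try, in order.
func candidateNames(name string, conf resolvConf) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name} // already rooted: the search list is never applied
	}
	asIs := name + "."
	var viaSearch []string
	for _, s := range conf.search {
		viaSearch = append(viaSearch, name+"."+strings.TrimSuffix(s, ".")+".")
	}
	if strings.Count(name, ".") >= conf.ndots {
		// Enough dots: try the name as given first, search domains second.
		return append([]string{asIs}, viaSearch...)
	}
	// Too few dots: search domains first, the bare name last.
	return append(viaSearch, asIs)
}

func main() {
	// Placeholder nameserver; "mydomain" is the search domain from the report.
	conf := parseResolvConf("nameserver 10.0.0.2\nsearch mydomain\n")
	fmt.Println(candidateNames("index.docker.io", conf))
	fmt.Println(candidateNames("registry", conf))
}
```

With the default ndots of 1, index.docker.io is tried as-is first and index.docker.io.mydomain. only as a fallback, while a dotless name such as registry goes through the search list first.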
@estesp I am the hosting provider for @chmanie's server, where the problem occurred. The problem was the "search somedomain" directive in the /etc/resolv.conf file. You most likely have the same problem. If you delete this line, everything should work fine. Edit: @chmanie's server is dual-stack ready, and its resolv.conf contained the "search domain" line. But the docker server didn't (or still doesn't) have an AAAA record. The docker tool on @chmanie's server tried to reach the docker server over IPv6 first. Because of the "search domain" line, the requested name could be resolved, but sadly to the IP of "domain" rather than the correct IP. That's why the docker tool didn't try to resolve the docker domain over IPv4. Edit 2: |
Side note: It seems we're using netgo, not the default which would use libc. |
Whenever I remove the "search ec2.internal" |
@discordianfish I will rename.. I thought I had a true netgo binary in my little stable of lookup binaries, but I forgot about the I now have go1.3.3, go1.4.2, go1.4.2-docker (built in a Docker dev container against the custom-built Go), and go1.4.2-netgo. The go1.4.2-netgo-built binary exhibits the actual problem:
Note: my silly DNS setup is responding with the useless IPv6 address based on a match-all rule The following trace shows that the netgo lookup code is willing to append the search domain even while the worker (see
|
Note that the core problem appears to be that the request for |
@drieschel by the way, thanks for providing the extra details. At this point, it definitely seems like a bug in the Go 'netgo' (versus cgo -> libc) resolver code related to how A and AAAA record results are handled differently in more recent Go versions. I don't think there is necessarily any specific problem with your DNS setup and Docker, although we definitely seem to have exposed a weird bug to resolve with the Go community. |
@estesp / @drieschel Looks like you isolated the problem very well. Can you open an issue upstream? I closed my golang/go#11070 since it's a different issue (the issue @sstarcher has as well), although the issue described here is pretty much confirmed already. |
@discordianfish I just opened golang/go#11081 |
@estesp Great, thanks! |
Quick update--this will most likely be fixed in Go 1.5; this patchset fixes the problem and is under review: https://go-review.googlesource.com/#/c/10836/ I don't know how we want to handle the Docker side of this issue as moving to Go 1.5 is gated on its release and then, a follow-on decision and timeframe for Docker to be built by Go 1.5 compilers. |
The patch for the netgo DNS lookup bug is now merged in Go, and it appears that it will make the Go 1.5 release. When Docker starts building with Go 1.5, this problem can be validated as resolved and this issue closed. |
@estesp we're on Go 1.5+ now; is it safe to close this issue? |
Yes, we can definitely close now. |
This is a follow-up investigation from #10802
In the following scenario:
where "mydns" would return an ipv6 address for any subdomain of mydomain
What happens is that when docker 1.5 resolves index.docker.io, it favors the IPv6 address returned for index.docker.io.mydomain. rather than the A record for index.docker.io.
This does NOT happen (apparently) if index.docker.io.mydomain. returns an IPv4 record. Obviously, this does not happen with docker 1.4 either.
Finally, curl (for comparison) does NOT use the IPv6 address, but correctly uses the A record for index.docker.io.
To me, this sounds like a DNS resolution order preference bug; not sure if this is a docker bug, or lower.
cc @chmanie @icecrime @stevvoe @dmcgowan
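The symptom described above can be reproduced in a toy model. The sketch below is an assumption about the mechanism, not code taken from Go or Docker: it contrasts a resolver that walks the search list separately for each record type with one that finishes each candidate name before moving on. The `zone` type and both function names are hypothetical, and the addresses come from the documentation ranges 192.0.2.0/24 and 2001:db8::/32.

```go
package main

import "fmt"

// zone is a toy DNS table: fully qualified name -> record type -> addresses.
type zone map[string]map[string][]string

// lookup returns the records of one type for one exact name (nil if none).
func (z zone) lookup(name, rtype string) []string { return z[name][rtype] }

// perTypeSearch walks the candidate names separately for each record type
// and merges whatever each walk finds first. A missing AAAA record on the
// first candidate lets the AAAA walk continue into the search domain.
func perTypeSearch(z zone, candidates []string) []string {
	var merged []string
	for _, rtype := range []string{"A", "AAAA"} {
		for _, name := range candidates {
			if recs := z.lookup(name, rtype); len(recs) > 0 {
				merged = append(merged, recs...)
				break
			}
		}
	}
	return merged
}

// perNameSearch asks for both record types on one candidate and only moves
// to the next candidate when that name yields nothing at all.
func perNameSearch(z zone, candidates []string) []string {
	for _, name := range candidates {
		var found []string
		for _, rtype := range []string{"A", "AAAA"} {
			found = append(found, z.lookup(name, rtype)...)
		}
		if len(found) > 0 {
			return found
		}
	}
	return nil
}

func main() {
	z := zone{
		// The real name has only an A record.
		"index.docker.io.": {"A": {"192.0.2.10"}},
		// The catch-all search domain answers AAAA for any subdomain.
		"index.docker.io.mydomain.": {"AAAA": {"2001:db8::1"}},
	}
	candidates := []string{"index.docker.io.", "index.docker.io.mydomain."}
	fmt.Println("per type:", perTypeSearch(z, candidates))
	fmt.Println("per name:", perNameSearch(z, candidates))
}
```

In this model the per-type walk returns both the correct A record and the catch-all IPv6 address, so a client that prefers IPv6 ends up at the wrong host, while the per-name walk returns only the A record.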