net: replicate DNS resolution behaviour of getaddrinfo(glibc) in the go dns resolver #18518
What version of Go are you using (
I agree we should prefer to match glibc behavior, but looking at glibc's source code I believe we're already doing that. The code in check_pf.c that you linked to doesn't actually implement the RFC 3484 sorting logic. That's handled by the qsort/rfc3484_sort calls at https://github.molgen.mpg.de/git-mirror/glibc/blob/master/sysdeps/posix/getaddrinfo.c#L2629, which appear to run unconditionally as long as naddrs>1.
Can you provide more evidence of what you think is going wrong? E.g., output from net.LookupIP under both GODEBUG=netdns=go and GODEBUG=netdns=cgo?
Sure, I'm happy to provide as much information as possible.
The check_pf.c code does not implement RFC 3484 directly, but its result affects the outcome of getaddrinfo.
Walking through getaddrinfo
In getaddrinfo.c, check_pf can be called in either
And since by default, AI_ADDRCONFIG is not set (I am not 100% sure on this), we'll look at the second scenario. In this scenario, if no IPv6 address is found in check_pf,
All this will in turn affect the netmask that is used to do Rule 9 Comparison in rfc3484_sort. https://github.molgen.mpg.de/git-mirror/glibc/blob/master/sysdeps/posix/getaddrinfo.c#L1732
Example results from different commands
Sorry for the long reply, I felt that walking through the code and showing the results of the commands would be the most helpful.
Thanks for elaborating. That was helpful.
Tracking back through glibc commit history, it looks like that behavior was introduced in https://github.molgen.mpg.de/git-mirror/glibc/commit/df18fdb93027b2e18919707d54556f8bb5f4694b, which was reportedly to fix glibc bug 4599: https://sourceware.org/bugzilla/show_bug.cgi?id=4599
There's nothing to suggest to me that the change in behavior was intentional though. The commit/bug was simply that AI_ADDRCONFIG was meant to ignore loopback IPs, but that commit suddenly disabled RFC 3484 when there are no non-loopback IPv6 addresses. I somewhat wonder if it was a mistaken optimization.
We can certainly mimic glibc's behavior here, but it seems very arbitrary to me. I don't see any obvious reason why the presence of IPv6 addresses on a machine should affect sorting of IPv4 addresses.
I'm more inclined to say we should just not apply Rule 9 to IPv4 addresses at all.
We do not set AI_ADDRCONFIG: https://golang.org/src/net/cgo_linux.go
@mdempsky Yeah, I agree that it doesn't look like the change was intentional. However, that unintentional change may also be the reason DNS round robin works properly for a lot of people.
Personally, I am happy to disable Rule 9 for IPv4 addresses but I am worried that having a different behaviour from glibc will once again confuse people in the future.
Yeah, understood. Here's my thought process:
Currently for simplicity, we're not precisely matching glibc behavior anyway: glibc only applies CommonPrefixLen to IPv4 src/dst pairs that are on the same local subnet, but we instead check for the same special IPv4 block (i.e., 192.168/16, 10/8, ...). This does the "wrong" thing when src/dst are on the same public IPv4 subnet (e.g., 8.8.8/24), or on different subnets within a special block (e.g., 192.168.0/24 and 192.168.1/24).
RFC 6724 updates RFC 3484 to clarify that CommonPrefixLen should only consider the "prefix," not the "interface ID." These terms don't appear explicitly defined for IPv4-mapped addresses, but my best interpretation is it means within 192.168.1/24, we should consider 192.168.1.2, 192.168.1.3, and 192.168.1.4 as equally preferable. But currently glibc and addrselect.go treat .2 and .3 as closer than .4 (which is silly, since they're all on the same local network).
If we don't fix same-subnet detection, the best we can do is to detect when the src/dst addresses are both on the same IPv4 special block and give them preference... but that seems very similar to how RFC 3484 used to classify RFC 1918 subnets as "site-local" scope, which RFC 6724 says broke things and is why they're now classified as "global" scope instead. (Admittedly, they explain it's because of 6to4 tunneling, and I have no idea whether that is still in use.)
So I'm currently strongly in favor of just disabling Rule 9 for IPv4 addresses, but willing to reconsider given sufficiently strong evidence for glibc's odd behavior.
If we can make it obvious that the go resolver intentionally deviates from glibc and we are willing to deviate from glibc, I would also much prefer that we disable Rule 9 for IPv4 addresses completely.
As it stands right now,
@pmarks-net From my understanding of RFC 6724, the common prefix length in Rule 9 is supposed to be applied to only the first 64 bits of the IPv6 address (routing prefix and subnet ID), and beyond that the selection order should be whatever is returned by the DNS. (https://tools.ietf.org/html/rfc6724 page 15 - Rule 9 and page 29 - common prefixlen).
@pmarks-net Agreed it's possible IPv6 networks will have similar issues and we may eventually need to disable Rule 9 completely. But that doesn't appear to be the case today.
I think the asymmetry is justified in that IPv4 predates Rule 9, so IPv4 network addressing has not been designed to reflect physical colocality. Also, due to address space constraints, it seems unlikely they ever will.
I'm not informed well enough to gauge whether IPv6 is actually doing a better job here, but at least IPv6 network operators are hopefully aware of Rule 9 and can still design their networks accordingly. Hopefully if it ends up not working, RFC 6724 will be revised again.