
net: DNS address resolution quirks (AAAA records inconsistency) #25321

Open · gdm85 opened this issue on May 9, 2018 · 6 comments · 5 participants

gdm85 commented May 9, 2018

This bug report is about an inconsistency in how name resolution is handled by the pure Go resolver and by the cgo one.

I do not necessarily expect a bug fix (it would probably be beneficial, but I leave that assessment to others), but I would at least like to understand why the Go resolver behaves this way.

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

The latest release at the time of writing is 1.10.2. I have not tested it, but judging from the release notes nothing should have changed in the relevant code.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="[...]"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="[...]"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build482647299=/tmp/go-build -gno-record-gcc-switches"

Test setup

issue25321.go can be obtained from https://play.golang.org/p/kE_Unq4VvkO

IPv6 is disabled on this box; the DNS server may or may not return AAAA records (I have a toggle for that).

When AAAA answers are allowed:

$ nslookup -query=AAAA www.googleapis.com
Server:		127.0.0.1
Address:	127.0.0.1#53

Non-authoritative answer:
www.googleapis.com	canonical name = googleapis.l.google.com.
googleapis.l.google.com	has AAAA address 2a00:1450:4001:825::200a

Authoritative answers can be found from:

When they are not allowed:

$ nslookup -query=AAAA www.googleapis.com
Server:		127.0.0.1
Address:	127.0.0.1#53

*** Can't find www.googleapis.com: No answer

But in both cases, an A query works:

$ nslookup -query=A www.googleapis.com
;; Warning: Message parser reports malformed message packet.
;; Truncated, retrying in TCP mode.
Server:		127.0.0.1
Address:	127.0.0.1#53

Non-authoritative answer:
www.googleapis.com	canonical name = googleapis.l.google.com.
Name:	googleapis.l.google.com
Address: 216.58.207.74
[... amended ...]

Test results

Reminder: IPv6 is always disabled; only the netdns resolver and the DNS server's responses vary across the tests below.

Test matrix (rows: resolver command; columns: whether the DNS server returns AAAA; each cell holds at most two log lines):

Top-left: go run issue25321.go, DNS returns AAAA
2018/05/10 00:52:01 dial failed www.googleapis.com Get http://[2a00:1450:4001:821::200a]:80/: dial tcp [2a00:1450:4001:821::200a]:80: connect: cannot assign requested address
2018/05/10 00:52:11 HTTP failed www.googleapis.com Get http://www.googleapis.com/: dial tcp [2a00:1450:4001:821::200a]:80: connect: cannot assign requested address

Top-right: go run issue25321.go, DNS does not return AAAA
2018/05/10 00:43:23 dial failed www.googleapis.com lookup www.googleapis.com on 127.0.0.1:53: read udp 127.0.0.1:59700->127.0.0.1:53: i/o timeout
2018/05/10 00:43:33 HTTP failed www.googleapis.com Get http://www.googleapis.com/: dial tcp: lookup www.googleapis.com on 127.0.0.1:53: read udp 127.0.0.1:43538->127.0.0.1:53: i/o timeout

Bottom-left: GODEBUG=netdns=cgo go run issue25321.go, DNS returns AAAA
2018/05/10 00:47:23 dial failed www.googleapis.com Get http://[2a00:1450:4001:81d::200a]:80/: dial tcp [2a00:1450:4001:81d::200a]:80: connect: cannot assign requested address
2018/05/10 00:47:23 OK www.googleapis.com 192.168.1.12:57248 -> 216.58.207.74:80

Bottom-right: GODEBUG=netdns=cgo go run issue25321.go, DNS does not return AAAA
2018/05/10 00:43:07 OK www.googleapis.com 192.168.1.12:57156 -> 216.58.207.74:80

Worth noting: with the cgo resolver and AAAA answers allowed, there is first a failure (dialer) and then a success (HTTP request).

Another note: resolving www.bing.com is not affected by this problem, so the problem must be related to how the DNS server returns the records.

Expected results

Since IPv6 is disabled on this box, the expected result for all four combinations would be: do not try AAAA and use an A record, as in the bottom-right cell of the test matrix.

Questions arising from this test

  1. How is the order of answers handled? Is there somehow a preference for AAAA records? (I am inclined to think so.)
  2. How could resolution ever time out when no AAAA is returned? If acknowledged, this would arguably be the most serious part of the bug, although it should first be ruled out that this is a server-side DNS problem.


agnivade added this to the Go1.11 milestone on May 10, 2018

agnivade (Member) commented May 10, 2018

/cc @mikioh

iangudger (Contributor) commented May 17, 2018

Can you test at tip? The Go resolver has been largely rewritten.

gdm85 commented May 24, 2018

I just tested with master at 65c365b; for reference:

$ go version
go version devel +65c365b Wed May 23 23:51:30 2018 +0000 linux/amd64

Results at tip

With AAAA answers returned by the server, GODEBUG=netdns=go+10 go run issue25321.go:

go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(www.googleapis.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
2018/05/24 08:15:44 dial failed www.googleapis.com Get http://[2a00:1450:4016:80c::200a]:80/: dial tcp [2a00:1450:4016:80c::200a]:80: connect: cannot assign requested address
go package net: hostLookupOrder(www.googleapis.com) = files,dns
2018/05/24 08:15:44 OK www.googleapis.com 192.168.1.12:35978 -> 172.217.22.234:80

real	0m1.824s
user	0m0.540s
sys	0m0.044s

With AAAA answers returned by the server, GODEBUG=netdns=cgo+10 go run issue25321.go:

go package net: using cgo DNS resolver
go package net: hostLookupOrder(www.googleapis.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
2018/05/24 08:15:37 dial failed www.googleapis.com Get http://[2a00:1450:4016:80c::200a]:80/: dial tcp [2a00:1450:4016:80c::200a]:80: connect: cannot assign requested address
go package net: hostLookupOrder(www.googleapis.com) = cgo
2018/05/24 08:15:37 OK www.googleapis.com 192.168.1.12:39562 -> 216.58.207.138:80

real	0m1.606s
user	0m0.476s
sys	0m0.060s

With AAAA answers NOT returned by the server, GODEBUG=netdns=go+10 go run issue25321.go:

go package net: GODEBUG setting forcing use of Go's resolver
go package net: hostLookupOrder(www.googleapis.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.google.com) = files,dns
go package net: hostLookupOrder(www.googleapis.com) = files,dns
2018/05/24 08:26:17 OK www.googleapis.com 192.168.1.12:41990 -> 172.217.20.138:80

real	0m2.090s
user	0m0.464s
sys	0m0.044s

With AAAA answers NOT returned by the server, GODEBUG=netdns=cgo+10 go run issue25321.go:

go package net: using cgo DNS resolver
go package net: hostLookupOrder(www.googleapis.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.google.com) = cgo
go package net: hostLookupOrder(www.googleapis.com) = cgo
2018/05/24 08:27:06 OK www.googleapis.com 192.168.1.12:43918 -> 172.217.21.10:80

real	0m2.098s
user	0m0.504s
sys	0m0.068s

Conclusion

My conclusion is that the problem is still present.

iangudger (Contributor) commented May 25, 2018

@gdm85, it looks like there might have been two bugs.

  1. In your original report, resolution failed when there was only an A record and the Go resolver was in use. This was likely caused by the Go resolver rejecting the response containing the A record for some reason. This appears to have been fixed in your update.
  2. In both the original report and the update, Go prefers the IPv6 address over the IPv4 address and does not fall back to the IPv4 address if IPv6 is disabled.

Does that sound right? If so, I think the problem may be a bug in Dial's fallback logic.

gdm85 commented May 25, 2018

@iangudger yes, I had missed that, but with 65c365b the top-right scenario has indeed been fixed.

As for the second bug, I dug a bit deeper. These are the relevant sysctl parameters (note the second one):

$ sysctl net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
$ sysctl net.ipv6.bindv6only
net.ipv6.bindv6only = 0

For the record: on this (and similar) boxes, no interface has any IPv6 address configured or in use.

I patched a unit test to quickly see what probe() is doing (it is a copy of the code in https://github.com/golang/go/blob/master/src/net/ipsock_posix.go with a few t.Log() calls added):

diff --git a/src/net/ipsock_test.go b/src/net/ipsock_test.go
index aede354..204cda3 100644
--- a/src/net/ipsock_test.go
+++ b/src/net/ipsock_test.go
@@ -7,6 +7,9 @@ package net
 import (
        "reflect"
        "testing"
+       "internal/poll"
+       "runtime"
+       "syscall"
 )
 
 var testInetaddr = func(ip IPAddr) Addr { return &TCPAddr{IP: ip.IP, Port: 5682, Zone: ip.Zone} }
@@ -280,3 +283,59 @@ func TestAddrListPartition(t *testing.T) {
                }
        }
 }
+
+// Probe probes IPv4, IPv6 and IPv4-mapped IPv6 communication
+// capabilities which are controlled by the IPV6_V6ONLY socket option
+// and kernel configuration.
+//
+// Should we try to use the IPv4 socket interface if we're only
+// dealing with IPv4 sockets? As long as the host system understands
+// IPv4-mapped IPv6, it's okay to pass IPv4-mapeed IPv6 addresses to
+// the IPv6 interface. That simplifies our code and is most
+// general. Unfortunately, we need to run on kernels built without
+// IPv6 support too. So probe the kernel to figure it out.
+func TestIPv6(t *testing.T) {
+       s, err := sysSocket(syscall.AF_INET, syscall.SOCK_STREAM, syscall.IPPROTO_TCP)
+       switch err {
+       case syscall.EAFNOSUPPORT, syscall.EPROTONOSUPPORT:
+       case nil:
+               poll.CloseFunc(s)
+               t.Log("p.ipv4Enabled = true")
+       }
+       var probes = []struct {
+               laddr TCPAddr
+               value int
+       }{
+               // IPv6 communication capability
+               {laddr: TCPAddr{IP: ParseIP("::1")}, value: 1},
+               // IPv4-mapped IPv6 address communication capability
+               {laddr: TCPAddr{IP: IPv4(127, 0, 0, 1)}, value: 0},
+       }
+       switch runtime.GOOS {
+       case "dragonfly", "openbsd":
+               // The latest DragonFly BSD and OpenBSD kernels don't
+               // support IPV6_V6ONLY=0. They always return an error
+               // and we don't need to probe the capability.
+               probes = probes[:1]
+       }
+       for i := range probes {
+               s, err := sysSocket(syscall.AF_INET6, syscall.SOCK_STREAM, syscall.IPPROTO_TCP)
+               if err != nil {
+                       continue
+               }
+               defer poll.CloseFunc(s)
+               syscall.SetsockoptInt(s, syscall.IPPROTO_IPV6, syscall.IPV6_V6ONLY, probes[i].value)
+               sa, err := probes[i].laddr.sockaddr(syscall.AF_INET6)
+               if err != nil {
+                       continue
+               }
+               if err := syscall.Bind(s, sa); err != nil {
+                       continue
+               }
+               if i == 0 {
+                       t.Log("p.ipv6Enabled = true")
+               } else {
+                       t.Log("p.ipv4MappedIPv6Enabled = true")
+               }
+       }
+}

Result of running this test:

go test -v -run TestIPv6 ./src/net/
=== RUN   TestIPv6
--- PASS: TestIPv6 (0.00s)
	ipsock_test.go:303: p.ipv4Enabled = true
	ipsock_test.go:338: p.ipv4MappedIPv6Enabled = true
=== RUN   TestIPv6MulticastListener
--- SKIP: TestIPv6MulticastListener (0.00s)
	listen_test.go:615: IPv6 is not supported
=== RUN   TestIPv6LinkLocalUnicastTCP
--- SKIP: TestIPv6LinkLocalUnicastTCP (0.00s)
	tcpsock_test.go:377: IPv6 is not supported
=== RUN   TestIPv6LinkLocalUnicastUDP
--- SKIP: TestIPv6LinkLocalUnicastUDP (0.00s)
	udpsock_test.go:283: IPv6 is not supported
PASS
Socket statistical information:
(inet4, stream|0x80800, tcp): opened=1 connected=0 listened=0 accepted=0 closed=1 openfailed=0 connectfailed=0 listenfailed=0 acceptfailed=0 closefailed=0
(inet6, stream|0x80800, tcp): opened=2 connected=0 listened=0 accepted=0 closed=2 openfailed=0 connectfailed=0 listenfailed=0 acceptfailed=0 closefailed=0

ok  	net	0.002s

Relevant lines:

	ipsock_test.go:303: p.ipv4Enabled = true
	ipsock_test.go:338: p.ipv4MappedIPv6Enabled = true

So, to reproduce the bug(s), the machine must have IPv6 disabled while IPv6 sockets can still bind IPv4 (IPv4-mapped) addresses.
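For anyone who wants to repeat this probe without patching the Go tree, a standalone approximation using only the syscall package follows. It assumes Linux (or another Unix that has IPV6_V6ONLY), skips the DragonFly/OpenBSD special case of the original, and probeIPStack, bindV6 and probeResult are illustrative names.

```go
package main

import (
	"fmt"
	"syscall"
)

// probeResult mirrors the three flags logged by the patched test above.
type probeResult struct {
	IPv4           bool // an AF_INET TCP socket can be created
	IPv6           bool // ::1 can be bound with IPV6_V6ONLY=1
	IPv4MappedIPv6 bool // ::ffff:127.0.0.1 can be bound with IPV6_V6ONLY=0
}

// bindV6 reports whether an AF_INET6 TCP socket with the given IPV6_V6ONLY
// value can bind addr (port 0, so the kernel picks a free port).
func bindV6(addr [16]byte, v6only int) bool {
	s, err := syscall.Socket(syscall.AF_INET6, syscall.SOCK_STREAM, syscall.IPPROTO_TCP)
	if err != nil {
		return false
	}
	defer syscall.Close(s)
	// As in the original probe, a failure to set the option is ignored.
	_ = syscall.SetsockoptInt(s, syscall.IPPROTO_IPV6, syscall.IPV6_V6ONLY, v6only)
	return syscall.Bind(s, &syscall.SockaddrInet6{Addr: addr}) == nil
}

func probeIPStack() probeResult {
	var r probeResult
	if s, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, syscall.IPPROTO_TCP); err == nil {
		syscall.Close(s)
		r.IPv4 = true
	}
	loopback6 := [16]byte{15: 1}                           // ::1
	mapped4 := [16]byte{10: 0xff, 11: 0xff, 12: 127, 15: 1} // ::ffff:127.0.0.1
	r.IPv6 = bindV6(loopback6, 1)
	r.IPv4MappedIPv6 = bindV6(mapped4, 0)
	return r
}

func main() {
	fmt.Printf("%+v\n", probeIPStack())
}
```

On the box described above, this should report IPv4 and IPv4MappedIPv6 as true and IPv6 as false, matching the log lines of the patched test.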

To toggle AAAA answers on the DNS server side, I am using unbound with a modified Python filter (I got some clues from https://github.com/berstend/unbound-no-aaaa).

If needed, I can rerun the tests with Go 1.10.1 and master; however, before running further tests, I am trying to identify all the dimensions of the test matrix so that I can eventually create some sort of test suite.

Tentatively:

  • net.ipv6.conf.all.disable_ipv6 = 1
  • net.ipv6.conf.all.disable_ipv6 = 0 (I have not tested this so far)
  • net.ipv6.bindv6only = 1 (I have not tested this so far)
  • net.ipv6.bindv6only = 0
  • DNS returns A, AAAA
  • DNS returns only A
  • using Go resolver
  • using CGO resolver
  • (add more dimensions here for other OSes if you want to go wild; only Linux has been tested so far)

Ideally, we should be able to identify the desired behaviour for all (valid) permutations; for the record, on Linux all permutations are valid (this might not be the case on Windows, BSD, etc.).
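The Linux dimensions listed above multiply out to 2 × 2 × 2 × 2 = 16 scenarios. A hypothetical table-driven skeleton for enumerating them; the scenario type and its fields are made up for illustration, and actually applying each setting (sysctl values, the unbound AAAA filter, GODEBUG) is left out.

```go
package main

import "fmt"

// scenario is one cell of the proposed Linux test matrix (illustrative type).
type scenario struct {
	DisableIPv6 bool   // net.ipv6.conf.all.disable_ipv6
	BindV6Only  bool   // net.ipv6.bindv6only
	DNSReturns  string // "A+AAAA" or "A only"
	Resolver    string // GODEBUG=netdns value: "go" or "cgo"
}

// allScenarios returns every combination of the four dimensions.
func allScenarios() []scenario {
	var out []scenario
	for _, disable := range []bool{true, false} {
		for _, v6only := range []bool{true, false} {
			for _, dns := range []string{"A+AAAA", "A only"} {
				for _, res := range []string{"go", "cgo"} {
					out = append(out, scenario{disable, v6only, dns, res})
				}
			}
		}
	}
	return out
}

func main() {
	for i, s := range allScenarios() {
		fmt.Printf("%2d: disable_ipv6=%t bindv6only=%t dns=%q GODEBUG=netdns=%s\n",
			i+1, s.DisableIPv6, s.BindV6Only, s.DNSReturns, s.Resolver)
	}
}
```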

Another relevant aspect is the total time spent on resolution.

Edit: I now notice that supportsIPv6() does not seem to be used when determining which DNS addresses to use or return; this is probably intentional, but I do not know the rationale.

arno01 commented Oct 9, 2018

This is probably why Go applications compiled without cgo (and therefore using the pure Go netdns resolver) fail with Post https://any.com/sdk: dial tcp: lookup any.com on [::1]:53: read udp [::1]:41165->[::1]:53: read: connection refused (see the two referenced Terraform and MinIO client issues above this post).

my-samsung-s8$ sysctl net.ipv6.conf.all.disable_ipv6 net.ipv6.bindv6only
net.ipv6.conf.all.disable_ipv6 = 0
net.ipv6.bindv6only = 0

Could Go's netdns resolver be fixed so that it falls back to IPv4 if it fails to bind to IPv6? Or, better still, so that it does not even try IPv6 in the first place when it sees net.ipv6.bindv6only = 0 or any other relevant IPv6 tunable?

More and more people are starting to use Samsung DeX; they are going to be blocked by this issue and forced to recompile software rather than simply use it.
