New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: DNS resolves slowly #21906

Closed
mupengX opened this Issue Sep 16, 2017 · 33 comments

Comments

Projects
None yet
@mupengX

mupengX commented Sep 16, 2017

What version of Go are you using (go version)?

go1.8.1 & go1.9

Does this issue reproduce with the latest release?

not found yet

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/XXX/go"
GORACE=""
GOROOT="/home/XXX/local/go1.9/go"
GOTOOLDIR="/home/XXX/local/go1.9/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build967902580=/tmp/go-build"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

when I used httpclient to send post request to a URL, I found it took more time than use python or java in the same environment. go took 6S, python took less than 1S.
After analysis and exclusion, I found it spend too much time on DNS resolve. if I replace domain names with the IP, the request take almost as much time as python&java take.
One more thing, all above is only for a specified domain, I'm not sure if there's other special domain name like this cause this problem.

@davecheney

This comment has been minimized.

Contributor

davecheney commented Sep 16, 2017

@mupengX

This comment has been minimized.

mupengX commented Sep 16, 2017

 // For security's sake, sorry, I can't provide detailed domain names
       url := ""
   // json content
	payload := strings.NewReader("")

	req, _ := http.NewRequest("POST", url, payload)

	req.Header.Add("content-type", "application/json")
	req.Header.Add("cache-control", "no-cache")

	client := &http.Client{
		Timeout:   time.Duration(3000) * time.Millisecond,
	}
	now := time.Now()
	res, err := client.Do(req)
	cost := time.Since(now)
	if err != nil {
		fmt.Println(err)
		return
	}

	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)

	fmt.Println(string(body))
	fmt.Println(cost.Seconds() * 1e3)

       now = time.Now()
	ns, err := net.LookupHost("DOMAIN")
	cost = time.Since(now)
	fmt.Println(cost.Seconds() * 1e3)
	if err != nil {
		fmt.Printf("Err: %s", err.Error())
		return
	}

	for _, n := range ns {
		fmt.Printf("%s\n", n)
	}

@dsnet dsnet changed the title from DNS resolve slowly to net: DNS resolve slowly Sep 16, 2017

@dsnet dsnet changed the title from net: DNS resolve slowly to net: DNS resolves slowly Sep 16, 2017

@mvdan

This comment has been minimized.

Member

mvdan commented Sep 17, 2017

@mupengX the code you pasted isn't self-contained. Please give an entire program - as small as possible - that reproduces the issue. You can use https://play.golang.org/.

@mupengX

This comment has been minimized.

mupengX commented Sep 18, 2017

i'm sorry for that. The go playground does not allow HTTP requests for security.
This is the complete code:

package main

import (
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"strings"
	"time"
)

func main() {

	// For security's sake, sorry, I can't provide detailed domain names
	url := ""
	// json content
	payload := strings.NewReader("")

	req, _ := http.NewRequest("POST", url, payload)

	req.Header.Add("content-type", "application/json")
	req.Header.Add("cache-control", "no-cache")

	client := &http.Client{
		Timeout: time.Duration(3000) * time.Millisecond,
	}
	now := time.Now()
	res, err := client.Do(req)
	cost := time.Since(now)
	if err != nil {
		fmt.Println(err)
		return
	}

	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)

	fmt.Println(string(body))
	fmt.Println(cost.Seconds() * 1e3)


}
@davecheney

This comment has been minimized.

Contributor

davecheney commented Sep 18, 2017

@mvdan

This comment has been minimized.

Member

mvdan commented Sep 18, 2017

Also wondering how you came to the conclusion that it's Go's fault that the DNS resolves slowly (if it indeed is the DNS lookup that takes a while).

Have you tried other tools, like host?

@nussjustin

This comment has been minimized.

Contributor

nussjustin commented Sep 18, 2017

Have you tried using net/http/httptrace package to trace the request and confirm that the DNS lookup is your problem?

There is also httpstat tool which basically collects all the timings in a request using the net/http/httptrace package and formats them in a pretty way.

You can also set GODEBUG=netdns=2 to see how the DNS request is made (using Go or CGO) and force Go/CGO by using GODEBUG=netdns=go or GODEBUG=netdns=cgo. See this CL for more information on the GODEBUG=netdns configuration.

@mupengX

This comment has been minimized.

mupengX commented Sep 25, 2017

First thanks for your suggestion
@davecheney @mvdan
i calculated the cost of DNS resolve by net.LookupIP , it costed 5S. And if I replace domain names with the IP, the http request take almost as much time as python&java take.
i also used nslookup command, it returned quickly.

@nussjustin
i'm sorry i didn't use net/http/httptrace package to trace this.
i tried this: it costed more time with GODEBUG=netdns=go than with GODEBUG=netdns=cgo

Maybe it is a special case with this domain name. Thanks!

@mupengX

This comment has been minimized.

mupengX commented Sep 26, 2017

@nussjustin
i used httptrace to trace the request, the code like this:

package main

import (
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"net/http/httptrace"
	"strings"
	"time"
)

func main() {
	
	dialer := &net.Dialer{Timeout: 7 * time.Second}
	// For security's sake, sorry, I can't provide detailed domain names
        url := ""
       // json content
	payload := strings.NewReader("")

	req, _ := http.NewRequest("POST", url, payload)

	req.Header.Add("content-type", "application/json")
	req.Header.Add("cache-control", "no-cache")
	
	trace := &httptrace.ClientTrace{
		DNSStart: func(dnsInfo httptrace.DNSStartInfo) {
			fmt.Printf("dns start: %v\n", time.Now())
		},
		DNSDone: func(dnsDoneInfo httptrace.DNSDoneInfo) {
			fmt.Printf("dns end: %v\n", time.Now())
		},
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

	transport := &http.Transport{

		DialContext:         (dialer).DialContext,
		Dial:                (dialer).Dial,
		TLSHandshakeTimeout: 2 * time.Second,
	}
	
	client := &http.Client{
		Transport: transport,
		Timeout:   time.Duration(10000) * time.Millisecond,
	}


	now := time.Now()

	res, err := client.Do(req)

	cost := time.Since(now)
	if err != nil {
		fmt.Println(err)
		return
	}

	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)

	fmt.Println(string(body))
	fmt.Printf("client cost: %v\n", cost.Seconds()*1e3)


}

and get the result :

dns start: 2017-09-26 15:45:28.148847712 +0800 CST
dns end: 2017-09-26 15:45:33.15817369 +0800 CST
client cost: 5114.887975000001

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Mar 30, 2018

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Mar 30, 2018

@iangudger

This comment has been minimized.

Contributor

iangudger commented Mar 30, 2018

@mupengX, would it be possible for you to try again with tip? The native Go resolver has been mostly rewritten since Go 1.10.

@agnivade

This comment has been minimized.

Member

agnivade commented Aug 13, 2018

@iangudger - Possibly it is still there in 1.10. See #26960

@josharian

This comment has been minimized.

Contributor

josharian commented Aug 13, 2018

5114ms is fairly suspicious. IIRC, the Go DNS resolvers includes 5s timer for refreshing /etc/resolv.conf, or at least it used to.

@iangudger

This comment has been minimized.

Contributor

iangudger commented Aug 13, 2018

@mupengX, sorry if I wasn't clear, I meant please re-test with 1.11.

@mavisd

This comment has been minimized.

mavisd commented Aug 29, 2018

Experiencing very slow DNS lookups only for internal (intranet) addresses on Linux (RHEL 7.5) with Go 1.11. This is not an issue with Go 1.10. Here are some of the relevant details:

Go 1.11

go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/var/lib/jenkins/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/var/lib/jenkins/go"
GOPROXY=""
GORACE=""
GOROOT="/var/lib/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/Go_1.11"
GOTMPDIR=""
GOTOOLDIR="/var/lib/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/Go_1.11/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build608307955=/tmp/go-build -gno-record-gcc-switches"

httpstat https://jira.faa.gov -> DNS lookup: 10045ms (lookup times are always at least 10 seconds)

GODEBUG=netdns=2 httpstat https://jira.faa.gov

go package net: dynamic selection of DNS resolver
go package net: hostLookupOrder(jira.faa.gov) = files,dns

namelookup:10049ms

GODEBUG=netdns=cgo httpstat https://jira.faa.gov -> DNS lookup: 7ms

Go 1.10

go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/var/lib/jenkins/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/var/lib/jenkins/go_1.10"
GORACE=""
GOROOT="/var/lib/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/Go"
GOTMPDIR=""
GOTOOLDIR="/var/lib/jenkins/tools/org.jenkinsci.plugins.golang.GolangInstallation/Go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build266949896=/tmp/go-build -gno-record-gcc-switches"

httpstat https://jira.faa.gov -> DNS lookup: 33ms

GODEBUG=netdns=2 httpstat https://jira.faa.gov

go package net: dynamic selection of DNS resolver
go package net: hostLookupOrder(jira.faa.gov) = files,dns

namelookup:35ms

GODEBUG=netdns=cgo httpstat https://jira.faa.gov -> DNS lookup: 6ms

Non-Go tools

host -v jira.faa.gov

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64067
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;jira.faa.gov.			IN	A

;; ANSWER SECTION:
jira.faa.gov.		7200	IN	A	172.x.x.x

Received 46 bytes from 172.x.x.x#53 in 7 ms
Trying "jira.faa.gov"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57404
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;jira.faa.gov.			IN	AAAA

;; AUTHORITY SECTION:
faa.gov.		0	IN	SOA	<omitted>. <omitted>. 2018082800 10800 1800 604800 1800

Received 81 bytes from 172.x.x.x#53 in 2 ms
Trying "jira.faa.gov"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 13241
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;jira.faa.gov.			IN	MX

;; AUTHORITY SECTION:
faa.gov.		0	IN	SOA	<omitted>. <omitted>. 2018082800 10800 1800 604800 1800

Received 81 bytes from 172.x.x.x#53 in 1 ms
dig jira.faa.gov

; <<>> DiG 9.9.4-RedHat-9.9.4-61.el7 <<>> jira.faa.gov
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 12472
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;jira.faa.gov.			IN	A

;; ANSWER SECTION:
jira.faa.gov.		7200	IN	A	172.x.x.x

;; Query time: 3 msec
;; SERVER: 172.x.x.x#53(172.x.x.x)
;; WHEN: Tue Aug 28 22:00:59 CDT 2018
;; MSG SIZE  rcvd: 57
while true; do dig jira.faa.gov | grep time; sleep 2; done

;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 10 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 2 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec
;; Query time: 1 msec

Notes

  • jira.faa.gov is an internally resolvable (intranet) hostname
  • httpstat is from https://github.com/davecheney/httpstat
  • All internal faa.gov domains are having the same issue when using Go 1.11 (unless cgo lookup is used)
@iangudger

This comment has been minimized.

Contributor

iangudger commented Aug 29, 2018

Is there an entry for jira.faa.gov in your hosts file? Would it be possible for you to collect packet captures with tcpdump and post them here?

@agnivade

This comment has been minimized.

Member

agnivade commented Aug 29, 2018

Aiming for 1.12 since a lot of people lately seem to bump into this.

@mikioh @bradfitz

@agnivade agnivade modified the milestones: Unplanned, Go1.12 Aug 29, 2018

@mavisd

This comment has been minimized.

mavisd commented Aug 29, 2018

There is no entry for jira.faa.gov in the /etc/hosts file (although the file is somewhat large with over 2000 entries). I wanted to note that the DNS lookup times are suspiciously similar each time, always taking just 30-50ms over 10 seconds. It's almost like part of the DNS lookup is timing out at 10 seconds before finally resolving using another lookup. I'll capture the packets with tcpdump and post here shortly.

@mavisd

This comment has been minimized.

mavisd commented Aug 29, 2018

Posting strace below until I can isolate tcpdump capture:

strace -f -e trace=network -s 10000 -r httpstat https://jira.faa.gov
strace: Process 25366 attached
strace: Process 25365 attached
strace: Process 25367 attached
[pid 25364]      0.000000 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 25366]      0.001671 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 5
[pid 25366]      0.000275 setsockopt(5, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25366]      0.000494 connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP A>")}, 16) = 0
[pid 25366]      0.000556 getsockname(5, {sa_family=AF_INET, sin_port=htons(49727), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25366]      0.000180 getpeername(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP A>")}, [16]) = 0
[pid 25366]      0.000714 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25366]      0.000415 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP A>")}, 16) = 0
[pid 25366]      0.000500 getsockname(3, {sa_family=AF_INET, sin_port=htons(59083), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25366]      0.000274 getpeername(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP A>")}, [16]) = 0
[pid 25366]      0.001471 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 25366]      0.000515 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25366]      0.000572 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP C>")}, 16) = 0
[pid 25366]      0.000468 getsockname(3, {sa_family=AF_INET, sin_port=htons(60955), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25366]      0.000324 getpeername(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP C>")}, [16]) = 0
strace: Process 25368 attached
[pid 25368]      0.002348 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 25368]      0.000450 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25368]      0.000391 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP D>")}, 16) = 0
[pid 25368]      0.000463 getsockname(3, {sa_family=AF_INET, sin_port=htons(33115), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25368]      0.000422 getpeername(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP D>")}, [16]) = 0
[pid 25368]      4.999070 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 25368]      0.000830 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25368]      0.000366 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP A>")}, 16) = 0
[pid 25368]      0.000610 getsockname(3, {sa_family=AF_INET, sin_port=htons(45386), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25368]      0.000294 getpeername(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP A>")}, [16]) = 0
[pid 25364]      0.001998 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 25364]      0.000770 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25364]      0.000657 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP C>")}, 16) = 0
[pid 25364]      0.000819 getsockname(3, {sa_family=AF_INET, sin_port=htons(42528), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25364]      0.000676 getpeername(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP C>")}, [16]) = 0
[pid 25368]      0.002157 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 25368]      0.000680 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25368]      0.000594 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP D>")}, 16) = 0
[pid 25368]      0.000794 getsockname(3, {sa_family=AF_INET, sin_port=htons(35194), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25368]      0.000403 getpeername(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP D>")}, [16]) = 0
[pid 25368]      4.998392 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3
[pid 25368]      0.002491 setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 25368]      0.000810 connect(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("<Private IP E>")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 25364]      0.002335 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 25364]      0.000560 getpeername(3, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("<Private IP E>")}, [16]) = 0
[pid 25364]      0.000620 getsockname(3, {sa_family=AF_INET, sin_port=htons(44562), sin_addr=inet_addr("<Private IP B>")}, [16]) = 0
[pid 25364]      0.000654 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0

Connected to <Private IP E>:443
strace: Process 25377 attached

strace: Process 25377 attached

HTTP/1.1 200
<HEADERS OMITTED>

Body discarded

  DNS Lookup   TCP Connection   TLS Handshake   Server Processing   Content Transfer
[  10061ms  |           8ms  |        169ms  |            103ms  |           222ms  ]
            |                |               |                   |                  |
   namelookup:10061ms        |               |                   |                  |
                       connect:10069ms       |                   |                  |
                                   pretransfer:10239ms           |                  |
                                                     starttransfer:10343ms          |
                                                                                total:10565ms  
[pid 25377]      0.504838 +++ exited with 0 +++
[pid 25368]      0.000415 +++ exited with 0 +++
[pid 25367]      0.000261 +++ exited with 0 +++
[pid 25366]      0.000348 +++ exited with 0 +++
[pid 25365]      0.000301 +++ exited with 0 +++
     0.000270 +++ exited with 0 +++
@iangudger

This comment has been minimized.

Contributor

iangudger commented Aug 29, 2018

Would it be possible for you to re-run the strace also logging sendto, recvfrom, read, write, and the full arguments (packet data)? I can probably debug with that.

@mavisd

This comment has been minimized.

mavisd commented Aug 29, 2018

After digging into this a bit more, I believe I've isolated my issue. The /etc/resolv.conf on the Linux machine contains three nameserver entries. After comparing the nameserver entries with the strace output listed above, the last nameserver entry is referenced before trying to open a socket to it (and the syscall takes almost 5 seconds - 4.999070s)

[pid 25368]      0.000422 getpeername(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("<Private IP D>")}, [16]) = 0
[pid 25368]      4.999070 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 3

This happens twice in the strace output for this nameserver. After commenting out this nameserver entry in the /etc/resolv.conf file, the DNS lookup time is reduced to 30-50ms with Go 1.11. Finally, I ran a strace using Go 1.10 after adding back the nameserver entry and it is not used during the DNS lookup. So in my case, the new Go-based DNS lookup algorithm in 1.11 tries to contact all three nameservers listed in /etc/resolv.conf; whereas, in 1.10 it only contacted the first nameserver listed.

Note: the last nameserver entry is not responding when using host to perform DNS lookup either

@mavisd

This comment has been minimized.

mavisd commented Aug 29, 2018

Is this the expected behavior for Go 1.11 on Linux (trying to resolve DNS lookups using all the nameservers listed in /etc/resolv.conf)?

@iangudger

This comment has been minimized.

Contributor

iangudger commented Aug 29, 2018

I don't think so. I think the behavior in this regard is supposed to be the same as 1.10.

@mavisd

This comment has been minimized.

mavisd commented Aug 30, 2018

I believe I've found a way to reproduce the issue using a public domain name. The /etc/resolv.conf file on the Linux machine looked like this when the 10+ second DNS lookup times were happening with Go 1.11 (and not with Go 1.10) with subdomains ending in faa.gov:

domain amc.faa.gov
search amc.faa.gov act.faa.gov ad.faa.gov faa.gov jccbi.gov 
nameserver 172.25.84.100
nameserver 172.25.116.100
nameserver 172.19.50.27

The last nameserver 172.19.50.27 entry was an old entry that no longer resolved to an actual machine. Public domains like www.google.com or www.yahoo.com were resolving within tens of milliseconds even with this configuration. However, if I add, for example, yahoo.com to the search line, now Go 1.11 takes 10+ seconds to resolve sports.yahoo.com (trying all nameserver entries); whereas, Go 1.10 resolves within tens of milliseconds with this configuration (only using the first nameserver entry). Also, host and dig both resolve sports.yahoo.com with a few milliseconds using this configuration. It appears this configuration causes Go-based DNS lookup on 1.11 to try all nameservers listed in the /etc/resolv.conf; whereas, Go-based DNS lookup on 1.10 and standard command line tools only use the first nameserver entry.

@iangudger

This comment has been minimized.

Contributor

iangudger commented Aug 30, 2018

First, @mvdan's bug seems to be a regression and different from the original bug.

I think maybe the issue is that you are getting an empty answers section for the IPv6 lookup and that the new DNS client in 1.11 treats this as an error and then goes on to the next DNS server.

@mavisd

This comment has been minimized.

mavisd commented Aug 30, 2018

You are correct, the DNS nameservers used are returning empty answers for IPv6 lookups (including for sports.yahoo.com). If this behavior is expected, then my issue is not an actual issue. Thanks for your help.

@iangudger

This comment has been minimized.

Contributor

iangudger commented Aug 30, 2018

Returning an empty answer section is allowed. I will try to put together a fix.

@lrita

This comment has been minimized.

lrita commented Sep 1, 2018

In my service, the net.Resolver stalling 5s+ occurred once every 150 queries. This service is built in go1.11 .

The AAAA reponse alway this:

@iangudger

This comment has been minimized.

Contributor

iangudger commented Sep 6, 2018

I have filed #27525 to track this regression.

@gopherbot

This comment has been minimized.

gopherbot commented Sep 13, 2018

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this Sep 13, 2018

@agnivade

This comment has been minimized.

Member

agnivade commented Sep 13, 2018

@iangudger - Does #27525 supercede this ? Or should I reopen this ?

@iangudger

This comment has been minimized.

Contributor

iangudger commented Sep 13, 2018

#27525 only supersedes the new issue here. The original issue timed out waiting for info. I think the current state is correct.

@mupengX

This comment has been minimized.

mupengX commented Sep 18, 2018

@iangudger
i'm sorry for my late reply and it's hard to reappear the scene at that time, maybe it's a specific case. if i meet it again i will provide more information.
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment