Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: net/http: DNS lookup timeouts could return net.DNSError instead of poll.TimeoutError #39178

Open
mhale opened this issue May 20, 2020 · 0 comments
Labels
Projects
Milestone

Comments

@mhale
Copy link

@mhale mhale commented May 20, 2020

I get DNS lookup timeouts many times a day when running Go HTTP clients on Kubernetes, where UDP packets are sometimes dropped due to a race condition in the Linux kernel. I'd like to propose a small standard library change to help debug these scenarios.

What version of Go are you using (go version)?

$ go version
go version go1.14.3 linux/amd64

Does this issue reproduce with the latest release?

Yes. This occurs in Go 1.14.[1-3] and version devel +c88f6989e1 Tue May 19 04:10:43 2020 +0000 linux/amd64.

What operating system and processor architecture are you using (go env)?

It seems to apply to all platforms. I'm experiencing it most often on Kubernetes on Linux but the executable is compiled on a regular Linux server.

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/mhale/.cache/go-build"
GOENV="/home/mhale/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/mhale/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build197760953=/tmp/go-build -gno-record-gcc-switches"

What did you do?

For any http.Client use with a timeout set on the transport, when the timeout is exceeded and an error is returned it can be challenging to discover the cause of the timeout from the error message. Is it DNS related or TCP related or something else? Consider the following simulation of a DNS lookup that does not return within the timeout.

DNS lookup timeout simulator code
package main
  
import (
    "context"
    "net"
    "net/http"
    "time"
)

var timeout = 50 * time.Millisecond
var host = "http://example.com/"

// delayedDialer introduces a delay when performing DNS lookups.
func delayedDialer(ctx context.Context, network, address string) (net.Conn, error) {
    time.Sleep(2 * timeout)
    d := net.Dialer{}
    return d.DialContext(ctx, network, address)
}

func main() {
    r := &net.Resolver{
        PreferGo: true,
        Dial:     delayedDialer,
    }
    tr := &http.Transport{
        DialContext: (&net.Dialer{
            Timeout:  timeout,
            Resolver: r,
        }).DialContext,
    }
    client := &http.Client{Transport: tr}
    _, err := client.Get(host)
    if err != nil {
        panic(err)
    }
}

In the event of a DNS lookup timeout, the output is:

panic: Get "http://example.com/": dial tcp: i/o timeout

In the event of a TCP connection timeout (comment out the time.Sleep() call) the output is:

panic: Get "http://example.com/": dial tcp 93.184.216.34:80: i/o timeout

The missing IP address in the DNS lookup timeout output is the only clue that the root cause is a DNS problem, rather than a TCP problem. If users do not expect to see an IP address in that error message, they won't know it is missing, and their debugging and testing effort may be wasted investigating possible TCP-related causes, instead of investigating DNS (which normally runs over UDP).

I think the error message could communicate the root cause to users more clearly.

What did you expect to see?

panic: Get "http://example.com/": dial tcp: lookup example.com: i/o timeout

What did you see instead?

panic: Get "http://example.com/": dial tcp: i/o timeout

Proposed solution

After a DNS lookup timeout, the error returned by client.Get() looks like this in Go 1.14.3:

&url.Error{
  Op:  "Get",
  URL: "http://example.com/",
  Err: &net.OpError{
    Op:     "dial",
    Net:    "tcp",
    Source: nil,
    Addr:   nil,
    Err:    &poll.TimeoutError{},
  },
}

Note: The &poll.TimeoutError{} will become &net.timeoutError{} in future releases after commit
d422f54.

The resolver in net/lookup.go is the source of the *poll.TimeoutError. Instead it could return a *net.DNSError error when a lookup times out. Doing so would provide additional error context for all use cases (not only http.Client) and may save debugging time. For example client.Get() could return:

&url.Error{
  Op:  "Get",
  URL: "http://example.com/",
  Err: &net.OpError{
    Op:     "dial",
    Net:    "tcp",
    Source: nil,
    Addr:   nil,
    Err:    &net.DNSError{
      Err:         "i/o timeout",
      Name:        "example.com",
      Server:      "",
      IsTimeout:   true,
      IsTemporary: false,
      IsNotFound:  false,
    },
  },
}

This would preserve the existing "i/o timeout" string and the ability to call err.Timeout(), while adding "lookup example.com: " to the output and the capability to explicitly check for a *net.DNSError. The output from my simulator would also be what I'm expecting to see:

panic: Get "http://example.com/": dial tcp: lookup example.com: i/o timeout

The following change to implement this works in my usage and passes the tests in all.bash.

git diff Output
$ git diff
diff --git a/src/net/lookup.go b/src/net/lookup.go
index 5f7119872a..b1a22dd4e6 100644
--- a/src/net/lookup.go
+++ b/src/net/lookup.go
@@ -314,6 +314,9 @@ func (r *Resolver) lookupIPAddr(ctx context.Context, network, host string) ([]IP
                        }()
                }
                err := mapErr(ctx.Err())
+               if t, ok := err.(timeout); ok && t.Timeout() {
+                       err = &DNSError{Err: err.Error(), Name: host, IsTimeout: true}
+               }
                if trace != nil && trace.DNSDone != nil {
                        trace.DNSDone(nil, false, err)
                }

I can submit a pull request if anyone thinks that this is a good idea and implementation.

@odeke-em odeke-em changed the title net/http: DNS lookup timeouts could return net.DNSError instead of poll.TimeoutError proposal: net/http: DNS lookup timeouts could return net.DNSError instead of poll.TimeoutError May 23, 2020
@gopherbot gopherbot added this to the Proposal milestone May 23, 2020
@gopherbot gopherbot added the Proposal label May 23, 2020
@rsc rsc added this to Incoming in Proposals Jun 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Proposals
Incoming
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.