net: Resolver errors on Windows with code WSATRY_AGAIN #55050

Open
toothrot opened this issue Sep 13, 2022 · 12 comments
Labels: NeedsInvestigation, OS-Windows

@toothrot
Contributor

toothrot commented Sep 13, 2022

#!watchflakes
post <- goos == "windows" && `getaddrinfow: This is usually a temporary error`
--- FAIL: TestScript (0.01s)
    --- FAIL: TestScript/mod_proxy_errors (0.07s)
        script_test.go:282: 
            # (2022-09-09T21:03:34Z)
            # Server responses should be truncated to some reasonable number of lines.
            # (For now, exactly eight.) (0.041s)
            > ! go list -m vcs-test.golang.org/auth/ormanylines@latest
            [stderr]
            go: vcs-test.golang.org/auth/ormanylines@latest: unrecognized import path "vcs-test.golang.org/auth/ormanylines": https fetch: Get "https://vcs-test.golang.org/auth/ormanylines?go-get=1": dial tcp: lookup vcs-test.golang.org: getaddrinfow: This is usually a temporary error during hostname resolution and means that the local server did not receive a response from an authoritative server.
            [exit status 1]
            > stderr '\tserver response:\n(.|\n)*\tline 8\n\t\[Truncated: too many lines.\]$'
            FAIL: testdata\script\mod_proxy_errors.txt:10: no match for `(?m)\tserver response:\n(.|\n)*\tline 8\n\t\[Truncated: too many lines.\]$` found in stderr

2022-09-09T20:29:05-54182ff/windows-amd64-longtest

@bcmills
Member

bcmills commented Sep 14, 2022

This failure occurred within a cmd/go test, but does not appear to originate in cmd/go.

@bcmills
Member

bcmills commented Sep 14, 2022

From what I can tell, the getaddrinfow: prefix in the error message comes from here:
https://cs.opensource.google/go/go/+/master:src/net/lookup_windows.go;l=123;drc=f9c0264107a9a36832d70781fe100cff16917855

Per https://docs.microsoft.com/en-us/windows/win32/api/ws2tcpip/nf-ws2tcpip-getaddrinfoexw, the error text corresponds to the error code WSATRY_AGAIN.

@bcmills changed the title from "cmd/go: TestScript/mod_proxy_errors fails on https fetch failure" to "net: Resolver errors on Windows with code WSATRY_AGAIN" on Sep 14, 2022
@bcmills added the OS-Windows and NeedsInvestigation labels on Sep 14, 2022
@bcmills added this to the Backlog milestone on Sep 14, 2022
@bcmills
Member

bcmills commented Sep 14, 2022

(attn @ianlancetaylor @neild; CC @golang/windows)

I wonder if the net.Resolver implementation should, y'know, “try again” when it receives the error WSATRY_AGAIN..? 😅

@bcmills
Member

bcmills commented Sep 14, 2022

This is closely related to #52094 and #52108 (CC @rolandshoemaker).

@beoran

beoran commented Sep 16, 2022

A casual look at the source code shows that on Unix platforms EAGAIN is handled in many places (and rightly so), for example:

if err == syscall.EAGAIN && fd.pd.pollable() {

It's probably a great idea to handle WSATRY_AGAIN similarly in all Windows-related code and system calls that might return that error.

@alexbrainman
Member

alexbrainman commented Sep 18, 2022

> I wonder if the net.Resolver implementation should, y'know, “try again” when it receives the error WSATRY_AGAIN..? 😅

How do you propose we change the net.Resolver implementation to try again?

WSATRY_AGAIN indicates a temporary failure in name resolution. Do you suggest we try again after 1 second or something? But 1 second might not be long enough.

We could also report the "temporary failure in name resolution" status as part of the returned error message. But I am not sure how useful that information is, even for the net.Resolver user.

> go/src/internal/poll/fd_unix.go

This code retries reads from TCP connections. I don't see how that code is relevant to GetAddrInfoExW WSATRY_AGAIN.

Alex

@beoran

beoran commented Sep 18, 2022

https://learn.microsoft.com/en-us/search/?terms=WSATRY_AGAIN: My point is that this error code is returned by several Windows API functions, and it should be handled for every one of them that Go uses.

@qmuntal
Contributor

qmuntal commented Sep 23, 2022

Adding a data point: dotnet/runtime retries once, without backoff, when a call to GetAddrInfoExW returns WSATRY_AGAIN. That retry is tied to limitations of GetAddrInfoExW that don't apply to us, because we use GetAddrInfoW.

See this issue for more context: dotnet/runtime#29935, and especially this comment: dotnet/runtime#29935 (comment).

@bcmills
Member

bcmills commented Sep 23, 2022

@alexbrainman

> How do you propose we change the net.Resolver implementation to try again?
>
> WSATRY_AGAIN indicates a temporary failure in name resolution. Do you suggest we try again after 1 second or something? But 1 second might not be long enough.

The usual solution I would reach for is to retry with exponential backoff.

It appears that all of the Resolver methods accept context.Context arguments, so one option might be to continue to retry until one of:

  • the passed-in Context is done
  • GetAddrInfoW succeeds
  • GetAddrInfoW returns an error code other than WSATRY_AGAIN.

But I'm curious what the Unix implementation does in terms of default timeouts and retries. There are two configuration parameters here that seem relevant:
https://cs.opensource.google/go/go/+/master:src/net/dnsconfig.go;l=22-23;drc=d7df872267f9071e678732f9469824d629cac595

@bcmills
Member

bcmills commented Sep 23, 2022

It looks like our default on Unix is 5 seconds and 2 attempts, unless the system's resolv.conf states otherwise:
https://cs.opensource.google/go/go/+/master:src/net/dnsconfig_unix.go;l=20-21;drc=d7df872267f9071e678732f9469824d629cac595

So that might be a good starting point, at least. (Ideally the defaults should be factored out to be platform-independent!)

@gopherbot

gopherbot commented Sep 28, 2022

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "windows" && `getaddrinfow: This is usually a temporary error`
2022-09-09 20:29 windows-amd64-longtest go@54182ff5 cmd/go.TestScript (log)
go test proxy running at GOPROXY=http://127.0.0.1:54144/mod
--- FAIL: TestScript (0.01s)
    --- FAIL: TestScript/mod_proxy_errors (0.07s)
        script_test.go:282: 
            # (2022-09-09T21:03:34Z)
            # Server responses should be truncated to some reasonable number of lines.
            # (For now, exactly eight.) (0.041s)
            > ! go list -m vcs-test.golang.org/auth/ormanylines@latest
            [stderr]
            go: vcs-test.golang.org/auth/ormanylines@latest: unrecognized import path "vcs-test.golang.org/auth/ormanylines": https fetch: Get "https://vcs-test.golang.org/auth/ormanylines?go-get=1": dial tcp: lookup vcs-test.golang.org: getaddrinfow: This is usually a temporary error during hostname resolution and means that the local server did not receive a response from an authoritative server.
            [exit status 1]
            > stderr '\tserver response:\n(.|\n)*\tline 8\n\t\[Truncated: too many lines.\]$'
            FAIL: testdata\script\mod_proxy_errors.txt:10: no match for `(?m)\tserver response:\n(.|\n)*\tline 8\n\t\[Truncated: too many lines.\]$` found in stderr

watchflakes
