Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: A lookup performed by a canceled context might affect subsequent lookups #22724

Closed
tt opened this issue Nov 14, 2017 · 16 comments
Closed

net: A lookup performed by a canceled context might affect subsequent lookups #22724

tt opened this issue Nov 14, 2017 · 16 comments
Labels
CherryPickApproved Used during the release process for point releases FrozenDueToAge release-blocker
Milestone

Comments

@tt
Copy link
Contributor

tt commented Nov 14, 2017

What version of Go are you using (go version)?

go version go1.9.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/tt"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9.2/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/v3/slr36r597rn0rrfzq_my6nv00000gn/T/go-build723069031=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

package main

import (
	"context"
	"fmt"
	"net"
	"testing"
)

func TestLookupIPAddr(t *testing.T) {
	ctx := context.Background()
	canceledCtx, cancel := context.WithCancel(ctx)
	cancel()

	var resolver net.Resolver

	_, err := resolver.LookupIPAddr(canceledCtx, "example.com")
	if fmt.Sprintf("%#v", err) != `&errors.errorString{s:"operation was canceled"}` {
		t.Fatalf("unexpected error: %q", err)
	}

	_, err = resolver.LookupIPAddr(ctx, "example.com")
	if err != nil {
		t.Fatalf("unexpected error: %q", err)
	}
}

(This is a reduced example; it happened to me through the use of net/http.)

What did you expect to see?

The test should produce no output.

Resolving the IP address using a canceled context should fail and the following attempt to resolve the IP address using a non-canceled context should succeed.

What did you see instead?

The test produces the following output:

--- FAIL: TestLookupIPAddr (0.00s)
	lookup_test.go:24: unexpected error: "lookup example.com on 89.233.43.71:53: dial udp 89.233.43.71:53: operation was canceled"
FAIL

That is, the second attempt to resolve the IP address using the non-canceled context returns the result of the previous invocation.

This is happening as parallel lookups for the same hosts only execute once:

go/src/net/lookup.go

Lines 220 to 224 in bb3be40

// lookupGroup merges LookupIPAddr calls together for lookups
// for the same host. The lookupGroup key is is the LookupIPAddr.host
// argument.
// The return values are ([]IPAddr, error).
var lookupGroup singleflight.Group

... and following the first lookup, this goroutine exists:

1 @ 0x101298e 0x100535e 0x1188023 0x117c53d 0x118d9f6 0x1188982 0x118ae0c 0x115fade 0x105ca41
#	0x1188022	net.cgoLookupIP+0x72				/usr/local/Cellar/go/1.9.2/libexec/src/net/cgo_unix.go:212
#	0x117c53c	net.(*Resolver).lookupIP+0x12c			/usr/local/Cellar/go/1.9.2/libexec/src/net/lookup_unix.go:95
#	0x118d9f5	net.(*Resolver).(net.lookupIP)-fm+0x55		/usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:187
#	0x1188981	net.glob..func10+0x51				/usr/local/Cellar/go/1.9.2/libexec/src/net/hook.go:19
#	0x118ae0b	net.(*Resolver).LookupIPAddr.func1+0x5b		/usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:193
#	0x115fadd	internal/singleflight.(*Group).doCall+0x2d	/usr/local/Cellar/go/1.9.2/libexec/src/internal/singleflight/singleflight.go:93

Sleeping for just a millisecond between the two lookups allow the goroutine to complete and therefore resolves the IP address properly.

(It makes no difference if you re-initialize net.Resolver as the singleflight.Group variable is shared across instances.)

It's easy to fix this specific example (a sequential lookup) by "forgetting" the result if the context is canceled (similar to the solution applied in 77595e4 for #8602). I have a change for this and will submit it.

I do wonder if it would be more correct to never pass the context to the inner lookup; otherwise a goroutine that is canceled or times out will be able to affect other goroutines waiting for that same response.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/77670 mentions this issue: net: Forget lookups for canceled contexts

@bradfitz bradfitz added this to the Go1.10 milestone Nov 17, 2017
@bradfitz bradfitz reopened this Nov 17, 2017
@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Nov 17, 2017
@rsc
Copy link
Contributor

rsc commented Nov 22, 2017

@bradfitz why did you reopen this? Does the CL not actually fix the problem?

@bradfitz
Copy link
Contributor

It was reverted due to build breakages and code review comments.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/79715 mentions this issue: net: Forget lookups for canceled contexts

@odeke-em
Copy link
Member

odeke-em commented Dec 6, 2017

Ping @tt, please take a look at the feedback that @mikioh posted plus the gentle ping from @bradfitz. Thank you.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.10, Go1.11 Jan 3, 2018
@Ragnaroek
Copy link

Ragnaroek commented Jan 22, 2018

This issue is very serious for us. We have a lot of canceled contexts and are flooded by this 'operation was canceled' error. For us this is a blocker. Can you please consider this to fix in Go 1.10?

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Jan 23, 2018

At the moment we don't even have an approved patch for this problem. The earlier attempt to fix it (https://golang.org/cl/77670) had to be rolled back, as described in the comments there. The problem exists in 1.9, so it's not new in 1.10. The simple fix for this--just checking whether the context has been canceled, as is done in https://golang.org/cl/79715 --will re-break #20703. We need a more sophisticated fix.

So I think it is too late to fix this for 1.10. Sorry. I will mark this as a release blocker for 1.11.

@ianlancetaylor
Copy link
Contributor

This is a hard problem. As I just wrote on CL 79715, I see now that the earlier change (CL 45999) is not right. We should not be memoizing the result of a context cancellation. And simply reverting CL 45999 isn't enough. After all, another DNS lookup could arrive between the context cancellation and the lookupGroup.Forget, could get folded into the existing singleflight lookup, and could see the cancellation.

The problem is that singleflight is designed for tasks that either succeed or fail. It is not designed for tasks that get canceled, where the cancelation of one task should not affect another task using the same singleflight key. I think we need to separate the cancelation of the task from the cancelation of the singleflight operation.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/100840 mentions this issue: net: don't let cancelation of a DNS lookup affect another lookup

@ianlancetaylor
Copy link
Contributor

Reopening for 1.10.1

@andybons
Copy link
Member

CL 100840 OK for Go 1.10.1

@andybons andybons added CherryPickApproved Used during the release process for point releases and removed NeedsFix The path to resolution is known, but the work has not been done. labels Mar 27, 2018
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/102787 mentions this issue: [release-branch.go1.10] net: don't let cancelation of a DNS lookup affect another lookup

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/103215 mentions this issue: [release-branch.go1.9] net: don't let cancelation of a DNS lookup affect another lookup

gopherbot pushed a commit that referenced this issue Mar 29, 2018
…fect another lookup

Updates #8602
Updates #20703
Fixes #22724

Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d
Reviewed-on: https://go-review.googlesource.com/100840
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-on: https://go-review.googlesource.com/102787
Run-TryBot: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@ghost
Copy link

ghost commented Apr 2, 2018

@andybons is this has been fixed in 1.10.1?

@odeke-em
Copy link
Member

odeke-em commented Apr 2, 2018

@andybons ama let you finish ;)

@AliGrab yes, please and Go1.10.1 was released last week https://twitter.com/golang/status/979377544196755456, please get it and try again.

@devops-eatigo
Copy link

devops-eatigo commented Jun 12, 2018

@odeke-em Any one can confirm this fix is in the latest golang 1.10.3? how about 1.10.0?

@golang golang locked and limited conversation to collaborators Jun 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CherryPickApproved Used during the release process for point releases FrozenDueToAge release-blocker
Projects
None yet
Development

No branches or pull requests

9 participants