Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: A lookup performed by a canceled context might affect subsequent lookups #22724

Closed
tt opened this issue Nov 14, 2017 · 16 comments

Comments

Projects
None yet
10 participants
@tt
Copy link
Contributor

commented Nov 14, 2017

What version of Go are you using (go version)?

go version go1.9.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/tt"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9.2/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/v3/slr36r597rn0rrfzq_my6nv00000gn/T/go-build723069031=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

package main

import (
	"context"
	"fmt"
	"net"
	"testing"
)

func TestLookupIPAddr(t *testing.T) {
	ctx := context.Background()
	canceledCtx, cancel := context.WithCancel(ctx)
	cancel()

	var resolver net.Resolver

	_, err := resolver.LookupIPAddr(canceledCtx, "example.com")
	if fmt.Sprintf("%#v", err) != `&errors.errorString{s:"operation was canceled"}` {
		t.Fatalf("unexpected error: %q", err)
	}

	_, err = resolver.LookupIPAddr(ctx, "example.com")
	if err != nil {
		t.Fatalf("unexpected error: %q", err)
	}
}

(This is a reduced example; it happened to me through the use of net/http.)

What did you expect to see?

The test should produce no output.

Resolving the IP address using a canceled context should fail and the following attempt to resolve the IP address using a non-canceled context should succeed.

What did you see instead?

The test produces the following output:

--- FAIL: TestLookupIPAddr (0.00s)
	lookup_test.go:24: unexpected error: "lookup example.com on 89.233.43.71:53: dial udp 89.233.43.71:53: operation was canceled"
FAIL

That is, the second attempt to resolve the IP address using the non-canceled context returns the result of the previous invocation.

This is happening as parallel lookups for the same hosts only execute once:

go/src/net/lookup.go

Lines 220 to 224 in bb3be40

// lookupGroup merges LookupIPAddr calls together for lookups
// for the same host. The lookupGroup key is is the LookupIPAddr.host
// argument.
// The return values are ([]IPAddr, error).
var lookupGroup singleflight.Group

... and following the first lookup, this goroutine exists:

1 @ 0x101298e 0x100535e 0x1188023 0x117c53d 0x118d9f6 0x1188982 0x118ae0c 0x115fade 0x105ca41
#	0x1188022	net.cgoLookupIP+0x72				/usr/local/Cellar/go/1.9.2/libexec/src/net/cgo_unix.go:212
#	0x117c53c	net.(*Resolver).lookupIP+0x12c			/usr/local/Cellar/go/1.9.2/libexec/src/net/lookup_unix.go:95
#	0x118d9f5	net.(*Resolver).(net.lookupIP)-fm+0x55		/usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:187
#	0x1188981	net.glob..func10+0x51				/usr/local/Cellar/go/1.9.2/libexec/src/net/hook.go:19
#	0x118ae0b	net.(*Resolver).LookupIPAddr.func1+0x5b		/usr/local/Cellar/go/1.9.2/libexec/src/net/lookup.go:193
#	0x115fadd	internal/singleflight.(*Group).doCall+0x2d	/usr/local/Cellar/go/1.9.2/libexec/src/internal/singleflight/singleflight.go:93

Sleeping for just a millisecond between the two lookups allow the goroutine to complete and therefore resolves the IP address properly.

(It makes no difference if you re-initialize net.Resolver as the singleflight.Group variable is shared across instances.)

It's easy to fix this specific example (a sequential lookup) by "forgetting" the result if the context is canceled (similar to the solution applied in 77595e4 for #8602). I have a change for this and will submit it.

I do wonder if it would be more correct to never pass the context to the inner lookup; otherwise a goroutine that is canceled or times out will be able to affect other goroutines waiting for that same response.

@gopherbot

This comment has been minimized.

Copy link

commented Nov 14, 2017

Change https://golang.org/cl/77670 mentions this issue: net: Forget lookups for canceled contexts

@gopherbot gopherbot closed this in 6a3d4be Nov 15, 2017

@bradfitz bradfitz added this to the Go1.10 milestone Nov 17, 2017

@bradfitz bradfitz reopened this Nov 17, 2017

@bradfitz bradfitz added the NeedsFix label Nov 17, 2017

@rsc

This comment has been minimized.

Copy link
Contributor

commented Nov 22, 2017

@bradfitz why did you reopen this? Does the CL not actually fix the problem?

@bradfitz

This comment has been minimized.

Copy link
Member

commented Nov 22, 2017

It was reverted due to build breakages and code review comments.

@gopherbot

This comment has been minimized.

Copy link

commented Nov 23, 2017

Change https://golang.org/cl/79715 mentions this issue: net: Forget lookups for canceled contexts

@odeke-em

This comment has been minimized.

Copy link
Member

commented Dec 6, 2017

Ping @tt, please take a look at the feedback that @mikioh posted plus the gentle ping from @bradfitz. Thank you.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.10, Go1.11 Jan 3, 2018

@Ragnaroek

This comment has been minimized.

Copy link

commented Jan 22, 2018

This issue is very serious for us. We have a lot of canceled contexts and are flooded by this 'operation was canceled' error. For us this is a blocker. Can you please consider this to fix in Go 1.10?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Jan 23, 2018

At the moment we don't even have an approved patch for this problem. The earlier attempt to fix it (https://golang.org/cl/77670) had to be rolled back, as described in the comments there. The problem exists in 1.9, so it's not new in 1.10. The simple fix for this--just checking whether the context has been canceled, as is done in https://golang.org/cl/79715 --will re-break #20703. We need a more sophisticated fix.

So I think it is too late to fix this for 1.10. Sorry. I will mark this as a release blocker for 1.11.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 15, 2018

This is a hard problem. As I just wrote on CL 79715, I see now that the earlier change (CL 45999) is not right. We should not be memoizing the result of a context cancellation. And simply reverting CL 45999 isn't enough. After all, another DNS lookup could arrive between the context cancellation and the lookupGroup.Forget, could get folded into the existing singleflight lookup, and could see the cancellation.

The problem is that singleflight is designed for tasks that either succeed or fail. It is not designed for tasks that get canceled, where the cancelation of one task should not affect another task using the same singleflight key. I think we need to separate the cancelation of the task from the cancelation of the singleflight operation.

@gopherbot

This comment has been minimized.

Copy link

commented Mar 15, 2018

Change https://golang.org/cl/100840 mentions this issue: net: don't let cancelation of a DNS lookup affect another lookup

@gopherbot gopherbot closed this in bd85943 Mar 16, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 16, 2018

Reopening for 1.10.1

@andybons

This comment has been minimized.

Copy link
Member

commented Mar 27, 2018

CL 100840 OK for Go 1.10.1

@gopherbot

This comment has been minimized.

Copy link

commented Mar 27, 2018

Change https://golang.org/cl/102787 mentions this issue: [release-branch.go1.10] net: don't let cancelation of a DNS lookup affect another lookup

@gopherbot

This comment has been minimized.

Copy link

commented Mar 28, 2018

Change https://golang.org/cl/103215 mentions this issue: [release-branch.go1.9] net: don't let cancelation of a DNS lookup affect another lookup

gopherbot pushed a commit that referenced this issue Mar 29, 2018

[release-branch.go1.10] net: don't let cancelation of a DNS lookup af…
…fect another lookup

Updates #8602
Updates #20703
Fixes #22724

Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d
Reviewed-on: https://go-review.googlesource.com/100840
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-on: https://go-review.googlesource.com/102787
Run-TryBot: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>

@andybons andybons closed this Mar 29, 2018

@AliGrab

This comment has been minimized.

Copy link

commented Apr 2, 2018

@andybons is this has been fixed in 1.10.1?

@odeke-em

This comment has been minimized.

Copy link
Member

commented Apr 2, 2018

@andybons ama let you finish ;)

@AliGrab yes, please and Go1.10.1 was released last week https://twitter.com/golang/status/979377544196755456, please get it and try again.

@devops-eatigo

This comment has been minimized.

Copy link

commented Jun 12, 2018

@odeke-em Any one can confirm this fix is in the latest golang 1.10.3? how about 1.10.0?

erdema-ft added a commit to Financial-Times/concept-rw-elasticsearch that referenced this issue Jun 15, 2018

geoah added a commit to Financial-Times/concept-rw-elasticsearch that referenced this issue Jun 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.