net: A lookup performed by a canceled context might affect subsequent lookups #22724
Comments
Change https://golang.org/cl/77670 mentions this issue: |
@bradfitz why did you reopen this? Does the CL not actually fix the problem? |
It was reverted due to build breakages and code review comments. |
Change https://golang.org/cl/79715 mentions this issue: |
This issue is very serious for us. We have a lot of canceled contexts and are flooded by this 'operation was canceled' error. For us this is a blocker. Can you please consider fixing this in Go 1.10? |
At the moment we don't even have an approved patch for this problem. The earlier attempt to fix it (https://golang.org/cl/77670) had to be rolled back, as described in the comments there. The problem exists in 1.9, so it's not new in 1.10. The simple fix for this--just checking whether the context has been canceled, as is done in https://golang.org/cl/79715 --will re-break #20703. We need a more sophisticated fix. So I think it is too late to fix this for 1.10. Sorry. I will mark this as a release blocker for 1.11. |
This is a hard problem. As I just wrote on CL 79715, I see now that the earlier change (CL 45999) is not right. We should not be memoizing the result of a context cancellation. And simply reverting CL 45999 isn't enough. After all, another DNS lookup could arrive between the context cancellation and the lookupGroup.Forget, could get folded into the existing singleflight lookup, and could see the cancellation. The problem is that singleflight is designed for tasks that either succeed or fail. It is not designed for tasks that get canceled, where the cancelation of one task should not affect another task using the same singleflight key. I think we need to separate the cancelation of the task from the cancelation of the singleflight operation. |
Change https://golang.org/cl/100840 mentions this issue: |
Reopening for 1.10.1 |
CL 100840 OK for Go 1.10.1 |
Change https://golang.org/cl/102787 mentions this issue: |
Change https://golang.org/cl/103215 mentions this issue: |
…fect another lookup
Updates #8602
Updates #20703
Fixes #22724
Change-Id: I27b72311b2c66148c59977361bd3f5101e47b51d
Reviewed-on: https://go-review.googlesource.com/100840
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-on: https://go-review.googlesource.com/102787
Run-TryBot: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@andybons has this been fixed in |
@andybons I'll let you finish ;) @AliGrab yes, and Go 1.10.1 was released last week (https://twitter.com/golang/status/979377544196755456); please get it and try again. |
@odeke-em Can anyone confirm that this fix is in the latest Go release, 1.10.3? What about 1.10.0? |
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
What did you do?
(This is a reduced example; it happened to me through the use of net/http.)
What did you expect to see?
The test should produce no output.
Resolving the IP address using a canceled context should fail and the following attempt to resolve the IP address using a non-canceled context should succeed.
What did you see instead?
The test produces the following output:
That is, the second attempt to resolve the IP address using the non-canceled context returns the result of the previous invocation.
This happens because parallel lookups for the same host execute only once:
go/src/net/lookup.go
Lines 220 to 224 in bb3be40
... and following the first lookup, this goroutine exists:
Sleeping for just a millisecond between the two lookups allows the goroutine to complete, so the second lookup then resolves the IP address properly.
(It makes no difference if you re-initialize net.Resolver, as the singleflight.Group variable is shared across instances.)
It's easy to fix this specific example (a sequential lookup) by "forgetting" the result if the context is canceled (similar to the solution applied in 77595e4 for #8602). I have a change for this and will submit it.
I do wonder if it would be more correct to never pass the context to the inner lookup; otherwise a goroutine that is canceled or times out will be able to affect other goroutines waiting for that same response.