-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: for/range with iterator results in program that never terminates #69434
Comments
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.) |
When I run this program it spends a lot of time in refreshCounter trying to count to three hundred billion: for i := 0; i < pr.n-1; i++ {
pr.bitmask &= rand.Uint64()
} I don't think the problem is in the compiler or runtime. |
@adonovan Thanks for taking a look. Good sleuthing. I agree with the symptom that you uncovered, but continue to suspect something fishy is going on here. I sincerely wish I had a simpler reproduction of this, but well, I assume that this new iterator feature is not blatantly broken... And so a simpler reproduction may not be in the cards. Anyhow, based on your clue, here is some possible firmer evidence that something is amiss: I've added a // check panics if pr.n is not the expected value
func (pr *ProbabilisticSkipper) check(n int) {
if pr.n != n {
panic(fmt.Sprintf("check: pr.n != n %d != %d", pr.n, n))
}
} If I place a So, the loop now looks like this: wordSkipper.check(rounds) // No panic
for word := range wordReader.All() {
wordSkipper.check(rounds) // Panic always occurs here
// ... all other loop logic...
wordSkipper.check(rounds) // Never panics here
} So the mystery for me is, why is this struct member value seemingly "randomly" changing between iterations of the "iterator" loop (but not the "pull" loop)? And why not for debug builds? My iterator function doesn't have anything to do with this Here's the new reproduction code with the above changes: https://go.dev/play/p/iEMVpdhy3go Thanks again for taking a look. I've been programming for along time, so I've been humbled enough times that I'm certainly not beyond considering that there's some subtle issue with my code. But I have spent a bunch of time looking at this now, and boy is it not obvious what's going on here. Like I said above, this feels super racy no me, but I'm not doing anything overtly concurrent... |
This does look like a compiler bug to me. The variable |
It's likely a compiler issue, but it's not the issue of range-over-func blocks forever, instead, it's caused by the generated code doesn't handle the parameter properly, which triggers a long-time running for-loop inside users code. I tried to clear the code in this go playground. Probably you can focus on this code snippet: if len(words) >= memorySize {
rounds++ // generated store ssa instruction doesn't take effect, rounds value is wrong as 309237645415
// fmt.Println(rounds) // generated load ssa instruction take effect
wordSkipper = NewProbabilisticSkipper(rounds)
for w := range words {
if roundRemover.ShouldSkip() {
delete(words, w)
}
}
}
rounds++ // generated store ssa instruction doesn't take effect, ...
// fmt.Println(rounds) // generated load ssa instruction take effect
// rounds++ // generated store ssa instruction doesn't take effect, ...
fmt.Println(rounds) // generated load ssa instruction take effect
rand.Intn(rounds) // 3. cannot work, used to check whether std lib matters
load(rounds) // 4. cannot work, when it's custom function
readInterface(rounds) // 5. work, which receives an interface
func readInterface(r interface{}) {
fmt.Println(r)
}
func load(r int) {
fmt.Println(r)
_ = r
} The output is shown below when I use
Looks like the compiler hasn't passed the allocation to the function call properly. Besides, currently if there is a load instruction and the parameter is passed as an interface before function call it works well. But the store instruction doesn't take effects. Not sure the fix is to respect the store instruction here as well. |
(Apologies for my dismissive response. If Keith says it's a compiler bug, it's a compiler bug.) |
Change https://go.dev/cl/614096 mentions this issue: |
This issue does not need a backport, #69511 will do the job. |
Change https://go.dev/cl/614195 mentions this issue: |
…ngefunc CL 584596 "-range<N>" suffix to the name of closure generated for a rangefunc loop body. However, this breaks the condition that escape analysis uses for checking whether a closure contains within function, which is "F.funcN" for outer function "F" and closure "funcN". Fixing this by adding new "-rangeN" to the condition. Updates #69434 Fixes #69511 Change-Id: I411de8f63b69a6514a9e9504d49d62e00ce4115d Reviewed-on: https://go-review.googlesource.com/c/go/+/614096 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-on: https://go-review.googlesource.com/c/go/+/614195
Change https://go.dev/cl/615915 mentions this issue: |
Updates #69434 Change-Id: I780c5ed63561eb8fa998bb0e6cdc77a904ff29c8 Reviewed-on: https://go-review.googlesource.com/c/go/+/615915 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Change https://go.dev/cl/616698 mentions this issue: |
CL 615915 simplified test for issue 69434, using gcflags maymorestack to force stack moving, making program failed with invalid stack pointer. However, it seems that this maymorestack is broken on riscv64. At least gotip-linux-riscv64 is currently broken. This CL fixes this problem by using the initial approach, growing stack size big enough to force stack moving. Updates #69434 Fixes #69714 Cq-Include-Trybots: luci.golang.try:gotip-linux-riscv64 Change-Id: I95255fba884a200f75bcda34d58e9717e4a952ad Reviewed-on: https://go-review.googlesource.com/c/go/+/616698 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com>
Go version
go version go1.23.1 darwin/amd64
Output of
go env
in your module/workspace:What did you do?
Modified an existing "pull" style
io.Reader
loop to instead use a go 1.23 iterator, with no other changes to the following logic in the loop body.So this:
Became this:
Where new
wordReader.All()
iterator implementation is based onwordReader.Read()
:A full reproduction including both the original "pull" and new "iterator" versions of the loop is available here:
https://go.dev/play/p/ENULYowKCLS
What did you see happen?
Executing with the new range/iterator loop does not terminate. In the go playground link above this manifests as the original "pull" style loop completing immediately (in
EstimateUniqueWordsPull()
) , and then the program times-out the playground while runningEstimateUniqueWordsIter()
.If run locally as a CLI, the program does not terminate.
Note! Frustratingly there are many cases that rescue this non-termination behavior. Running under a debug build or using the race detector works as expected.
As do even seemingly trivial changes to the code, e.g. in the Playground repro:
// fmt.Println("Rounds", rounds) // ######### Uncomment line to rescue
Simply adding a single Println reverts to the expected behavior. This feels very racy to me, but there are no goroutine calls in the reproducing code, so this is a pretty clear indication that something bad is happening out of sight.
What did you expect to see?
I expect the two loops documented above as implemented in
EstimateUniqueWordsPull()
andEstimateUniqueWordsIter()
to behave essentially identically, each running nearly instantaneously on a small test input, and the program should return a result for each and properly terminate.Note, in the reproduction code in the playground, it would be normal for either of the above functions to return values resulting in either the "PASS" case or the "expected X, got Y" mismatch. This code implements a probabilistic counting estimation algorithm that uses randomness, as described here:
https://www.quantamagazine.org/computer-scientists-invent-an-efficient-new-way-to-count-20240516/
The text was updated successfully, but these errors were encountered: