Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: for range loop reading past slice end #40367

Closed
greyhu opened this issue Jul 23, 2020 · 25 comments
Closed

cmd/compile: for range loop reading past slice end #40367

greyhu opened this issue Jul 23, 2020 · 25 comments

Comments

@greyhu
Copy link

@greyhu greyhu commented Jul 23, 2020

What version of Go are you using (go version)?

1.13.5,1.14.6

Does this issue reproduce with the latest release?

yes!

What operating system and processor architecture are you using (go env)?

linux,amd64

What did you do?

try this in playground

package main
import "fmt"
func main() {
        rates:=[]int32{1,2,3,4,5,6}
	for star, rate := range rates {
		if star+1 < 1 {
			panic("")
		}

		fmt.Println(star, rate)
	}
}

What did you expect to see?

0,1
1,2
2,3
3,4
4,5
5,6

What did you see instead?

timeout running program
0 1
1 2
2 3
3 4
4 5
5 6
6 180344
7 192
8 4846592
9 0
10 180424
11 192
12 4846720
13 0
14 5822560
15 192
16 438224
17 192
18 4396406
19 0
20 385024
.....

@ALTree ALTree changed the title for range with panic cause error in linux cmd/compile: for range loop reading past slice end Jul 23, 2020
@ALTree
Copy link
Member

@ALTree ALTree commented Jul 23, 2020

Thanks for the report. I can reproduce this on tip, too.

cc @randall77 @cherrymui

@ALTree
Copy link
Member

@ALTree ALTree commented Jul 23, 2020

(Tentatively labelling as a 1.15 release blocker, feel free to change)

@feilengcui008
Copy link
Contributor

@feilengcui008 feilengcui008 commented Jul 23, 2020

it seems that changing the line if star+1 < 1 { to if start+2 < 2 { (or other number not equal to 1) will work correctly.

@marigonzes
Copy link

@marigonzes marigonzes commented Jul 23, 2020

This issue appears to have been introduced a while ago. According to bisection, commit b812eec seems to be the most likely culprit

@fancl20
Copy link

@fancl20 fancl20 commented Jul 23, 2020

			// Check for i2 < max or max > i2.
 			var max *Value
 			if r == lt && pred.Control.Args[0] == i2 {
 				max = pred.Control.Args[1]
 			} else if r == gt && pred.Control.Args[1] == i2 {
 				max = pred.Control.Args[0]
 			} else {
 				continue
 			}

  			// Check condition 4 now that we have a
 			// candidate max. For this we can query the
 			// fact table. We "prove" min < max by showing
 			// that min >= max is unsat. (This may simply
 			// compare two constants; that's fine.)
 			ft.checkpoint()
 			ft.update(b, min, max, tr.d, gt|eq)
 			proved := ft.unsat
 			ft.restore()

  			if proved {
 				// We know that min <= i1 < max.
 				if b.Func.pass.debug > 0 {
 					printIndVar(b, i1, min, max, 1, 0)
 				}
 				ft.update(b, min, i1, tr.d, lt|eq)
 				ft.update(b, i1, max, tr.d, lt)
 			}

Here the condition is i2 < max and min < max. What we (think we) prove is min <= i1 < max.
The problem is when we put these together,

   i2 < max
-> i1 + 1 < max
-> i1 - min + 1 < max - min && i1 - min >= 0
-> 1 < max - min
-> min + 1 < max

min < max is actually not sufficient.

Here is another way to reproduce (similar to the test in that commit)

package main

func main() {
	i := 0
top:
	if i < 2 {
		i++
		if i < 1 {
			return
		}
		println(i)
		goto top
	}
}
@cherrymui
Copy link
Contributor

@cherrymui cherrymui commented Jul 23, 2020

Wow, thanks for reporting. Yeah, as you mentioned, this is related to the prove pass. Running with -gcflags=-d=ssa/prove/off gets the correct result.

cc @dr2chase

@toothrot
Copy link
Contributor

@toothrot toothrot commented Jul 23, 2020

Moving the milestone to 1.16, as this issue, while significant, is old enough not to block the 1.15 release at this point in the cycle. Please let me know if I am wrong.

@toothrot toothrot modified the milestones: Go1.15, Go1.16 Jul 23, 2020
@zdjones
Copy link
Contributor

@zdjones zdjones commented Jul 23, 2020

I'm going to take a crack at patching this in the morning (if noone else has gotten to it by then). It is late here, but my completely unconfirmed guess is that this loop has been compiled to OFORUNTIL (placing the increment and condition after the body) and func addLocalInductiveFacts, which is then trying to infer the loop induction variable by finding the OFORUNTIL pattern, is confusing the star +1 < 1 check with the loop's increment and condition. This would explain why star + 2 < 2 doesn't have the same effect, because any increment other then +1 will not be confused with the loop increment.

@gopherbot
Copy link

@gopherbot gopherbot commented Jul 24, 2020

Change https://golang.org/cl/244579 mentions this issue: cmd/compile: don't add inductive fact if phi block have no direct edge to control block

@bnkamalesh
Copy link

@bnkamalesh bnkamalesh commented Jul 25, 2020

Interesting, reproducible for all primitive number types as well as for boolean slice.

EDIT: I just checked a few more, here's what I got, might be useful.

https://play.golang.org/p/EuZTWueokp8

package main

import "fmt"

func main() {
	
	// following ones work fine
	rates := []int{1}
	// rates := make([]int32, 5, 5) 
	// rates := make([]int32, 5)
	
	// following ones break
	// rates := make([]int32, 1, 5)
	// rates := make([]int32, 0, 5)
	// rates := make([]int32, 4, 5)
	
	rates = append(rates, 2, 3, 4)
	
	for star, rate := range rates {
		if star+1 < 1 {
			panic("")
		}

		fmt.Println(star, rate)
	}
}
@networkimprov
Copy link

@networkimprov networkimprov commented Jul 28, 2020

This has a small fix, and pretty terrible consequences. Shouldn't it be 1.15?

And also needs 1.13 & 1.14 backport...

cc @ianlancetaylor

@randall77
Copy link
Contributor

@randall77 randall77 commented Jul 28, 2020

This would be really nice to fix for 1.15. As it has been broken for several releases, our policy is to not block a release for it.
Hence the 1.16 milestone.
There's no hurry for backport issues. If/when we have a fix that looks good, and looks backportable, we can open backport issues.

@networkimprov
Copy link

@networkimprov networkimprov commented Jul 28, 2020

Could you consider reverting the commit that causes it in 1.15?

And backporting that since both 1.13 & 1.14 are affected?

@randall77
Copy link
Contributor

@randall77 randall77 commented Jul 28, 2020

Sure, that's a possibility. It's a pretty old CL, though, it might not revert cleanly.
We could roll forward at tip but revert in 1.13 and 1.14. Options are available.

@networkimprov
Copy link

@networkimprov networkimprov commented Jul 28, 2020

In case it wasn't clear, I meant revert it before the 1.15 release.

@zdjones
Copy link
Contributor

@zdjones zdjones commented Jul 28, 2020

Summary

Generic CSE is turning this code into a pattern that prove incorrectly identifies as OFORUNTIL. Prove adds the OFORUNTIL induction variable bounds as facts in the loop header, leading to the removal of the loop exit branch.

Background

Prove has two ways of identifying loop induction variables and their min/mix:

  1. findIndVars finds " normal" loop induction vars, and their bounds, for normal loops
  2. addLocalInductiveFacts finds "OFORINTUL" loop induction vars, and their bounds, for OFORUNTIL loops.

During prove proper (DFS of the SSA dominator tree), prove adds facts based on the bounds of loop induction variables in two ways - the difference is part of the present issue:

  1. bounds of "normal" loop induction variables are added as facts when entering the body of the loop.
  2. bounds of "OFORUNTIL" induction variable are added as facts when entering the loop header (I'm not sure how this works out in a real OFORUNTIL case, I need to see one of those.).

Why is prove removing the loop exit

When prove analyses this SSA, addLocalInductiveFacts is incorrectly identifying the loop as OFORUNTIL, and therefore adding the loop induction bounds as facts in the loop header. But this is a normal loop, not OFORUNTIL, so the loop condition is in the loop header. So when we enter the loop header we learn 0 <= star < 1. With these facts present, prove finds the negative branch from the loop header, star >= 6, unsatisfiable, and removes the negative branch. Because the negative branch of the loop header is the loop exit, removing it means the loop will never exit.

How is generic CSE involved

Generic CSE is identifying the Add64 star 1 from star + 1 < 1 in the loop body, and the Add64 star 1 from the star++ loop increment as commom subexpressions, and removing the latter of the two (gcse deadcode does the actual removal). Generic CSE also then modifies the OpPhi in the loop header so it's loop-back edge variable references the remaining Add64 star 1.

b4: ← b2 b6
    v47 (5) = Phi <int> v10 v70 (star[int])        <-------- v70, the input from the loop-back edge
    ...
    v49 (+5) = Less64 <bool> v47 v11
If v49 → b5 b7 (likely) (5)

b5: ← b4
    v51 (+6) = Add64 <int> v47 v16
    v52 (6) = Less64 <bool> v51 v16
If v52 → b9 b6 (6)

b6: ← b5
    ...                                                         <------ print() code elided
    v70 (+5) = Add64 <int> v47 v16       <------  loop increment
Plain → b4 (5)

b7: ← b4
Ret v72 (5)

becomes

b4: ← b2 b6
    v47 (5) = Phi <int> v10 v51 (star[int]).      <-------- v51, new input variable
    ...
    v49 (+5) = Less64 <bool> v47 v11
If v49 → b5 b7 (likely) (5)

b5: ← b4
    v51 (+6) = Add64 <int> v47 v16     <------ this is now the loop increment
    v52 (6) = Less64 <bool> v51 v16.  <------ addLocalInductiveFacts thinks this is OFORUNTIL loop condition
If v52 → b9 b6 (6)

b6: ← b5
    ...                                                        <------ print() code elided
Plain → b4 (+5)

b7: ← b4
Ret v72 (5)

The result of the above is a loop header with a loop induction OpPhi referencing a variable in a predecessor block, along it's loop-back edge, that increments the loop induction var and checks whether the result is less than some constant. This is the pattern that addLocalInductiveFacts is looking for.

edit: typos

@fancl20
Copy link

@fancl20 fancl20 commented Jul 29, 2020

Why is prove removing the loop exit

When prove analyses this SSA, addLocalInductiveFacts is incorrectly identifying the loop as OFORUNTIL, and therefore adding the loop induction bounds as facts in the loop header. But this is a normal loop, not OFORUNTIL, so the loop condition is in the loop header. So when we enter the loop header we learn 0 <= star < 1. With these facts present, prove finds the negative branch from the loop header, star >= 6, unsatisfiable, and removes the negative branch. Because the negative branch of the loop header is the loop exit, removing it means the loop will never exit.

Learning 0 <= star < 1 is wrong as I said. We should never prove 0 <= star < 1 because star + 1 < 1 can't be satisfied.

What (we think) we prove:

0 <= star < 1

What we require:

star + 1 < 1 AND (i2 < max)
0 < 1        AND (min < max)
0 <= star        (min <= i1)

But it's obvious that star + 1 < 1 means star < 0. We can prove there is a paradox here! We should use min + 1 < max instead.

Another example may help. It's not starting from 0, without i+1 and as I understand actually a OFORUNTIL case.

package main

func main() {
	i := 5
top:
	i++
	if i < 6 {
		return
	}
	if i > 6 {
		return
	}
	println(i)
	goto top
}

edit: fix a lot of typos : (

@fancl20
Copy link

@fancl20 fancl20 commented Jul 29, 2020

A more OFORUNTIL one:

package main

func main() {
	i := 5
top:
	i++
	println(i)
	if !(i < 6 || i > 6) {
		goto top
	}
}
@bnkamalesh
Copy link

@bnkamalesh bnkamalesh commented Jul 29, 2020

any idea why it's not a problem for string slices or other type slices? I've been able to reproduce this only for numbers & boolean slices

@dgryski
Copy link
Contributor

@dgryski dgryski commented Jul 29, 2020

Is it worth trying to figure out how often this happens in practice on a large codebase? Given that it's been around for a while it seems fairly rare.

@WZZ1998
Copy link

@WZZ1998 WZZ1998 commented Jul 29, 2020

So, is there some reliable estimation on time to fix this bug?

@randall77
Copy link
Contributor

@randall77 randall77 commented Jul 29, 2020

There's never a reliable estimation on bug fix time :)
You can follow along with the current suggested CL at https://go-review.googlesource.com/c/go/+/244579

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Jul 29, 2020

@WZZ1998 as of now I'm happy with the fix, but waiting to see whether it makes the cut for 1.15 or a dot release, and would love other eyes on my assessment of what the original code thought it was doing, and how the fix interacts with that.

I'm also benchmarking the fix to see if this accidentally causes terrible regressions; there's a chance that we lose some (correct) optimization that we had come to expect, since the fixed code will match a more restricted set of patterns.

@gopherbot gopherbot closed this in 7f86080 Jul 30, 2020
@randall77
Copy link
Contributor

@randall77 randall77 commented Jul 30, 2020

@gopherbot, please open backport issues for 1.14 and 1.13.

@gopherbot
Copy link

@gopherbot gopherbot commented Jul 30, 2020

Backport issue(s) opened: #40500 (for 1.13), #40501 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.