New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: eliminate more bounds checks #17370

Open
carl-mastrangelo opened this Issue Oct 6, 2016 · 8 comments

Comments

Projects
None yet
9 participants
@carl-mastrangelo
Member

carl-mastrangelo commented Oct 6, 2016

What version of Go are you using (go version)?

go version devel +fe77c5b Mon Oct 3 18:26:43 2016 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

What did you do?

https://play.golang.org/p/KvKjpisUOl

What did you expect to see?

bounds checks eliminated from assembly

What did you see instead?

TEXT main.blah(SB) /home/notcarl/Desktop/bce.go
    bce.go:9    0x401000    64488b0c25f8ffffff  FS MOVQ FS:0xfffffff8, CX       
    bce.go:9    0x401009    483b6110        CMPQ 0x10(CX), SP           
    bce.go:9    0x40100d    0f864b010000        JBE 0x40115e                
    bce.go:9    0x401013    4883ec38        SUBQ $0x38, SP              
    bce.go:9    0x401017    48896c2430      MOVQ BP, 0x30(SP)           
    bce.go:9    0x40101c    488d6c2430      LEAQ 0x30(SP), BP           
    bce.go:10   0x401021    488b0570620900      MOVQ 0x96270(IP), AX            
    bce.go:10   0x401028    488b0d61620900      MOVQ 0x96261(IP), CX            
    bce.go:12   0x40102f    488b1502080b00      MOVQ 0xb0802(IP), DX            
    bce.go:12   0x401036    4889c3          MOVQ AX, BX             
    bce.go:12   0x401039    4829d0          SUBQ DX, AX             
    bce.go:12   0x40103c    4883f804        CMPQ $0x4, AX               
    bce.go:12   0x401040    0f8ccf000000        JL 0x401115             
    bce.go:15   0x401046    488d4203        LEAQ 0x3(DX), AX            
    bce.go:15   0x40104a    4839d8          CMPQ BX, AX             
    bce.go:15   0x40104d    0f83bb000000        JAE 0x40110e                
    bce.go:16   0x401053    4839da          CMPQ BX, DX             
    bce.go:16   0x401056    0f83ab000000        JAE 0x401107                
    bce.go:16   0x40105c    0fb60c11        MOVZX 0(CX)(DX*1), CX           
    bce.go:16   0x401060    80f980          CMPL $0x80, CL              
    bce.go:16   0x401063    760a            JBE 0x40106f                
    bce.go:34   0x401065    488b6c2430      MOVQ 0x30(SP), BP           
    bce.go:34   0x40106a    4883c438        ADDQ $0x38, SP              
    bce.go:34   0x40106e    c3          RET                 
    bce.go:19   0x40106f    488d4a01        LEAQ 0x1(DX), CX            
    bce.go:19   0x401073    48890dbe070b00      MOVQ CX, 0xb07be(IP)            
    bce.go:20   0x40107a    488b1d17620900      MOVQ 0x96217(IP), BX            
    bce.go:20   0x401081    488b3508620900      MOVQ 0x96208(IP), SI            
    bce.go:20   0x401088    4839d9          CMPQ BX, CX             
    bce.go:20   0x40108b    7373            JAE 0x401100                
    bce.go:20   0x40108d    0fb64c1601      MOVZX 0x1(SI)(DX*1), CX         
    bce.go:20   0x401092    80f980          CMPL $0x80, CL              
    bce.go:20   0x401095    77ce            JA 0x401065             
    bce.go:23   0x401097    488d4a02        LEAQ 0x2(DX), CX            
    bce.go:23   0x40109b    48890d96070b00      MOVQ CX, 0xb0796(IP)            
    bce.go:24   0x4010a2    488b1def610900      MOVQ 0x961ef(IP), BX            
    bce.go:24   0x4010a9    488b35e0610900      MOVQ 0x961e0(IP), SI            
    bce.go:24   0x4010b0    4839d9          CMPQ BX, CX             
    bce.go:24   0x4010b3    7344            JAE 0x4010f9                
    bce.go:24   0x4010b5    0fb64c1602      MOVZX 0x2(SI)(DX*1), CX         
    bce.go:24   0x4010ba    80f980          CMPL $0x80, CL              
    bce.go:24   0x4010bd    77a6            JA 0x401065             
    bce.go:27   0x4010bf    48890572070b00      MOVQ AX, 0xb0772(IP)            
    bce.go:28   0x4010c6    488b0dcb610900      MOVQ 0x961cb(IP), CX            
    bce.go:28   0x4010cd    488b1dbc610900      MOVQ 0x961bc(IP), BX            
    bce.go:28   0x4010d4    4839c8          CMPQ CX, AX             
    bce.go:28   0x4010d7    7319            JAE 0x4010f2                
    bce.go:28   0x4010d9    0fb6441303      MOVZX 0x3(BX)(DX*1), AX         
    bce.go:28   0x4010de    3c80            CMPL $0x80, AL              
    bce.go:28   0x4010e0    7783            JA 0x401065             
    bce.go:31   0x4010e2    488d4204        LEAQ 0x4(DX), AX            
    bce.go:31   0x4010e6    4889054b070b00      MOVQ AX, 0xb074b(IP)            
    bce.go:34   0x4010ed    e973ffffff      JMP 0x401065                
    bce.go:28   0x4010f2    e8c9ed0100      CALL runtime.panicindex(SB)     
    bce.go:28   0x4010f7    0f0b            UD2                 
    bce.go:24   0x4010f9    e8c2ed0100      CALL runtime.panicindex(SB)     
    bce.go:24   0x4010fe    0f0b            UD2                 
    bce.go:20   0x401100    e8bbed0100      CALL runtime.panicindex(SB)     
    bce.go:20   0x401105    0f0b            UD2                 
    bce.go:16   0x401107    e8b4ed0100      CALL runtime.panicindex(SB)     
    bce.go:16   0x40110c    0f0b            UD2                 
    bce.go:15   0x40110e    e8aded0100      CALL runtime.panicindex(SB)     
    bce.go:15   0x401113    0f0b            UD2                 
    bce.go:36   0x401115    488d05ad430600      LEAQ 0x643ad(IP), AX            
    bce.go:36   0x40111c    4889442420      MOVQ AX, 0x20(SP)           
    bce.go:36   0x401121    48c744242803000000  MOVQ $0x3, 0x28(SP)         
    bce.go:36   0x40112a    488d056f380500      LEAQ 0x5386f(IP), AX            
    bce.go:36   0x401131    48890424        MOVQ AX, 0(SP)              
    bce.go:36   0x401135    488d442420      LEAQ 0x20(SP), AX           
    bce.go:36   0x40113a    4889442408      MOVQ AX, 0x8(SP)            
    bce.go:36   0x40113f    e81c770000      CALL runtime.convT2E(SB)        
    bce.go:36   0x401144    488b442410      MOVQ 0x10(SP), AX           
    bce.go:36   0x401149    488b4c2418      MOVQ 0x18(SP), CX           
    bce.go:36   0x40114e    48890424        MOVQ AX, 0(SP)              
    bce.go:36   0x401152    48894c2408      MOVQ CX, 0x8(SP)            
    bce.go:36   0x401157    e8a4fd0100      CALL runtime.gopanic(SB)        
    bce.go:36   0x40115c    0f0b            UD2                 
    bce.go:9    0x40115e    e89d600400      CALL runtime.morestack_noctxt(SB)   
    bce.go:9    0x401163    e998feffff      JMP main.blah(SB)           

I think the chain of panicindex calls should be removed, since both the:

 if l - i < 4 {
    goto fail
  }

and _ = buf[i+3] calls should make it clear that the subsequent accesses cannot possibly go out of bounds. Sorry if this is noise and it's WAI.

@bradfitz

This comment has been minimized.

Member

bradfitz commented Oct 6, 2016

@bradfitz bradfitz added the Performance label Oct 6, 2016

@bradfitz bradfitz added this to the Unplanned milestone Oct 6, 2016

@dsnet

This comment has been minimized.

Member

dsnet commented Oct 6, 2016

/cc @cherrymui. This is relevant to protobuf optimizations.

@cherrymui

This comment has been minimized.

Contributor

cherrymui commented Oct 7, 2016

The current compiler seems good to eliminate bound checks for constant indices, but does not do much for non-constant not inside a loop.
One quick thing you can do is to use constant indices, like:

func blah() {
        b := buf
        l := len(b)
        if l-i < 4 {
                goto fail
        }
        b = b[i:]
        _ = b[3]
        if b[0] > 0x80 {
                goto done
        }
        i++
        if b[1] > 0x80 {
                goto done
        }
        i++
        if b[2] > 0x80 {
                goto done
        }
        i++
        if b[3] > 0x80 {
                goto done
        }
        i++

done:
        return
fail:
        panic("bad")
}

At least it can get down to two bound checks.

@charlie-ht

This comment has been minimized.

charlie-ht commented Oct 19, 2016

Hi, with this program I see what I think is unnecessary bound checking,

func isPalindrome(s string) bool {
    runes := []rune(s)
    for len(runes) > 1 {
        if runes[0] != runes[len(runes)-1] {
            return false
        }
        runes = runes[1:len(runes)-1]
    }
    return true
}

Relevant assembly loop is,

0000000000002097        movq    0x18(%rsp), %rax
000000000000209c        movq    0x20(%rsp), %rcx  // len(runes)
00000000000020a1        movq    0x28(%rsp), %rdx
00000000000020a6        cmpq    $0x1, %rcx
00000000000020aa        jle     0x20ec
00000000000020ac        movl    (%rax), %ebx             /|
00000000000020ae        leaq    -0x1(%rcx), %rsi         /| %rsi = len(runes) - 1
00000000000020b2        cmpq    %rcx, %rsi               /| %rcx = len(runes)
00000000000020b5        jae     0x2127                   /| panicindex (seem unnecessary)
00000000000020b7        movl    -0x4(%rax,%rcx,4), %edi  /|
00000000000020bb        cmpl    %edi, %ebx               /|
00000000000020bd        jne     0x210f                   // if runes[0] != runes[len(runes)-1]
00000000000020bf        cmpq    $0x1, %rsi
00000000000020c3        jb      0x2108
00000000000020c5        cmpq    %rdx, %rsi
00000000000020c8        ja      0x2108
00000000000020ca        addq    $-0x2, %rcx
00000000000020ce        leaq    -0x1(%rdx), %rbx
00000000000020d2        cmpq    $0x1, %rdx
00000000000020d6        je      0x2104
00000000000020d8        movq    $0x1, %rdx
00000000000020df        leaq    (%rax,%rdx,4), %rax
00000000000020e3        movq    %rbx, %rdx
00000000000020e6        cmpq    $0x1, %rcx
00000000000020ea        jg      0x20ac

Version: go version go1.7.1 darwin/amd64
Platform: OSX

Thought maybe now with SSA Go will have range analysis and would know in this case that since len(runes) > 1, runes[len(runes)-1] does not need a bound check.

@rillig

This comment has been minimized.

Contributor

rillig commented Nov 19, 2016

Another testcase: https://play.golang.org/p/ZIvG8AUHJL

go version devel +7dc97d9 windows/amd64

  • Accessing groups[4] on a line of its own eliminates further bounds checks, as expected.
  • Accessing groups[4] as the leftmost expression in a multiple-assignment doesn't eliminate bounds checks further to the right, which is unexpected.
  • The language specification doesn't require array indexes to be evaluated from left to right. Therefore it shouldn't be necessary to access groups[4] to the left of the others, just to avoid further bounds checks.
@randall77

This comment has been minimized.

Contributor

randall77 commented Nov 21, 2016

@rillig , it looks like your case isn't really a bug.
Order of evaluation of the right-hand side is not specified by the spec, and the compiler happens to decide to do the _ = groups[4] portion of the multiassignment last. So you need to put the groups[4] access in a separate statement to make sure the bounds checks are redundant.

That said, maybe it would be possible to decide what order to evaluate the RHS of a multiassignment in order to minimize the bounds checks needed. That sounds like a pretty low-priority optimization though, as there is an easy workaround.

@willnewton

This comment has been minimized.

willnewton commented Dec 21, 2016

With Go 1.7.4 this code produces bounds checks on all 4 stores in the loop.

func boundsCheck(s []int) {
        n := len(s) / 4
	for i := 0; i < n; i++ {
	    	s[i] = i
		s[i+1] = i 
		s[i+2] = i
		s[i+3] = i
	}
}

It seems like it should be possible to reorder the stores to eliminate 3 of the bounds checks:

func boundsCheck(s []int) {
	n := len(s) / 4
	for i := 0; i < n; i++ {
		s[i+3] = i
	    	s[i] = i
		s[i+1] = i 
		s[i+2] = i
	}
}

However this results in 4 bounds checks as before.

@rasky

This comment has been minimized.

Member

rasky commented Aug 20, 2018

@willnewton the compiler isn't currently able to deduce that n >= 0, so it cannot prove that i is never -2 or similar value with which i+3 would be valid but i+1 would panic.

In fact, we currently handle only loops in which the counter bound is a constant or a special value like len(slice). We're not able to optimize index accesses in loops where we have only partial information on the loop bound (eg: in this case, it would be enough to know that the loop bound is non-negative).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment