Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: BCE is better with reslicing than index variables #28942

mvdan opened this issue Nov 25, 2018 · 5 comments

cmd/compile: BCE is better with reslicing than index variables #28942

mvdan opened this issue Nov 25, 2018 · 5 comments
compiler/runtime Performance


Copy link

mvdan commented Nov 25, 2018

Take this piece of code:

$ cat f.go
package p

func slice(p []byte) {
        for len(p) > 4 {
                // zero bounds checks.
                _ = p[0]
                _ = p[1]
                _ = p[2]
                _ = p[3]

                p = p[4:] // reslicing is expensive.

func index(p []byte) {
        i := 0
        for i < len(p) {
                _ = p[i+3] // BCE hint; bounds check

                _ = p[i]   // unexpected bounds check
                _ = p[i+1] // unexpected bounds check
                _ = p[i+2] // unexpected bounds check
                _ = p[i+3]

                i += 4 // incrementing i is cheap.
$ go version
go version devel +6d5caf38e3 Thu Nov 22 02:59:55 2018 +0000 linux/amd64
$ go build -gcflags=-d=ssa/check_bce/debug=1 f.go
# command-line-arguments
./f.go:18:8: Found IsInBounds
./f.go:20:8: Found IsInBounds
./f.go:21:8: Found IsInBounds
./f.go:22:8: Found IsInBounds

It's easy to see why the first variant has zero bounds checks. However, reslicing can be expensive in a hot loop, so sometimes the code is rewritten to use indexes.

This is what the second variant does. I do realise that the two loops aren't equivalent - for example, if len(p) == 5, the second loop will panic since the length is not a multiple of 4. So I understand why the compiler needs to insert one bounds check.

Still, it seems to me like it should insert one, not four, since I've added the BCE hint. My first thought was that it couldn't prove that i >= 0, but changing the index to be unsigned still doesn't remove all the bounds checks that I'd expect.

I encountered this in the base64 encoder:

di, si := 0, 0
n := (len(src) / 3) * 3
for si < n {
        // Convert 3x 8bit source bytes into 4 bytes
        val := uint(src[si+0])<<16 | uint(src[si+1])<<8 | uint(src[si+2]) // 3 bounds checks

        dst[di+0] = enc.encode[val>>18&0x3F] // bounds check
        dst[di+1] = enc.encode[val>>12&0x3F] // bounds check
        dst[di+2] = enc.encode[val>>6&0x3F]  // bounds check
        dst[di+3] = enc.encode[val&0x3F]     // bounds check

        si += 3
        di += 4

Rewriting the loop to use reslicing like for len(src) >= 3 { ...; src = src[3:] } does remove the bounds checks, but slows down the encoder noticeably. Just like in my simple example above, BCE hints like _ = src[si+2] and unsigned indexes didn't help either.

I think there's probably a way to rewrite the loop index logic to trick BCE, but I think the compiler could be made smarter here. I'm not sure whether that should be an enhancement in the prove pass, or in the bounds check elimination pass.

/cc @aclements @rasky @josharian

@mvdan mvdan added this to the Unplanned milestone Nov 25, 2018
Copy link

dgryski commented Nov 25, 2018

Possibly related: #24876

Copy link
Member Author

mvdan commented Nov 25, 2018

I believe this also affects the base64 decoder - see the comment above. The code iterates over a pair of slices in chunks, much like the encoder:

si := 0
for strconv.IntSize >= 64 && len(src)-si >= 8 && len(dst)-n >= 8 {
        if dn, ok := decode64(
        ); ok {

As before, I tried combining unsigned integers and BCE hints, but no results. I also tried juggling with the for loop conditions, but that didn't make a difference either.

Copy link
Member Author

mvdan commented Nov 27, 2018

I just realised that removing bounds checks from a loop that jumps more than one position per iteration might not be as easy as it sounds, because of overflows. Imagine:

i := 0
for i < len(data) {
    _ = data[i+0]
    _ = data[i+1]
    _ = data[i+2]
    _ = data[i+3]
    i += 4

If len(data) happens to be the maximum integer value, then i+2 could overflow and give a negative number, which would always be a panic when used as an index. Still, a _ = data[i+3] hint should still remove that "is negative" bounds check, since i+2 can't possibly overflow if i+3 didn't.

I presume this trouble disappears if we make the index variable unsigned, since an overflow will just get the index back to 0, which will definitely be within bounds.

Copy link

agnivade commented Feb 26, 2019

/cc @zdjones

Copy link
Member Author

mvdan commented Apr 23, 2019 might be interesting to the folks following this thread.

@gopherbot gopherbot added the compiler/runtime label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
compiler/runtime Performance
None yet

No branches or pull requests

4 participants