Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: loops compiled differently when using range or index #31205

Open
marigonzes opened this Issue Apr 2, 2019 · 5 comments

Comments

Projects
None yet
4 participants
@marigonzes
Copy link

marigonzes commented Apr 2, 2019

What version of Go are you using (go version)?

$ go version
go version devel +4091cf9 Sun Mar 31 23:35:35 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

I compiled two functions that sum all the elements of a slice of bytes using "range" or using an index:

"Range" version:

func sumRange(slc []byte) byte {
    res := byte(0)
    for _, v := range slc {
        res += v
    }
    return res
}

Index version:

func sumIndex(slc []byte) byte {
    res := byte(0)
    for i := 0; i < len(slc); i++ {
        res += slc[i]
    }
    return res
}

What did you expect to see?

I expected both versions of the function to compile to the exact same code.

What did you see instead?

Instead, the looping part of the function is different, resulting in differences in performance:

"Range" version:

movblzx (CX)(DX*1), SI
incq    DX
addl    SI, BX

Index version:

leaq    1(DX), SI
movblzx (CX)(DX*1), DI
addl    DI, BX
movq    SI, DX
@mundaym

This comment has been minimized.

Copy link
Member

mundaym commented Apr 2, 2019

What is the performance difference?

@marigonzes

This comment has been minimized.

Copy link
Author

marigonzes commented Apr 2, 2019

Using the following benchmark (https://play.golang.org/p/rCxWqmQXvq5), I get these numbers:

BenchmarkSumRange-8   	    3000	    437643 ns/op
BenchmarkSumIndex-8   	    3000	    482681 ns/op

So, using an explicit index seems to be slower than using "range".

@marigonzes

This comment has been minimized.

Copy link
Author

marigonzes commented Apr 2, 2019

Something I noticed is that, if I write "sumRange" as follows, the difference disappears:

func sumRange(slc []byte) byte {
    res := byte(0)
    for i := range slc {
        res += slc[i]
    }
    return res
}
@bradfitz

This comment has been minimized.

Copy link
Member

bradfitz commented Apr 2, 2019

@randall77

This comment has been minimized.

Copy link
Contributor

randall77 commented Apr 2, 2019

This appears to occur because of choices in the scheduler.
The inner loop in sumRange schedules like this:

v19 (5) = MOVBloadidx1 <byte> v26 v10 v1 (v[byte])
v24 (+5) = ADDQconst <int> [1] v10
v21 (+6) = ADDL <byte> v30 v19 (res[byte])

The inner loop in sumIndex schedules like this:

v27 (+13) = ADDQconst <int> [1] v11 (i[int])
v23 (+14) = MOVBloadidx1 <byte> v18 v11 v1
v24 (14) = ADDL <byte> v33 v23 (res[byte])

The schedule in sumIndex is a problem because the ADDQconst can't overwrite its argument, as the argument is not dead yet. So it has to target a different register and then get copied back.

This has been an issue for a while. I noted it in 2016: #16122 (comment)

Perhaps just a tweak to the scheduler is needed.

I'm unclear as to why the scheduler makes different choices here. Might just be how the values happened to be ordered on entry to the scheduler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.