cmd/compile: Load optimization suboptimal #48222
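FWIW, casting to `*[4]byte`:

```go
package p

import "encoding/binary"

func foo(x []byte) uint32 {
	// Avoid bounds check for clarity.
	// Slice-to-array-pointer conversion needs Go 1.17+ and panics if len(x) < 4.
	y := (*[4]byte)(x)
	switch y[0] {
	case 0:
		return binary.LittleEndian.Uint32(y[:])
	}
	return 0
}
```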
Want to do another one, @agarciamontoro?
Consider this program:
In this situation, we can't do a single 4-byte load inside the case, because we might end up returning a value whose bottom 8 bits are not 0. That would be surprising. It can only happen if there's a data race, so technically it would be OK, but we try to avoid reloading from memory when the program doesn't do so. The OP's program has 2 separate loads, so it would be OK to do the 4-byte load optimization there. Unfortunately, we do CSE before we do the 4-byte load optimization, so at that point the compiler can't tell the difference between the OP's program and mine. It errs on the side of caution and doesn't do the 4-byte load optimization.
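As a hedged illustration of the hazard, the sketch below assumes a program that reuses the byte tested by the `switch` as the low byte of the result, and compares it with a single 4-byte reload. The names `reuseByte`, `singleLoad` and the `between` callback are hypothetical; the callback stands in for a racing write, so the interleaving is deterministic and no real data race occurs.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// reuseByte builds the result from the same load the switch tested (b0),
// so on the case-0 path the low 8 bits are always 0.
func reuseByte(x []byte, between func()) uint32 {
	b0 := x[0] // the load tested by the switch
	switch b0 {
	case 0:
		between() // simulated racing write lands here
		return uint32(b0) | uint32(x[1])<<8 | uint32(x[2])<<16 | uint32(x[3])<<24
	}
	return 0
}

// singleLoad models a single 4-byte reload inside the case: x[0] is read
// again, so a write in between shows up in the low byte.
func singleLoad(x []byte, between func()) uint32 {
	switch x[0] {
	case 0:
		between() // simulated racing write lands here
		return binary.LittleEndian.Uint32(x)
	}
	return 0
}

func main() {
	x := []byte{0, 1, 2, 3}
	write := func() { x[0] = 0xff }
	fmt.Printf("%#x\n", reuseByte(x, write)) // 0x3020100
	x[0] = 0
	fmt.Printf("%#x\n", singleLoad(x, write)) // 0x30201ff
}
```

With the reload, the `case 0` path can return a value whose low byte is `0xff`, which is the surprising result described above.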
I would think it is always better to reload when constructing the value reuses just a single load from memory. That holds even when constructing a 16-bit value:
... the second part has a considerably longer latency than a single
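To make the 16-bit case concrete, here is a sketch assuming the byte already loaded for the check is reused: the assembled form needs one more 1-byte load, a shift and an OR on top of that earlier load, whereas the reload is a single 2-byte load. The function names are hypothetical, and the code only checks that the two forms agree when no concurrent write occurs; it says nothing about latency on any particular CPU.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// assembled16 reuses b0, the byte already loaded for the check, and adds
// one 1-byte load of x[1], a shift and an OR.
func assembled16(x []byte, b0 byte) uint16 {
	return uint16(b0) | uint16(x[1])<<8
}

// reloaded16 is a single 2-byte little-endian load of x[0:2].
func reloaded16(x []byte) uint16 {
	return binary.LittleEndian.Uint16(x)
}

// sameForAllInputs checks every 2-byte input and reports whether both forms
// agree when nothing writes to x in between.
func sameForAllInputs() bool {
	x := make([]byte, 2)
	for v := 0; v < 1<<16; v++ {
		x[0], x[1] = byte(v), byte(v>>8)
		if assembled16(x, x[0]) != reloaded16(x) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(sameForAllInputs()) // true
}
```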
Thanks for the offer, @josharian! I'm currently looking into another one, so I'll pass on this opportunity and leave it open for someone else to grab :)
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

Yes. And also previous versions.

What operating system and processor architecture are you using (`go env`)?

What did you do?
Compile https://play.golang.org/p/oWj4bSaNiHN
Godbolt: https://go.godbolt.org/z/YnGcGW8Gc
What did you expect to see?

A single `MOVL` instruction used to load the `uint32`.

What did you see instead?

The compiler assembles the value from the previous load, 2 other loads, 2 shifts and 2 OR operations.
It is safe to rely on the value being in L1 when it has just been loaded. Some modern architectures even do memory-to-virtual-register aliasing. I don't see any case where this "optimization" is a benefit compared to simply reloading the value.
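One plausible Go-level reading of the sequence described above (the earlier byte load plus 2 other loads, 2 shifts and 2 ORs) is a 1-byte load of `x[1]` and a 2-byte load of `x[2:4]`; that split is an assumption, not taken from the compiler output. The sketch compares it with the single 4-byte load the report expected. Both helpers are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// assembled32 is an assumed breakdown of the observed code: reuse b0 (the
// byte loaded for the switch), then a 1-byte load and a 2-byte load, each
// followed by a shift and an OR.
func assembled32(x []byte, b0 byte) uint32 {
	// Load x[1], shift by 8, OR with b0.
	lo := uint32(b0) | uint32(x[1])<<8
	// Load x[2:4] as one 16-bit value, shift by 16, OR.
	return lo | uint32(binary.LittleEndian.Uint16(x[2:]))<<16
}

// single32 is the expected form: one 4-byte little-endian load.
func single32(x []byte) uint32 {
	return binary.LittleEndian.Uint32(x)
}

func main() {
	x := []byte{0x00, 0x11, 0x22, 0x33}
	fmt.Printf("%#x %#x\n", assembled32(x, x[0]), single32(x)) // 0x33221100 0x33221100
}
```

To see what the compiler actually emits for such a function, `go build -gcflags=-S` prints the generated assembly.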