cmd/compile: suboptimal arm64 output #43145
Labels: arch-arm64, NeedsDecision, Performance
What version of Go are you using (go version)?
What did you do?
Compiled this function.
Full codebase at FiloSottile/edwards25519#8
What did you expect to see?
What did you see instead?
The compiler arrives at the same AND, ADD, and LSR+MADD sequence that my hand-written assembly uses. However, it loads the inputs from memory twice, and it does not seem to use the paired load/store instructions STP and LDP.
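The function itself is in the linked PR. For reference, here is a hypothetical Go function of the same general shape: a carry-propagation step over five 51-bit limbs, modeled loosely on the field-element layout edwards25519 uses. The type `element`, the constant `maskLow51Bits` and the method `carryPropagate` are illustrative assumptions, not the code from this report. Its shifts, masks, adds and multiply-by-19 wraparound are the kind of operations that would plausibly lower to LSR, AND, ADD and MADD on arm64. Every limb is read and written through the pointer `v`, so this is also the kind of code where redundant loads or unpaired LDR/STR would show up. Inspecting the real listing, for example with `GOARCH=arm64 go build -gcflags=-S`, is the way to confirm.

```go
package main

import "fmt"

// maskLow51Bits keeps the low 51 bits of a limb (an AND on arm64).
const maskLow51Bits uint64 = (1 << 51) - 1

// element is a hypothetical field element mod 2^255-19, stored as five
// 51-bit limbs in uint64s (loosely modeled on edwards25519's layout).
type element struct {
	l0, l1, l2, l3, l4 uint64
}

// carryPropagate moves the bits above position 51 of each limb into the
// next limb. The carry out of l4 wraps around to l0 multiplied by 19,
// because 2^255 ≡ 19 (mod 2^255-19).
//
//go:noinline
func (v *element) carryPropagate() *element {
	// Carries: a logical shift right (LSR) of each limb.
	c0 := v.l0 >> 51
	c1 := v.l1 >> 51
	c2 := v.l2 >> 51
	c3 := v.l3 >> 51
	c4 := v.l4 >> 51

	// Mask each limb and add the incoming carry; c4*19 plus the masked
	// limb is a natural fit for a multiply-add (MADD).
	v.l0 = v.l0&maskLow51Bits + c4*19
	v.l1 = v.l1&maskLow51Bits + c0
	v.l2 = v.l2&maskLow51Bits + c1
	v.l3 = v.l3&maskLow51Bits + c2
	v.l4 = v.l4&maskLow51Bits + c3
	return v
}

func main() {
	// l0 overflows by one carry; l4 overflows by two carries.
	v := &element{l0: 1 << 51, l4: 1 << 52}
	v.carryPropagate()
	fmt.Println(v.l0, v.l1, v.l4) // prints "38 1 0"
}
```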
I'm not sure which of these has the biggest impact, but my assembly was about 10% faster than the compiler output (built with //go:noinline) on some high-level functions, though not on microbenchmarks of the thinner functions. (Interestingly, if I let the compiler inline the function, the high-level functions get even slower, while the thin ones get faster.)
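An inlined versus //go:noinline comparison like the one above can be reproduced with a sketch along these lines. It benchmarks two identical copies of a small helper, one inlinable and one marked //go:noinline, using `testing.Benchmark` from an ordinary program. The helper names, the constant `mask51` and the loop body are hypothetical and chosen only for illustration. Timings depend on the machine, so no output is shown.

```go
package main

import (
	"fmt"
	"testing"
)

// mask51 keeps the low 51 bits (hypothetical, for illustration).
const mask51 uint64 = (1 << 51) - 1

// reduceInline is a small helper the compiler is free to inline.
func reduceInline(x, y uint64) uint64 {
	return x&mask51 + (y>>51)*19
}

// reduceNoInline is identical, but the directive forces a real call.
//
//go:noinline
func reduceNoInline(x, y uint64) uint64 {
	return x&mask51 + (y>>51)*19
}

// sink keeps the results live so the loops are not optimized away.
var sink uint64

func main() {
	// Each closure calls its helper directly; calling through a shared
	// func value would make both calls indirect and hide the difference.
	inl := testing.Benchmark(func(b *testing.B) {
		acc := uint64(1)
		for i := 0; i < b.N; i++ {
			acc = reduceInline(acc+uint64(i), acc<<7)
		}
		sink = acc
	})
	noinl := testing.Benchmark(func(b *testing.B) {
		acc := uint64(1)
		for i := 0; i < b.N; i++ {
			acc = reduceNoInline(acc+uint64(i), acc<<7)
		}
		sink = acc
	})
	fmt.Println("inlined:     ", inl.NsPerOp(), "ns/op")
	fmt.Println("go:noinline: ", noinl.NsPerOp(), "ns/op")
}
```

A helper this small mostly measures call overhead. In larger functions, inlining also changes register allocation and load scheduling, which is one plausible reason the high-level and thin functions move in opposite directions.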