cmd/compile: inefficient assembly code on arm64 #43357
Comments
I dug into the details a bit further. The code seems optimal when … The culprits are the rewrite rules in …, where it states: … Since x and y are already loaded from memory as single bytes and … is zero-extended, … Any ideas how to fix this?
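A hypothetical minimal reproducer for the byte-compare pattern discussed above might look like the following. The function name `countLess` is made up for illustration and is not taken from the issue. Each operand is a single-byte load. On arm64, an unsigned byte load (MOVBU in Go assembly) already zero-extends the value, so any further zero-extension before the compare would be redundant. Building it with `go build -gcflags=-S` lets you inspect what the compiler emits around the compare.

```go
package main

import "fmt"

// countLess is a hypothetical reproducer (not from the issue): it counts the
// positions i where x[i] < y[i], stopping at the shorter slice.
// Both x[i] and y[i] are single-byte loads; on arm64 an unsigned byte load
// (MOVBU in Go assembly) already zero-extends, so an extra zero-extension
// before the comparison would be redundant work.
func countLess(x, y []byte) int {
	n := 0
	for i := 0; i < len(x) && i < len(y); i++ {
		if x[i] < y[i] {
			n++
		}
	}
	return n
}

func main() {
	// 1 < 2 is true, 5 < 4 and 3 < 3 are false.
	fmt.Println(countLess([]byte{1, 5, 3}, []byte{2, 4, 3})) // prints: 1
}
```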
Maybe we can add rules of the form … There are also many other rules that generate …, so fixing all of these cases with rewrite rules like the above would require far too many of them. Any other good ideas for fixing them? Thank you. @randall77 @cherrymui BTW, if @fkuehnel needs efficient assembly for this case soon, I can submit a patch that adds the above rewrite rules to fix it.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
What did you do?
Compile this code with:
go build -gcflags -S lomuto.go
https://play.golang.org/p/F8fPWbzvDRO
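The playground program itself is not reproduced here. As a rough stand-in, the sketch below is a hypothetical Lomuto partition over a byte slice, using the last element as the pivot. It assumes, but does not confirm, that lomuto.go does something similar. The inner `for j` loop with its single-byte comparison is the kind of hot loop whose arm64 output the report compares against clang's.

```go
package main

import "fmt"

// partition is a hypothetical Lomuto partition (not the playground code).
// It uses the last element as the pivot, moves every element smaller than
// the pivot to the front, and returns the pivot's final index.
// It returns -1 for an empty slice, since there is no pivot.
func partition(a []byte) int {
	if len(a) == 0 {
		return -1
	}
	hi := len(a) - 1
	pivot := a[hi]
	i := 0
	for j := 0; j < hi; j++ {
		// Inner loop: one byte load and one byte compare per iteration.
		if a[j] < pivot {
			a[i], a[j] = a[j], a[i]
			i++
		}
	}
	// Place the pivot between the smaller and the larger-or-equal elements.
	a[i], a[hi] = a[hi], a[i]
	return i
}

func main() {
	a := []byte{5, 2, 9, 1, 7, 3}
	p := partition(a)
	fmt.Println(p, a) // prints: 2 [2 1 3 5 7 9]
}
```

Compiling such a file with go build -gcflags -S and reading the loop body is one way to compare the Go output with clang's.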
What did you expect to see?
With clang -O3 -S, I see a tight inner loop between LBB0_2 and LBB0_5 with very few instructions.
What did you see instead?
I see excessive register usage and many more instructions between addresses 64 and 124.