Suboptimal instruction sequence when operands are reordered #57255
@llvm/issue-subscribers-backend-risc-v |
@llvm/issue-subscribers-backend-arm |
The IR for @foo looks silly @rotateright
define dso_local i32 @foo(i32 noundef %0) local_unnamed_addr #0 {
  %2 = mul i32 %0, -5
  %3 = xor i32 %2, -1
  %4 = add i32 %3, %0
  ret i32 %4
} |
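To see what that IR computes, here is a minimal Python sketch that mirrors the three instructions of @foo with 32-bit wraparound and compares the result against 6*x - 1. The helper names (`wrap32`, `foo_ir`, `six_x_minus_one`) are made up for illustration and are not part of LLVM.

```python
def wrap32(v: int) -> int:
    """Reduce an integer to a signed 32-bit value (two's-complement wraparound)."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v


def foo_ir(x: int) -> int:
    """Mirror the three IR instructions of @foo, one line per instruction."""
    t2 = wrap32(x * -5)    # %2 = mul i32 %0, -5
    t3 = wrap32(t2 ^ -1)   # %3 = xor i32 %2, -1   (bitwise not)
    return wrap32(t3 + x)  # %4 = add i32 %3, %0


def six_x_minus_one(x: int) -> int:
    """The same value written as a single multiply and a subtract."""
    return wrap32(6 * x - 1)


if __name__ == "__main__":
    for x in (0, 1, 7, -3, 2**31 - 1, -(2**31)):
        print(x, foo_ir(x), six_x_minus_one(x))
```

Since ~v equals -v - 1 in two's complement, the sequence reduces to 5*x - 1 + x, which is why the mul/xor/add chain is equivalent to 6*x - 1.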
The instructions are the same for X86 but the operands are not. The lea for baz uses |
We do better with Zba. We manage a 2 instruction serial dependency for @baz
foo: # @foo
sh2add a1, a0, a0
add a0, a0, a1
addi a0, a0, -1
ret
bar: # @bar
sh2add a1, a0, a0
add a0, a0, a1
addi a0, a0, -1
ret
baz: # @baz
sh1add a0, a0, a0
li a1, -1
sh1add a0, a0, a1
ret |
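As a sanity check on the Zba sequences above, the following sketch models the instructions on 64-bit registers and confirms that the foo/bar sequence and the baz sequence agree in their low 32 bits. It assumes the Zba semantics shNadd rd, rs1, rs2 = rs2 + (rs1 << N); the Python function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sh_add(n: int, rs1: int, rs2: int) -> int:
    """shNadd rd, rs1, rs2: rd = rs2 + (rs1 << n), on 64-bit registers."""
    return (rs2 + (rs1 << n)) & MASK64


def low32_signed(v: int) -> int:
    """Interpret the low 32 bits of a register as a signed i32."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v


def foo_zba(x: int) -> int:
    """foo/bar sequence: three instructions, each depending on the previous one."""
    a0 = x & MASK64
    a1 = sh_add(2, a0, a0)   # sh2add a1, a0, a0  -> 5*x
    a0 = (a0 + a1) & MASK64  # add    a0, a0, a1  -> 6*x
    a0 = (a0 - 1) & MASK64   # addi   a0, a0, -1  -> 6*x - 1
    return a0


def baz_zba(x: int) -> int:
    """baz sequence: the li is independent, so only two instructions are serial."""
    a0 = x & MASK64
    a0 = sh_add(1, a0, a0)   # sh1add a0, a0, a0  -> 3*x
    a1 = MASK64              # li     a1, -1
    a0 = sh_add(1, a0, a1)   # sh1add a0, a0, a1  -> 6*x - 1
    return a0
```

Both sequences use three instructions; the difference is only in how many of them sit on the dependent chain.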
Yeah, I didn't step through the transforms, but I think we want to reduce all of these to baz() form in IR since that's shortest: That means the backend needs more mul expansion smarts if we are trying to avoid the mul in asm. |
This comment is not correct:
...
...
In the baz() case, it decomposed as (3x) + (3x), but the others were 5*x + x. I don't think that makes a perf difference on any recent CPU, but it does show that we could do better at canonicalizing in codegen too? |
Yeah, missed it. But w.r.t. performance all three will be similar on X86, while for Arm and RISC-V there is a difference.
agreed! |
Fixes AArch64 part with https://reviews.llvm.org/D132322 |
On the IR side, we could generalize the factoring to patterns with no constants: |
On 2nd thought, madd codegen looks fine either way (at least for aarch64):
The 'tgt' form with dependent ops might have a longer critical path without madd, but that should be invertible in codegen. |
Proposal to improve canonicalization in IR: |
The stronger one-use checks prevented transforms like this:
(x * y) + x --> x * (y + 1)
(x * y) - x --> x * (y - 1)
https://alive2.llvm.org/ce/z/eMhvQa
This is one of the IR transforms suggested in issue #57255. This should be better in IR because it removes a use of a variable operand (we already fold the case with a constant multiply operand). The backend should be able to re-distribute the multiply if that's better for the target.
Differential Revision: https://reviews.llvm.org/D132412
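The two factoring identities in that commit message hold under wrapping i32 arithmetic, which the Alive2 link proves formally. The sketch below is only a spot check over sample values, with made-up helper names, not a proof.

```python
def wrap32(v: int) -> int:
    """Reduce an integer to a signed 32-bit value (two's-complement wraparound)."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v


def src_add(x: int, y: int) -> int:
    """(x * y) + x : x is used twice."""
    return wrap32(wrap32(x * y) + x)


def tgt_add(x: int, y: int) -> int:
    """x * (y + 1) : x is used once."""
    return wrap32(x * wrap32(y + 1))


def src_sub(x: int, y: int) -> int:
    """(x * y) - x"""
    return wrap32(wrap32(x * y) - x)


def tgt_sub(x: int, y: int) -> int:
    """x * (y - 1)"""
    return wrap32(x * wrap32(y - 1))
```

The factored form trades one use of x for an add or subtract of a constant 1 on y, which is the "removes a use of a variable operand" argument from the commit message.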
Negator can create non-obvious math while trying hard to avoid subtraction. issue #57255
~(A * C1) + A --> (A * (1 - C1)) - 1
This is a non-obvious mix of bitwise logic and math: https://alive2.llvm.org/ce/z/U7ACVT
The pattern may be produced by Negator from the more typical code seen in issue #57255.
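The fold follows from the two's-complement identity ~v = -v - 1, so ~(A * C1) + A = -(A * C1) - 1 + A = A * (1 - C1) - 1. A small Python sketch (hypothetical helper names, wrapping i32 arithmetic modeled by hand) spot-checks this, including the C1 = -5 case from the @foo IR in this issue.

```python
def wrap32(v: int) -> int:
    """Reduce an integer to a signed 32-bit value (two's-complement wraparound)."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v


def src(a: int, c1: int) -> int:
    """~(A * C1) + A"""
    return wrap32(~wrap32(a * c1) + a)


def tgt(a: int, c1: int) -> int:
    """(A * (1 - C1)) - 1"""
    return wrap32(wrap32(a * wrap32(1 - c1)) - 1)


if __name__ == "__main__":
    # With C1 = -5 the target becomes A * 6 - 1, the form seen in this issue.
    for a in (0, 1, 7, -3):
        print(a, src(a, -5), tgt(a, -5))
```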
That part should be fixed in IR at least with: D132412 was initially going to change the bar() IR, but that patch ended up fixing another pattern, and we still miss bar(). It's still up to the backend to decompose the mul if that's better for the target. |
Lower a = b * C - 1 into madd
a) instcombine change: b * C - 1 --> b * C + (-1)
b) machine-combine change: b * C + (-1) --> madd
The assembler will transform the negative immediate of sub to add, see https://gcc.godbolt.org/z/cTcxePPf4
Fixes AArch64 part of #57255.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D134336
This issue still exists for the 2nd function, bar: https://godbolt.org/z/vjree4xfY |
Proposal to improve canonicalization in IR of bar: https://reviews.llvm.org/D136623 |
@vfdff It seems like the patch was abandoned? Are you still planning to work on it? |
Hi! This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:
For more instructions on how to submit a patch to LLVM, see our documentation. If you have any further questions about this issue, don't hesitate to ask via a comment on this Github issue. @llvm/issue-subscribers-good-first-issue |
@hiraditya I can rebase that change onto trunk and try to get it in. Would that be helpful? |
yes please. |
Hi, is anyone actively working on this issue? I see an open pull request corresponding to this issue, was wondering if it's being actively worked on? |
I'm not working on it. I'd be glad for you to continue with this, as there has been no update for a long time @gxyd |
Thank you for your reply. I'll get started with it then. |
@hiraditya , can you maybe assign the issue to me? |
Should probably ask @tetsuo-cpp what their plans for the PR are first. |
@gxyd Go for it! Thanks for working on this. Unfortunately, I got busy and won't have time to spend on getting that PR in. I'll close the open PR but will leave the branch on my fork in case you want to use it. Although, it's been a while since I rebased so it might not be useful to you. |
$ riscv64-clang -O3
In the last case, mulw is generated.
AArch64 also has this issue:
$ arm64-clang -O3
In the last case, mul is generated.
However, X86 generates the same instruction sequence for all three.