Regression in code quality for mul=>shift conversions in multiple backends #53829
I assume this is interfering with the negative-power-of-2 special-case lowering in the backend.
Adding @rotateright - I'm also @spatel-gh, but the other ID is mapped to my LLVM commits.
Removing a few more bits from the example, we have these equivalent representations. (The before vs. after codegen for x86-64, AArch64, and RISCV with "m" was not captured here; should we test RISCV with any other attributes?)
More generally, we're inverting a transform that is done by InstCombine's Negator. This is the fold in DAGCombiner, with no target hooks or other constraints. (Code snippet not captured.)
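The fold under discussion (multiply by a negative power of 2 becomes negate-of-shift) can be sanity-checked with a small Python model of 32-bit wraparound arithmetic. This is an illustration only, not the actual DAGCombiner code:

```python
# Model i32 two's-complement wraparound with a 32-bit mask.
MASK = (1 << 32) - 1

def mul_neg_pow2(x: int, n: int) -> int:
    """x * -(2**n), computed with an actual multiply."""
    return (x * (-(1 << n) & MASK)) & MASK

def shl_then_neg(x: int, n: int) -> int:
    """The folded form: 0 - (x << n), no multiply needed."""
    return (-(x << n)) & MASK

# The two forms agree for all i32 inputs (spot-checked here).
for x in (0, 1, 7, 12345, MASK):
    for n in range(31):
        assert mul_neg_pow2(x, n) == shl_then_neg(x, n)
```

The regression is that after 995d400 clears the high multiplier bits, the constant no longer looks like `-(2**n)`, so this fold stops firing.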
I think we can reverse this in SDAG's demanded bits, but there's at least one missing fold exposed. The existing fold in the previous comment seems too aggressive (it may not always be good to replace a single multiply with shift + negate), but we can probably always profitably fold mul+add/sub --> shl+sub.
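The suggested mul+add --> shl+sub equivalence can also be checked with the same kind of Python wraparound model (the helper names here are illustrative, not from the patch):

```python
# Model i32 two's-complement wraparound with a 32-bit mask.
MASK = (1 << 32) - 1

def mul_then_add(x: int, y: int, n: int) -> int:
    """y + x * -(2**n): a multiply feeding an add."""
    return (y + x * (-(1 << n) & MASK)) & MASK

def shl_then_sub(x: int, y: int, n: int) -> int:
    """Folded form: y - (x << n), a shift feeding a sub."""
    return (y - (x << n)) & MASK

# Spot-check that the fold preserves the i32 result.
for x in (0, 3, 0xDEADBEEF):
    for y in (0, 1, MASK):
        for n in (0, 5, 31):
            assert mul_then_add(x, y, n) == shl_then_sub(x, y, n)
```

Unlike the standalone mul --> shl+neg rewrite, this version never adds an instruction: the add/sub was already there, so the shift simply replaces the multiply.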
This fold is done in IR: https://alive2.llvm.org/ce/z/jWyFrP. There is an x86 test that shows an improvement from the added flexibility of using add (commutative). The other diffs are presumed neutral. Note that this could also be folded to an 'xor', but I'm not sure if that would be universally better (e.g., x86 can convert adds more easily into LEA). This helps prevent regressions from a potential fold for issue #53829.
This is a fix for a regression discussed in: llvm/llvm-project#53829

We cleared more high multiplier bits with 995d400, but that can lead to worse codegen because we would fail to recognize the now-disguised multiplication by a negative power of 2 as a shift-left. The problem exists independently of the IR change in the case that the multiply already had cleared high bits. We also convert shl+sub into mul+add in InstCombine's Negator. This patch fills in the high bits to see the shift transform opportunity.

Alive2 attempt to show correctness: https://alive2.llvm.org/ce/z/GgSKVX

The AArch64, RISCV, and MIPS diffs look like clear wins. The x86 code requires an extra move register in the minimal examples, but it's still an improvement to get rid of the multiply on all CPUs that I am aware of (because multiply is never as fast as a shift).

There's a potential follow-up noted by the TODO comment. We should already convert that pattern into shl+add in IR, so it's probably not common: https://alive2.llvm.org/ce/z/7QY_Ga

Fixes #53829

Differential Revision: https://reviews.llvm.org/D120216
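Why filling in the high multiplier bits is legal when only the low bits are demanded can be shown with a small Python model (i32 values with the low 8 bits demanded; the widths and constants are chosen for illustration, not taken from the patch):

```python
I32 = (1 << 32) - 1
DEMANDED = (1 << 8) - 1   # only the low 8 bits of the product are used

trimmed = 0xF8            # high bits cleared, as 995d400 leaves the constant
filled = 0xFFFFFFF8       # high bits refilled: this is -8, i.e. -(1 << 3)

for x in (0, 1, 37, 0x12345678):
    a = (x * trimmed) & I32
    b = (x * filled) & I32
    # Multiplication only propagates carries upward, so the demanded low
    # bits agree and the backend may use either constant...
    assert (a & DEMANDED) == (b & DEMANDED)
    # ...and the filled form is a recognizable shift:
    # x * -8 == -(x << 3) (mod 2**32).
    assert b == (-(x << 3)) & I32
```

The trimmed constant 0xF8 hides the neg-power-of-2 structure, which is exactly why the shift fold stopped firing before this fix.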
I spotted that rG995d400f3a3c ([InstCombine] reduce mul operands based on undemanded high bits) caused some regressions: code that previously resulted in shifts being selected now generates a multiply instead. I got as far as bisecting to that commit, but not as far as looking in any detail at mitigations, so I'm creating this bug so others don't duplicate effort.
I don't think there's any suggestion the modification is wrong; this kind of issue is quite common with these demanded-bits optimisations.
Take this example from fn2_4 in 20040629-1.c from the GCC torture suite. (The clang-generated .ll and the `opt -O2` results after 995d400 and before that commit were not captured here.)

From a quick look, this results in mul instructions for at least i686/x86_64, RISC-V, and AArch64 when they weren't present before.
CC @spatel-gh @topperc