Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Missing fold for (1 + 2 * a * b > 0) ? a : a * b --> a * b < 0 ? a * b : a #61538

Open
k-arrows opened this issue Mar 20, 2023 · 13 comments

Comments

@k-arrows
Copy link

Test code:

int foo(int a, int b)
{
    return (1 + 2 * a * b > 0) ? a : a * b ;
}

clang

foo(int, int):                               // @foo(int, int)
        mul     w8, w0, w1
        mov     w9, #1                          // =0x1
        orr     w8, w9, w8, lsl #1
        cmp     w8, #1
        csinc   w8, w1, wzr, lt
        mul     w0, w8, w0
        ret

gcc

foo(int, int):
        mul     w1, w0, w1
        cmp     w1, 0
        csel    w0, w1, w0, lt
        ret

https://godbolt.org/z/b79ehqG77

@llvmbot
Copy link
Collaborator

llvmbot commented Mar 20, 2023

@llvm/issue-subscribers-backend-aarch64

@junaire
Copy link
Member

junaire commented Apr 5, 2023

It's not applied to the AArch64 target but to other targets as well. I believe it's a missing fold in the middle end.
See https://alive2.llvm.org/ce/z/P-FXNQ

@junaire
Copy link
Member

junaire commented Apr 5, 2023

https://godbolt.org/z/dPh8P14q1

@junaire junaire changed the title [AArch64] optimization of "(1 + 2 * a * b > 0) ? a : a * b" [InstCombine] Missing fold for (1 + 2 * a * b > 0) ? a : a * b --> a * b < 0 ? a * b : a Apr 5, 2023
@k-arrows
Copy link
Author

k-arrows commented Apr 5, 2023

It's not applied to the AArch64 target but to other targets as well. I believe it's a missing fold in the middle end. See https://alive2.llvm.org/ce/z/P-FXNQ

Thank you for your comment. I'm sorry but the issue title was misleading.

@junaire
Copy link
Member

junaire commented Apr 5, 2023

Even these:

Positive * a * b + 1 > 0 ? a : a * b --> a * b < 0 : a * b ? a
Negative * a * b + 1 > 0 ? a : a * b --> a * b > 0 : a * b ? a

https://alive2.llvm.org/ce/z/9rD0BK

@junaire
Copy link
Member

junaire commented Apr 5, 2023

It's not applied to the AArch64 target but to other targets as well. I believe it's a missing fold in the middle end. See https://alive2.llvm.org/ce/z/P-FXNQ

Thank you for your comment. I'm sorry but the issue title was misleading.

Feel free to correct it!

junaire added a commit that referenced this issue Apr 13, 2023
We can eliminate the or operation based on the predicate and the
relation between OrC and C.

sge: X | OrC s>= C --> X s>= 0 iff OrC s>= C s>= 0
sgt: X | OrC s>  C --> X s>= 0 iff OrC s>  C s>= 0
sle: X | OrC s<= C --> X s<  0 iff OrC s>  C s>= 0
slt: X | OrC s<  C --> X s<  0 iff OrC s>= C s>= 0

Alive2 links:
sge: https://alive2.llvm.org/ce/z/W-6FHE
sgt: https://alive2.llvm.org/ce/z/TKK2yJ
sle: https://alive2.llvm.org/ce/z/vURQGM
slt: https://alive2.llvm.org/ce/z/JAsVfw

Related issue: #61538

Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D147597
@nikic
Copy link
Contributor

nikic commented Apr 13, 2023

Can you please confirm whether this report was based on a real-world motivation? I want to understand whether we should accept a non-principled fix for this or not.

@junaire
Copy link
Member

junaire commented Apr 13, 2023

Can you please confirm whether this report was based on a real-world motivation? I want to understand whether we should accept a non-principled fix for this or not.

Sorry, I don't understand. What do you mean by "based on a real-world motivation"? I'm not the original bug reporter so I don't know if this is based on some real codebase or just random fuzzing. However, IMHO it's an optimizing compiler's job to optimize the code as long as 1. the transform is beneficial 2. it doesn't cause noticeable compile time regression.

I'll be surprised to see clang can't optimize this trivial example: https://godbolt.org/z/bvPhno5xT

@k-arrows
Copy link
Author

I'm the original reporter. The conditional expression considered in this issue (I mean "1 + 2 * a * b > 0" ) was originally inspired by the following derivative formula. Such calculations are necessary when considering some kind of differential equation.

$$\{(1+\alpha u)u\}'=(1+2\alpha u)u'$$

If limited to integers, we can fold as indicated in the title of this issue.

However, IMHO it's an optimizing compiler's job to optimize the code as long as 1. the transform is beneficial 2. it doesn't cause noticeable compile time regression.

I agree with junaire. I believe that compilers are based on steady improvement. I would like to express my sincere respect for junaire's efforts (creating patches and responding to reviews).

@nikic
Copy link
Contributor

nikic commented Apr 13, 2023

My question here is whether there is production code that will benefit from this optimization. If yes, I am much more willing to accept a hacky solution to the problem. If no, then I will insist on a principled solution.

@junaire
Copy link
Member

junaire commented Apr 13, 2023

My question here is whether there is production code that will benefit from this optimization. If yes, I am much more willing to accept a hacky solution to the problem. If no, then I will insist on a principled solution.

What do you mean by " hacky solution to the problem"? I guess there may be some misunderstanding here.
FWIW, I don't intend to add some instcombine matches mentioned in earlier comments:

Positive * a * b + 1 > 0 ? a : a * b --> a * b < 0 : a * b ? a
Negative * a * b + 1 > 0 ? a : a * b --> a * b > 0 : a * b ? a

That's too arbitrary and I want some principled solutions as well. For example, I think we at least can improve the following cases:

  1. icmp(X | OrC, C) --> icmp(X, 0) just landed in e3175f7
  2. icmp(Factor * X, * Y, 0) --> icmp(X* Y, 0) I created https://reviews.llvm.org/D148210 but that may be a wrong approach as you commented earlier.

After all of these low-hanging fruits, I can still see some unnecessary computation but that's another story.
Or do you mean the two improvements are also hacky? 😨

junaire added a commit to junaire/llvm-project that referenced this issue Apr 26, 2023
The existing logic in `foldICmpMulConstant` can only optimize the
constant if we have (X * Y) * C. However, reassociate pass currently
produces this form with the constant on the inner multiply, so let's
reassociate it in foldICmpMulConstant to enable further optimizations.

Related issue: llvm#61538

Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D148210
flemairen6 pushed a commit to Xilinx/llvm-project that referenced this issue May 10, 2023
We can eliminate the or operation based on the predicate and the
relation between OrC and C.

sge: X | OrC s>= C --> X s>= 0 iff OrC s>= C s>= 0
sgt: X | OrC s>  C --> X s>= 0 iff OrC s>  C s>= 0
sle: X | OrC s<= C --> X s<  0 iff OrC s>  C s>= 0
slt: X | OrC s<  C --> X s<  0 iff OrC s>= C s>= 0

Alive2 links:
sge: https://alive2.llvm.org/ce/z/W-6FHE
sgt: https://alive2.llvm.org/ce/z/TKK2yJ
sle: https://alive2.llvm.org/ce/z/vURQGM
slt: https://alive2.llvm.org/ce/z/JAsVfw

Related issue: llvm#61538

Signed-off-by: Jun Zhang <jun@junz.org>

Differential Revision: https://reviews.llvm.org/D147597
@hamza-spl
Copy link

hamza-spl commented Jun 5, 2023

Hi,
I'm interested in what @junaire did here https://reviews.llvm.org/D148210, I had implemented something similar to this, my approach was to add the logic in foldICmpWithZero() as I supposed the optimization only makes sens when comparing to 0 , so we need to completely ignore the unuseful multiplication by the constant in this case, otherwise I can't identify what kind of optimizations could this transformation trigger in more general cases, do you have any idea ?
@junaire could you please let me know if you still working on this ?
Thanks.

@pinskia
Copy link

pinskia commented Dec 25, 2023

Note the missed optimization in this case is something more like:

int foo1(int a, int b)
{
    return (2 * a * b < 0) ? a : a * b ;
}

Looks like LLVM is converting it to:

int foo2(int a, int b)
{
  int c = a*b;
  int d = (2 * c < 0) ? 1 : b ;
 return d * a;
}

Note GCC does not use csel for the ?: until late(r) in the pipeline and expands it into if/then/else after the front-end.

Also 2* is not able to remove by LLVM for some reason, it was converted to <<1 looks like too early too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants