
[InstCombine] optimize powi(X,Y) * X with Ofast #69998

Merged: 3 commits into llvm:main on Mar 14, 2024

Conversation

vfdff
Contributor

@vfdff vfdff commented Oct 24, 2023

Try to transform powi(X, Y) * X into powi(X, Y+1) with Ofast.

For example, when Y is 3, the resulting powi(X, 4) is then expanded in a later step into X2 = X * X; X2 * X2.
Similar to D109954, which requires reassoc.

Fixes #69862.
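
For illustration (a hypothetical source-level example, not taken from the PR), this is the kind of code the pattern comes from when built with -Ofast:

double cube_times_x(double x) {
  // With -Ofast the fmul carries the reassoc flag, so the multiply by x can
  // be folded into the powi call, giving __builtin_powi(x, 4); later
  // expansion rewrites that as x2 = x * x; x2 * x2 (two multiplies).
  return __builtin_powi(x, 3) * x;
}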

@llvmbot
Collaborator

llvmbot commented Oct 24, 2023

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Allen (vfdff)


Full diff: https://github.com/llvm/llvm-project/pull/69998.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp (+12)
  • (modified) llvm/test/Transforms/InstCombine/powi.ll (+49)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index bc784390c23be49..d3b07113ed7a183 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -716,6 +716,18 @@ Instruction *InstCombinerImpl::visitFMul(BinaryOperator &I) {
       return replaceInstUsesWith(I, Pow);
     }
 
+    // powi(X, Y) * X --> powi(X, Y+1)
+    // X * powi(X, Y) --> powi(X, Y+1)
+    if (match(&I, m_c_FMul(m_OneUse(m_Intrinsic<Intrinsic::powi>(m_Value(X),
+                                                                 m_Value(Y))),
+                           m_Deferred(X))) &&
+        willNotOverflowSignedAdd(Y, ConstantInt::get(Y->getType(), 1), I)) {
+      auto *Y1 = Builder.CreateAdd(Y, ConstantInt::get(Y->getType(), 1));
+      auto *NewPow = Builder.CreateIntrinsic(
+          Intrinsic::powi, {X->getType(), Y1->getType()}, {X, Y1}, &I);
+      return replaceInstUsesWith(I, NewPow);
+    }
+
     if (I.isOnlyUserOfAnyOperand()) {
       // pow(X, Y) * pow(X, Z) -> pow(X, Y + Z)
       if (match(Op0, m_Intrinsic<Intrinsic::pow>(m_Value(X), m_Value(Y))) &&
diff --git a/llvm/test/Transforms/InstCombine/powi.ll b/llvm/test/Transforms/InstCombine/powi.ll
index 89efbb6f4536113..95722d09a17ad32 100644
--- a/llvm/test/Transforms/InstCombine/powi.ll
+++ b/llvm/test/Transforms/InstCombine/powi.ll
@@ -341,3 +341,52 @@ define double @fdiv_pow_powi_negative_variable(double %x, i32 %y) {
   %div = fdiv reassoc nnan double %p1, %x
   ret double %div
 }
+
+; powi(X, Y) * X --> powi(X, Y+1)
+define double @powi_fmul_powi_x(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x(
+; CHECK-NEXT:    [[MUL:%.*]] = call reassoc double @llvm.powi.f64.i32(double [[X:%.*]], i32 4)
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+  %mul = fmul reassoc double %p1, %x
+  ret double %mul
+}
+
+; Negative test: Multi-use
+define double @powi_fmul_powi_x_multi_use(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_multi_use(
+; CHECK-NEXT:    [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 3)
+; CHECK-NEXT:    tail call void @use(double [[P1]])
+; CHECK-NEXT:    [[MUL:%.*]] = fmul reassoc double [[P1]], [[X]]
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+  tail call void @use(double %p1)
+  %mul = fmul reassoc double %p1, %x
+  ret double %mul
+}
+
+; Negative test: Miss fmf flag
+define double @powi_fmul_powi_x_missing_reassoc(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_missing_reassoc(
+; CHECK-NEXT:    [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 3)
+; CHECK-NEXT:    [[MUL:%.*]] = fmul double [[P1]], [[X]]
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+  %mul = fmul double %p1, %x
+  ret double %mul
+}
+
+; Negative test: overflow
+define double @powi_fmul_powi_x_overflow(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_overflow(
+; CHECK-NEXT:    [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 2147483647)
+; CHECK-NEXT:    [[MUL:%.*]] = fmul reassoc double [[P1]], [[X]]
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 2147483647) ; INT_MAX
+  %mul = fmul reassoc double %p1, %x
+  ret double %mul
+}

Contributor

@nikic nikic left a comment


We have three variants of this optimization now: powi(x,a)*powi(x,b) => powi(x,a+b), powi(x,a)/x => powi(x,a-1), and powi(x,a)*x => powi(x,a+1).

I'd suggest moving most of the transform into a helper function that accepts (x, a, b), (x, a, -1), and (x, a, 1) in the above cases and then does the common checks and the transform.

This will also fix the bug in the current powi * powi transform, which fails to perform the necessary overflow check.
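
For example, a helper along these lines (a minimal sketch; the later revision of this PR names it createPowiExpr, but the exact body here is a guess):

static Value *createPowiExpr(BinaryOperator &I, InstCombinerImpl &IC, Value *X,
                             Value *Y, Value *Z) {
  // Build powi(X, Y + Z), taking the fast-math flags from I. Callers are
  // expected to have already done their variant-specific checks (reassoc,
  // one-use, willNotOverflowSignedAdd on Y + Z).
  Value *NewExp = IC.Builder.CreateAdd(Y, Z);
  return IC.Builder.CreateIntrinsic(Intrinsic::powi,
                                    {X->getType(), NewExp->getType()},
                                    {X, NewExp}, &I);
}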

@arsenm
Contributor

arsenm commented Oct 24, 2023

The ARM case already does end up with this optimization? How did that happen?

@vfdff
Contributor Author

vfdff commented Oct 24, 2023

The three variants of this optimization have many differences apart from the pattern-match form, and it seems the only common check is hasAllowReassoc() on the instruction I:
1. powi(X, Y) / X --> powi(X, Y-1) needs hasAllowReassoc() and hasNoNaNs() on both I and powi(X, Y); handled in visitFDiv (a rough sketch of this variant follows below).
2. powi(X, Y) * X --> powi(X, Y+1) needs hasAllowReassoc() on both I and powi(X, Y); handled in visitFMul.
3. powi(X, Y) * powi(X, Z) --> powi(X, Y+Z) needs hasAllowReassoc() on all of I, powi(X, Y), and powi(X, Z); handled in visitFMul.
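
For reference, variant 1 is roughly the shape of the existing visitFDiv transform from D109954 (a simplified sketch, not necessarily the exact in-tree code; Op0/Op1 are the fdiv operands):

// powi(X, Y) / X --> powi(X, Y - 1)
// Needs reassoc + nnan on the fdiv, plus proof that Y - 1 cannot wrap in the
// exponent type. Note: per the discussion above, the powi operand itself
// should also carry these flags; that is what the later revision adds via
// m_AllowReassoc.
Value *Y;
if (I.hasAllowReassoc() && I.hasNoNaNs() &&
    match(Op0, m_OneUse(m_Intrinsic<Intrinsic::powi>(m_Specific(Op1),
                                                     m_Value(Y)))) &&
    willNotOverflowSignedSub(Y, ConstantInt::get(Y->getType(), 1), I)) {
  Value *NewY = Builder.CreateSub(Y, ConstantInt::get(Y->getType(), 1));
  Value *Pow = Builder.CreateIntrinsic(
      Intrinsic::powi, {Op1->getType(), NewY->getType()}, {Op1, NewY}, &I);
  return replaceInstUsesWith(I, Pow);
}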

@vfdff
Contributor Author

vfdff commented Oct 30, 2023

Hi @arsenm, @nikic: I have another idea; I'm not sure whether it's appropriate.

  • According to the discussion above, many other optimizations have a similar problem, because they would also need to recursively check the fast-math flags of the operands involved. Yet the problem has not surfaced for a long time. Is that because these flags are set at the granularity of at least a whole function, so IR operations within the same function are expected to be consistent, and that is why the problem has gone unnoticed for so long?
  • If so, then we can just keep things as they are now.

@jcranmer-intel
Contributor

Fast-math flags tend to be applied on a per-project basis via command-line (i.e., per-IR module) flags, so it tends to be very rare for the flags to differ within a function, unless you're doing LTO, which is why it's unlikely to crop up very frequently.

@vfdff
Contributor Author

vfdff commented Mar 2, 2024

Thanks, I added the restriction on the operands of the fmul.


github-actions bot commented Mar 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 594 to 597
auto *Powi = dyn_cast<IntrinsicInst>(I.getOperand(0));
if (!Powi)
  Powi = cast<IntrinsicInst>(I.getOperand(1));
if (Powi->hasAllowReassoc())
Contributor


We should really have a way to put this flag check inside the matcher

@@ -571,6 +571,50 @@ Instruction *InstCombinerImpl::foldFPSignBitOps(BinaryOperator &I) {
  return nullptr;
}

Instruction *InstCombinerImpl::foldPowiReassoc(BinaryOperator &I) {
  Value *X, *Y, *Z;
Contributor


Sink these down to where they are used; this looks like shadowing now.

tail call void @use(double %p1)
%p2 = tail call double @llvm.powi.f64.i32(double %x, i32 %y)
%p2 = tail call reassoc double @llvm.powi.f64.i32(double %x, i32 %y)
Contributor


If we don't have a test where the two powi declarations have different integer types, we should add one.

m_Deferred(X))) &&
willNotOverflowSignedAdd(Y, ConstantInt::get(Y->getType(), 1), I))
return createPowiExpr(I, *this, X, Y, ConstantInt::get(Y->getType(), 1));
if (match(&I, m_c_FMul(m_AllowReassoc(m_OneUse(m_Intrinsic<Intrinsic::powi>(
Contributor


The m_AllowReassoc should be paired with the m_Intrinsic part, not placed outside the m_OneUse.
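
In other words, something like this ordering (sketch):

// m_AllowReassoc wraps the powi intrinsic itself; m_OneUse wraps the result.
match(&I, m_c_FMul(m_OneUse(m_AllowReassoc(m_Intrinsic<Intrinsic::powi>(
                       m_Value(X), m_Value(Y)))),
                   m_Deferred(X)))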

};

template <typename T>
inline AllowReassoc_match<T> m_AllowReassoc(const T &SubPattern) {
Contributor


This isn't quite the API I had in mind. I envisioned the required flags as a template parameter to the existing matchers, so you could have something like:
m_FMul<Reassoc>(), m_Intrinsic<powi, Reassoc>

However, I wasn't expecting you to address this in this patch. This is fine for now.
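
For context, the matcher added by this patch has roughly this shape (a simplified sketch of the new PatternMatch.h code, not a verbatim copy):

// Matches SubPattern only when the value is an FP operation that carries the
// reassoc fast-math flag.
template <typename SubPattern_t> struct AllowReassoc_match {
  SubPattern_t SubPattern;

  AllowReassoc_match(const SubPattern_t &SP) : SubPattern(SP) {}

  template <typename OpTy> bool match(OpTy *V) {
    auto *I = dyn_cast<FPMathOperator>(V);
    return I && I->hasAllowReassoc() && SubPattern.match(V);
  }
};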

Try to transform powi(X, Y) * X into powi(X, Y+1) with Ofast

For example, when Y is 3, the resulting powi(X, 4) is then expanded in a
later step into X2 = X * X; X2 * X2.
Similar to D109954, which requires reassoc.

Fixes llvm#69862.

According to the discussion, not only the fmul itself but all of its operands
should also carry the reassoc flag.

Add a new API, m_AllowReassoc, to check the reassoc flag.
@vfdff vfdff merged commit 2d6988a into llvm:main Mar 14, 2024
2 of 4 checks passed
Development

Successfully merging this pull request may close these issues.

Optimization for powi(x, y) * x
5 participants