
[InstCombine] optimize powi(X,Y) * X with Ofast #69998

Merged: 3 commits into llvm:main on Mar 14, 2024

Conversation

vfdff
Contributor

@vfdff vfdff commented Oct 24, 2023

Try to transform powi(X, Y) * X into powi(X, Y+1) with Ofast.

For example, when Y is 3, the resulting powi(X, 4) is then expanded in a later step into X2 = X * X; X2 * X2.
Similar to D109954, which requires reassoc.

Fixes #69862.
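
For illustration (a hypothetical source-level example, not taken from the PR), this is the kind of code the pattern comes from when built with -Ofast:

double cube_times_x(double x) {
  // With -Ofast the fmul carries the reassoc flag, so the multiply by x can
  // be folded into the powi call, giving __builtin_powi(x, 4); later
  // expansion rewrites that as x2 = x * x; x2 * x2 (two multiplies).
  return __builtin_powi(x, 3) * x;
}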

@llvmbot
Collaborator

llvmbot commented Oct 24, 2023

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-transforms

Author: Allen (vfdff)


Full diff: https://github.com/llvm/llvm-project/pull/69998.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp (+12)
  • (modified) llvm/test/Transforms/InstCombine/powi.ll (+49)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
index bc784390c23be49..d3b07113ed7a183 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
@@ -716,6 +716,18 @@ Instruction *InstCombinerImpl::visitFMul(BinaryOperator &I) {
       return replaceInstUsesWith(I, Pow);
     }
 
+    // powi(X, Y) * X --> powi(X, Y+1)
+    // X * powi(X, Y) --> powi(X, Y+1)
+    if (match(&I, m_c_FMul(m_OneUse(m_Intrinsic<Intrinsic::powi>(m_Value(X),
+                                                                 m_Value(Y))),
+                           m_Deferred(X))) &&
+        willNotOverflowSignedAdd(Y, ConstantInt::get(Y->getType(), 1), I)) {
+      auto *Y1 = Builder.CreateAdd(Y, ConstantInt::get(Y->getType(), 1));
+      auto *NewPow = Builder.CreateIntrinsic(
+          Intrinsic::powi, {X->getType(), Y1->getType()}, {X, Y1}, &I);
+      return replaceInstUsesWith(I, NewPow);
+    }
+
     if (I.isOnlyUserOfAnyOperand()) {
       // pow(X, Y) * pow(X, Z) -> pow(X, Y + Z)
       if (match(Op0, m_Intrinsic<Intrinsic::pow>(m_Value(X), m_Value(Y))) &&
diff --git a/llvm/test/Transforms/InstCombine/powi.ll b/llvm/test/Transforms/InstCombine/powi.ll
index 89efbb6f4536113..95722d09a17ad32 100644
--- a/llvm/test/Transforms/InstCombine/powi.ll
+++ b/llvm/test/Transforms/InstCombine/powi.ll
@@ -341,3 +341,52 @@ define double @fdiv_pow_powi_negative_variable(double %x, i32 %y) {
   %div = fdiv reassoc nnan double %p1, %x
   ret double %div
 }
+
+; powi(X, Y) * X --> powi(X, Y+1)
+define double @powi_fmul_powi_x(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x(
+; CHECK-NEXT:    [[MUL:%.*]] = call reassoc double @llvm.powi.f64.i32(double [[X:%.*]], i32 4)
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+  %mul = fmul reassoc double %p1, %x
+  ret double %mul
+}
+
+; Negative test: Multi-use
+define double @powi_fmul_powi_x_multi_use(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_multi_use(
+; CHECK-NEXT:    [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 3)
+; CHECK-NEXT:    tail call void @use(double [[P1]])
+; CHECK-NEXT:    [[MUL:%.*]] = fmul reassoc double [[P1]], [[X]]
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+  tail call void @use(double %p1)
+  %mul = fmul reassoc double %p1, %x
+  ret double %mul
+}
+
+; Negative test: Miss fmf flag
+define double @powi_fmul_powi_x_missing_reassoc(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_missing_reassoc(
+; CHECK-NEXT:    [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 3)
+; CHECK-NEXT:    [[MUL:%.*]] = fmul double [[P1]], [[X]]
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 3)
+  %mul = fmul double %p1, %x
+  ret double %mul
+}
+
+; Negative test: overflow
+define double @powi_fmul_powi_x_overflow(double noundef %x) {
+; CHECK-LABEL: @powi_fmul_powi_x_overflow(
+; CHECK-NEXT:    [[P1:%.*]] = tail call double @llvm.powi.f64.i32(double [[X:%.*]], i32 2147483647)
+; CHECK-NEXT:    [[MUL:%.*]] = fmul reassoc double [[P1]], [[X]]
+; CHECK-NEXT:    ret double [[MUL]]
+;
+  %p1 = tail call double @llvm.powi.f64.i32(double %x, i32 2147483647) ; INT_MAX
+  %mul = fmul reassoc double %p1, %x
+  ret double %mul
+}

Contributor

@nikic nikic left a comment


We have three variants of this optimization now: powi(x,a)*powi(x,b) => powi(x,a+b), powi(x,a)/x => powi(x,a-1), and powi(x,a)*x => powi(x,a+1).

I'd suggest moving most of the transform into a helper function that accepts (x, a, b), (x, a, -1), and (x, a, 1) in the above cases and then does the common checks and the transform.

This will also fix the bug in the current powi * powi transform, which fails to perform the necessary overflow check.
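
For example, a helper along these lines (a minimal sketch; the later revision of this PR names it createPowiExpr, but the exact body here is a guess):

static Value *createPowiExpr(BinaryOperator &I, InstCombinerImpl &IC, Value *X,
                             Value *Y, Value *Z) {
  // Build powi(X, Y + Z), taking the fast-math flags from I. Callers are
  // expected to have already done their variant-specific checks (reassoc,
  // one-use, willNotOverflowSignedAdd on Y + Z).
  Value *NewExp = IC.Builder.CreateAdd(Y, Z);
  return IC.Builder.CreateIntrinsic(Intrinsic::powi,
                                    {X->getType(), NewExp->getType()},
                                    {X, NewExp}, &I);
}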

@arsenm
Contributor

arsenm commented Oct 24, 2023

The ARM case already does end up with this optimization? How did that happen?

@vfdff
Contributor Author

vfdff commented Oct 24, 2023

The three variants of this optimization have many differences apart from the pattern-match form, and it seems the only common check is hasAllowReassoc() on the instruction I:
1. powi(X, Y) / X --> powi(X, Y-1) needs hasAllowReassoc() and hasNoNaNs() on both I and powi(X, Y); handled in visitFDiv (a rough sketch of this variant follows below).
2. powi(X, Y) * X --> powi(X, Y+1) needs hasAllowReassoc() on both I and powi(X, Y); handled in visitFMul.
3. powi(X, Y) * powi(X, Z) --> powi(X, Y+Z) needs hasAllowReassoc() on all of I, powi(X, Y), and powi(X, Z); handled in visitFMul.
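
For reference, variant 1 is roughly the shape of the existing visitFDiv transform from D109954 (a simplified sketch, not necessarily the exact in-tree code; Op0/Op1 are the fdiv operands):

// powi(X, Y) / X --> powi(X, Y - 1)
// Needs reassoc + nnan on the fdiv, plus proof that Y - 1 cannot wrap in the
// exponent type. Note: per the discussion above, the powi operand itself
// should also carry these flags; that is what the later revision adds via
// m_AllowReassoc.
Value *Y;
if (I.hasAllowReassoc() && I.hasNoNaNs() &&
    match(Op0, m_OneUse(m_Intrinsic<Intrinsic::powi>(m_Specific(Op1),
                                                     m_Value(Y)))) &&
    willNotOverflowSignedSub(Y, ConstantInt::get(Y->getType(), 1), I)) {
  Value *NewY = Builder.CreateSub(Y, ConstantInt::get(Y->getType(), 1));
  Value *Pow = Builder.CreateIntrinsic(
      Intrinsic::powi, {Op1->getType(), NewY->getType()}, {Op1, NewY}, &I);
  return replaceInstUsesWith(I, Pow);
}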

@vfdff
Contributor Author

vfdff commented Oct 30, 2023

Hi @arsenm, @nikic: I have another idea; I'm not sure whether it's appropriate.

  • According to the discussion above, many other optimizations have a similar problem, because they would also need to recursively check the fast-math flags of the operands involved. Yet the problem has not surfaced for a long time. Is that because these flags are set at the granularity of at least a whole function, so IR operations within the same function are expected to be consistent, and that is why the problem has gone unnoticed for so long?
  • If so, then we can just keep things as they are now.

@jcranmer-intel
Contributor

Fast-math flags tend to be applied on a per-project basis via command-line (i.e., per-IR module) flags, so it tends to be very rare for the flags to differ within a function, unless you're doing LTO, which is why it's unlikely to crop up very frequently.

@vfdff
Contributor Author

vfdff commented Mar 2, 2024

Thanks, I added the restriction on the operands of the fmul.


github-actions bot commented Mar 4, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 594 to 597
auto *Powi = dyn_cast<IntrinsicInst>(I.getOperand(0));
if (!Powi)
  Powi = cast<IntrinsicInst>(I.getOperand(1));
if (Powi->hasAllowReassoc())
Contributor


We should really have a way to put this flag check inside the matcher

@@ -571,6 +571,50 @@ Instruction *InstCombinerImpl::foldFPSignBitOps(BinaryOperator &I) {
  return nullptr;
}

Instruction *InstCombinerImpl::foldPowiReassoc(BinaryOperator &I) {
  Value *X, *Y, *Z;
Contributor


Sink these down to where they are used; this looks like shadowing now.

tail call void @use(double %p1)
%p2 = tail call double @llvm.powi.f64.i32(double %x, i32 %y)
%p2 = tail call reassoc double @llvm.powi.f64.i32(double %x, i32 %y)
Contributor


If we don't have a test where the two powi declarations have different integer types, we should add one.

m_Deferred(X))) &&
willNotOverflowSignedAdd(Y, ConstantInt::get(Y->getType(), 1), I))
return createPowiExpr(I, *this, X, Y, ConstantInt::get(Y->getType(), 1));
if (match(&I, m_c_FMul(m_AllowReassoc(m_OneUse(m_Intrinsic<Intrinsic::powi>(
Contributor


The m_AllowReassoc should be paired with the m_Intrinsic part, not placed outside the m_OneUse.
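
In other words, something like this ordering (sketch):

// m_AllowReassoc wraps the powi intrinsic itself; m_OneUse wraps the result.
match(&I, m_c_FMul(m_OneUse(m_AllowReassoc(m_Intrinsic<Intrinsic::powi>(
                       m_Value(X), m_Value(Y)))),
                   m_Deferred(X)))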

};

template <typename T>
inline AllowReassoc_match<T> m_AllowReassoc(const T &SubPattern) {
Contributor


This isn't quite the API I had in mind. I envisioned the required flags as a template parameter to the existing matchers, so you could have something like:
m_FMul<Reassoc>(), m_Intrinsic<powi, Reassoc>

However, I wasn't expecting you to address this in this patch. This is fine for now.
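
For context, the matcher added by this patch has roughly this shape (a simplified sketch of the new PatternMatch.h code, not a verbatim copy):

// Matches SubPattern only when the value is an FP operation that carries the
// reassoc fast-math flag.
template <typename SubPattern_t> struct AllowReassoc_match {
  SubPattern_t SubPattern;

  AllowReassoc_match(const SubPattern_t &SP) : SubPattern(SP) {}

  template <typename OpTy> bool match(OpTy *V) {
    auto *I = dyn_cast<FPMathOperator>(V);
    return I && I->hasAllowReassoc() && SubPattern.match(V);
  }
};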

Try to transform powi(X, Y) * X into powi(X, Y+1) with Ofast

For example, when Y is 3, the resulting powi(X, 4) is then expanded in a
later step into X2 = X * X; X2 * X2.
Similar to D109954, which requires reassoc.

Fixes llvm#69862.

According to the discussion, not only the fmul itself but all of its operands
should also carry the reassoc flag.

Add a new API, m_AllowReassoc, to check the reassoc flag.
@vfdff vfdff merged commit 2d6988a into llvm:main Mar 14, 2024
2 of 4 checks passed
Development

Successfully merging this pull request may close these issues.

Optimization for powi(x, y) * x
5 participants