Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Fold ctpop(X) eq/ne 1 if X is non-zero #67268

Closed
wants to merge 3 commits into from

Conversation

dtcxzyw
Copy link
Member

@dtcxzyw dtcxzyw commented Sep 24, 2023

This patch does the following folds if we know X is non-zero:
ctpop(X) == 1 -> ctpop(X) <u 2
ctpop(X) != 1 -> ctpop(X) >u 1
The latter forms give better codegen than the former: https://godbolt.org/z/5beeq8fGz
Alive2: https://alive2.llvm.org/ce/z/GQRQ5T
Fixes #57328.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 24, 2023

@llvm/pr-subscribers-llvm-transforms

Changes

This patch folds pattern ctpop(X) &lt;u 2 into ctpop(X) == 1 if we know X is non-zero.
Fixes #57328.


Full diff: https://github.com/llvm/llvm-project/pull/67268.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+8-1)
  • (modified) llvm/test/Transforms/InstCombine/ispow2.ll (+1-1)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index a219dac7acfbe16..9aafd83d42d0756 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -3412,6 +3412,14 @@ static Instruction *foldCtpopPow2Test(ICmpInst &I, IntrinsicInst *CtpopLhs,
                                       const SimplifyQuery &Q) {
   assert(CtpopLhs->getIntrinsicID() == Intrinsic::ctpop &&
          "Non-ctpop intrin in ctpop fold");
+
+  const ICmpInst::Predicate Pred = I.getPredicate();
+  // If we know X is non-zero, we can fold isPow2OrZero into isPow2.
+  if (Pred == ICmpInst::ICMP_ULT && CRhs == 2 &&
+      isKnownNonZero(CtpopLhs, Q.DL, /*Depth*/ 0, Q.AC, Q.CxtI, Q.DT))
+    return ICmpInst::Create(Instruction::ICmp, ICmpInst::ICMP_EQ, CtpopLhs,
+                            ConstantInt::get(CtpopLhs->getType(), 1));
+
   if (!CtpopLhs->hasOneUse())
     return nullptr;
 
@@ -3423,7 +3431,6 @@ static Instruction *foldCtpopPow2Test(ICmpInst &I, IntrinsicInst *CtpopLhs,
   // If we know any bit of X can be folded to:
   //    IsPow2       : X & (~Bit) == 0
   //    NotPow2      : X & (~Bit) != 0
-  const ICmpInst::Predicate Pred = I.getPredicate();
   if (((I.isEquality() || Pred == ICmpInst::ICMP_UGT) && CRhs == 1) ||
       (Pred == ICmpInst::ICMP_ULT && CRhs == 2)) {
     Value *Op = CtpopLhs->getArgOperand(0);
diff --git a/llvm/test/Transforms/InstCombine/ispow2.ll b/llvm/test/Transforms/InstCombine/ispow2.ll
index bbd693b11b388ad..60eb522a144927f 100644
--- a/llvm/test/Transforms/InstCombine/ispow2.ll
+++ b/llvm/test/Transforms/InstCombine/ispow2.ll
@@ -198,7 +198,7 @@ define i1 @is_pow2_non_zero(i32 %x) {
 ; CHECK-NEXT:    [[NOTZERO:%.*]] = icmp ne i32 [[X:%.*]], 0
 ; CHECK-NEXT:    call void @llvm.assume(i1 [[NOTZERO]])
 ; CHECK-NEXT:    [[T0:%.*]] = tail call i32 @llvm.ctpop.i32(i32 [[X]]), !range [[RNG0]]
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[T0]], 2
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[T0]], 1
 ; CHECK-NEXT:    ret i1 [[CMP]]
 ;
   %notzero = icmp ne i32 %x, 0

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctpop < 2 is better than ctpop == 1 for the backend, and we will no be able to recover from this transform in the backend if it comes from an assume.

@goldsteinn
Copy link
Contributor

Maybe leave a comment explaining as much though. But nikic is right, we intentionally don't do this. In the backend if we still have non-zero info we optimize this properly.

@dtcxzyw dtcxzyw changed the title [InstCombine] Fold ctpop(X) <u 2 into ctpop(X) == 1 if X is non-zero [InstCombine] Fold ctpop(X) == 1 into ctpop(X) <u 2 if X is non-zero Sep 24, 2023
@dtcxzyw dtcxzyw changed the title [InstCombine] Fold ctpop(X) == 1 into ctpop(X) <u 2 if X is non-zero [InstCombine] Fold ctpop(X) eq/ne 1 if X is non-zero Sep 24, 2023
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fold goes against the usual direction in IR, and in particular conflicts with this generic fold:

// A <u C -> A == C-1 if min(A)+1 == C
if (*CmpC == Op0Min + 1)
return new ICmpInst(ICmpInst::ICMP_EQ, Op0,
ConstantInt::get(Op1->getType(), *CmpC - 1));
This is risky and can easily lead to infinite loops. I think we won't get one right now because KnownBits for ctpop is a bit weak (
case Intrinsic::ctpop: {
), but if it was later strengthened and e.g. inferred KnownBits ...0?1 then we could run into an infinite compile loop at that point.

@dtcxzyw
Copy link
Member Author

dtcxzyw commented Sep 26, 2023

This fold goes against the usual direction in IR, and in particular conflicts with this generic fold:

// A <u C -> A == C-1 if min(A)+1 == C
if (*CmpC == Op0Min + 1)
return new ICmpInst(ICmpInst::ICMP_EQ, Op0,
ConstantInt::get(Op1->getType(), *CmpC - 1));

This is risky and can easily lead to infinite loops. I think we won't get one right now because KnownBits for ctpop is a bit weak (

case Intrinsic::ctpop: {

), but if it was later strengthened and e.g. inferred KnownBits ...0?1 then we could run into an infinite compile loop at that point.

Can we canonicalize ctpop(X) <u 2 into ctpop(X) == 1 in InstCombine and do inverse transform in CodeGenPrepare?

@nikic
Copy link
Contributor

nikic commented Sep 26, 2023

In theory yes, but in practice CGP drops assumes at the very start, and I don't want to have a special ctpop fold run before that. I'd only take that if we stop dropping assumes in CGP (which is on the long-term roadmap, but tricky).

@dtcxzyw dtcxzyw closed this Oct 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missed optimization / canonicalization for testing if an integer is a power of 2
4 participants