[InstCombine] Fold `ctpop(X) eq/ne 1` if X is non-zero #67268

dtcxzyw · 2023-09-24T15:25:26Z

This patch does the following folds if we know X is non-zero:
ctpop(X) == 1 -> ctpop(X) <u 2
ctpop(X) != 1 -> ctpop(X) >u 1
The latter forms give better codegen than the former: https://godbolt.org/z/5beeq8fGz
Alive2: https://alive2.llvm.org/ce/z/GQRQ5T
Fixes #57328.
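
The equivalence behind both folds can be checked exhaustively on a narrow type. The following is a minimal sketch, not part of the patch: a hand-written bit-counting loop stands in for `llvm.ctpop`, and the names `popCount`, `foldsAgreeForAllNonZero16` and `formsDifferAtZero` are invented for illustration.

```cpp
#include <cstdint>

// Portable population count; stands in for llvm.ctpop in this sketch.
unsigned popCount(std::uint32_t X) {
  unsigned N = 0;
  while (X != 0) {
    X &= X - 1; // clear the lowest set bit
    ++N;
  }
  return N;
}

// Returns true if, for every non-zero 16-bit X, both rewrites preserve the
// result of the comparison:
//   ctpop(X) == 1  <=>  ctpop(X) <u 2
//   ctpop(X) != 1  <=>  ctpop(X) >u 1
bool foldsAgreeForAllNonZero16() {
  for (std::uint32_t X = 1; X <= 0xFFFF; ++X) {
    const unsigned Pop = popCount(X);
    if ((Pop == 1) != (Pop < 2))
      return false;
    if ((Pop != 1) != (Pop > 1))
      return false;
  }
  return true;
}

// The non-zero precondition matters: at X == 0 the two forms disagree.
bool formsDifferAtZero() {
  const unsigned Pop = popCount(0);
  return (Pop == 1) != (Pop < 2);
}
```

Both helpers are expected to return `true`: the rewrites agree everywhere except at zero, which the precondition rules out.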

llvmbot · 2023-09-24T15:26:26Z

@llvm/pr-subscribers-llvm-transforms

Changes

This patch folds pattern ctpop(X) <u 2 into ctpop(X) == 1 if we know X is non-zero.
Fixes #57328.

Full diff: https://github.com/llvm/llvm-project/pull/67268.diff

2 Files Affected:

(modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+8-1)
(modified) llvm/test/Transforms/InstCombine/ispow2.ll (+1-1)

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index a219dac7acfbe16..9aafd83d42d0756 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -3412,6 +3412,14 @@ static Instruction *foldCtpopPow2Test(ICmpInst &I, IntrinsicInst *CtpopLhs,
                                       const SimplifyQuery &Q) {
   assert(CtpopLhs->getIntrinsicID() == Intrinsic::ctpop &&
          "Non-ctpop intrin in ctpop fold");
+
+  const ICmpInst::Predicate Pred = I.getPredicate();
+  // If we know X is non-zero, we can fold isPow2OrZero into isPow2.
+  if (Pred == ICmpInst::ICMP_ULT && CRhs == 2 &&
+      isKnownNonZero(CtpopLhs, Q.DL, /*Depth*/ 0, Q.AC, Q.CxtI, Q.DT))
+    return ICmpInst::Create(Instruction::ICmp, ICmpInst::ICMP_EQ, CtpopLhs,
+                            ConstantInt::get(CtpopLhs->getType(), 1));
+
   if (!CtpopLhs->hasOneUse())
     return nullptr;
 
@@ -3423,7 +3431,6 @@ static Instruction *foldCtpopPow2Test(ICmpInst &I, IntrinsicInst *CtpopLhs,
   // If we know any bit of X can be folded to:
   //    IsPow2       : X & (~Bit) == 0
   //    NotPow2      : X & (~Bit) != 0
-  const ICmpInst::Predicate Pred = I.getPredicate();
   if (((I.isEquality() || Pred == ICmpInst::ICMP_UGT) && CRhs == 1) ||
       (Pred == ICmpInst::ICMP_ULT && CRhs == 2)) {
     Value *Op = CtpopLhs->getArgOperand(0);
diff --git a/llvm/test/Transforms/InstCombine/ispow2.ll b/llvm/test/Transforms/InstCombine/ispow2.ll
index bbd693b11b388ad..60eb522a144927f 100644
--- a/llvm/test/Transforms/InstCombine/ispow2.ll
+++ b/llvm/test/Transforms/InstCombine/ispow2.ll
@@ -198,7 +198,7 @@ define i1 @is_pow2_non_zero(i32 %x) {
 ; CHECK-NEXT:    [[NOTZERO:%.*]] = icmp ne i32 [[X:%.*]], 0
 ; CHECK-NEXT:    call void @llvm.assume(i1 [[NOTZERO]])
 ; CHECK-NEXT:    [[T0:%.*]] = tail call i32 @llvm.ctpop.i32(i32 [[X]]), !range [[RNG0]]
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[T0]], 2
+; CHECK-NEXT:    [[CMP:%.*]] = icmp eq i32 [[T0]], 1
 ; CHECK-NEXT:    ret i1 [[CMP]]
 ;
   %notzero = icmp ne i32 %x, 0

nikic

ctpop < 2 is better than ctpop == 1 for the backend, and we will not be able to recover from this transform in the backend if it comes from an assume.

goldsteinn · 2023-09-24T17:00:51Z

Maybe leave a comment explaining as much though. But nikic is right, we intentionally don't do this. In the backend if we still have non-zero info we optimize this properly.
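
One way to see why the `<u 2` form is cheaper: without a population-count instruction, `ctpop(X) <u 2` can be expanded to roughly a single and-with-decrement test, while `ctpop(X) == 1` additionally has to exclude zero. The sketch below shows that difference in plain C++. The function names are invented here, and the exact sequence a given backend emits may differ.

```cpp
#include <cstdint>

// ctpop(X) <u 2: true for zero and for powers of two.
// Clearing the lowest set bit must leave nothing behind.
bool isPow2OrZero(std::uint32_t X) { return (X & (X - 1)) == 0; }

// ctpop(X) == 1: the same test plus an explicit zero check.
bool isPow2(std::uint32_t X) { return X != 0 && (X & (X - 1)) == 0; }
```

When X is known non-zero the two functions agree, so keeping the `<u 2` form loses nothing.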

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp


nikic

This fold goes against the usual direction in IR, and in particular conflicts with this generic fold:

llvm-project/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp, lines 6207 to 6210 in bb764ec:

    // A <u C -> A == C-1 if min(A)+1 == C
    if (*CmpC == Op0Min + 1)
      return new ICmpInst(ICmpInst::ICMP_EQ, Op0,
                          ConstantInt::get(Op1->getType(), *CmpC - 1));

This is risky and can easily lead to infinite loops. I think we won't get one right now because KnownBits for ctpop is a bit weak (llvm-project/llvm/lib/Analysis/ValueTracking.cpp, line 1560 in d3505c2: `case Intrinsic::ctpop: {`), but if it was later strengthened and e.g. inferred KnownBits ...0?1 then we could run into an infinite compile loop at that point.
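
The concern can be illustrated with a toy model. This is only a sketch and not InstCombine code. It assumes a hypothetical state in which analysis proves `min(ctpop(X)) == 1`, so the generic fold rewrites `<u 2` to `== 1` while the proposed fold rewrites `== 1` back to `<u 2`. The `Cmp` struct, the two rule functions and the iteration cap are all invented for illustration.

```cpp
#include <cstdint>

enum class Pred { EQ, ULT };

// Toy stand-in for an icmp with a constant right-hand side.
struct Cmp {
  Pred P;
  std::uint64_t C;
};

// Generic fold quoted above: A <u C -> A == C-1 if min(A)+1 == C.
bool genericFold(Cmp &I, std::uint64_t Op0Min) {
  if (I.P == Pred::ULT && I.C == Op0Min + 1) {
    I = Cmp{Pred::EQ, I.C - 1};
    return true;
  }
  return false;
}

// Proposed fold: ctpop(X) == 1 -> ctpop(X) <u 2 when X is known non-zero.
bool proposedFold(Cmp &I, bool XKnownNonZero) {
  if (XKnownNonZero && I.P == Pred::EQ && I.C == 1) {
    I = Cmp{Pred::ULT, 2};
    return true;
  }
  return false;
}

// Applies both rules repeatedly until neither fires or MaxIters rounds have
// passed. Returns the number of rounds in which at least one rule fired.
unsigned runToFixpoint(Cmp I, std::uint64_t Op0Min, bool XKnownNonZero,
                       unsigned MaxIters) {
  unsigned Rounds = 0;
  while (Rounds < MaxIters) {
    bool Changed = genericFold(I, Op0Min);
    if (proposedFold(I, XKnownNonZero))
      Changed = true;
    if (!Changed)
      break;
    ++Rounds;
  }
  return Rounds;
}
```

With `Op0Min == 0` (the weak-KnownBits case) the rules settle, since the generic fold never fires. With `Op0Min == 1` each rule undoes the other and the loop only stops at the cap.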

dtcxzyw · 2023-09-26T02:47:47Z

Can we canonicalize ctpop(X) <u 2 into ctpop(X) == 1 in InstCombine and do inverse transform in CodeGenPrepare?

nikic · 2023-09-26T08:35:05Z

In theory yes, but in practice CGP drops assumes at the very start, and I don't want to have a special ctpop fold run before that. I'd only take that if we stop dropping assumes in CGP (which is on the long-term roadmap, but tricky).

dtcxzyw requested review from nikic, goldsteinn and spatel-gh September 24, 2023 15:25

llvmbot added the llvm:transforms label Sep 24, 2023

nikic requested changes Sep 24, 2023


dtcxzyw force-pushed the fold-is-pow2-nonzero branch from 511e83e to 0b124db Compare September 24, 2023 17:28

dtcxzyw changed the title ~~[InstCombine] Fold ctpop(X) <u 2 into ctpop(X) == 1 if X is non-zero~~ [InstCombine] Fold ctpop(X) == 1 into ctpop(X) <u 2 if X is non-zero Sep 24, 2023

nikic reviewed Sep 24, 2023


dtcxzyw changed the title ~~[InstCombine] Fold ctpop(X) == 1 into ctpop(X) <u 2 if X is non-zero~~ [InstCombine] Fold ctpop(X) eq/ne 1 if X is non-zero Sep 24, 2023

dtcxzyw added 3 commits September 25, 2023 01:50

[InstCombine] Simplify pattern isPow2OrZero if X is non-zero (66f4509)

[InstCombine] Fold pattern ctpop(X) == 1 into ctpop(X) <u 2 if X is non-zero (4a055bf)

[InstCombine] Fold pattern ctpop(X) != 1 into ctpop(X) >u 1 if X is non-zero (772cecc)

dtcxzyw force-pushed the fold-is-pow2-nonzero branch from 0b124db to 772cecc Compare September 24, 2023 18:09

dtcxzyw requested a review from nikic September 24, 2023 18:11

nikic reviewed Sep 25, 2023


dtcxzyw closed this Oct 23, 2023
