[InstCombine] limit icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1 to hasOneUse #74318

Open
wants to merge 2 commits into main

Conversation

ChunyuLiao
Member

Fix: #30121

Testcase IR: https://gcc.godbolt.org/z/oKTPE7v48

  %cmp11 = icmp sgt i32 %n, 0
  br i1 %cmp11, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %mul = shl i32 %n, 4
  %smax = tail call i32 @llvm.smax.i32(i32 %mul, i32 1)
  %wide.trip.count = zext nneg i32 %smax to i64
  br label %for.body

If we add this limitation, the @llvm.smax shouldn't be needed:

  %mul = shl nsw i32 %n, 4
  %cmp11 = icmp sgt i32 %mul, 0
  br i1 %cmp11, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %smax = tail call i32 @llvm.smax.i32(i32 %mul, i32 1) ; <-- this shouldn't be needed

If we still need the fold icmp sgt (shl nsw X, C1), C0 --> icmp sgt X, C0 >> C1 for the multi-use case, perhaps we can implement it at the DAG combine level instead.
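For reference, here is a condensed sketch of what the guarded fold looks like after this change (distilled from the InstCombineCompares.cpp hunk below; the helper name and exact signature are mine, not the actual LLVM source):

```cpp
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Sketch: icmp sgt (shl nsw X, ShiftAmt), C --> icmp sgt X, (C >>s ShiftAmt),
// now restricted to a single-use shl. If the shl has other users (e.g. the
// smax feeding the trip count), the compare stays expressed in terms of
// %mul so later passes can still relate the two values.
static Instruction *foldShlNswSGtSketch(ICmpInst &Cmp, BinaryOperator *Shl,
                                        const APInt &C, unsigned ShiftAmt) {
  if (!Shl->hasNoSignedWrap() || !Shl->hasOneUse())
    return nullptr;
  if (Cmp.getPredicate() != ICmpInst::ICMP_SGT)
    return nullptr;
  // nsw guarantees only sign bits are shifted out, so ashr-ing the constant
  // preserves the signed comparison.
  APInt ShiftedC = C.ashr(ShiftAmt);
  return new ICmpInst(ICmpInst::ICMP_SGT, Shl->getOperand(0),
                      ConstantInt::get(Shl->getType(), ShiftedC));
}
```

With the one-use guard in place, %cmp11 stays on %mul, which is what lets indvars drop the smax in the test below.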

@llvmbot
Collaborator

llvmbot commented Dec 4, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Liao Chunyu (ChunyuLiao)

Full diff: https://github.com/llvm/llvm-project/pull/74318.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+3-2)
  • (added) llvm/test/Transforms/IndVarSimplify/pr30121.ll (+53)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 1d09d9b44a9e5..5dafd0dcd5ef5 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -2231,7 +2231,8 @@ Instruction *InstCombinerImpl::foldICmpShlConstant(ICmpInst &Cmp,
   //
   // NB: sge/sle with a constant will canonicalize to sgt/slt.
   if (Shl->hasNoSignedWrap() &&
-      (Pred == ICmpInst::ICMP_SGT || Pred == ICmpInst::ICMP_SLT))
+      ((Pred == ICmpInst::ICMP_SGT && Shl->hasOneUse()) ||
+       Pred == ICmpInst::ICMP_SLT))
     if (C.isZero() || (Pred == ICmpInst::ICMP_SGT ? C.isAllOnes() : C.isOne()))
       return new ICmpInst(Pred, Shl->getOperand(0), Cmp.getOperand(1));
 
@@ -2251,7 +2252,7 @@ Instruction *InstCombinerImpl::foldICmpShlConstant(ICmpInst &Cmp,
   // NSW guarantees that we are only shifting out sign bits from the high bits,
   // so we can ASHR the compare constant without needing a mask and eliminate
   // the shift.
-  if (Shl->hasNoSignedWrap()) {
+  if (Shl->hasNoSignedWrap() && Shl->hasOneUse()) {
     if (Pred == ICmpInst::ICMP_SGT) {
       // icmp Pred (shl nsw X, ShiftAmt), C --> icmp Pred X, (C >>s ShiftAmt)
       APInt ShiftedC = C.ashr(*ShiftAmt);
diff --git a/llvm/test/Transforms/IndVarSimplify/pr30121.ll b/llvm/test/Transforms/IndVarSimplify/pr30121.ll
new file mode 100644
index 0000000000000..c69971d46e98f
--- /dev/null
+++ b/llvm/test/Transforms/IndVarSimplify/pr30121.ll
@@ -0,0 +1,53 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -passes='instcombine,indvars'  < %s | FileCheck %s
+
+target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
+
+define void @g(ptr nocapture noundef %a, i32 noundef signext %n) {
+; CHECK-LABEL: @g(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[MUL:%.*]] = shl nsw i32 [[N:%.*]], 4
+; CHECK-NEXT:    [[CMP11:%.*]] = icmp sgt i32 [[MUL]], 0
+; CHECK-NEXT:    br i1 [[CMP11]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK:       for.body.preheader:
+; CHECK-NEXT:    [[WIDE_TRIP_COUNT:%.*]] = zext i32 [[MUL]] to i64
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.cond.cleanup.loopexit:
+; CHECK-NEXT:    br label [[FOR_COND_CLEANUP]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret void
+; CHECK:       for.body:
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A:%.*]], i64 [[INDVARS_IV]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[ADD:%.*]] = shl nsw i32 [[TMP0]], 1
+; CHECK-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], [[WIDE_TRIP_COUNT]]
+; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP_LOOPEXIT:%.*]]
+;
+entry:
+  %mul = shl nsw i32 %n, 4
+  %cmp11 = icmp sgt i32 %mul, 0
+  br i1 %cmp11, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader:                               ; preds = %entry
+  br label %for.body
+
+for.cond.cleanup.loopexit:                        ; preds = %for.body
+  br label %for.cond.cleanup
+
+for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
+  ret void
+
+for.body:                                         ; preds = %for.body.preheader, %for.body
+  %i.012 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
+  %idxprom = zext nneg i32 %i.012 to i64
+  %arrayidx = getelementptr inbounds i32, ptr %a, i64 %idxprom
+  %0 = load i32, ptr %arrayidx, align 4
+  %add = shl nsw i32 %0, 1
+  store i32 %add, ptr %arrayidx, align 4
+  %inc = add nuw nsw i32 %i.012, 1
+  %cmp = icmp slt i32 %inc, %mul
+  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
+}

@nikic
Contributor

nikic commented Dec 4, 2023

I don't think this is the right way to fix this issue. If I'm reading the transformation log right, the problem seems to be that we lose nsw information in IndVarSimplify?

@ChunyuLiao
Member Author

> I don't think this is the right way to fix this issue. If I'm reading the transformation log right, the problem seems to be that we lose nsw information in IndVarSimplify?

Could you give some more information? Thanks.
My understanding is that the shl's nsw should be retained in IndVarSimplify.

A brute-force attempt to keep the shl's nsw (marking an instruction as yielding poison whenever one of its operands would propagate poison, without requiring that operand to already be known poison), but it doesn't fix the issue:

diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 8c29c242215d..b43feb621ad6 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -7309,7 +7309,7 @@ static bool programUndefinedIfUndefOrPoison(const Value *V,
 
       // If an operand is poison and propagates it, mark I as yielding poison.
       for (const Use &Op : I.operands()) {
-        if (YieldsPoison.count(Op) && propagatesPoison(Op)) {
+        if (propagatesPoison(Op)) {
           YieldsPoison.insert(&I);
           break;
         }

@ChunyuLiao
Member Author

I think I found where SCEV gets in the way of this patch:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/ScalarEvolution.cpp#L11872

std::optional<APInt>
ScalarEvolution::computeConstantDifference(const SCEV *More, const SCEV *Less) {
  // We avoid subtracting expressions here because this function is usually
  // fairly deep in the call stack (i.e. is called many times).

  // X - X = 0.
  if (More == Less)
    return APInt(getTypeSizeInBits(More->getType()), 0);

computeConstantDifference has special handling for SCEVAddRecExpr and SCEVAddExpr, but in this example the SCEV is a plain llvm::scUnknown, so it can't be handled.
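A simplified paraphrase of that strategy (an assumption-level reconstruction from the cases named above, not the verbatim ScalarEvolution.cpp body; the function name is hypothetical):

```cpp
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include <optional>
using namespace llvm;

// Hypothetical paraphrase of computeConstantDifference: cheap structural
// matches only, never a general SCEV subtraction (the function is called
// from deep in the stack, so it must stay inexpensive).
static std::optional<APInt> sketchConstantDifference(ScalarEvolution &SE,
                                                     const SCEV *More,
                                                     const SCEV *Less) {
  // X - X = 0.
  if (More == Less)
    return APInt(SE.getTypeSizeInBits(More->getType()), 0);

  // {S1,+,Step}<L> - {S2,+,Step}<L> reduces to S1 - S2.
  if (const auto *MAR = dyn_cast<SCEVAddRecExpr>(More))
    if (const auto *LAR = dyn_cast<SCEVAddRecExpr>(Less))
      if (MAR->getLoop() == LAR->getLoop() &&
          MAR->getStepRecurrence(SE) == LAR->getStepRecurrence(SE))
        return sketchConstantDifference(SE, MAR->getStart(), LAR->getStart());

  // (C + Less) - Less reduces to C.
  if (const auto *MA = dyn_cast<SCEVAddExpr>(More))
    if (MA->getNumOperands() == 2 && MA->getOperand(1) == Less)
      if (const auto *C = dyn_cast<SCEVConstant>(MA->getOperand(0)))
        return C->getAPInt();

  // Anything else -- including the plain scUnknown wrapping %mul in this
  // example -- falls through with no answer.
  return std::nullopt;
}
```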
