[InstCombine] Canonicalize icmp pred (X +/- C1), C2 into icmp pred X, C2 -/+ C1 with nowrap flag implied by with.overflow intrinsic #75511

Merged
dtcxzyw merged 2 commits from the simp-ckd-add-zext branch into llvm:main on Dec 16, 2023

Conversation

@dtcxzyw (Member) commented Dec 14, 2023

This patch canonicalizes the pattern Overflow | icmp pred Res, C2 into Overflow | icmp pred X, C2 -/+ C1, where Overflow and Res are the return values of xxx.with.overflow X, C1. The no-wrap flag needed to move the constant across the compare is implied by the Overflow disjunct: whenever Overflow is false, the add/sub is known not to wrap.
Alive2: https://alive2.llvm.org/ce/z/PhR_3S

Fixes #75360.
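
For illustration, a minimal before/after sketch of the canonicalization (function names are hypothetical; the constants mirror the @ckd_add_unsigned tests in the diff below):

```llvm
declare { i32, i1 } @llvm.sadd.with.overflow.i32(i32, i32)

; Before: the compare is performed on the with.overflow result.
define i1 @src(i32 %x) {
  %pair = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 1)
  %ov   = extractvalue { i32, i1 } %pair, 1
  %res  = extractvalue { i32, i1 } %pair, 0
  %cmp  = icmp slt i32 %res, 0        ; icmp pred (X + C1), C2 with C1 = 1, C2 = 0
  %or   = or i1 %ov, %cmp
  ret i1 %or
}

; After: whenever %ov is false the add is known not to wrap, so the
; compare can be rewritten against %x directly: icmp pred X, C2 - C1.
; (When %ov is true, the or is true regardless of the compare.)
define i1 @tgt(i32 %x) {
  %pair = call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %x, i32 1)
  %ov   = extractvalue { i32, i1 } %pair, 1
  %cmp  = icmp slt i32 %x, -1         ; C2 - C1 = 0 - 1 = -1
  %or   = or i1 %ov, %cmp
  ret i1 %or
}
```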

@llvmbot (Collaborator) commented Dec 14, 2023

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Yingwei Zheng (dtcxzyw)

Changes

This patch tries to simplify the pattern Overflow | icmp pred Res, C into Overflow or true, where Overflow and Res are the return values of a call to a with.overflow intrinsic.
Alive2: https://alive2.llvm.org/ce/z/4-LEV2

Fixes #75360.


Full diff: https://github.com/llvm/llvm-project/pull/75511.diff

4 Files Affected:

  • (modified) llvm/include/llvm/Analysis/ValueTracking.h (+5)
  • (modified) llvm/lib/Analysis/InstructionSimplify.cpp (+35)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+4-4)
  • (modified) llvm/test/Transforms/InstCombine/sadd-with-overflow.ll (+78)
diff --git a/llvm/include/llvm/Analysis/ValueTracking.h b/llvm/include/llvm/Analysis/ValueTracking.h
index a3186e61b94adf..baa16306ebf5df 100644
--- a/llvm/include/llvm/Analysis/ValueTracking.h
+++ b/llvm/include/llvm/Analysis/ValueTracking.h
@@ -863,6 +863,11 @@ ConstantRange computeConstantRange(const Value *V, bool ForSigned,
                                    const DominatorTree *DT = nullptr,
                                    unsigned Depth = 0);
 
+/// Combine constant ranges from computeConstantRange() and computeKnownBits().
+ConstantRange
+computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+                                       bool ForSigned, const SimplifyQuery &SQ);
+
 /// Return true if this function can prove that the instruction I will
 /// always transfer execution to one of its successors (including the next
 /// instruction that follows within a basic block). E.g. this is not
diff --git a/llvm/lib/Analysis/InstructionSimplify.cpp b/llvm/lib/Analysis/InstructionSimplify.cpp
index 2a45acf63aa2ca..fa2e42a4c22e60 100644
--- a/llvm/lib/Analysis/InstructionSimplify.cpp
+++ b/llvm/lib/Analysis/InstructionSimplify.cpp
@@ -2313,6 +2313,36 @@ static Value *simplifyOrLogic(Value *X, Value *Y) {
   return nullptr;
 }
 
+/// Res, Overflow = xxx_with_overflow X, Y
+/// Try to simplify the pattern "Overflow | icmp pred Res, C".
+static Value *simplifyOrOfICmpAndWithOverflow(Value *Op0, Value *Op1,
+                                              const SimplifyQuery &SQ) {
+  const WithOverflowInst *WO;
+  const APInt *C;
+  ICmpInst::Predicate Pred;
+  if (!match(Op0, m_ExtractValue<1>(m_WithOverflowInst(WO))) ||
+      !match(Op1, m_ICmp(Pred, m_ExtractValue<0>(m_Specific(WO)), m_APInt(C))))
+    return nullptr;
+
+  // See if we can infer the result of icmp from the nowrap flag.
+  const auto LHS_CR = llvm::computeConstantRangeIncludingKnownBits(
+      WO->getLHS(), ICmpInst::isSigned(Pred), SQ);
+  const auto RHS_CR = llvm::computeConstantRangeIncludingKnownBits(
+      WO->getRHS(), ICmpInst::isSigned(Pred), SQ);
+  const auto DomCR = LHS_CR.overflowingBinaryOp(WO->getBinaryOp(), RHS_CR,
+                                                WO->getNoWrapKind());
+  const auto CR = llvm::ConstantRange::makeExactICmpRegion(Pred, *C);
+
+  ConstantRange Intersection = DomCR.intersectWith(CR);
+  ConstantRange Difference = DomCR.difference(CR);
+  if (Intersection.isEmptySet())
+    return Op0;
+  if (Difference.isEmptySet())
+    return ConstantInt::getTrue(Op0->getType());
+
+  return nullptr;
+}
+
 /// Given operands for an Or, see if we can fold the result.
 /// If not, this returns null.
 static Value *simplifyOrInst(Value *Op0, Value *Op1, const SimplifyQuery &Q,
@@ -2480,6 +2510,11 @@ static Value *simplifyOrInst(Value *Op0, Value *Op1, const SimplifyQuery &Q,
       if (*Implied == true)
         return ConstantInt::getTrue(Op1->getType());
     }
+
+    if (auto *V = simplifyOrOfICmpAndWithOverflow(Op0, Op1, Q))
+      return V;
+    if (auto *V = simplifyOrOfICmpAndWithOverflow(Op1, Op0, Q))
+      return V;
   }
 
   if (Value *V = simplifyByDomEq(Instruction::Or, Op0, Op1, Q, MaxRecurse))
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 5445746ab2a1bc..e5469a6d659090 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -6228,10 +6228,10 @@ static OverflowResult mapOverflowResult(ConstantRange::OverflowResult OR) {
 }
 
 /// Combine constant ranges from computeConstantRange() and computeKnownBits().
-static ConstantRange
-computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
-                                       bool ForSigned,
-                                       const SimplifyQuery &SQ) {
+ConstantRange
+llvm::computeConstantRangeIncludingKnownBits(const WithCache<const Value *> &V,
+                                             bool ForSigned,
+                                             const SimplifyQuery &SQ) {
   ConstantRange CR1 =
       ConstantRange::fromKnownBits(V.getKnownBits(SQ), ForSigned);
   ConstantRange CR2 = computeConstantRange(V, ForSigned, SQ.IIQ.UseInstrInfo);
diff --git a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
index 4b37ccbe3370b6..a784028ec3a245 100644
--- a/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
+++ b/llvm/test/Transforms/InstCombine/sadd-with-overflow.ll
@@ -122,3 +122,81 @@ define { i32, i1 } @fold_sub_simple(i32 %x) {
   %b = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 30)
   ret { i32, i1 } %b
 }
+
+; Tests from PR75360
+define i1 @ckd_add_unsigned(i31 %num) {
+; CHECK-LABEL: @ckd_add_unsigned(
+; CHECK-NEXT:    [[A2:%.*]] = icmp eq i31 [[NUM:%.*]], -1
+; CHECK-NEXT:    ret i1 [[A2]]
+;
+  %a0 = zext i31 %num to i32
+  %a1 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a0, i32 1)
+  %a2 = extractvalue { i32, i1 } %a1, 1
+  %a3 = extractvalue { i32, i1 } %a1, 0
+  %a4 = icmp slt i32 %a3, 0
+  %a5 = or i1 %a2, %a4
+  ret i1 %a5
+}
+
+define i1 @ckd_add_unsigned_commuted(i31 %num) {
+; CHECK-LABEL: @ckd_add_unsigned_commuted(
+; CHECK-NEXT:    [[A2:%.*]] = icmp eq i31 [[NUM:%.*]], -1
+; CHECK-NEXT:    ret i1 [[A2]]
+;
+  %a0 = zext i31 %num to i32
+  %a1 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a0, i32 1)
+  %a2 = extractvalue { i32, i1 } %a1, 1
+  %a3 = extractvalue { i32, i1 } %a1, 0
+  %a4 = icmp slt i32 %a3, 0
+  %a5 = or i1 %a4, %a2
+  ret i1 %a5
+}
+
+define i1 @ckd_add_unsigned_imply_true(i31 %num) {
+; CHECK-LABEL: @ckd_add_unsigned_imply_true(
+; CHECK-NEXT:    ret i1 true
+;
+  %a0 = zext i31 %num to i32
+  %a1 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a0, i32 1)
+  %a2 = extractvalue { i32, i1 } %a1, 1
+  %a3 = extractvalue { i32, i1 } %a1, 0
+  %a4 = icmp sgt i32 %a3, -1
+  %a5 = or i1 %a2, %a4
+  ret i1 %a5
+}
+
+define i1 @ckd_add_unsigned_fail1(i32 %a0) {
+; CHECK-LABEL: @ckd_add_unsigned_fail1(
+; CHECK-NEXT:    [[A1:%.*]] = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 [[A0:%.*]], i32 1)
+; CHECK-NEXT:    [[A2:%.*]] = extractvalue { i32, i1 } [[A1]], 1
+; CHECK-NEXT:    [[A3:%.*]] = extractvalue { i32, i1 } [[A1]], 0
+; CHECK-NEXT:    [[A4:%.*]] = icmp slt i32 [[A3]], 0
+; CHECK-NEXT:    [[A5:%.*]] = or i1 [[A2]], [[A4]]
+; CHECK-NEXT:    ret i1 [[A5]]
+;
+  %a1 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a0, i32 1)
+  %a2 = extractvalue { i32, i1 } %a1, 1
+  %a3 = extractvalue { i32, i1 } %a1, 0
+  %a4 = icmp slt i32 %a3, 0
+  %a5 = or i1 %a2, %a4
+  ret i1 %a5
+}
+
+define i1 @ckd_add_unsigned_fail2(i31 %num) {
+; CHECK-LABEL: @ckd_add_unsigned_fail2(
+; CHECK-NEXT:    [[A0:%.*]] = zext i31 [[NUM:%.*]] to i32
+; CHECK-NEXT:    [[A1:%.*]] = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 [[A0]], i32 1)
+; CHECK-NEXT:    [[A2:%.*]] = extractvalue { i32, i1 } [[A1]], 1
+; CHECK-NEXT:    [[A3:%.*]] = extractvalue { i32, i1 } [[A1]], 0
+; CHECK-NEXT:    [[A4:%.*]] = icmp slt i32 [[A3]], 2
+; CHECK-NEXT:    [[A5:%.*]] = or i1 [[A2]], [[A4]]
+; CHECK-NEXT:    ret i1 [[A5]]
+;
+  %a0 = zext i31 %num to i32
+  %a1 = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a0, i32 1)
+  %a2 = extractvalue { i32, i1 } %a1, 1
+  %a3 = extractvalue { i32, i1 } %a1, 0
+  %a4 = icmp slt i32 %a3, 2
+  %a5 = or i1 %a2, %a4
+  ret i1 %a5
+}

@nikic (Contributor) commented Dec 14, 2023

I'm not sure this is the right approach to the problem. The way I would view it is that the icmp here is of the form (x +nsw c) spred c2, where the nsw is implied by the or condition. This means we can transform the icmp into x spred (c2 - c). From there, it will enjoy all the usual icmp optimizations.
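As a concrete instance of the existing fold nikic refers to (a sketch, names hypothetical): once the add carries nsw, InstCombine already moves the constant to the other side of the compare:

```llvm
define i1 @with_nsw(i32 %x) {
  ; The nsw flag licenses rewriting the compare against %x itself.
  %add = add nsw i32 %x, 1
  %cmp = icmp slt i32 %add, 0   ; folds to: icmp slt i32 %x, -1
  ret i1 %cmp
}
```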

@dtcxzyw (Member, Author) commented Dec 15, 2023

I'm not sure this is the right approach to the problem. The way I would view it is that the icmp here is of the form (x +nsw c) spred c2, where the nsw is implied by the or condition. This means we can transform the icmp into x spred (c2 - c). From there, it will enjoy all the usual icmp optimizations.

Alive2: https://alive2.llvm.org/ce/z/PA3SAK

@dtcxzyw dtcxzyw changed the title [InstSimplify] Infer icmp from with.overflow intrinsics [InstCombine] Canonicalize icmp pred (X +/- C1), C2 into icmp pred X, C2 -/+ C1 with nowrap flag implied by with.overflow intrinsic Dec 15, 2023
@dtcxzyw (Member, Author) commented Dec 15, 2023

@sadlerap Do you think it is helpful to handle the pattern u/ssub.with.overflow C, X?

@nikic (Contributor) left a comment

LGTM. Can you please add a test with commuted or operands?

@dtcxzyw (Member, Author) commented Dec 15, 2023

LGTM. Can you please add a test with commuted or operands?

I have added a test @ckd_add_unsigned_commuted.

@sadlerap commented Dec 15, 2023

@dtcxzyw I'm not entirely sure it's necessary. LLVM already optimizes the signed case, and it seems that underflowing subtraction is already optimized for unsigned integers. As far as I can tell, InstCombine was already handling underflowing subtraction - it looks like overflowing addition wasn't getting optimized.

dtcxzyw merged commit c59ea32 into llvm:main on Dec 16, 2023 (4 checks passed).
dtcxzyw deleted the simp-ckd-add-zext branch on December 16, 2023 at 09:58.
Successfully merging this pull request may close these issues:

  • Missed optimization: unsigned ckd_add(&unused, a, 1) can be folded to (a == UINT_MAX) (#75360)