[ValueTracking] Extend LHS/RHS with matching operand to work without constants. #85557

Open

goldsteinn wants to merge 1 commit into main from perf/goldsteinn/implied-with-cr

Conversation

goldsteinn
Contributor

Previously we only handled the L0 == R0 case if both L1 and R1
were constant.

We can get more out of the analysis using general constant ranges
instead.

For example, X u> Y implies X != 0.

In general, any strict comparison on X implies that X is not equal
to the boundary value for the sign, and constant ranges with/without
sign bits can be useful in deducing implications.
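
As a concrete illustration of the X u> Y case (a standalone sketch against LLVM's ConstantRange API, not code from this patch; the bit width is arbitrary):

#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/InstrTypes.h"
#include <cassert>
using namespace llvm;

int main() {
  unsigned BW = 8; // illustrative bit width
  // LHS is "icmp ugt X, Y" with Y completely unknown, so model Y's range
  // as the full set of 8-bit values.
  ConstantRange YCR = ConstantRange::getFull(BW);
  // The values X can take if "X u> Y" holds for some Y in YCR: everything
  // except 0, since 0 is unsigned-greater than nothing.
  ConstantRange DomCR =
      ConstantRange::makeAllowedICmpRegion(CmpInst::ICMP_UGT, YCR);
  // RHS is "icmp ne X, 0": every value in DomCR compares ne-true against
  // 0, so the LHS implies the RHS.
  assert(DomCR.icmp(CmpInst::ICMP_NE, ConstantRange(APInt::getZero(BW))));
  return 0;
}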

@llvmbot
Collaborator

llvmbot commented Mar 16, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (goldsteinn)

Changes

Previously we only handled the L0 == R0 case if both L1 and R1
were constant.

We can get more out of the analysis using general constant ranges
instead.

For example, X u> Y implies X != 0.

In general, any strict comparison on X implies that X is not equal
to the boundary value for the sign, and constant ranges with/without
sign bits can be useful in deducing implications.


Full diff: https://github.com/llvm/llvm-project/pull/85557.diff

7 Files Affected:

  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+44-17)
  • (modified) llvm/test/Transforms/InstCombine/assume.ll (+1-1)
  • (modified) llvm/test/Transforms/InstCombine/range-check.ll (+2-10)
  • (modified) llvm/test/Transforms/InstCombine/select.ll (+2-4)
  • (modified) llvm/test/Transforms/InstCombine/shift.ll (+3-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr23997.ll (+3-4)
  • (modified) llvm/test/Transforms/NewGVN/pr35125.ll (+3-6)
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index edbeede910d7f7..b2846274a36e73 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -8499,20 +8499,20 @@ isImpliedCondMatchingOperands(CmpInst::Predicate LPred,
   return std::nullopt;
 }
 
-/// Return true if "icmp LPred X, LC" implies "icmp RPred X, RC" is true.
-/// Return false if "icmp LPred X, LC" implies "icmp RPred X, RC" is false.
+/// Return true if "icmp LPred X, LCR" implies "icmp RPred X, RCR" is true.
+/// Return false if "icmp LPred X, LCR" implies "icmp RPred X, RCR" is false.
 /// Otherwise, return std::nullopt if we can't infer anything.
-static std::optional<bool> isImpliedCondCommonOperandWithConstants(
-    CmpInst::Predicate LPred, const APInt &LC, CmpInst::Predicate RPred,
-    const APInt &RC) {
-  ConstantRange DomCR = ConstantRange::makeExactICmpRegion(LPred, LC);
-  ConstantRange CR = ConstantRange::makeExactICmpRegion(RPred, RC);
-  ConstantRange Intersection = DomCR.intersectWith(CR);
-  ConstantRange Difference = DomCR.difference(CR);
-  if (Intersection.isEmptySet())
-    return false;
-  if (Difference.isEmptySet())
+static std::optional<bool>
+isImpliedCondCommonOperandWithCR(CmpInst::Predicate LPred, ConstantRange LCR,
+                                 CmpInst::Predicate RPred, ConstantRange RCR) {
+  ConstantRange DomCR = ConstantRange::makeAllowedICmpRegion(LPred, LCR);
+  // If all true values for lhs and true for rhs, lhs implies rhs
+  if (DomCR.icmp(RPred, RCR))
     return true;
+
+  // If there is no overlap, lhs implies not rhs
+  if (DomCR.icmp(CmpInst::getInversePredicate(RPred), RCR))
+    return false;
   return std::nullopt;
 }
 
@@ -8532,11 +8532,38 @@ static std::optional<bool> isImpliedCondICmps(const ICmpInst *LHS,
   CmpInst::Predicate LPred =
       LHSIsTrue ? LHS->getPredicate() : LHS->getInversePredicate();
 
-  // Can we infer anything when the 0-operands match and the 1-operands are
-  // constants (not necessarily matching)?
-  const APInt *LC, *RC;
-  if (L0 == R0 && match(L1, m_APInt(LC)) && match(R1, m_APInt(RC)))
-    return isImpliedCondCommonOperandWithConstants(LPred, *LC, RPred, *RC);
+  if (L0 == R1) {
+    std::swap(R0, R1);
+    RPred = ICmpInst::getSwappedPredicate(RPred);
+  }
+  if (L1 == R0) {
+    std::swap(L0, L1);
+    LPred = ICmpInst::getSwappedPredicate(LPred);
+  }
+
+  // See if we can infer anything if operand-0 matches and we have at least one
+  // constant operand-1.
+  if (L0 == R0 && L0->getType()->isIntOrIntVectorTy()) {
+    // Potential TODO: We could also use the constant range of L0/R0 to
+    // further constrain the constant ranges. At the moment this leads to
+    // several regressions related to not transforming `multi_use(A + C0) eq/ne
+    // C1` (see discussion: D58633).
+    ConstantRange LCR = computeConstantRange(
+        L1, ICmpInst::isSigned(LPred), /* UseInstrInfo=*/true, /*AC=*/nullptr,
+        /*CxtI=*/nullptr, /*DT=*/nullptr, Depth);
+    ConstantRange RCR = computeConstantRange(
+        R1, ICmpInst::isSigned(RPred), /* UseInstrInfo=*/true, /*AC=*/nullptr,
+        /*CxtI=*/nullptr, /*DT=*/nullptr, Depth);
+    // Even if L1/R1 are not both constant, we can still sometimes deduce a
+    // relationship from a single constant. For example X u> Y implies X != 0.
+    if (auto R = isImpliedCondCommonOperandWithCR(LPred, LCR, RPred, RCR))
+      return R;
+    // If both L1/R1 were exact constant ranges and we didn't get anything
+    // here, we won't be able to deduce this.
+    const APInt *Unused;
+    if (match(L1, m_APInt(Unused)) && match(R1, m_APInt(Unused)))
+      return std::nullopt;
+  }
 
   // Can we infer anything when the two compares have matching operands?
   bool AreSwappedOps;
diff --git a/llvm/test/Transforms/InstCombine/assume.ll b/llvm/test/Transforms/InstCombine/assume.ll
index 927f0a86b0a252..87c75fb2b55592 100644
--- a/llvm/test/Transforms/InstCombine/assume.ll
+++ b/llvm/test/Transforms/InstCombine/assume.ll
@@ -386,7 +386,7 @@ define i1 @nonnull5(ptr %a) {
 define i32 @assumption_conflicts_with_known_bits(i32 %a, i32 %b) {
 ; CHECK-LABEL: @assumption_conflicts_with_known_bits(
 ; CHECK-NEXT:    store i1 true, ptr poison, align 1
-; CHECK-NEXT:    ret i32 1
+; CHECK-NEXT:    ret i32 poison
 ;
   %and1 = and i32 %b, 3
   %B1 = lshr i32 %and1, %and1
diff --git a/llvm/test/Transforms/InstCombine/range-check.ll b/llvm/test/Transforms/InstCombine/range-check.ll
index 0d138b6ba7e79d..210e57c1d1fe4c 100644
--- a/llvm/test/Transforms/InstCombine/range-check.ll
+++ b/llvm/test/Transforms/InstCombine/range-check.ll
@@ -340,11 +340,7 @@ define i1 @negative4_logical(i32 %x, i32 %n) {
 
 define i1 @negative5(i32 %x, i32 %n) {
 ; CHECK-LABEL: @negative5(
-; CHECK-NEXT:    [[NN:%.*]] = and i32 [[N:%.*]], 2147483647
-; CHECK-NEXT:    [[A:%.*]] = icmp sgt i32 [[NN]], [[X:%.*]]
-; CHECK-NEXT:    [[B:%.*]] = icmp sgt i32 [[X]], -1
-; CHECK-NEXT:    [[C:%.*]] = or i1 [[A]], [[B]]
-; CHECK-NEXT:    ret i1 [[C]]
+; CHECK-NEXT:    ret i1 true
 ;
   %nn = and i32 %n, 2147483647
   %a = icmp slt i32 %x, %nn
@@ -355,11 +351,7 @@ define i1 @negative5(i32 %x, i32 %n) {
 
 define i1 @negative5_logical(i32 %x, i32 %n) {
 ; CHECK-LABEL: @negative5_logical(
-; CHECK-NEXT:    [[NN:%.*]] = and i32 [[N:%.*]], 2147483647
-; CHECK-NEXT:    [[A:%.*]] = icmp sgt i32 [[NN]], [[X:%.*]]
-; CHECK-NEXT:    [[B:%.*]] = icmp sgt i32 [[X]], -1
-; CHECK-NEXT:    [[C:%.*]] = or i1 [[A]], [[B]]
-; CHECK-NEXT:    ret i1 [[C]]
+; CHECK-NEXT:    ret i1 true
 ;
   %nn = and i32 %n, 2147483647
   %a = icmp slt i32 %x, %nn
diff --git a/llvm/test/Transforms/InstCombine/select.ll b/llvm/test/Transforms/InstCombine/select.ll
index a84904106eced4..d9734242a86891 100644
--- a/llvm/test/Transforms/InstCombine/select.ll
+++ b/llvm/test/Transforms/InstCombine/select.ll
@@ -2925,10 +2925,8 @@ define i8 @select_replacement_loop3(i32 noundef %x) {
 
 define i16 @select_replacement_loop4(i16 noundef %p_12) {
 ; CHECK-LABEL: @select_replacement_loop4(
-; CHECK-NEXT:    [[CMP1:%.*]] = icmp ult i16 [[P_12:%.*]], 2
-; CHECK-NEXT:    [[AND1:%.*]] = and i16 [[P_12]], 1
-; CHECK-NEXT:    [[AND2:%.*]] = select i1 [[CMP1]], i16 [[AND1]], i16 0
-; CHECK-NEXT:    [[CMP2:%.*]] = icmp eq i16 [[AND2]], [[P_12]]
+; CHECK-NEXT:    [[AND1:%.*]] = and i16 [[P_12:%.*]], 1
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp ult i16 [[P_12]], 2
 ; CHECK-NEXT:    [[AND3:%.*]] = select i1 [[CMP2]], i16 [[AND1]], i16 0
 ; CHECK-NEXT:    ret i16 [[AND3]]
 ;
diff --git a/llvm/test/Transforms/InstCombine/shift.ll b/llvm/test/Transforms/InstCombine/shift.ll
index 62f32c28683711..bef7fc81a7d1f9 100644
--- a/llvm/test/Transforms/InstCombine/shift.ll
+++ b/llvm/test/Transforms/InstCombine/shift.ll
@@ -1751,12 +1751,11 @@ define void @ashr_out_of_range_1(ptr %A) {
 ; CHECK-NEXT:    [[L:%.*]] = load i177, ptr [[A:%.*]], align 4
 ; CHECK-NEXT:    [[L_FROZEN:%.*]] = freeze i177 [[L]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = icmp eq i177 [[L_FROZEN]], -1
-; CHECK-NEXT:    [[B:%.*]] = select i1 [[TMP1]], i177 0, i177 [[L_FROZEN]]
-; CHECK-NEXT:    [[TMP2:%.*]] = trunc i177 [[B]] to i64
+; CHECK-NEXT:    [[TMP6:%.*]] = trunc i177 [[L_FROZEN]] to i64
+; CHECK-NEXT:    [[TMP2:%.*]] = select i1 [[TMP1]], i64 0, i64 [[TMP6]]
 ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr i177, ptr [[A]], i64 [[TMP2]]
 ; CHECK-NEXT:    [[G11:%.*]] = getelementptr i8, ptr [[TMP3]], i64 -24
-; CHECK-NEXT:    [[C17:%.*]] = icmp sgt i177 [[B]], [[L_FROZEN]]
-; CHECK-NEXT:    [[TMP4:%.*]] = sext i1 [[C17]] to i64
+; CHECK-NEXT:    [[TMP4:%.*]] = sext i1 [[TMP1]] to i64
 ; CHECK-NEXT:    [[G62:%.*]] = getelementptr i177, ptr [[G11]], i64 [[TMP4]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i177 [[L_FROZEN]], -1
 ; CHECK-NEXT:    [[B28:%.*]] = select i1 [[TMP5]], i177 0, i177 [[L_FROZEN]]
diff --git a/llvm/test/Transforms/LoopVectorize/X86/pr23997.ll b/llvm/test/Transforms/LoopVectorize/X86/pr23997.ll
index 0b16d80a4adbc5..5aeac1101fe223 100644
--- a/llvm/test/Transforms/LoopVectorize/X86/pr23997.ll
+++ b/llvm/test/Transforms/LoopVectorize/X86/pr23997.ll
@@ -12,8 +12,7 @@ define void @foo(ptr addrspace(1) align 8 dereferenceable_or_null(16), ptr addrs
 ; CHECK:       preheader:
 ; CHECK-NEXT:    [[DOT10:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP0:%.*]], i64 16
 ; CHECK-NEXT:    [[DOT12:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[TMP1:%.*]], i64 16
-; CHECK-NEXT:    [[UMAX2:%.*]] = call i64 @llvm.umax.i64(i64 [[TMP2:%.*]], i64 1)
-; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2]], 16
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP2:%.*]], 16
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_MEMCHECK:%.*]]
 ; CHECK:       vector.memcheck:
 ; CHECK-NEXT:    [[TMP3:%.*]] = shl i64 [[TMP2]], 3
@@ -25,7 +24,7 @@ define void @foo(ptr addrspace(1) align 8 dereferenceable_or_null(16), ptr addrs
 ; CHECK-NEXT:    [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
 ; CHECK-NEXT:    br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
-; CHECK-NEXT:    [[N_VEC:%.*]] = and i64 [[UMAX2]], -16
+; CHECK-NEXT:    [[N_VEC:%.*]] = and i64 [[TMP2]], -16
 ; CHECK-NEXT:    br label [[VECTOR_BODY:%.*]]
 ; CHECK:       vector.body:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
@@ -49,7 +48,7 @@ define void @foo(ptr addrspace(1) align 8 dereferenceable_or_null(16), ptr addrs
 ; CHECK-NEXT:    [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ; CHECK:       middle.block:
-; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[UMAX2]], [[N_VEC]]
+; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[N_VEC]], [[TMP2]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label [[LOOPEXIT:%.*]], label [[SCALAR_PH]]
 ; CHECK:       scalar.ph:
 ; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[PREHEADER]] ], [ 0, [[VECTOR_MEMCHECK]] ]
diff --git a/llvm/test/Transforms/NewGVN/pr35125.ll b/llvm/test/Transforms/NewGVN/pr35125.ll
index 9a96594e3446db..6724538a5a7f29 100644
--- a/llvm/test/Transforms/NewGVN/pr35125.ll
+++ b/llvm/test/Transforms/NewGVN/pr35125.ll
@@ -18,15 +18,12 @@ define i32 @main() #0 {
 ; CHECK-NEXT:    [[CMP2:%.*]] = icmp ult i32 [[STOREMERGE]], [[PHIOFOPS]]
 ; CHECK-NEXT:    br i1 [[CMP2]], label [[IF_THEN3:%.*]], label [[IF_END6:%.*]]
 ; CHECK:       if.then3:
-; CHECK-NEXT:    [[TOBOOL:%.*]] = icmp eq i32 [[STOREMERGE]], -1
-; CHECK-NEXT:    br i1 [[TOBOOL]], label [[LOR_RHS:%.*]], label [[LOR_END:%.*]]
+; CHECK-NEXT:    br i1 false, label [[LOR_RHS:%.*]], label [[LOR_END:%.*]]
 ; CHECK:       lor.rhs:
-; CHECK-NEXT:    [[TOBOOL5:%.*]] = icmp ne i32 [[TMP0]], 0
-; CHECK-NEXT:    [[PHITMP:%.*]] = zext i1 [[TOBOOL5]] to i32
+; CHECK-NEXT:    store i8 poison, ptr null, align 1
 ; CHECK-NEXT:    br label [[LOR_END]]
 ; CHECK:       lor.end:
-; CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ 1, [[IF_THEN3]] ], [ [[PHITMP]], [[LOR_RHS]] ]
-; CHECK-NEXT:    store i32 [[TMP1]], ptr @a, align 4
+; CHECK-NEXT:    store i32 1, ptr @a, align 4
 ; CHECK-NEXT:    br label [[IF_END6]]
 ; CHECK:       if.end6:
 ; CHECK-NEXT:    [[TMP2:%.*]] = load i32, ptr @a, align 4

@llvmbot
Collaborator

llvmbot commented Mar 16, 2024

@llvm/pr-subscribers-llvm-analysis

(Same change summary and full diff as the llvm-transforms notification above.)

@nikic
Contributor

nikic commented Mar 17, 2024

@dtcxzyw Can you please test this?

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Mar 17, 2024
@dtcxzyw
Member

dtcxzyw commented Mar 17, 2024

I am not sure whether it is suitable to use computeConstantRange in isImpliedCond.

See also nikic's comment: #69840 (comment)

@nikic
Contributor

nikic commented Mar 17, 2024

Yeah, I can't say I'm particularly fond of the direction.

From the test diffs, a fold we're missing is this: https://alive2.llvm.org/ce/z/-njJr8 Particularly profitable if the new comparison folds later, but also seems generally beneficial.

@goldsteinn
Contributor Author

I am not sure whether it is suitable to use computeConstantRange in isImpliedCond.

See also nikic's comment: #69840 (comment)

So originally my goal was just to handle cases like X u> Y implies X != 0.

My first attempt was a bespoke set of switch statements, but I pretty quickly saw that just changing to ConstantRange would be an easier and less bug-prone way to implement it. At that point I wasn't using computeConstantRange, just getFull if the argument wasn't an APInt.

Then I figured that if there is no big compile-time impact from just using computeConstantRange, that's purely more information, so here we are.

My point is that each step seems like a reasonable improvement on the alternative. If the compile-time impact is a real concern, I think it may make sense to constrain computeConstantRange (maybe pass MaxDepth - 1), but other than that, I'm not sure what the rationale against it is.
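
For reference, that intermediate step would have looked roughly like this (a sketch; the helper name is illustrative, not from the patch):

#include "llvm/ADT/APInt.h"
#include "llvm/IR/ConstantRange.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// If operand-1 is a literal constant, use its exact single-element range;
// otherwise fall back to the full range, i.e. "we know nothing about it".
static ConstantRange rangeOfOperand(const Value *V, unsigned BitWidth) {
  const APInt *C;
  if (match(V, m_APInt(C)))
    return ConstantRange(*C);
  return ConstantRange::getFull(BitWidth);
}

Swapping getFull for computeConstantRange is then a one-line change that can only add information.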

@goldsteinn
Contributor Author

Yeah, I can't say I'm particularly fond of the direction.

From the test diffs, a fold we're missing is this: https://alive2.llvm.org/ce/z/-njJr8 Particularly profitable if the new comparison folds later, but also seems generally beneficial.

I'd think it's more principled to handle the implication between conditions generally rather than add a transform for each possible case.

@nikic
Contributor

nikic commented Mar 17, 2024

Yeah, I can't say I'm particularly fond of the direction.
From the test diffs, a fold we're missing is this: https://alive2.llvm.org/ce/z/-njJr8 Particularly profitable if the new comparison folds later, but also seems generally beneficial.

I'd think it's more principled to handle the implication between conditions generally rather than add a transform for each possible case.

This is still an implied-condition-based transform. In fact, now that I look for it, it seems like foldSelectICmp() should be doing exactly that. I think maybe it doesn't trigger because the condition becomes icmp eq 2, %x and we need to handle non-canonical icmp order in isImpliedCond.

(This is not intended as a full replacement for what you want here, just something I noticed in the tests.)

@nikic
Contributor

nikic commented Mar 17, 2024

Ah, I think the reason I saw it in the diffs is that this patch adds the non-canonical order handling -- the improvement is not actually related to the constant range support at all. Could you please split this out into a separate patch?

@goldsteinn
Contributor Author

Ah, I think the reason I saw it in the diffs is that this patch adds the non-canonical order handling -- the improvement is not actually related to the constant range support at all. Could you please split this out into a separate patch?

Didn't realize L0 / R0 could be constants. I'll split that out. Separate patch or commit?

@nikic
Contributor

nikic commented Mar 17, 2024

Separate patch please.

@goldsteinn
Contributor Author

Separate patch please.

Done, see: #85575

goldsteinn force-pushed the perf/goldsteinn/implied-with-cr branch from 69ab890 to cc7df74 on March 17, 2024 22:54
@goldsteinn
Contributor Author

Rebased

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request Mar 18, 2024
@dtcxzyw
Member

dtcxzyw commented Mar 18, 2024

This patch seems to block SROA: dtcxzyw/llvm-opt-benchmark#419 (comment).

@goldsteinn
Contributor Author

goldsteinn commented Mar 19, 2024

This patch seems to block SROA: dtcxzyw/llvm-opt-benchmark#419 (comment).

Seems to boil down to simplifications happening earlier.

A reduced form:


define void @fun0() {
entry:
  %first111 = alloca [0 x [0 x [0 x ptr]]], i32 0, align 8
  store i64 0, ptr %first111, align 8
  %last = getelementptr i8, ptr %first111, i64 8
  call void @fun3(ptr %first111, ptr %last)
  ret void
}

define void @fun3(ptr %first, ptr %last, ptr %p_in) {
entry:
  %sub.ptr.lhs.cast = ptrtoint ptr %last to i64
  %sub.ptr.rhs.cast = ptrtoint ptr %first to i64
  %sub.ptr.sub = sub i64 %sub.ptr.lhs.cast, %sub.ptr.rhs.cast
  %call = ashr exact i64 %sub.ptr.sub, 3
  %call2 = load volatile i64, ptr %p_in, align 8
  %cmp = icmp ugt i64 %call, %call2
  br i1 %cmp, label %common.ret, label %if.else

common.ret:                                       ; preds = %if.else29, %if.else, %entry
  ret void

if.else:                                          ; preds = %entry
  %c_load.cast0.i = ptrtoint ptr %last to i64
  %c_load.cast.div0.i = ashr exact i64 %c_load.cast0.i, 3
  %cmp24.not = icmp ult i64 %c_load.cast.div0.i, %call
  br i1 %cmp24.not, label %if.else29, label %common.ret

if.else29:                                        ; preds = %if.else
  %n_is_c = call i1 @llvm.is.constant.i64(i64 %c_load.cast.div0.i)
  %cmp2 = icmp eq i64 %c_load.cast.div0.i, -1
  %or.cond1 = and i1 %n_is_c, %cmp2
  %add.ptr = getelementptr i64, ptr %first, i64 %c_load.cast.div0.i
  %.pre = ptrtoint ptr %add.ptr to i64
  %ptr.lhs.pre-phi = select i1 %or.cond1, i64 0, i64 %.pre
  %ptr.sub = sub i64 %ptr.lhs.pre-phi, %sub.ptr.rhs.cast
  call void @llvm.memmove.p0.p0.i64(ptr null, ptr %first, i64 %ptr.sub, i1 false)
  br label %common.ret
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
declare void @llvm.memmove.p0.p0.i64(ptr nocapture writeonly, ptr nocapture readonly, i64, i1 immarg) #0

; Function Attrs: convergent nocallback nofree nosync nounwind willreturn memory(none)
declare i1 @llvm.is.constant.i64(i64) #1

attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
attributes #1 = { convergent nocallback nofree nosync nounwind willreturn memory(none) }

Where we go awry is when we fold:

  %c_load.cast.div0.i = ashr exact i64 %c_load.cast0.i, 3
  %cmp24.not = icmp ult i64 %c_load.cast.div0.i, %call

->

  %cmp24.not = icmp ugt i64 %sub.ptr.sub, %c_load.cast0.i

Which eventually results in the following diff after inlining:

  %c_load.cast.div0.i.i = ashr exact i64 %c_load.cast0.i.i, 3
  %cmp24.not.i = icmp ult i64 %c_load.cast.div0.i.i, 1

vs

  %cmp24.not.i = icmp ugt i64 8, %c_load.cast0.i.i

Then finally:

  %cmp24.not.i = icmp eq ptr %c_load0.i.i, null

vs

  %cmp24.not.i = icmp ult ptr %c_load0.i.i, inttoptr (i64 8 to ptr)

Essentially we throw away the information that the low 3 bits of the pointer are zero before we have enough information to fully reduce the compare to something easy to analyze.

Looking into a fix...

Edit: Similar to last time, there is no point where it seems we make a "bad decision"; it's just that the order in which we make good decisions varies.

@goldsteinn
Contributor Author

The only real option I can think of is to preserve the information with assumes when we fold a single-use shr exact. But that requires the use to be noundef, because violating an assume is immediate UB: https://alive2.llvm.org/ce/z/AkRPAs

I don't think that's really a great path to go down, although some method of ensuring we don't throw away information when folding would be nice.
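
For illustration, the shape of that idea in IRBuilder terms would be something like the following (a hypothetical helper, subject to exactly the noundef caveat above):

#include "llvm/IR/IRBuilder.h"
using namespace llvm;

// Hypothetical: before folding away `ashr exact X, 3`, record the fact
// the flag carried (the low 3 bits of X are zero) as an assume.
static void preserveExactLowBits(IRBuilder<> &B, Value *X) {
  Type *Ty = X->getType();
  Value *Low = B.CreateAnd(X, ConstantInt::get(Ty, 7));
  Value *IsZero = B.CreateICmpEQ(Low, ConstantInt::get(Ty, 0));
  B.CreateAssumption(IsZero); // emits a call to llvm.assume
}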

@goldsteinn
Contributor Author

@nikic, is the idea of this patch okay, or are you strongly opposed to using computeConstantRange here?
I'll rework it if that's the case (I want to at least handle basic stuff like X u> Y -> X != 0, which we don't need computeConstantRange for).

@dtcxzyw
Member

dtcxzyw commented Mar 22, 2024

The only real option I can think of is to preserve the information with assumes when we fold a single-use shr exact.

We met the same issue in dtcxzyw/llvm-opt-benchmark#49 (comment).

@goldsteinn
Contributor Author

The only real option I can think of is to preserve the information with assumes when we fold a single-use shr exact.

We met the same issue in dtcxzyw/llvm-opt-benchmark#49 (comment).

Yeah, I think it's a fairly common issue. And it seems you guys reached the same conclusions.

goldsteinn force-pushed the perf/goldsteinn/implied-with-cr branch from cc7df74 to 2f155d6 on April 10, 2024 17:28
@goldsteinn
Contributor Author

Rebased, limited CR analysis to MaxRecursiveDepth - 1. My feeling is this is mostly to capture relationships like X u> Y implies X != 0. Using CR is IMO the easiest way to avoid implementing bespoke logic here.

@nikic
Contributor

nikic commented Apr 11, 2024

Rebased, limited CR analysis to MaxRecursiveDepth - 1. My feeling is this is mostly to capture relationships like X u> Y implies X != 0. Using CR is IMO the easiest way to avoid implementing bespoke logic here.

Using MaxRecursiveDepth - 1 is not particularly useful in this context, because computeConstantRange() is already essentially non-recursive.

This change would be ok if it were free, but it isn't (https://llvm-compile-time-tracker.com/compare.php?from=7d60232b38b66138dae1b31027d73ee5b9df5c58&to=2f155d6f9baacec48a9f69abffbfbca91ef57b46&stat=instructions:u). I don't think that it justifies the cost.

@goldsteinn
Contributor Author

Rebased, limited CR analysis to MaxRecursiveDepth - 1. My feeling is this is mostly to capture relationships like X u> Y implies X != 0. Using CR is IMO the easiest way to avoid implementing bespoke logic here.

Using MaxRecursiveDepth - 1 is not particularly useful in this context, because computeConstantRange() is already essentially non-recursive.

This change would be ok if it were free, but it isn't (https://llvm-compile-time-tracker.com/compare.php?from=7d60232b38b66138dae1b31027d73ee5b9df5c58&to=2f155d6f9baacec48a9f69abffbfbca91ef57b46&stat=instructions:u). I don't think that it justifies the cost.

Okay, I'll work to make it free.

…constants.

Previously we only handled the `L0 == R0` case if both `L1` and `R1`
were constant.

We can get more out of the analysis using general constant ranges
instead.

For example, `X u> Y` implies `X != 0`.

In general, any strict comparison on `X` implies that `X` is not equal
to the boundary value for the sign, and constant ranges with/without
sign bits can be useful in deducing implications.
goldsteinn force-pushed the perf/goldsteinn/implied-with-cr branch from 2f155d6 to 98198d7 on April 12, 2024 01:28
@goldsteinn
Contributor Author

Rebased, limited CR analysis to MaxRecursiveDepth - 1. My feeling is this is mostly to capture relationships like X u> Y implies X != 0. Using CR is IMO the easiest way to avoid implementing bespoke logic here.

Using MaxRecursiveDepth - 1 is not particularly useful in this context, because computeConstantRange() is already essentially non-recursive.
This change would be ok if it were free, but it isn't (https://llvm-compile-time-tracker.com/compare.php?from=7d60232b38b66138dae1b31027d73ee5b9df5c58&to=2f155d6f9baacec48a9f69abffbfbca91ef57b46&stat=instructions:u). I don't think that it justifies the cost.

Okay, I'll work to make it free.

@nikic, I limited this to requiring one constant (either L1 or R1), and that seems to address all the compile-time concerns:
https://llvm-compile-time-tracker.com/compare.php?from=7aa371687ace40b85f04e21956e03f1e93052b56&to=98198d78a9a2c0e8e5eadf60ff0f4b57ba248698&stat=instructions:u
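
Roughly, the gate amounts to this (a paraphrase in code, not the literal diff):

#include "llvm/ADT/APInt.h"
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// Compile-time gate: only run the range-based implication reasoning when
// at least one operand-1 (L1 or R1) is a literal constant.
static bool worthRangeAnalysis(const Value *L1, const Value *R1) {
  const APInt *C;
  return match(L1, m_APInt(C)) || match(R1, m_APInt(C));
}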

@goldsteinn
Contributor Author

ping

@goldsteinn
Contributor Author

ping2

@goldsteinn
Contributor Author

ping

dtcxzyw added a commit to dtcxzyw/llvm-opt-benchmark that referenced this pull request May 12, 2024
@dtcxzyw
Member

dtcxzyw commented May 12, 2024

This patch seems to block SROA: dtcxzyw/llvm-opt-benchmark#419 (comment).

This problem is still outstanding. Any thoughts about preserving information from the exact flag?

@goldsteinn
Contributor Author

This patch seems to block SROA: dtcxzyw/llvm-opt-benchmark#419 (comment).

This problem is still outstanding. Any thoughts about preserving information from the exact flag?

I think this is a more general problem that we don't have any good solution for. I think the diff is net positive, so I would prefer not to block this patch with the potentially intractable problem of folds throwing away information.

@goldsteinn
Contributor Author

ping

@goldsteinn
Contributor Author

ping
