
[InstCombine] Add transforms for icmp uPred (trunc x),(truncOrZext(y)) -> icmp uPred x,y #71309

Closed
wants to merge 2 commits

Conversation

goldsteinn
Contributor

  • [InstCombine] Add tests for transforming (icmp eq/ne trunc(x), truncOrZext(y)); NFC
  • [InstCombine] Add transforms for (icmp uPred (trunc x),(truncOrZext(y)))->(icmp uPred x,y)

@llvmbot
Collaborator

llvmbot commented Nov 5, 2023

@llvm/pr-subscribers-llvm-transforms

Author: None (goldsteinn)

Changes
  • [InstCombine] Add tests for transforming (icmp eq/ne trunc(x), truncOrZext(y)); NFC
  • [InstCombine] Add transforms for (icmp uPred (trunc x),(truncOrZext(y)))->(icmp uPred x,y)
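The legality condition behind the fold can be modeled outside of LLVM: truncation discards the upper bits, so an unsigned compare of the truncated values agrees with the same compare of the originals exactly when both operands are already known to fit in the narrow type (which is what the `computeKnownBits` checks in the patch establish). A minimal Python sketch of that reasoning (illustrative model only, not LLVM code):

```python
def fits(value, trunc_bits):
    """KnownBits stand-in: True when all bits above trunc_bits are zero,
    i.e. the trunc is a no-op on this value."""
    return value >> trunc_bits == 0

def fold_is_valid(x, y, trunc_bits):
    """If both values fit in trunc_bits, unsigned predicates on the
    truncated values agree with the same predicates on the originals."""
    if not (fits(x, trunc_bits) and fits(y, trunc_bits)):
        return False
    mask = (1 << trunc_bits) - 1
    tx, ty = x & mask, y & mask
    # tx == x and ty == y here, so every unsigned predicate carries over.
    return (tx < ty) == (x < y) and (tx == ty) == (x == y)
```

For example, with `trunc_bits = 16`, two values below 65536 (as in the test assumes below) satisfy the condition, while any value with a set bit at position 16 or above does not.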

Full diff: https://github.com/llvm/llvm-project/pull/71309.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+50)
  • (modified) llvm/test/Transforms/InstCombine/eq-of-parts.ll (+4-22)
  • (added) llvm/test/Transforms/InstCombine/icmp-of-trunc-ext.ll (+172)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index f06657c8cd7633d..ee749e1b0537334 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -1545,6 +1545,53 @@ Instruction *InstCombinerImpl::foldICmpTruncConstant(ICmpInst &Cmp,
   return nullptr;
 }
 
+/// Fold icmp (trunc X), (trunc Y).
+/// Fold icmp (trunc X), (zext Y).
+static Instruction *foldICmpTruncWithTruncOrExt(ICmpInst &Cmp,
+                                                InstCombinerImpl &IC,
+                                                const SimplifyQuery &Q) {
+  if (!Cmp.isEquality() && !Cmp.isUnsigned())
+    return nullptr;
+
+  Value *X, *Y;
+  ICmpInst::Predicate Pred;
+  bool YIsZext = false;
+  // Try to match icmp (trunc X), (trunc Y)
+  if (match(&Cmp, m_ICmp(Pred, m_Trunc(m_Value(X)), m_Trunc(m_Value(Y))))) {
+    if (X->getType() != Y->getType() &&
+        (!Cmp.getOperand(0)->hasOneUse() || !Cmp.getOperand(1)->hasOneUse()))
+      return nullptr;
+  }
+  // Try to match icmp (trunc X), (zext Y)
+  else if (match(&Cmp, m_c_ICmp(Pred, m_Trunc(m_Value(X)),
+                                m_OneUse(m_ZExt(m_Value(Y)))))) {
+
+    YIsZext = true;
+  } else {
+    return nullptr;
+  }
+
+  Type *TruncTy = Cmp.getOperand(0)->getType();
+  Type *BaseTy = X->getType();
+
+  unsigned TruncBits = TruncTy->getScalarSizeInBits();
+
+  // Check if the trunc is unneeded.
+  KnownBits KnownX = computeKnownBits(X, Q.DL, 0, Q.AC, Q.CxtI, Q.DT);
+  if (KnownX.Zero.countl_one() < KnownX.getBitWidth() - TruncBits)
+    return nullptr;
+
+  if (!YIsZext) {
+    // If Y is also a trunc, make sure it is unneeded.
+    KnownBits KnownY = computeKnownBits(Y, Q.DL, 0, Q.AC, Q.CxtI, Q.DT);
+    if (KnownY.Zero.countl_one() < KnownY.getBitWidth() - TruncBits)
+      return nullptr;
+  }
+
+  Value *NewY = IC.Builder.CreateZExtOrTrunc(Y, BaseTy);
+  return IC.replaceInstUsesWith(Cmp, IC.Builder.CreateICmp(Pred, X, NewY));
+}
+
 /// Fold icmp (xor X, Y), C.
 Instruction *InstCombinerImpl::foldICmpXorConstant(ICmpInst &Cmp,
                                                    BinaryOperator *Xor,
@@ -6907,6 +6954,9 @@ Instruction *InstCombinerImpl::visitICmpInst(ICmpInst &I) {
   if (Instruction *Res = foldICmpUsingKnownBits(I))
     return Res;
 
+  if (Instruction *Res = foldICmpTruncWithTruncOrExt(I, *this, Q))
+    return Res;
+
   // Test if the ICmpInst instruction is used exclusively by a select as
   // part of a minimum or maximum operation. If so, refrain from doing
   // any other folding. This helps out other analyses which understand
diff --git a/llvm/test/Transforms/InstCombine/eq-of-parts.ll b/llvm/test/Transforms/InstCombine/eq-of-parts.ll
index 57b15ae3b96e66e..217e37b85933949 100644
--- a/llvm/test/Transforms/InstCombine/eq-of-parts.ll
+++ b/llvm/test/Transforms/InstCombine/eq-of-parts.ll
@@ -584,17 +584,8 @@ define i1 @eq_21_not_adjacent(i32 %x, i32 %y) {
 
 define i1 @eq_shift_in_zeros(i32 %x, i32 %y) {
 ; CHECK-LABEL: @eq_shift_in_zeros(
-; CHECK-NEXT:    [[X_321:%.*]] = lshr i32 [[X:%.*]], 8
-; CHECK-NEXT:    [[X_1:%.*]] = trunc i32 [[X_321]] to i8
-; CHECK-NEXT:    [[X_32:%.*]] = lshr i32 [[X]], 16
-; CHECK-NEXT:    [[X_2:%.*]] = trunc i32 [[X_32]] to i24
-; CHECK-NEXT:    [[Y_321:%.*]] = lshr i32 [[Y:%.*]], 8
-; CHECK-NEXT:    [[Y_1:%.*]] = trunc i32 [[Y_321]] to i8
-; CHECK-NEXT:    [[Y_32:%.*]] = lshr i32 [[Y]], 16
-; CHECK-NEXT:    [[Y_2:%.*]] = trunc i32 [[Y_32]] to i24
-; CHECK-NEXT:    [[C_1:%.*]] = icmp eq i8 [[X_1]], [[Y_1]]
-; CHECK-NEXT:    [[C_2:%.*]] = icmp eq i24 [[X_2]], [[Y_2]]
-; CHECK-NEXT:    [[C_210:%.*]] = and i1 [[C_2]], [[C_1]]
+; CHECK-NEXT:    [[C_210_UNSHIFTED:%.*]] = xor i32 [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT:    [[C_210:%.*]] = icmp ult i32 [[C_210_UNSHIFTED]], 256
 ; CHECK-NEXT:    ret i1 [[C_210]]
 ;
   %x.321 = lshr i32 %x, 8
@@ -1249,17 +1240,8 @@ define i1 @ne_21_not_adjacent(i32 %x, i32 %y) {
 
 define i1 @ne_shift_in_zeros(i32 %x, i32 %y) {
 ; CHECK-LABEL: @ne_shift_in_zeros(
-; CHECK-NEXT:    [[X_321:%.*]] = lshr i32 [[X:%.*]], 8
-; CHECK-NEXT:    [[X_1:%.*]] = trunc i32 [[X_321]] to i8
-; CHECK-NEXT:    [[X_32:%.*]] = lshr i32 [[X]], 16
-; CHECK-NEXT:    [[X_2:%.*]] = trunc i32 [[X_32]] to i24
-; CHECK-NEXT:    [[Y_321:%.*]] = lshr i32 [[Y:%.*]], 8
-; CHECK-NEXT:    [[Y_1:%.*]] = trunc i32 [[Y_321]] to i8
-; CHECK-NEXT:    [[Y_32:%.*]] = lshr i32 [[Y]], 16
-; CHECK-NEXT:    [[Y_2:%.*]] = trunc i32 [[Y_32]] to i24
-; CHECK-NEXT:    [[C_1:%.*]] = icmp ne i8 [[X_1]], [[Y_1]]
-; CHECK-NEXT:    [[C_2:%.*]] = icmp ne i24 [[X_2]], [[Y_2]]
-; CHECK-NEXT:    [[C_210:%.*]] = or i1 [[C_2]], [[C_1]]
+; CHECK-NEXT:    [[C_210_UNSHIFTED:%.*]] = xor i32 [[X:%.*]], [[Y:%.*]]
+; CHECK-NEXT:    [[C_210:%.*]] = icmp ugt i32 [[C_210_UNSHIFTED]], 255
 ; CHECK-NEXT:    ret i1 [[C_210]]
 ;
   %x.321 = lshr i32 %x, 8
diff --git a/llvm/test/Transforms/InstCombine/icmp-of-trunc-ext.ll b/llvm/test/Transforms/InstCombine/icmp-of-trunc-ext.ll
new file mode 100644
index 000000000000000..0673f013790d5c8
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/icmp-of-trunc-ext.ll
@@ -0,0 +1,172 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -passes=instcombine < %s | FileCheck %s
+
+declare void @llvm.assume(i1)
+declare void @use(i16)
+define i1 @icmp_trunc_x_trunc_y(i32 %x, i32 %y) {
+; CHECK-LABEL: @icmp_trunc_x_trunc_y(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp ult i32 [[X:%.*]], 65536
+; CHECK-NEXT:    [[Y_LB_ONLY:%.*]] = icmp ult i32 [[Y:%.*]], 65536
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    call void @llvm.assume(i1 [[Y_LB_ONLY]])
+; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[X]], [[Y]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp ult i32 %x, 65536
+  %y_lb_only = icmp ult i32 %y, 65536
+  call void @llvm.assume(i1 %x_lb_only)
+  call void @llvm.assume(i1 %y_lb_only)
+  %x16 = trunc i32 %x to i16
+  %y16 = trunc i32 %y to i16
+  %r = icmp eq i16 %x16, %y16
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_trunc_y_2(i32 %x, i64 %y) {
+; CHECK-LABEL: @icmp_trunc_x_trunc_y_2(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp ult i32 [[X:%.*]], 12345
+; CHECK-NEXT:    [[Y_LB_ONLY:%.*]] = icmp ult i64 [[Y:%.*]], 65536
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    call void @llvm.assume(i1 [[Y_LB_ONLY]])
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i32 [[X]] to i64
+; CHECK-NEXT:    [[R:%.*]] = icmp ugt i64 [[TMP1]], [[Y]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp ult i32 %x, 12345
+  %y_lb_only = icmp ult i64 %y, 65536
+  call void @llvm.assume(i1 %x_lb_only)
+  call void @llvm.assume(i1 %y_lb_only)
+  %x16 = trunc i32 %x to i16
+  %y16 = trunc i64 %y to i16
+  %r = icmp ult i16 %y16, %x16
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_trunc_y_3(i64 %x, i32 %y) {
+; CHECK-LABEL: @icmp_trunc_x_trunc_y_3(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp ult i64 [[X:%.*]], 123
+; CHECK-NEXT:    [[Y_LB_ONLY:%.*]] = icmp ult i32 [[Y:%.*]], 256
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    call void @llvm.assume(i1 [[Y_LB_ONLY]])
+; CHECK-NEXT:    [[TMP1:%.*]] = trunc i64 [[X]] to i32
+; CHECK-NEXT:    [[R:%.*]] = icmp uge i32 [[TMP1]], [[Y]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp ult i64 %x, 123
+  %y_lb_only = icmp ult i32 %y, 256
+  call void @llvm.assume(i1 %x_lb_only)
+  call void @llvm.assume(i1 %y_lb_only)
+  %xi8 = trunc i64 %x to i8
+  %yi8 = trunc i32 %y to i8
+  %r = icmp ule i8 %yi8, %xi8
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_trunc_y_fail_maybe_dirty_upper(i32 %x, i32 %y) {
+; CHECK-LABEL: @icmp_trunc_x_trunc_y_fail_maybe_dirty_upper(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp ult i32 [[X:%.*]], 65536
+; CHECK-NEXT:    [[Y_LB_ONLY:%.*]] = icmp ult i32 [[Y:%.*]], 65537
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    call void @llvm.assume(i1 [[Y_LB_ONLY]])
+; CHECK-NEXT:    [[X16:%.*]] = trunc i32 [[X]] to i16
+; CHECK-NEXT:    [[Y16:%.*]] = trunc i32 [[Y]] to i16
+; CHECK-NEXT:    [[R:%.*]] = icmp ne i16 [[X16]], [[Y16]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp ult i32 %x, 65536
+  %y_lb_only = icmp ult i32 %y, 65537
+  call void @llvm.assume(i1 %x_lb_only)
+  call void @llvm.assume(i1 %y_lb_only)
+  %x16 = trunc i32 %x to i16
+  %y16 = trunc i32 %y to i16
+  %r = icmp ne i16 %x16, %y16
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_trunc_y_fail_maybe_dirty_upper_2(i32 %x, i32 %y) {
+; CHECK-LABEL: @icmp_trunc_x_trunc_y_fail_maybe_dirty_upper_2(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp slt i32 [[X:%.*]], 65536
+; CHECK-NEXT:    [[Y_LB_ONLY:%.*]] = icmp ult i32 [[Y:%.*]], 65536
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    call void @llvm.assume(i1 [[Y_LB_ONLY]])
+; CHECK-NEXT:    [[X16:%.*]] = trunc i32 [[X]] to i16
+; CHECK-NEXT:    [[Y16:%.*]] = trunc i32 [[Y]] to i16
+; CHECK-NEXT:    [[R:%.*]] = icmp ne i16 [[X16]], [[Y16]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp slt i32 %x, 65536
+  %y_lb_only = icmp ult i32 %y, 65536
+  call void @llvm.assume(i1 %x_lb_only)
+  call void @llvm.assume(i1 %y_lb_only)
+  %x16 = trunc i32 %x to i16
+  %y16 = trunc i32 %y to i16
+  %r = icmp ne i16 %x16, %y16
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_zext_y(i32 %x, i8 %y) {
+; CHECK-LABEL: @icmp_trunc_x_zext_y(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp ult i32 [[X:%.*]], 65536
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i8 [[Y:%.*]] to i32
+; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[TMP1]], [[X]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp ult i32 %x, 65536
+  call void @llvm.assume(i1 %x_lb_only)
+  %x16 = trunc i32 %x to i16
+  %y16 = zext i8 %y to i16
+  %r = icmp ugt i16 %x16, %y16
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_zext_y_2(i64 %x, i8 %y) {
+; CHECK-LABEL: @icmp_trunc_x_zext_y_2(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp ult i64 [[X:%.*]], 65536
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i8 [[Y:%.*]] to i64
+; CHECK-NEXT:    [[R:%.*]] = icmp uge i64 [[TMP1]], [[X]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp ult i64 %x, 65536
+  call void @llvm.assume(i1 %x_lb_only)
+  %x16 = trunc i64 %x to i16
+  %y16 = zext i8 %y to i16
+  %r = icmp uge i16 %y16, %x16
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_zext_y_3(i8 %x, i64 %y) {
+; CHECK-LABEL: @icmp_trunc_x_zext_y_3(
+; CHECK-NEXT:    [[Y_LB_ONLY:%.*]] = icmp ult i64 [[Y:%.*]], 65536
+; CHECK-NEXT:    call void @llvm.assume(i1 [[Y_LB_ONLY]])
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i8 [[X:%.*]] to i64
+; CHECK-NEXT:    [[R:%.*]] = icmp ne i64 [[TMP1]], [[Y]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %y_lb_only = icmp ult i64 %y, 65536
+  call void @llvm.assume(i1 %y_lb_only)
+  %x16 = zext i8 %x to i16
+  %y16 = trunc i64 %y to i16
+  %r = icmp ne i16 %y16, %x16
+  ret i1 %r
+}
+
+define i1 @icmp_trunc_x_zext_y_fail_multiuse(i32 %x, i8 %y) {
+; CHECK-LABEL: @icmp_trunc_x_zext_y_fail_multiuse(
+; CHECK-NEXT:    [[X_LB_ONLY:%.*]] = icmp ult i32 [[X:%.*]], 65536
+; CHECK-NEXT:    call void @llvm.assume(i1 [[X_LB_ONLY]])
+; CHECK-NEXT:    [[X16:%.*]] = trunc i32 [[X]] to i16
+; CHECK-NEXT:    [[Y16:%.*]] = zext i8 [[Y:%.*]] to i16
+; CHECK-NEXT:    call void @use(i16 [[Y16]])
+; CHECK-NEXT:    [[R:%.*]] = icmp ule i16 [[X16]], [[Y16]]
+; CHECK-NEXT:    ret i1 [[R]]
+;
+  %x_lb_only = icmp ult i32 %x, 65536
+  call void @llvm.assume(i1 %x_lb_only)
+  %x16 = trunc i32 %x to i16
+  %y16 = zext i8 %y to i16
+  call void @use(i16 %y16)
+  %r = icmp ule i16 %x16, %y16
+  ret i1 %r
+}

@dtcxzyw dtcxzyw changed the title goldsteinn/trunc [InstCombine] Add transforms for icmp uPred (trunc x),(truncOrZext(y)) -> icmp uPred x,y Nov 5, 2023
@nikic
Contributor

nikic commented Nov 5, 2023

Should this be guarded by a shouldChangeType() heuristic? Say you're doing icmp eq (trunc i256 to i32, trunc i256 to i32). Is folding that to icmp eq i256 a good idea?
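The concern can be sketched as a simple guard (a hypothetical Python model; LLVM's actual `shouldChangeType()` consults the `DataLayout` for the target's legal integer widths, and the hard-coded width set here is purely an illustrative assumption):

```python
# Assumption: a typical target with 8/16/32/64-bit legal integers.
LEGAL_INT_WIDTHS = {8, 16, 32, 64}

def should_widen_compare(src_bits):
    """Simplified stand-in for a shouldChangeType()-style heuristic:
    only rewrite the compare at the wide source width when that width
    is one the target can handle cheaply."""
    return src_bits in LEGAL_INT_WIDTHS

# Widening i16 compares back to i32 is fine; producing icmp eq i256
# would likely be a bad trade on most targets.
```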

Member

@dtcxzyw dtcxzyw left a comment


LGTM. Waiting for additional approval from @nikic.

Contributor

@nikic nikic left a comment


Can you please check whether this has compile-time impact (due to computeKnownBits)?

(We could reuse the KnownBits from foldICmpUsingKnownBits if necessary.)

@goldsteinn
Contributor Author

> Can you please check whether this has compile-time impact (due to computeKnownBits)?
>
> (We could reuse the KnownBits from foldICmpUsingKnownBits if necessary.)

Seems like minimal effect: https://llvm-compile-time-tracker.com/compare.php?from=2b71f91b06ad4f5a0c54725b06283fd731620b92&to=2343dfccca46d158de240fca8e0588ae4cc46203&stat=instructions:u

Slight regression in ThinLTO. Larger improvement in stage2 + O3.

Contributor

@nikic nikic left a comment


LGTM

sr-tream pushed a commit to sr-tream/llvm-project that referenced this pull request Nov 20, 2023
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023