[InstCombine] Use zext's nneg flag for icmp folding #70845

leo-ard · 2023-10-31T18:01:04Z

This PR fixes #55013: the max intrinsic is not generated for this simple loop case: https://godbolt.org/z/hxz1xhMPh. The cause is an icmp that is not folded into a select, which prevents the max intrinsic from being generated.

Background:

Since LLVM 14, the SCCP pass has become smarter and folds sext into zext when the operand's range is non-negative: https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold an icmp because its two operands came from mismatched casts (one zext, one sext). To address this, @rotateright implemented https://reviews.llvm.org/D124419, which resolves the mismatch by using ValueTracking to prove that the source of the zext is non-negative (in which case the zext behaves like a sext). However, ValueTracking is not always able to infer that the value is non-negative. Recently, @nikic implemented #67982, which preserves the information that the source of a zext is non-negative. This PR simply uses that information to do the fold.
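To make the "zext behaves like sext" condition concrete, here is a minimal standalone C++ sketch. It does not use any LLVM API; the helper names `sext16to32`, `zext16to32`, and `extensionsAgreeIffNonNegative` are made up for illustration. It only models the two casts on 16-bit patterns and checks that they produce the same 32-bit result exactly when the sign bit of the source is clear.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of LLVM's `sext i16 ... to i32`: replicate the sign bit.
inline uint32_t sext16to32(uint16_t bits) {
  return static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int16_t>(bits)));
}

// Hypothetical model of LLVM's `zext i16 ... to i32`: fill the top with zeros.
inline uint32_t zext16to32(uint16_t bits) {
  return static_cast<uint32_t>(bits);
}

// Returns true if, over all 16-bit patterns, the two extensions agree
// exactly for the patterns whose sign bit is clear (non-negative values).
inline bool extensionsAgreeIffNonNegative() {
  for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
    uint16_t bits = static_cast<uint16_t>(v);
    bool nonNegative = (bits & 0x8000u) == 0;
    bool agree = sext16to32(bits) == zext16to32(bits);
    if (agree != nonNegative)
      return false;
  }
  return true;
}

// Small spot check: the most negative i16 is where the two casts diverge.
inline void spotCheck() {
  assert(sext16to32(0x8000u) != zext16to32(0x8000u));
  assert(sext16to32(0x7FFFu) == zext16to32(0x7FFFu));
}
```

Calling `extensionsAgreeIffNonNegative()` from a test harness is enough to exercise the whole 16-bit domain; the nneg flag can be read as a promise that the source always falls in the half of the domain where the two casts coincide.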

TL;DR: This PR uses the recently added nneg flag on zext to fold the icmp in InstCombine.

This PR also contains test cases for sext/zext folding with InstCombine, as well as an x86 regression test for the max/min case.
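As an illustration of why the fold is sound, the following standalone C++ sketch models a mixed `zext nneg` / `sext` comparison on i8 operands and compares it against the narrowed comparison. It is not the InstCombine code; all function names are hypothetical, and the non-negativity of `x` is an explicit precondition standing in for the nneg flag.

```cpp
#include <cassert>
#include <cstdint>

// Model of `icmp slt (zext nneg i8 x to i32), (sext i8 y to i32)`.
// Assumption: x has its sign bit clear, which is what nneg promises.
inline bool wideSignedLess(uint8_t x, uint8_t y) {
  assert((x & 0x80u) == 0 && "zext nneg requires a non-negative source");
  int32_t wideX = static_cast<int32_t>(static_cast<uint32_t>(x)); // zext
  int32_t wideY = static_cast<int32_t>(static_cast<int8_t>(y));   // sext
  return wideX < wideY;
}

// Model of the folded form `icmp slt i8 x, y`.
inline bool narrowSignedLess(uint8_t x, uint8_t y) {
  return static_cast<int8_t>(x) < static_cast<int8_t>(y);
}

// Model of `icmp ult (zext nneg i8 x to i32), (sext i8 y to i32)`.
inline bool wideUnsignedLess(uint8_t x, uint8_t y) {
  assert((x & 0x80u) == 0 && "zext nneg requires a non-negative source");
  uint32_t wideX = static_cast<uint32_t>(x); // zext
  uint32_t wideY = static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int8_t>(y))); // sext
  return wideX < wideY;
}

// Model of the folded form `icmp ult i8 x, y`.
inline bool narrowUnsignedLess(uint8_t x, uint8_t y) { return x < y; }

// Exhaustively checks that the wide and narrow comparisons agree for every
// non-negative x and every y, for both the signed and unsigned predicate.
inline bool foldIsSoundForNonNegativeX() {
  for (unsigned x = 0; x < 0x80u; ++x) {
    for (unsigned y = 0; y <= 0xFFu; ++y) {
      uint8_t bx = static_cast<uint8_t>(x);
      uint8_t by = static_cast<uint8_t>(y);
      if (wideSignedLess(bx, by) != narrowSignedLess(bx, by))
        return false;
      if (wideUnsignedLess(bx, by) != narrowUnsignedLess(bx, by))
        return false;
    }
  }
  return true;
}
```

The precondition matters: for x = 0x80 (that is, -128 as i8) and y = 1, the wide signed comparison would see 128 < 1 (false) while the narrow one sees -128 < 1 (true), which is why the fold needs either the nneg flag or a ValueTracking proof.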

llvmbot · 2023-10-31T18:05:25Z

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Léonard Oest O'Leary (leo-ard)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/70845.diff

5 Files Affected:

(added) clang/test/CodeGen/X86/min_max.c (+19)
(modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+8-2)
(modified) llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll (+2-3)
(modified) llvm/test/Transforms/InstCombine/icmp-ext-ext.ll (+16-33)
(added) llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll (+185)

diff --git a/clang/test/CodeGen/X86/min_max.c b/clang/test/CodeGen/X86/min_max.c
new file mode 100644
index 000000000000000..7af8181cc9ff367
--- /dev/null
+++ b/clang/test/CodeGen/X86/min_max.c
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 %s -O2 -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s
+
+short vecreduce_smax_v2i16(int n, short* v)
+{
+  // CHECK: @llvm.smax
+  short p = 0;
+  for (int i = 0; i < n; ++i)
+    p = p < v[i] ? v[i] : p;
+  return p;
+}
+
+short vecreduce_smin_v2i16(int n, short* v)
+{
+  // CHECK: @llvm.smin
+  short p = 0;
+  for (int i = 0; i < n; ++i)
+    p = p > v[i] ? v[i] : p;
+  return p;
+}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 2ff27abc79318c4..572872397b6baae 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -5587,11 +5587,17 @@ Instruction *InstCombinerImpl::foldICmpWithZextOrSext(ICmpInst &ICmp) {
         return new ICmpInst(ICmp.getPredicate(), Builder.CreateOr(X, Y),
                             Constant::getNullValue(X->getType()));
 
+      // Treat "zext nneg" as "sext"
+      bool IsNonNeg0 = isa<PossiblyNonNegInst>(ICmp.getOperand(0));
+      bool IsNonNeg1 = isa<PossiblyNonNegInst>(ICmp.getOperand(1));
+
       // If we have mismatched casts, treat the zext of a non-negative source as
       // a sext to simulate matching casts. Otherwise, we are done.
       // TODO: Can we handle some predicates (equality) without non-negative?
-      if ((IsZext0 && isKnownNonNegative(X, DL, 0, &AC, &ICmp, &DT)) ||
-          (IsZext1 && isKnownNonNegative(Y, DL, 0, &AC, &ICmp, &DT)))
+      if ((IsZext0 &&
+           (IsNonNeg0 || isKnownNonNegative(X, DL, 0, &AC, &ICmp, &DT))) ||
+          (IsZext1 &&
+           (IsNonNeg1 || isKnownNonNegative(Y, DL, 0, &AC, &ICmp, &DT))))
         IsSignedExt = true;
       else
         return nullptr;
diff --git a/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll b/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
index 2e70a95dfde6233..b24a71b8dc15ea6 100644
--- a/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
+++ b/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
@@ -403,9 +403,8 @@ define i1 @different_size_sext_sext_ule(i7 %x, i4 %y) {
 
 define i1 @different_size_sext_zext_ne(i7 %x, i4 %y) {
 ; CHECK-LABEL: @different_size_sext_zext_ne(
-; CHECK-NEXT:    [[SX:%.*]] = sext i7 [[X:%.*]] to i25
-; CHECK-NEXT:    [[ZY:%.*]] = zext i4 [[Y:%.*]] to i25
-; CHECK-NEXT:    [[R:%.*]] = icmp ne i25 [[SX]], [[ZY]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i4 [[Y:%.*]] to i7
+; CHECK-NEXT:    [[R:%.*]] = icmp ne i7 [[TMP1]], [[X:%.*]]
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
   %sx = sext i7 %x to i25
diff --git a/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll b/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
index f70e48e27384619..87532c1faff1526 100644
--- a/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
@@ -119,9 +119,7 @@ define <2 x i1> @sext_sext_uge_op0_wide(<2 x i16> %x, <2 x i8> %y) {
 
 define i1 @zext_sext_sgt(i8 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_sgt(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp sgt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp sgt i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -132,9 +130,7 @@ define i1 @zext_sext_sgt(i8 %x, i8 %y) {
 
 define i1 @zext_sext_ugt(i8 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_ugt(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp ugt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ugt i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -145,9 +141,7 @@ define i1 @zext_sext_ugt(i8 %x, i8 %y) {
 
 define i1 @zext_sext_eq(i8 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_eq(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp eq i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -158,9 +152,8 @@ define i1 @zext_sext_eq(i8 %x, i8 %y) {
 
 define i1 @zext_sext_sle_op0_narrow(i8 %x, i16 %y) {
 ; CHECK-LABEL: @zext_sext_sle_op0_narrow(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i16 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp sle i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[X:%.*]] to i16
+; CHECK-NEXT:    [[C:%.*]] = icmp sle i16 [[TMP1]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -171,9 +164,8 @@ define i1 @zext_sext_sle_op0_narrow(i8 %x, i16 %y) {
 
 define i1 @zext_sext_ule_op0_wide(i9 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_ule_op0_wide(
-; CHECK-NEXT:    [[A:%.*]] = zext i9 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp ule i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[Y:%.*]] to i9
+; CHECK-NEXT:    [[C:%.*]] = icmp uge i9 [[TMP1]], [[X:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i9 %x to i32
@@ -184,9 +176,7 @@ define i1 @zext_sext_ule_op0_wide(i9 %x, i8 %y) {
 
 define i1 @sext_zext_slt(i8 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_slt(
-; CHECK-NEXT:    [[A:%.*]] = sext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp slt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i8 %x to i32
@@ -197,9 +187,7 @@ define i1 @sext_zext_slt(i8 %x, i8 %y) {
 
 define i1 @sext_zext_ult(i8 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_ult(
-; CHECK-NEXT:    [[A:%.*]] = sext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp ult i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ult i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i8 %x to i32
@@ -210,9 +198,7 @@ define i1 @sext_zext_ult(i8 %x, i8 %y) {
 
 define <2 x i1> @sext_zext_ne(<2 x i8> %x, <2 x i8> %y) {
 ; CHECK-LABEL: @sext_zext_ne(
-; CHECK-NEXT:    [[A:%.*]] = sext <2 x i8> [[X:%.*]] to <2 x i32>
-; CHECK-NEXT:    [[B:%.*]] = zext <2 x i8> [[Y:%.*]] to <2 x i32>
-; CHECK-NEXT:    [[C:%.*]] = icmp ne <2 x i32> [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ne <2 x i8> [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret <2 x i1> [[C]]
 ;
   %a = sext <2 x i8> %x to <2 x i32>
@@ -223,9 +209,8 @@ define <2 x i1> @sext_zext_ne(<2 x i8> %x, <2 x i8> %y) {
 
 define i1 @sext_zext_sge_op0_narrow(i5 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_sge_op0_narrow(
-; CHECK-NEXT:    [[A:%.*]] = sext i5 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp sge i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i5 [[X:%.*]] to i8
+; CHECK-NEXT:    [[C:%.*]] = icmp sge i8 [[TMP1]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i5 %x to i32
@@ -236,9 +221,8 @@ define i1 @sext_zext_sge_op0_narrow(i5 %x, i8 %y) {
 
 define i1 @sext_zext_uge_op0_wide(i16 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_uge_op0_wide(
-; CHECK-NEXT:    [[A:%.*]] = sext i16 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp uge i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[Y:%.*]] to i16
+; CHECK-NEXT:    [[C:%.*]] = icmp ule i16 [[TMP1]], [[X:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i16 %x to i32
@@ -396,9 +380,8 @@ define i1 @zext_eq_sext(i1 %a, i1 %b) {
 
 define i1 @zext_eq_sext_fail_not_i1(i1 %a, i8 %b) {
 ; CHECK-LABEL: @zext_eq_sext_fail_not_i1(
-; CHECK-NEXT:    [[CONV:%.*]] = zext i1 [[A:%.*]] to i32
-; CHECK-NEXT:    [[CONV3_NEG:%.*]] = sext i8 [[B:%.*]] to i32
-; CHECK-NEXT:    [[TOBOOL4:%.*]] = icmp eq i32 [[CONV]], [[CONV3_NEG]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i1 [[A:%.*]] to i8
+; CHECK-NEXT:    [[TOBOOL4:%.*]] = icmp eq i8 [[TMP1]], [[B:%.*]]
 ; CHECK-NEXT:    ret i1 [[TOBOOL4]]
 ;
   %conv = zext i1 %a to i32
diff --git a/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll b/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll
new file mode 100644
index 000000000000000..4e9597f7e70daaa
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll
@@ -0,0 +1,185 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --tool ./bin/opt --version 3
+; See PRXXX for more details
+; RUN-./bin/opt: opt < %s -S -passes=ipsccp | FileCheck %s
+
+
+define signext i32 @sext_sext(i16 %x, i16 %y) {
+; CHECK-LABEL: define signext i32 @sext_sext(
+; CHECK-SAME: i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[X]] to i32
+; CHECK-NEXT:    [[CONV1:%.*]] = sext i16 [[Y]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp sgt i16 [[X]], [[Y]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END:%.*]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[COND:%.*]] = phi i32 [ 0, [[COND_TRUE]] ], [ 1, [[COND_FALSE]] ]
+; CHECK-NEXT:    ret i32 [[COND]]
+;
+entry:
+  %conv = sext i16 %x to i32
+  %conv1 = sext i16 %y to i32
+  %cmp2 = icmp sgt i32 %conv, %conv1
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %cond = phi i32 [ 0, %cond.true ], [ 1, %cond.false ]
+  ret i32 %cond
+}
+
+
+define signext i32 @zext_zext(i16 %x, i16 %y) {
+; CHECK-LABEL: define signext i32 @zext_zext(
+; CHECK-SAME: i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[X]] to i32
+; CHECK-NEXT:    [[CONV1:%.*]] = zext i16 [[Y]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp sgt i16 [[X]], [[Y]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END:%.*]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[COND:%.*]] = phi i32 [ 0, [[COND_TRUE]] ], [ 1, [[COND_FALSE]] ]
+; CHECK-NEXT:    ret i32 [[COND]]
+;
+entry:
+  %conv = zext i16 %x to i32
+  %conv1 = zext i16 %y to i32
+  %cmp2 = icmp sgt i32 %conv, %conv1
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %cond = phi i32 [ 0, %cond.true ], [ 1, %cond.false ]
+  ret i32 %cond
+}
+
+
+define signext i16 @zext_positive_and_sext(i32 noundef %n, ptr noundef %v) {
+; CHECK-LABEL: define signext i16 @zext_positive_and_sext(
+; CHECK-SAME: i32 noundef [[N:%.*]], ptr noundef [[V:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_COND:%.*]]
+; CHECK:       for.cond:
+; CHECK-NEXT:    [[P_0:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[CONV8:%.*]], [[COND_END:%.*]] ]
+; CHECK-NEXT:    [[I_0:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC:%.*]], [[COND_END]] ]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[I_0]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[P_0]] to i32
+; CHECK-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I_0]] to i64
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr i16, ptr [[V]], i64 [[IDXPROM]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
+; CHECK-NEXT:    [[CONV1:%.*]] = sext i16 [[TMP0]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp slt i16 [[P_0]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret i16 [[P_0]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[COND:%.*]] = phi i32 [ [[CONV1]], [[COND_TRUE]] ], [ [[CONV]], [[COND_FALSE]] ]
+; CHECK-NEXT:    [[CONV8]] = trunc i32 [[COND]] to i16
+; CHECK-NEXT:    [[INC]] = add nsw i32 [[I_0]], 1
+; CHECK-NEXT:    br label [[FOR_COND]]
+;
+entry:
+  br label %for.cond
+
+for.cond:                                         ; preds = %cond.end, %entry
+  %p.0 = phi i16 [ 0, %entry ], [ %conv8, %cond.end ]
+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %cond.end ]
+  %cmp = icmp slt i32 %i.0, %n
+  br i1 %cmp, label %for.body, label %for.cond.cleanup
+
+for.body:                                         ; preds = %for.cond
+  %conv = zext i16 %p.0 to i32                    ;; %p.0 is always positive here
+  %idxprom = sext i32 %i.0 to i64
+  %arrayidx = getelementptr i16, ptr %v, i64 %idxprom
+  %0 = load i16, ptr %arrayidx, align 2
+  %conv1 = sext i16 %0 to i32
+  %cmp2 = icmp slt i32 %conv, %conv1
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+for.cond.cleanup:                                 ; preds = %for.cond
+  ret i16 %p.0
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %cond = phi i32 [ %conv1, %cond.true ], [ %conv, %cond.false ]
+  %conv8 = trunc i32 %cond to i16
+  %inc = add nsw i32 %i.0, 1
+  br label %for.cond
+}
+
+
+
+define signext i16 @sext_and_zext_positive(i16 %x) {
+; CHECK-LABEL: define signext i16 @sext_and_zext_positive(
+; CHECK-SAME: i16 [[X:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_COND:%.*]]
+; CHECK:       for.cond:
+; CHECK-NEXT:    [[V:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[COND_END:%.*]] ]
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[V]] to i32
+; CHECK-NEXT:    [[CONV1:%.*]] = sext i16 [[X]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp slt i16 [[X]], [[V]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[A:%.*]] = phi i16 [ 10, [[COND_TRUE]] ], [ 20, [[COND_FALSE]] ]
+; CHECK-NEXT:    [[INC]] = add nuw nsw i16 [[A]], 1
+; CHECK-NEXT:    br label [[FOR_COND]]
+;
+entry:
+  br label %for.cond
+
+for.cond:                                         ; preds = %cond.end, %entry
+  %v = phi i16 [ 0, %entry ], [ %inc, %cond.end ] ;; always positive
+  br label %for.body
+
+for.body:                                         ; preds = %for.cond
+  %conv = zext i16 %v to i32                    ;; %v is always non-negative here
+  %conv1 = sext i16 %x to i32                   ;; %x may be negative
+  %cmp2 = icmp slt i32 %conv1, %conv            ;; sext vs. zext of a non-negative value
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %a = phi i16 [ 10, %cond.true ],  [ 20, %cond.false ]
+  %inc = add i16 %a, 1
+  br label %for.cond
+}

leo-ard · 2023-11-01T19:14:41Z

Thanks for taking the time to review the PR. I just added another test in PhaseOrdering to make sure that the min/max intrinsics are generated.

leo-ard · 2023-11-08T15:22:22Z

@nikic ping:)

github-actions · 2023-11-10T19:51:42Z

✅ With the latest revision this PR passed the C/C++ code formatter.

dtcxzyw · 2023-11-11T08:33:05Z

llvm/test/Transforms/PhaseOrdering/min_max_loop.ll

@@ -0,0 +1,145 @@
+; RUN: opt < %s --O3 -S | FileCheck %s


Could you please:

Convert this IR into SSA form as @nikic mentioned.

Drop unused attributes.

Re-generate tests with llvm/utils/update_test_checks.py.

define i16 @vecreduce_smin_v2i16(i32 %n, ptr %v) {
entry:
  br label %for.cond

for.cond:
  %p.0 = phi i16 [ 0, %entry ], [ %conv8, %for.inc ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %cmp = icmp slt i32 %i.0, %n
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %conv = sext i16 %p.0 to i32
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i16, ptr %v, i64 %idxprom
  %0 = load i16, ptr %arrayidx, align 2
  %conv1 = sext i16 %0 to i32
  %cmp2 = icmp sgt i32 %conv, %conv1
  br i1 %cmp2, label %cond.true, label %cond.false

cond.true:
  %idxprom4 = sext i32 %i.0 to i64
  %arrayidx5 = getelementptr inbounds i16, ptr %v, i64 %idxprom4
  %1 = load i16, ptr %arrayidx5, align 2
  %conv6 = sext i16 %1 to i32
  br label %cond.end

cond.false:
  %conv7 = sext i16 %p.0 to i32
  br label %cond.end

cond.end:
  %cond = phi i32 [ %conv6, %cond.true ], [ %conv7, %cond.false ]
  %conv8 = trunc i32 %cond to i16
  br label %for.inc

for.inc:
  %inc = add nsw i32 %i.0, 1
  br label %for.cond

for.end:
  ret i16 %p.0
}

define i16 @vecreduce_smax_v2i16(i32 %n, ptr %v) {
entry:
  br label %for.cond

for.cond:
  %p.0 = phi i16 [ 0, %entry ], [ %conv8, %for.inc ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %cmp = icmp slt i32 %i.0, %n
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %conv = sext i16 %p.0 to i32
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i16, ptr %v, i64 %idxprom
  %0 = load i16, ptr %arrayidx, align 2
  %conv1 = sext i16 %0 to i32
  %cmp2 = icmp slt i32 %conv, %conv1
  br i1 %cmp2, label %cond.true, label %cond.false

cond.true:
  %idxprom4 = sext i32 %i.0 to i64
  %arrayidx5 = getelementptr inbounds i16, ptr %v, i64 %idxprom4
  %1 = load i16, ptr %arrayidx5, align 2
  %conv6 = sext i16 %1 to i32
  br label %cond.end

cond.false:
  %conv7 = sext i16 %p.0 to i32
  br label %cond.end

cond.end:
  %cond = phi i32 [ %conv6, %cond.true ], [ %conv7, %cond.false ]
  %conv8 = trunc i32 %cond to i16
  br label %for.inc

for.inc:
  %inc = add nsw i32 %i.0, 1
  br label %for.cond

for.end:
  ret i16 %p.0
}

How did you get the SSA form? I wasn't able to do it on my side. Here are the commands that I ran:

> build_release/bin/clang -S -emit-llvm min_max.c -fno-discard-value-names -o min_max.ll
> build_release/bin/opt -S -passes=sroa min_max.ll > min_max2.ll
> head -n 20 min_max2.ll
; ModuleID = 'min_max.ll'
source_filename = "min_max.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx14.0.0"

; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
define signext i16 @vecreduce_smin_v2i16(i32 noundef %n, ptr noundef %v) #0 {
entry:
  %n.addr = alloca i32, align 4
  %v.addr = alloca ptr, align 8
  %p = alloca i16, align 2
  %i = alloca i32, align 4
  store i32 %n, ptr %n.addr, align 4
  store ptr %v, ptr %v.addr, align 8
  store i16 0, ptr %p, align 2
  store i32 0, ptr %i, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32, ptr %i, align 4

You need to use -O2 -Xclang -disable-llvm-optzns, or manually drop the optnone attributes.

My command: bin/opt -S -O3 -print-changed min_max.ll
Then I got IR after the SROA pass :)

dtcxzyw

LGTM

nikic

LGTM

dtcxzyw

LGTM. Thank you!

Do you have access to merge the PR?

leo-ard · 2023-11-12T16:48:09Z

Do you have access to merge the PR?

No, I don't. This is my first PR on LLVM :)

Thanks for your time and insightful comments!

@rotateright

leo-ard requested a review from nikic as a code owner October 31, 2023 18:01

llvmbot added the clang, backend:X86, and llvm:transforms labels Oct 31, 2023

leo-ard mentioned this pull request Oct 31, 2023

[SCCP] [Transform] Adding ICMP folding for zext and sext in SCCPSolver #67594

Closed

nikic reviewed Nov 1, 2023

leo-ard force-pushed the leo-ard/instcombine-use-nneg-flag-for-icmp-folding branch from c737a83 to 24bf431 Compare November 1, 2023 19:05

leo-ard requested a review from nikic November 2, 2023 13:32

dtcxzyw requested changes Nov 8, 2023

leo-ard added 12 commits November 9, 2023 14:12

Add NonNeg check for InstCombine

00d0c18

Add tests for min/max

ee19789

Regenerate tests

0fc4460

Move test to InstCombine

78c4c44

Reformat according to git clang format

20612e2

check with hasNonNeg (comment 2)

a3a3082

change test to instcombine (comment 3)

5fc1476

Moving end-to-end test to PhaseOrdering (comment 3)

9d004e1

Fixing failing reg tests

77476a6

Drop isKnownNonNegative() call

6b044c0

Regenerate min/max test for readability

1adba3e

Change instcombine icmp folding tests

3f2ad99

leo-ard force-pushed the leo-ard/instcombine-use-nneg-flag-for-icmp-folding branch from fb714bd to 3f2ad99 Compare November 10, 2023 19:38

leo-ard commented Nov 10, 2023

Remove whitespace at the end of comment lines

b6f51fe

leo-ard requested a review from dtcxzyw November 11, 2023 03:03

dtcxzyw requested changes Nov 11, 2023

Update min_max_loop test

b6d0007

leo-ard commented Nov 11, 2023

remove auto-generated note

67b6990

dtcxzyw reviewed Nov 11, 2023

remove noundef/signext

b57b3f8

nikic approved these changes Nov 11, 2023

nikic changed the title ~~[Instcombine] use zext's nneg flag for icmp folding~~ [InstCombine] Use zext's nneg flag for icmp folding Nov 11, 2023

leo-ard requested a review from dtcxzyw November 11, 2023 21:27

dtcxzyw approved these changes Nov 12, 2023

dtcxzyw merged commit ff36411 into llvm:main Nov 12, 2023
3 checks passed
