Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] Use zext's nneg flag for icmp folding #70845

Merged

Conversation

leo-ard
Copy link
Contributor

@leo-ard leo-ard commented Oct 31, 2023

This PR fixes #55013 : the max intrinsics is not generated for this simple loop case : https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being folded into a select, thus not generating the max intrinsics.

For the story :

Since LLVM 14, SCCP pass got smarter by folding sext into zext for positive ranges : https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix : https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking, however ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented #67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly.

TLDR : This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine.

This PR also contains test cases for sext/zext folding with InstCombine as well as a x86 regression tests for the max/min case.

@leo-ard leo-ard requested a review from nikic as a code owner October 31, 2023 18:01
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 llvm:transforms labels Oct 31, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 31, 2023

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Léonard Oest O'Leary (leo-ard)

Changes

This PR fixes #55013 : the max intrinsics is not generated for this simple loop case : https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being folded into a select, thus not generating the max intrinsics.

For the story :

Since LLVM 14, SCCP pass got smarter by folding sext into zext for positive ranges : https://reviews.llvm.org/D81756. After this change, InstCombine was sometimes unable to fold ICMP correctly as both of the arguments pointed to mismatched zext/sext. To fix this, @rotateright implemented this fix : https://reviews.llvm.org/D124419 that tries to resolve the mismatch by knowing if the argument of a zext is positive (in which case, it is like a sext) by using ValueTracking, however ValueTracking is not smart enough to infer that the value is positive in some cases. Recently, @nikic implemented #67982 which keeps the information that a zext is non-negative. This PR simply uses this information to do the folding accordingly.

TLDR : This PR uses the recent nneg tag on zext to fold the icmp accordingly in instcombine.

This PR also contains test cases for sext/zext folding with InstCombine as well as a x86 regression tests for the max/min case.


Full diff: https://github.com/llvm/llvm-project/pull/70845.diff

5 Files Affected:

  • (added) clang/test/CodeGen/X86/min_max.c (+19)
  • (modified) llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp (+8-2)
  • (modified) llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll (+2-3)
  • (modified) llvm/test/Transforms/InstCombine/icmp-ext-ext.ll (+16-33)
  • (added) llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll (+185)
diff --git a/clang/test/CodeGen/X86/min_max.c b/clang/test/CodeGen/X86/min_max.c
new file mode 100644
index 000000000000000..7af8181cc9ff367
--- /dev/null
+++ b/clang/test/CodeGen/X86/min_max.c
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 %s -O2 -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s
+
+short vecreduce_smax_v2i16(int n, short* v)
+{
+  // CHECK: @llvm.smax
+  short p = 0;
+  for (int i = 0; i < n; ++i)
+    p = p < v[i] ? v[i] : p;
+  return p;
+}
+
+short vecreduce_smin_v2i16(int n, short* v)
+{
+  // CHECK: @llvm.smin
+  short p = 0;
+  for (int i = 0; i < n; ++i)
+    p = p > v[i] ? v[i] : p;
+  return p;
+}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 2ff27abc79318c4..572872397b6baae 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -5587,11 +5587,17 @@ Instruction *InstCombinerImpl::foldICmpWithZextOrSext(ICmpInst &ICmp) {
         return new ICmpInst(ICmp.getPredicate(), Builder.CreateOr(X, Y),
                             Constant::getNullValue(X->getType()));
 
+      // Treat "zext nneg" as "sext"
+      bool IsNonNeg0 = isa<PossiblyNonNegInst>(ICmp.getOperand(0));
+      bool IsNonNeg1 = isa<PossiblyNonNegInst>(ICmp.getOperand(1));
+
       // If we have mismatched casts, treat the zext of a non-negative source as
       // a sext to simulate matching casts. Otherwise, we are done.
       // TODO: Can we handle some predicates (equality) without non-negative?
-      if ((IsZext0 && isKnownNonNegative(X, DL, 0, &AC, &ICmp, &DT)) ||
-          (IsZext1 && isKnownNonNegative(Y, DL, 0, &AC, &ICmp, &DT)))
+      if ((IsZext0 &&
+           (IsNonNeg0 || isKnownNonNegative(X, DL, 0, &AC, &ICmp, &DT))) ||
+          (IsZext1 &&
+           (IsNonNeg1 || isKnownNonNegative(Y, DL, 0, &AC, &ICmp, &DT))))
         IsSignedExt = true;
       else
         return nullptr;
diff --git a/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll b/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
index 2e70a95dfde6233..b24a71b8dc15ea6 100644
--- a/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
+++ b/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
@@ -403,9 +403,8 @@ define i1 @different_size_sext_sext_ule(i7 %x, i4 %y) {
 
 define i1 @different_size_sext_zext_ne(i7 %x, i4 %y) {
 ; CHECK-LABEL: @different_size_sext_zext_ne(
-; CHECK-NEXT:    [[SX:%.*]] = sext i7 [[X:%.*]] to i25
-; CHECK-NEXT:    [[ZY:%.*]] = zext i4 [[Y:%.*]] to i25
-; CHECK-NEXT:    [[R:%.*]] = icmp ne i25 [[SX]], [[ZY]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i4 [[Y:%.*]] to i7
+; CHECK-NEXT:    [[R:%.*]] = icmp ne i7 [[TMP1]], [[X:%.*]]
 ; CHECK-NEXT:    ret i1 [[R]]
 ;
   %sx = sext i7 %x to i25
diff --git a/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll b/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
index f70e48e27384619..87532c1faff1526 100644
--- a/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
@@ -119,9 +119,7 @@ define <2 x i1> @sext_sext_uge_op0_wide(<2 x i16> %x, <2 x i8> %y) {
 
 define i1 @zext_sext_sgt(i8 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_sgt(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp sgt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp sgt i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -132,9 +130,7 @@ define i1 @zext_sext_sgt(i8 %x, i8 %y) {
 
 define i1 @zext_sext_ugt(i8 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_ugt(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp ugt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ugt i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -145,9 +141,7 @@ define i1 @zext_sext_ugt(i8 %x, i8 %y) {
 
 define i1 @zext_sext_eq(i8 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_eq(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp eq i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp eq i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -158,9 +152,8 @@ define i1 @zext_sext_eq(i8 %x, i8 %y) {
 
 define i1 @zext_sext_sle_op0_narrow(i8 %x, i16 %y) {
 ; CHECK-LABEL: @zext_sext_sle_op0_narrow(
-; CHECK-NEXT:    [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i16 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp sle i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[X:%.*]] to i16
+; CHECK-NEXT:    [[C:%.*]] = icmp sle i16 [[TMP1]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i8 %x to i32
@@ -171,9 +164,8 @@ define i1 @zext_sext_sle_op0_narrow(i8 %x, i16 %y) {
 
 define i1 @zext_sext_ule_op0_wide(i9 %x, i8 %y) {
 ; CHECK-LABEL: @zext_sext_ule_op0_wide(
-; CHECK-NEXT:    [[A:%.*]] = zext i9 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp ule i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[Y:%.*]] to i9
+; CHECK-NEXT:    [[C:%.*]] = icmp uge i9 [[TMP1]], [[X:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = zext i9 %x to i32
@@ -184,9 +176,7 @@ define i1 @zext_sext_ule_op0_wide(i9 %x, i8 %y) {
 
 define i1 @sext_zext_slt(i8 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_slt(
-; CHECK-NEXT:    [[A:%.*]] = sext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp slt i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp slt i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i8 %x to i32
@@ -197,9 +187,7 @@ define i1 @sext_zext_slt(i8 %x, i8 %y) {
 
 define i1 @sext_zext_ult(i8 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_ult(
-; CHECK-NEXT:    [[A:%.*]] = sext i8 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp ult i32 [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ult i8 [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i8 %x to i32
@@ -210,9 +198,7 @@ define i1 @sext_zext_ult(i8 %x, i8 %y) {
 
 define <2 x i1> @sext_zext_ne(<2 x i8> %x, <2 x i8> %y) {
 ; CHECK-LABEL: @sext_zext_ne(
-; CHECK-NEXT:    [[A:%.*]] = sext <2 x i8> [[X:%.*]] to <2 x i32>
-; CHECK-NEXT:    [[B:%.*]] = zext <2 x i8> [[Y:%.*]] to <2 x i32>
-; CHECK-NEXT:    [[C:%.*]] = icmp ne <2 x i32> [[A]], [[B]]
+; CHECK-NEXT:    [[C:%.*]] = icmp ne <2 x i8> [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret <2 x i1> [[C]]
 ;
   %a = sext <2 x i8> %x to <2 x i32>
@@ -223,9 +209,8 @@ define <2 x i1> @sext_zext_ne(<2 x i8> %x, <2 x i8> %y) {
 
 define i1 @sext_zext_sge_op0_narrow(i5 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_sge_op0_narrow(
-; CHECK-NEXT:    [[A:%.*]] = sext i5 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp sge i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i5 [[X:%.*]] to i8
+; CHECK-NEXT:    [[C:%.*]] = icmp sge i8 [[TMP1]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i5 %x to i32
@@ -236,9 +221,8 @@ define i1 @sext_zext_sge_op0_narrow(i5 %x, i8 %y) {
 
 define i1 @sext_zext_uge_op0_wide(i16 %x, i8 %y) {
 ; CHECK-LABEL: @sext_zext_uge_op0_wide(
-; CHECK-NEXT:    [[A:%.*]] = sext i16 [[X:%.*]] to i32
-; CHECK-NEXT:    [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT:    [[C:%.*]] = icmp uge i32 [[A]], [[B]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i8 [[Y:%.*]] to i16
+; CHECK-NEXT:    [[C:%.*]] = icmp ule i16 [[TMP1]], [[X:%.*]]
 ; CHECK-NEXT:    ret i1 [[C]]
 ;
   %a = sext i16 %x to i32
@@ -396,9 +380,8 @@ define i1 @zext_eq_sext(i1 %a, i1 %b) {
 
 define i1 @zext_eq_sext_fail_not_i1(i1 %a, i8 %b) {
 ; CHECK-LABEL: @zext_eq_sext_fail_not_i1(
-; CHECK-NEXT:    [[CONV:%.*]] = zext i1 [[A:%.*]] to i32
-; CHECK-NEXT:    [[CONV3_NEG:%.*]] = sext i8 [[B:%.*]] to i32
-; CHECK-NEXT:    [[TOBOOL4:%.*]] = icmp eq i32 [[CONV]], [[CONV3_NEG]]
+; CHECK-NEXT:    [[TMP1:%.*]] = sext i1 [[A:%.*]] to i8
+; CHECK-NEXT:    [[TOBOOL4:%.*]] = icmp eq i8 [[TMP1]], [[B:%.*]]
 ; CHECK-NEXT:    ret i1 [[TOBOOL4]]
 ;
   %conv = zext i1 %a to i32
diff --git a/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll b/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll
new file mode 100644
index 000000000000000..4e9597f7e70daaa
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll
@@ -0,0 +1,185 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --tool ./bin/opt --version 3
+; See PRXXX for more details
+; RUN-./bin/opt: opt < %s -S -passes=ipsccp | FileCheck %s
+
+
+define signext i32 @sext_sext(i16 %x, i16 %y) {
+; CHECK-LABEL: define signext i32 @sext_sext(
+; CHECK-SAME: i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CONV:%.*]] = sext i16 [[X]] to i32
+; CHECK-NEXT:    [[CONV1:%.*]] = sext i16 [[Y]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp sgt i16 [[X]], [[Y]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END:%.*]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[COND:%.*]] = phi i32 [ 0, [[COND_TRUE]] ], [ 1, [[COND_FALSE]] ]
+; CHECK-NEXT:    ret i32 [[COND]]
+;
+entry:
+  %conv = sext i16 %x to i32
+  %conv1 = sext i16 %y to i32
+  %cmp2 = icmp sgt i32 %conv, %conv1
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %cond = phi i32 [ 0, %cond.true ], [ 1, %cond.false ]
+  ret i32 %cond
+}
+
+
+define signext i32 @zext_zext(i16 %x, i16 %y) {
+; CHECK-LABEL: define signext i32 @zext_zext(
+; CHECK-SAME: i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[X]] to i32
+; CHECK-NEXT:    [[CONV1:%.*]] = zext i16 [[Y]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp sgt i16 [[X]], [[Y]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END:%.*]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[COND:%.*]] = phi i32 [ 0, [[COND_TRUE]] ], [ 1, [[COND_FALSE]] ]
+; CHECK-NEXT:    ret i32 [[COND]]
+;
+entry:
+  %conv = zext i16 %x to i32
+  %conv1 = zext i16 %y to i32
+  %cmp2 = icmp sgt i32 %conv, %conv1
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %cond = phi i32 [ 0, %cond.true ], [ 1, %cond.false ]
+  ret i32 %cond
+}
+
+
+define signext i16 @zext_positive_and_sext(i32 noundef %n, ptr noundef %v) {
+; CHECK-LABEL: define signext i16 @zext_positive_and_sext(
+; CHECK-SAME: i32 noundef [[N:%.*]], ptr noundef [[V:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_COND:%.*]]
+; CHECK:       for.cond:
+; CHECK-NEXT:    [[P_0:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[CONV8:%.*]], [[COND_END:%.*]] ]
+; CHECK-NEXT:    [[I_0:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC:%.*]], [[COND_END]] ]
+; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[I_0]], [[N]]
+; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[P_0]] to i32
+; CHECK-NEXT:    [[IDXPROM:%.*]] = sext i32 [[I_0]] to i64
+; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr i16, ptr [[V]], i64 [[IDXPROM]]
+; CHECK-NEXT:    [[TMP0:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
+; CHECK-NEXT:    [[CONV1:%.*]] = sext i16 [[TMP0]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp slt i16 [[P_0]], [[TMP0]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       for.cond.cleanup:
+; CHECK-NEXT:    ret i16 [[P_0]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[COND:%.*]] = phi i32 [ [[CONV1]], [[COND_TRUE]] ], [ [[CONV]], [[COND_FALSE]] ]
+; CHECK-NEXT:    [[CONV8]] = trunc i32 [[COND]] to i16
+; CHECK-NEXT:    [[INC]] = add nsw i32 [[I_0]], 1
+; CHECK-NEXT:    br label [[FOR_COND]]
+;
+entry:
+  br label %for.cond
+
+for.cond:                                         ; preds = %cond.end, %entry
+  %p.0 = phi i16 [ 0, %entry ], [ %conv8, %cond.end ]
+  %i.0 = phi i32 [ 0, %entry ], [ %inc, %cond.end ]
+  %cmp = icmp slt i32 %i.0, %n
+  br i1 %cmp, label %for.body, label %for.cond.cleanup
+
+for.body:                                         ; preds = %for.cond
+  %conv = zext i16 %p.0 to i32                    ;; %p.0 is always positive here
+  %idxprom = sext i32 %i.0 to i64
+  %arrayidx = getelementptr i16, ptr %v, i64 %idxprom
+  %0 = load i16, ptr %arrayidx, align 2
+  %conv1 = sext i16 %0 to i32
+  %cmp2 = icmp slt i32 %conv, %conv1
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+for.cond.cleanup:                                 ; preds = %for.cond
+  ret i16 %p.0
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %cond = phi i32 [ %conv1, %cond.true ], [ %conv, %cond.false ]
+  %conv8 = trunc i32 %cond to i16
+  %inc = add nsw i32 %i.0, 1
+  br label %for.cond
+}
+
+
+
+define signext i16 @sext_and_zext_positive(i16 %x) {
+; CHECK-LABEL: define signext i16 @sext_and_zext_positive(
+; CHECK-SAME: i16 [[X:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_COND:%.*]]
+; CHECK:       for.cond:
+; CHECK-NEXT:    [[V:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[COND_END:%.*]] ]
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
+; CHECK:       for.body:
+; CHECK-NEXT:    [[CONV:%.*]] = zext i16 [[V]] to i32
+; CHECK-NEXT:    [[CONV1:%.*]] = sext i16 [[X]] to i32
+; CHECK-NEXT:    [[CMP2:%.*]] = icmp slt i16 [[X]], [[V]]
+; CHECK-NEXT:    br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK:       cond.true:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.false:
+; CHECK-NEXT:    br label [[COND_END]]
+; CHECK:       cond.end:
+; CHECK-NEXT:    [[A:%.*]] = phi i16 [ 10, [[COND_TRUE]] ], [ 20, [[COND_FALSE]] ]
+; CHECK-NEXT:    [[INC]] = add nuw nsw i16 [[A]], 1
+; CHECK-NEXT:    br label [[FOR_COND]]
+;
+entry:
+  br label %for.cond
+
+for.cond:                                         ; preds = %cond.end, %entry
+  %v = phi i16 [ 0, %entry ], [ %inc, %cond.end ] ;; always positive
+  br label %for.body
+
+for.body:                                         ; preds = %for.cond
+  %conv = zext i16 %v to i32                    ;; %p.0 is always positive here
+  %conv1 = sext i16 %x to i32                    ;; %p.0 is always positive here
+  %cmp2 = icmp slt i32 %conv1, %conv ;; positive/positive
+  br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true:                                        ; preds = %for.body
+  br label %cond.end
+
+cond.false:                                       ; preds = %for.body
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %cond.true
+  %a = phi i16 [ 10, %cond.true ],  [ 20, %cond.false ]
+  %inc = add i16 %a, 1
+  br label %for.cond
+}

clang/test/CodeGen/X86/min_max.c Outdated Show resolved Hide resolved
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp Outdated Show resolved Hide resolved
llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll Outdated Show resolved Hide resolved
@leo-ard leo-ard force-pushed the leo-ard/instcombine-use-nneg-flag-for-icmp-folding branch from c737a83 to 24bf431 Compare November 1, 2023 19:05
@leo-ard
Copy link
Contributor Author

leo-ard commented Nov 1, 2023

Thanks for taking the time to review the PR. I just added another test in PhaseOrdering to make sure that the min/max intrinsics are generated

@leo-ard leo-ard requested a review from nikic November 2, 2023 13:32
@leo-ard
Copy link
Contributor Author

leo-ard commented Nov 8, 2023

@nikic ping:)

@leo-ard leo-ard force-pushed the leo-ard/instcombine-use-nneg-flag-for-icmp-folding branch from fb714bd to 3f2ad99 Compare November 10, 2023 19:38
Copy link

github-actions bot commented Nov 10, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@leo-ard leo-ard requested a review from dtcxzyw November 11, 2023 03:03
@@ -0,0 +1,145 @@
; RUN: opt < %s --O3 -S | FileCheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please:

  1. Convert this IR into SSA form as @nikic mentioned.
  2. Drop unused attributes.
  3. Re-generate tests with llvm/utils/update_test_checks.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define i16 @vecreduce_smin_v2i16(i32 %n, ptr %v) {
entry:
  br label %for.cond

for.cond:
  %p.0 = phi i16 [ 0, %entry ], [ %conv8, %for.inc ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %cmp = icmp slt i32 %i.0, %n
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %conv = sext i16 %p.0 to i32
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i16, ptr %v, i64 %idxprom
  %0 = load i16, ptr %arrayidx, align 2
  %conv1 = sext i16 %0 to i32
  %cmp2 = icmp sgt i32 %conv, %conv1
  br i1 %cmp2, label %cond.true, label %cond.false

cond.true:
  %idxprom4 = sext i32 %i.0 to i64
  %arrayidx5 = getelementptr inbounds i16, ptr %v, i64 %idxprom4
  %1 = load i16, ptr %arrayidx5, align 2
  %conv6 = sext i16 %1 to i32
  br label %cond.end

cond.false:
  %conv7 = sext i16 %p.0 to i32
  br label %cond.end

cond.end:
  %cond = phi i32 [ %conv6, %cond.true ], [ %conv7, %cond.false ]
  %conv8 = trunc i32 %cond to i16
  br label %for.inc

for.inc:
  %inc = add nsw i32 %i.0, 1
  br label %for.cond

for.end:
  ret i16 %p.0
}

define i16 @vecreduce_smax_v2i16(i32 %n, ptr %v) {
entry:
  br label %for.cond

for.cond:
  %p.0 = phi i16 [ 0, %entry ], [ %conv8, %for.inc ]
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %cmp = icmp slt i32 %i.0, %n
  br i1 %cmp, label %for.body, label %for.end

for.body:
  %conv = sext i16 %p.0 to i32
  %idxprom = sext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i16, ptr %v, i64 %idxprom
  %0 = load i16, ptr %arrayidx, align 2
  %conv1 = sext i16 %0 to i32
  %cmp2 = icmp slt i32 %conv, %conv1
  br i1 %cmp2, label %cond.true, label %cond.false

cond.true:
  %idxprom4 = sext i32 %i.0 to i64
  %arrayidx5 = getelementptr inbounds i16, ptr %v, i64 %idxprom4
  %1 = load i16, ptr %arrayidx5, align 2
  %conv6 = sext i16 %1 to i32
  br label %cond.end

cond.false:
  %conv7 = sext i16 %p.0 to i32
  br label %cond.end

cond.end:
  %cond = phi i32 [ %conv6, %cond.true ], [ %conv7, %cond.false ]
  %conv8 = trunc i32 %cond to i16
  br label %for.inc

for.inc:
  %inc = add nsw i32 %i.0, 1
  br label %for.cond

for.end:
  ret i16 %p.0
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you get the SSA form ? I wasn't able to do it on my side. Here are the commands that I did :

> build_release/bin/clang -S -emit-llvm min_max.c -fno-discard-value-names -o min_max.ll
> build_release/bin/opt -S -passes=sroa min_max.ll > min_max2.ll
> head -n 20 min_max2.ll
; ModuleID = 'min_max.ll'
source_filename = "min_max.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx14.0.0"

; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
define signext i16 @vecreduce_smin_v2i16(i32 noundef %n, ptr noundef %v) #0 {
entry:
  %n.addr = alloca i32, align 4
  %v.addr = alloca ptr, align 8
  %p = alloca i16, align 2
  %i = alloca i32, align 4
  store i32 %n, ptr %n.addr, align 4
  store ptr %v, ptr %v.addr, align 8
  store i16 0, ptr %p, align 2
  store i32 0, ptr %i, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %0 = load i32, ptr %i, align 4

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use -O2 -Xclang -disable-llvm-optzns, or manually drop the optnone attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My command: bin/opt -S -O3 -print-changed min_max.ll
Then I got IR after the SROA pass :)

llvm/test/Transforms/PhaseOrdering/min_max_loop.ll Outdated Show resolved Hide resolved
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

llvm/test/Transforms/PhaseOrdering/min_max_loop.ll Outdated Show resolved Hide resolved
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nikic nikic changed the title [Instcombine] use zext's nneg flag for icmp folding [InstCombine] Use zext's nneg flag for icmp folding Nov 11, 2023
@leo-ard leo-ard requested a review from dtcxzyw November 11, 2023 21:27
Copy link
Member

@dtcxzyw dtcxzyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

Do you have the access to merge PR?

@leo-ard
Copy link
Contributor Author

leo-ard commented Nov 12, 2023

Do you have the access to merge PR?

No I don't. This is my first PR on LLVM:)

Thanks for your time and insightful comments !

@dtcxzyw dtcxzyw merged commit ff36411 into llvm:main Nov 12, 2023
3 checks passed
zahiraam pushed a commit to zahiraam/llvm-project that referenced this pull request Nov 20, 2023
This PR fixes llvm#55013 : the
max intrinsics is not generated for this simple loop case :
https://godbolt.org/z/hxz1xhMPh. This is caused by a ICMP not being
folded into a select, thus not generating the max intrinsics.

For the story :

Since LLVM 14, SCCP pass got smarter by folding sext into zext for
positive ranges : https://reviews.llvm.org/D81756. After this change,
InstCombine was sometimes unable to fold ICMP correctly as both of the
arguments pointed to mismatched zext/sext. To fix this, @rotateright
implemented this fix : https://reviews.llvm.org/D124419 that tries to
resolve the mismatch by knowing if the argument of a zext is positive
(in which case, it is like a sext) by using ValueTracking, however
ValueTracking is not smart enough to infer that the value is positive in
some cases. Recently, @nikic implemented llvm#67982 which keeps the
information that a zext is non-negative. This PR simply uses this
information to do the folding accordingly.

TLDR : This PR uses the recent nneg tag on zext to fold the icmp
accordingly in instcombine.

This PR also contains test cases for sext/zext folding with InstCombine
as well as a x86 regression tests for the max/min case.
Guzhu-AMD pushed a commit to GPUOpen-Drivers/llvm-project that referenced this pull request Nov 23, 2023
Local branch amd-gfx 958d879 Merged main:8569465adf5e into amd-gfx:c81d641827d7
Remote branch main ff36411 [InstCombine] Use zexts nneg flag for icmp folding (llvm#70845)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang Clang issues not falling into any other category llvm:transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Instruction pmaxsw not generated
4 participants