[InstCombine] Use zext's nneg flag for icmp folding #70845
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-x86

Author: Léonard Oest O'Leary (leo-ard)

Changes

This PR fixes #55013: the max intrinsic is not generated for this simple loop case: https://godbolt.org/z/hxz1xhMPh. This is caused by an icmp not being folded into a select, which prevents the max intrinsic from being generated.

Background: Since LLVM 14, the SCCP pass has become smarter and folds sext into zext for non-negative ranges: https://reviews.llvm.org/D81756. After that change, InstCombine was sometimes unable to fold an icmp correctly because its two operands were mismatched zext/sext casts. To fix this, @rotateright implemented https://reviews.llvm.org/D124419, which tries to resolve the mismatch by using ValueTracking to prove that the operand of a zext is non-negative (in which case the zext behaves like a sext). However, ValueTracking is not smart enough to infer non-negativity in some cases. Recently, @nikic implemented #67982, which preserves the information that a zext operand is non-negative. This PR simply uses that information to do the folding.

TL;DR: This PR uses the recent nneg flag on zext to fold the icmp in InstCombine.

This PR also contains test cases for sext/zext folding with InstCombine, as well as an x86 regression test for the max/min case.

Full diff: https://github.com/llvm/llvm-project/pull/70845.diff

5 Files Affected:
diff --git a/clang/test/CodeGen/X86/min_max.c b/clang/test/CodeGen/X86/min_max.c
new file mode 100644
index 000000000000000..7af8181cc9ff367
--- /dev/null
+++ b/clang/test/CodeGen/X86/min_max.c
@@ -0,0 +1,19 @@
+// RUN: %clang_cc1 %s -O2 -triple=x86_64-apple-darwin -emit-llvm -o - | FileCheck %s
+
+short vecreduce_smax_v2i16(int n, short* v)
+{
+ // CHECK: @llvm.smax
+ short p = 0;
+ for (int i = 0; i < n; ++i)
+ p = p < v[i] ? v[i] : p;
+ return p;
+}
+
+short vecreduce_smin_v2i16(int n, short* v)
+{
+ // CHECK: @llvm.smin
+ short p = 0;
+ for (int i = 0; i < n; ++i)
+ p = p > v[i] ? v[i] : p;
+ return p;
+}
\ No newline at end of file
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index 2ff27abc79318c4..572872397b6baae 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -5587,11 +5587,17 @@ Instruction *InstCombinerImpl::foldICmpWithZextOrSext(ICmpInst &ICmp) {
return new ICmpInst(ICmp.getPredicate(), Builder.CreateOr(X, Y),
Constant::getNullValue(X->getType()));
+ // Treat "zext nneg" as "sext"
+ bool IsNonNeg0 = isa<PossiblyNonNegInst>(ICmp.getOperand(0));
+ bool IsNonNeg1 = isa<PossiblyNonNegInst>(ICmp.getOperand(1));
+
// If we have mismatched casts, treat the zext of a non-negative source as
// a sext to simulate matching casts. Otherwise, we are done.
// TODO: Can we handle some predicates (equality) without non-negative?
- if ((IsZext0 && isKnownNonNegative(X, DL, 0, &AC, &ICmp, &DT)) ||
- (IsZext1 && isKnownNonNegative(Y, DL, 0, &AC, &ICmp, &DT)))
+ if ((IsZext0 &&
+ (IsNonNeg0 || isKnownNonNegative(X, DL, 0, &AC, &ICmp, &DT))) ||
+ (IsZext1 &&
+ (IsNonNeg1 || isKnownNonNegative(Y, DL, 0, &AC, &ICmp, &DT))))
IsSignedExt = true;
else
return nullptr;
diff --git a/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll b/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
index 2e70a95dfde6233..b24a71b8dc15ea6 100644
--- a/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
+++ b/llvm/test/Transforms/InstCombine/2004-11-27-SetCCForCastLargerAndConstant.ll
@@ -403,9 +403,8 @@ define i1 @different_size_sext_sext_ule(i7 %x, i4 %y) {
define i1 @different_size_sext_zext_ne(i7 %x, i4 %y) {
; CHECK-LABEL: @different_size_sext_zext_ne(
-; CHECK-NEXT: [[SX:%.*]] = sext i7 [[X:%.*]] to i25
-; CHECK-NEXT: [[ZY:%.*]] = zext i4 [[Y:%.*]] to i25
-; CHECK-NEXT: [[R:%.*]] = icmp ne i25 [[SX]], [[ZY]]
+; CHECK-NEXT: [[TMP1:%.*]] = sext i4 [[Y:%.*]] to i7
+; CHECK-NEXT: [[R:%.*]] = icmp ne i7 [[TMP1]], [[X:%.*]]
; CHECK-NEXT: ret i1 [[R]]
;
%sx = sext i7 %x to i25
diff --git a/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll b/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
index f70e48e27384619..87532c1faff1526 100644
--- a/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-ext-ext.ll
@@ -119,9 +119,7 @@ define <2 x i1> @sext_sext_uge_op0_wide(<2 x i16> %x, <2 x i8> %y) {
define i1 @zext_sext_sgt(i8 %x, i8 %y) {
; CHECK-LABEL: @zext_sext_sgt(
-; CHECK-NEXT: [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp sgt i32 [[A]], [[B]]
+; CHECK-NEXT: [[C:%.*]] = icmp sgt i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = zext i8 %x to i32
@@ -132,9 +130,7 @@ define i1 @zext_sext_sgt(i8 %x, i8 %y) {
define i1 @zext_sext_ugt(i8 %x, i8 %y) {
; CHECK-LABEL: @zext_sext_ugt(
-; CHECK-NEXT: [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp ugt i32 [[A]], [[B]]
+; CHECK-NEXT: [[C:%.*]] = icmp ugt i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = zext i8 %x to i32
@@ -145,9 +141,7 @@ define i1 @zext_sext_ugt(i8 %x, i8 %y) {
define i1 @zext_sext_eq(i8 %x, i8 %y) {
; CHECK-LABEL: @zext_sext_eq(
-; CHECK-NEXT: [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp eq i32 [[A]], [[B]]
+; CHECK-NEXT: [[C:%.*]] = icmp eq i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = zext i8 %x to i32
@@ -158,9 +152,8 @@ define i1 @zext_sext_eq(i8 %x, i8 %y) {
define i1 @zext_sext_sle_op0_narrow(i8 %x, i16 %y) {
; CHECK-LABEL: @zext_sext_sle_op0_narrow(
-; CHECK-NEXT: [[A:%.*]] = zext i8 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = sext i16 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp sle i32 [[A]], [[B]]
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[X:%.*]] to i16
+; CHECK-NEXT: [[C:%.*]] = icmp sle i16 [[TMP1]], [[Y:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = zext i8 %x to i32
@@ -171,9 +164,8 @@ define i1 @zext_sext_sle_op0_narrow(i8 %x, i16 %y) {
define i1 @zext_sext_ule_op0_wide(i9 %x, i8 %y) {
; CHECK-LABEL: @zext_sext_ule_op0_wide(
-; CHECK-NEXT: [[A:%.*]] = zext i9 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = sext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp ule i32 [[A]], [[B]]
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[Y:%.*]] to i9
+; CHECK-NEXT: [[C:%.*]] = icmp uge i9 [[TMP1]], [[X:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = zext i9 %x to i32
@@ -184,9 +176,7 @@ define i1 @zext_sext_ule_op0_wide(i9 %x, i8 %y) {
define i1 @sext_zext_slt(i8 %x, i8 %y) {
; CHECK-LABEL: @sext_zext_slt(
-; CHECK-NEXT: [[A:%.*]] = sext i8 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp slt i32 [[A]], [[B]]
+; CHECK-NEXT: [[C:%.*]] = icmp slt i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = sext i8 %x to i32
@@ -197,9 +187,7 @@ define i1 @sext_zext_slt(i8 %x, i8 %y) {
define i1 @sext_zext_ult(i8 %x, i8 %y) {
; CHECK-LABEL: @sext_zext_ult(
-; CHECK-NEXT: [[A:%.*]] = sext i8 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp ult i32 [[A]], [[B]]
+; CHECK-NEXT: [[C:%.*]] = icmp ult i8 [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = sext i8 %x to i32
@@ -210,9 +198,7 @@ define i1 @sext_zext_ult(i8 %x, i8 %y) {
define <2 x i1> @sext_zext_ne(<2 x i8> %x, <2 x i8> %y) {
; CHECK-LABEL: @sext_zext_ne(
-; CHECK-NEXT: [[A:%.*]] = sext <2 x i8> [[X:%.*]] to <2 x i32>
-; CHECK-NEXT: [[B:%.*]] = zext <2 x i8> [[Y:%.*]] to <2 x i32>
-; CHECK-NEXT: [[C:%.*]] = icmp ne <2 x i32> [[A]], [[B]]
+; CHECK-NEXT: [[C:%.*]] = icmp ne <2 x i8> [[X:%.*]], [[Y:%.*]]
; CHECK-NEXT: ret <2 x i1> [[C]]
;
%a = sext <2 x i8> %x to <2 x i32>
@@ -223,9 +209,8 @@ define <2 x i1> @sext_zext_ne(<2 x i8> %x, <2 x i8> %y) {
define i1 @sext_zext_sge_op0_narrow(i5 %x, i8 %y) {
; CHECK-LABEL: @sext_zext_sge_op0_narrow(
-; CHECK-NEXT: [[A:%.*]] = sext i5 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp sge i32 [[A]], [[B]]
+; CHECK-NEXT: [[TMP1:%.*]] = sext i5 [[X:%.*]] to i8
+; CHECK-NEXT: [[C:%.*]] = icmp sge i8 [[TMP1]], [[Y:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = sext i5 %x to i32
@@ -236,9 +221,8 @@ define i1 @sext_zext_sge_op0_narrow(i5 %x, i8 %y) {
define i1 @sext_zext_uge_op0_wide(i16 %x, i8 %y) {
; CHECK-LABEL: @sext_zext_uge_op0_wide(
-; CHECK-NEXT: [[A:%.*]] = sext i16 [[X:%.*]] to i32
-; CHECK-NEXT: [[B:%.*]] = zext i8 [[Y:%.*]] to i32
-; CHECK-NEXT: [[C:%.*]] = icmp uge i32 [[A]], [[B]]
+; CHECK-NEXT: [[TMP1:%.*]] = sext i8 [[Y:%.*]] to i16
+; CHECK-NEXT: [[C:%.*]] = icmp ule i16 [[TMP1]], [[X:%.*]]
; CHECK-NEXT: ret i1 [[C]]
;
%a = sext i16 %x to i32
@@ -396,9 +380,8 @@ define i1 @zext_eq_sext(i1 %a, i1 %b) {
define i1 @zext_eq_sext_fail_not_i1(i1 %a, i8 %b) {
; CHECK-LABEL: @zext_eq_sext_fail_not_i1(
-; CHECK-NEXT: [[CONV:%.*]] = zext i1 [[A:%.*]] to i32
-; CHECK-NEXT: [[CONV3_NEG:%.*]] = sext i8 [[B:%.*]] to i32
-; CHECK-NEXT: [[TOBOOL4:%.*]] = icmp eq i32 [[CONV]], [[CONV3_NEG]]
+; CHECK-NEXT: [[TMP1:%.*]] = sext i1 [[A:%.*]] to i8
+; CHECK-NEXT: [[TOBOOL4:%.*]] = icmp eq i8 [[TMP1]], [[B:%.*]]
; CHECK-NEXT: ret i1 [[TOBOOL4]]
;
%conv = zext i1 %a to i32
diff --git a/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll b/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll
new file mode 100644
index 000000000000000..4e9597f7e70daaa
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/icmp-fold-with-cast.ll
@@ -0,0 +1,185 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --tool ./bin/opt --version 3
+; See PRXXX for more details
+; RUN-./bin/opt: opt < %s -S -passes=ipsccp | FileCheck %s
+
+
+define signext i32 @sext_sext(i16 %x, i16 %y) {
+; CHECK-LABEL: define signext i32 @sext_sext(
+; CHECK-SAME: i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CONV:%.*]] = sext i16 [[X]] to i32
+; CHECK-NEXT: [[CONV1:%.*]] = sext i16 [[Y]] to i32
+; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i16 [[X]], [[Y]]
+; CHECK-NEXT: br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK: cond.true:
+; CHECK-NEXT: br label [[COND_END:%.*]]
+; CHECK: cond.false:
+; CHECK-NEXT: br label [[COND_END]]
+; CHECK: cond.end:
+; CHECK-NEXT: [[COND:%.*]] = phi i32 [ 0, [[COND_TRUE]] ], [ 1, [[COND_FALSE]] ]
+; CHECK-NEXT: ret i32 [[COND]]
+;
+entry:
+ %conv = sext i16 %x to i32
+ %conv1 = sext i16 %y to i32
+ %cmp2 = icmp sgt i32 %conv, %conv1
+ br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true: ; preds = %for.body
+ br label %cond.end
+
+cond.false: ; preds = %for.body
+ br label %cond.end
+
+cond.end: ; preds = %cond.false, %cond.true
+ %cond = phi i32 [ 0, %cond.true ], [ 1, %cond.false ]
+ ret i32 %cond
+}
+
+
+define signext i32 @zext_zext(i16 %x, i16 %y) {
+; CHECK-LABEL: define signext i32 @zext_zext(
+; CHECK-SAME: i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[CONV:%.*]] = zext i16 [[X]] to i32
+; CHECK-NEXT: [[CONV1:%.*]] = zext i16 [[Y]] to i32
+; CHECK-NEXT: [[CMP2:%.*]] = icmp sgt i16 [[X]], [[Y]]
+; CHECK-NEXT: br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK: cond.true:
+; CHECK-NEXT: br label [[COND_END:%.*]]
+; CHECK: cond.false:
+; CHECK-NEXT: br label [[COND_END]]
+; CHECK: cond.end:
+; CHECK-NEXT: [[COND:%.*]] = phi i32 [ 0, [[COND_TRUE]] ], [ 1, [[COND_FALSE]] ]
+; CHECK-NEXT: ret i32 [[COND]]
+;
+entry:
+ %conv = zext i16 %x to i32
+ %conv1 = zext i16 %y to i32
+ %cmp2 = icmp sgt i32 %conv, %conv1
+ br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true: ; preds = %for.body
+ br label %cond.end
+
+cond.false: ; preds = %for.body
+ br label %cond.end
+
+cond.end: ; preds = %cond.false, %cond.true
+ %cond = phi i32 [ 0, %cond.true ], [ 1, %cond.false ]
+ ret i32 %cond
+}
+
+
+define signext i16 @zext_positive_and_sext(i32 noundef %n, ptr noundef %v) {
+; CHECK-LABEL: define signext i16 @zext_positive_and_sext(
+; CHECK-SAME: i32 noundef [[N:%.*]], ptr noundef [[V:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_COND:%.*]]
+; CHECK: for.cond:
+; CHECK-NEXT: [[P_0:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[CONV8:%.*]], [[COND_END:%.*]] ]
+; CHECK-NEXT: [[I_0:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[INC:%.*]], [[COND_END]] ]
+; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[I_0]], [[N]]
+; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY:%.*]], label [[FOR_COND_CLEANUP:%.*]]
+; CHECK: for.body:
+; CHECK-NEXT: [[CONV:%.*]] = zext i16 [[P_0]] to i32
+; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[I_0]] to i64
+; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr i16, ptr [[V]], i64 [[IDXPROM]]
+; CHECK-NEXT: [[TMP0:%.*]] = load i16, ptr [[ARRAYIDX]], align 2
+; CHECK-NEXT: [[CONV1:%.*]] = sext i16 [[TMP0]] to i32
+; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i16 [[P_0]], [[TMP0]]
+; CHECK-NEXT: br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK: cond.true:
+; CHECK-NEXT: br label [[COND_END]]
+; CHECK: cond.false:
+; CHECK-NEXT: br label [[COND_END]]
+; CHECK: for.cond.cleanup:
+; CHECK-NEXT: ret i16 [[P_0]]
+; CHECK: cond.end:
+; CHECK-NEXT: [[COND:%.*]] = phi i32 [ [[CONV1]], [[COND_TRUE]] ], [ [[CONV]], [[COND_FALSE]] ]
+; CHECK-NEXT: [[CONV8]] = trunc i32 [[COND]] to i16
+; CHECK-NEXT: [[INC]] = add nsw i32 [[I_0]], 1
+; CHECK-NEXT: br label [[FOR_COND]]
+;
+entry:
+ br label %for.cond
+
+for.cond: ; preds = %cond.end, %entry
+ %p.0 = phi i16 [ 0, %entry ], [ %conv8, %cond.end ]
+ %i.0 = phi i32 [ 0, %entry ], [ %inc, %cond.end ]
+ %cmp = icmp slt i32 %i.0, %n
+ br i1 %cmp, label %for.body, label %for.cond.cleanup
+
+for.body: ; preds = %for.cond
+ %conv = zext i16 %p.0 to i32 ;; %p.0 is always positive here
+ %idxprom = sext i32 %i.0 to i64
+ %arrayidx = getelementptr i16, ptr %v, i64 %idxprom
+ %0 = load i16, ptr %arrayidx, align 2
+ %conv1 = sext i16 %0 to i32
+ %cmp2 = icmp slt i32 %conv, %conv1
+ br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true: ; preds = %for.body
+ br label %cond.end
+
+cond.false: ; preds = %for.body
+ br label %cond.end
+
+for.cond.cleanup: ; preds = %for.cond
+ ret i16 %p.0
+
+cond.end: ; preds = %cond.false, %cond.true
+ %cond = phi i32 [ %conv1, %cond.true ], [ %conv, %cond.false ]
+ %conv8 = trunc i32 %cond to i16
+ %inc = add nsw i32 %i.0, 1
+ br label %for.cond
+}
+
+
+
+define signext i16 @sext_and_zext_positive(i16 %x) {
+; CHECK-LABEL: define signext i16 @sext_and_zext_positive(
+; CHECK-SAME: i16 [[X:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: br label [[FOR_COND:%.*]]
+; CHECK: for.cond:
+; CHECK-NEXT: [[V:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[COND_END:%.*]] ]
+; CHECK-NEXT: br label [[FOR_BODY:%.*]]
+; CHECK: for.body:
+; CHECK-NEXT: [[CONV:%.*]] = zext i16 [[V]] to i32
+; CHECK-NEXT: [[CONV1:%.*]] = sext i16 [[X]] to i32
+; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i16 [[X]], [[V]]
+; CHECK-NEXT: br i1 [[CMP2]], label [[COND_TRUE:%.*]], label [[COND_FALSE:%.*]]
+; CHECK: cond.true:
+; CHECK-NEXT: br label [[COND_END]]
+; CHECK: cond.false:
+; CHECK-NEXT: br label [[COND_END]]
+; CHECK: cond.end:
+; CHECK-NEXT: [[A:%.*]] = phi i16 [ 10, [[COND_TRUE]] ], [ 20, [[COND_FALSE]] ]
+; CHECK-NEXT: [[INC]] = add nuw nsw i16 [[A]], 1
+; CHECK-NEXT: br label [[FOR_COND]]
+;
+entry:
+ br label %for.cond
+
+for.cond: ; preds = %cond.end, %entry
+ %v = phi i16 [ 0, %entry ], [ %inc, %cond.end ] ;; always positive
+ br label %for.body
+
+for.body: ; preds = %for.cond
+ %conv = zext i16 %v to i32 ;; %p.0 is always positive here
+ %conv1 = sext i16 %x to i32 ;; %p.0 is always positive here
+ %cmp2 = icmp slt i32 %conv1, %conv ;; positive/positive
+ br i1 %cmp2, label %cond.true, label %cond.false
+
+cond.true: ; preds = %for.body
+ br label %cond.end
+
+cond.false: ; preds = %for.body
+ br label %cond.end
+
+cond.end: ; preds = %cond.false, %cond.true
+ %a = phi i16 [ 10, %cond.true ], [ 20, %cond.false ]
+ %inc = add i16 %a, 1
+ br label %for.cond
+}
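To make the reasoning behind the patch easier to follow, here is a minimal, self-contained C++ sketch. It is not LLVM code: `CastInfo`, `resolveIsSignedExt`, `countSltMismatches` and `runSketchChecks` are hypothetical names introduced only for illustration, and the handling of matching casts is an assumption about the part of `foldICmpWithZextOrSext` that the hunk does not show. The first function models the shape of the patched condition (a mismatched zext counts as a sext only if it carries the flag or is otherwise proven non-negative). The second exhaustively checks, on 8-bit values, why that precondition matters for a signed compare. Note that the sketch keys on the flag actually being set, not merely on the operand being a zext.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified description of one icmp operand.
enum class CastKind { ZExt, SExt };

struct CastInfo {
  CastKind kind;
  bool nnegFlag;          // assumption: the zext carries the nneg flag
  bool provenNonNegative; // assumption: stand-in for isKnownNonNegative()
};

// Returns true if the compare may be treated as using sign extensions,
// false if both casts are zero extensions, and std::nullopt when the casts
// are mismatched and nothing shows the zext source is non-negative
// (the `return nullptr` path in the hunk).
std::optional<bool> resolveIsSignedExt(const CastInfo &op0,
                                       const CastInfo &op1) {
  bool isZext0 = op0.kind == CastKind::ZExt;
  bool isZext1 = op1.kind == CastKind::ZExt;
  if (isZext0 == isZext1)
    return !isZext0; // assumed behaviour for matching casts
  // Mismatched casts: exactly one operand is a zext.
  const CastInfo &zextOp = isZext0 ? op0 : op1;
  if (zextOp.nnegFlag || zextOp.provenNonNegative)
    return true; // treat "zext nneg" as "sext"
  return std::nullopt;
}

// Models of `zext i8 to i32` and `sext i8 to i32`.
int32_t zext8(int8_t v) { return static_cast<int32_t>(static_cast<uint8_t>(v)); }
int32_t sext8(int8_t v) { return static_cast<int32_t>(v); }

// Counts (x, y) pairs where `icmp slt (zext x), (sext y)` on i32 disagrees
// with the narrowed `icmp slt x, y` on i8. When onlyNonNegativeX is true,
// x is restricted to [0, 127], the guarantee nneg is meant to carry.
int countSltMismatches(bool onlyNonNegativeX) {
  int mismatches = 0;
  for (int xi = -128; xi <= 127; ++xi) {
    if (onlyNonNegativeX && xi < 0)
      continue;
    for (int yi = -128; yi <= 127; ++yi) {
      int8_t x = static_cast<int8_t>(xi);
      int8_t y = static_cast<int8_t>(yi);
      bool wide = zext8(x) < sext8(y);
      bool narrow = x < y;
      if (wide != narrow)
        ++mismatches;
    }
  }
  return mismatches;
}

// Small self-check tying the two pieces together.
void runSketchChecks() {
  CastInfo flaggedZext{CastKind::ZExt, true, false};
  CastInfo plainZext{CastKind::ZExt, false, false};
  CastInfo sext{CastKind::SExt, false, false};
  assert(resolveIsSignedExt(flaggedZext, sext).has_value());
  assert(!resolveIsSignedExt(plainZext, sext).has_value());
  assert(countSltMismatches(true) == 0); // fold is sound with the guarantee
  assert(countSltMismatches(false) > 0); // and unsound without it
}
```

Calling `runSketchChecks()` from a `main` function exercises both parts. A counterexample for the unrestricted case is `x = -1, y = 0`: the wide compare sees `255 < 0`, while the narrow compare sees `-1 < 0`.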
Thanks for taking the time to review the PR. I just added another test in PhaseOrdering to make sure that the min/max intrinsics are generated.
@nikic ping :)
✅ With the latest revision this PR passed the C/C++ code formatter.
@@ -0,0 +1,145 @@
; RUN: opt < %s --O3 -S | FileCheck %s
Could you please:
- Convert this IR into SSA form as @nikic mentioned.
- Drop unused attributes.
- Re-generate tests with llvm/utils/update_test_checks.py.
define i16 @vecreduce_smin_v2i16(i32 %n, ptr %v) {
entry:
br label %for.cond
for.cond:
%p.0 = phi i16 [ 0, %entry ], [ %conv8, %for.inc ]
%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
%cmp = icmp slt i32 %i.0, %n
br i1 %cmp, label %for.body, label %for.end
for.body:
%conv = sext i16 %p.0 to i32
%idxprom = sext i32 %i.0 to i64
%arrayidx = getelementptr inbounds i16, ptr %v, i64 %idxprom
%0 = load i16, ptr %arrayidx, align 2
%conv1 = sext i16 %0 to i32
%cmp2 = icmp sgt i32 %conv, %conv1
br i1 %cmp2, label %cond.true, label %cond.false
cond.true:
%idxprom4 = sext i32 %i.0 to i64
%arrayidx5 = getelementptr inbounds i16, ptr %v, i64 %idxprom4
%1 = load i16, ptr %arrayidx5, align 2
%conv6 = sext i16 %1 to i32
br label %cond.end
cond.false:
%conv7 = sext i16 %p.0 to i32
br label %cond.end
cond.end:
%cond = phi i32 [ %conv6, %cond.true ], [ %conv7, %cond.false ]
%conv8 = trunc i32 %cond to i16
br label %for.inc
for.inc:
%inc = add nsw i32 %i.0, 1
br label %for.cond
for.end:
ret i16 %p.0
}
define i16 @vecreduce_smax_v2i16(i32 %n, ptr %v) {
entry:
br label %for.cond
for.cond:
%p.0 = phi i16 [ 0, %entry ], [ %conv8, %for.inc ]
%i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
%cmp = icmp slt i32 %i.0, %n
br i1 %cmp, label %for.body, label %for.end
for.body:
%conv = sext i16 %p.0 to i32
%idxprom = sext i32 %i.0 to i64
%arrayidx = getelementptr inbounds i16, ptr %v, i64 %idxprom
%0 = load i16, ptr %arrayidx, align 2
%conv1 = sext i16 %0 to i32
%cmp2 = icmp slt i32 %conv, %conv1
br i1 %cmp2, label %cond.true, label %cond.false
cond.true:
%idxprom4 = sext i32 %i.0 to i64
%arrayidx5 = getelementptr inbounds i16, ptr %v, i64 %idxprom4
%1 = load i16, ptr %arrayidx5, align 2
%conv6 = sext i16 %1 to i32
br label %cond.end
cond.false:
%conv7 = sext i16 %p.0 to i32
br label %cond.end
cond.end:
%cond = phi i32 [ %conv6, %cond.true ], [ %conv7, %cond.false ]
%conv8 = trunc i32 %cond to i16
br label %for.inc
for.inc:
%inc = add nsw i32 %i.0, 1
br label %for.cond
for.end:
ret i16 %p.0
}
How did you get the SSA form? I wasn't able to do it on my side. Here are the commands I ran:
> build_release/bin/clang -S -emit-llvm min_max.c -fno-discard-value-names -o min_max.ll
> build_release/bin/opt -S -passes=sroa min_max.ll > min_max2.ll
> head -n 20 min_max2.ll
; ModuleID = 'min_max.ll'
source_filename = "min_max.c"
target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
target triple = "arm64-apple-macosx14.0.0"
; Function Attrs: noinline nounwind optnone ssp uwtable(sync)
define signext i16 @vecreduce_smin_v2i16(i32 noundef %n, ptr noundef %v) #0 {
entry:
%n.addr = alloca i32, align 4
%v.addr = alloca ptr, align 8
%p = alloca i16, align 2
%i = alloca i32, align 4
store i32 %n, ptr %n.addr, align 4
store ptr %v, ptr %v.addr, align 8
store i16 0, ptr %p, align 2
store i32 0, ptr %i, align 4
br label %for.cond
for.cond: ; preds = %for.inc, %entry
%0 = load i32, ptr %i, align 4
You need to use -O2 -Xclang -disable-llvm-optzns, or manually drop the optnone attributes.
My command: bin/opt -S -O3 -print-changed min_max.ll
Then I got IR after the SROA pass :)
LGTM
LGTM
LGTM. Thank you!
Do you have the access to merge PR?
No, I don't. This is my first PR on LLVM :) Thanks for your time and insightful comments!
Local branch amd-gfx 958d879 Merged main:8569465adf5e into amd-gfx:c81d641827d7 Remote branch main ff36411 [InstCombine] Use zexts nneg flag for icmp folding (llvm#70845)