[GlobalIsel] Combine select to integer minmax. #77213
Conversation
InstCombine canonicalizes selects to both floating-point and integer min/max. The GlobalISel combiner and the DAG combiner canonicalize selects to floating-point min/max, but neither of them canonicalizes to integer min/max. On Neoverse V2, basic integer arithmetic and integer min/max have the same cost.
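As a quick sanity check on the semantics, here is a small scalar C++ model (hand-written for this note, not part of the patch) of the four select shapes the combine recognizes and the min/max each one folds to:

#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar model of (icmp pred x, y) ? x : y for the handled predicates.
int main() {
  uint32_t ux = 7, uy = 42; // unsigned operands
  int32_t sx = -7, sy = 42; // signed operands
  assert(((ux > uy) ? ux : uy) == std::max(ux, uy)); // ugt/uge -> G_UMAX
  assert(((sx > sy) ? sx : sy) == std::max(sx, sy)); // sgt/sge -> G_SMAX
  assert(((ux < uy) ? ux : uy) == std::min(ux, uy)); // ult/ule -> G_UMIN
  assert(((sx < sy) ? sx : sy) == std::min(sx, sy)); // slt/sle -> G_SMIN
  return 0;
}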
The combine is inspired by matchSelectPattern in llvm/lib/Analysis/ValueTracking.cpp (line 7726 at commit 83be8a7).
The lowering code in the legalizer has to pick one of the two possible predicate values (strict vs. non-strict):
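For reference, here is a rough sketch of the predicate choice in LegalizerHelper's integer min/max lowering (paraphrased, not a verbatim quote of the tree); it always rebuilds the compare with the strict predicate:

// Sketch: when the legalizer lowers G_[SU]{MIN,MAX} back to
// G_ICMP + G_SELECT, it picks the strict predicate, so a select that
// originally used sle/ule round-trips as slt/ult (hence le -> lt and
// ls -> lo in the AArch64 csel tests below).
static CmpInst::Predicate minMaxToCompare(unsigned Opc) {
  switch (Opc) {
  case TargetOpcode::G_UMIN: return CmpInst::ICMP_ULT;
  case TargetOpcode::G_UMAX: return CmpInst::ICMP_UGT;
  case TargetOpcode::G_SMIN: return CmpInst::ICMP_SLT;
  case TargetOpcode::G_SMAX: return CmpInst::ICMP_SGT;
  default: llvm_unreachable("not an integer min/max");
  }
}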
I believe that is the cause for the predicate changes in the tests below.
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-aarch64

Author: Thorsten Schütt (tschuett)

Changes: InstCombine canonicalizes selects to both floating-point and integer min/max. The GlobalISel combiner and the DAG combiner canonicalize selects to floating-point min/max, but neither of them canonicalizes to integer min/max. On Neoverse V2, basic integer arithmetic and integer min/max have the same cost.

Patch is 25.91 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/77213.diff

5 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index dcc1a4580b14a2..a6e9406bed06a2 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -910,6 +910,9 @@ class CombinerHelper {
bool tryFoldSelectOfConstants(GSelect *Select, BuildFnTy &MatchInfo);
+ /// Try to fold (icmp X, Y) ? X : Y -> integer minmax.
+ bool tryFoldSelectToIntMinMax(GSelect *Select, BuildFnTy &MatchInfo);
+
bool isOneOrOneSplat(Register Src, bool AllowUndefs);
bool isZeroOrZeroSplat(Register Src, bool AllowUndefs);
bool isConstantSplatVector(Register Src, int64_t SplatValue,
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 8b15bdb0aca30b..f5fbacd2f3d608 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -6548,6 +6548,87 @@ bool CombinerHelper::tryFoldBoolSelectToLogic(GSelect *Select,
return false;
}
+bool CombinerHelper::tryFoldSelectToIntMinMax(GSelect *Select,
+ BuildFnTy &MatchInfo) {
+ Register DstReg = Select->getReg(0);
+ Register Cond = Select->getCondReg();
+ Register True = Select->getTrueReg();
+ Register False = Select->getFalseReg();
+ LLT DstTy = MRI.getType(DstReg);
+
+ // We need a G_ICMP on the condition register.
+ GICmp *Cmp = getOpcodeDef<GICmp>(Cond, MRI);
+ if (!Cmp)
+ return false;
+
+ CmpInst::Predicate Pred = Cmp->getCond();
+ // We need a relational (greater or smaller) predicate for
+ // canonicalization; equality predicates cannot form a min/max.
+ if (CmpInst::isEquality(Pred))
+ return false;
+
+ Register CmpLHS = Cmp->getLHSReg();
+ Register CmpRHS = Cmp->getRHSReg();
+
+ // We can swap CmpLHS and CmpRHS for a higher hit rate.
+ if (True == CmpRHS && False == CmpLHS) {
+ std::swap(CmpLHS, CmpRHS);
+ Pred = CmpInst::getSwappedPredicate(Pred);
+ }
+
+ // (icmp X, Y) ? X : Y -> integer minmax.
+ // see matchSelectPattern in ValueTracking.
+ // Legality between G_SELECT and integer minmax can differ.
+ if (True == CmpLHS && False == CmpRHS) {
+ switch (Pred) {
+ case ICmpInst::ICMP_UGT:
+ case ICmpInst::ICMP_UGE: {
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMAX, DstTy}))
+ return false;
+ MatchInfo = [=](MachineIRBuilder &B) {
+ B.setInstrAndDebugLoc(*Select);
+ B.buildUMax(DstReg, True, False);
+ };
+ return true;
+ }
+ case ICmpInst::ICMP_SGT:
+ case ICmpInst::ICMP_SGE: {
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_SMAX, DstTy}))
+ return false;
+ MatchInfo = [=](MachineIRBuilder &B) {
+ B.setInstrAndDebugLoc(*Select);
+ B.buildSMax(DstReg, True, False);
+ };
+ return true;
+ }
+ case ICmpInst::ICMP_ULT:
+ case ICmpInst::ICMP_ULE: {
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMIN, DstTy}))
+ return false;
+ MatchInfo = [=](MachineIRBuilder &B) {
+ B.setInstrAndDebugLoc(*Select);
+ B.buildUMin(DstReg, True, False);
+ };
+ return true;
+ }
+ case ICmpInst::ICMP_SLT:
+ case ICmpInst::ICMP_SLE: {
+ if (!isLegalOrBeforeLegalizer({TargetOpcode::G_SMIN, DstTy}))
+ return false;
+ MatchInfo = [=](MachineIRBuilder &B) {
+ B.setInstrAndDebugLoc(*Select);
+ B.buildSMin(DstReg, True, False);
+ };
+ return true;
+ }
+ default:
+ return false;
+ }
+ }
+
+ return false;
+}
+
bool CombinerHelper::matchSelect(MachineInstr &MI, BuildFnTy &MatchInfo) {
GSelect *Select = cast<GSelect>(&MI);
@@ -6557,5 +6638,8 @@ bool CombinerHelper::matchSelect(MachineInstr &MI, BuildFnTy &MatchInfo) {
if (tryFoldBoolSelectToLogic(Select, MatchInfo))
return true;
+ if (tryFoldSelectToIntMinMax(Select, MatchInfo))
+ return true;
+
return false;
}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
index 739332414c1985..0e9c126e97a3d8 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
@@ -2421,7 +2421,7 @@ define i8 @atomicrmw_min_i8(ptr %ptr, i8 %rhs) {
; CHECK-NOLSE-O1-NEXT: ldaxrb w8, [x0]
; CHECK-NOLSE-O1-NEXT: sxtb w9, w8
; CHECK-NOLSE-O1-NEXT: cmp w9, w1, sxtb
-; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, le
+; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, lt
; CHECK-NOLSE-O1-NEXT: stxrb w10, w9, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w10, LBB33_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -2435,7 +2435,7 @@ define i8 @atomicrmw_min_i8(ptr %ptr, i8 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ldaxrb w8, [x0]
; CHECK-OUTLINE-O1-NEXT: sxtb w9, w8
; CHECK-OUTLINE-O1-NEXT: cmp w9, w1, sxtb
-; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, le
+; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, lt
; CHECK-OUTLINE-O1-NEXT: stxrb w10, w9, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w10, LBB33_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -2662,7 +2662,7 @@ define i8 @atomicrmw_umin_i8(ptr %ptr, i8 %rhs) {
; CHECK-NOLSE-O1-NEXT: ldaxrb w8, [x0]
; CHECK-NOLSE-O1-NEXT: and w10, w8, #0xff
; CHECK-NOLSE-O1-NEXT: cmp w10, w9
-; CHECK-NOLSE-O1-NEXT: csel w10, w10, w9, ls
+; CHECK-NOLSE-O1-NEXT: csel w10, w10, w9, lo
; CHECK-NOLSE-O1-NEXT: stlxrb w11, w10, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w11, LBB35_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -2677,7 +2677,7 @@ define i8 @atomicrmw_umin_i8(ptr %ptr, i8 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ldaxrb w8, [x0]
; CHECK-OUTLINE-O1-NEXT: and w10, w8, #0xff
; CHECK-OUTLINE-O1-NEXT: cmp w10, w9
-; CHECK-OUTLINE-O1-NEXT: csel w10, w10, w9, ls
+; CHECK-OUTLINE-O1-NEXT: csel w10, w10, w9, lo
; CHECK-OUTLINE-O1-NEXT: stlxrb w11, w10, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w11, LBB35_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -3477,7 +3477,7 @@ define i16 @atomicrmw_min_i16(ptr %ptr, i16 %rhs) {
; CHECK-NOLSE-O1-NEXT: ldaxrh w8, [x0]
; CHECK-NOLSE-O1-NEXT: sxth w9, w8
; CHECK-NOLSE-O1-NEXT: cmp w9, w1, sxth
-; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, le
+; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, lt
; CHECK-NOLSE-O1-NEXT: stxrh w10, w9, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w10, LBB43_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -3491,7 +3491,7 @@ define i16 @atomicrmw_min_i16(ptr %ptr, i16 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ldaxrh w8, [x0]
; CHECK-OUTLINE-O1-NEXT: sxth w9, w8
; CHECK-OUTLINE-O1-NEXT: cmp w9, w1, sxth
-; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, le
+; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, lt
; CHECK-OUTLINE-O1-NEXT: stxrh w10, w9, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w10, LBB43_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -3718,7 +3718,7 @@ define i16 @atomicrmw_umin_i16(ptr %ptr, i16 %rhs) {
; CHECK-NOLSE-O1-NEXT: ldaxrh w8, [x0]
; CHECK-NOLSE-O1-NEXT: and w10, w8, #0xffff
; CHECK-NOLSE-O1-NEXT: cmp w10, w9
-; CHECK-NOLSE-O1-NEXT: csel w10, w10, w9, ls
+; CHECK-NOLSE-O1-NEXT: csel w10, w10, w9, lo
; CHECK-NOLSE-O1-NEXT: stlxrh w11, w10, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w11, LBB45_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -3733,7 +3733,7 @@ define i16 @atomicrmw_umin_i16(ptr %ptr, i16 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ldaxrh w8, [x0]
; CHECK-OUTLINE-O1-NEXT: and w10, w8, #0xffff
; CHECK-OUTLINE-O1-NEXT: cmp w10, w9
-; CHECK-OUTLINE-O1-NEXT: csel w10, w10, w9, ls
+; CHECK-OUTLINE-O1-NEXT: csel w10, w10, w9, lo
; CHECK-OUTLINE-O1-NEXT: stlxrh w11, w10, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w11, LBB45_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -4526,7 +4526,7 @@ define i32 @atomicrmw_min_i32(ptr %ptr, i32 %rhs) {
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldaxr w8, [x0]
; CHECK-NOLSE-O1-NEXT: cmp w8, w1
-; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, le
+; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, lt
; CHECK-NOLSE-O1-NEXT: stxr w10, w9, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w10, LBB53_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -4539,7 +4539,7 @@ define i32 @atomicrmw_min_i32(ptr %ptr, i32 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-OUTLINE-O1-NEXT: ldaxr w8, [x0]
; CHECK-OUTLINE-O1-NEXT: cmp w8, w1
-; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, le
+; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, lt
; CHECK-OUTLINE-O1-NEXT: stxr w10, w9, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w10, LBB53_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -4754,7 +4754,7 @@ define i32 @atomicrmw_umin_i32(ptr %ptr, i32 %rhs) {
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldaxr w8, [x0]
; CHECK-NOLSE-O1-NEXT: cmp w8, w1
-; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, ls
+; CHECK-NOLSE-O1-NEXT: csel w9, w8, w1, lo
; CHECK-NOLSE-O1-NEXT: stlxr w10, w9, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w10, LBB55_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -4767,7 +4767,7 @@ define i32 @atomicrmw_umin_i32(ptr %ptr, i32 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-OUTLINE-O1-NEXT: ldaxr w8, [x0]
; CHECK-OUTLINE-O1-NEXT: cmp w8, w1
-; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, ls
+; CHECK-OUTLINE-O1-NEXT: csel w9, w8, w1, lo
; CHECK-OUTLINE-O1-NEXT: stlxr w10, w9, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w10, LBB55_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -5547,7 +5547,7 @@ define i64 @atomicrmw_min_i64(ptr %ptr, i64 %rhs) {
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldaxr x8, [x0]
; CHECK-NOLSE-O1-NEXT: cmp x8, x1
-; CHECK-NOLSE-O1-NEXT: csel x9, x8, x1, le
+; CHECK-NOLSE-O1-NEXT: csel x9, x8, x1, lt
; CHECK-NOLSE-O1-NEXT: stxr w10, x9, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w10, LBB63_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -5560,7 +5560,7 @@ define i64 @atomicrmw_min_i64(ptr %ptr, i64 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-OUTLINE-O1-NEXT: ldaxr x8, [x0]
; CHECK-OUTLINE-O1-NEXT: cmp x8, x1
-; CHECK-OUTLINE-O1-NEXT: csel x9, x8, x1, le
+; CHECK-OUTLINE-O1-NEXT: csel x9, x8, x1, lt
; CHECK-OUTLINE-O1-NEXT: stxr w10, x9, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w10, LBB63_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -5775,7 +5775,7 @@ define i64 @atomicrmw_umin_i64(ptr %ptr, i64 %rhs) {
; CHECK-NOLSE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NOLSE-O1-NEXT: ldaxr x8, [x0]
; CHECK-NOLSE-O1-NEXT: cmp x8, x1
-; CHECK-NOLSE-O1-NEXT: csel x9, x8, x1, ls
+; CHECK-NOLSE-O1-NEXT: csel x9, x8, x1, lo
; CHECK-NOLSE-O1-NEXT: stlxr w10, x9, [x0]
; CHECK-NOLSE-O1-NEXT: cbnz w10, LBB65_1
; CHECK-NOLSE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
@@ -5788,7 +5788,7 @@ define i64 @atomicrmw_umin_i64(ptr %ptr, i64 %rhs) {
; CHECK-OUTLINE-O1-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-OUTLINE-O1-NEXT: ldaxr x8, [x0]
; CHECK-OUTLINE-O1-NEXT: cmp x8, x1
-; CHECK-OUTLINE-O1-NEXT: csel x9, x8, x1, ls
+; CHECK-OUTLINE-O1-NEXT: csel x9, x8, x1, lo
; CHECK-OUTLINE-O1-NEXT: stlxr w10, x9, [x0]
; CHECK-OUTLINE-O1-NEXT: cbnz w10, LBB65_1
; CHECK-OUTLINE-O1-NEXT: ; %bb.2: ; %atomicrmw.end
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
index 4c07081404c889..5a7bd6ee20f9b4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
@@ -888,7 +888,7 @@ define i8 @atomicrmw_min_i8(ptr %ptr, i8 %rhs) {
; CHECK-NEXT: renamable $w8 = LDAXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
; CHECK-NEXT: renamable $w9 = SBFMWri renamable $w8, 0, 7, pcsections !0
; CHECK-NEXT: dead $wzr = SUBSWrx killed renamable $w9, renamable $w1, 32, implicit-def $nzcv, pcsections !0
- ; CHECK-NEXT: renamable $w9 = CSELWr renamable $w8, renamable $w1, 13, implicit killed $nzcv, implicit-def $x9, pcsections !0
+ ; CHECK-NEXT: renamable $w9 = CSELWr renamable $w8, renamable $w1, 11, implicit killed $nzcv, implicit-def $x9, pcsections !0
; CHECK-NEXT: early-clobber renamable $w10 = STXRB renamable $w9, renamable $x0, implicit killed $x9, pcsections !0 :: (volatile store (s8) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w10, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
@@ -943,7 +943,7 @@ define i8 @atomicrmw_umin_i8(ptr %ptr, i8 %rhs) {
; CHECK-NEXT: renamable $w8 = LDAXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
; CHECK-NEXT: renamable $w10 = ANDWri renamable $w8, 7
; CHECK-NEXT: $wzr = SUBSWrs renamable $w10, renamable $w9, 0, implicit-def $nzcv, pcsections !0
- ; CHECK-NEXT: renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 9, implicit killed $nzcv, implicit-def $x10, pcsections !0
+ ; CHECK-NEXT: renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 3, implicit killed $nzcv, implicit-def $x10, pcsections !0
; CHECK-NEXT: early-clobber renamable $w11 = STLXRB renamable $w10, renamable $x0, implicit killed $x10, pcsections !0 :: (volatile store (s8) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w11, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
@@ -1148,7 +1148,7 @@ define i16 @atomicrmw_min_i16(ptr %ptr, i16 %rhs) {
; CHECK-NEXT: renamable $w8 = LDAXRH renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s16) from %ir.ptr)
; CHECK-NEXT: renamable $w9 = SBFMWri renamable $w8, 0, 15, pcsections !0
; CHECK-NEXT: dead $wzr = SUBSWrx killed renamable $w9, renamable $w1, 40, implicit-def $nzcv, pcsections !0
- ; CHECK-NEXT: renamable $w9 = CSELWr renamable $w8, renamable $w1, 13, implicit killed $nzcv, implicit-def $x9, pcsections !0
+ ; CHECK-NEXT: renamable $w9 = CSELWr renamable $w8, renamable $w1, 11, implicit killed $nzcv, implicit-def $x9, pcsections !0
; CHECK-NEXT: early-clobber renamable $w10 = STXRH renamable $w9, renamable $x0, implicit killed $x9, pcsections !0 :: (volatile store (s16) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w10, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
@@ -1203,7 +1203,7 @@ define i16 @atomicrmw_umin_i16(ptr %ptr, i16 %rhs) {
; CHECK-NEXT: renamable $w8 = LDAXRH renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s16) from %ir.ptr)
; CHECK-NEXT: renamable $w10 = ANDWri renamable $w8, 15
; CHECK-NEXT: $wzr = SUBSWrs renamable $w10, renamable $w9, 0, implicit-def $nzcv, pcsections !0
- ; CHECK-NEXT: renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 9, implicit killed $nzcv, implicit-def $x10, pcsections !0
+ ; CHECK-NEXT: renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 3, implicit killed $nzcv, implicit-def $x10, pcsections !0
; CHECK-NEXT: early-clobber renamable $w11 = STLXRH renamable $w10, renamable $x0, implicit killed $x10, pcsections !0 :: (volatile store (s16) into %ir.ptr)
; CHECK-NEXT: CBNZW killed renamable $w11, %bb.1, pcsections !0
; CHECK-NEXT: {{ $}}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
index be2de620fa456c..38cfd4169873f4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
@@ -544,3 +544,254 @@ body: |
%ext:_(s32) = G_ANYEXT %sel
$w0 = COPY %ext(s32)
...
+---
+# negative test (mismatched registers): select (icmp ugt t, y), f, z does not fold to umax
+name: select_failed_icmp_ugt_t_f_t_f_umax_t_f
+body: |
+ bb.1:
+ liveins: $x0, $x1, $x2
+ ; CHECK-LABEL: name: select_failed_icmp_ugt_t_f_t_f_umax_t_f
+ ; CHECK: liveins: $x0, $x1, $x2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s64) = COPY $x2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s64) = COPY $x3
+ ; CHECK-NEXT: %t:_(s8) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: %f:_(s8) = G_TRUNC [[COPY1]](s64)
+ ; CHECK-NEXT: %y:_(s8) = G_TRUNC [[COPY2]](s64)
+ ; CHECK-NEXT: %z:_(s8) = G_TRUNC [[COPY3]](s64)
+ ; CHECK-NEXT: %c:_(s8) = G_ICMP intpred(ugt), %t(s8), %y
+ ; CHECK-NEXT: %sel:_(s8) = exact G_SELECT %c(s8), %f, %z
+ ; CHECK-NEXT: %ext:_(s32) = G_ANYEXT %sel(s8)
+ ; CHECK-NEXT: $w0 = COPY %ext(s32)
+ %0:_(s64) = COPY $x0
+ %1:_(s64) = COPY $x1
+ %2:_(s64) = COPY $x2
+ %3:_(s64) = COPY $x3
+ %4:_(s64) = COPY $x4
+ %t:_(s8) = G_TRUNC %0
+ %f:_(s8) = G_TRUNC %1
+ %y:_(s8) = G_TRUNC %2
+ %z:_(s8) = G_TRUNC %3
+ %c:_(s8) = G_ICMP intpred(ugt), %t(s8), %y(s8)
+ %sel:_(s8) = exact G_SELECT %c, %f, %z
+ %ext:_(s32) = G_ANYEXT %sel
+ $w0 = COPY %ext(s32)
+...
+---
+# test select icmp_ugt t,f_t_f --> umax(t,f)
+name: select_icmp_ugt_t_f_t_f_umax_t_f
+body: |
+ bb.1:
+ liveins: $x0, $x1, $x2
+ ; CHECK-LABEL: name: select_icmp_ugt_t_f_t_f_umax_t_f
+ ; CHECK: liveins: $x0, $x1, $x2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+ ; CHECK-NEXT: %t1:_(s32) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: %f1:_(s32) = G_TRUNC [[COPY1]](s64)
+ ; CHECK-NEXT: %t:_(<4 x s32>) = G_BUILD_VECTOR %t1(s32), %t1(s32), %t1(s32), %t1(s32)
+ ; CHECK-NEXT: %f:_(<4 x s32>) = G_BUILD_VECTOR %f1(s32), %f1(s32), %f1(s32), %f1(s32)
+ ; CHECK-NEXT: %sel:_(<4 x s32>) = G_UMAX %t, %f
+ ; CHECK-NEXT: $q0 = COPY %sel(<4 x s32>)
+ %0:_(s64) = COPY $x0
+ %1:_(s64) = COPY $x1
+ %t1:_(s32) = G_TRUNC %0
+ %f1:_(s32) = G_TRUNC %1
+ %t:_(<4 x s32>) = G_BUILD_VECTOR %t1, %t1, %t1, %t1
+ %f:_(<4 x s32>) = G_BUILD_VECTOR %f1, %f1, %f1, %f1
+ %c:_(<4 x s32>) = G_ICMP intpred(ugt), %t(<4 x s32>), %f(<4 x s32>)
+ %sel:_(<4 x s32>) = exact G_SELECT %c, %t, %f
+ $q0 = COPY %sel(<4 x s32>)
+...
+---
+# test select icmp_uge t,f_t_f --> umax(t,f)
+name: select_icmp_uge_t_f_t_f_umax_t_f
+body: |
+ bb.1:
+ liveins: $x0, $x1, $x2
+ ; CHECK-LABEL: name: select_icmp_uge_t_f_t_f_umax_t_f
+ ; CHECK: liveins: $x0, $x1, $x2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+ ; CHECK-NEXT: %t1:_(s32) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: %f1:_(s32) = G_TRUNC [[COPY1]](s64)
+ ; CHECK-NEXT: %t:_(<4 x s32>) = G_BUILD_VECTOR %t1(s32), %t1(s32), %t1(s32), %t1(s32)
+ ; CHECK-NEXT: %f:_(<4 x s32>) = G_BUILD_VECTOR %f1(s32), %f1(s32), %f1(s32), %f1(s32)
+ ; CHECK-NEXT: %sel:_(<4 x s32>) = G_UMAX %t, %f
+ ; CHECK-NEXT: $q0 = COPY %sel(<4 x s32>)
+ %0:_(s64) = COPY $x0
+ %1:_(s64) = COPY $x1
+ %t1:_(s32) = G_TRUNC %0
+ %f1:_(s32) = G_TRUNC %1
+ %t:_(<4 x s32>) = G_BUILD_VECTOR %t1, %t1, %t1, %t1
+ %f:_(<4 x s32>) = G_BUILD_VECTOR %f1, %f1, %f1, %f1
+ %c:_(<4 x s32>) = G_ICMP intpred(uge), %t(<4 x s32>), %f(<4 x s32>)
+ %sel:_(<4 x s32>) = exact G_SELECT %c, %t, %f
+ $q0 = COPY %sel(<4 x s32>)
+...
+---
+# test select icmp_sgt t,f_t_f --> smax(t,f)
+name: select_icmp_sgt_t_f_t_f_smax_t_f
+body: |
+ bb.1:
+ liveins: $x0, $x1, $x2
+ ; CHECK-LABEL: name: select_icmp_sgt_t_f_t_f_smax_t_f
+ ; CHECK: liveins: $x0, $x1, $x2
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+ ; CHECK-NEXT: %t1:_(s32) = G_TRUNC [[COPY]](s64)
+ ; CHECK-NEXT: %f1:_(s32) = G_TRUNC [[COPY1]](s64)
+ ; CHECK-NEXT: %t:_(<4 x s32>) = G_BUILD_VECTOR %t1(s32), %t1(s32), %t1(s32), %t1(s32)
+ ; CHECK-NEXT: %f:_(<4 x s32>) = G_BUILD_VECTOR %f1(s32), %f1(s32), %f1(s32), %f1(s32)
+ ; CHECK-NEXT: %sel:_(<4 x s32>) = G_SMAX %t, %f
+ ; CHECK-NEXT: $q0 = COPY %sel(<4 x s32>)
+ %0:_(s64) = COPY $x0
+ %1:_(s64) = COPY $x1
+ %t1:_(s32) = G_TRUNC %0
+ %f1:_(s32) = G_TRUNC %1
+ %t:_(<4 x s32>) = G_BUILD_VECTOR %t1, %t1, %t1, %t1
+ ...
[truncated]
  switch (Pred) {
  case ICmpInst::ICMP_UGT:
  case ICmpInst::ICMP_UGE: {
    if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMAX, DstTy}))
I think we need a smarter set of legality predicates. It also makes sense if we're going to legalize this type to a type where it's going to be legal. This doesn't really need to be part of this patch; it's an existing issue scattered around.
It just reads strangely: (a) the shape of the G_UMAX must be legal (AArch64 has limitations), and (b) the G_UMAX must fit into the DstReg.
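For context on (a), this is roughly how AArch64 declares integer min/max legality (a sketch in the style of AArch64LegalizerInfo, not a verbatim quote): only common vector shapes are legal, and everything else is lowered back to compare-plus-select.

// Sketch: AArch64 keeps vector integer min/max legal and lowers the rest,
// which is why scalar selects folded pre-legalization come back as
// G_ICMP + G_SELECT with the strict predicate.
getActionDefinitionsBuilder({G_SMIN, G_SMAX, G_UMIN, G_UMAX})
    .legalFor({v8s8, v16s8, v4s16, v8s16, v2s32, v4s32})
    .lower();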
    if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMAX, DstTy}))
      return false;
    MatchInfo = [=](MachineIRBuilder &B) {
      B.setInstrAndDebugLoc(*Select);
Similarly, we should really just have the apply function always set the insert point appropriately instead of making every combine deal with this.
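A minimal sketch of that suggestion (a hypothetical driver change, not in this patch; it assumes the BuildFnTy alias and MachineIRBuilder from CombinerHelper.h): set the insert point once where the build function is applied, so individual combines can drop the setInstrAndDebugLoc call.

// Hypothetical apply driver: position the builder at the matched
// instruction before running the build function, then erase it.
void applyBuildFnCentralized(MachineIRBuilder &B, MachineInstr &MI,
                             BuildFnTy &MatchInfo) {
  B.setInstrAndDebugLoc(MI); // done once here instead of in every combine
  MatchInfo(B);
  MI.eraseFromParent();
}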
  Register True = Select->getTrueReg();
  Register False = Select->getFalseReg();
  LLT DstTy = MRI.getType(DstReg);
Missing hasOneUse check?
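For illustration, the guard being asked about could look like this (a sketch against the Cond and MRI names in the patch above; MachineRegisterInfo::hasOneNonDBGUse is the existing query):

// Only fold if this G_SELECT is the sole (non-debug) user of the compare,
// so the G_ICMP actually goes away after the combine.
if (!MRI.hasOneNonDBGUse(Cond))
  return false;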
    %c:_(<4 x s32>) = G_ICMP intpred(sle), %t(<4 x s32>), %f(<4 x s32>)
    %sel:_(<4 x s32>) = exact G_SELECT %c, %t, %f
    $q0 = COPY %sel(<4 x s32>)
...
Add a test with multiple uses of the compare.