[GlobalIsel] Combine select to integer minmax. #77213

Closed
wants to merge 195 commits into from


@tschuett tschuett commented Jan 6, 2024

InstCombine canonicalizes selects to floating-point and integer minmax. The GlobalISel combiner and the DAG combiner canonicalize selects to floating-point minmax, but neither of them canonicalizes to integer minmax. On Neoverse V2, basic integer arithmetic and integer minmax have the same cost.
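
To make the intended fold concrete, here is a minimal standalone sketch of the matching rule, written without any LLVM dependencies. The names `Pred`, `MinMax`, `swappedPred` and `classifySelect` are hypothetical and exist only for this illustration; registers are modelled as plain integer ids. It is a model of the idea, not the code in the patch.

```cpp
#include <optional>
#include <utility>

// Hypothetical stand-ins for the integer compare predicates.
enum class Pred { EQ, NE, UGT, UGE, ULT, ULE, SGT, SGE, SLT, SLE };

// The four integer min/max operations the fold can produce.
enum class MinMax { UMax, SMax, UMin, SMin };

// Predicate that yields the same result once the compare operands are swapped.
constexpr Pred swappedPred(Pred P) {
  switch (P) {
  case Pred::UGT: return Pred::ULT;
  case Pred::UGE: return Pred::ULE;
  case Pred::ULT: return Pred::UGT;
  case Pred::ULE: return Pred::UGE;
  case Pred::SGT: return Pred::SLT;
  case Pred::SGE: return Pred::SLE;
  case Pred::SLT: return Pred::SGT;
  case Pred::SLE: return Pred::SGE;
  default:        return P; // EQ and NE are symmetric.
  }
}

// Classifies select(icmp(P, CmpLHS, CmpRHS), True, False).
// Returns the matching min/max kind, or std::nullopt if the pattern
// does not apply. Registers are modelled as integer ids.
std::optional<MinMax> classifySelect(Pred P, int CmpLHS, int CmpRHS,
                                     int True, int False) {
  // Equality predicates carry no ordering, so there is nothing to fold.
  if (P == Pred::EQ || P == Pred::NE)
    return std::nullopt;

  // (icmp X, Y) ? Y : X is rewritten as (icmp' Y, X) ? Y : X.
  if (True == CmpRHS && False == CmpLHS) {
    std::swap(CmpLHS, CmpRHS);
    P = swappedPred(P);
  }

  // The select must pick exactly between the two compared values.
  if (True != CmpLHS || False != CmpRHS)
    return std::nullopt;

  switch (P) {
  case Pred::UGT: case Pred::UGE: return MinMax::UMax;
  case Pred::SGT: case Pred::SGE: return MinMax::SMax;
  case Pred::ULT: case Pred::ULE: return MinMax::UMin;
  case Pred::SLT: case Pred::SLE: return MinMax::SMin;
  default:                        return std::nullopt;
  }
}
```

For example, `classifySelect(Pred::UGT, 1, 2, 2, 1)` models `(x > y) ? y : x`; the operands are swapped first and the select is then classified as an unsigned minimum.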


tschuett commented Jan 6, 2024

The combine is inspired by:


The lowering code in the legalizer has to pick one of two possible predicate values:
static CmpInst::Predicate minMaxToCompare(unsigned Opc) {

I believe that is the cause of the changes in the csel instructions.
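
The `le` to `lt` and `ls` to `lo` changes in the updated CHECK lines are consistent with that explanation: for a minimum, the strict and the non-strict predicate select the same value, so either is a valid lowering. The following standalone sketch checks this exhaustively for 8-bit values. The helper names are hypothetical, and the sketch assumes (as the test updates suggest, but which I have not verified in the legalizer source) that the lowering picks the strict form.

```cpp
#include <cstdint>

// Signed minimum as a select on a non-strict compare (old: csel ..., le).
constexpr int8_t sminViaSLE(int8_t X, int8_t Y) { return X <= Y ? X : Y; }
// Signed minimum as a select on a strict compare (new: csel ..., lt).
constexpr int8_t sminViaSLT(int8_t X, int8_t Y) { return X < Y ? X : Y; }

// Unsigned minimum as a select on a non-strict compare (old: csel ..., ls).
constexpr uint8_t uminViaULE(uint8_t X, uint8_t Y) { return X <= Y ? X : Y; }
// Unsigned minimum as a select on a strict compare (new: csel ..., lo).
constexpr uint8_t uminViaULT(uint8_t X, uint8_t Y) { return X < Y ? X : Y; }

// Compares both formulations over every pair of 8-bit values.
bool strictAndNonStrictAgree() {
  for (int I = 0; I < 256; ++I) {
    for (int J = 0; J < 256; ++J) {
      const auto UX = static_cast<uint8_t>(I);
      const auto UY = static_cast<uint8_t>(J);
      const auto SX = static_cast<int8_t>(I - 128);
      const auto SY = static_cast<int8_t>(J - 128);
      if (sminViaSLE(SX, SY) != sminViaSLT(SX, SY))
        return false;
      if (uminViaULE(UX, UY) != uminViaULT(UX, UY))
        return false;
    }
  }
  return true;
}
```

The two forms only take different branches when both operands are equal, and in that case both branches yield the same value.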


llvmbot commented Jan 6, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: Thorsten Schütt (tschuett)

Changes

InstCombine canonicalizes selects to floating-point and integer minmax. The GlobalISel combiner and the DAG combiner canonicalize selects to floating-point minmax, but neither of them canonicalizes to integer minmax. On Neoverse V2, basic integer arithmetic and integer minmax have the same cost.


Patch is 25.91 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/77213.diff

5 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h (+3)
  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+84)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll (+16-16)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll (+4-4)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir (+251)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index dcc1a4580b14a2..a6e9406bed06a2 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -910,6 +910,9 @@ class CombinerHelper {
 
   bool tryFoldSelectOfConstants(GSelect *Select, BuildFnTy &MatchInfo);
 
+  /// Try to fold (icmp X, Y) ? X : Y -> integer minmax.
+  bool tryFoldSelectToIntMinMax(GSelect *Select, BuildFnTy &MatchInfo);
+
   bool isOneOrOneSplat(Register Src, bool AllowUndefs);
   bool isZeroOrZeroSplat(Register Src, bool AllowUndefs);
   bool isConstantSplatVector(Register Src, int64_t SplatValue,
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 8b15bdb0aca30b..f5fbacd2f3d608 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -6548,6 +6548,87 @@ bool CombinerHelper::tryFoldBoolSelectToLogic(GSelect *Select,
   return false;
 }
 
+bool CombinerHelper::tryFoldSelectToIntMinMax(GSelect *Select,
+                                              BuildFnTy &MatchInfo) {
+  Register DstReg = Select->getReg(0);
+  Register Cond = Select->getCondReg();
+  Register True = Select->getTrueReg();
+  Register False = Select->getFalseReg();
+  LLT DstTy = MRI.getType(DstReg);
+
+  // We need an G_ICMP on the condition register.
+  GICmp *Cmp = getOpcodeDef<GICmp>(Cond, MRI);
+  if (!Cmp)
+    return false;
+
+  CmpInst::Predicate Pred = Cmp->getCond();
+  // We need a larger or smaller predicate for
+  // canonicalization.
+  if (CmpInst::isEquality(Pred))
+    return false;
+
+  Register CmpLHS = Cmp->getLHSReg();
+  Register CmpRHS = Cmp->getRHSReg();
+
+  // We can swap CmpLHS and CmpRHS for higher hitrate.
+  if (True == CmpRHS && False == CmpLHS) {
+    std::swap(CmpLHS, CmpRHS);
+    Pred = CmpInst::getSwappedPredicate(Pred);
+  }
+
+  // (icmp X, Y) ? X : Y -> integer minmax.
+  // see matchSelectPattern in ValueTracking.
+  // Legality between G_SELECT and integer minmax can differ.
+  if (True == CmpLHS && False == CmpRHS) {
+    switch (Pred) {
+    case ICmpInst::ICMP_UGT:
+    case ICmpInst::ICMP_UGE: {
+      if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMAX, DstTy}))
+        return false;
+      MatchInfo = [=](MachineIRBuilder &B) {
+        B.setInstrAndDebugLoc(*Select);
+        B.buildUMax(DstReg, True, False);
+      };
+      return true;
+    }
+    case ICmpInst::ICMP_SGT:
+    case ICmpInst::ICMP_SGE: {
+      if (!isLegalOrBeforeLegalizer({TargetOpcode::G_SMAX, DstTy}))
+        return false;
+      MatchInfo = [=](MachineIRBuilder &B) {
+        B.setInstrAndDebugLoc(*Select);
+        B.buildSMax(DstReg, True, False);
+      };
+      return true;
+    }
+    case ICmpInst::ICMP_ULT:
+    case ICmpInst::ICMP_ULE: {
+      if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMIN, DstTy}))
+        return false;
+      MatchInfo = [=](MachineIRBuilder &B) {
+        B.setInstrAndDebugLoc(*Select);
+        B.buildUMin(DstReg, True, False);
+      };
+      return true;
+    }
+    case ICmpInst::ICMP_SLT:
+    case ICmpInst::ICMP_SLE: {
+      if (!isLegalOrBeforeLegalizer({TargetOpcode::G_SMIN, DstTy}))
+        return false;
+      MatchInfo = [=](MachineIRBuilder &B) {
+        B.setInstrAndDebugLoc(*Select);
+        B.buildSMin(DstReg, True, False);
+      };
+      return true;
+    }
+    default:
+      return false;
+    }
+  }
+
+  return false;
+}
+
 bool CombinerHelper::matchSelect(MachineInstr &MI, BuildFnTy &MatchInfo) {
   GSelect *Select = cast<GSelect>(&MI);
 
@@ -6557,5 +6638,8 @@ bool CombinerHelper::matchSelect(MachineInstr &MI, BuildFnTy &MatchInfo) {
   if (tryFoldBoolSelectToLogic(Select, MatchInfo))
     return true;
 
+  if (tryFoldSelectToIntMinMax(Select, MatchInfo))
+    return true;
+
   return false;
 }
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
index 739332414c1985..0e9c126e97a3d8 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-atomic.ll
@@ -2421,7 +2421,7 @@ define i8 @atomicrmw_min_i8(ptr %ptr, i8 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ldaxrb w8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    sxtb w9, w8
 ; CHECK-NOLSE-O1-NEXT:    cmp w9, w1, sxtb
-; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, le
+; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, lt
 ; CHECK-NOLSE-O1-NEXT:    stxrb w10, w9, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w10, LBB33_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -2435,7 +2435,7 @@ define i8 @atomicrmw_min_i8(ptr %ptr, i8 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ldaxrb w8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    sxtb w9, w8
 ; CHECK-OUTLINE-O1-NEXT:    cmp w9, w1, sxtb
-; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, le
+; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, lt
 ; CHECK-OUTLINE-O1-NEXT:    stxrb w10, w9, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w10, LBB33_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -2662,7 +2662,7 @@ define i8 @atomicrmw_umin_i8(ptr %ptr, i8 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ldaxrb w8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    and w10, w8, #0xff
 ; CHECK-NOLSE-O1-NEXT:    cmp w10, w9
-; CHECK-NOLSE-O1-NEXT:    csel w10, w10, w9, ls
+; CHECK-NOLSE-O1-NEXT:    csel w10, w10, w9, lo
 ; CHECK-NOLSE-O1-NEXT:    stlxrb w11, w10, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w11, LBB35_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -2677,7 +2677,7 @@ define i8 @atomicrmw_umin_i8(ptr %ptr, i8 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ldaxrb w8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    and w10, w8, #0xff
 ; CHECK-OUTLINE-O1-NEXT:    cmp w10, w9
-; CHECK-OUTLINE-O1-NEXT:    csel w10, w10, w9, ls
+; CHECK-OUTLINE-O1-NEXT:    csel w10, w10, w9, lo
 ; CHECK-OUTLINE-O1-NEXT:    stlxrb w11, w10, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w11, LBB35_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -3477,7 +3477,7 @@ define i16 @atomicrmw_min_i16(ptr %ptr, i16 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ldaxrh w8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    sxth w9, w8
 ; CHECK-NOLSE-O1-NEXT:    cmp w9, w1, sxth
-; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, le
+; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, lt
 ; CHECK-NOLSE-O1-NEXT:    stxrh w10, w9, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w10, LBB43_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -3491,7 +3491,7 @@ define i16 @atomicrmw_min_i16(ptr %ptr, i16 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ldaxrh w8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    sxth w9, w8
 ; CHECK-OUTLINE-O1-NEXT:    cmp w9, w1, sxth
-; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, le
+; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, lt
 ; CHECK-OUTLINE-O1-NEXT:    stxrh w10, w9, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w10, LBB43_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -3718,7 +3718,7 @@ define i16 @atomicrmw_umin_i16(ptr %ptr, i16 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ldaxrh w8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    and w10, w8, #0xffff
 ; CHECK-NOLSE-O1-NEXT:    cmp w10, w9
-; CHECK-NOLSE-O1-NEXT:    csel w10, w10, w9, ls
+; CHECK-NOLSE-O1-NEXT:    csel w10, w10, w9, lo
 ; CHECK-NOLSE-O1-NEXT:    stlxrh w11, w10, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w11, LBB45_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -3733,7 +3733,7 @@ define i16 @atomicrmw_umin_i16(ptr %ptr, i16 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ldaxrh w8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    and w10, w8, #0xffff
 ; CHECK-OUTLINE-O1-NEXT:    cmp w10, w9
-; CHECK-OUTLINE-O1-NEXT:    csel w10, w10, w9, ls
+; CHECK-OUTLINE-O1-NEXT:    csel w10, w10, w9, lo
 ; CHECK-OUTLINE-O1-NEXT:    stlxrh w11, w10, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w11, LBB45_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -4526,7 +4526,7 @@ define i32 @atomicrmw_min_i32(ptr %ptr, i32 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-NOLSE-O1-NEXT:    ldaxr w8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cmp w8, w1
-; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, le
+; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, lt
 ; CHECK-NOLSE-O1-NEXT:    stxr w10, w9, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w10, LBB53_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -4539,7 +4539,7 @@ define i32 @atomicrmw_min_i32(ptr %ptr, i32 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-OUTLINE-O1-NEXT:    ldaxr w8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cmp w8, w1
-; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, le
+; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, lt
 ; CHECK-OUTLINE-O1-NEXT:    stxr w10, w9, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w10, LBB53_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -4754,7 +4754,7 @@ define i32 @atomicrmw_umin_i32(ptr %ptr, i32 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-NOLSE-O1-NEXT:    ldaxr w8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cmp w8, w1
-; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, ls
+; CHECK-NOLSE-O1-NEXT:    csel w9, w8, w1, lo
 ; CHECK-NOLSE-O1-NEXT:    stlxr w10, w9, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w10, LBB55_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -4767,7 +4767,7 @@ define i32 @atomicrmw_umin_i32(ptr %ptr, i32 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-OUTLINE-O1-NEXT:    ldaxr w8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cmp w8, w1
-; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, ls
+; CHECK-OUTLINE-O1-NEXT:    csel w9, w8, w1, lo
 ; CHECK-OUTLINE-O1-NEXT:    stlxr w10, w9, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w10, LBB55_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -5547,7 +5547,7 @@ define i64 @atomicrmw_min_i64(ptr %ptr, i64 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-NOLSE-O1-NEXT:    ldaxr x8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cmp x8, x1
-; CHECK-NOLSE-O1-NEXT:    csel x9, x8, x1, le
+; CHECK-NOLSE-O1-NEXT:    csel x9, x8, x1, lt
 ; CHECK-NOLSE-O1-NEXT:    stxr w10, x9, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w10, LBB63_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -5560,7 +5560,7 @@ define i64 @atomicrmw_min_i64(ptr %ptr, i64 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-OUTLINE-O1-NEXT:    ldaxr x8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cmp x8, x1
-; CHECK-OUTLINE-O1-NEXT:    csel x9, x8, x1, le
+; CHECK-OUTLINE-O1-NEXT:    csel x9, x8, x1, lt
 ; CHECK-OUTLINE-O1-NEXT:    stxr w10, x9, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w10, LBB63_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -5775,7 +5775,7 @@ define i64 @atomicrmw_umin_i64(ptr %ptr, i64 %rhs) {
 ; CHECK-NOLSE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-NOLSE-O1-NEXT:    ldaxr x8, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cmp x8, x1
-; CHECK-NOLSE-O1-NEXT:    csel x9, x8, x1, ls
+; CHECK-NOLSE-O1-NEXT:    csel x9, x8, x1, lo
 ; CHECK-NOLSE-O1-NEXT:    stlxr w10, x9, [x0]
 ; CHECK-NOLSE-O1-NEXT:    cbnz w10, LBB65_1
 ; CHECK-NOLSE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -5788,7 +5788,7 @@ define i64 @atomicrmw_umin_i64(ptr %ptr, i64 %rhs) {
 ; CHECK-OUTLINE-O1-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-OUTLINE-O1-NEXT:    ldaxr x8, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cmp x8, x1
-; CHECK-OUTLINE-O1-NEXT:    csel x9, x8, x1, ls
+; CHECK-OUTLINE-O1-NEXT:    csel x9, x8, x1, lo
 ; CHECK-OUTLINE-O1-NEXT:    stlxr w10, x9, [x0]
 ; CHECK-OUTLINE-O1-NEXT:    cbnz w10, LBB65_1
 ; CHECK-OUTLINE-O1-NEXT:  ; %bb.2: ; %atomicrmw.end
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
index 4c07081404c889..5a7bd6ee20f9b4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/arm64-pcsections.ll
@@ -888,7 +888,7 @@ define i8 @atomicrmw_min_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   renamable $w8 = LDAXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
   ; CHECK-NEXT:   renamable $w9 = SBFMWri renamable $w8, 0, 7, pcsections !0
   ; CHECK-NEXT:   dead $wzr = SUBSWrx killed renamable $w9, renamable $w1, 32, implicit-def $nzcv, pcsections !0
-  ; CHECK-NEXT:   renamable $w9 = CSELWr renamable $w8, renamable $w1, 13, implicit killed $nzcv, implicit-def $x9, pcsections !0
+  ; CHECK-NEXT:   renamable $w9 = CSELWr renamable $w8, renamable $w1, 11, implicit killed $nzcv, implicit-def $x9, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w10 = STXRB renamable $w9, renamable $x0, implicit killed $x9, pcsections !0 :: (volatile store (s8) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w10, %bb.1, pcsections !0
   ; CHECK-NEXT: {{  $}}
@@ -943,7 +943,7 @@ define i8 @atomicrmw_umin_i8(ptr %ptr, i8 %rhs) {
   ; CHECK-NEXT:   renamable $w8 = LDAXRB renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s8) from %ir.ptr)
   ; CHECK-NEXT:   renamable $w10 = ANDWri renamable $w8, 7
   ; CHECK-NEXT:   $wzr = SUBSWrs renamable $w10, renamable $w9, 0, implicit-def $nzcv, pcsections !0
-  ; CHECK-NEXT:   renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 9, implicit killed $nzcv, implicit-def $x10, pcsections !0
+  ; CHECK-NEXT:   renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 3, implicit killed $nzcv, implicit-def $x10, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w11 = STLXRB renamable $w10, renamable $x0, implicit killed $x10, pcsections !0 :: (volatile store (s8) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w11, %bb.1, pcsections !0
   ; CHECK-NEXT: {{  $}}
@@ -1148,7 +1148,7 @@ define i16 @atomicrmw_min_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   renamable $w8 = LDAXRH renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s16) from %ir.ptr)
   ; CHECK-NEXT:   renamable $w9 = SBFMWri renamable $w8, 0, 15, pcsections !0
   ; CHECK-NEXT:   dead $wzr = SUBSWrx killed renamable $w9, renamable $w1, 40, implicit-def $nzcv, pcsections !0
-  ; CHECK-NEXT:   renamable $w9 = CSELWr renamable $w8, renamable $w1, 13, implicit killed $nzcv, implicit-def $x9, pcsections !0
+  ; CHECK-NEXT:   renamable $w9 = CSELWr renamable $w8, renamable $w1, 11, implicit killed $nzcv, implicit-def $x9, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w10 = STXRH renamable $w9, renamable $x0, implicit killed $x9, pcsections !0 :: (volatile store (s16) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w10, %bb.1, pcsections !0
   ; CHECK-NEXT: {{  $}}
@@ -1203,7 +1203,7 @@ define i16 @atomicrmw_umin_i16(ptr %ptr, i16 %rhs) {
   ; CHECK-NEXT:   renamable $w8 = LDAXRH renamable $x0, implicit-def $x8, pcsections !0 :: (volatile load (s16) from %ir.ptr)
   ; CHECK-NEXT:   renamable $w10 = ANDWri renamable $w8, 15
   ; CHECK-NEXT:   $wzr = SUBSWrs renamable $w10, renamable $w9, 0, implicit-def $nzcv, pcsections !0
-  ; CHECK-NEXT:   renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 9, implicit killed $nzcv, implicit-def $x10, pcsections !0
+  ; CHECK-NEXT:   renamable $w10 = CSELWr killed renamable $w10, renamable $w9, 3, implicit killed $nzcv, implicit-def $x10, pcsections !0
   ; CHECK-NEXT:   early-clobber renamable $w11 = STLXRH renamable $w10, renamable $x0, implicit killed $x10, pcsections !0 :: (volatile store (s16) into %ir.ptr)
   ; CHECK-NEXT:   CBNZW killed renamable $w11, %bb.1, pcsections !0
   ; CHECK-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
index be2de620fa456c..38cfd4169873f4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-select.mir
@@ -544,3 +544,254 @@ body:             |
     %ext:_(s32) = G_ANYEXT %sel
     $w0 = COPY %ext(s32)
 ...
+---
+# select test(failed,registers) select icmp_ugt t,f_t_f --> umax(t,f)
+name:            select_failed_icmp_ugt_t_f_t_f_umax_t_f
+body:             |
+  bb.1:
+    liveins: $x0, $x1, $x2
+    ; CHECK-LABEL: name: select_failed_icmp_ugt_t_f_t_f_umax_t_f
+    ; CHECK: liveins: $x0, $x1, $x2
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s64) = COPY $x2
+    ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s64) = COPY $x3
+    ; CHECK-NEXT: %t:_(s8) = G_TRUNC [[COPY]](s64)
+    ; CHECK-NEXT: %f:_(s8) = G_TRUNC [[COPY1]](s64)
+    ; CHECK-NEXT: %y:_(s8) = G_TRUNC [[COPY2]](s64)
+    ; CHECK-NEXT: %z:_(s8) = G_TRUNC [[COPY3]](s64)
+    ; CHECK-NEXT: %c:_(s8) = G_ICMP intpred(ugt), %t(s8), %y
+    ; CHECK-NEXT: %sel:_(s8) = exact G_SELECT %c(s8), %f, %z
+    ; CHECK-NEXT: %ext:_(s32) = G_ANYEXT %sel(s8)
+    ; CHECK-NEXT: $w0 = COPY %ext(s32)
+    %0:_(s64) = COPY $x0
+    %1:_(s64) = COPY $x1
+    %2:_(s64) = COPY $x2
+    %3:_(s64) = COPY $x3
+    %4:_(s64) = COPY $x4
+    %t:_(s8) = G_TRUNC %0
+    %f:_(s8) = G_TRUNC %1
+    %y:_(s8) = G_TRUNC %2
+    %z:_(s8) = G_TRUNC %3
+    %c:_(s8) = G_ICMP intpred(ugt), %t(s8), %y(s8)
+    %sel:_(s8) = exact G_SELECT %c, %f, %z
+    %ext:_(s32) = G_ANYEXT %sel
+    $w0 = COPY %ext(s32)
+...
+---
+# test select icmp_ugt t,f_t_f --> umax(t,f)
+name:            select_icmp_ugt_t_f_t_f_umax_t_f
+body:             |
+  bb.1:
+    liveins: $x0, $x1, $x2
+    ; CHECK-LABEL: name: select_icmp_ugt_t_f_t_f_umax_t_f
+    ; CHECK: liveins: $x0, $x1, $x2
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+    ; CHECK-NEXT: %t1:_(s32) = G_TRUNC [[COPY]](s64)
+    ; CHECK-NEXT: %f1:_(s32) = G_TRUNC [[COPY1]](s64)
+    ; CHECK-NEXT: %t:_(<4 x s32>) = G_BUILD_VECTOR %t1(s32), %t1(s32), %t1(s32), %t1(s32)
+    ; CHECK-NEXT: %f:_(<4 x s32>) = G_BUILD_VECTOR %f1(s32), %f1(s32), %f1(s32), %f1(s32)
+    ; CHECK-NEXT: %sel:_(<4 x s32>) = G_UMAX %t, %f
+    ; CHECK-NEXT: $q0 = COPY %sel(<4 x s32>)
+    %0:_(s64) = COPY $x0
+    %1:_(s64) = COPY $x1
+    %t1:_(s32) = G_TRUNC %0
+    %f1:_(s32) = G_TRUNC %1
+    %t:_(<4 x s32>) = G_BUILD_VECTOR %t1, %t1, %t1, %t1
+    %f:_(<4 x s32>) = G_BUILD_VECTOR %f1, %f1, %f1, %f1
+    %c:_(<4 x s32>) = G_ICMP intpred(ugt), %t(<4 x s32>), %f(<4 x s32>)
+    %sel:_(<4 x s32>) = exact G_SELECT %c, %t, %f
+    $q0 = COPY %sel(<4 x s32>)
+...
+---
+# test select icmp_uge t,f_t_f --> umax(t,f)
+name:            select_icmp_uge_t_f_t_f_umax_t_f
+body:             |
+  bb.1:
+    liveins: $x0, $x1, $x2
+    ; CHECK-LABEL: name: select_icmp_uge_t_f_t_f_umax_t_f
+    ; CHECK: liveins: $x0, $x1, $x2
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+    ; CHECK-NEXT: %t1:_(s32) = G_TRUNC [[COPY]](s64)
+    ; CHECK-NEXT: %f1:_(s32) = G_TRUNC [[COPY1]](s64)
+    ; CHECK-NEXT: %t:_(<4 x s32>) = G_BUILD_VECTOR %t1(s32), %t1(s32), %t1(s32), %t1(s32)
+    ; CHECK-NEXT: %f:_(<4 x s32>) = G_BUILD_VECTOR %f1(s32), %f1(s32), %f1(s32), %f1(s32)
+    ; CHECK-NEXT: %sel:_(<4 x s32>) = G_UMAX %t, %f
+    ; CHECK-NEXT: $q0 = COPY %sel(<4 x s32>)
+    %0:_(s64) = COPY $x0
+    %1:_(s64) = COPY $x1
+    %t1:_(s32) = G_TRUNC %0
+    %f1:_(s32) = G_TRUNC %1
+    %t:_(<4 x s32>) = G_BUILD_VECTOR %t1, %t1, %t1, %t1
+    %f:_(<4 x s32>) = G_BUILD_VECTOR %f1, %f1, %f1, %f1
+    %c:_(<4 x s32>) = G_ICMP intpred(uge), %t(<4 x s32>), %f(<4 x s32>)
+    %sel:_(<4 x s32>) = exact G_SELECT %c, %t, %f
+    $q0 = COPY %sel(<4 x s32>)
+...
+---
+# test select icmp_sgt t,f_t_f --> smax(t,f)
+name:            select_icmp_sgt_t_f_t_f_smax_t_f
+body:             |
+  bb.1:
+    liveins: $x0, $x1, $x2
+    ; CHECK-LABEL: name: select_icmp_sgt_t_f_t_f_smax_t_f
+    ; CHECK: liveins: $x0, $x1, $x2
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x0
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s64) = COPY $x1
+    ; CHECK-NEXT: %t1:_(s32) = G_TRUNC [[COPY]](s64)
+    ; CHECK-NEXT: %f1:_(s32) = G_TRUNC [[COPY1]](s64)
+    ; CHECK-NEXT: %t:_(<4 x s32>) = G_BUILD_VECTOR %t1(s32), %t1(s32), %t1(s32), %t1(s32)
+    ; CHECK-NEXT: %f:_(<4 x s32>) = G_BUILD_VECTOR %f1(s32), %f1(s32), %f1(s32), %f1(s32)
+    ; CHECK-NEXT: %sel:_(<4 x s32>) = G_SMAX %t, %f
+    ; CHECK-NEXT: $q0 = COPY %sel(<4 x s32>)
+    %0:_(s64) = COPY $x0
+    %1:_(s64) = COPY $x1
+    %t1:_(s32) = G_TRUNC %0
+    %f1:_(s32) = G_TRUNC %1
+    %t:_(<4 x s32>) = G_BUILD_VECTOR %t1, %t1, %t1, %t1
+    ...
[truncated]

mtrofin and others added 21 commits January 6, 2024 17:21
This reverts commit b5de136.

Based on post-commit feedback, I need to do some other work before
this makes sense.
…6323)

see llvm#73359

Declarative assemblyFormat ODS is more concise and requires less
boilerplate than filling out CPP interfaces.

Changes:
* Updates the Ops defined in `SPIRVAtomicOps.td` to use assemblyFormat.
* Removes print/parse from `AtomcOps.cpp`, which is now generated by
assemblyFormat.
* Adds `Trait` to verify that a pointer operand `foo`'s pointee type
matches operand `bar`'s type.
  * Updates error message expected in tests from new Trait.
* Updates tests to updated format (largely using <operand> in place of
"operand").
This should have been checking that the transform was valid, but used
incorrect conditions letting through invalid combinations of lo/hi
extracts.

Hopefully fixes llvm#76769
…lvm#76650)

We add some basic type aliases and function definitions relating to
cones for Barvinok's algorithm.
These include functions to get the dual of a cone and find its index.
…se (llvm#77232)

The ParamPoint datatype has each column representing an affine function.
The code for generating functions is modified to reflect this.
Visual Studio needs the class template stuff.

C:\llvm\include\llvm/Analysis/MemoryBuiltins.h(217): error C2990: 'llvm::SizeOffsetType': non-class template has already been declared as a class template
C:\llvm\include\llvm/Analysis/MemoryBuiltins.h(193): note: see declaration of 'llvm::SizeOffsetType'
This reverts commit 0903d99.

This is causing all non-Visual Studio builds to fail.
…Z), (binop Y, Z)` (llvm#76384)

This patch relaxes the one-use constraints for `icmp pred (binop X, Z),
(binop Y, Z)`. It will enable more optimizations with pointer
arithmetic.
One example in `boost::match_results::set_size`:

```
declare void @use(i64)
define i1 @src(ptr %a1, ptr %a2, ptr %add.ptr.i66, i64 %sub.ptr.rhs.cast.i) {
  %sub.ptr.lhs.cast.i = ptrtoint ptr %a1 to i64
  %sub.ptr.rhs.cast.i = ptrtoint ptr %a2 to i64
  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
  %sub.ptr.div.i = sdiv exact i64 %sub.ptr.sub.i, 24
  call void @use(i64 %sub.ptr.div.i)
  %sub.ptr.lhs.cast.i.i = ptrtoint ptr %add.ptr.i66 to i64
  %sub.ptr.sub.i.i = sub i64 %sub.ptr.lhs.cast.i.i, %sub.ptr.rhs.cast.i
  %sub.ptr.div.i.i = sdiv exact i64 %sub.ptr.sub.i.i, 24
  %cmp.i.not.i.i = icmp eq i64 %sub.ptr.div.i.i, %sub.ptr.div.i
  ret i1 %cmp.i.not.i.i
}
define i1 @tgt(ptr %a1, ptr %a2, ptr %add.ptr.i66, i64 %sub.ptr.rhs.cast.i) {
  %sub.ptr.lhs.cast.i = ptrtoint ptr %a1 to i64
  %sub.ptr.rhs.cast.i = ptrtoint ptr %a2 to i64
  %sub.ptr.sub.i = sub i64 %sub.ptr.lhs.cast.i, %sub.ptr.rhs.cast.i
  %sub.ptr.div.i = sdiv exact i64 %sub.ptr.sub.i, 24
  call void @use(i64 %sub.ptr.div.i)
  %cmp.i.not.i.i = icmp eq i64 %sub.ptr.sub.i.i, %sub.ptr.sub.i
  ret i1 %cmp.i.not.i.i
}
```
…Interface` (llvm#77090)

`BufferPlacementTransformationBase::isLoop` checks if there is a loop in
the region branching graph of an operation. This algorithm is similar to
`isRegionReachable` in the `RegionBranchOpInterface`. To avoid duplicate
code, `isRegionReachable` is generalized, so that it can be used to
detect region loops. A helper function
`RegionBranchOpInterface::hasLoop` is added.

This change also turns a recursive implementation into an iterative one,
which is the preferred implementation strategy in LLVM.

Also move the `isLoop` to `BufferOptimizations.cpp`, so that we can
gradually retire `BufferPlacementTransformationBase`. (This is so that
proper error handling can be added to `BufferViewFlowAnalysis`.)
…e of block (llvm#77098)

Also improve the implementation of `findCommonDominator` (skip duplicate
blocks) and extract it from `BufferPlacementTransformationBase` (so that
`BufferPlacementTransformationBase` can be retired eventually).
As %B.gep.0 executes unconditionally in the latch, inbounds could
be preserved in the vector version.

https://alive2.llvm.org/ce/z/XWbMuD
As suggested as follow-up in
llvm#72164, manage inbounds via
VPRecipeWithIRFlags.

Note that we can now preserve inbounds in a few more cases.
The fix used macros that confuse clang-format. This is formatted with
clang-format and then excluded from formatting.
switch (Pred) {
case ICmpInst::ICMP_UGT:
case ICmpInst::ICMP_UGE: {
if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMAX, DstTy}))
Contributor

I think we need a smarter set of legality predicates. The combine also makes sense if we're going to legalize this type to a type where the operation is going to be legal. This doesn't really need to be part of this patch; it's an existing issue scattered around.

Member Author

It just reads strange: (a) the shape of the G_UMAX must be legal (AArch64 has limitations) and (b) the G_UMAX must fit into the DstReg.
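
One way to picture the "smarter legality predicate" discussed in this thread is sketched below. It is a toy model only: `Action`, `legalOrBeforeLegalizer` and `legalOrWillBecomeLegal` are hypothetical names invented for this illustration, and the model assumes the existing check accepts everything before the legalizer has run. The second predicate additionally asks what the legalizer would do with the min/max, so that a min/max which would only be lowered back to compare plus select is not created in the first place.

```cpp
// Hypothetical summary of what the legalizer would do with the new op.
enum class Action { Legal, WidenScalar, Lower, Unsupported };

// Toy model of the check used in the patch: anything goes before the
// legalizer; afterwards the op must already be legal.
constexpr bool legalOrBeforeLegalizer(bool BeforeLegalizer, Action A) {
  return BeforeLegalizer || A == Action::Legal;
}

// Toy model of a stricter predicate: accept the op if it is legal, or if
// the legalizer has not run yet and will only change the type (so the op
// survives as a min/max). Reject ops that would be lowered or rejected.
constexpr bool legalOrWillBecomeLegal(bool BeforeLegalizer, Action A) {
  switch (A) {
  case Action::Legal:
    return true;
  case Action::WidenScalar:
    return BeforeLegalizer;
  case Action::Lower:
  case Action::Unsupported:
    return false;
  }
  return false;
}
```

Under this model the two predicates differ before legalization for ops that would be lowered: the first accepts them, the second does not.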

if (!isLegalOrBeforeLegalizer({TargetOpcode::G_UMAX, DstTy}))
return false;
MatchInfo = [=](MachineIRBuilder &B) {
B.setInstrAndDebugLoc(*Select);
Contributor

Similarly, we should really just have the apply function always set the insert point appropriately instead of making every combine deal with this.
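
A minimal sketch of that suggestion, using a toy builder rather than the real `MachineIRBuilder`: the apply step sets the insert point once, so the per-combine build callbacks no longer have to. `Builder`, `BuildFn` and `applyBuildFn` here are hypothetical stand-ins written for this illustration and make no claim about how the existing apply functions in LLVM behave.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy builder that records which op was built at which insert point.
struct Builder {
  int InsertPt = -1;
  std::vector<std::string> Log;

  void setInsertPt(int InstrId) { InsertPt = InstrId; }
  void build(const std::string &Opcode) {
    Log.push_back(Opcode + "@" + std::to_string(InsertPt));
  }
};

using BuildFn = std::function<void(Builder &)>;

// The apply step positions the builder at the matched instruction before
// running the callback, so callbacks only describe what to build.
void applyBuildFn(Builder &B, int MatchedInstrId, const BuildFn &Fn) {
  B.setInsertPt(MatchedInstrId);
  Fn(B);
}
```

With this shape, a combine's callback shrinks to something like `[](Builder &Bld) { Bld.build("G_UMAX"); }`.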

Register True = Select->getTrueReg();
Register False = Select->getFalseReg();
LLT DstTy = MRI.getType(DstReg);

Contributor

Missing hasOneUse check?
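
One possible reading of this concern, sketched as a toy instruction count: if the select is the only user of the compare, folding replaces two instructions with one; if the compare has other users, it stays alive and the fold saves nothing. The function names are hypothetical and the model ignores everything except the number of generic instructions.

```cpp
// Instructions needed for select(icmp(x, y), x, y) before the fold:
// the compare and the select.
constexpr unsigned instrsBeforeFold() { return 2; }

// Instructions needed after the fold: the min/max, plus the compare if
// something other than the select still uses it.
constexpr unsigned instrsAfterFold(unsigned CmpUses) {
  return CmpUses <= 1 ? 1 : 2;
}

// True if folding strictly reduces the instruction count.
constexpr bool foldShrinksCode(unsigned CmpUses) {
  return instrsAfterFold(CmpUses) < instrsBeforeFold();
}
```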

%c:_(<4 x s32>) = G_ICMP intpred(sle), %t(<4 x s32>), %f(<4 x s32>)
%sel:_(<4 x s32>) = exact G_SELECT %c, %t, %f
$q0 = COPY %sel(<4 x s32>)
...
Contributor

Add test with multiple uses of the compare

SixWeining and others added 14 commits January 9, 2024 14:58
…relocs when sym is not in section (llvm#76433)

1. Follow RISCV 1df5ea2 to support generating relocs for .uleb128 values
that cannot be folded. Unlike RISCV, the located content on LoongArch
should be zero. LoongArch fixes up uleb128 values by in-place addition
and subtraction, using reloc types named R_LARCH_{ADD,SUB}_ULEB128. The
located content can affect the result, and R_LARCH_ADD_ULEB128 has enough
info to represent the first symbol value, so it needs to be set to zero.
2. Force relocs if sym is not in section so that relocs can be emitted
for external symbols.

Fixes:
llvm#72960 (comment)
…rCodeAlign (llvm#77236)

shouldInsertExtraNopBytesForCodeAlign() needs STI to check whether
relax is enabled. STI is initialized when setEmitNops is called, but
setEmitNops may not be called in a section which has instructions but is
not executable. In this case the uninitialized STI will cause problems.
Thus, check hasEmitNops before calling it.

Fixes:
llvm#76552 (comment)
llvm#76848)

As per the docs [1]:

```
In absence of an explicit layout, a memref is considered to have a
multi-dimensional identity affine map layout.
```

This patch makes sure that MemRefs with no strides (i.e. no explicit
layout) are treated as contiguous when checking whether a particular
vector is a contiguous slice of the given MemRef.

[1] https://mlir.llvm.org/docs/Dialects/Builtin/#layout

Follow-up for llvm#76428.
The library implementation is just a wrapper around a call to the
intrinsic, but loses metadata. Swap out the call site to the intrinsic
so that the lowering can see the !fpmath metadata and fast math flags.

Since d56e0d0, clang started placing
!fpmath on OpenCL library sqrt calls. Also don't bother emitting
native_sqrt anymore, it's just another wrapper around llvm.sqrt.
…77298)

When initializing a union that contains a struct with a flexible array
member, and the initializer list is empty, we currently trigger an
assertion failure. This happens because getFlexibleArrayInitChars()
assumes that the initializer list is non-empty.

Fixes llvm#77085.
This is done using libquadmath and the mappings are only available if
libquadmath was found by cmake.
Support for non quad bessels is already available on POSIX platform
using libm extensions.
…kerParallel to have a common library. Part 1. (llvm#75925)

This patch creates DWARFLinkerBase library, places DWARFLinker code into
DWARFLinker\Classic, places DWARFLinkerParallel into DWARFLinker\Parallel.
updates BOLT to use new library. This patch is NFC.
…vm#76942)

toFeatures and toFeatureVector both output a list of target feature
flags, just with a slightly different interface. toFeatures keeps any
unsupported extensions, and also provides a way to append negative
extensions (AddAllExtensions=true).

This patch combines them into one function, so that a later patch will
be able to get a std::vector of features that includes all the
negative extensions, which was previously only possible through the
StrAlloc interface.

tschuett commented Jan 9, 2024

Sorry.

@ldionne ldionne removed request for a team January 9, 2024 16:28