[AArch64][GlobalISel] Combine Shuffles of G_CONCAT_VECTORS #87489

chuongg3 · 2024-04-03T12:54:05Z

Combines a G_SHUFFLE_VECTOR whose sources come from G_CONCAT_VECTORS into a single G_CONCAT_VECTORS instruction.

// a = G_CONCAT_VECTORS x, y, undef, undef
// b = G_CONCAT_VECTORS z, undef, undef, undef
// c = G_SHUFFLE_VECTOR a, b, <0, 1, 4, undef>
// ===>
// c = G_CONCAT_VECTORS x, y, z, undef
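To make the mask condition concrete, here is a minimal standalone sketch of the chunk-wise check, written in plain C++ rather than against the LLVM API. The names `Reg` and `flattenShuffleOfConcats`, and the convention that 0 stands for an undef chunk, are illustrative assumptions. It also assumes every concat source has the same element count, and it is a model of the idea rather than the code in this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Reg = unsigned; // hypothetical register handle; 0 stands for an undef chunk

// Returns the operands of the combined concat, or nullopt when the mask does
// not select whole, aligned sources from the two concatenations.
std::optional<std::vector<Reg>>
flattenShuffleOfConcats(const std::vector<Reg> &concatA,
                        const std::vector<Reg> &concatB,
                        const std::vector<int> &mask, int srcElts) {
  assert(srcElts > 0 && "sources must have at least one element");
  const std::size_t step = static_cast<std::size_t>(srcElts);
  const int eltsA = static_cast<int>(concatA.size()) * srcElts;
  const int eltsB = static_cast<int>(concatB.size()) * srcElts;
  if (mask.size() % step != 0)
    return std::nullopt;

  std::vector<Reg> ops;
  for (std::size_t i = 0; i < mask.size(); i += step) {
    const int first = mask[i];
    // A chunk must be all undef (-1) or start on a source boundary.
    if (first < -1 || first >= eltsA + eltsB ||
        (first != -1 && first % srcElts != 0))
      return std::nullopt;
    // The remaining lanes must continue the same pattern.
    for (std::size_t j = 1; j < step; ++j) {
      const int expected = first == -1 ? -1 : first + static_cast<int>(j);
      if (mask[i + j] != expected)
        return std::nullopt;
    }
    if (first == -1)
      ops.push_back(0);
    else if (first < eltsA)
      ops.push_back(concatA[first / srcElts]);
    else
      ops.push_back(concatB[(first - eltsA) / srcElts]);
  }
  return ops;
}
```

With `srcElts` set to 1, the sources {x, y, undef, undef} and {z, undef, undef, undef} and the mask <0, 1, 4, undef> map to x, y, z and one undef slot, matching the example above. A mask that cuts through a source, such as one starting <0, undef, 2, 3> with four-element sources, is rejected.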

davemgreen · 2024-04-04T06:24:46Z

Is the first tryCombineConcat patch needed for the shuffle combine part, or can they be separate? Thanks

llvmbot · 2024-04-05T19:01:13Z

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: None (chuongg3)

Changes

Combines a G_SHUFFLE_VECTOR whose sources come from G_CONCAT_VECTORS into a single G_CONCAT_VECTORS instruction.

// a = G_CONCAT_VECTORS x, y, undef, undef
// b = G_CONCAT_VECTORS z, undef, undef, undef
// c = G_SHUFFLE_VECTOR a, b, <0, 1, 4, undef>
// ===>
// c = G_CONCAT_VECTORS x, y, z, undef

The first pre-commit comes from #85047

Full diff: https://github.com/llvm/llvm-project/pull/87489.diff

4 Files Affected:

(modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h (+5)
(modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+15-1)
(modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+73)
(added) llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir (+202)

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 3af32043391fec..4b8aec8e8a5dd6 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -237,6 +237,11 @@ class CombinerHelper {
   /// or an implicit_def if \p Ops is empty.
   void applyCombineConcatVectors(MachineInstr &MI, SmallVector<Register> &Ops);
 
+  bool matchCombineShuffleConcat(MachineInstr &MI, SmallVector<Register> &Ops);
+  /// Replace \p MI with a flattened build_vector with \p Ops
+  /// or an implicit_def if \p Ops is empty.
+  void applyCombineShuffleConcat(MachineInstr &MI, SmallVector<Register> &Ops);
+
   /// Try to combine G_SHUFFLE_VECTOR into G_CONCAT_VECTORS.
   /// Returns true if MI changed.
   ///
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 778ff7e437eb50..8401e48c1bc9eb 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1507,6 +1507,18 @@ def combine_concat_vector : GICombineRule<
         [{ return Helper.matchCombineConcatVectors(*${root}, ${matchinfo}); }]),
   (apply [{ Helper.applyCombineConcatVectors(*${root}, ${matchinfo}); }])>;
 
+// Combines Shuffles of Concats
+// a = G_CONCAT_VECTORS x, y, undef, undef
+// b = G_CONCAT_VECTORS z, undef, undef, undef
+// c = G_SHUFFLE_VECTORS a, b, <0, 1, 4, undef>
+// ===>
+// c = G_CONCAT_VECTORS x, y, z, undef
+def combine_shuffle_concat : GICombineRule<
+  (defs root:$root, concat_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_SHUFFLE_VECTOR):$root,
+        [{ return Helper.matchCombineShuffleConcat(*${root}, ${matchinfo}); }]),
+  (apply [{ Helper.applyCombineShuffleConcat(*${root}, ${matchinfo}); }])>;
+
 // match_extract_of_element must be the first!
 def vector_ops_combines: GICombineGroup<[
 match_extract_of_element_undef_vector,
@@ -1538,6 +1550,7 @@ extract_vector_element_build_vector_trunc8,
 extract_vector_element_freeze
 ]>;
 
+
 // FIXME: These should use the custom predicate feature once it lands.
 def undef_combines : GICombineGroup<[undef_to_fp_zero, undef_to_int_zero,
                                      undef_to_negative_one,
@@ -1614,7 +1627,8 @@ def all_combines : GICombineGroup<[trivial_combines, vector_ops_combines,
     and_or_disjoint_mask, fma_combines, fold_binop_into_select,
     sub_add_reg, select_to_minmax, redundant_binop_in_equality,
     fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
-    combine_concat_vector, double_icmp_zero_and_or_combine, match_addos]>;
+    combine_concat_vector, double_icmp_zero_and_or_combine, match_addos,
+    combine_shuffle_concat]>;
 
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 719209e0edd5fb..9c18e0f136b715 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -303,6 +303,79 @@ void CombinerHelper::applyCombineConcatVectors(MachineInstr &MI,
   replaceRegWith(MRI, DstReg, NewDstReg);
 }
 
+bool CombinerHelper::matchCombineShuffleConcat(MachineInstr &MI,
+                                               SmallVector<Register> &Ops) {
+  ArrayRef<int> Mask = MI.getOperand(3).getShuffleMask();
+  auto ConcatMI1 = dyn_cast<GConcatVectors>(
+      getDefIgnoringCopies(MI.getOperand(1).getReg(), MRI));
+  auto ConcatMI2 = dyn_cast<GConcatVectors>(
+      getDefIgnoringCopies(MI.getOperand(2).getReg(), MRI));
+  if (!ConcatMI1 || !ConcatMI2)
+    return false;
+
+  // Check that the sources of the Concat instructions have the same type
+  if (MRI.getType(ConcatMI1->getSourceReg(0)) !=
+      MRI.getType(ConcatMI2->getSourceReg(0)))
+    return false;
+
+  LLT ConcatSrcTy = MRI.getType(ConcatMI1->getReg(1));
+  LLT ShuffleSrcTy1 = MRI.getType(MI.getOperand(1).getReg());
+  unsigned ConcatSrcNumElt = ConcatSrcTy.getNumElements();
+  for (unsigned i = 0; i < Mask.size(); i += ConcatSrcNumElt) {
+    // Check if the index takes a whole source register from G_CONCAT_VECTORS
+    // Assumes that all Sources of G_CONCAT_VECTORS are the same type
+    if (Mask[i] == -1) {
+      for (unsigned j = 1; j < ConcatSrcNumElt; j++) {
+        if (i + j >= Mask.size())
+          return false;
+        if (Mask[i + j] != -1)
+          return false;
+      }
+      Ops.push_back(0);
+    } else if (Mask[i] % (int)ConcatSrcNumElt == 0) {
+      for (unsigned j = 1; j < ConcatSrcNumElt; j++) {
+        if (i + j >= Mask.size())
+          return false;
+        if (Mask[i + j] != Mask[i] + (int)j)
+          return false;
+      }
+      // Retrieve the source register from its respective G_CONCAT_VECTORS
+      // instruction
+      if (Mask[i] < (int)ShuffleSrcTy1.getNumElements()) {
+        Ops.push_back(ConcatMI1->getSourceReg(Mask[i] / (int)ConcatSrcNumElt));
+      } else {
+        Ops.push_back(ConcatMI2->getSourceReg(Mask[i] / (int)ConcatSrcNumElt -
+                                              (int)ConcatMI1->getNumSources()));
+      }
+    } else {
+      return false;
+    }
+  }
+
+  if (Ops.size() == 0)
+    return false;
+  // Only deal with cases where G_CONCAT_VECTORS sources are all the same type
+  if ((Mask.size() - (Ops.size() * ConcatSrcNumElt)) %
+          MRI.getType(Ops[0]).getNumElements() !=
+      0)
+    return false;
+  return true;
+}
+
+void CombinerHelper::applyCombineShuffleConcat(MachineInstr &MI,
+                                               SmallVector<Register> &Ops) {
+  LLT SrcTy = MRI.getType(Ops[0]);
+  Register UndefReg = Builder.buildUndef(SrcTy).getReg(0);
+
+  for (unsigned i = 0; i < Ops.size(); i++) {
+    if (Ops[i] == 0)
+      Ops[i] = UndefReg;
+  }
+
+  Builder.buildConcatVectors(MI.getOperand(0).getReg(), Ops);
+  MI.eraseFromParent();
+}
+
 bool CombinerHelper::tryCombineShuffleVector(MachineInstr &MI) {
   SmallVector<Register, 4> Ops;
   if (matchCombineShuffleVector(MI, Ops)) {
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir
new file mode 100644
index 00000000000000..0de989f8be75d7
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir
@@ -0,0 +1,202 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -o - -mtriple=aarch64-unknown-unknown -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs %s | FileCheck %s
+
+---
+name:            shuffle_concat_1
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_1
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), %c(<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_2
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_2
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p1:_(p0) = COPY $x0
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %d:_(<4 x s8>) = G_LOAD %p1(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), %c(<4 x s8>), %d(<4 x s8>)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %v:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %w:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %ImpDef:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %v:_(<16 x s8>), %w:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %d:_(<4 x s8>), %ImpDef:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_3
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_3
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: [[CONCAT_VECTORS1:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %c(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_SHUFFLE_VECTOR [[CONCAT_VECTORS]](<16 x s8>), [[CONCAT_VECTORS1]], shufflemask(0, undef, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, undef, undef, undef, undef)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, -1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_4
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_4
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), [[DEF]](<4 x s8>), %c(<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, 1, 2, 3, -1, -1, -1, -1, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_5
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_5
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: [[CONCAT_VECTORS1:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %c(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_SHUFFLE_VECTOR [[CONCAT_VECTORS]](<16 x s8>), [[CONCAT_VECTORS1]], shufflemask(undef, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, undef, undef, undef, undef)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(-1, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...

arsenm · 2024-04-06T20:31:24Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

+      for (unsigned j = 1; j < ConcatSrcNumElt; j++) {
+        if (i + j >= Mask.size())
+          return false;
+        if (Mask[i + j] != Mask[i] + (int)j)


Prefer static_cast (or just make the iteration variable int?)

I have changed it to static_cast. I decided against making the iteration variable an int, since it is compared with size(), which returns an unsigned long.
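The trade-off discussed here can be shown in isolation. The following is a small sketch in plain C++, not the LLVM code itself; `isConsecutiveRun` is a hypothetical helper. Mask entries are signed because -1 marks an undef lane, while the indices stay unsigned so that comparisons against `size()` do not mix signedness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if mask[start .. start+len) is the run
// mask[start], mask[start] + 1, mask[start] + 2, ...
bool isConsecutiveRun(const std::vector<int> &mask, std::size_t start,
                      std::size_t len) {
  assert(len > 0 && "a run needs at least one element");
  // Unsigned index arithmetic: reject runs that would read past the end.
  if (start >= mask.size() || len > mask.size() - start)
    return false;
  for (std::size_t j = 1; j < len; ++j) {
    // Convert the index explicitly before mixing it with signed mask values.
    if (mask[start + j] != mask[start] + static_cast<int>(j))
      return false;
  }
  return true;
}
```

Keeping `j` unsigned avoids a signed/unsigned comparison in the bounds checks, at the cost of one explicit cast where the index meets a mask value.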

tschuett · 2024-04-07T07:36:05Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

+      Ops[i] = UndefReg;
+  }
+
+  Builder.buildConcatVectors(MI.getOperand(0).getReg(), Ops);


Legality check before building.

if (!isLegalOrBeforeLegalizer({G_CONCAT_VECTORS, {DstTy, VectorTy}})) return false;

tschuett · 2024-04-07T07:36:23Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

+void CombinerHelper::applyCombineShuffleConcat(MachineInstr &MI,
+                                               SmallVector<Register> &Ops) {
+  LLT SrcTy = MRI.getType(Ops[0]);
+  Register UndefReg = Builder.buildUndef(SrcTy).getReg(0);


Legality check before building.

tschuett · 2024-04-08T19:29:43Z

varargs are unfortunately not here yet, but there are now several combines registered on G_SHUFFLE_VECTOR that will always fail because the input is bad.

def shuffle_vector_of_concats2 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b),
          (G_CONCAT_VECTORS $src2, $c, $d),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),

def shuffle_vector_of_concats3 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b, $c),
          (G_CONCAT_VECTORS $src2, $d, $e, $f),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),

davemgreen · 2024-04-10T13:42:52Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

+void CombinerHelper::applyCombineShuffleConcat(MachineInstr &MI,
+                                               SmallVector<Register> &Ops) {
+  LLT SrcTy = MRI.getType(Ops[0]);
+  Register UndefReg = Builder.buildUndef(SrcTy).getReg(0);


Can you change this to only make the undef if it is needed?
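As a rough illustration of what creating the undef only on demand could look like, here is a standalone sketch in plain C++. `Reg`, `fillUndefSlots`, and the `makeUndef` callback are hypothetical stand-ins (the callback plays the role of building a G_IMPLICIT_DEF), so this models the idea rather than the CombinerHelper code.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Reg = unsigned; // hypothetical register handle; 0 marks an undef slot

// Replaces every 0 entry in `ops` with one shared undef register. `makeUndef`
// is called at most once, and only if some entry actually needs it.
void fillUndefSlots(std::vector<Reg> &ops,
                    const std::function<Reg()> &makeUndef) {
  Reg undefReg = 0;
  for (Reg &op : ops) {
    if (op != 0)
      continue;
    if (undefReg == 0) {
      undefReg = makeUndef();
      assert(undefReg != 0 && "makeUndef must return a real register");
    }
    op = undefReg;
  }
}
```

When no slot is undef, the callback never runs, so no dead implicit def is left behind for a later pass to clean up.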

chuongg3 · 2024-04-10T19:12:48Z

varargs are unfortunately not here yet, but there are now several combines registered on G_SHUFFLE_VECTOR that will always fail because the input is bad.

def shuffle_vector_of_concats2 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b),
          (G_CONCAT_VECTORS $src2, $c, $d),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),

def shuffle_vector_of_concats3 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b, $c),
          (G_CONCAT_VECTORS $src2, $d, $e, $f),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),

I believe using one pattern to match all the different G_CONCAT_VECTORS sizes generalizes better to an arbitrary number of sources, at least until variable-length arguments are supported in TableGen.

tschuett · 2024-04-11T04:20:15Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

+  for (unsigned i = 0; i < Ops.size(); i++) {
+    if (Ops[i] == 0) {
+      if (UndefReg == 0)
+        UndefReg = Builder.buildUndef(SrcTy).getReg(0);


if (!isLegalOrBeforeLegalizer({G_IMPLICIT_DEF, SrcTy})) return false;


davemgreen · 2024-04-11T07:55:35Z

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp

+  for (unsigned i = 0; i < Ops.size(); i++) {
+    if (Ops[i] == 0) {
+      if (UndefReg == 0)
+        UndefReg = Builder.buildUndef(SrcTy).getReg(0);


These do sound like they might be worth adding (to the match method), if these can run post-legalization as well as before. This is mostly transforming nodes to similarly typed nodes (and shuffles are often not arbitrarily legal), but it's probably good to be safe.

davemgreen

Thanks. This LGTM with a comment about moving the implicit def check.


chuongg3 requested review from aemerson and davemgreen April 3, 2024 12:54

chuongg3 force-pushed the GlobalISel_Combine_Shuffle_Concat branch from 545276a to 5865a11 Compare April 5, 2024 19:00

llvmbot added backend:AArch64 llvm:globalisel labels Apr 5, 2024

arsenm reviewed Apr 6, 2024

tschuett reviewed Apr 7, 2024

davemgreen reviewed Apr 10, 2024

tschuett reviewed Apr 11, 2024

davemgreen reviewed Apr 11, 2024

davemgreen approved these changes Apr 12, 2024

chuongg3 added 6 commits April 22, 2024 09:15

[AArch64][GlobalISel] Pre-commit tests for Combine Shuffle of G_CONCAT_VECTORS

13d7443

fixup! [AAAArch64][GlobalISel] Combine Shuffles of G_CONCAT_VECTORS

2912dce

fixup! fixup! [AAAArch64][GlobalISel] Combine Shuffles of G_CONCAT_VECTORS

fe4e645

fixup! fixup! fixup! [AAAArch64][GlobalISel] Combine Shuffles of G_CONCAT_VECTORS

274fd91

fixup! fixup! fixup! fixup! [AAAArch64][GlobalISel] Combine Shuffles of G_CONCAT_VECTORS

5f03323

chuongg3 force-pushed the GlobalISel_Combine_Shuffle_Concat branch from 041fde6 to 5f03323 Compare April 22, 2024 09:15

chuongg3 merged commit 821935b into llvm:main Apr 22, 2024
3 of 4 checks passed

This was referenced Apr 22, 2024

[libc] Make fenv and math tests preserve fenv_t state #89658

Merged

[libc] Clean up alternate test framework support #89659

Merged
