[AArch64][GlobalISel] Combine Shuffles of G_CONCAT_VECTORS #87489

Merged: 6 commits, Apr 22, 2024

chuongg3
Contributor

@chuongg3 chuongg3 commented Apr 3, 2024

Combines a G_SHUFFLE_VECTOR whose sources come from G_CONCAT_VECTORS into a single G_CONCAT_VECTORS instruction.

// a = G_CONCAT_VECTORS x, y, undef, undef
// b = G_CONCAT_VECTORS z, undef, undef, undef
// c = G_SHUFFLE_VECTORS a, b, <0, 1, 4, undef>
// ===>
// c = G_CONCAT_VECTORS x, y, z, undef
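
The mask check behind this combine can be sketched outside LLVM. The block below is a minimal illustration, not the code from the patch: `matchShuffleOfConcats` and `kUndefSource` are hypothetical names, registers are modelled as plain integers, and every concat source is assumed to have the same number of elements (`chunk`). Unlike the patch, which looks the source up in the first or second G_CONCAT_VECTORS, this sketch simply numbers sources across both concats (a's sources first, then b's).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical marker for a chunk of the mask that is entirely undef.
constexpr int kUndefSource = -1;

// For each chunk of the mask, returns the index of the concat source it
// copies, or std::nullopt if some chunk is not a whole, in-order copy of
// exactly one source. `chunk` is the element count of each concat source.
std::optional<std::vector<int>>
matchShuffleOfConcats(const std::vector<int> &mask, std::size_t chunk) {
  assert(chunk > 0 && "concat sources must have at least one element");
  if (mask.empty() || mask.size() % chunk != 0)
    return std::nullopt;

  std::vector<int> sources;
  for (std::size_t i = 0; i < mask.size(); i += chunk) {
    const int first = mask[i];
    if (first == -1) {
      // An undef chunk must be undef in every lane.
      for (std::size_t j = 1; j < chunk; ++j)
        if (mask[i + j] != -1)
          return std::nullopt;
      sources.push_back(kUndefSource);
      continue;
    }
    // The chunk must start on a source boundary...
    if (first % static_cast<int>(chunk) != 0)
      return std::nullopt;
    // ...and then walk that source's lanes in order.
    for (std::size_t j = 1; j < chunk; ++j)
      if (mask[i + j] != first + static_cast<int>(j))
        return std::nullopt;
    sources.push_back(first / static_cast<int>(chunk));
  }
  return sources;
}
```

With four-element sources, the mask from the `shuffle_concat_1` test maps to sources 0, 1, 4 and one undef chunk, which is the `<0, 1, 4, undef>` shape in the comment above. A mask with a single undef lane inside an otherwise defined chunk is rejected, as in `shuffle_concat_3`.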

@davemgreen
Collaborator

Is the first tryCombineConcat patch needed for the shuffle combine part, or can they be separate? Thanks

@llvmbot
Collaborator

llvmbot commented Apr 5, 2024

@llvm/pr-subscribers-backend-aarch64

@llvm/pr-subscribers-llvm-globalisel

Author: None (chuongg3)

Changes

Combines a G_SHUFFLE_VECTOR whose sources come from G_CONCAT_VECTORS into a single G_CONCAT_VECTORS instruction.

// a = G_CONCAT_VECTORS x, y, undef, undef
// b = G_CONCAT_VECTORS z, undef, undef, undef
// c = G_SHUFFLE_VECTORS a, b, <0, 1, 4, undef>
// ===>
// c = G_CONCAT_VECTORS x, y, z, undef

The first pre-commit comes from #85047


Full diff: https://github.com/llvm/llvm-project/pull/87489.diff

4 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h (+5)
  • (modified) llvm/include/llvm/Target/GlobalISel/Combine.td (+15-1)
  • (modified) llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp (+73)
  • (added) llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir (+202)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
index 3af32043391fec..4b8aec8e8a5dd6 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
@@ -237,6 +237,11 @@ class CombinerHelper {
   /// or an implicit_def if \p Ops is empty.
   void applyCombineConcatVectors(MachineInstr &MI, SmallVector<Register> &Ops);
 
+  bool matchCombineShuffleConcat(MachineInstr &MI, SmallVector<Register> &Ops);
+  /// Replace \p MI with a flattened build_vector with \p Ops
+  /// or an implicit_def if \p Ops is empty.
+  void applyCombineShuffleConcat(MachineInstr &MI, SmallVector<Register> &Ops);
+
   /// Try to combine G_SHUFFLE_VECTOR into G_CONCAT_VECTORS.
   /// Returns true if MI changed.
   ///
diff --git a/llvm/include/llvm/Target/GlobalISel/Combine.td b/llvm/include/llvm/Target/GlobalISel/Combine.td
index 778ff7e437eb50..8401e48c1bc9eb 100644
--- a/llvm/include/llvm/Target/GlobalISel/Combine.td
+++ b/llvm/include/llvm/Target/GlobalISel/Combine.td
@@ -1507,6 +1507,18 @@ def combine_concat_vector : GICombineRule<
         [{ return Helper.matchCombineConcatVectors(*${root}, ${matchinfo}); }]),
   (apply [{ Helper.applyCombineConcatVectors(*${root}, ${matchinfo}); }])>;
 
+// Combines Shuffles of Concats
+// a = G_CONCAT_VECTORS x, y, undef, undef
+// b = G_CONCAT_VECTORS z, undef, undef, undef
+// c = G_SHUFFLE_VECTORS a, b, <0, 1, 4, undef>
+// ===>
+// c = G_CONCAT_VECTORS x, y, z, undef
+def combine_shuffle_concat : GICombineRule<
+  (defs root:$root, concat_matchinfo:$matchinfo),
+  (match (wip_match_opcode G_SHUFFLE_VECTOR):$root,
+        [{ return Helper.matchCombineShuffleConcat(*${root}, ${matchinfo}); }]),
+  (apply [{ Helper.applyCombineShuffleConcat(*${root}, ${matchinfo}); }])>;
+
 // match_extract_of_element must be the first!
 def vector_ops_combines: GICombineGroup<[
 match_extract_of_element_undef_vector,
@@ -1538,6 +1550,7 @@ extract_vector_element_build_vector_trunc8,
 extract_vector_element_freeze
 ]>;
 
+
 // FIXME: These should use the custom predicate feature once it lands.
 def undef_combines : GICombineGroup<[undef_to_fp_zero, undef_to_int_zero,
                                      undef_to_negative_one,
@@ -1614,7 +1627,8 @@ def all_combines : GICombineGroup<[trivial_combines, vector_ops_combines,
     and_or_disjoint_mask, fma_combines, fold_binop_into_select,
     sub_add_reg, select_to_minmax, redundant_binop_in_equality,
     fsub_to_fneg, commute_constant_to_rhs, match_ands, match_ors,
-    combine_concat_vector, double_icmp_zero_and_or_combine, match_addos]>;
+    combine_concat_vector, double_icmp_zero_and_or_combine, match_addos,
+    combine_shuffle_concat]>;
 
 // A combine group used to for prelegalizer combiners at -O0. The combines in
 // this group have been selected based on experiments to balance code size and
diff --git a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
index 719209e0edd5fb..9c18e0f136b715 100644
--- a/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
@@ -303,6 +303,79 @@ void CombinerHelper::applyCombineConcatVectors(MachineInstr &MI,
   replaceRegWith(MRI, DstReg, NewDstReg);
 }
 
+bool CombinerHelper::matchCombineShuffleConcat(MachineInstr &MI,
+                                               SmallVector<Register> &Ops) {
+  ArrayRef<int> Mask = MI.getOperand(3).getShuffleMask();
+  auto ConcatMI1 = dyn_cast<GConcatVectors>(
+      getDefIgnoringCopies(MI.getOperand(1).getReg(), MRI));
+  auto ConcatMI2 = dyn_cast<GConcatVectors>(
+      getDefIgnoringCopies(MI.getOperand(2).getReg(), MRI));
+  if (!ConcatMI1 || !ConcatMI2)
+    return false;
+
+  // Check that the sources of the Concat instructions have the same type
+  if (MRI.getType(ConcatMI1->getSourceReg(0)) !=
+      MRI.getType(ConcatMI2->getSourceReg(0)))
+    return false;
+
+  LLT ConcatSrcTy = MRI.getType(ConcatMI1->getReg(1));
+  LLT ShuffleSrcTy1 = MRI.getType(MI.getOperand(1).getReg());
+  unsigned ConcatSrcNumElt = ConcatSrcTy.getNumElements();
+  for (unsigned i = 0; i < Mask.size(); i += ConcatSrcNumElt) {
+    // Check if the index takes a whole source register from G_CONCAT_VECTORS
+    // Assumes that all Sources of G_CONCAT_VECTORS are the same type
+    if (Mask[i] == -1) {
+      for (unsigned j = 1; j < ConcatSrcNumElt; j++) {
+        if (i + j >= Mask.size())
+          return false;
+        if (Mask[i + j] != -1)
+          return false;
+      }
+      Ops.push_back(0);
+    } else if (Mask[i] % (int)ConcatSrcNumElt == 0) {
+      for (unsigned j = 1; j < ConcatSrcNumElt; j++) {
+        if (i + j >= Mask.size())
+          return false;
+        if (Mask[i + j] != Mask[i] + (int)j)
+          return false;
+      }
+      // Retrieve the source register from its respective G_CONCAT_VECTORS
+      // instruction
+      if (Mask[i] < (int)ShuffleSrcTy1.getNumElements()) {
+        Ops.push_back(ConcatMI1->getSourceReg(Mask[i] / (int)ConcatSrcNumElt));
+      } else {
+        Ops.push_back(ConcatMI2->getSourceReg(Mask[i] / (int)ConcatSrcNumElt -
+                                              (int)ConcatMI1->getNumSources()));
+      }
+    } else {
+      return false;
+    }
+  }
+
+  if (Ops.size() == 0)
+    return false;
+  // Only deal with cases where G_CONCAT_VECTORS sources are all the same type
+  if ((Mask.size() - (Ops.size() * ConcatSrcNumElt)) %
+          MRI.getType(Ops[0]).getNumElements() !=
+      0)
+    return false;
+  return true;
+}
+
+void CombinerHelper::applyCombineShuffleConcat(MachineInstr &MI,
+                                               SmallVector<Register> &Ops) {
+  LLT SrcTy = MRI.getType(Ops[0]);
+  Register UndefReg = Builder.buildUndef(SrcTy).getReg(0);
+
+  for (unsigned i = 0; i < Ops.size(); i++) {
+    if (Ops[i] == 0)
+      Ops[i] = UndefReg;
+  }
+
+  Builder.buildConcatVectors(MI.getOperand(0).getReg(), Ops);
+  MI.eraseFromParent();
+}
+
 bool CombinerHelper::tryCombineShuffleVector(MachineInstr &MI) {
   SmallVector<Register, 4> Ops;
   if (matchCombineShuffleVector(MI, Ops)) {
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir
new file mode 100644
index 00000000000000..0de989f8be75d7
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/combine-shufflevector.mir
@@ -0,0 +1,202 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 4
+# RUN: llc -o - -mtriple=aarch64-unknown-unknown -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs %s | FileCheck %s
+
+---
+name:            shuffle_concat_1
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_1
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), %c(<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_2
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_2
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p1:_(p0) = COPY $x0
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %d:_(<4 x s8>) = G_LOAD %p1(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), %c(<4 x s8>), %d(<4 x s8>)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %v:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %w:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %ImpDef:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %v:_(<16 x s8>), %w:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %d:_(<4 x s8>), %ImpDef:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 18, 19)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_3
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_3
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: [[CONCAT_VECTORS1:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %c(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_SHUFFLE_VECTOR [[CONCAT_VECTORS]](<16 x s8>), [[CONCAT_VECTORS1]], shufflemask(0, undef, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, undef, undef, undef, undef)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, -1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_4
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_4
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), [[DEF]](<4 x s8>), %c(<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(0, 1, 2, 3, -1, -1, -1, -1, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...
+
+---
+name:            shuffle_concat_5
+tracksRegLiveness: true
+body:             |
+  bb.0:
+    liveins: $x0, $x1, $x2, $x3
+
+    ; CHECK-LABEL: name: shuffle_concat_5
+    ; CHECK: liveins: $x0, $x1, $x2, $x3
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %p2:_(p0) = COPY $x1
+    ; CHECK-NEXT: %p3:_(p0) = COPY $x2
+    ; CHECK-NEXT: %p4:_(p0) = COPY $x3
+    ; CHECK-NEXT: %a:_(<4 x s8>) = G_LOAD %p4(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %b:_(<4 x s8>) = G_LOAD %p3(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: %c:_(<4 x s8>) = G_LOAD %p2(p0) :: (load (<4 x s8>))
+    ; CHECK-NEXT: [[DEF:%[0-9]+]]:_(<4 x s8>) = G_IMPLICIT_DEF
+    ; CHECK-NEXT: [[CONCAT_VECTORS:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %a(<4 x s8>), %b(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: [[CONCAT_VECTORS1:%[0-9]+]]:_(<16 x s8>) = G_CONCAT_VECTORS %c(<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>), [[DEF]](<4 x s8>)
+    ; CHECK-NEXT: %z:_(<16 x s8>) = G_SHUFFLE_VECTOR [[CONCAT_VECTORS]](<16 x s8>), [[CONCAT_VECTORS1]], shufflemask(undef, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, undef, undef, undef, undef)
+    ; CHECK-NEXT: $q0 = COPY %z(<16 x s8>)
+    ; CHECK-NEXT: RET_ReallyLR implicit $q0
+    %p1:_(p0) = COPY $x0
+    %p2:_(p0) = COPY $x1
+    %p3:_(p0) = COPY $x2
+    %p4:_(p0) = COPY $x3
+
+    %ImpDef:_(<4 x s8>) = G_IMPLICIT_DEF
+    %a:_(<4 x s8>) = G_LOAD %p4:_(p0) :: (load (<4 x s8>))
+    %b:_(<4 x s8>) = G_LOAD %p3:_(p0) :: (load (<4 x s8>))
+    %c:_(<4 x s8>) = G_LOAD %p2:_(p0) :: (load (<4 x s8>))
+    %d:_(<4 x s8>) = G_LOAD %p1:_(p0) :: (load (<4 x s8>))
+
+    %x:_(<16 x s8>) = G_SHUFFLE_VECTOR %a:_(<4 x s8>), %b:_, shufflemask(0, 1, 2, 3, 4, 5, 6, 7, undef, undef, undef, undef, undef, undef, undef, undef)
+    %y:_(<16 x s8>) = G_SHUFFLE_VECTOR %c:_(<4 x s8>), %d:_, shufflemask(0, 1, 2, 3, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef, undef)
+    %z:_(<16 x s8>) = G_SHUFFLE_VECTOR %x:_(<16 x s8>), %y:_, shufflemask(-1, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, -1, -1, -1, -1)
+
+    $q0 = COPY %z(<16 x s8>)
+    RET_ReallyLR implicit $q0
+...

for (unsigned j = 1; j < ConcatSrcNumElt; j++) {
if (i + j >= Mask.size())
return false;
if (Mask[i + j] != Mask[i] + (int)j)
Contributor

Prefer static_cast (or just make the iteration variable int?)

Contributor Author

I have changed it to static_cast. I decided against making the iteration variable an int since it is compared with size(), which returns an unsigned long.
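
A standalone sketch of that trade-off, assuming a hypothetical helper `isSequentialRun` that is not part of the patch: the loop index stays unsigned so that comparisons against `size()` involve no sign conversion, and the cast happens only at the one point where the index meets a signed mask value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: true if mask[start], mask[start + 1], ... counts up
// by one for `len` elements.
bool isSequentialRun(const std::vector<int> &mask, std::size_t start,
                     std::size_t len) {
  assert(len > 0 && "a run needs at least one element");
  // Unsigned-to-unsigned comparison against size(): no conversion involved.
  if (start + len > mask.size())
    return false;
  for (std::size_t j = 1; j < len; ++j) {
    // Mask values are signed because -1 encodes undef, so the index is
    // converted explicitly here rather than implicitly by the compiler.
    if (mask[start + j] != mask[start] + static_cast<int>(j))
      return false;
  }
  return true;
}
```

Making `j` an `int` would remove the cast but move the signed/unsigned mix into the bounds check instead.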

Ops[i] = UndefReg;
}

Builder.buildConcatVectors(MI.getOperand(0).getReg(), Ops);
Member

Legality check before building.

Member

if (!isLegalOrBeforeLegalizer({G_CONCAT_VECTORS, {DstTy, VectorTy}}))
  return false;

void CombinerHelper::applyCombineShuffleConcat(MachineInstr &MI,
SmallVector<Register> &Ops) {
LLT SrcTy = MRI.getType(Ops[0]);
Register UndefReg = Builder.buildUndef(SrcTy).getReg(0);
Member

Legality check before building.

@tschuett
Member

tschuett commented Apr 8, 2024

varargs are unfortunately not here yet, but there are now several combines registered on G_SHUFFLE_VECTOR that will always fail because the input is bad.

def shuffle_vector_of_concats2 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b),
          (G_CONCAT_VECTORS $src2, $c, $d),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),
def shuffle_vector_of_concats3 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b, $c),
          (G_CONCAT_VECTORS $src2, $d, $e, $f),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),

void CombinerHelper::applyCombineShuffleConcat(MachineInstr &MI,
SmallVector<Register> &Ops) {
LLT SrcTy = MRI.getType(Ops[0]);
Register UndefReg = Builder.buildUndef(SrcTy).getReg(0);
Collaborator

Can you change this to only make the Undef if it is needed?
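
The request can be illustrated with a toy model. Everything in this sketch is hypothetical (`ToyBuilder`, `fillUndefPlaceholders`, integer register ids); it only mirrors the convention in the patch that register 0 marks a chunk to be filled with undef, and it creates the shared undef value on first use instead of unconditionally.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an instruction builder: hands out fresh register
// ids and counts how many undef values were created.
struct ToyBuilder {
  unsigned nextReg = 100;
  unsigned undefsBuilt = 0;
  unsigned buildUndef() {
    assert(nextReg != 0 && "register 0 is reserved as the placeholder");
    ++undefsBuilt;
    return nextReg++;
  }
};

// Replaces every 0 placeholder in `ops` with one shared undef register,
// which is created only when the first placeholder is encountered.
void fillUndefPlaceholders(std::vector<unsigned> &ops, ToyBuilder &builder) {
  unsigned undefReg = 0; // 0 means "not created yet"
  for (unsigned &op : ops) {
    if (op != 0)
      continue;
    if (undefReg == 0)
      undefReg = builder.buildUndef();
    op = undefReg;
  }
}
```

When no operand is a placeholder, the builder is never asked for an undef, so no dead instruction is left behind.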

@chuongg3
Contributor Author

varargs are unfortunately not here yet, but there are now several combines registered on G_SHUFFLE_VECTOR that will always fail because the input is bad.

def shuffle_vector_of_concats2 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b),
          (G_CONCAT_VECTORS $src2, $c, $d),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),
def shuffle_vector_of_concats3 : GICombineRule<
   (defs root:$root, build_fn_matchinfo:$matchinfo),
   (match (G_CONCAT_VECTORS $src1, $a, $b, $c),
          (G_CONCAT_VECTORS $src2, $d, $e, $f),
          (G_SHUFFLE_VECTOR $root, $src1, $src2, $mask),

I believe a single pattern that matches every G_CONCAT_VECTORS size generalizes better to arbitrary numbers of sources, at least until variable-length arguments are supported in TableGen.

for (unsigned i = 0; i < Ops.size(); i++) {
if (Ops[i] == 0) {
if (UndefReg == 0)
UndefReg = Builder.buildUndef(SrcTy).getReg(0);
Member

if (!isLegalOrBeforeLegalizer({G_IMPLICIT_DEF, SrcTy}))
  return false;

Collaborator

These do sound like they might be worth adding (to the match method), if these can run post-legalization as well as before. This is mostly transforming nodes to similarly typed nodes (and shuffles are often not arbitrarily legal), but it's probably good to be safe.
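
A toy sketch of why the checks belong in the match step: if match rejects anything that would be illegal to build, apply can run unconditionally and never leaves the function half-rewritten. The names here (`Plan`, `Op`, `LegalityFn`, `matchStep`) are hypothetical and stand in for the combiner's real legality query, which this sketch does not call.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical description of what the rewrite would build.
struct Plan {
  std::vector<int> sources; // -1 marks a chunk to be filled with undef
  bool needsUndef = false;
};

// Hypothetical legality oracle: true if an operation of this kind may be
// created at the current point in the pipeline.
enum class Op { Concat, ImplicitDef };
using LegalityFn = std::function<bool(Op)>;

// Match step: performs every check, including legality, and fills `plan`
// only on success. Nothing is mutated on failure.
bool matchStep(const std::vector<int> &sources, const LegalityFn &isLegal,
               Plan &plan) {
  assert(isLegal && "a legality oracle is required");
  if (sources.empty() || !isLegal(Op::Concat))
    return false;
  bool needsUndef = false;
  for (int s : sources)
    if (s == -1)
      needsUndef = true;
  // The undef is only a concern when the plan actually needs one.
  if (needsUndef && !isLegal(Op::ImplicitDef))
    return false;
  plan.sources = sources;
  plan.needsUndef = needsUndef;
  return true;
}
```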

@davemgreen davemgreen left a comment

Thanks. This LGTM with a comment about moving the implicit def check.

Matches masks of G_SHUFFLE_VECTOR to see if they can be combined to
a G_CONCAT_VECTORS instruction

a = G_CONCAT_VECTORS x, y, undef, undef
b = G_CONCAT_VECTORS z, undef, undef, undef
c = G_SHUFFLE_VECTORS a, b, <0, 1, 4, undef>
===>
c = G_CONCAT_VECTORS x, y, z, undef
@chuongg3 chuongg3 force-pushed the GlobalISel_Combine_Shuffle_Concat branch from 041fde6 to 5f03323 on April 22, 2024 09:15
@chuongg3 chuongg3 merged commit 821935b into llvm:main Apr 22, 2024
3 of 4 checks passed