[AArch64][GlobalISel] Refactor BITCAST Legalization #80505

Merged: 3 commits into llvm:main on Feb 21, 2024

Conversation

chuongg3 (Contributor) commented Feb 2, 2024

Ensure BITCAST is only legal between types with the same number of bits.
Enable BITCAST to work with non-legal vector types as well.
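
To make the splitting concrete: when the destination of an oversized G_BITCAST is narrowed, the source has to be cut into pieces that keep its own scalar type but cover the same number of bits as each narrowed destination piece (see fewerElementsBitcast in the patch below). A minimal standalone sketch of that type arithmetic, where VecTy and matchingSrcNarrowTy are hypothetical stand-ins for LLVM's LLT rather than the real API:

#include <cassert>
#include <cstdio>

struct VecTy {
  unsigned NumElts;
  unsigned ScalarBits;
  unsigned sizeInBits() const { return NumElts * ScalarBits; }
};

// Given the narrowed destination type, pick source pieces with the same
// total width but the original source scalar type. This mirrors the
// patch's LLT::fixed_vector(NarrowTy.getSizeInBits() / SrcScalSize,
// SrcScalSize) computation.
static VecTy matchingSrcNarrowTy(VecTy SrcTy, VecTy DstNarrowTy) {
  assert(DstNarrowTy.sizeInBits() % SrcTy.ScalarBits == 0 &&
         "narrowed width must be a multiple of the source scalar size");
  return {DstNarrowTy.sizeInBits() / SrcTy.ScalarBits, SrcTy.ScalarBits};
}

int main() {
  VecTy Src = {8, 32};       // <8 x s32> source of the bitcast
  VecTy DstNarrow = {8, 16}; // destination <16 x s16> clamped to <8 x s16>
  VecTy Piece = matchingSrcNarrowTy(Src, DstNarrow);
  std::printf("split source into <%u x s%u> pieces\n", Piece.NumElts,
              Piece.ScalarBits); // prints <4 x s32>
}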

llvmbot (Collaborator) commented Feb 2, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-aarch64

Author: None (chuongg3)

Changes

Ensure BITCAST is only legal between types with the same number of bits.
Enable BITCAST to work with non-legal vector types as well.


Patch is 24.72 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/80505.diff

5 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h (+5)
  • (modified) llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (+52)
  • (modified) llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp (+9-6)
  • (modified) llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir (+2-2)
  • (modified) llvm/test/CodeGen/AArch64/bitcast.ll (+566-9)
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
index a7ecf0dc1ba21..11e1f6a3339c5 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h
@@ -338,6 +338,11 @@ class LegalizerHelper {
                                                   unsigned TypeIdx,
                                                   LLT NarrowTy);
 
+  // Fewer Elements for bitcast, ensuring that the size of the Src and Dst
+  // registers will be the same
+  LegalizeResult fewerElementsBitcast(MachineInstr &MI, unsigned TypeIdx,
+                                      LLT NarrowTy);
+
   LegalizeResult fewerElementsVectorShuffle(MachineInstr &MI, unsigned TypeIdx,
                                             LLT NarrowTy);
 
diff --git a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
index 464ff0864d146..372c75dea45ce 100644
--- a/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
@@ -4654,11 +4654,49 @@ LegalizerHelper::fewerElementsVector(MachineInstr &MI, unsigned TypeIdx,
     return fewerElementsVectorShuffle(MI, TypeIdx, NarrowTy);
   case G_FPOWI:
     return fewerElementsVectorMultiEltType(GMI, NumElts, {2 /*pow*/});
+  case G_BITCAST:
+    return fewerElementsBitcast(MI, TypeIdx, NarrowTy);
   default:
     return UnableToLegalize;
   }
 }
 
+LegalizerHelper::LegalizeResult
+LegalizerHelper::fewerElementsBitcast(MachineInstr &MI, unsigned int TypeIdx,
+                                      LLT NarrowTy) {
+  assert(MI.getOpcode() == TargetOpcode::G_BITCAST &&
+         "Not a bitcast operation");
+
+  if (TypeIdx != 0)
+    return UnableToLegalize;
+
+  auto [DstReg, DstTy, SrcReg, SrcTy] = MI.getFirst2RegLLTs();
+
+  unsigned SrcScalSize = SrcTy.getScalarSizeInBits();
+  LLT SrcNarrowTy =
+      LLT::fixed_vector(NarrowTy.getSizeInBits() / SrcScalSize, SrcScalSize);
+
+  // Split the Src and Dst Reg into smaller registers
+  LLT SrcLeftoverTy;
+  SmallVector<Register> SrcVRegs, SrcLeftoverVRegs, BitcastVRegs;
+  // MIRBuilder.setInstrAndDebugLoc(MI);
+  if (!extractParts(SrcReg, SrcTy, SrcNarrowTy, SrcLeftoverTy, SrcVRegs,
+                    SrcLeftoverVRegs, MIRBuilder, MRI))
+    return UnableToLegalize;
+
+  assert(SrcLeftoverVRegs.size() == 0 && "Splitting Source register failed");
+
+  // Build new smaller bitcast instructions
+  // Not supporting Leftover types for now but will have to
+  for (unsigned i = 0; i < SrcVRegs.size(); i++)
+    BitcastVRegs.push_back(
+        MIRBuilder.buildBitcast(NarrowTy, SrcVRegs[i]).getReg(0));
+
+  MIRBuilder.buildMergeLikeInstr(DstReg, BitcastVRegs);
+  MI.eraseFromParent();
+  return Legalized;
+}
+
 LegalizerHelper::LegalizeResult LegalizerHelper::fewerElementsVectorShuffle(
     MachineInstr &MI, unsigned int TypeIdx, LLT NarrowTy) {
   assert(MI.getOpcode() == TargetOpcode::G_SHUFFLE_VECTOR);
@@ -5342,6 +5380,20 @@ LegalizerHelper::moreElementsVector(MachineInstr &MI, unsigned TypeIdx,
     Observer.changedInstr(MI);
     return Legalized;
   }
+  case TargetOpcode::G_BITCAST: {
+    if (TypeIdx != 0)
+      return UnableToLegalize;
+
+    unsigned SrcScalSize =
+        MRI.getType(MI.getOperand(1).getReg()).getScalarSizeInBits();
+    LLT NewTy =
+        LLT::fixed_vector(MoreTy.getSizeInBits() / SrcScalSize, SrcScalSize);
+    Observer.changingInstr(MI);
+    moreElementsVectorSrc(MI, NewTy, 1);
+    moreElementsVectorDst(MI, MoreTy, 0);
+    Observer.changedInstr(MI);
+    return Legalized;
+  }
   default:
     return UnableToLegalize;
   }
diff --git a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
index cbf5655706e69..7630c5fa83d55 100644
--- a/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
+++ b/llvm/lib/Target/AArch64/GISel/AArch64LegalizerInfo.cpp
@@ -744,12 +744,15 @@ AArch64LegalizerInfo::AArch64LegalizerInfo(const AArch64Subtarget &ST)
   // Casts for 32 and 64-bit width type are just copies.
   // Same for 128-bit width type, except they are on the FPR bank.
   getActionDefinitionsBuilder(G_BITCAST)
-      // FIXME: This is wrong since G_BITCAST is not allowed to change the
-      // number of bits but it's what the previous code described and fixing
-      // it breaks tests.
-      .legalForCartesianProduct({s8, s16, s32, s64, s128, v16s8, v8s8, v4s8,
-                                 v8s16, v4s16, v2s16, v4s32, v2s32, v2s64,
-                                 v2p0});
+      // Keeping 32-bit instructions legal to prevent regression in some tests
+      .legalForCartesianProduct({s32, v2s16, v4s8})
+      .legalForCartesianProduct({s64, v8s8, v4s16, v2s32})
+      .legalForCartesianProduct({s128, v16s8, v8s16, v4s32, v2s64, v2p0})
+      .moreElementsToNextPow2(0)
+      .clampNumElements(0, v8s8, v16s8)
+      .clampNumElements(0, v4s16, v8s16)
+      .clampNumElements(0, v2s32, v4s32)
+      .lower();
 
   getActionDefinitionsBuilder(G_VASTART).legalFor({p0});
 
diff --git a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
index c90c31aa27ef5..58b9100c2f2a4 100644
--- a/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
+++ b/llvm/test/CodeGen/AArch64/GlobalISel/legalizer-info-validation.mir
@@ -127,8 +127,8 @@
 # DEBUG-NEXT: .. the first uncovered imm index: 0, OK
 #
 # DEBUG-NEXT: G_BITCAST (opcode {{[0-9]+}}): 2 type indices, 0 imm indices
-# DEBUG-NEXT: .. the first uncovered type index: 2, OK
-# DEBUG-NEXT: .. the first uncovered imm index: 0, OK
+# DEBUG-NEXT: .. type index coverage check SKIPPED: user-defined predicate detected
+# DEBUG-NEXT: .. imm index coverage check SKIPPED: user-defined predicate detected
 #
 # DEBUG-NEXT: G_FREEZE (opcode {{[0-9]+}}): 1 type index, 0 imm indices
 # DEBUG-NEXT: .. opcode {{[0-9]+}} is aliased to {{[0-9]+}}
diff --git a/llvm/test/CodeGen/AArch64/bitcast.ll b/llvm/test/CodeGen/AArch64/bitcast.ll
index d60bd4ab3fc5f..24002fa70845e 100644
--- a/llvm/test/CodeGen/AArch64/bitcast.ll
+++ b/llvm/test/CodeGen/AArch64/bitcast.ll
@@ -1,12 +1,30 @@
-; RUN: llc < %s -mtriple=aarch64--linux-gnu | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=aarch64 -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-SD
+; RUN: llc -mtriple=aarch64 -global-isel -global-isel-abort=2 -verify-machineinstrs %s -o - 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-GI
 
 ; PR23065: SCALAR_TO_VECTOR implies the top elements 1 to N-1 of the N-element vector are undefined.
 
-define <4 x i16> @foo1(<2 x i32> %a) {
-; CHECK-LABEL: foo1:
-; CHECK:       movi	v0.2d, #0000000000000000
-; CHECK-NEXT:  ret
+; CHECK-GI:         warning: Instruction selection used fallback path for bitcast_v4i8_i32
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for bitcast_i32_v4i8
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for bitcast_v2i16_i32
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for bitcast_i32_v2i16
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for bitcast_v2i16_v4i8
+; CHECK-GI-NEXT:    warning: Instruction selection used fallback path for bitcast_v4i8_v2i16
 
+define <4 x i16> @foo1(<2 x i32> %a) {
+; CHECK-SD-LABEL: foo1:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: foo1:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #58712 // =0xe558
+; CHECK-GI-NEXT:    fmov s1, w8
+; CHECK-GI-NEXT:    mov v1.s[1], w8
+; CHECK-GI-NEXT:    zip1 v0.2s, v1.2s, v0.2s
+; CHECK-GI-NEXT:    rev32 v0.4h, v0.4h
+; CHECK-GI-NEXT:    ret
   %1 = shufflevector <2 x i32> <i32 58712, i32 undef>, <2 x i32> %a, <2 x i32> <i32 0, i32 2>
 ; Can't optimize the following bitcast to scalar_to_vector.
   %2 = bitcast <2 x i32> %1 to <4 x i16>
@@ -15,13 +33,552 @@ define <4 x i16> @foo1(<2 x i32> %a) {
 }
 
 define <4 x i16> @foo2(<2 x i32> %a) {
-; CHECK-LABEL: foo2:
-; CHECK:       movi	v0.2d, #0000000000000000
-; CHECK-NEXT:  ret
-
+; CHECK-SD-LABEL: foo2:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: foo2:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    mov w8, #712 // =0x2c8
+; CHECK-GI-NEXT:    fmov s1, w8
+; CHECK-GI-NEXT:    mov v1.s[1], w8
+; CHECK-GI-NEXT:    zip1 v0.2s, v1.2s, v0.2s
+; CHECK-GI-NEXT:    rev32 v0.4h, v0.4h
+; CHECK-GI-NEXT:    ret
   %1 = shufflevector <2 x i32> <i32 712, i32 undef>, <2 x i32> %a, <2 x i32> <i32 0, i32 2>
 ; Can't optimize the following bitcast to scalar_to_vector.
   %2 = bitcast <2 x i32> %1 to <4 x i16>
   %3 = shufflevector <4 x i16> %2, <4 x i16> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
   ret <4 x i16> %3
 }
+
+; ===== To and From Scalar Types =====
+
+define i32 @bitcast_v4i8_i32(<4 x i8> %a, <4 x i8> %b){
+; CHECK-LABEL: bitcast_v4i8_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sub sp, sp, #16
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    add v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    xtn v0.8b, v0.8h
+; CHECK-NEXT:    fmov w0, s0
+; CHECK-NEXT:    add sp, sp, #16
+; CHECK-NEXT:    ret
+    %c = add <4 x i8> %a, %b
+    %d = bitcast <4 x i8> %c to i32
+    ret i32 %d
+}
+
+define <4 x i8> @bitcast_i32_v4i8(i32 %a, i32 %b){
+; CHECK-LABEL: bitcast_i32_v4i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    fmov s0, w8
+; CHECK-NEXT:    zip1 v0.8b, v0.8b, v0.8b
+; CHECK-NEXT:    ret
+    %c = add i32 %a, %b
+    %d = bitcast i32 %c to <4 x i8>
+    ret <4 x i8> %d
+}
+
+define i32 @bitcast_v2i16_i32(<2 x i16> %a, <2 x i16> %b){
+; CHECK-LABEL: bitcast_v2i16_i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sub sp, sp, #16
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    add v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    mov w8, v0.s[1]
+; CHECK-NEXT:    fmov w9, s0
+; CHECK-NEXT:    strh w9, [sp, #12]
+; CHECK-NEXT:    strh w8, [sp, #14]
+; CHECK-NEXT:    ldr w0, [sp, #12]
+; CHECK-NEXT:    add sp, sp, #16
+; CHECK-NEXT:    ret
+    %c = add <2 x i16> %a, %b
+    %d = bitcast <2 x i16> %c to i32
+    ret i32 %d
+}
+
+define <2 x i16> @bitcast_i32_v2i16(i32 %a, i32 %b){
+; CHECK-LABEL: bitcast_i32_v2i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add w8, w0, w1
+; CHECK-NEXT:    fmov s0, w8
+; CHECK-NEXT:    ushll v0.4s, v0.4h, #0
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
+; CHECK-NEXT:    ret
+    %c = add i32 %a, %b
+    %d = bitcast i32 %c to <2 x i16>
+    ret <2 x i16> %d
+}
+
+define i64 @bitcast_v8i8_i64(<8 x i8> %a, <8 x i8> %b){
+; CHECK-LABEL: bitcast_v8i8_i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    fmov x0, d0
+; CHECK-NEXT:    ret
+    %c = add <8 x i8> %a, %b
+    %d = bitcast <8 x i8> %c to i64
+    ret i64 %d
+}
+
+define <8 x i8> @bitcast_i64_v8i8(i64 %a, i64 %b){
+; CHECK-LABEL: bitcast_i64_v8i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add x8, x0, x1
+; CHECK-NEXT:    fmov d0, x8
+; CHECK-NEXT:    ret
+    %c = add i64 %a, %b
+    %d = bitcast i64 %c to <8 x i8>
+    ret <8 x i8> %d
+}
+
+define i64 @bitcast_v4i16_i64(<4 x i16> %a, <4 x i16> %b){
+; CHECK-LABEL: bitcast_v4i16_i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    fmov x0, d0
+; CHECK-NEXT:    ret
+    %c = add <4 x i16> %a, %b
+    %d = bitcast <4 x i16> %c to i64
+    ret i64 %d
+}
+
+define <4 x i16> @bitcast_i64_v4i16(i64 %a, i64 %b){
+; CHECK-LABEL: bitcast_i64_v4i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add x8, x0, x1
+; CHECK-NEXT:    fmov d0, x8
+; CHECK-NEXT:    ret
+    %c = add i64 %a, %b
+    %d = bitcast i64 %c to <4 x i16>
+    ret <4 x i16> %d
+}
+
+define i64 @bitcast_v2i32_i64(<2 x i32> %a, <2 x i32> %b){
+; CHECK-LABEL: bitcast_v2i32_i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    fmov x0, d0
+; CHECK-NEXT:    ret
+    %c = add <2 x i32> %a, %b
+    %d = bitcast <2 x i32> %c to i64
+    ret i64 %d
+}
+
+define <2 x i32> @bitcast_i64_v2i32(i64 %a, i64 %b){
+; CHECK-LABEL: bitcast_i64_v2i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add x8, x0, x1
+; CHECK-NEXT:    fmov d0, x8
+; CHECK-NEXT:    ret
+    %c = add i64 %a, %b
+    %d = bitcast i64 %c to <2 x i32>
+    ret <2 x i32> %d
+}
+
+; ===== Legal Vector Types =====
+
+define <4 x i16> @bitcast_v2i32_v4i16(<2 x i32> %a, <2 x i32> %b){
+; CHECK-LABEL: bitcast_v2i32_v4i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    ret
+    %c = add <2 x i32> %a, %b
+    %d = bitcast <2 x i32> %c to <4 x i16>
+    ret <4 x i16> %d
+}
+
+define <4 x i32> @bitcast_v2i64_v4i32(<2 x i64> %a, <2 x i64> %b){
+; CHECK-LABEL: bitcast_v2i64_v4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.2d, v0.2d, v1.2d
+; CHECK-NEXT:    ret
+    %c = add <2 x i64> %a, %b
+    %d = bitcast <2 x i64> %c to <4 x i32>
+    ret <4 x i32> %d
+}
+
+define <8 x i8> @bitcast_v2i32_v8i8(<2 x i32> %a, <2 x i32> %b){
+; CHECK-LABEL: bitcast_v2i32_v8i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    ret
+    %c = add <2 x i32> %a, %b
+    %d = bitcast <2 x i32> %c to <8 x i8>
+    ret <8 x i8> %d
+}
+
+define <8 x i16> @bitcast_v2i64_v8i16(<2 x i64> %a, <2 x i64> %b){
+; CHECK-LABEL: bitcast_v2i64_v8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.2d, v0.2d, v1.2d
+; CHECK-NEXT:    ret
+    %c = add <2 x i64> %a, %b
+    %d = bitcast <2 x i64> %c to <8 x i16>
+    ret <8 x i16> %d
+}
+
+define <16 x i8> @bitcast_v2i64_v16i8(<2 x i64> %a, <2 x i64> %b){
+; CHECK-LABEL: bitcast_v2i64_v16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.2d, v0.2d, v1.2d
+; CHECK-NEXT:    ret
+    %c = add <2 x i64> %a, %b
+    %d = bitcast <2 x i64> %c to <16 x i8>
+    ret <16 x i8> %d
+}
+
+define <2 x i32> @bitcast_v4i16_v2i32(<4 x i16> %a, <4 x i16> %b){
+; CHECK-LABEL: bitcast_v4i16_v2i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    ret
+    %c = add <4 x i16> %a, %b
+    %d = bitcast <4 x i16> %c to <2 x i32>
+    ret <2 x i32> %d
+}
+
+define <2 x i64> @bitcast_v4i32_v2i64(<4 x i32> %a, <4 x i32> %b){
+; CHECK-LABEL: bitcast_v4i32_v2i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.4s, v0.4s, v1.4s
+; CHECK-NEXT:    ret
+    %c = add <4 x i32> %a, %b
+    %d = bitcast <4 x i32> %c to <2 x i64>
+    ret <2 x i64> %d
+}
+
+define <8 x i8> @bitcast_v4i16_v8i8(<4 x i16> %a, <4 x i16> %b){
+; CHECK-LABEL: bitcast_v4i16_v8i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    ret
+    %c = add <4 x i16> %a, %b
+    %d = bitcast <4 x i16> %c to <8 x i8>
+    ret <8 x i8> %d
+}
+
+define <8 x i16> @bitcast_v4i32_v8i16(<4 x i32> %a, <4 x i32> %b){
+; CHECK-LABEL: bitcast_v4i32_v8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.4s, v0.4s, v1.4s
+; CHECK-NEXT:    ret
+    %c = add <4 x i32> %a, %b
+    %d = bitcast <4 x i32> %c to <8 x i16>
+    ret <8 x i16> %d
+}
+
+define <16 x i8> @bitcast_v4i32_v16i8(<4 x i32> %a, <4 x i32> %b){
+; CHECK-LABEL: bitcast_v4i32_v16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.4s, v0.4s, v1.4s
+; CHECK-NEXT:    ret
+    %c = add <4 x i32> %a, %b
+    %d = bitcast <4 x i32> %c to <16 x i8>
+    ret <16 x i8> %d
+}
+
+define <2 x i32> @bitcast_v8i8_v2i32(<8 x i8> %a, <8 x i8> %b){
+; CHECK-LABEL: bitcast_v8i8_v2i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    ret
+    %c = add <8 x i8> %a, %b
+    %d = bitcast <8 x i8> %c to <2 x i32>
+    ret <2 x i32> %d
+}
+
+define <2 x i64> @bitcast_v8i16_v2i64(<8 x i16> %a, <8 x i16> %b){
+; CHECK-LABEL: bitcast_v8i16_v2i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    ret
+    %c = add <8 x i16> %a, %b
+    %d = bitcast <8 x i16> %c to <2 x i64>
+    ret <2 x i64> %d
+}
+
+define <4 x i16> @bitcast_v8i8_v4i16(<8 x i8> %a, <8 x i8> %b){
+; CHECK-LABEL: bitcast_v8i8_v4i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.8b, v0.8b, v1.8b
+; CHECK-NEXT:    ret
+    %c = add <8 x i8> %a, %b
+    %d = bitcast <8 x i8> %c to <4 x i16>
+    ret <4 x i16> %d
+}
+
+define <4 x i32> @bitcast_v8i16_v4i32(<8 x i16> %a, <8 x i16> %b){
+; CHECK-LABEL: bitcast_v8i16_v4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    ret
+    %c = add <8 x i16> %a, %b
+    %d = bitcast <8 x i16> %c to <4 x i32>
+    ret <4 x i32> %d
+}
+
+define <16 x i8> @bitcast_v8i16_v16i8(<8 x i16> %a, <8 x i16> %b){
+; CHECK-LABEL: bitcast_v8i16_v16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.8h, v0.8h, v1.8h
+; CHECK-NEXT:    ret
+    %c = add <8 x i16> %a, %b
+    %d = bitcast <8 x i16> %c to <16 x i8>
+    ret <16 x i8> %d
+}
+
+define <2 x i64> @bitcast_v16i8_v2i64(<16 x i8> %a, <16 x i8> %b){
+; CHECK-LABEL: bitcast_v16i8_v2i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.16b, v0.16b, v1.16b
+; CHECK-NEXT:    ret
+    %c = add <16 x i8> %a, %b
+    %d = bitcast <16 x i8> %c to <2 x i64>
+    ret <2 x i64> %d
+}
+
+define <4 x i32> @bitcast_v16i8_v4i32(<16 x i8> %a, <16 x i8> %b){
+; CHECK-LABEL: bitcast_v16i8_v4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.16b, v0.16b, v1.16b
+; CHECK-NEXT:    ret
+    %c = add <16 x i8> %a, %b
+    %d = bitcast <16 x i8> %c to <4 x i32>
+    ret <4 x i32> %d
+}
+
+define <8 x i16> @bitcast_v16i8_v8i16(<16 x i8> %a, <16 x i8> %b){
+; CHECK-LABEL: bitcast_v16i8_v8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    add v0.16b, v0.16b, v1.16b
+; CHECK-NEXT:    ret
+    %c = add <16 x i8> %a, %b
+    %d = bitcast <16 x i8> %c to <8 x i16>
+    ret <8 x i16> %d
+}
+
+; ===== Smaller/Larger Width Vectors with Legal Element Sizes =====
+
+define <4 x i8> @bitcast_v2i16_v4i8(<2 x i16> %a, <2 x i16> %b){
+; CHECK-LABEL: bitcast_v2i16_v4i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sub sp, sp, #16
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    add v0.2s, v0.2s, v1.2s
+; CHECK-NEXT:    mov w8, v0.s[1]
+; CHECK-NEXT:    fmov w9, s0
+; CHECK-NEXT:    strh w9, [sp, #12]
+; CHECK-NEXT:    strh w8, [sp, #14]
+; CHECK-NEXT:    ldr s0, [sp, #12]
+; CHECK-NEXT:    ushll v0.8h, v0.8b, #0
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
+; CHECK-NEXT:    add sp, sp, #16
+; CHECK-NEXT:    ret
+    %c = add <2 x i16> %a, %b
+    %d = bitcast <2 x i16> %c to <4 x i8>
+    ret <4 x i8> %d
+}
+
+define <2 x i16> @bitcast_v4i8_v2i16(<4 x i8> %a, <4 x i8> %b){
+; CHECK-LABEL: bitcast_v4i8_v2i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    sub sp, sp, #16
+; CHECK-NEXT:    .cfi_def_cfa_offset 16
+; CHECK-NEXT:    add v0.4h, v0.4h, v1.4h
+; CHECK-NEXT:    add x8, sp, #12
+; CHECK-NEXT:    xtn v0.8b, v0.8h
+; CHECK-NEXT:    str s0, [sp, #12]
+; CHECK-NEXT:    ld1 { v0.h }[0], [x8]
+; CHECK-NEXT:    orr x8, x8, #0x2
+; CHECK-NEXT:    ld1 { v0.h }[2], [x8]
+; CHECK-NEXT:    // kill: def $d0 killed $d0 killed $q0
+; CHECK-NEXT:    add sp, sp, #16
+; CHECK-NEXT:    ret
+    %c = add <4 x i8> %a, %b
+    %d = bitcast <4 x i8> %c to <2 x i16>
+    ret <2 x i16> %d
+}
+
+define <8 x i32> @bitcast_v4i64_v8i32(<4 x i64> %a, <4 x i64> %b){
+; CHECK-SD-LABEL: bitcast_v4i64_v8i32:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    add v1.2d, v1.2d, v3.2d
+; CHECK-SD-NEXT:    add v0.2d, v0.2d, v2.2d
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: bitcast_v4i64_v8i32:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add v0.2d, v0.2d, v2.2d
+; CHECK-GI-NEXT:    add v1.2d, v1.2d, v3.2d
+; CHECK-GI-NEXT:    ret
+    %c = add <4 x i64> %a, %b
+    %d = bitcast <4 x i64> %c to <8 x i32>
+    ret <8 x i32> %d
+}
+
+define <16 x i16> @bitcast_v4i64_v16i16(<4 x i64> %a, <4 x i64> %b){
+; CHECK-SD-LABEL: bitcast_v4i64_v16i16:
+; CHECK-SD:       // %bb.0:
+; CHECK-SD-NEXT:    add v1.2d, v1.2d, v3.2d
+; CHECK-SD-NEXT:    add v0.2d, v0.2d, v2.2d
+; CHECK-SD-NEXT:    ret
+;
+; CHECK-GI-LABEL: bitcast_v4i64_v16i16:
+; CHECK-GI:       // %bb.0:
+; CHECK-GI-NEXT:    add v0.2d, v0.2d, v2.2d
+; CHECK-GI-NEXT:    add v1.2d, v1.2d, v3.2d
+; CHECK-GI-NEXT:    ret
+    %c = add <4 x i64> %a, %b
+    %d = bitcast <4 x i64> %c to <16 x i16>
+    ret <16 x i16> %d
+}
+
+define...
[truncated]

LLT SrcLeftoverTy;
SmallVector<Register> SrcVRegs, SrcLeftoverVRegs, BitcastVRegs;
// MIRBuilder.setInstrAndDebugLoc(MI);
if (!extractParts(SrcReg, SrcTy, SrcNarrowTy, SrcLeftoverTy, SrcVRegs,
Contributor
I was hoping to move away from using extractParts, as it falls back on using G_EXTRACT, which we don't want.

Contributor Author
Do you have any recommendations on how to approach this?
Falling back to G_EXTRACT would be quite rare: moreElementsToNextPow2() runs before the vectors are clamped, so there are few cases where splitting the vectors would fall back to G_EXTRACT.
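
To illustrate why that ordering keeps the fallback rare, here is a small standalone sketch (plain C++20 arithmetic, not the LLVM API): padding the element count to a power of two first means the later clamp splits the vector evenly.

#include <bit>
#include <cstdio>

int main() {
  unsigned NumElts = 6;                     // e.g. <6 x s32>, not legal
  unsigned Padded = std::bit_ceil(NumElts); // 8: <8 x s32> after pow2 padding
  unsigned MaxElts = 4;                     // clampNumElements ceiling for s32
  // A power-of-two count at or above the (power-of-two) ceiling is always a
  // multiple of it, so extractParts produces no leftover registers that
  // would need a G_EXTRACT.
  std::printf("<%u x s32> pads to <%u x s32>, splitting into %u x <%u x s32>\n",
              NumElts, Padded, Padded / MaxElts, MaxElts);
}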

Contributor
Other places are using extractGCDType and buildLCMMergePieces in LegalizerHelper
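
For reference, a standalone sketch of the idea behind those helpers, using plain integers rather than MIR (an assumption on my part: this is only a rough model of what extractGCDType and buildLCMMergePieces do; see LegalizerHelper.cpp for the real implementations). Splitting at the GCD width always tiles both the source and the narrow type exactly, so no G_EXTRACT is needed.

#include <cassert>
#include <cstdio>
#include <numeric>

int main() {
  unsigned SrcBits = 192;    // e.g. a <6 x s32> source
  unsigned NarrowBits = 128; // target-preferred piece, e.g. <4 x s32>
  unsigned GCDBits = std::gcd(SrcBits, NarrowBits); // 64
  // GCD-sized pieces tile both widths exactly, so unmerging leaves no
  // remainder that would require a G_EXTRACT.
  assert(SrcBits % GCDBits == 0 && NarrowBits % GCDBits == 0);
  std::printf("unmerge into %u pieces of %u bits; each %u-bit op takes %u\n",
              SrcBits / GCDBits, GCDBits, NarrowBits, NarrowBits / GCDBits);
}

Roughly, extractGCDType performs the unmerge into GCD-typed pieces, and buildLCMMergePieces pads the results up to the least common multiple of the types before they are remerged into the destination.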

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp (outdated review thread, resolved)
arsenm (Contributor) commented Feb 8, 2024

There isn't much point in making an effort to avoid marking G_BITCAST as illegal in cases where the G_BITCAST is malformed.

davemgreen (Collaborator) left a comment
This looks OK to me, if others do not disagree. There are still some cases where we generate some inefficiencies, but in terms of BITCAST it looks pretty sensible. LGTM.

chuongg3 merged commit 0fb3d42 into llvm:main on Feb 21, 2024
4 checks passed