[PowerPC] Emit lxvkq and vsrq instructions for build vector patterns #157625
Conversation
Force-pushed from ee47d25 to 85a8234
@llvm/pr-subscribers-backend-powerpc
Author: Tony Varghese (tonykuttai)

Changes: This change makes use of the lxvkq and vsrq instructions to materialize build vectors that match special 128-bit constant patterns instead of loading the value from memory.

Note: the MSB pattern (0x8000...0000) is emitted with a single lxvkq, while the LSB pattern (0x0000...0001) is emitted with a combination of xxspltib and vsrq. The code generated for the MSB and LSB patterns on each endianness is shown in the tests added below.
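Why xxspltib 255 followed by vsrq reproduces the LSB pattern: splatting 0xFF gives an all-ones quadword, and vsrq then shifts it right by the amount held in the low 7 bits of byte 15 (0x7F = 127), leaving only the least-significant bit. A minimal standalone sketch of that identity (plain C++ using the GCC/Clang unsigned __int128 extension; illustrative only, not part of the patch):

```cpp
#include <cassert>
#include <cstdio>

int main() {
  // xxspltib 255 splats 0xFF into every byte: an all-ones 128-bit value.
  unsigned __int128 AllOnes = ~(unsigned __int128)0;

  // vsrq takes its shift amount from the low 7 bits of byte 15 of the shift
  // operand; for an all-ones operand that is 0xFF & 0x7F = 127.
  unsigned ShiftAmount = 0xFF & 0x7F;

  // A logical right shift by 127 leaves only the least-significant bit set,
  // i.e. the 0x0000...0001 pattern.
  unsigned __int128 Result = AllOnes >> ShiftAmount;
  assert(Result == 1);

  std::printf("result == 1: %s\n", Result == 1 ? "yes" : "no");
  return 0;
}
```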
Patch is 23.30 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157625.diff 4 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index fa104e4f69d7f..c76347d48bc62 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -9679,6 +9679,13 @@ SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,
BuildVectorSDNode *BVN = dyn_cast<BuildVectorSDNode>(Op.getNode());
assert(BVN && "Expected a BuildVectorSDNode in LowerBUILD_VECTOR");
+ // Recognize build vector patterns to emit VSX vector instructions
+ // instead of loading value from memory.
+ if (Subtarget.isISA3_1() && Subtarget.hasVSX()) {
+ if (SDValue VecPat = combineBVLoadsSpecialValue(Op, DAG))
+ return VecPat;
+ }
+
if (Subtarget.hasP10Vector()) {
APInt BitMask(32, 0);
// If the value of the vector is all zeros or all ones,
@@ -15657,6 +15664,133 @@ combineElementTruncationToVectorTruncation(SDNode *N,
return SDValue();
}
+// LXVKQ instruction load VSX vector with a special quadword value
+// based on an immediate value. This helper method returns the details of the
+// match as a tuple of {LXVKQ unsigned IMM Value, right_shift_amount}
+// to help generate the LXVKQ instruction and the subsequent shift instruction
+// required to match the original build vector pattern.
+
+// LXVKQPattern: {LXVKQ unsigned IMM Value, right_shift_amount}
+using LXVKQPattern = std::tuple<uint32_t, uint8_t>;
+
+static std::optional<LXVKQPattern> getPatternInfo(const APInt &FullVal) {
+
+ static const auto BaseLXVKQPatterns = []() {
+ // LXVKQ instruction loads the Quadword value:
+ // 0x8000_0000_0000_0000_0000_0000_0000_0000 when imm = 0b10000
+ return std::array<std::pair<APInt, uint32_t>, 1>{
+ {{APInt(128, 0x8000000000000000ULL) << 64, 16}}};
+ }();
+
+ // Check for direct LXVKQ match (no shift needed)
+ for (const auto &[BasePattern, Uim] : BaseLXVKQPatterns) {
+ if (FullVal == BasePattern)
+ return std::make_tuple(Uim, uint8_t{0});
+ }
+
+ // Check if FullValue can be generated by (right) shifting a base pattern
+ for (const auto &[BasePattern, Uim] : BaseLXVKQPatterns) {
+ if (BasePattern.lshr(127) == FullVal)
+ return std::make_tuple(Uim, uint8_t{127});
+ }
+
+ return std::nullopt;
+}
+
+/// Combine vector loads to a single load by recognising patterns in the Build
+/// Vector. LXVKQ instruction load VSX vector with a special quadword value
+/// based on an immediate value.
+SDValue PPCTargetLowering::combineBVLoadsSpecialValue(SDValue Op,
+ SelectionDAG &DAG) const {
+
+ assert((Op.getNode() && Op.getOpcode() == ISD::BUILD_VECTOR) &&
+ "Expected a BuildVectorSDNode in combineBVLoadsSpecialValue");
+
+ // This transformation is only supported if we are loading either a byte,
+ // halfword, word, or doubleword.
+ EVT VT = Op.getValueType();
+ if (!(VT == MVT::v8i16 || VT == MVT::v16i8 || VT == MVT::v4i32 ||
+ VT == MVT::v2i64))
+ return SDValue();
+
+ LLVM_DEBUG(llvm::dbgs() << "\ncombineBVLoadsSpecialValue: Build vector ("
+ << VT.getEVTString() << "): ";
+ Op->dump());
+
+ unsigned NumElems = VT.getVectorNumElements();
+ unsigned ElemBits = VT.getScalarSizeInBits();
+
+ bool IsLittleEndian = DAG.getDataLayout().isLittleEndian();
+
+ // Check for Non-constant operand in the build vector.
+ for (const SDValue &Operand : Op.getNode()->op_values()) {
+ if (!isa<ConstantSDNode>(Operand))
+ return SDValue();
+ }
+
+ // Assemble build vector operands as a 128-bit register value
+ // We need to reconstruct what the 128-bit register pattern would be
+ // that produces this vector when interpreted with the current endianness
+ APInt FullVal = APInt::getZero(128);
+
+ for (unsigned Index = 0; Index < NumElems; ++Index) {
+ auto *C = cast<ConstantSDNode>(Op.getOperand(Index));
+
+ // Get element value as raw bits (zero-extended)
+ uint64_t ElemValue = C->getZExtValue();
+
+ // Mask to element size to ensure we only get the relevant bits
+ if (ElemBits < 64)
+ ElemValue &= ((1ULL << ElemBits) - 1);
+
+ // Calculate bit position for this element in the 128-bit register
+ unsigned BitPos =
+ (IsLittleEndian) ? (Index * ElemBits) : (128 - (Index + 1) * ElemBits);
+
+ // Create APInt for the element value and shift it to correct position
+ APInt ElemAPInt(128, ElemValue);
+ ElemAPInt <<= BitPos;
+
+ // Place the element value at the correct bit position
+ FullVal |= ElemAPInt;
+ }
+
+ if (auto UIMOpt = getPatternInfo(FullVal)) {
+ const auto &[Uim, ShiftAmount] = *UIMOpt;
+ SDLoc Dl(Op);
+
+ // Generate LXVKQ instruction if the shift amount is zero.
+ if (ShiftAmount == 0) {
+ SDValue UimVal = DAG.getTargetConstant(Uim, Dl, MVT::i32);
+ SDValue LxvkqInstr =
+ SDValue(DAG.getMachineNode(PPC::LXVKQ, Dl, VT, UimVal), 0);
+ LLVM_DEBUG(llvm::dbgs()
+ << "combineBVLoadsSpecialValue: Instruction Emitted ";
+ LxvkqInstr.dump());
+ return LxvkqInstr;
+ }
+
+ // The right shifted pattern can be constructed using a combination of
+ // XXSPLITIB and VSRQ instruction. VSRQ uses the shift amount from the lower
+ // 7 bits of byte 15. This can be specified using XXSPLITIB with immediate
+ // value 255.
+ SDValue ShiftAmountVec =
+ SDValue(DAG.getMachineNode(PPC::XXSPLTIB, Dl, MVT::v4i32,
+ DAG.getTargetConstant(255, Dl, MVT::i32)),
+ 0);
+ // Generate appropriate right shift instruction
+ SDValue ShiftVec = SDValue(
+ DAG.getMachineNode(PPC::VSRQ, Dl, VT, ShiftAmountVec, ShiftAmountVec),
+ 0);
+ LLVM_DEBUG(llvm::dbgs()
+ << "\n combineBVLoadsSpecialValue: Instruction Emitted ";
+ ShiftVec.dump());
+ return ShiftVec;
+ }
+ // No patterns matched for build vectors.
+ return SDValue();
+}
+
/// Reduce the number of loads when building a vector.
///
/// Building a vector out of multiple loads can be converted to a load
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index 669430550f4e6..97382cd8f613c 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -1471,6 +1471,9 @@ namespace llvm {
combineElementTruncationToVectorTruncation(SDNode *N,
DAGCombinerInfo &DCI) const;
+ SDValue combineBVLoadsSpecialValue(SDValue Operand,
+ SelectionDAG &DAG) const;
+
/// lowerToVINSERTH - Return the SDValue if this VECTOR_SHUFFLE can be
/// handled by the VINSERTH instruction introduced in ISA 3.0. This is
/// essentially any shuffle of v8i16 vectors that just inserts one element
diff --git a/llvm/test/CodeGen/PowerPC/lxvkq-vec-constant.ll b/llvm/test/CodeGen/PowerPC/lxvkq-vec-constant.ll
new file mode 100644
index 0000000000000..0ee4524a6c68a
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/lxvkq-vec-constant.ll
@@ -0,0 +1,307 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr10 -mtriple=powerpc64le-unknown-unknown \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC64-LE-10
+
+; RUN: llc -verify-machineinstrs -mcpu=pwr10 -mtriple=powerpc64-unknown-unknown \
+; RUN: -ppc-asm-full-reg-names --ppc-vsr-nums-as-vr < %s | FileCheck %s --check-prefix=POWERPC64-BE-10
+
+; Test LXVKQ instruction generation for special vector constants matching 128 bit patterns:
+; 0x8000_0000_0000_0000_0000_0000_0000_0000 (MSB set pattern)
+; 0x0000_0000_0000_0000_0000_0000_0000_0001 (LSB set pattern)
+
+; =============================================================================
+; v2i64 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-9223372036854775808, 0>
+define dso_local noundef <2 x i64> @test_v2i64_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_msb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI0_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_msb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: lxvkq v2, 16
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <2 x i64> <i64 -9223372036854775808, i64 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, -9223372036854775808>
+define dso_local noundef <2 x i64> @test_v2i64_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_msb_set_littleendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: lxvkq v2, 16
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_msb_set_littleendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: addis r3, r2, .LCPI1_0@toc@ha
+; POWERPC64-BE-10-NEXT: addi r3, r3, .LCPI1_0@toc@l
+; POWERPC64-BE-10-NEXT: lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <2 x i64> <i64 0, i64 -9223372036854775808>
+}
+
+; =============================================================================
+; v4i32 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-2147483648, 0, 0, 0>
+define dso_local noundef <4 x i32> @test_v4i32_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_msb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI2_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_msb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: lxvkq v2, 16
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <4 x i32> <i32 -2147483648, i32 0, i32 0, i32 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, 0, 0, -2147483648>
+define dso_local noundef <4 x i32> @test_v4i32_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_msb_set_littleendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: lxvkq v2, 16
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_msb_set_littleendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: addis r3, r2, .LCPI3_0@toc@ha
+; POWERPC64-BE-10-NEXT: addi r3, r3, .LCPI3_0@toc@l
+; POWERPC64-BE-10-NEXT: lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <4 x i32> <i32 0, i32 0, i32 0, i32 -2147483648>
+}
+
+; =============================================================================
+; v8i16 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-32768, 0, 0, 0, 0, 0, 0, 0>
+define dso_local noundef <8 x i16> @test_v8i16_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_msb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI4_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_msb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: lxvkq v2, 16
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <8 x i16> <i16 -32768, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, 0, 0, 0, 0, 0, 0, -32768>
+define dso_local noundef <8 x i16> @test_v8i16_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_msb_set_littleendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: lxvkq v2, 16
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_msb_set_littleendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: addis r3, r2, .LCPI5_0@toc@ha
+; POWERPC64-BE-10-NEXT: addi r3, r3, .LCPI5_0@toc@l
+; POWERPC64-BE-10-NEXT: lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <8 x i16> <i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 -32768>
+}
+
+; =============================================================================
+; v16i8 tests - MSB set pattern (0x8000_0000_0000_0000_0000_0000_0000_0000)
+; =============================================================================
+
+; Big-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <-128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
+define dso_local noundef <16 x i8> @test_v16i8_msb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v16i8_msb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI6_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v16i8_msb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: lxvkq v2, 16
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <16 x i8> <i8 -128, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0>
+}
+
+; Little-Endian: 0x8000_0000_0000_0000_0000_0000_0000_0000 represents <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -128>
+define dso_local noundef <16 x i8> @test_v16i8_msb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v16i8_msb_set_littleendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: lxvkq v2, 16
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v16i8_msb_set_littleendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: addis r3, r2, .LCPI7_0@toc@ha
+; POWERPC64-BE-10-NEXT: addi r3, r3, .LCPI7_0@toc@l
+; POWERPC64-BE-10-NEXT: lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <16 x i8> <i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 -128>
+}
+
+; =============================================================================
+; v2i64 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 1>
+define dso_local noundef <2 x i64> @test_v2i64_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_lsb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI8_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_lsb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: xxspltib v2, 255
+; POWERPC64-BE-10-NEXT: vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <2 x i64> <i64 0, i64 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0>
+define dso_local noundef <2 x i64> @test_v2i64_lsb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v2i64_lsb_set_littleendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: xxspltib v2, 255
+; POWERPC64-LE-10-NEXT: vsrq v2, v2, v2
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v2i64_lsb_set_littleendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: addis r3, r2, .LCPI9_0@toc@ha
+; POWERPC64-BE-10-NEXT: addi r3, r3, .LCPI9_0@toc@l
+; POWERPC64-BE-10-NEXT: lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <2 x i64> <i64 1, i64 0>
+}
+
+; =============================================================================
+; v4i32 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 0, 0, 1>
+define dso_local noundef <4 x i32> @test_v4i32_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_lsb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI10_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_lsb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: xxspltib v2, 255
+; POWERPC64-BE-10-NEXT: vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <4 x i32> <i32 0, i32 0, i32 0, i32 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0, 0, 0>
+define dso_local noundef <4 x i32> @test_v4i32_lsb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v4i32_lsb_set_littleendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: xxspltib v2, 255
+; POWERPC64-LE-10-NEXT: vsrq v2, v2, v2
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v4i32_lsb_set_littleendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: addis r3, r2, .LCPI11_0@toc@ha
+; POWERPC64-BE-10-NEXT: addi r3, r3, .LCPI11_0@toc@l
+; POWERPC64-BE-10-NEXT: lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <4 x i32> <i32 1, i32 0, i32 0, i32 0>
+}
+
+; =============================================================================
+; v8i16 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 0, 0, 0, 0, 0, 0, 1>
+define dso_local noundef <8 x i16> @test_v8i16_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_lsb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI12_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_lsb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: xxspltib v2, 255
+; POWERPC64-BE-10-NEXT: vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <8 x i16> <i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0, 0, 0, 0, 0, 0, 0>
+define dso_local noundef <8 x i16> @test_v8i16_lsb_set_littleendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v8i16_lsb_set_littleendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: xxspltib v2, 255
+; POWERPC64-LE-10-NEXT: vsrq v2, v2, v2
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v8i16_lsb_set_littleendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: addis r3, r2, .LCPI13_0@toc@ha
+; POWERPC64-BE-10-NEXT: addi r3, r3, .LCPI13_0@toc@l
+; POWERPC64-BE-10-NEXT: lxv v2, 0(r3)
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <8 x i16> <i16 1, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>
+}
+
+; =============================================================================
+; v16i8 tests - LSB set pattern (0x0000_0000_0000_0000_0000_0000_0000_0001)
+; =============================================================================
+
+; Big-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1>
+define dso_local noundef <16 x i8> @test_v16i8_lsb_set_bigendian() local_unnamed_addr {
+; POWERPC64-LE-10-LABEL: test_v16i8_lsb_set_bigendian:
+; POWERPC64-LE-10: # %bb.0: # %entry
+; POWERPC64-LE-10-NEXT: plxv v2, .LCPI14_0@PCREL(0), 1
+; POWERPC64-LE-10-NEXT: blr
+;
+; POWERPC64-BE-10-LABEL: test_v16i8_lsb_set_bigendian:
+; POWERPC64-BE-10: # %bb.0: # %entry
+; POWERPC64-BE-10-NEXT: xxspltib v2, 255
+; POWERPC64-BE-10-NEXT: vsrq v2, v2, v2
+; POWERPC64-BE-10-NEXT: blr
+entry:
+ ret <16 x i8> <i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 1>
+}
+
+; Little-Endian: 0x0000_0000_0000_0000_0000_0000_0000_0001 represents <1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
[truncated]
// The right shifted pattern can be constructed using a combination of
// XXSPLITIB and VSRQ instruction. VSRQ uses the shift amount from the lower
XXSPLTIB
changed
// The right shifted pattern can be constructed using a combination of
// XXSPLITIB and VSRQ instruction. VSRQ uses the shift amount from the lower
// 7 bits of byte 15. This can be specified using XXSPLITIB with immediate
XXSPLTIB
changed
// Check if FullValue can be generated by (right) shifting a base pattern
for (const auto &[BasePattern, Uim] : BaseLXVKQPatterns) {
  if (BasePattern.lshr(127) == FullVal)
Don't need a loop here. A 128 bit value lshr 127 is just 0/1, and 0 is not interesting.
Simplified the code and removed the unnecessary loops. I was originally thinking we should add support for more patterns, hence the loop in the first place.
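For context, a loop-free version of the helper along the lines suggested here might look roughly like the sketch below (illustrative only; it reuses the getPatternInfo and LXVKQPattern names from the posted diff and assumes the surrounding LLVM context in PPCISelLowering.cpp, and is not necessarily the exact code that landed):

```cpp
// Sketch: single base pattern, no loops. UIM 16 selects the lxvkq quadword
// 0x8000_0000_0000_0000_0000_0000_0000_0000; shifting it right by 127 gives
// the 0x0000...0001 pattern.
static std::optional<LXVKQPattern> getPatternInfo(const APInt &FullVal) {
  const APInt MSBPattern = APInt(128, 0x8000000000000000ULL) << 64;

  // Direct lxvkq match, no shift needed.
  if (FullVal == MSBPattern)
    return std::make_tuple(uint32_t{16}, uint8_t{0});

  // MSBPattern.lshr(127) is just 1, so compare against that directly.
  if (FullVal == APInt(128, 1))
    return std::make_tuple(uint32_t{16}, uint8_t{127});

  return std::nullopt;
}
```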
if (Subtarget.isISA3_1() && Subtarget.hasVSX()) {
  if (SDValue VecPat = combineBVLoadsSpecialValue(Op, DAG))
    return VecPat;
}
Like code just below, don't want to handle all 0s or all 1s. Maybe they can be combined.
moved the snippet to inside the code block below.
/// Combine vector loads to a single load by recognising patterns in the Build
to a single load or a shift of a constant?
updated the comment.
The commit message for this PR needs to be updated to summarize what's done, with the details added as a comment to this PR instead. I don't think the charts will show up nicely in the git log either.
Force-pushed from 85a8234 to fc37040
// The right shifted pattern can be constructed using a combination of
// XXSPLTIB and VSRQ instruction. VSRQ uses the shift amount from the lower
// 7 bits of byte 15. This can be specified using XXSPLTIB with immediate
// value 255.
assert(ShiftAmount == 127, "Unexpected lxvkq pattern");
// Place the element value at the correct bit position
FullVal |= ElemAPInt;
} |
if (FullVal.isZero() || FullVal.isAllOnes())
return SDValue();
Optimize BUILD_VECTOR having special quadword patterns
This change optimizes BUILD_VECTOR operations by using the lxvkq or xxspltib + vsrq instructions to inline constants matching specific 128-bit patterns:
0x8000_0000_0000_0000_0000_0000_0000_0000
0x0000_0000_0000_0000_0000_0000_0000_0001

Implementation Details

The lxvkq instruction loads special quadword values into VSX registers; immediate 16 loads 0x8000_0000_0000_0000_0000_0000_0000_0000. The optimization reconstructs the 128-bit register pattern from the BUILD_VECTOR operands, accounting for target endianness. For example, the MSB pattern can be represented as:
<i64 -9223372036854775808, i64 0> (big-endian)
<i64 0, i64 -9223372036854775808> (little-endian)

Both produce the same register value: 0x8000_0000_0000_0000_0000_0000_0000_0000

MSB Pattern (0x8000...0000)
All vector types (v2i64, v4i32, v8i16, v16i8) generate a single lxvkq v2, 16.

LSB Pattern (0x0000...0001)
All vector types generate xxspltib v2, 255 followed by vsrq v2, v2, v2.
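As a standalone illustration of the endianness-aware packing described above, the sketch below mirrors the BUILD_VECTOR walk in the patch using plain C++ (unsigned __int128 stands in for APInt; the packElements helper and the sample values are illustrative, not code from the patch):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Pack vector elements into the 128-bit VSX register image, mirroring the
// BUILD_VECTOR walk in the patch: element i occupies bits [i*EB, (i+1)*EB)
// on little-endian targets and the mirrored position on big-endian targets.
static unsigned __int128 packElements(const std::vector<uint64_t> &Elems,
                                      unsigned ElemBits, bool IsLittleEndian) {
  unsigned __int128 Full = 0;
  for (unsigned I = 0; I < Elems.size(); ++I) {
    uint64_t Val = Elems[I];
    if (ElemBits < 64)
      Val &= (1ULL << ElemBits) - 1; // keep only the element's bits
    unsigned BitPos = IsLittleEndian ? I * ElemBits
                                     : 128 - (I + 1) * ElemBits;
    Full |= (unsigned __int128)Val << BitPos;
  }
  return Full;
}

int main() {
  // <i64 0, i64 INT64_MIN> on a little-endian target and
  // <i64 INT64_MIN, i64 0> on a big-endian target both describe the same
  // register image 0x8000_0000_0000_0000_0000_0000_0000_0000.
  unsigned __int128 LE = packElements({0, 0x8000000000000000ULL}, 64, true);
  unsigned __int128 BE = packElements({0x8000000000000000ULL, 0}, 64, false);
  std::printf("same register image: %s\n", LE == BE ? "yes" : "no");
  return 0;
}
```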