
[RISCV] Handle fixed length vectors with exact VLEN in lowerEXTRACT_SUBVECTOR #79949

Merged
lukel97 merged 3 commits into llvm:main on Feb 13, 2024

Conversation

lukel97
Contributor

@lukel97 lukel97 commented Jan 30, 2024

This is a revival of #65392. When we lower an extract_subvector, we first extract the
subregister that the subvector is contained in and then do a vslidedown with
LMUL=1. We can currently only do this for scalable vectors, though, because the
index is scaled by vscale and so we know which subregister the subvector lies in.

For fixed length vectors the index isn't scaled by vscale, so the subvector
could lie in any arbitrary subregister, and we have to do a vslidedown with the
full LMUL.

The exception is when we know the exact VLEN: in that case we can still work
out the exact subregister and do the LMUL=1 vslidedown on it.

This patch handles that case by scaling the index by 1/vscale before computing
the subregister, and by extending the LMUL=1 path to handle fixed length
vectors.
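
To make the index arithmetic concrete, here is a minimal standalone sketch (a hypothetical helper, not the code in this patch) of how an element index decomposes once VLEN is known exactly:

```cpp
#include <cstdint>
#include <utility>

// With an exact VLEN, a fixed-length element index splits into the LMUL=1
// register inside the register group plus a remaining element offset, so the
// final vslidedown only has to operate on a single vector register.
std::pair<unsigned, unsigned> splitIndexForExactVLEN(unsigned OrigIdx,
                                                     unsigned EltSizeInBits,
                                                     unsigned VLEN) {
  unsigned EltsPerVReg = VLEN / EltSizeInBits; // elements per vector register
  return {OrigIdx / EltsPerVReg, OrigIdx % EltsPerVReg};
}

// Example: extracting <2 x i32> at index 6 from <8 x i32> with VLEN=128 gives
// {1, 2}: subregister 1 of the m2 group, element offset 2, which matches the
// "vslidedown.vi v8, v9, 2" in the updated extract_v2i32_v8i32_6 test below.
```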

…T_SUBVECTOR

@llvmbot
Collaborator

llvmbot commented Jan 30, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/79949.diff

2 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+51-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+129-76)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f2cacba0ff773..65f98bc238ae5 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9616,12 +9616,15 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
   if (OrigIdx == 0)
     return Op;
 
-  // If the subvector vector is a fixed-length type, we cannot use subregister
-  // manipulation to simplify the codegen; we don't know which register of a
-  // LMUL group contains the specific subvector as we only know the minimum
-  // register size. Therefore we must slide the vector group down the full
-  // amount.
-  if (SubVecVT.isFixedLengthVector()) {
+  const unsigned MinVLen = Subtarget.getRealMinVLen();
+  const unsigned MaxVLen = Subtarget.getRealMaxVLen();
+
+  // If the subvector vector is a fixed-length type and we don't know VLEN
+  // exactly, we cannot use subregister manipulation to simplify the codegen; we
+  // don't know which register of a LMUL group contains the specific subvector
+  // as we only know the minimum register size. Therefore we must slide the
+  // vector group down the full amount.
+  if (SubVecVT.isFixedLengthVector() && MinVLen != MaxVLen) {
     MVT ContainerVT = VecVT;
     if (VecVT.isFixedLengthVector()) {
       ContainerVT = getContainerForFixedLengthVector(VecVT);
@@ -9653,19 +9656,46 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
     return DAG.getBitcast(Op.getValueType(), Slidedown);
   }
 
+  if (VecVT.isFixedLengthVector()) {
+    VecVT = getContainerForFixedLengthVector(VecVT);
+    Vec = convertToScalableVector(VecVT, Vec, DAG, Subtarget);
+  }
+
+  MVT ContainerSubVecVT = SubVecVT;
   unsigned SubRegIdx, RemIdx;
-  std::tie(SubRegIdx, RemIdx) =
-      RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
-          VecVT, SubVecVT, OrigIdx, TRI);
+
+  // extract_subvector scales the index by vscale if the subvector is scalable,
+  // and decomposeSubvectorInsertExtractToSubRegs takes this into account. So if
+  // we have a fixed length subvector, we need to adjust the index by 1/vscale.
+  if (SubVecVT.isFixedLengthVector()) {
+    assert(MinVLen == MaxVLen);
+    ContainerSubVecVT = getContainerForFixedLengthVector(SubVecVT);
+    unsigned Vscale = MinVLen / RISCV::RVVBitsPerBlock;
+    std::tie(SubRegIdx, RemIdx) =
+        RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
+            VecVT, ContainerSubVecVT, OrigIdx / Vscale, TRI);
+    RemIdx = (RemIdx * Vscale) + (OrigIdx % Vscale);
+  } else {
+    std::tie(SubRegIdx, RemIdx) =
+        RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
+            VecVT, ContainerSubVecVT, OrigIdx, TRI);
+  }
 
   // If the Idx has been completely eliminated then this is a subvector extract
   // which naturally aligns to a vector register. These can easily be handled
   // using subregister manipulation.
-  if (RemIdx == 0)
+  if (RemIdx == 0) {
+    if (SubVecVT.isFixedLengthVector()) {
+      Vec = DAG.getTargetExtractSubreg(SubRegIdx, DL, ContainerSubVecVT, Vec);
+      return convertFromScalableVector(SubVecVT, Vec, DAG, Subtarget);
+    }
     return Op;
+  }
 
-  // Else SubVecVT is a fractional LMUL and may need to be slid down.
-  assert(RISCVVType::decodeVLMUL(getLMUL(SubVecVT)).second);
+  // Else SubVecVT is a fractional LMUL and may need to be slid down: if
+  // SubVecVT was > M1 then the index would need to be a multiple of VLMAX, and
+  // so would divide exactly.
+  assert(RISCVVType::decodeVLMUL(getLMUL(ContainerSubVecVT)).second);
 
   // If the vector type is an LMUL-group type, extract a subvector equal to the
   // nearest full vector register type.
@@ -9680,10 +9710,17 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
 
   // Slide this vector register down by the desired number of elements in order
   // to place the desired subvector starting at element 0.
-  SDValue SlidedownAmt =
-      DAG.getVScale(DL, XLenVT, APInt(XLenVT.getSizeInBits(), RemIdx));
+  SDValue SlidedownAmt;
+  if (SubVecVT.isFixedLengthVector())
+    SlidedownAmt = DAG.getConstant(RemIdx, DL, Subtarget.getXLenVT());
+  else
+    SlidedownAmt =
+        DAG.getVScale(DL, XLenVT, APInt(XLenVT.getSizeInBits(), RemIdx));
 
   auto [Mask, VL] = getDefaultScalableVLOps(InterSubVT, DL, DAG, Subtarget);
+  if (SubVecVT.isFixedLengthVector())
+    VL = getVLOp(SubVecVT.getVectorNumElements(), InterSubVT, DL, DAG,
+                 Subtarget);
   SDValue Slidedown =
       getVSlidedown(DAG, Subtarget, DL, InterSubVT, DAG.getUNDEF(InterSubVT),
                     Vec, SlidedownAmt, Mask, VL);
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
index 8b51a38830420..7d29b194f041e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
@@ -76,10 +76,8 @@ define void @extract_v1i32_v8i32_4(ptr %x, ptr %y) {
 ; CHECK-KNOWNVLEN128-LABEL: extract_v1i32_v8i32_4:
 ; CHECK-KNOWNVLEN128:       # %bb.0:
 ; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 4
 ; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v9, (a1)
 ; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <1 x i32> @llvm.vector.extract.v1i32.v8i32(<8 x i32> %a, i64 4)
@@ -101,8 +99,8 @@ define void @extract_v1i32_v8i32_5(ptr %x, ptr %y) {
 ; CHECK-KNOWNVLEN128-LABEL: extract_v1i32_v8i32_5:
 ; CHECK-KNOWNVLEN128:       # %bb.0:
 ; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 5
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v9, 1
 ; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
 ; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
 ; CHECK-KNOWNVLEN128-NEXT:    ret
@@ -172,10 +170,8 @@ define void @extract_v2i32_v8i32_4(ptr %x, ptr %y) {
 ; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_4:
 ; CHECK-KNOWNVLEN128:       # %bb.0:
 ; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 4
 ; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v9, (a1)
 ; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <2 x i32> @llvm.vector.extract.v2i32.v8i32(<8 x i32> %a, i64 4)
@@ -197,8 +193,8 @@ define void @extract_v2i32_v8i32_6(ptr %x, ptr %y) {
 ; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_6:
 ; CHECK-KNOWNVLEN128:       # %bb.0:
 ; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 6
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v9, 2
 ; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
 ; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
 ; CHECK-KNOWNVLEN128-NEXT:    ret
@@ -234,39 +230,59 @@ define void @extract_v2i32_nxv16i32_2(<vscale x 16 x i32> %x, ptr %y) {
 }
 
 define void @extract_v2i32_nxv16i32_4(<vscale x 16 x i32> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_nxv16i32_4:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 4
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a0)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i32_nxv16i32_4:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 4
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a0)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_nxv16i32_4:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v9, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %c = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %x, i64 4)
   store <2 x i32> %c, ptr %y
   ret void
 }
 
 define void @extract_v2i32_nxv16i32_6(<vscale x 16 x i32> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_nxv16i32_6:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 6
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a0)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i32_nxv16i32_6:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 6
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a0)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_nxv16i32_6:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v9, 2
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %c = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %x, i64 6)
   store <2 x i32> %c, ptr %y
   ret void
 }
 
 define void @extract_v2i32_nxv16i32_8(<vscale x 16 x i32> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_nxv16i32_8:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 2, e32, m4, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 8
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a0)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i32_nxv16i32_8:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, m4, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 8
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a0)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_nxv16i32_8:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v10, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %c = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %x, i64 8)
   store <2 x i32> %c, ptr %y
   ret void
@@ -333,9 +349,7 @@ define void @extract_v8i32_nxv16i32_8(<vscale x 16 x i32> %x, ptr %y) {
 ;
 ; CHECK-KNOWNVLEN128-LABEL: extract_v8i32_nxv16i32_8:
 ; CHECK-KNOWNVLEN128:       # %bb.0:
-; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 8, e32, m4, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 8
-; CHECK-KNOWNVLEN128-NEXT:    vs2r.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    vs2r.v v10, (a0)
 ; CHECK-KNOWNVLEN128-NEXT:    ret
   %c = call <8 x i32> @llvm.vector.extract.v8i32.nxv16i32(<vscale x 16 x i32> %x, i64 8)
   store <8 x i32> %c, ptr %y
@@ -611,9 +625,8 @@ define void @extract_v2i1_v64i1_42(ptr %x, ptr %y) {
 ; CHECK-KNOWNVLEN128-NEXT:    vlm.v v0, (a0)
 ; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v8, 0
 ; CHECK-KNOWNVLEN128-NEXT:    vmerge.vim v8, v8, 1, v0
-; CHECK-KNOWNVLEN128-NEXT:    li a0, 42
-; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, m4, ta, ma
-; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vx v8, v8, a0
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v10, 10
 ; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
 ; CHECK-KNOWNVLEN128-NEXT:    vmsne.vi v0, v8, 0
 ; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v8, 0
@@ -741,51 +754,91 @@ define void @extract_v2i1_nxv64i1_2(<vscale x 64 x i1> %x, ptr %y) {
 }
 
 define void @extract_v2i1_nxv64i1_42(<vscale x 64 x i1> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i1_nxv64i1_42:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e8, m8, ta, ma
-; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT:    li a1, 42
-; CHECK-NEXT:    vsetivli zero, 2, e8, m4, ta, ma
-; CHECK-NEXT:    vslidedown.vx v8, v8, a1
-; CHECK-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
-; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT:    vmv.v.i v9, 0
-; CHECK-NEXT:    vsetivli zero, 2, e8, mf2, tu, ma
-; CHECK-NEXT:    vmv.v.v v9, v8
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT:    vmsne.vi v8, v9, 0
-; CHECK-NEXT:    vsm.v v8, (a0)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i1_nxv64i1_42:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetvli a1, zero, e8, m8, ta, ma
+; CHECK-V-NEXT:    vmv.v.i v8, 0
+; CHECK-V-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT:    li a1, 42
+; CHECK-V-NEXT:    vsetivli zero, 2, e8, m4, ta, ma
+; CHECK-V-NEXT:    vslidedown.vx v8, v8, a1
+; CHECK-V-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-V-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-V-NEXT:    vmv.v.i v8, 0
+; CHECK-V-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT:    vmv.v.i v9, 0
+; CHECK-V-NEXT:    vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-V-NEXT:    vmv.v.v v9, v8
+; CHECK-V-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT:    vmsne.vi v8, v9, 0
+; CHECK-V-NEXT:    vsm.v v8, (a0)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i1_nxv64i1_42:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vsetvli a1, zero, e8, m8, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v10, 10
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v9, 0
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.v v9, v8
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmsne.vi v8, v9, 0
+; CHECK-KNOWNVLEN128-NEXT:    vsm.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %c = call <2 x i1> @llvm.vector.extract.v2i1.nxv64i1(<vscale x 64 x i1> %x, i64 42)
   store <2 x i1> %c, ptr %y
   ret void
 }
 
 define void @extract_v2i1_nxv32i1_26(<vscale x 32 x i1> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i1_nxv32i1_26:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e8, m4, ta, ma
-; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT:    vsetivli zero, 2, e8, m2, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 26
-; CHECK-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
-; CHECK-NEXT:    vmsne.vi v0, v8, 0
-; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT:    vmv.v.i v9, 0
-; CHECK-NEXT:    vsetivli zero, 2, e8, mf2, tu, ma
-; CHECK-NEXT:    vmv.v.v v9, v8
-; CHECK-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT:    vmsne.vi v8, v9, 0
-; CHECK-NEXT:    vsm.v v8, (a0)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i1_nxv32i1_26:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetvli a1, zero, e8, m4, ta, ma
+; CHECK-V-NEXT:    vmv.v.i v8, 0
+; CHECK-V-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT:    vsetivli zero, 2, e8, m2, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 26
+; CHECK-V-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-V-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-V-NEXT:    vmv.v.i v8, 0
+; CHECK-V-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT:    vmv.v.i v9, 0
+; CHECK-V-NEXT:    vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-V-NEXT:    vmv.v.v v9, v8
+; CHECK-V-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT:    vmsne.vi v8, v9, 0
+; CHECK-V-NEXT:    vsm.v v8, (a0)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i1_nxv32i1_26:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vsetvli a1, zero, e8, m4, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v9, 10
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmsne.vi v0, v8, 0
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT:    vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.i v9, 0
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmv.v.v v9, v8
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vmsne.vi v8, v9, 0
+; CHECK-KNOWNVLEN128-NEXT:    vsm.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %c = call <2 x i1> @llvm.vector.extract.v2i1.nxv32i1(<vscale x 32 x i1> %x, i64 26)
   store <2 x i1> %c, ptr %y
   ret void

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9680,10 +9710,17 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
 
   // Slide this vector register down by the desired number of elements in order
   // to place the desired subvector starting at element 0.
-  SDValue SlidedownAmt =
-      DAG.getVScale(DL, XLenVT, APInt(XLenVT.getSizeInBits(), RemIdx));
+  SDValue SlidedownAmt;

As a follow-up, I think there's room to simplify this via TypeSize and the IRBuilder routines for the same. We could express RemIdx as a TypeSize quantity and abstract over the scalable vs fixed aspect of it.

This is minor, and definitely worthy of its own review.
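
A rough sketch of what that follow-up might look like, assuming RemIdx were carried around as an ElementCount (the helper name and placement are illustrative, not part of this patch):

```cpp
// Illustrative only: one helper that produces the slide amount for both the
// fixed-length and scalable cases instead of branching at the call site.
static SDValue getSlideAmount(SelectionDAG &DAG, const SDLoc &DL, MVT XLenVT,
                              ElementCount RemIdx) {
  if (RemIdx.isScalable())
    // Scalable remainder: the slide amount is RemIdx * vscale.
    return DAG.getVScale(DL, XLenVT,
                         APInt(XLenVT.getSizeInBits(),
                               RemIdx.getKnownMinValue()));
  // Fixed-length remainder: a plain constant.
  return DAG.getConstant(RemIdx.getKnownMinValue(), DL, XLenVT);
}
```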

@preames preames left a comment

LGTM

The if statement it was merged into was not equivalent: VecVT.isScalableVector() != SubVecVT.isScalableVector()
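
A small hedged illustration (hypothetical values, using the extract_v2i32_nxv16i32_4 test above) of why the two conditions differ once VLEN is known exactly:

```cpp
// With an exact VLEN, a fixed-length subvector extracted from a scalable
// vector should still take the LMUL=1 subregister path; the XOR-style check
// would send it down the full-LMUL slide instead.
static void comparePredicates() {
  MVT VecVT = MVT::nxv16i32;  // source vector: scalable
  MVT SubVecVT = MVT::v2i32;  // subvector: fixed length
  bool KnowExactVLEN = true;  // i.e. MinVLen == MaxVLen

  // Patch's condition for the full-LMUL fallback: false here.
  bool FullSlide = SubVecVT.isFixedLengthVector() && !KnowExactVLEN;

  // XOR-style condition: true here, so it is not equivalent.
  bool XorCheck = VecVT.isScalableVector() != SubVecVT.isScalableVector();
  (void)FullSlide;
  (void)XorCheck;
}
```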
@lukel97 lukel97 merged commit bb77047 into llvm:main Feb 13, 2024
3 of 4 checks passed
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Feb 14, 2024
lukel97 added a commit that referenced this pull request Feb 19, 2024
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Mar 6, 2024
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Mar 7, 2024
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Apr 25, 2024
lukel97 added a commit to lukel97/llvm-project that referenced this pull request Apr 30, 2024
lukel97 added a commit that referenced this pull request Apr 30, 2024
…BVECTOR (#84107)

This is the insert_subvector equivalent to #79949, where we can avoid
sliding up by the full LMUL amount if we know the exact subregister the
subvector will be inserted into.

This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this
in two parts:

- We handle fixed length subvector types by converting the subvector to
a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to
convert the vector being inserted into too.

- Whenever we don't need a vslideup because either the subvector fits
exactly into a vector register group *or* the vector is undef, we need
to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select
doesn't correctly handle fixed length subvectors yet: see d7a28f7

A subvector exactly fits into a vector register group if its size is a
known multiple of the size of a vector register, and this adds a new
overload for TypeSize::isKnownMultipleOf for scalable to scalable
comparisons to help reason about this.

I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an
invariant), so that the insert_subvector and extract_subvector code
paths are the same.

We should teach it to properly handle fixed length subvectors in a
follow-up patch, so that the "exact subregister" logic is handled in one
place instead of being spread across both RISCVISelDAGToDAG.cpp and
RISCVISelLowering.cpp.
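
A minimal sketch (assumed shape, not the actual code in #84107) of the "exactly fits a register group" check that the new isKnownMultipleOf overload enables:

```cpp
// One vector register is a scalable quantity of RISCV::RVVBitsPerBlock (64)
// bits. A subvector aligns to a whole register group when its size is a known
// multiple of that, which the scalable-to-scalable overload of
// TypeSize::isKnownMultipleOf lets us query directly.
static bool fitsVectorRegisterGroup(MVT SubVecVT) {
  TypeSize VecRegSize = TypeSize::getScalable(RISCV::RVVBitsPerBlock);
  return SubVecVT.getSizeInBits().isKnownMultipleOf(VecRegSize);
}
```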