[RISCV] Handle fixed length vectors with exact VLEN in lowerEXTRACT_SUBVECTOR #79949
Conversation
[RISCV] Handle fixed length vectors with exact VLEN in lowerEXTRACT_SUBVECTOR

This is a revival of llvm#65392. When we lower an extract_subvector, we extract the subregister that the subvector is contained in first and then do a vslidedown with LMUL=1. We can currently only do this for scalable vectors though, because the index is scaled by vscale and thus we will know what subregister the subvector lies in.

For fixed length vectors, the index isn't scaled by vscale and so the subvector could lie in any arbitrary subregister, so we have to do a vslidedown with the full LMUL.

The exception to this is when we know the exact VLEN: in that case, we can still work out the exact subregister and do the LMUL=1 vslidedown on it.

This patch handles this case by scaling the index by 1/vscale before computing the subregister, and extending the LMUL=1 path to handle fixed length vectors.
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This is a revival of #65392. When we lower an extract_subvector, we extract the subregister that the subvector is contained in first and then do a vslidedown with LMUL=1. We can currently only do this for scalable vectors though, because the index is scaled by vscale and thus we will know what subregister the subvector lies in.

For fixed length vectors, the index isn't scaled by vscale and so the subvector could lie in any arbitrary subregister, so we have to do a vslidedown with the full LMUL.

The exception to this is when we know the exact VLEN: in that case, we can still work out the exact subregister and do the LMUL=1 vslidedown on it.

This patch handles this case by scaling the index by 1/vscale before computing the subregister, and extending the LMUL=1 path to handle fixed length vectors.

Full diff: https://github.com/llvm/llvm-project/pull/79949.diff

2 Files Affected:
- (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp
- (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
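To make the subregister arithmetic concrete, here is a minimal standalone sketch, not LLVM's decomposeSubvectorInsertExtractToSubRegs itself, of how an exactly-known VLEN lets a fixed-length extract index split into an LMUL=1 subregister plus a small slide amount. The VLEN=128/e32 values are assumptions chosen to match the tests below.

#include <cstdio>
#include <utility>

// With VLEN known exactly, one LMUL=1 register holds VLEN/SEW elements,
// so a fixed index decomposes into (register within the group, offset).
std::pair<unsigned, unsigned> decompose(unsigned OrigIdx, unsigned VLen,
                                        unsigned EltBits) {
  unsigned EltsPerReg = VLen / EltBits; // elements per LMUL=1 register
  return {OrigIdx / EltsPerReg,         // which subregister of the group
          OrigIdx % EltsPerReg};        // LMUL=1 vslidedown amount
}

int main() {
  // Extracting <2 x i32> at index 6 from an m2 group starting at v8,
  // with VLEN=128: four e32 elements per register.
  auto [SubReg, Rem] = decompose(6, 128, 32);
  std::printf("v%u, slide %u\n", 8 + SubReg, Rem); // prints "v9, slide 2"
}

This matches the CHECK-KNOWNVLEN128 output for @extract_v2i32_v8i32_6 in the test diff: the extract becomes an LMUL=1 vslidedown.vi v8, v9, 2 instead of a full-LMUL slide.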
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index f2cacba0ff773..65f98bc238ae5 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9616,12 +9616,15 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
if (OrigIdx == 0)
return Op;
- // If the subvector vector is a fixed-length type, we cannot use subregister
- // manipulation to simplify the codegen; we don't know which register of a
- // LMUL group contains the specific subvector as we only know the minimum
- // register size. Therefore we must slide the vector group down the full
- // amount.
- if (SubVecVT.isFixedLengthVector()) {
+ const unsigned MinVLen = Subtarget.getRealMinVLen();
+ const unsigned MaxVLen = Subtarget.getRealMaxVLen();
+
+ // If the subvector is a fixed-length type and we don't know VLEN
+ // exactly, we cannot use subregister manipulation to simplify the codegen; we
+ // don't know which register of a LMUL group contains the specific subvector
+ // as we only know the minimum register size. Therefore we must slide the
+ // vector group down the full amount.
+ if (SubVecVT.isFixedLengthVector() && MinVLen != MaxVLen) {
MVT ContainerVT = VecVT;
if (VecVT.isFixedLengthVector()) {
ContainerVT = getContainerForFixedLengthVector(VecVT);
@@ -9653,19 +9656,46 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
return DAG.getBitcast(Op.getValueType(), Slidedown);
}
+ if (VecVT.isFixedLengthVector()) {
+ VecVT = getContainerForFixedLengthVector(VecVT);
+ Vec = convertToScalableVector(VecVT, Vec, DAG, Subtarget);
+ }
+
+ MVT ContainerSubVecVT = SubVecVT;
unsigned SubRegIdx, RemIdx;
- std::tie(SubRegIdx, RemIdx) =
- RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
- VecVT, SubVecVT, OrigIdx, TRI);
+
+ // extract_subvector scales the index by vscale if the subvector is scalable,
+ // and decomposeSubvectorInsertExtractToSubRegs takes this into account. So if
+ // we have a fixed length subvector, we need to adjust the index by 1/vscale.
+ if (SubVecVT.isFixedLengthVector()) {
+ assert(MinVLen == MaxVLen);
+ ContainerSubVecVT = getContainerForFixedLengthVector(SubVecVT);
+ unsigned Vscale = MinVLen / RISCV::RVVBitsPerBlock;
+ std::tie(SubRegIdx, RemIdx) =
+ RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
+ VecVT, ContainerSubVecVT, OrigIdx / Vscale, TRI);
+ RemIdx = (RemIdx * Vscale) + (OrigIdx % Vscale);
+ } else {
+ std::tie(SubRegIdx, RemIdx) =
+ RISCVTargetLowering::decomposeSubvectorInsertExtractToSubRegs(
+ VecVT, ContainerSubVecVT, OrigIdx, TRI);
+ }
// If the Idx has been completely eliminated then this is a subvector extract
// which naturally aligns to a vector register. These can easily be handled
// using subregister manipulation.
- if (RemIdx == 0)
+ if (RemIdx == 0) {
+ if (SubVecVT.isFixedLengthVector()) {
+ Vec = DAG.getTargetExtractSubreg(SubRegIdx, DL, ContainerSubVecVT, Vec);
+ return convertFromScalableVector(SubVecVT, Vec, DAG, Subtarget);
+ }
return Op;
+ }
- // Else SubVecVT is a fractional LMUL and may need to be slid down.
- assert(RISCVVType::decodeVLMUL(getLMUL(SubVecVT)).second);
+ // Else SubVecVT is a fractional LMUL and may need to be slid down: if
+ // SubVecVT was > M1 then the index would need to be a multiple of VLMAX, and
+ // so would divide exactly.
+ assert(RISCVVType::decodeVLMUL(getLMUL(ContainerSubVecVT)).second);
// If the vector type is an LMUL-group type, extract a subvector equal to the
// nearest full vector register type.
@@ -9680,10 +9710,17 @@ SDValue RISCVTargetLowering::lowerEXTRACT_SUBVECTOR(SDValue Op,
// Slide this vector register down by the desired number of elements in order
// to place the desired subvector starting at element 0.
- SDValue SlidedownAmt =
- DAG.getVScale(DL, XLenVT, APInt(XLenVT.getSizeInBits(), RemIdx));
+ SDValue SlidedownAmt;
+ if (SubVecVT.isFixedLengthVector())
+ SlidedownAmt = DAG.getConstant(RemIdx, DL, Subtarget.getXLenVT());
+ else
+ SlidedownAmt =
+ DAG.getVScale(DL, XLenVT, APInt(XLenVT.getSizeInBits(), RemIdx));
auto [Mask, VL] = getDefaultScalableVLOps(InterSubVT, DL, DAG, Subtarget);
+ if (SubVecVT.isFixedLengthVector())
+ VL = getVLOp(SubVecVT.getVectorNumElements(), InterSubVT, DL, DAG,
+ Subtarget);
SDValue Slidedown =
getVSlidedown(DAG, Subtarget, DL, InterSubVT, DAG.getUNDEF(InterSubVT),
Vec, SlidedownAmt, Mask, VL);
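As a worked instance of the RemIdx rescaling above, here is a hedged recomputation, plain arithmetic only rather than the real decomposeSubvectorInsertExtractToSubRegs path, of the values the patch derives for @extract_v2i1_nxv64i1_42 in the test diff below, assuming VLEN is exactly 128.

#include <cassert>

int main() {
  const unsigned VLen = 128, OrigIdx = 42;
  const unsigned Vscale = VLen / 64;   // RISCV::RVVBitsPerBlock is 64, so 2
  // The i1 mask is materialized via e8; one LMUL=1 register covers nxv8i8,
  // i.e. 8 scalable e8 elements.
  const unsigned ScalableEltsPerReg = 8;

  unsigned ScaledIdx = OrigIdx / Vscale;               // 21, in scalable units
  unsigned SubRegIdx = ScaledIdx / ScalableEltsPerReg; // 2, so v8 + 2 = v10
  unsigned RemIdx = ScaledIdx % ScalableEltsPerReg;    // 5
  RemIdx = RemIdx * Vscale + OrigIdx % Vscale;         // 5*2 + 0 = 10

  assert(SubRegIdx == 2 && RemIdx == 10); // vslidedown.vi v8, v10, 10
  return 0;
}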
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
index 8b51a38830420..7d29b194f041e 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
@@ -76,10 +76,8 @@ define void @extract_v1i32_v8i32_4(ptr %x, ptr %y) {
; CHECK-KNOWNVLEN128-LABEL: extract_v1i32_v8i32_4:
; CHECK-KNOWNVLEN128: # %bb.0:
; CHECK-KNOWNVLEN128-NEXT: vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 1, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v8, 4
; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT: vse32.v v9, (a1)
; CHECK-KNOWNVLEN128-NEXT: ret
%a = load <8 x i32>, ptr %x
%c = call <1 x i32> @llvm.vector.extract.v1i32.v8i32(<8 x i32> %a, i64 4)
@@ -101,8 +99,8 @@ define void @extract_v1i32_v8i32_5(ptr %x, ptr %y) {
; CHECK-KNOWNVLEN128-LABEL: extract_v1i32_v8i32_5:
; CHECK-KNOWNVLEN128: # %bb.0:
; CHECK-KNOWNVLEN128-NEXT: vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 1, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v8, 5
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 1, e32, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v9, 1
; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 1, e32, mf2, ta, ma
; CHECK-KNOWNVLEN128-NEXT: vse32.v v8, (a1)
; CHECK-KNOWNVLEN128-NEXT: ret
@@ -172,10 +170,8 @@ define void @extract_v2i32_v8i32_4(ptr %x, ptr %y) {
; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_4:
; CHECK-KNOWNVLEN128: # %bb.0:
; CHECK-KNOWNVLEN128-NEXT: vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v8, 4
; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT: vse32.v v9, (a1)
; CHECK-KNOWNVLEN128-NEXT: ret
%a = load <8 x i32>, ptr %x
%c = call <2 x i32> @llvm.vector.extract.v2i32.v8i32(<8 x i32> %a, i64 4)
@@ -197,8 +193,8 @@ define void @extract_v2i32_v8i32_6(ptr %x, ptr %y) {
; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_6:
; CHECK-KNOWNVLEN128: # %bb.0:
; CHECK-KNOWNVLEN128-NEXT: vl2re32.v v8, (a0)
-; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v8, 6
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v9, 2
; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
; CHECK-KNOWNVLEN128-NEXT: vse32.v v8, (a1)
; CHECK-KNOWNVLEN128-NEXT: ret
@@ -234,39 +230,59 @@ define void @extract_v2i32_nxv16i32_2(<vscale x 16 x i32> %x, ptr %y) {
}
define void @extract_v2i32_nxv16i32_4(<vscale x 16 x i32> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_nxv16i32_4:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-NEXT: vslidedown.vi v8, v8, 4
-; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vse32.v v8, (a0)
-; CHECK-NEXT: ret
+; CHECK-V-LABEL: extract_v2i32_nxv16i32_4:
+; CHECK-V: # %bb.0:
+; CHECK-V-NEXT: vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-V-NEXT: vslidedown.vi v8, v8, 4
+; CHECK-V-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT: vse32.v v8, (a0)
+; CHECK-V-NEXT: ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_nxv16i32_4:
+; CHECK-KNOWNVLEN128: # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vse32.v v9, (a0)
+; CHECK-KNOWNVLEN128-NEXT: ret
%c = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %x, i64 4)
store <2 x i32> %c, ptr %y
ret void
}
define void @extract_v2i32_nxv16i32_6(<vscale x 16 x i32> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_nxv16i32_6:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-NEXT: vslidedown.vi v8, v8, 6
-; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vse32.v v8, (a0)
-; CHECK-NEXT: ret
+; CHECK-V-LABEL: extract_v2i32_nxv16i32_6:
+; CHECK-V: # %bb.0:
+; CHECK-V-NEXT: vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-V-NEXT: vslidedown.vi v8, v8, 6
+; CHECK-V-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT: vse32.v v8, (a0)
+; CHECK-V-NEXT: ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_nxv16i32_6:
+; CHECK-KNOWNVLEN128: # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v9, 2
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vse32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT: ret
%c = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %x, i64 6)
store <2 x i32> %c, ptr %y
ret void
}
define void @extract_v2i32_nxv16i32_8(<vscale x 16 x i32> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_nxv16i32_8:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetivli zero, 2, e32, m4, ta, ma
-; CHECK-NEXT: vslidedown.vi v8, v8, 8
-; CHECK-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT: vse32.v v8, (a0)
-; CHECK-NEXT: ret
+; CHECK-V-LABEL: extract_v2i32_nxv16i32_8:
+; CHECK-V: # %bb.0:
+; CHECK-V-NEXT: vsetivli zero, 2, e32, m4, ta, ma
+; CHECK-V-NEXT: vslidedown.vi v8, v8, 8
+; CHECK-V-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT: vse32.v v8, (a0)
+; CHECK-V-NEXT: ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_nxv16i32_8:
+; CHECK-KNOWNVLEN128: # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vse32.v v10, (a0)
+; CHECK-KNOWNVLEN128-NEXT: ret
%c = call <2 x i32> @llvm.vector.extract.v2i32.nxv16i32(<vscale x 16 x i32> %x, i64 8)
store <2 x i32> %c, ptr %y
ret void
@@ -333,9 +349,7 @@ define void @extract_v8i32_nxv16i32_8(<vscale x 16 x i32> %x, ptr %y) {
;
; CHECK-KNOWNVLEN128-LABEL: extract_v8i32_nxv16i32_8:
; CHECK-KNOWNVLEN128: # %bb.0:
-; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 8, e32, m4, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v8, 8
-; CHECK-KNOWNVLEN128-NEXT: vs2r.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT: vs2r.v v10, (a0)
; CHECK-KNOWNVLEN128-NEXT: ret
%c = call <8 x i32> @llvm.vector.extract.v8i32.nxv16i32(<vscale x 16 x i32> %x, i64 8)
store <8 x i32> %c, ptr %y
@@ -611,9 +625,8 @@ define void @extract_v2i1_v64i1_42(ptr %x, ptr %y) {
; CHECK-KNOWNVLEN128-NEXT: vlm.v v0, (a0)
; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v8, 0
; CHECK-KNOWNVLEN128-NEXT: vmerge.vim v8, v8, 1, v0
-; CHECK-KNOWNVLEN128-NEXT: li a0, 42
-; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, m4, ta, ma
-; CHECK-KNOWNVLEN128-NEXT: vslidedown.vx v8, v8, a0
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v10, 10
; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
; CHECK-KNOWNVLEN128-NEXT: vmsne.vi v0, v8, 0
; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v8, 0
@@ -741,51 +754,91 @@ define void @extract_v2i1_nxv64i1_2(<vscale x 64 x i1> %x, ptr %y) {
}
define void @extract_v2i1_nxv64i1_42(<vscale x 64 x i1> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i1_nxv64i1_42:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e8, m8, ta, ma
-; CHECK-NEXT: vmv.v.i v8, 0
-; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT: li a1, 42
-; CHECK-NEXT: vsetivli zero, 2, e8, m4, ta, ma
-; CHECK-NEXT: vslidedown.vx v8, v8, a1
-; CHECK-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
-; CHECK-NEXT: vmsne.vi v0, v8, 0
-; CHECK-NEXT: vmv.v.i v8, 0
-; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vmv.v.i v9, 0
-; CHECK-NEXT: vsetivli zero, 2, e8, mf2, tu, ma
-; CHECK-NEXT: vmv.v.v v9, v8
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vmsne.vi v8, v9, 0
-; CHECK-NEXT: vsm.v v8, (a0)
-; CHECK-NEXT: ret
+; CHECK-V-LABEL: extract_v2i1_nxv64i1_42:
+; CHECK-V: # %bb.0:
+; CHECK-V-NEXT: vsetvli a1, zero, e8, m8, ta, ma
+; CHECK-V-NEXT: vmv.v.i v8, 0
+; CHECK-V-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT: li a1, 42
+; CHECK-V-NEXT: vsetivli zero, 2, e8, m4, ta, ma
+; CHECK-V-NEXT: vslidedown.vx v8, v8, a1
+; CHECK-V-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-V-NEXT: vmsne.vi v0, v8, 0
+; CHECK-V-NEXT: vmv.v.i v8, 0
+; CHECK-V-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT: vmv.v.i v9, 0
+; CHECK-V-NEXT: vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-V-NEXT: vmv.v.v v9, v8
+; CHECK-V-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT: vmsne.vi v8, v9, 0
+; CHECK-V-NEXT: vsm.v v8, (a0)
+; CHECK-V-NEXT: ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i1_nxv64i1_42:
+; CHECK-KNOWNVLEN128: # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT: vsetvli a1, zero, e8, m8, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v10, 10
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmsne.vi v0, v8, 0
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v9, 0
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.v v9, v8
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmsne.vi v8, v9, 0
+; CHECK-KNOWNVLEN128-NEXT: vsm.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT: ret
%c = call <2 x i1> @llvm.vector.extract.v2i1.nxv64i1(<vscale x 64 x i1> %x, i64 42)
store <2 x i1> %c, ptr %y
ret void
}
define void @extract_v2i1_nxv32i1_26(<vscale x 32 x i1> %x, ptr %y) {
-; CHECK-LABEL: extract_v2i1_nxv32i1_26:
-; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e8, m4, ta, ma
-; CHECK-NEXT: vmv.v.i v8, 0
-; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT: vsetivli zero, 2, e8, m2, ta, ma
-; CHECK-NEXT: vslidedown.vi v8, v8, 26
-; CHECK-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
-; CHECK-NEXT: vmsne.vi v0, v8, 0
-; CHECK-NEXT: vmv.v.i v8, 0
-; CHECK-NEXT: vmerge.vim v8, v8, 1, v0
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vmv.v.i v9, 0
-; CHECK-NEXT: vsetivli zero, 2, e8, mf2, tu, ma
-; CHECK-NEXT: vmv.v.v v9, v8
-; CHECK-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
-; CHECK-NEXT: vmsne.vi v8, v9, 0
-; CHECK-NEXT: vsm.v v8, (a0)
-; CHECK-NEXT: ret
+; CHECK-V-LABEL: extract_v2i1_nxv32i1_26:
+; CHECK-V: # %bb.0:
+; CHECK-V-NEXT: vsetvli a1, zero, e8, m4, ta, ma
+; CHECK-V-NEXT: vmv.v.i v8, 0
+; CHECK-V-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT: vsetivli zero, 2, e8, m2, ta, ma
+; CHECK-V-NEXT: vslidedown.vi v8, v8, 26
+; CHECK-V-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-V-NEXT: vmsne.vi v0, v8, 0
+; CHECK-V-NEXT: vmv.v.i v8, 0
+; CHECK-V-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-V-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT: vmv.v.i v9, 0
+; CHECK-V-NEXT: vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-V-NEXT: vmv.v.v v9, v8
+; CHECK-V-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-V-NEXT: vmsne.vi v8, v9, 0
+; CHECK-V-NEXT: vsm.v v8, (a0)
+; CHECK-V-NEXT: ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i1_nxv32i1_26:
+; CHECK-KNOWNVLEN128: # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT: vsetvli a1, zero, e8, m4, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vslidedown.vi v8, v9, 10
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, mf8, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmsne.vi v0, v8, 0
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v8, 0
+; CHECK-KNOWNVLEN128-NEXT: vmerge.vim v8, v8, 1, v0
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.i v9, 0
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 2, e8, mf2, tu, ma
+; CHECK-KNOWNVLEN128-NEXT: vmv.v.v v9, v8
+; CHECK-KNOWNVLEN128-NEXT: vsetivli zero, 8, e8, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT: vmsne.vi v8, v9, 0
+; CHECK-KNOWNVLEN128-NEXT: vsm.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT: ret
%c = call <2 x i1> @llvm.vector.extract.v2i1.nxv32i1(<vscale x 32 x i1> %x, i64 26)
store <2 x i1> %c, ptr %y
ret void
On the SlidedownAmt hunk in lowerEXTRACT_SUBVECTOR:
As a follow up, I think there's room to simplify this via TypeSize and the IRBuilder routines for the same. We could express RemIdx as a TypeSize quantity, and abstract over the scalable vs fixed aspect of it.
This is minor, and definitely worthy of its own review.
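For reference, a hedged sketch of what that follow-up could look like, reusing identifiers from the diff above; this fragment is an assumption about the suggested cleanup, not code from this PR. SelectionDAG::getElementCount folds a fixed count to a plain constant and expands a scalable one to vscale * N, which is what the two branches in the patch currently do by hand.

// Fragment in the context of lowerEXTRACT_SUBVECTOR (assumed follow-up
// shape): one ElementCount covers both fixed and scalable slide amounts.
ElementCount RemEC = SubVecVT.isFixedLengthVector()
                         ? ElementCount::getFixed(RemIdx)
                         : ElementCount::getScalable(RemIdx);
SDValue SlidedownAmt = DAG.getElementCount(DL, XLenVT, RemEC);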
LGTM
The if statement it was merged into was not equivalent: VecVT.isScalableVector() != SubVecVT.isScalableVector()
Following up on the review comment in #79949 (comment)
…BVECTOR

This is the insert_subvector equivalent to llvm#79949, where we can avoid sliding up by the full LMUL amount if we know the exact subregister the subvector will be inserted into.

This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this in two parts:

- We handle fixed length subvector types by converting the subvector to a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to convert the vector being inserted into too.
- Whenever we don't need a vslideup because either the subvector aligns to a vector register group *or* the vector is undef, we need to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select doesn't correctly handle fixed length subvectors yet: see d7a28f7

I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an invariant), so that the insert_subvector and extract_subvector code paths are the same. We should teach it to properly handle fixed length subvectors in a follow-up patch, so that the "exact subregister" logic is handled in one place instead of being spread across both RISCVISelDAGToDAG.cpp and RISCVISelLowering.cpp.
…BVECTOR (#84107)

This is the insert_subvector equivalent to #79949, where we can avoid sliding up by the full LMUL amount if we know the exact subregister the subvector will be inserted into.

This mirrors the lowerEXTRACT_SUBVECTOR changes in that we handle this in two parts:

- We handle fixed length subvector types by converting the subvector to a scalable vector. But unlike EXTRACT_SUBVECTOR, we may also need to convert the vector being inserted into too.
- Whenever we don't need a vslideup because either the subvector fits exactly into a vector register group *or* the vector is undef, we need to emit an insert_subreg ourselves because RISCVISelDAGToDAG::Select doesn't correctly handle fixed length subvectors yet: see d7a28f7

A subvector exactly fits into a vector register group if its size is a known multiple of the size of a vector register, and this adds a new overload of TypeSize::isKnownMultipleOf for scalable-to-scalable comparisons to help reason about this.

I've left RISCVISelDAGToDAG::Select untouched for now (minus relaxing an invariant), so that the insert_subvector and extract_subvector code paths are the same. We should teach it to properly handle fixed length subvectors in a follow-up patch, so that the "exact subregister" logic is handled in one place instead of being spread across both RISCVISelDAGToDAG.cpp and RISCVISelLowering.cpp.
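A minimal sketch of the "no vslideup needed" case that the #84107 message describes, written with the identifiers used in this patch series; this is an assumed shape for illustration, not the verbatim #84107 change. When the subvector lands exactly on a vector register boundary, it can be placed with a plain subregister insert.

// Fragment in the style of lowerINSERT_SUBVECTOR (assumed, illustrative):
// RemIdx == 0 means the subvector starts at a register boundary, so no
// slide is required and a target insert_subreg suffices.
if (RemIdx == 0) {
  SDValue Insert =
      DAG.getTargetInsertSubreg(SubRegIdx, DL, ContainerVT, Vec, SubVec);
  return DAG.getBitcast(Op.getValueType(), Insert);
}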