[RISCV] Prefer whole register loads and stores when VL=VLMAX #75531

Merged

Conversation

preames (Collaborator) commented Dec 14, 2023

If we're lowering a fixed length vector load or store which happens to be exactly VLEN in size (when VLEN is exactly known), we can use a whole register load or store instead of the unit-strided variants. This doesn't require a vsetvli in some cases, allows additional vsetvli flexibility in others, and doesn't have a runtime dependency on the value of VL.

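As a concrete illustration, taken from the exact_vlen_i32_m1 test in fixed-vectors-load.ll updated by this PR: with vscale_range(2,2) the compiler knows VLEN is exactly 128, so a <4 x i32> load exactly fills one LMUL=1 register and the vsetivli/vle32.v pair collapses into a single whole register load.

define <4 x i32> @exact_vlen_i32_m1(ptr %p) vscale_range(2,2) {
; Before this patch: vsetivli  zero, 4, e32, m1, ta, ma
;                    vle32.v   v8, (a0)
; After this patch:  vl1re32.v v8, (a0)
  %v = load <4 x i32>, ptr %p
  ret <4 x i32> %v
}

The guard requires MinVLMAX == MaxVLMAX == the fixed vector's element count (i.e. the vector exactly fills its container) and an LMUL of at least 1, since whole register loads and stores only exist for full registers and register groups.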
llvmbot added the backend:RISC-V and llvm:SelectionDAG labels Dec 14, 2023
llvmbot (Collaborator) commented Dec 14, 2023

@llvm/pr-subscribers-llvm-selectiondag

Author: Philip Reames (preames)

Changes

If we're lowering a fixed length vector load or store which happens to be exactly VLEN in size (when VLEN is exactly known), we can use a whole register load or store instead of the unit-strided variants. This doesn't require a vsetvli in some cases, allows additional vsetvli flexibility in others, and doesn't have a runtime dependency on the value of VL.


Full diff: https://github.com/llvm/llvm-project/pull/75531.diff

7 Files Affected:

  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+4)
  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+28-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+118-59)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-load.ll (+5-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-store.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll (+7-7)
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index ba8591529399b7..3ba4bb93620a6e 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -24180,6 +24180,10 @@ static SDValue narrowExtractedVectorLoad(SDNode *Extract, SelectionDAG &DAG) {
 
   unsigned Index = Extract->getConstantOperandVal(1);
   unsigned NumElts = VT.getVectorMinNumElements();
+  // A fixed length vector being extracted from a scalable vector
+  // may not be any *smaller* than the scalable one.
+  if (Index == 0 && NumElts >= Ld->getValueType(0).getVectorMinNumElements())
+    return SDValue();
 
   // The definition of EXTRACT_SUBVECTOR states that the index must be a
   // multiple of the minimum number of elements in the result type.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 91bccc77f93fd0..9e4f0e775a7ef8 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9838,6 +9838,20 @@ RISCVTargetLowering::lowerFixedLengthVectorLoadToRVV(SDValue Op,
   MVT XLenVT = Subtarget.getXLenVT();
   MVT ContainerVT = getContainerForFixedLengthVector(VT);
 
+  // If we know the exact VLEN and our fixed length vector completely fills
+  // the container, use a whole register load instead.
+  const auto [MinVLMAX, MaxVLMAX] =
+      RISCVTargetLowering::computeVLMAXBounds(ContainerVT, Subtarget);
+  if (MinVLMAX == MaxVLMAX && MinVLMAX == VT.getVectorNumElements() &&
+      getLMUL1VT(ContainerVT).bitsLE(ContainerVT)) {
+    SDValue NewLoad =
+      DAG.getLoad(ContainerVT, DL, Load->getChain(), Load->getBasePtr(),
+                  Load->getPointerInfo(), Load->getOriginalAlign(),
+                  Load->getMemOperand()->getFlags());
+    SDValue Result = convertFromScalableVector(VT, NewLoad, DAG, Subtarget);
+    return DAG.getMergeValues({Result, NewLoad.getValue(1)}, DL);
+  }
+
   SDValue VL = getVLOp(VT.getVectorNumElements(), ContainerVT, DL, DAG, Subtarget);
 
   bool IsMaskOp = VT.getVectorElementType() == MVT::i1;
@@ -9882,12 +9896,23 @@ RISCVTargetLowering::lowerFixedLengthVectorStoreToRVV(SDValue Op,
 
   MVT ContainerVT = getContainerForFixedLengthVector(VT);
 
-  SDValue VL = getVLOp(VT.getVectorNumElements(), ContainerVT, DL, DAG,
-                       Subtarget);
-
   SDValue NewValue =
       convertToScalableVector(ContainerVT, StoreVal, DAG, Subtarget);
 
+
+  // If we know the exact VLEN and our fixed length vector completely fills
+  // the container, use a whole register store instead.
+  const auto [MinVLMAX, MaxVLMAX] =
+      RISCVTargetLowering::computeVLMAXBounds(ContainerVT, Subtarget);
+  if (MinVLMAX == MaxVLMAX && MinVLMAX == VT.getVectorNumElements() &&
+      getLMUL1VT(ContainerVT).bitsLE(ContainerVT))
+    return DAG.getStore(Store->getChain(), DL, NewValue, Store->getBasePtr(),
+                        Store->getPointerInfo(), Store->getOriginalAlign(),
+                        Store->getMemOperand()->getFlags());
+
+  SDValue VL = getVLOp(VT.getVectorNumElements(), ContainerVT, DL, DAG,
+                       Subtarget);
+
   bool IsMaskOp = VT.getVectorElementType() == MVT::i1;
   SDValue IntID = DAG.getTargetConstant(
       IsMaskOp ? Intrinsic::riscv_vsm : Intrinsic::riscv_vse, DL, XLenVT);
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
index 0b9db09aab3a9c..00fbdf02acd039 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll
@@ -63,15 +63,24 @@ define void @extract_v2i8_v8i8_6(ptr %x, ptr %y) {
 }
 
 define void @extract_v1i32_v8i32_4(ptr %x, ptr %y) {
-; CHECK-LABEL: extract_v1i32_v8i32_4:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 4
-; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a1)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v1i32_v8i32_4:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-V-NEXT:    vle32.v v8, (a0)
+; CHECK-V-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 4
+; CHECK-V-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a1)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v1i32_v8i32_4:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 4
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <1 x i32> @llvm.vector.extract.v1i32.v8i32(<8 x i32> %a, i64 4)
   store <1 x i32> %c, ptr %y
@@ -79,15 +88,24 @@ define void @extract_v1i32_v8i32_4(ptr %x, ptr %y) {
 }
 
 define void @extract_v1i32_v8i32_5(ptr %x, ptr %y) {
-; CHECK-LABEL: extract_v1i32_v8i32_5:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 5
-; CHECK-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a1)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v1i32_v8i32_5:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-V-NEXT:    vle32.v v8, (a0)
+; CHECK-V-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 5
+; CHECK-V-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a1)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v1i32_v8i32_5:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, m2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 5
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 1, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <1 x i32> @llvm.vector.extract.v1i32.v8i32(<8 x i32> %a, i64 5)
   store <1 x i32> %c, ptr %y
@@ -95,13 +113,20 @@ define void @extract_v1i32_v8i32_5(ptr %x, ptr %y) {
 }
 
 define void @extract_v2i32_v8i32_0(ptr %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_v8i32_0:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a1)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i32_v8i32_0:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-V-NEXT:    vle32.v v8, (a0)
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a1)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_0:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <2 x i32> @llvm.vector.extract.v2i32.v8i32(<8 x i32> %a, i64 0)
   store <2 x i32> %c, ptr %y
@@ -109,15 +134,24 @@ define void @extract_v2i32_v8i32_0(ptr %x, ptr %y) {
 }
 
 define void @extract_v2i32_v8i32_2(ptr %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_v8i32_2:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 2, e32, m1, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 2
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a1)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i32_v8i32_2:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-V-NEXT:    vle32.v v8, (a0)
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, m1, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 2
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a1)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_2:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, m1, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 2
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <2 x i32> @llvm.vector.extract.v2i32.v8i32(<8 x i32> %a, i64 2)
   store <2 x i32> %c, ptr %y
@@ -125,15 +159,24 @@ define void @extract_v2i32_v8i32_2(ptr %x, ptr %y) {
 }
 
 define void @extract_v2i32_v8i32_4(ptr %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_v8i32_4:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 4
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a1)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i32_v8i32_4:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-V-NEXT:    vle32.v v8, (a0)
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 4
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a1)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_4:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 4
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <2 x i32> @llvm.vector.extract.v2i32.v8i32(<8 x i32> %a, i64 4)
   store <2 x i32> %c, ptr %y
@@ -141,15 +184,24 @@ define void @extract_v2i32_v8i32_4(ptr %x, ptr %y) {
 }
 
 define void @extract_v2i32_v8i32_6(ptr %x, ptr %y) {
-; CHECK-LABEL: extract_v2i32_v8i32_6:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
-; CHECK-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 6
-; CHECK-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a1)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v2i32_v8i32_6:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-V-NEXT:    vle32.v v8, (a0)
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 6
+; CHECK-V-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a1)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v2i32_v8i32_6:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vl2re32.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, m2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 6
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 2, e32, mf2, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vse32.v v8, (a1)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %a = load <8 x i32>, ptr %x
   %c = call <2 x i32> @llvm.vector.extract.v2i32.v8i32(<8 x i32> %a, i64 6)
   store <2 x i32> %c, ptr %y
@@ -271,13 +323,20 @@ define void @extract_v2i8_nxv2i8_6(<vscale x 2 x i8> %x, ptr %y) {
 }
 
 define void @extract_v8i32_nxv16i32_8(<vscale x 16 x i32> %x, ptr %y) {
-; CHECK-LABEL: extract_v8i32_nxv16i32_8:
-; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 8, e32, m4, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v8, 8
-; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
-; CHECK-NEXT:    vse32.v v8, (a0)
-; CHECK-NEXT:    ret
+; CHECK-V-LABEL: extract_v8i32_nxv16i32_8:
+; CHECK-V:       # %bb.0:
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m4, ta, ma
+; CHECK-V-NEXT:    vslidedown.vi v8, v8, 8
+; CHECK-V-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-V-NEXT:    vse32.v v8, (a0)
+; CHECK-V-NEXT:    ret
+;
+; CHECK-KNOWNVLEN128-LABEL: extract_v8i32_nxv16i32_8:
+; CHECK-KNOWNVLEN128:       # %bb.0:
+; CHECK-KNOWNVLEN128-NEXT:    vsetivli zero, 8, e32, m4, ta, ma
+; CHECK-KNOWNVLEN128-NEXT:    vslidedown.vi v8, v8, 8
+; CHECK-KNOWNVLEN128-NEXT:    vs2r.v v8, (a0)
+; CHECK-KNOWNVLEN128-NEXT:    ret
   %c = call <8 x i32> @llvm.vector.extract.v8i32.nxv16i32(<vscale x 16 x i32> %x, i64 8)
   store <8 x i32> %c, ptr %y
   ret void
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
index d3c4b0f5cddd12..06d1ada300a1d2 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
@@ -1141,10 +1141,10 @@ define float @extractelt_fdiv_v4f32(<4 x float> %x) {
 define i32 @extractelt_v16i32_idx7_exact_vlen(ptr %x) nounwind vscale_range(2,2) {
 ; CHECK-LABEL: extractelt_v16i32_idx7_exact_vlen:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    addi a0, a0, 16
+; CHECK-NEXT:    vl1re32.v v8, (a0)
 ; CHECK-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v9, 3
+; CHECK-NEXT:    vslidedown.vi v8, v8, 3
 ; CHECK-NEXT:    vmv.x.s a0, v8
 ; CHECK-NEXT:    ret
   %a = load <16 x i32>, ptr %x
@@ -1155,10 +1155,10 @@ define i32 @extractelt_v16i32_idx7_exact_vlen(ptr %x) nounwind vscale_range(2,2)
 define i32 @extractelt_v16i32_idx15_exact_vlen(ptr %x) nounwind vscale_range(2,2) {
 ; CHECK-LABEL: extractelt_v16i32_idx15_exact_vlen:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e32, m4, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    addi a0, a0, 48
+; CHECK-NEXT:    vl1re32.v v8, (a0)
 ; CHECK-NEXT:    vsetivli zero, 1, e32, m1, ta, ma
-; CHECK-NEXT:    vslidedown.vi v8, v11, 3
+; CHECK-NEXT:    vslidedown.vi v8, v8, 3
 ; CHECK-NEXT:    vmv.x.s a0, v8
 ; CHECK-NEXT:    ret
   %a = load <16 x i32>, ptr %x
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-load.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-load.ll
index bcf2b0655b1763..791e6eb5ff300d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-load.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-load.ll
@@ -140,8 +140,7 @@ define <6 x i1> @load_v6i1(ptr %p) {
 define <4 x i32> @exact_vlen_i32_m1(ptr %p) vscale_range(2,2) {
 ; CHECK-LABEL: exact_vlen_i32_m1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
-; CHECK-NEXT:    vle32.v v8, (a0)
+; CHECK-NEXT:    vl1re32.v v8, (a0)
 ; CHECK-NEXT:    ret
   %v = load <4 x i32>, ptr %p
   ret <4 x i32> %v
@@ -150,8 +149,7 @@ define <4 x i32> @exact_vlen_i32_m1(ptr %p) vscale_range(2,2) {
 define <16 x i8> @exact_vlen_i8_m1(ptr %p) vscale_range(2,2) {
 ; CHECK-LABEL: exact_vlen_i8_m1:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
-; CHECK-NEXT:    vle8.v v8, (a0)
+; CHECK-NEXT:    vl1r.v v8, (a0)
 ; CHECK-NEXT:    ret
   %v = load <16 x i8>, ptr %p
   ret <16 x i8> %v
@@ -160,8 +158,7 @@ define <16 x i8> @exact_vlen_i8_m1(ptr %p) vscale_range(2,2) {
 define <32 x i8> @exact_vlen_i8_m2(ptr %p) vscale_range(2,2) {
 ; CHECK-LABEL: exact_vlen_i8_m2:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e8, m2, ta, ma
-; CHECK-NEXT:    vle8.v v8, (a0)
+; CHECK-NEXT:    vl2r.v v8, (a0)
 ; CHECK-NEXT:    ret
   %v = load <32 x i8>, ptr %p
   ret <32 x i8> %v
@@ -170,8 +167,7 @@ define <32 x i8> @exact_vlen_i8_m2(ptr %p) vscale_range(2,2) {
 define <128 x i8> @exact_vlen_i8_m8(ptr %p) vscale_range(2,2) {
 ; CHECK-LABEL: exact_vlen_i8_m8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e8, m8, ta, ma
-; CHECK-NEXT:    vle8.v v8, (a0)
+; CHECK-NEXT:    vl8r.v v8, (a0)
 ; CHECK-NEXT:    ret
   %v = load <128 x i8>, ptr %p
   ret <128 x i8> %v
@@ -180,8 +176,7 @@ define <128 x i8> @exact_vlen_i8_m8(ptr %p) vscale_range(2,2) {
 define <16 x i64> @exact_vlen_i64_m8(ptr %p) vscale_range(2,2) {
 ; CHECK-LABEL: exact_vlen_i64_m8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetivli zero, 16, e64, m8, ta, ma
-; CHECK-NEXT:    vle64.v v8, (a0)
+; CHECK-NEXT:    vl8re64.v v8, (a0)
 ; CHECK-NEXT:    ret
   %v = load <16 x i64>, ptr %p
   ret <16 x i64> %v
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-store.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-store.ll
index 8e3ef615428f35..b747d73ce353e2 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-store.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-store.ll
@@ -244,7 +244,7 @@ define void @exact_vlen_i32_m1(ptr %p) vscale_range(2,2) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 4, e32, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vse32.v v8, (a0)
+; CHECK-NEXT:    vs1r.v v8, (a0)
 ; CHECK-NEXT:    ret
   store <4 x i32> zeroinitializer, ptr %p
   ret void
@@ -255,7 +255,7 @@ define void @exact_vlen_i8_m1(ptr %p) vscale_range(2,2) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 16, e8, m1, ta, ma
 ; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vse8.v v8, (a0)
+; CHECK-NEXT:    vs1r.v v8, (a0)
 ; CHECK-NEXT:    ret
   store <16 x i8> zeroinitializer, ptr %p
   ret void
@@ -266,7 +266,7 @@ define void @exact_vlen_i8_m2(ptr %p) vscale_range(2,2) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a1, zero, e8, m2, ta, ma
 ; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vse8.v v8, (a0)
+; CHECK-NEXT:    vs2r.v v8, (a0)
 ; CHECK-NEXT:    ret
   store <32 x i8> zeroinitializer, ptr %p
   ret void
@@ -277,7 +277,7 @@ define void @exact_vlen_i8_m8(ptr %p) vscale_range(2,2) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetvli a1, zero, e8, m8, ta, ma
 ; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vse8.v v8, (a0)
+; CHECK-NEXT:    vs8r.v v8, (a0)
 ; CHECK-NEXT:    ret
   store <128 x i8> zeroinitializer, ptr %p
   ret void
@@ -288,7 +288,7 @@ define void @exact_vlen_i64_m8(ptr %p) vscale_range(2,2) {
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    vsetivli zero, 16, e64, m8, ta, ma
 ; CHECK-NEXT:    vmv.v.i v8, 0
-; CHECK-NEXT:    vse64.v v8, (a0)
+; CHECK-NEXT:    vs8r.v v8, (a0)
 ; CHECK-NEXT:    ret
   store <16 x i64> zeroinitializer, ptr %p
   ret void
diff --git a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
index a79180ed45b42b..1dc8e842de594f 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vsetvli-insert.ll
@@ -598,11 +598,11 @@ bb:
 define void @add_v128i8(ptr %x, ptr %y) vscale_range(2,2) {
 ; CHECK-LABEL: add_v128i8:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a2, zero, e8, m8, ta, ma
-; CHECK-NEXT:    vle8.v v8, (a0)
-; CHECK-NEXT:    vle8.v v16, (a1)
+; CHECK-NEXT:    vl8r.v v8, (a0)
+; CHECK-NEXT:    vl8r.v v16, (a1)
+; CHECK-NEXT:    vsetvli a1, zero, e8, m8, ta, ma
 ; CHECK-NEXT:    vadd.vv v8, v8, v16
-; CHECK-NEXT:    vse8.v v8, (a0)
+; CHECK-NEXT:    vs8r.v v8, (a0)
 ; CHECK-NEXT:    ret
   %a = load <128 x i8>, ptr %x
   %b = load <128 x i8>, ptr %y
@@ -614,11 +614,11 @@ define void @add_v128i8(ptr %x, ptr %y) vscale_range(2,2) {
 define void @add_v16i64(ptr %x, ptr %y) vscale_range(2,2) {
 ; CHECK-LABEL: add_v16i64:
 ; CHECK:       # %bb.0:
+; CHECK-NEXT:    vl8re64.v v8, (a0)
+; CHECK-NEXT:    vl8re64.v v16, (a1)
 ; CHECK-NEXT:    vsetivli zero, 16, e64, m8, ta, ma
-; CHECK-NEXT:    vle64.v v8, (a0)
-; CHECK-NEXT:    vle64.v v16, (a1)
 ; CHECK-NEXT:    vadd.vv v8, v8, v16
-; CHECK-NEXT:    vse64.v v8, (a0)
+; CHECK-NEXT:    vs8r.v v8, (a0)
 ; CHECK-NEXT:    ret
   %a = load <16 x i64>, ptr %x
   %b = load <16 x i64>, ptr %y

llvmbot (Collaborator) commented Dec 14, 2023

@llvm/pr-subscribers-backend-risc-v

github-actions bot commented Dec 14, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 7537c3c452df36447719291bebc5589bdd5d1501 acfa65a0e397af8e42612353fdda3d2116f1cbf9 -- llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp llvm/lib/Target/RISCV/RISCVISelLowering.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 4a8ff73ec4..64869d838e 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -9844,9 +9844,8 @@ RISCVTargetLowering::lowerFixedLengthVectorLoadToRVV(SDValue Op,
       RISCVTargetLowering::computeVLMAXBounds(ContainerVT, Subtarget);
   if (MinVLMAX == MaxVLMAX && MinVLMAX == VT.getVectorNumElements() &&
       getLMUL1VT(ContainerVT).bitsLE(ContainerVT)) {
-    SDValue NewLoad =
-        DAG.getLoad(ContainerVT, DL, Load->getChain(), Load->getBasePtr(),
-                    Load->getMemOperand());
+    SDValue NewLoad = DAG.getLoad(ContainerVT, DL, Load->getChain(),
+                                  Load->getBasePtr(), Load->getMemOperand());
     SDValue Result = convertFromScalableVector(VT, NewLoad, DAG, Subtarget);
     return DAG.getMergeValues({Result, NewLoad.getValue(1)}, DL);
   }
@@ -9898,7 +9897,6 @@ RISCVTargetLowering::lowerFixedLengthVectorStoreToRVV(SDValue Op,
   SDValue NewValue =
       convertToScalableVector(ContainerVT, StoreVal, DAG, Subtarget);
 
-
   // If we know the exact VLEN and our fixed length vector completely fills
   // the container, use a whole register store instead.
   const auto [MinVLMAX, MaxVLMAX] =
@@ -9908,8 +9906,8 @@ RISCVTargetLowering::lowerFixedLengthVectorStoreToRVV(SDValue Op,
     return DAG.getStore(Store->getChain(), DL, NewValue, Store->getBasePtr(),
                         Store->getMemOperand());
 
-  SDValue VL = getVLOp(VT.getVectorNumElements(), ContainerVT, DL, DAG,
-                       Subtarget);
+  SDValue VL =
+      getVLOp(VT.getVectorNumElements(), ContainerVT, DL, DAG, Subtarget);
 
   bool IsMaskOp = VT.getVectorElementType() == MVT::i1;
   SDValue IntID = DAG.getTargetConstant(

topperc (Collaborator) left a comment

LGTM

@preames preames merged commit e8a15ec into llvm:main Dec 15, 2023
3 of 4 checks passed
@preames preames deleted the pr-exact-vlen-whole-register-load-store-for-fv branch December 15, 2023 17:27
  if (MinVLMAX == MaxVLMAX && MinVLMAX == VT.getVectorNumElements() &&
      getLMUL1VT(ContainerVT).bitsLE(ContainerVT))
    return DAG.getStore(Store->getChain(), DL, NewValue, Store->getBasePtr(),
                        Store->getMemOperand());
Contributor

I'm seeing a miscompile on SPEC CPU 2017 502.gcc_r when compiled with -march=rv64gv -mrvv-vector-bits=zvl -O3 that was bisected back to e8a15ec. It seems to be caused by the store part and not the load, and only with SEW=64 and LMUL=1.

I don't have any reduced test case to share at the moment, and looking at the code and the test diffs nothing sticks out to me. Will keep investigating.

Contributor

Looks like it isn't caused by the actual use of vs1r.v itself, since replacing the patterns to emit vse64.v still miscompiles.

The resulting diff shows that some of these stores are reordered, and the memory VT of the node also changes from v2i64 to nxv1i64. I'm not sure if these are related.

preames (Collaborator, Author)

I'm wondering if the problem is the memory type in the memory operand. We leave a fixed size memory operand in place for a whole register (i.e. scalable?) store. Maybe this confuses something in AA? I don't think this is wrong per se, but it might be surprising.

lukel97 (Contributor) commented Mar 4, 2024

The ultimate cause of the miscompile was a dodgy combine on ISD::STORE which incorrectly removed a scalar load/store pair, assuming it didn't alias with a fixed length vector ISD::STORE. There was nothing wrong in this PR; it just happened to trigger the bug by lowering to an ISD::STORE node instead of a llvm.riscv.vse intrinsic.

Coincidentally, on the same day I began investigating this, #83017 landed, which fixes it.

I've added a test case here which illustrates it in more detail: 63725ab
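
As a loudly hypothetical sketch of the shape of that bug (the names and offsets below are invented; the real reduced test lives in 63725ab): a memset lowers to fixed length vector ISD::STORE nodes, and a scalar store that overlaps them must not be deleted as dead just because the alias query between the scalar access and those vector stores goes wrong.

; Hypothetical reduction, not the committed test case.
define void @sketch(ptr %p, ptr %q) vscale_range(2,2) {
  call void @llvm.memset.p0.i64(ptr align 8 %p, i8 0, i64 96, i1 false)
  %v = load i64, ptr %q
  %dst = getelementptr inbounds i8, ptr %p, i64 64
  ; This store aliases the memset; dropping it as "dead" would leave
  ; the destination zeroed instead of holding %v.
  store i64 %v, ptr %dst
  ret void
}
declare void @llvm.memset.p0.i64(ptr, i8, i64, i1)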

lukel97 added a commit that referenced this pull request Mar 4, 2024
Previously we incorrectly removed the scalar load store pair here assuming it
was dead, when it actually aliased with the memset.  This showed up as a
miscompile on SPEC CPU 2017 when compiling with -mrvv-vector-bits, and was only
triggered by the changes in #75531.  This was fixed in #83017, but this patch
adds a test case for this specific miscompile.

For reference, the incorrect codegen was:

	vsetvli	a1, zero, e8, m4, ta, ma
	vmv.v.i	v8, 0
	vs4r.v	v8, (a0)
	addi	a1, a0, 80
	vsetivli	zero, 16, e8, m1, ta, ma
	vmv.v.i	v8, 0
	vs1r.v	v8, (a1)
	addi	a0, a0, 64
	vs1r.v	v8, (a0)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 4, 2024

llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Mar 13, 2024