
[RISCV] Use LMUL=1 for vmv_s_x_vl with non-undef passthru #66659

Merged: 1 commit into llvm:main, Oct 20, 2023

Conversation

lukel97 (Contributor) commented Sep 18, 2023

We currently shrink the type of vmv_s_x_vl to LMUL=1 when its passthru is undef, to avoid constraining the register allocator since the instruction ignores LMUL. This patch relaxes the combine to also handle non-undef passthrus, which occur when lowering insert_vector_elt.

llvmbot (Collaborator) commented Sep 18, 2023

@llvm/pr-subscribers-backend-risc-v

Changes

We currently shrink the type of vmv_s_x_vl to LMUL=1 when its passthru is undef, to avoid constraining the register allocator since the instruction ignores LMUL. This patch relaxes the combine to also handle non-undef passthrus, which occur when lowering insert_vector_elt.


Patch is 37.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66659.diff

9 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+7-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+17-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv64.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmv.s.x-rv32.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmv.s.x-rv64.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.ll (+3-2)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 52b19ce7a228dbe..c5c98a3183905cf 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -14328,11 +14328,14 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
 
     // Use M1 or smaller to avoid over constraining register allocation
     const MVT M1VT = getLMUL1VT(VT);
-    if (M1VT.bitsLT(VT) && Passthru.isUndef()) {
+    if (M1VT.bitsLT(VT)) {
+      SDValue M1Passthru =
+          DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, M1VT, Passthru,
+                      DAG.getVectorIdxConstant(0, DL));
       SDValue Result =
-        DAG.getNode(N->getOpcode(), DL, M1VT, Passthru, Scalar, VL);
-      Result = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, DAG.getUNDEF(VT),
-                           Result, DAG.getConstant(0, DL, XLenVT));
+          DAG.getNode(N->getOpcode(), DL, M1VT, M1Passthru, Scalar, VL);
+      Result = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, Passthru, Result,
+                           DAG.getConstant(0, DL, XLenVT));
       return Result;
     }
 
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
index 373a96356a207e2..0ae90cdc237f6a5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
@@ -40,7 +40,7 @@ define <32 x i32> @insertelt_v32i32_0(<32 x i32> %a, i32 %y) {
 ; CHECK-LABEL: insertelt_v32i32_0:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a1, 32
-; CHECK-NEXT:    vsetvli zero, a1, e32, m8, tu, ma
+; CHECK-NEXT:    vsetvli zero, a1, e32, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %b = insertelement <32 x i32> %a, i32 %y, i32 0
@@ -92,7 +92,7 @@ define <64 x i32> @insertelt_v64i32_0(<64 x i32> %a, i32 %y) {
 ; CHECK-LABEL: insertelt_v64i32_0:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a1, 32
-; CHECK-NEXT:    vsetvli zero, a1, e32, m8, tu, ma
+; CHECK-NEXT:    vsetvli zero, a1, e32, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %b = insertelement <64 x i32> %a, i32 %y, i32 0
@@ -390,7 +390,7 @@ define <8 x i64> @insertelt_v8i64_0(<8 x i64> %a, ptr %x) {
 ; CHECK-LABEL: insertelt_v8i64_0:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, -1
-; CHECK-NEXT:    vsetivli zero, 8, e64, m4, tu, ma
+; CHECK-NEXT:    vsetivli zero, 8, e64, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %b = insertelement <8 x i64> %a, i64 -1, i32 0
@@ -468,7 +468,7 @@ define <8 x i64> @insertelt_c6_v8i64_0(<8 x i64> %a, ptr %x) {
 ; CHECK-LABEL: insertelt_c6_v8i64_0:
 ; CHECK:       # %bb.0:
 ; CHECK-NEXT:    li a0, 6
-; CHECK-NEXT:    vsetivli zero, 8, e64, m4, tu, ma
+; CHECK-NEXT:    vsetivli zero, 8, e64, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %b = insertelement <8 x i64> %a, i64 6, i32 0
@@ -550,9 +550,9 @@ define void @insertelt_c6_v8i64_0_add(ptr %x, ptr %y) {
 ; CHECK-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    li a2, 6
-; CHECK-NEXT:    vsetvli zero, zero, e64, m4, tu, ma
+; CHECK-NEXT:    vsetivli zero, 8, e64, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a2
-; CHECK-NEXT:    vsetvli zero, zero, e64, m4, ta, ma
+; CHECK-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; CHECK-NEXT:    vle64.v v12, (a1)
 ; CHECK-NEXT:    vadd.vv v8, v8, v12
 ; CHECK-NEXT:    vse64.v v8, (a0)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
index e19a878187e2c29..a61fda745da1b3a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
@@ -2424,7 +2424,7 @@ define <8 x i32> @mgather_v8i32(<8 x ptr> %ptrs, <8 x i1> %m, <8 x i32> %passthr
 ; RV64ZVE32F-NEXT:  .LBB34_9: # %cond.load
 ; RV64ZVE32F-NEXT:    ld a2, 0(a0)
 ; RV64ZVE32F-NEXT:    lw a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v8, a2
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB34_2
@@ -2518,7 +2518,7 @@ define <8 x i32> @mgather_baseidx_v8i8_v8i32(ptr %base, <8 x i8> %idxs, <8 x i1>
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    lw a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v10, a2
 ; RV64ZVE32F-NEXT:  .LBB35_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -2668,7 +2668,7 @@ define <8 x i32> @mgather_baseidx_sext_v8i8_v8i32(ptr %base, <8 x i8> %idxs, <8
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    lw a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v10, a2
 ; RV64ZVE32F-NEXT:  .LBB36_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -2821,7 +2821,7 @@ define <8 x i32> @mgather_baseidx_zext_v8i8_v8i32(ptr %base, <8 x i8> %idxs, <8
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    lw a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v10, a2
 ; RV64ZVE32F-NEXT:  .LBB37_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -2980,7 +2980,7 @@ define <8 x i32> @mgather_baseidx_v8i16_v8i32(ptr %base, <8 x i16> %idxs, <8 x i
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    lw a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v10, a2
 ; RV64ZVE32F-NEXT:  .LBB38_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -3131,7 +3131,7 @@ define <8 x i32> @mgather_baseidx_sext_v8i16_v8i32(ptr %base, <8 x i16> %idxs, <
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    lw a2, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v10, a2
 ; RV64ZVE32F-NEXT:  .LBB39_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -3285,7 +3285,7 @@ define <8 x i32> @mgather_baseidx_zext_v8i16_v8i32(ptr %base, <8 x i16> %idxs, <
 ; RV64ZVE32F-NEXT:    slli a3, a3, 2
 ; RV64ZVE32F-NEXT:    add a3, a0, a3
 ; RV64ZVE32F-NEXT:    lw a3, 0(a3)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v10, a3
 ; RV64ZVE32F-NEXT:  .LBB40_2: # %else
 ; RV64ZVE32F-NEXT:    andi a3, a2, 2
@@ -3438,7 +3438,7 @@ define <8 x i32> @mgather_baseidx_v8i32(ptr %base, <8 x i32> %idxs, <8 x i1> %m,
 ; RV64ZVE32F-NEXT:    andi a2, a1, 1
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB41_2
 ; RV64ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.x.s a2, v8
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
@@ -8227,7 +8227,7 @@ define <8 x float> @mgather_v8f32(<8 x ptr> %ptrs, <8 x i1> %m, <8 x float> %pas
 ; RV64ZVE32F-NEXT:  .LBB73_9: # %cond.load
 ; RV64ZVE32F-NEXT:    ld a2, 0(a0)
 ; RV64ZVE32F-NEXT:    flw fa5, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vfmv.s.f v8, fa5
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB73_2
@@ -8321,7 +8321,7 @@ define <8 x float> @mgather_baseidx_v8i8_v8f32(ptr %base, <8 x i8> %idxs, <8 x i
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    flw fa5, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vfmv.s.f v10, fa5
 ; RV64ZVE32F-NEXT:  .LBB74_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -8471,7 +8471,7 @@ define <8 x float> @mgather_baseidx_sext_v8i8_v8f32(ptr %base, <8 x i8> %idxs, <
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    flw fa5, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vfmv.s.f v10, fa5
 ; RV64ZVE32F-NEXT:  .LBB75_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -8624,7 +8624,7 @@ define <8 x float> @mgather_baseidx_zext_v8i8_v8f32(ptr %base, <8 x i8> %idxs, <
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    flw fa5, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vfmv.s.f v10, fa5
 ; RV64ZVE32F-NEXT:  .LBB76_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -8783,7 +8783,7 @@ define <8 x float> @mgather_baseidx_v8i16_v8f32(ptr %base, <8 x i16> %idxs, <8 x
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    flw fa5, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vfmv.s.f v10, fa5
 ; RV64ZVE32F-NEXT:  .LBB77_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -8934,7 +8934,7 @@ define <8 x float> @mgather_baseidx_sext_v8i16_v8f32(ptr %base, <8 x i16> %idxs,
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    flw fa5, 0(a2)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vfmv.s.f v10, fa5
 ; RV64ZVE32F-NEXT:  .LBB78_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
@@ -9088,7 +9088,7 @@ define <8 x float> @mgather_baseidx_zext_v8i16_v8f32(ptr %base, <8 x i16> %idxs,
 ; RV64ZVE32F-NEXT:    slli a3, a3, 2
 ; RV64ZVE32F-NEXT:    add a3, a0, a3
 ; RV64ZVE32F-NEXT:    flw fa5, 0(a3)
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vfmv.s.f v10, fa5
 ; RV64ZVE32F-NEXT:  .LBB79_2: # %else
 ; RV64ZVE32F-NEXT:    andi a3, a2, 2
@@ -9241,7 +9241,7 @@ define <8 x float> @mgather_baseidx_v8f32(ptr %base, <8 x i32> %idxs, <8 x i1> %
 ; RV64ZVE32F-NEXT:    andi a2, a1, 1
 ; RV64ZVE32F-NEXT:    beqz a2, .LBB80_2
 ; RV64ZVE32F-NEXT:  # %bb.1: # %cond.load
-; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetivli zero, 8, e32, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.x.s a2, v8
 ; RV64ZVE32F-NEXT:    slli a2, a2, 2
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
@@ -12382,7 +12382,7 @@ define <32 x i8> @mgather_baseidx_v32i8(ptr %base, <32 x i8> %idxs, <32 x i1> %m
 ; RV64ZVE32F-NEXT:    add a2, a0, a2
 ; RV64ZVE32F-NEXT:    lbu a2, 0(a2)
 ; RV64ZVE32F-NEXT:    li a3, 32
-; RV64ZVE32F-NEXT:    vsetvli zero, a3, e8, m2, tu, ma
+; RV64ZVE32F-NEXT:    vsetvli zero, a3, e8, m1, tu, ma
 ; RV64ZVE32F-NEXT:    vmv.s.x v10, a2
 ; RV64ZVE32F-NEXT:  .LBB98_2: # %else
 ; RV64ZVE32F-NEXT:    andi a2, a1, 2
diff --git a/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll b/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll
index 4bd9f7befa52a1c..a2c58931a87fef4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll
@@ -109,7 +109,7 @@ define <vscale x 4 x half> @insertelt_nxv4f16_idx(<vscale x 4 x half> %v, half %
 define <vscale x 8 x half> @insertelt_nxv8f16_0(<vscale x 8 x half> %v, half %elt) {
 ; CHECK-LABEL: insertelt_nxv8f16_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m2, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 8 x half> %v, half %elt, i32 0
@@ -143,7 +143,7 @@ define <vscale x 8 x half> @insertelt_nxv8f16_idx(<vscale x 8 x half> %v, half %
 define <vscale x 16 x half> @insertelt_nxv16f16_0(<vscale x 16 x half> %v, half %elt) {
 ; CHECK-LABEL: insertelt_nxv16f16_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m4, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 16 x half> %v, half %elt, i32 0
@@ -177,7 +177,7 @@ define <vscale x 16 x half> @insertelt_nxv16f16_idx(<vscale x 16 x half> %v, hal
 define <vscale x 32 x half> @insertelt_nxv32f16_0(<vscale x 32 x half> %v, half %elt) {
 ; CHECK-LABEL: insertelt_nxv32f16_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e16, m8, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e16, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 32 x half> %v, half %elt, i32 0
@@ -279,7 +279,7 @@ define <vscale x 2 x float> @insertelt_nxv2f32_idx(<vscale x 2 x float> %v, floa
 define <vscale x 4 x float> @insertelt_nxv4f32_0(<vscale x 4 x float> %v, float %elt) {
 ; CHECK-LABEL: insertelt_nxv4f32_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m2, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 4 x float> %v, float %elt, i32 0
@@ -313,7 +313,7 @@ define <vscale x 4 x float> @insertelt_nxv4f32_idx(<vscale x 4 x float> %v, floa
 define <vscale x 8 x float> @insertelt_nxv8f32_0(<vscale x 8 x float> %v, float %elt) {
 ; CHECK-LABEL: insertelt_nxv8f32_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m4, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 8 x float> %v, float %elt, i32 0
@@ -347,7 +347,7 @@ define <vscale x 8 x float> @insertelt_nxv8f32_idx(<vscale x 8 x float> %v, floa
 define <vscale x 16 x float> @insertelt_nxv16f32_0(<vscale x 16 x float> %v, float %elt) {
 ; CHECK-LABEL: insertelt_nxv16f32_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e32, m8, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 16 x float> %v, float %elt, i32 0
@@ -415,7 +415,7 @@ define <vscale x 1 x double> @insertelt_nxv1f64_idx(<vscale x 1 x double> %v, do
 define <vscale x 2 x double> @insertelt_nxv2f64_0(<vscale x 2 x double> %v, double %elt) {
 ; CHECK-LABEL: insertelt_nxv2f64_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e64, m2, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e64, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 2 x double> %v, double %elt, i32 0
@@ -449,7 +449,7 @@ define <vscale x 2 x double> @insertelt_nxv2f64_idx(<vscale x 2 x double> %v, do
 define <vscale x 4 x double> @insertelt_nxv4f64_0(<vscale x 4 x double> %v, double %elt) {
 ; CHECK-LABEL: insertelt_nxv4f64_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e64, m4, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e64, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 4 x double> %v, double %elt, i32 0
@@ -483,7 +483,7 @@ define <vscale x 4 x double> @insertelt_nxv4f64_idx(<vscale x 4 x double> %v, do
 define <vscale x 8 x double> @insertelt_nxv8f64_0(<vscale x 8 x double> %v, double %elt) {
 ; CHECK-LABEL: insertelt_nxv8f64_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a0, zero, e64, m8, tu, ma
+; CHECK-NEXT:    vsetvli a0, zero, e64, m1, tu, ma
 ; CHECK-NEXT:    vfmv.s.f v8, fa0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 8 x double> %v, double %elt, i32 0
diff --git a/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll b/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll
index 39f94eab2aa6606..767f1c71350352c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll
@@ -141,7 +141,7 @@ define <vscale x 8 x i8> @insertelt_nxv8i8_idx(<vscale x 8 x i8> %v, i8 signext
 define <vscale x 16 x i8> @insertelt_nxv16i8_0(<vscale x 16 x i8> %v, i8 signext %elt) {
 ; CHECK-LABEL: insertelt_nxv16i8_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e8, m2, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e8, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 16 x i8> %v, i8 %elt, i32 0
@@ -175,7 +175,7 @@ define <vscale x 16 x i8> @insertelt_nxv16i8_idx(<vscale x 16 x i8> %v, i8 signe
 define <vscale x 32 x i8> @insertelt_nxv32i8_0(<vscale x 32 x i8> %v, i8 signext %elt) {
 ; CHECK-LABEL: insertelt_nxv32i8_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e8, m4, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e8, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 32 x i8> %v, i8 %elt, i32 0
@@ -209,7 +209,7 @@ define <vscale x 32 x i8> @insertelt_nxv32i8_idx(<vscale x 32 x i8> %v, i8 signe
 define <vscale x 64 x i8> @insertelt_nxv64i8_0(<vscale x 64 x i8> %v, i8 signext %elt) {
 ; CHECK-LABEL: insertelt_nxv64i8_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e8, m8, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e8, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 64 x i8> %v, i8 %elt, i32 0
@@ -345,7 +345,7 @@ define <vscale x 4 x i16> @insertelt_nxv4i16_idx(<vscale x 4 x i16> %v, i16 sign
 define <vscale x 8 x i16> @insertelt_nxv8i16_0(<vscale x 8 x i16> %v, i16 signext %elt) {
 ; CHECK-LABEL: insertelt_nxv8i16_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e16, m2, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e16, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 8 x i16> %v, i16 %elt, i32 0
@@ -379,7 +379,7 @@ define <vscale x 8 x i16> @insertelt_nxv8i16_idx(<vscale x 8 x i16> %v, i16 sign
 define <vscale x 16 x i16> @insertelt_nxv16i16_0(<vscale x 16 x i16> %v, i16 signext %elt) {
 ; CHECK-LABEL: insertelt_nxv16i16_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e16, m4, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e16, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 16 x i16> %v, i16 %elt, i32 0
@@ -413,7 +413,7 @@ define <vscale x 16 x i16> @insertelt_nxv16i16_idx(<vscale x 16 x i16> %v, i16 s
 define <vscale x 32 x i16> @insertelt_nxv32i16_0(<vscale x 32 x i16> %v, i16 signext %elt) {
 ; CHECK-LABEL: insertelt_nxv32i16_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e16, m8, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e16, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 32 x i16> %v, i16 %elt, i32 0
@@ -515,7 +515,7 @@ define <vscale x 2 x i32> @insertelt_nxv2i32_idx(<vscale x 2 x i32> %v, i32 %elt
 define <vscale x 4 x i32> @insertelt_nxv4i32_0(<vscale x 4 x i32> %v, i32 %elt) {
 ; CHECK-LABEL: insertelt_nxv4i32_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e32, m2, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e32, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 4 x i32> %v, i32 %elt, i32 0
@@ -549,7 +549,7 @@ define <vscale x 4 x i32> @insertelt_nxv4i32_idx(<vscale x 4 x i32> %v, i32 %elt
 define <vscale x 8 x i32> @insertelt_nxv8i32_0(<vscale x 8 x i32> %v, i32 %elt) {
 ; CHECK-LABEL: insertelt_nxv8i32_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e32, m4, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e32, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 8 x i32> %v, i32 %elt, i32 0
@@ -583,7 +583,7 @@ define <vscale x 8 x i32> @insertelt_nxv8i32_idx(<vscale x 8 x i32> %v, i32 %elt
 define <vscale x 16 x i32> @insertelt_nxv16i32_0(<vscale x 16 x i32> %v, i32 %elt) {
 ; CHECK-LABEL: insertelt_nxv16i32_0:
 ; CHECK:       # %bb.0:
-; CHECK-NEXT:    vsetvli a1, zero, e32, m8, tu, ma
+; CHECK-NEXT:    vsetvli a1, zero, e32, m1, tu, ma
 ; CHECK-NEXT:    vmv.s.x v8, a0
 ; CHECK-NEXT:    ret
   %r = insertelement <vscale x 16 ...
[truncated]

Inline review thread on the cross_block_mutate test diff:

@@ -810,9 +810,10 @@ for.end: ; preds = %for.body
define <vscale x 4 x i32> @cross_block_mutate(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b,
; CHECK-LABEL: cross_block_mutate:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: vsetivli a0, 6, e32, m2, tu, ma
; CHECK-NEXT: vsetivli a0, 6, e16, m1, ta, ma
; CHECK-NEXT: vsetvli a1, zero, e32, m1, tu, ma
Collaborator
Here's another instance of a forced VL toggle.

Inline review thread on the insertelt_c6_v8i64_0_add test diff (llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll):

@@ -550,9 +550,9 @@ define void @insertelt_c6_v8i64_0_add(ptr %x, ptr %y) {
 ; CHECK-NEXT:    vsetivli zero, 8, e64, m4, ta, ma
 ; CHECK-NEXT:    vle64.v v8, (a0)
 ; CHECK-NEXT:    li a2, 6
-; CHECK-NEXT:    vsetvli zero, zero, e64, m4, tu, ma
+; CHECK-NEXT:    vsetivli zero, 8, e64, m1, tu, ma
Collaborator
Continuing from our chat offline... Ah, here's the problem I hit when I tried this before.

8 is not a valid VL for m1. Trying to set VL=8 will result in VL=2 on zvl128b hardware. This is correct for the vmv.s.x, but it forces two VL toggles in this bit of assembly. (We previously had two VTYPE toggles, but a VTYPE toggle is cheaper than a VL toggle.)

This isn't a correctness issue, but it does raise a question of profitability. Is it worth two VL toggles to use a narrower LMUL here? I'm not sure it is.

Alternatively, can we reasonably strengthen InsertVSETVLI to undo the damage here? Maybe we should allow the m4 to slide over the vmv.s.x if doing so eliminates the VL toggle?
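
For concreteness, the VL arithmetic behind this observation (an illustration only, not code from the patch or from LLVM): with VLEN=128 and SEW=64, VLMAX at m1 is 2, so an AVL of 8 gets clamped, while at m4 VLMAX is 8 and the requested VL survives.

// Illustration only. VLMAX = LMUL * VLEN / SEW, and when AVL >= 2 * VLMAX the
// spec requires vl = VLMAX, i.e. the requested AVL gets clamped.
#include <algorithm>
#include <cstdio>

unsigned vlmax(unsigned VLEN, unsigned SEW, unsigned LMULNum, unsigned LMULDen = 1) {
  return (VLEN * LMULNum) / (SEW * LMULDen);
}

int main() {
  unsigned AVL = 8, VLEN = 128, SEW = 64;
  // e64, m1 on zvl128b: VLMAX = 128 * 1 / 64 = 2, so vl = 2, not 8.
  std::printf("e64,m1: vl = %u\n", std::min(AVL, vlmax(VLEN, SEW, 1)));
  // e64, m4 on zvl128b: VLMAX = 128 * 4 / 64 = 8, so vl = 8 as requested.
  std::printf("e64,m4: vl = %u\n", std::min(AVL, vlmax(VLEN, SEW, 4)));
}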

lukel97 (Contributor Author)
Isn't VL already 8 here? Maybe this is just an issue with InsertVSETVLI not using the vsetvli zero, zero form?

lukel97 (Contributor Author) commented Sep 20, 2023
Did a bit of digging: InsertVSETVLI has to set VL again because VLMAX changes with the LMUL. Allowing the m4 to slide over the vmv.s.x sounds like a good path to take here.

Edit: rereading your earlier comment, you've said the exact same thing. My brain is playing catch-up here.

lukel97 (Contributor Author)
Normally the m4 would already slide over the vmv.s.x here, because we explicitly mark LMUL and VL as not demanded (we just check that VL has the same zero-ness). I think the reason the vsetvlis aren't being merged in this case is the tail policy switch.

This old patch that relaxed TA -> TU eliminates the toggle for this test (but it's overall not profitable for OoO processors): https://reviews.llvm.org/differential/changeset/?ref=4362822

For this specific test case though, should we not be doing this as a vmv.v.i with VL=1 anyway?

Collaborator
I think there's a legality issue here. Since the vmv.s.x has a register class which only requires m1, it could choose a register that is unaligned for m4. That would result in an illegal instruction if we allow the m4 to forward over the m1 vmv.s.x. We'd have to change the register class constraints to do this legally.
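
The alignment constraint being described, for concreteness (an illustration, not LLVM code): a register group for LMUL=n must start at a vector register whose number is a multiple of n, so an m1-allocated destination such as v9 or v10 is not a legal start of an m4 group.

// Illustration only: RVV register groups must be naturally aligned, so an
// LMUL=4 group can only start at v0, v4, v8, ..., v28.
#include <cstdio>

bool isLegalGroupStart(unsigned Reg, unsigned LMUL) { // Reg in 0..31, LMUL in {1,2,4,8}
  return Reg % LMUL == 0;
}

int main() {
  std::printf("v8 as an m4 group start: %d\n", isLegalGroupStart(8, 4));   // 1 (legal)
  std::printf("v10 as an m4 group start: %d\n", isLegalGroupStart(10, 4)); // 0 (illegal)
}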

lukel97 (Contributor Author) commented Sep 22, 2023
I think I'm missing something here. Are you saying that we don't currently allow an m1 vmv.s.x to be turned into an m4 vmv.s.x?

If we construct the example below, where the vadd.vi has m4 (%1:vrm4) and the vmv.s.x has m1 (%3:vr):

---
name: lmul_merge
body:        |
  bb.0:
    liveins: $v0, $x1, $x2
    %vl:gprnox0 = COPY $x1

    %0:vrm4 = COPY $v0
    %pt0:vrm4 = IMPLICIT_DEF
    %1:vrm4 = PseudoVADD_VI_M4 %pt0, %0, 1, %vl, 6, 0
    
    %2:gpr = COPY $x2
    %pt1:vr = IMPLICIT_DEF
    %3:vr = PseudoVMV_S_X_M1 %pt1, %2, %vl, 6
    PseudoRET

Running llc -o - -run-pass=riscv-insert-vsetvli -mtriple=riscv64 -mattr=+v -verify-machineinstrs, it looks like we do relax the LMUL for vmv.s.x:

body:             |
  bb.0:
    liveins: $v0, $x1, $x2
  
    %vl:gprnox0 = COPY $x1
    %1:vrm4 = COPY $v0
    %pt0:vrm4 = IMPLICIT_DEF
    dead $x0 = PseudoVSETVLI %vl, 218 /* e64, m4, ta, ma */, implicit-def $vl, implicit-def $vtype
    %3:vrm4 = PseudoVADD_VI_M4 %pt0, %1, 1, $noreg, 6 /* e64 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
    %4:gpr = COPY $x2
    %pt1:vr = IMPLICIT_DEF
    %6:vr = PseudoVMV_S_X_M1 %pt1, %4, $noreg, 6 /* e64 */, implicit $vl, implicit $vtype
    PseudoRET

I see what you mean about the legality though: Does the register allocator here know not to allocate v1/v2/v3 etc for %6, given its vtype will be M4?

Edit: Or could we interpret "ignores LMUL and vector register groups." from the spec as meaning that vmv.s.x only operates on individual registers and not groups, meaning that vmv.s.x v1, x0 is actually ok when vtype has LMUL=2/4/8?

Collaborator
I see what you mean about the legality though: Does the register allocator here know not to allocate v1/v2/v3 etc for %6, given its vtype will be M4?

I think I'm wrong here and I sent you on a wild goose chase. The specification says for vmv.s.x that "The instructions ignore LMUL and vector register groups". I think that means these are unconditionally m1 (from an encoding legality perspective) even if LMUL is set to m4.

lukel97 (Contributor Author) commented Oct 2, 2023

Ping
The test diff is smaller now after rebasing on top of main, which now contains #66087. That patch performs the same optimisation for the vmv.s.x nodes generated from insert_vector_elt, and this PR catches the remaining cases.

preames (Collaborator) left a comment

LGTM

For anyone reading along, the interesting part of the test diff no longer exists in this review. It was handled by the dependent change that already landed and was discussed in detail there. What's left is straightforward, with no surprising test changes.

topperc (Collaborator) commented Oct 4, 2023

Have we arrived at a point where we just never reach isel with a VMV_S_X with a register class other than LMUL=1?

preames (Collaborator) commented Oct 4, 2023

Have we arrived at a point where we just never reach isel with a VMV_S_X with a register class other than LMUL=1?

I think so. Any clean up you see resulting from that? Are there some pseudos we could delete, or are those still required by mca?

topperc (Collaborator) commented Oct 4, 2023

Have we arrived at a point where we just never reach isel with a VMV_S_X with a register class other than LMUL=1?

I think so. Any clean up you see resulting from that? Are there some pseudos we could delete, or are those still required by mca?

Should we create it with an LMUL=1 type with surrounding subvector ops in the first place instead of fixing it with a DAG combine?

Does mca use the pseudos? @michaelmaitland

lukel97 (Contributor Author) commented Oct 9, 2023

Have we arrived at a point where we just never reach isel with a VMV_S_X with a register class other than LMUL=1?

I think so. Any clean up you see resulting from that? Are there some pseudos we could delete, or are those still required by mca?

I tried this out and it looks like we can get rid of the pseudos. However, because it no longer carries information about LMUL, if a vmv.s.x is the first instruction in a basic block it now gets an LMUL of 1 by default. This is fine on its own, but we can end up with a VL toggle if subsequent instructions don't have the same SEW/LMUL ratio, e.g.:

-; CHECK-NEXT:    vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT:    vsetvli a0, zero, e32, m1, ta, ma
 ; CHECK-NEXT:    vmv.s.x v9, zero
-; CHECK-NEXT:    vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT:    vsetvli a0, zero, e16, mf4, ta, ma

I think we can remedy this by teaching RISCVInsertVSETVLI to mutate the LMUL to match the next SEW/LMUL ratio, provided neither LMUL nor the SEW/LMUL ratio is demanded. That way we end up with the same SEW/LMUL ratio in both vsetvlis and can keep VL intact. I'm working on a patch for this and will post it separately.
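
A minimal sketch of that idea, using hypothetical names rather than the real RISCVInsertVSETVLI interface: given the SEW the vmv.s.x needs and the SEW/LMUL ratio demanded by the following vsetvli, pick the LMUL that preserves the ratio so VL can stay intact.

// Hypothetical sketch only; names and shapes are illustrative. LMUL is returned
// as a {numerator, denominator} fraction so it can express both m1..m8 and
// mf2..mf8.
#include <optional>
#include <utility>

std::optional<std::pair<unsigned, unsigned>>
pickLMULForRatio(unsigned SEW, unsigned NextSEWLMULRatio) {
  // We want SEW / LMUL == NextSEWLMULRatio, i.e. LMUL == SEW / NextSEWLMULRatio.
  if (SEW >= NextSEWLMULRatio) {
    unsigned Mul = SEW / NextSEWLMULRatio; // m1, m2, m4 or m8
    if (Mul > 8)
      return std::nullopt;
    return std::make_pair(Mul, 1u);
  }
  unsigned Div = NextSEWLMULRatio / SEW;   // mf2, mf4 or mf8
  if (Div > 8)
    return std::nullopt;
  return std::make_pair(1u, Div);
}

// In the diff quoted above, the vmv.s.x uses e32 and the next vsetvli is
// e16/mf4 (ratio 64); pickLMULForRatio(32, 64) yields {1, 2}, i.e. mf2, which
// recovers the original "e32, mf2" vsetvli and avoids the extra VL toggle.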

preames (Collaborator) commented Oct 20, 2023

Luke, please land this change and move the discussion on reducing the number of pseudos to another PR or issue. It doesn't appear to be blocking the code change in this review, and it'd be good to get it committed.

@lukel97 lukel97 merged commit b4729f7 into llvm:main Oct 20, 2023
3 checks passed