[RISCV] Use LMUL=1 for vmv_s_x_vl with non-undef passthru #66659
Conversation
@llvm/pr-subscribers-backend-risc-v

Changes

We currently shrink the type of vmv_s_x_vl to LMUL=1 when its passthru is undef to avoid constraining the register allocator since it ignores LMUL. This patch relaxes it for non-undef passthrus, which occurs when lowering insert_vector_elt.

Patch is 37.33 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66659.diff

9 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 52b19ce7a228dbe..c5c98a3183905cf 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -14328,11 +14328,14 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
// Use M1 or smaller to avoid over constraining register allocation
const MVT M1VT = getLMUL1VT(VT);
- if (M1VT.bitsLT(VT) && Passthru.isUndef()) {
+ if (M1VT.bitsLT(VT)) {
+ SDValue M1Passthru =
+ DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, M1VT, Passthru,
+ DAG.getVectorIdxConstant(0, DL));
SDValue Result =
- DAG.getNode(N->getOpcode(), DL, M1VT, Passthru, Scalar, VL);
- Result = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, DAG.getUNDEF(VT),
- Result, DAG.getConstant(0, DL, XLenVT));
+ DAG.getNode(N->getOpcode(), DL, M1VT, M1Passthru, Scalar, VL);
+ Result = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, Passthru, Result,
+ DAG.getConstant(0, DL, XLenVT));
return Result;
}
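
Assembled from the added lines above, the combine now reads as follows. This is a sketch for readability only: VT, DL, Passthru, Scalar, VL, XLenVT and getLMUL1VT all come from the enclosing PerformDAGCombine code and are assumed here.

// Use M1 or smaller to avoid over-constraining register allocation.
const MVT M1VT = getLMUL1VT(VT);
if (M1VT.bitsLT(VT)) {
  // Take the LMUL=1 prefix of the (possibly non-undef) passthru...
  SDValue M1Passthru =
      DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, M1VT, Passthru,
                  DAG.getVectorIdxConstant(0, DL));
  // ...insert the scalar into it at LMUL=1...
  SDValue Result =
      DAG.getNode(N->getOpcode(), DL, M1VT, M1Passthru, Scalar, VL);
  // ...and splice the result back into the full-width passthru, rather than
  // into undef as before the patch.
  Result = DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, Passthru, Result,
                       DAG.getConstant(0, DL, XLenVT));
  return Result;
}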
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
index 373a96356a207e2..0ae90cdc237f6a5 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-insert.ll
@@ -40,7 +40,7 @@ define <32 x i32> @insertelt_v32i32_0(<32 x i32> %a, i32 %y) {
; CHECK-LABEL: insertelt_v32i32_0:
; CHECK: # %bb.0:
; CHECK-NEXT: li a1, 32
-; CHECK-NEXT: vsetvli zero, a1, e32, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a1, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%b = insertelement <32 x i32> %a, i32 %y, i32 0
@@ -92,7 +92,7 @@ define <64 x i32> @insertelt_v64i32_0(<64 x i32> %a, i32 %y) {
; CHECK-LABEL: insertelt_v64i32_0:
; CHECK: # %bb.0:
; CHECK-NEXT: li a1, 32
-; CHECK-NEXT: vsetvli zero, a1, e32, m8, tu, ma
+; CHECK-NEXT: vsetvli zero, a1, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%b = insertelement <64 x i32> %a, i32 %y, i32 0
@@ -390,7 +390,7 @@ define <8 x i64> @insertelt_v8i64_0(<8 x i64> %a, ptr %x) {
; CHECK-LABEL: insertelt_v8i64_0:
; CHECK: # %bb.0:
; CHECK-NEXT: li a0, -1
-; CHECK-NEXT: vsetivli zero, 8, e64, m4, tu, ma
+; CHECK-NEXT: vsetivli zero, 8, e64, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%b = insertelement <8 x i64> %a, i64 -1, i32 0
@@ -468,7 +468,7 @@ define <8 x i64> @insertelt_c6_v8i64_0(<8 x i64> %a, ptr %x) {
; CHECK-LABEL: insertelt_c6_v8i64_0:
; CHECK: # %bb.0:
; CHECK-NEXT: li a0, 6
-; CHECK-NEXT: vsetivli zero, 8, e64, m4, tu, ma
+; CHECK-NEXT: vsetivli zero, 8, e64, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%b = insertelement <8 x i64> %a, i64 6, i32 0
@@ -550,9 +550,9 @@ define void @insertelt_c6_v8i64_0_add(ptr %x, ptr %y) {
; CHECK-NEXT: vsetivli zero, 8, e64, m4, ta, ma
; CHECK-NEXT: vle64.v v8, (a0)
; CHECK-NEXT: li a2, 6
-; CHECK-NEXT: vsetvli zero, zero, e64, m4, tu, ma
+; CHECK-NEXT: vsetivli zero, 8, e64, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a2
-; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
+; CHECK-NEXT: vsetivli zero, 8, e64, m4, ta, ma
; CHECK-NEXT: vle64.v v12, (a1)
; CHECK-NEXT: vadd.vv v8, v8, v12
; CHECK-NEXT: vse64.v v8, (a0)
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
index e19a878187e2c29..a61fda745da1b3a 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll
@@ -2424,7 +2424,7 @@ define <8 x i32> @mgather_v8i32(<8 x ptr> %ptrs, <8 x i1> %m, <8 x i32> %passthr
; RV64ZVE32F-NEXT: .LBB34_9: # %cond.load
; RV64ZVE32F-NEXT: ld a2, 0(a0)
; RV64ZVE32F-NEXT: lw a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v8, a2
; RV64ZVE32F-NEXT: andi a2, a1, 2
; RV64ZVE32F-NEXT: beqz a2, .LBB34_2
@@ -2518,7 +2518,7 @@ define <8 x i32> @mgather_baseidx_v8i8_v8i32(ptr %base, <8 x i8> %idxs, <8 x i1>
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: lw a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v10, a2
; RV64ZVE32F-NEXT: .LBB35_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -2668,7 +2668,7 @@ define <8 x i32> @mgather_baseidx_sext_v8i8_v8i32(ptr %base, <8 x i8> %idxs, <8
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: lw a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v10, a2
; RV64ZVE32F-NEXT: .LBB36_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -2821,7 +2821,7 @@ define <8 x i32> @mgather_baseidx_zext_v8i8_v8i32(ptr %base, <8 x i8> %idxs, <8
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: lw a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v10, a2
; RV64ZVE32F-NEXT: .LBB37_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -2980,7 +2980,7 @@ define <8 x i32> @mgather_baseidx_v8i16_v8i32(ptr %base, <8 x i16> %idxs, <8 x i
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: lw a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v10, a2
; RV64ZVE32F-NEXT: .LBB38_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -3131,7 +3131,7 @@ define <8 x i32> @mgather_baseidx_sext_v8i16_v8i32(ptr %base, <8 x i16> %idxs, <
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: lw a2, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v10, a2
; RV64ZVE32F-NEXT: .LBB39_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -3285,7 +3285,7 @@ define <8 x i32> @mgather_baseidx_zext_v8i16_v8i32(ptr %base, <8 x i16> %idxs, <
; RV64ZVE32F-NEXT: slli a3, a3, 2
; RV64ZVE32F-NEXT: add a3, a0, a3
; RV64ZVE32F-NEXT: lw a3, 0(a3)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v10, a3
; RV64ZVE32F-NEXT: .LBB40_2: # %else
; RV64ZVE32F-NEXT: andi a3, a2, 2
@@ -3438,7 +3438,7 @@ define <8 x i32> @mgather_baseidx_v8i32(ptr %base, <8 x i32> %idxs, <8 x i1> %m,
; RV64ZVE32F-NEXT: andi a2, a1, 1
; RV64ZVE32F-NEXT: beqz a2, .LBB41_2
; RV64ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v8
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
@@ -8227,7 +8227,7 @@ define <8 x float> @mgather_v8f32(<8 x ptr> %ptrs, <8 x i1> %m, <8 x float> %pas
; RV64ZVE32F-NEXT: .LBB73_9: # %cond.load
; RV64ZVE32F-NEXT: ld a2, 0(a0)
; RV64ZVE32F-NEXT: flw fa5, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vfmv.s.f v8, fa5
; RV64ZVE32F-NEXT: andi a2, a1, 2
; RV64ZVE32F-NEXT: beqz a2, .LBB73_2
@@ -8321,7 +8321,7 @@ define <8 x float> @mgather_baseidx_v8i8_v8f32(ptr %base, <8 x i8> %idxs, <8 x i
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: flw fa5, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vfmv.s.f v10, fa5
; RV64ZVE32F-NEXT: .LBB74_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -8471,7 +8471,7 @@ define <8 x float> @mgather_baseidx_sext_v8i8_v8f32(ptr %base, <8 x i8> %idxs, <
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: flw fa5, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vfmv.s.f v10, fa5
; RV64ZVE32F-NEXT: .LBB75_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -8624,7 +8624,7 @@ define <8 x float> @mgather_baseidx_zext_v8i8_v8f32(ptr %base, <8 x i8> %idxs, <
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: flw fa5, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vfmv.s.f v10, fa5
; RV64ZVE32F-NEXT: .LBB76_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -8783,7 +8783,7 @@ define <8 x float> @mgather_baseidx_v8i16_v8f32(ptr %base, <8 x i16> %idxs, <8 x
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: flw fa5, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vfmv.s.f v10, fa5
; RV64ZVE32F-NEXT: .LBB77_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -8934,7 +8934,7 @@ define <8 x float> @mgather_baseidx_sext_v8i16_v8f32(ptr %base, <8 x i16> %idxs,
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: flw fa5, 0(a2)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vfmv.s.f v10, fa5
; RV64ZVE32F-NEXT: .LBB78_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
@@ -9088,7 +9088,7 @@ define <8 x float> @mgather_baseidx_zext_v8i16_v8f32(ptr %base, <8 x i16> %idxs,
; RV64ZVE32F-NEXT: slli a3, a3, 2
; RV64ZVE32F-NEXT: add a3, a0, a3
; RV64ZVE32F-NEXT: flw fa5, 0(a3)
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vfmv.s.f v10, fa5
; RV64ZVE32F-NEXT: .LBB79_2: # %else
; RV64ZVE32F-NEXT: andi a3, a2, 2
@@ -9241,7 +9241,7 @@ define <8 x float> @mgather_baseidx_v8f32(ptr %base, <8 x i32> %idxs, <8 x i1> %
; RV64ZVE32F-NEXT: andi a2, a1, 1
; RV64ZVE32F-NEXT: beqz a2, .LBB80_2
; RV64ZVE32F-NEXT: # %bb.1: # %cond.load
-; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetivli zero, 8, e32, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.x.s a2, v8
; RV64ZVE32F-NEXT: slli a2, a2, 2
; RV64ZVE32F-NEXT: add a2, a0, a2
@@ -12382,7 +12382,7 @@ define <32 x i8> @mgather_baseidx_v32i8(ptr %base, <32 x i8> %idxs, <32 x i1> %m
; RV64ZVE32F-NEXT: add a2, a0, a2
; RV64ZVE32F-NEXT: lbu a2, 0(a2)
; RV64ZVE32F-NEXT: li a3, 32
-; RV64ZVE32F-NEXT: vsetvli zero, a3, e8, m2, tu, ma
+; RV64ZVE32F-NEXT: vsetvli zero, a3, e8, m1, tu, ma
; RV64ZVE32F-NEXT: vmv.s.x v10, a2
; RV64ZVE32F-NEXT: .LBB98_2: # %else
; RV64ZVE32F-NEXT: andi a2, a1, 2
diff --git a/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll b/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll
index 4bd9f7befa52a1c..a2c58931a87fef4 100644
--- a/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/insertelt-fp.ll
@@ -109,7 +109,7 @@ define <vscale x 4 x half> @insertelt_nxv4f16_idx(<vscale x 4 x half> %v, half %
define <vscale x 8 x half> @insertelt_nxv8f16_0(<vscale x 8 x half> %v, half %elt) {
; CHECK-LABEL: insertelt_nxv8f16_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e16, m2, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e16, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 8 x half> %v, half %elt, i32 0
@@ -143,7 +143,7 @@ define <vscale x 8 x half> @insertelt_nxv8f16_idx(<vscale x 8 x half> %v, half %
define <vscale x 16 x half> @insertelt_nxv16f16_0(<vscale x 16 x half> %v, half %elt) {
; CHECK-LABEL: insertelt_nxv16f16_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e16, m4, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e16, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 16 x half> %v, half %elt, i32 0
@@ -177,7 +177,7 @@ define <vscale x 16 x half> @insertelt_nxv16f16_idx(<vscale x 16 x half> %v, hal
define <vscale x 32 x half> @insertelt_nxv32f16_0(<vscale x 32 x half> %v, half %elt) {
; CHECK-LABEL: insertelt_nxv32f16_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e16, m8, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e16, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 32 x half> %v, half %elt, i32 0
@@ -279,7 +279,7 @@ define <vscale x 2 x float> @insertelt_nxv2f32_idx(<vscale x 2 x float> %v, floa
define <vscale x 4 x float> @insertelt_nxv4f32_0(<vscale x 4 x float> %v, float %elt) {
; CHECK-LABEL: insertelt_nxv4f32_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e32, m2, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e32, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 4 x float> %v, float %elt, i32 0
@@ -313,7 +313,7 @@ define <vscale x 4 x float> @insertelt_nxv4f32_idx(<vscale x 4 x float> %v, floa
define <vscale x 8 x float> @insertelt_nxv8f32_0(<vscale x 8 x float> %v, float %elt) {
; CHECK-LABEL: insertelt_nxv8f32_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e32, m4, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e32, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 8 x float> %v, float %elt, i32 0
@@ -347,7 +347,7 @@ define <vscale x 8 x float> @insertelt_nxv8f32_idx(<vscale x 8 x float> %v, floa
define <vscale x 16 x float> @insertelt_nxv16f32_0(<vscale x 16 x float> %v, float %elt) {
; CHECK-LABEL: insertelt_nxv16f32_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e32, m8, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e32, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 16 x float> %v, float %elt, i32 0
@@ -415,7 +415,7 @@ define <vscale x 1 x double> @insertelt_nxv1f64_idx(<vscale x 1 x double> %v, do
define <vscale x 2 x double> @insertelt_nxv2f64_0(<vscale x 2 x double> %v, double %elt) {
; CHECK-LABEL: insertelt_nxv2f64_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e64, m2, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e64, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 2 x double> %v, double %elt, i32 0
@@ -449,7 +449,7 @@ define <vscale x 2 x double> @insertelt_nxv2f64_idx(<vscale x 2 x double> %v, do
define <vscale x 4 x double> @insertelt_nxv4f64_0(<vscale x 4 x double> %v, double %elt) {
; CHECK-LABEL: insertelt_nxv4f64_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e64, m4, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e64, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 4 x double> %v, double %elt, i32 0
@@ -483,7 +483,7 @@ define <vscale x 4 x double> @insertelt_nxv4f64_idx(<vscale x 4 x double> %v, do
define <vscale x 8 x double> @insertelt_nxv8f64_0(<vscale x 8 x double> %v, double %elt) {
; CHECK-LABEL: insertelt_nxv8f64_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a0, zero, e64, m8, tu, ma
+; CHECK-NEXT: vsetvli a0, zero, e64, m1, tu, ma
; CHECK-NEXT: vfmv.s.f v8, fa0
; CHECK-NEXT: ret
%r = insertelement <vscale x 8 x double> %v, double %elt, i32 0
diff --git a/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll b/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll
index 39f94eab2aa6606..767f1c71350352c 100644
--- a/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/insertelt-int-rv32.ll
@@ -141,7 +141,7 @@ define <vscale x 8 x i8> @insertelt_nxv8i8_idx(<vscale x 8 x i8> %v, i8 signext
define <vscale x 16 x i8> @insertelt_nxv16i8_0(<vscale x 16 x i8> %v, i8 signext %elt) {
; CHECK-LABEL: insertelt_nxv16i8_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e8, m2, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e8, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 16 x i8> %v, i8 %elt, i32 0
@@ -175,7 +175,7 @@ define <vscale x 16 x i8> @insertelt_nxv16i8_idx(<vscale x 16 x i8> %v, i8 signe
define <vscale x 32 x i8> @insertelt_nxv32i8_0(<vscale x 32 x i8> %v, i8 signext %elt) {
; CHECK-LABEL: insertelt_nxv32i8_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e8, m4, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e8, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 32 x i8> %v, i8 %elt, i32 0
@@ -209,7 +209,7 @@ define <vscale x 32 x i8> @insertelt_nxv32i8_idx(<vscale x 32 x i8> %v, i8 signe
define <vscale x 64 x i8> @insertelt_nxv64i8_0(<vscale x 64 x i8> %v, i8 signext %elt) {
; CHECK-LABEL: insertelt_nxv64i8_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e8, m8, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e8, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 64 x i8> %v, i8 %elt, i32 0
@@ -345,7 +345,7 @@ define <vscale x 4 x i16> @insertelt_nxv4i16_idx(<vscale x 4 x i16> %v, i16 sign
define <vscale x 8 x i16> @insertelt_nxv8i16_0(<vscale x 8 x i16> %v, i16 signext %elt) {
; CHECK-LABEL: insertelt_nxv8i16_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e16, m2, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e16, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 8 x i16> %v, i16 %elt, i32 0
@@ -379,7 +379,7 @@ define <vscale x 8 x i16> @insertelt_nxv8i16_idx(<vscale x 8 x i16> %v, i16 sign
define <vscale x 16 x i16> @insertelt_nxv16i16_0(<vscale x 16 x i16> %v, i16 signext %elt) {
; CHECK-LABEL: insertelt_nxv16i16_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e16, m4, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e16, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 16 x i16> %v, i16 %elt, i32 0
@@ -413,7 +413,7 @@ define <vscale x 16 x i16> @insertelt_nxv16i16_idx(<vscale x 16 x i16> %v, i16 s
define <vscale x 32 x i16> @insertelt_nxv32i16_0(<vscale x 32 x i16> %v, i16 signext %elt) {
; CHECK-LABEL: insertelt_nxv32i16_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e16, m8, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e16, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 32 x i16> %v, i16 %elt, i32 0
@@ -515,7 +515,7 @@ define <vscale x 2 x i32> @insertelt_nxv2i32_idx(<vscale x 2 x i32> %v, i32 %elt
define <vscale x 4 x i32> @insertelt_nxv4i32_0(<vscale x 4 x i32> %v, i32 %elt) {
; CHECK-LABEL: insertelt_nxv4i32_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e32, m2, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 4 x i32> %v, i32 %elt, i32 0
@@ -549,7 +549,7 @@ define <vscale x 4 x i32> @insertelt_nxv4i32_idx(<vscale x 4 x i32> %v, i32 %elt
define <vscale x 8 x i32> @insertelt_nxv8i32_0(<vscale x 8 x i32> %v, i32 %elt) {
; CHECK-LABEL: insertelt_nxv8i32_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e32, m4, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 8 x i32> %v, i32 %elt, i32 0
@@ -583,7 +583,7 @@ define <vscale x 8 x i32> @insertelt_nxv8i32_idx(<vscale x 8 x i32> %v, i32 %elt
define <vscale x 16 x i32> @insertelt_nxv16i32_0(<vscale x 16 x i32> %v, i32 %elt) {
; CHECK-LABEL: insertelt_nxv16i32_0:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli a1, zero, e32, m8, tu, ma
+; CHECK-NEXT: vsetvli a1, zero, e32, m1, tu, ma
; CHECK-NEXT: vmv.s.x v8, a0
; CHECK-NEXT: ret
%r = insertelement <vscale x 16 ...
[truncated]
@@ -810,9 +810,10 @@ for.end: ; preds = %for.body
define <vscale x 4 x i32> @cross_block_mutate(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b,
; CHECK-LABEL: cross_block_mutate:
; CHECK: # %bb.0: # %entry
-; CHECK-NEXT: vsetivli a0, 6, e32, m2, tu, ma
+; CHECK-NEXT: vsetivli a0, 6, e16, m1, ta, ma
+; CHECK-NEXT: vsetvli a1, zero, e32, m1, tu, ma
Here's another instance of a forced VL toggle.
@@ -550,9 +550,9 @@ define void @insertelt_c6_v8i64_0_add(ptr %x, ptr %y) {
; CHECK-NEXT: vsetivli zero, 8, e64, m4, ta, ma
; CHECK-NEXT: vle64.v v8, (a0)
; CHECK-NEXT: li a2, 6
-; CHECK-NEXT: vsetvli zero, zero, e64, m4, tu, ma
+; CHECK-NEXT: vsetivli zero, 8, e64, m1, tu, ma
Continuing from our chat offline... Ah, here's the problem I hit when I tried this before.
8 is not a valid VL for e64, m1. Trying to set VL=8 will result in a VL=2 on zvl128b hardware. This is correct for the vmv.s.x, but forces two pairs of VL toggles in this bit of assembly. (We'd previously had two VTYPE toggles, but VTYPE is cheaper than VL.)
This isn't a correctness issue, but it does raise a question of profitability. Is it worth two VL toggles to use a narrower LMUL here? I'm not sure it is.
Alternatively, can we reasonably strengthen InsertVSETVLI to undo the damage here? Maybe we should allow the m4 to slide over the vmv.s.x if doing so eliminates the VL toggle?
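
To spell out the arithmetic behind that VL=2 result, here is a minimal worked sketch, assuming VLEN = 128 bits (zvl128b); the vlmax helper is purely illustrative:

#include <cstdio>

// VLMAX = (VLEN / SEW) * LMUL; an AVL with AVL >= 2 * VLMAX clamps to VLMAX.
static unsigned vlmax(unsigned VLEN, unsigned SEW, double LMUL) {
  return static_cast<unsigned>(VLEN / SEW * LMUL);
}

int main() {
  // e64, m4: VLMAX = (128 / 64) * 4 = 8, so vsetivli with AVL=8 keeps VL=8.
  std::printf("e64,m4 VLMAX = %u\n", vlmax(128, 64, 4.0));
  // e64, m1: VLMAX = (128 / 64) * 1 = 2, so AVL=8 clamps down to VL=2.
  std::printf("e64,m1 VLMAX = %u\n", vlmax(128, 64, 1.0));
  return 0;
}

Because VLMAX differs between the two vtypes, InsertVSETVLI cannot use the VL-preserving vsetvli zero, zero form across the switch, which is where the extra VL toggles come from.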
Isn't VL already 8 here? Maybe this is just an issue with InsertVSETVLI not using the vsetvli zero, zero form?
Did a bit of digging: InsertVSETVLI has to set VL again because VLMAX has changed with LMUL. Allowing the m4 to slide over the vmv.s.x sounds like a good path to take here.
Edit: rereading your earlier comment, you've said the exact same thing. My brain is playing catch-up here.
Normally the m4 would already slide over the vmv.s.x here, because we explicitly mark LMUL and VL as not demanded (we just check that VL has the same zero-ness). I think the reason the vsetvlis aren't being merged here is the tail policy switch.
This old patch that relaxed TA -> TU eliminates the toggle for this test (but it's overall not profitable for OoO processors): https://reviews.llvm.org/differential/changeset/?ref=4362822
For this specific test case though, should we not be doing this as a vmv.v.i with VL=1 anyway?
I think there's a legality issue here. Since the vmv.s.x has a register class which only requires m1, it could choose a register that is unaligned for m4. This would result in an illegal instruction if we allow the m4 to forward over the m1 vmv.s.x. We'd have to change the register class constraints to do this legally.
I think I'm missing something here. Are you saying that we don't currently allow an m1 vmv.s.x to be turned into an m4 vmv.s.x?
If we construct the example below, where the vadd.vi has m4 (%1:vrm4) and the vmv.s.x has m1 (%3:vr):
---
name: lmul_merge
body: |
bb.0:
liveins: $v0, $x1, $x2
%vl:gprnox0 = COPY $x1
%0:vrm4 = COPY $v0
%pt0:vrm4 = IMPLICIT_DEF
%1:vrm4 = PseudoVADD_VI_M4 %pt0, %0, 1, %vl, 6, 0
%2:gpr = COPY $x2
%pt1:vr = IMPLICIT_DEF
%3:vr = PseudoVMV_S_X_M1 %pt1, %2, %vl, 6
PseudoRET
Running llc -o - -run-pass=riscv-insert-vsetvli -mtriple=riscv64 -mattr=+v -verify-machineinstrs, it looks like we do relax the LMUL for vmv.s.x:
body: |
bb.0:
liveins: $v0, $x1, $x2
%vl:gprnox0 = COPY $x1
%1:vrm4 = COPY $v0
%pt0:vrm4 = IMPLICIT_DEF
dead $x0 = PseudoVSETVLI %vl, 218 /* e64, m4, ta, ma */, implicit-def $vl, implicit-def $vtype
%3:vrm4 = PseudoVADD_VI_M4 %pt0, %1, 1, $noreg, 6 /* e64 */, 0 /* tu, mu */, implicit $vl, implicit $vtype
%4:gpr = COPY $x2
%pt1:vr = IMPLICIT_DEF
%6:vr = PseudoVMV_S_X_M1 %pt1, %4, $noreg, 6 /* e64 */, implicit $vl, implicit $vtype
PseudoRET
I see what you mean about the legality though: Does the register allocator here know not to allocate v1/v2/v3 etc for %6, given its vtype will be M4?
Edit: Or could we interpret "ignores LMUL and vector register groups." from the spec as meaning that vmv.s.x only operates on individual registers and not groups, i.e. that vmv.s.x v1, x0 is actually OK when vtype has LMUL=2/4/8?
I see what you mean about the legality though: Does the register allocator here know not to allocate v1/v2/v3 etc for %6, given its vtype will be M4?
I think I'm wrong here and I sent you on a wild goose chase. The specification says for vmv.s.x that "The instructions ignore LMUL and vector register groups.". I think that means these are unconditionally m1 (from an encoding legality perspective) even if LMUL is set to m4.
Force-pushed from 14ab242 to 21281af.
Ping.
LGTM
For anyone reading along, the interesting part of the test diff no longer exists in this review. It was handled by the dependent change, which has already landed and was discussed in detail there. What's left is straightforward, with no surprising test changes.
Have we arrived at a point where we just never reach isel with a VMV_S_X with a register class other than LMUL=1?
I think so. Any cleanup you see resulting from that? Are there some pseudos we could delete, or are those still required by mca?
Should we create it with an LMUL=1 type and surrounding subvector ops in the first place, instead of fixing it with a DAG combine? Does mca use the pseudos? @michaelmaitland
I tried this out, and it looks like we can get rid of the pseudos. However, because it no longer carries information about LMUL, if a vmv.s.x is the first instruction in a basic block it now gets an LMUL of 1 by default. This is fine on its own, but we can end up with a VL toggle if subsequent instructions don't have the same SEW/LMUL ratio, e.g.:
-; CHECK-NEXT: vsetvli a0, zero, e32, mf2, ta, ma
+; CHECK-NEXT: vsetvli a0, zero, e32, m1, ta, ma
; CHECK-NEXT: vmv.s.x v9, zero
-; CHECK-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT: vsetvli a0, zero, e16, mf4, ta, ma
I think we can remediate this by teaching RISCVInsertVSETVLI to mutate LMUL to match the next SEW/LMUL ratio, provided LMUL or the SEW/LMUL ratio aren't used. That way we end up with the same SEW/LMUL ratio in both and can keep VL intact. I'm working on a patch for this and will post it separately.
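
A minimal sketch of that idea follows; VTypeInfo and lmulToMatchRatio are made-up names for illustration, not the actual RISCVInsertVSETVLI API. The idea is to pick the LMUL that keeps the current SEW but reproduces the next instruction's SEW/LMUL ratio, so the shared VL stays valid.

#include <optional>

// LMUL encoded in eighths so fractional LMULs stay integral:
// mf8=1, mf4=2, mf2=4, m1=8, m2=16, m4=32, m8=64.
struct VTypeInfo {
  unsigned SEW;   // element width in bits
  unsigned LMUL8; // LMUL * 8
};

// If the current instruction doesn't demand LMUL or the SEW/LMUL ratio,
// return the LMUL (in eighths) that keeps CurSEW but matches the SEW/LMUL
// ratio of the next vtype, so VL can be left untouched.
static std::optional<unsigned> lmulToMatchRatio(unsigned CurSEW,
                                                const VTypeInfo &Next) {
  // Equal ratios: CurSEW / NewLMUL == Next.SEW / Next.LMUL
  //            => NewLMUL8 = CurSEW * Next.LMUL8 / Next.SEW
  unsigned Num = CurSEW * Next.LMUL8;
  if (Num % Next.SEW != 0)
    return std::nullopt;
  unsigned NewLMUL8 = Num / Next.SEW;
  // Legal LMULs are 1/8 .. 8, i.e. NewLMUL8 in {1, 2, 4, 8, 16, 32, 64}.
  if (NewLMUL8 == 0 || NewLMUL8 > 64 || (NewLMUL8 & (NewLMUL8 - 1)) != 0)
    return std::nullopt;
  return NewLMUL8;
}

For the diff above, the vmv.s.x is emitted as e32, m1 and the next vtype is e16, mf4 (LMUL8 = 2), so lmulToMatchRatio(32, {16, 2}) returns 4, i.e. mf2, which recovers the original e32, mf2 and allows the vsetvli zero, zero form.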
Luke, please land this change and move the discussion on reducing the number of pseudos to another PR or issue. It doesn't appear to be blocking the code change in this review, and it'd be good to get it committed.
We currently shrink the type of vmv_s_x_vl to LMUL=1 when its passthru is
undef to avoid constraining the register allocator since it ignores LMUL.
This patch relaxes it for non-undef passthrus, which occurs when lowering
insert_vector_elt.