-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RISCV] Allow folding vmerge with implicit merge operand when true has tied dest #78565
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-risc-v Author: Luke Lau (lukel97) ChangesWe currently don't fold a vmerge if it has an implicit merge operand and its true operand has a tied dest (i.e. has a passthru operand). This restriction was added in https://reviews.llvm.org/D151596, back whenever we had separate TU/TA pseudos. It looks like it was added because the policy might not have been handled correctly. However the policy should be set correctly if we relax this restriction today, since we compute the policy differently now that we have removed the TU/TA distinction in our pseudos. We use a TUMU policy, and relax it to TAMU iff the vmerge's merge operand is implicit. The reasoning behind this being that the tail elements always come from the vmerge's merge operand1, so if the merge operand is implicit-def then the tail is implicit-def, and hence tail agnostic. > [!NOTE] Full diff: https://github.com/llvm/llvm-project/pull/78565.diff 3 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 0d8688ba2eaeaff..5b20a2014cc98ca 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3531,11 +3531,6 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
return false;
if (HasTiedDest && !isImplicitDef(True->getOperand(0))) {
- // The vmerge instruction must be TU.
- // FIXME: This could be relaxed, but we need to handle the policy for the
- // resulting op correctly.
- if (isImplicitDef(Merge))
- return false;
SDValue MergeOpTrue = True->getOperand(0);
// Both the vmerge instruction and the True instruction must have the same
// merge operand.
@@ -3545,9 +3540,6 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
if (IsMasked) {
assert(HasTiedDest && "Expected tied dest");
- // The vmerge instruction must be TU.
- if (isImplicitDef(Merge))
- return false;
// The vmerge instruction must have an all 1s mask since we're going to keep
// the mask from the True instruction.
// FIXME: Support mask agnostic True instruction which would have an
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
index c639f092444fc43..b3166bf5b01071d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
@@ -1187,3 +1187,41 @@ define <vscale x 2 x i32> @vmerge_larger_vl_false_becomes_tail(<vscale x 2 x i32
%b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(<vscale x 2 x i32> poison, <vscale x 2 x i32> %false, <vscale x 2 x i32> %a, <vscale x 2 x i1> %m, i64 3)
ret <vscale x 2 x i32> %b
}
+
+declare <vscale x 2 x i32> @llvm.riscv.vmacc.nxv2i32.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i32>, i64, i64)
+
+define <vscale x 2 x i32> @true_tied_dest_vmerge_implicit_passthru(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i64 %avl) {
+; CHECK-LABEL: true_tied_dest_vmerge_implicit_passthru:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, ta, mu
+; CHECK-NEXT: vmacc.vv v8, v9, v10, v0.t
+; CHECK-NEXT: ret
+ %a = call <vscale x 2 x i32> @llvm.riscv.vmacc.nxv2i32.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, i64 %avl, i64 0)
+ %b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(
+ <vscale x 2 x i32> poison,
+ <vscale x 2 x i32> %passthru,
+ <vscale x 2 x i32> %a,
+ <vscale x 2 x i1> %m,
+ i64 %avl
+ )
+ ret <vscale x 2 x i32> %b
+}
+
+declare <vscale x 2 x i32> @llvm.riscv.vadd.mask.nxv2i32.i32(<vscale x 2 x i32>, <vscale x 2 x i32>, i32, <vscale x 2 x i1>, i64, i64)
+
+define <vscale x 2 x i32> @true_mask_vmerge_implicit_passthru(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i64 %avl) {
+; CHECK-LABEL: true_mask_vmerge_implicit_passthru:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, ta, mu
+; CHECK-NEXT: vadd.vv v8, v9, v10, v0.t
+; CHECK-NEXT: ret
+ %a = call <vscale x 2 x i32> @llvm.riscv.vadd.mask.nxv2i32.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i64 %avl, i64 0)
+ %b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(
+ <vscale x 2 x i32> poison,
+ <vscale x 2 x i32> %passthru,
+ <vscale x 2 x i32> %a,
+ <vscale x 2 x i1> shufflevector(<vscale x 2 x i1> insertelement(<vscale x 2 x i1> poison, i1 true, i32 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer),
+ i64 %avl
+ )
+ ret <vscale x 2 x i32> %b
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
index 6a6d7d2a41424c5..d699ee3f1dc29fc 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
@@ -91,9 +91,8 @@ define <vscale x 1 x i8> @vmadd_vv_nxv1i8_ta(<vscale x 1 x i8> %a, <vscale x 1 x
define <vscale x 1 x i8> @vmadd_vx_nxv1i8_ta(<vscale x 1 x i8> %a, i8 %b, <vscale x 1 x i8> %c, <vscale x 1 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv1i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 1 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 1 x i8> %elt.head, <vscale x 1 x i8> poison, <vscale x 1 x i32> zeroinitializer
@@ -192,9 +191,8 @@ define <vscale x 2 x i8> @vmadd_vv_nxv2i8_ta(<vscale x 2 x i8> %a, <vscale x 2 x
define <vscale x 2 x i8> @vmadd_vx_nxv2i8_ta(<vscale x 2 x i8> %a, i8 %b, <vscale x 2 x i8> %c, <vscale x 2 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv2i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 2 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 2 x i8> %elt.head, <vscale x 2 x i8> poison, <vscale x 2 x i32> zeroinitializer
@@ -293,9 +291,8 @@ define <vscale x 4 x i8> @vmadd_vv_nxv4i8_ta(<vscale x 4 x i8> %a, <vscale x 4 x
define <vscale x 4 x i8> @vmadd_vx_nxv4i8_ta(<vscale x 4 x i8> %a, i8 %b, <vscale x 4 x i8> %c, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv4i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 4 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 4 x i8> %elt.head, <vscale x 4 x i8> poison, <vscale x 4 x i32> zeroinitializer
@@ -394,9 +391,8 @@ define <vscale x 8 x i8> @vmadd_vv_nxv8i8_ta(<vscale x 8 x i8> %a, <vscale x 8 x
define <vscale x 8 x i8> @vmadd_vx_nxv8i8_ta(<vscale x 8 x i8> %a, i8 %b, <vscale x 8 x i8> %c, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv8i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 8 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 8 x i8> %elt.head, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
@@ -495,9 +491,8 @@ define <vscale x 16 x i8> @vmadd_vv_nxv16i8_ta(<vscale x 16 x i8> %a, <vscale x
define <vscale x 16 x i8> @vmadd_vx_nxv16i8_ta(<vscale x 16 x i8> %a, i8 %b, <vscale x 16 x i8> %c, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv16i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m2, ta, ma
-; CHECK-NEXT: vmacc.vx v10, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v10, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v10, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 16 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 16 x i8> %elt.head, <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer
@@ -596,9 +591,8 @@ define <vscale x 32 x i8> @vmadd_vv_nxv32i8_ta(<vscale x 32 x i8> %a, <vscale x
define <vscale x 32 x i8> @vmadd_vx_nxv32i8_ta(<vscale x 32 x i8> %a, i8 %b, <vscale x 32 x i8> %c, <vscale x 32 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv32i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m4, ta, ma
-; CHECK-NEXT: vmacc.vx v12, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v12, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v12, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 32 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 32 x i8> %elt.head, <vscale x 32 x i8> poison, <vscale x 32 x i32> zeroinitializer
@@ -700,9 +694,8 @@ define <vscale x 64 x i8> @vmadd_vv_nxv64i8_ta(<vscale x 64 x i8> %a, <vscale x
define <vscale x 64 x i8> @vmadd_vx_nxv64i8_ta(<vscale x 64 x i8> %a, i8 %b, <vscale x 64 x i8> %c, <vscale x 64 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv64i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m8, ta, ma
-; CHECK-NEXT: vmacc.vx v16, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v16, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v16, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 64 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 64 x i8> %elt.head, <vscale x 64 x i8> poison, <vscale x 64 x i32> zeroinitializer
@@ -801,9 +794,8 @@ define <vscale x 1 x i16> @vmadd_vv_nxv1i16_ta(<vscale x 1 x i16> %a, <vscale x
define <vscale x 1 x i16> @vmadd_vx_nxv1i16_ta(<vscale x 1 x i16> %a, i16 %b, <vscale x 1 x i16> %c, <vscale x 1 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv1i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, mf4, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, mf4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 1 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 1 x i16> %elt.head, <vscale x 1 x i16> poison, <vscale x 1 x i32> zeroinitializer
@@ -902,9 +894,8 @@ define <vscale x 2 x i16> @vmadd_vv_nxv2i16_ta(<vscale x 2 x i16> %a, <vscale x
define <vscale x 2 x i16> @vmadd_vx_nxv2i16_ta(<vscale x 2 x i16> %a, i16 %b, <vscale x 2 x i16> %c, <vscale x 2 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv2i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, mf2, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, mf2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 2 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 2 x i16> %elt.head, <vscale x 2 x i16> poison, <vscale x 2 x i32> zeroinitializer
@@ -1003,9 +994,8 @@ define <vscale x 4 x i16> @vmadd_vv_nxv4i16_ta(<vscale x 4 x i16> %a, <vscale x
define <vscale x 4 x i16> @vmadd_vx_nxv4i16_ta(<vscale x 4 x i16> %a, i16 %b, <vscale x 4 x i16> %c, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv4i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m1, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m1, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 4 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 4 x i16> %elt.head, <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer
@@ -1104,9 +1094,8 @@ define <vscale x 8 x i16> @vmadd_vv_nxv8i16_ta(<vscale x 8 x i16> %a, <vscale x
define <vscale x 8 x i16> @vmadd_vx_nxv8i16_ta(<vscale x 8 x i16> %a, i16 %b, <vscale x 8 x i16> %c, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv8i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m2, ta, ma
-; CHECK-NEXT: vmacc.vx v10, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v10, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v10, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 8 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 8 x i16> %elt.head, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer
@@ -1205,9 +1194,8 @@ define <vscale x 16 x i16> @vmadd_vv_nxv16i16_ta(<vscale x 16 x i16> %a, <vscale
define <vscale x 16 x i16> @vmadd_vx_nxv16i16_ta(<vscale x 16 x i16> %a, i16 %b, <vscale x 16 x i16> %c, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv16i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m4, ta, ma
-; CHECK-NEXT: vmacc.vx v12, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v12, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v12, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 16 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 16 x i16> %elt.head, <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer
@@ -1309,9 +1297,8 @@ define <vscale x 32 x i16> @vmadd_vv_nxv32i16_ta(<vscale x 32 x i16> %a, <vscale
define <vscale x 32 x i16> @vmadd_vx_nxv32i16_ta(<vscale x 32 x i16> %a, i16 %b, <vscale x 32 x i16> %c, <vscale x 32 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv32i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m8, ta, ma
-; CHECK-NEXT: vmacc.vx v16, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v16, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v16, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 32 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 32 x i16> %elt.head, <vscale x 32 x i16> poison, <vscale x 32 x i32> zeroinitializer
@@ -1410,9 +1397,8 @@ define <vscale x 1 x i32> @vmadd_vv_nxv1i32_ta(<vscale x 1 x i32> %a, <vscale x
define <vscale x 1 x i32> @vmadd_vx_nxv1i32_ta(<vscale x 1 x i32> %a, i32 %b, <vscale x 1 x i32> %c, <vscale x 1 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv1i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, mf2, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 1 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 1 x i32> %elt.head, <vscale x 1 x i32> poison, <vscale x 1 x i32> zeroinitializer
@@ -1511,9 +1497,8 @@ define <vscale x 2 x i32> @vmadd_vv_nxv2i32_ta(<vscale x 2 x i32> %a, <vscale x
define <vscale x 2 x i32> @vmadd_vx_nxv2i32_ta(<vscale x 2 x i32> %a, i32 %b, <vscale x 2 x i32> %c, <vscale x 2 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv2i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m1, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m1, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 2 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 2 x i32> %elt.head, <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
@@ -1612,9 +1597,8 @@ define <vscale x 4 x i32> @vmadd_vv_nxv4i32_ta(<vscale x 4 x i32> %a, <vscale x
define <vscale x 4 x i32> @vmadd_vx_nxv4i32_ta(<vscale x 4 x i32> %a, i32 %b, <vscale x 4 x i32> %c, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv4i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m2, ta, ma
-; CHECK-NEXT: vmacc.vx v10, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v10, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v10, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 4 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 4 x i32> %elt.head, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
@@ -1713,9 +1697,8 @@ define <vscale x 8 x i32> @vmadd_vv_nxv8i32_ta(<vscale x 8 x i32> %a, <vscale x
define <vscale x 8 x i32> @vmadd_vx_nxv8i32_ta(<vscale x 8 x i32> %a, i32 %b, <vscale x 8 x i32> %c, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv8i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m4, ta, ma
-; CHECK-NEXT: vmacc.vx v12, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v12, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v12, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 8 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 8 x i32> %elt.head, <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer
@@ -1817,9 +1800,8 @@ define <vscale x 16 x i32> @vmadd_vv_nxv16i32_ta(<vscale x 16 x i32> %a, <vscale
define <vscale x 16 x i32> @vmadd_vx_nxv16i32_ta(<vscale x 16 x i32> %a, i32 %b, <vscale x 16 x i32> %c, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv16i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m8, ta, ma
-; CHECK-NEXT: vmacc.vx v16, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v16, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v16, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 16 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 16 x i32> %elt.head, <vscale x 16 x i32> poison, <vscale x 16 x i32> zeroinitializer
@@ -1965,9 +1947,8 @@ define <vscale x 1 x i64> @vmadd_vx_nxv1i64_ta(<vscale x 1 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv1i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m1, ta, ma
-; RV64-NEXT: vmacc.vx v9, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v9, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m1, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v9, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 1 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 1 x i64> %elt.head, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
@@ -2113,9 +2094,8 @@ define <vscale x 2 x i64> @vmadd_vx_nxv2i64_ta(<vscale x 2 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv2i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m2, ta, ma
-; RV64-NEXT: vmacc.vx v10, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v10, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m2, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v10, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 2 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 2 x i64> %elt.head, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
@@ -2261,9 +2241,8 @@ define <vscale x 4 x i64> @vmadd_vx_nxv4i64_ta(<vscale x 4 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv4i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m4, ta, ma
-; RV64-NEXT: vmacc.vx v12, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v12, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m4, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v12, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 4 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 4 x i64> %elt.head, <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
@@ -2412,9 +2391,8 @@ define <vscale x 8 x i64> @vmadd_vx_nxv8i64_ta(<vscale x 8 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv8i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, ma
-; RV64-NEXT: vmacc.vx v16, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v16, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v16, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 8 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 8 x i64> %elt.head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
Footnotes
|
Haven't read the patch, but I'm not convinced of the correctness of the justification. Consider the following case: The transform would create: Note the difference in the last element. Not sure I got my example entirely correct, but the basic idea I'm aiming for is that if the original sequence produces values which consume 4 distinct input registers, there's no folded form for the VADD which can do the same. Unless I'm missing something? |
We have an additional constraint that C has to be the same as PT though. In order for folding to occur all of these need to be “equivalent”
- VADD’s passthru
- VMERGE’s passthru
- VMERGES’s false
The code structure doesn’t make this particularly clear, and by “equivalent” I mean they all need to be undef or equal to VMERGE’s false IIUC. So we always transform from 3 distinct input registers to 3 input registers: A, B, and VMERGE’s false.
So in the example below, C = PT so the two results would be the same.
|
The formatting of the above comment got messed up when I replied via email, sorry about that. Should be fixed now. |
6fd3400
to
2389203
Compare
Rebased, gentle ping |
2389203
to
b24f3fe
Compare
Rebased, ping |
b24f3fe
to
dc6853b
Compare
Ping |
dc6853b
to
d00ab4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Thanks. Will wait to hear back from @preames as well to make sure the correctness point is addressed |
Could this PR replace #78403 ? |
Not on its own, because the issue in #78403 stems from the fact that
to
So the vselect/vmerge won't be pulled outside the add which is needed for the vmerge peephole. With that said, I returned to this PR because I was attempting to generalize #78403 by teaching |
Thank you for the explanation. |
…n't due to it having an implicit merge operand.
…s tied dest We currently don't fold a vmerge if it has an implicit merge operand and its true operand has a tied dest (i.e. has a passthru operand). This restriction was added in https://reviews.llvm.org/D151596, back whenever we had separate TU/TA pseudos. It looks like it was added because the policy might not have been handled correctly. However the policy should be set correctly if we relax this restriction today, since we compute the policy differently now that we have removed the TU/TA distinction in our pseudos. We use a TUMU policy, and relax it to TAMU iff the vmerge's merge operand is implicit. The reasoning behind this being that the tail elements always come from the vmerge's merge operand[1], so if the merge operand is implicit-def then the tail is implicit-def, and hence tail agnostic. [1] unless the VL was shrunk, in which case we conservatively use TUMU.
d00ab4e
to
8470c72
Compare
We currently don't fold a vmerge if it has an implicit merge operand and its true operand has a tied dest (i.e. has a passthru operand).
This restriction was added in https://reviews.llvm.org/D151596, back whenever we had separate TU/TA pseudos. It looks like it was added because the policy might not have been handled correctly.
However the policy should be set correctly if we relax this restriction today, since we compute the policy differently now that we have removed the TU/TA distinction in our pseudos.
We use a TUMU policy, and relax it to TAMU iff the vmerge's merge operand is implicit.
The reasoning behind this being that the tail elements always come from the vmerge's merge operand1, so if the merge operand is implicit-def then the tail is implicit-def, and hence tail agnostic.
Note
The tests were included in a separate commit this PR so reviewers can see the diff
Footnotes
unless the VL was shrunk, in which case we conservatively use TUMU. ↩