[RISCV][ISel] Remove redundant vmerge for the vwadd. #78403

Merged (1 commit into llvm:main) on Jan 27, 2024

Conversation

sun-jacobi (Member) commented Jan 17, 2024

This patch aims to resolve the missed-optimization case below.

Code

define <8 x i64> @vwadd_mask_v8i32(<8 x i32> %x, <8 x i64> %y) {
    %mask = icmp slt <8 x i32> %x, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
    %a = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer
    %sa = sext <8 x i32> %a to <8 x i64>
    %ret = add <8 x i64> %sa, %y
    ret <8 x i64> %ret
}

Before this patch

Compiler Explorer

vwadd_mask_v8i32:
        li      a0, 42
        vsetivli        zero, 8, e32, m2, ta, ma
        vmslt.vx        v0, v8, a0
        vmv.v.i v10, 0
        vmerge.vvm      v16, v10, v8, v0
        vwadd.wv        v8, v12, v16
        ret

After this patch

vwadd_mask_v8i32:
        li a0, 42
        vsetivli zero, 8, e32, m2, ta, ma
        vmslt.vx v0, v8, a0
        vsetvli zero, zero, e32, m2, tu, mu
        vwadd.wv v12, v12, v8, v0.t
        vmv4r.v v8, v12
        ret

This pattern can be found in a reduction with a widening destination.

Specifically, we first perform a fold like (vwadd.wv y, (vmerge cond, x, 0)) -> (vwadd.wv y, x, y, cond), then pattern-match on it.
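
At the SelectionDAG level the rewrite is roughly the following (a hand-written sketch with operands abbreviated, assuming the VL-node operand layout LHS, RHS, Passthru, Mask, VL; the initial diff below still creates an explicit VMERGE_VL, which the review refines into this form):

// Before: the vmerge result feeds the widening add's narrow operand.
//   t1 = RISCVISD::VMERGE_VL cond, x, zero, passthru, vl
//   t2 = RISCVISD::VWADD_W_VL y, t1, undef, true_mask, vl
//
// After: a single masked, tail-undisturbed widening add whose
// masked-off elements come from y itself (vd tied to vs2):
//   t2 = RISCVISD::VWADD_W_VL y, x, /*passthru=*/y, /*mask=*/cond, vl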

llvmbot (Collaborator) commented Jan 17, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Chia (sun-jacobi)


Full diff: https://github.com/llvm/llvm-project/pull/78403.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVISelLowering.cpp (+53-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td (+27)
  • (added) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd-mask.ll (+35)
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index cb9ffabc41236e..a030538e5e8ba9 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -13457,6 +13457,56 @@ combineBinOp_VLToVWBinOp_VL(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
   return InputRootReplacement;
 }
 
+// (vwadd y, (select cond, x, 0)) -> select cond (vwadd y, x), y
+static SDValue combineVWADDSelect(SDNode *N, SelectionDAG &DAG) {
+  unsigned Opc = N->getOpcode();
+  assert(Opc == RISCVISD::VWADD_VL || Opc == RISCVISD::VWADD_W_VL ||
+         Opc == RISCVISD::VWADDU_W_VL);
+
+  SDValue VL = N->getOperand(4);
+  SDValue Y = N->getOperand(0);
+  SDValue Merge = N->getOperand(1);
+
+  if (Merge.getOpcode() != RISCVISD::VMERGE_VL)
+    return SDValue();
+
+  SDValue Cond = Merge->getOperand(0);
+  SDValue X = Merge->getOperand(1);
+  SDValue Z = Merge->getOperand(2);
+
+  if (Z.getOpcode() != ISD::INSERT_SUBVECTOR ||
+      !isNullConstant(Z.getOperand(2)))
+    return SDValue();
+
+  if (!Merge.hasOneUse())
+    return SDValue();
+
+  SmallVector<SDValue, 6> Ops(N->op_values());
+  Ops[0] = Y;
+  Ops[1] = X;
+
+  SDLoc DL(N);
+  EVT VT = N->getValueType(0);
+
+  SDValue WX = DAG.getNode(Opc, DL, VT, Ops, N->getFlags());
+  return DAG.getNode(RISCVISD::VMERGE_VL, DL, VT, Cond, WX, Y, DAG.getUNDEF(VT),
+                     VL);
+}
+
+static SDValue performVWADD_VLCombine(SDNode *N,
+                                      TargetLowering::DAGCombinerInfo &DCI) {
+  unsigned Opc = N->getOpcode();
+  assert(Opc == RISCVISD::VWADD_VL || Opc == RISCVISD::VWADD_W_VL ||
+         Opc == RISCVISD::VWADDU_W_VL);
+
+  if (Opc != RISCVISD::VWADD_VL) {
+    if (SDValue V = combineBinOp_VLToVWBinOp_VL(N, DCI))
+      return V;
+  }
+
+  return combineVWADDSelect(N, DCI.DAG);
+}
+
 // Helper function for performMemPairCombine.
 // Try to combine the memory loads/stores LSNode1 and LSNode2
 // into a single memory pair operation.
@@ -15500,9 +15550,11 @@ SDValue RISCVTargetLowering::PerformDAGCombine(SDNode *N,
     if (SDValue V = combineBinOp_VLToVWBinOp_VL(N, DCI))
       return V;
     return combineToVWMACC(N, DAG, Subtarget);
-  case RISCVISD::SUB_VL:
+  case RISCVISD::VWADD_VL:
   case RISCVISD::VWADD_W_VL:
   case RISCVISD::VWADDU_W_VL:
+    return performVWADD_VLCombine(N, DCI);
+  case RISCVISD::SUB_VL:
   case RISCVISD::VWSUB_W_VL:
   case RISCVISD::VWSUBU_W_VL:
   case RISCVISD::MUL_VL:
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
index 1deb9a709463e8..6744a38d036b00 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVVLPatterns.td
@@ -691,6 +691,30 @@ multiclass VPatTiedBinaryNoMaskVL_V<SDNode vop,
                      GPR:$vl, sew, TU_MU)>;
 }
 
+class VPatTiedBinaryMaskVL_V<SDNode vop,
+                                string instruction_name,
+                                string suffix,
+                                ValueType result_type,
+                                ValueType op2_type,
+                                ValueType mask_type,
+                                int sew,
+                                LMULInfo vlmul,
+                                VReg result_reg_class,
+                                VReg op2_reg_class>
+  : Pat<(riscv_vmerge_vl (mask_type V0),
+          (result_type (vop
+                       result_reg_class:$rs1,
+                       (op2_type op2_reg_class:$rs2),
+                       srcvalue,
+                       true_mask,
+                       VLOpFrag)),
+          result_reg_class:$rs1, result_reg_class:$merge, VLOpFrag),
+        (!cast<Instruction>(instruction_name#"_"#suffix#"_"# vlmul.MX#"_MASK")
+          result_reg_class:$merge,
+          result_reg_class:$rs1,
+          op2_reg_class:$rs2,
+          (mask_type V0), GPR:$vl, sew, TAIL_AGNOSTIC)>;
+
 multiclass VPatTiedBinaryNoMaskVL_V_RM<SDNode vop,
                                        string instruction_name,
                                        string suffix,
@@ -819,6 +843,9 @@ multiclass VPatBinaryWVL_VV_VX_WV_WX<SDPatternOperator vop, SDNode vop_w,
       defm : VPatTiedBinaryNoMaskVL_V<vop_w, instruction_name, "WV",
                                       wti.Vector, vti.Vector, vti.Log2SEW,
                                       vti.LMul, wti.RegClass, vti.RegClass>;
+      def : VPatTiedBinaryMaskVL_V<vop_w, instruction_name, "WV",
+                                      wti.Vector, vti.Vector, vti.Mask, vti.Log2SEW,
+                                      vti.LMul, wti.RegClass, vti.RegClass>;
       def : VPatBinaryVL_V<vop_w, instruction_name, "WV",
                            wti.Vector, wti.Vector, vti.Vector, vti.Mask,
                            vti.Log2SEW, vti.LMul, wti.RegClass, wti.RegClass,
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd-mask.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd-mask.ll
new file mode 100644
index 00000000000000..afc59b875d79df
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwadd-mask.ll
@@ -0,0 +1,35 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=riscv32 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK
+; RUN: llc -mtriple=riscv64 -mattr=+v -verify-machineinstrs < %s | FileCheck %s --check-prefixes=CHECK
+
+define <8 x i64> @vwadd_mask_v8i32(<8 x i32> %x, <8 x i64> %y) {
+; CHECK-LABEL: vwadd_mask_v8i32:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    li a0, 42
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vmslt.vx v0, v8, a0
+; CHECK-NEXT:    vwadd.wv v16, v12, v8, v0.t
+; CHECK-NEXT:    vmv4r.v v8, v16
+; CHECK-NEXT:    ret
+    %mask = icmp slt <8 x i32> %x, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
+    %a = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer
+    %sa = sext <8 x i32> %a to <8 x i64>
+    %ret = add <8 x i64> %sa, %y
+    ret <8 x i64> %ret
+}
+
+define <8 x i64> @vwadd_mask_v8i32_commutative(<8 x i32> %x, <8 x i64> %y) {
+; CHECK-LABEL: vwadd_mask_v8i32_commutative:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    li a0, 42
+; CHECK-NEXT:    vsetivli zero, 8, e32, m2, ta, ma
+; CHECK-NEXT:    vmslt.vx v0, v8, a0
+; CHECK-NEXT:    vwadd.wv v16, v12, v8, v0.t
+; CHECK-NEXT:    vmv4r.v v8, v16
+; CHECK-NEXT:    ret
+    %mask = icmp slt <8 x i32> %x, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
+    %a = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> zeroinitializer
+    %sa = sext <8 x i32> %a to <8 x i64>
+    %ret = add <8 x i64> %y, %sa
+    ret <8 x i64> %ret
+}

EVT VT = N->getValueType(0);

SDValue WX = DAG.getNode(Opc, DL, VT, Ops, N->getFlags());
return DAG.getNode(RISCVISD::VMERGE_VL, DL, VT, Cond, WX, Y, DAG.getUNDEF(VT),
Collaborator:

Operand 2 of the original vmerge is the passthru operand for elements past VL if the VWADD is tail undisturbed. This VMERGE_VL has undef for its passthru. That corrupts the elements past VL.

SDValue Z = Merge->getOperand(2);

if (Z.getOpcode() != ISD::INSERT_SUBVECTOR ||
!isNullConstant(Z.getOperand(2)))
Collaborator:

This only checks that the insertion index is 0. Where do you check the vector being inserted is 0?

sun-jacobi (Member, Author):

Thank you for pointing this out, I will fix it.

lukel97 (Contributor) commented Jan 17, 2024

I think this might be an issue in performCombineVMergeAndVOps. In the case below we're able to merge the PseudoVMERGE into the PseudoVADD after isel, but not the PseudoVWADD. Although I'm not exactly sure why, since the vmerge is an input operand to the vadd, not the other way round.

define <vscale x 2 x i32> @f(<vscale x 2 x i32> %x, <vscale x 2 x i32> %y) {
    %mask = icmp slt <vscale x 2 x i32> %x, shufflevector(<vscale x 2 x i32> insertelement(<vscale x 2 x i32> poison, i32 42, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
    %a = select <vscale x 2 x i1> %mask, <vscale x 2 x i32> %x, <vscale x 2 x i32> zeroinitializer
    %ret = add <vscale x 2 x i32> %a, %y
    ret <vscale x 2 x i32> %ret
}

define <vscale x 2 x i64> @g(<vscale x 2 x i32> %x, <vscale x 2 x i64> %y) {
    %mask = icmp slt <vscale x 2 x i32> %x, shufflevector(<vscale x 2 x i32> insertelement(<vscale x 2 x i32> poison, i32 42, i32 0), <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer)
    %a = select <vscale x 2 x i1> %mask, <vscale x 2 x i32> %x, <vscale x 2 x i32> zeroinitializer
    %sa = sext <vscale x 2 x i32> %a to <vscale x 2 x i64>
    %ret = add <vscale x 2 x i64> %sa, %y
    ret <vscale x 2 x i64> %ret
}

Update: it's not an issue with performCombineVMergeAndVOps; we're already doing a combine similar to this patch somewhere for add.

Initial selection DAG: %bb.0 'f:'
SelectionDAG has 19 nodes:
  t0: ch,glue = EntryToken
  t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
  t9: nxv2i32 = insert_vector_elt undef:nxv2i32, Constant:i32<42>, Constant:i64<0>
          t10: nxv2i32 = splat_vector Constant:i32<42>
        t12: nxv2i1 = setcc t2, t10, setlt:ch
        t13: nxv2i32 = splat_vector Constant:i32<0>
      t14: nxv2i32 = vselect t12, t2, t13
      t4: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
    t15: nxv2i32 = add t14, t4
  t17: ch,glue = CopyToReg t0, Register:nxv2i32 $v8, t15
  t18: ch = RISCVISD::RET_GLUE t17, Register:nxv2i32 $v8, t17:1



Optimized lowered selection DAG: %bb.0 'f:'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
  t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
        t10: nxv2i32 = splat_vector Constant:i32<42>
      t12: nxv2i1 = setcc t2, t10, setlt:ch
      t20: nxv2i32 = add t19, t2
    t21: nxv2i32 = vselect t12, t20, t19
  t17: ch,glue = CopyToReg t0, Register:nxv2i32 $v8, t21
    t4: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
  t19: nxv2i32 = freeze t4
  t18: ch = RISCVISD::RET_GLUE t17, Register:nxv2i32 $v8, t17:1

Update: The combine is DAGCombiner::foldBinOpIntoSelect, which doesn't trigger for the sext case because there's a sign_extend in between the add and vselect:

t16: nxv2i64 = add t15, t4
  t15: nxv2i64 = sign_extend t14
    t14: nxv2i32 = vselect t12, t2, t13
      t12: nxv2i1 = setcc t2, t10, setlt:ch
        t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
          t0: ch,glue = EntryToken
        t10: nxv2i32 = splat_vector Constant:i32<42>
      t13: nxv2i32 = splat_vector Constant:i32<0>
  t4: nxv2i64,ch = CopyFromReg t0, Register:nxv2i64 %1
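
Schematically (node shapes only, not actual API), the target-independent fold and the gap it leaves are:

// DAGCombiner::foldBinOpIntoSelect handles the direct form:
//   add (vselect c, x, 0), y  -->  vselect c, (add x, y), y
// but not the widening form, because of the intervening extend:
//   add (sext (vselect c, x, 0)), y
// This patch performs the analogous fold after lowering instead, on
//   RISCVISD::VWADD_W_VL y, (RISCVISD::VMERGE_VL c, x, 0, ...), ...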

sun-jacobi (Member, Author) commented Jan 17, 2024

I think this might be an issue in performCombineVMergeAndVOps. In the case below we're able to merge the PseudoVMERGE into the PseudoVADD after isel, but not the PseudoVWADD. Although I'm not exactly sure why, since the vmerge is an input operand to the vadd, not the other way round


Yes, the original performCombineVMergeAndVOps actually works for vadd.
The DAGCombiner does a target-independent fold similar to this patch's, so that performCombineVMergeAndVOps can then fold the vadd into its masked version.
This is also the motivation for doing the (vwadd y, (select cond, x, 0)) -> select cond (vwadd y, x), y fold here.

sun-jacobi (Member, Author) commented Jan 17, 2024

@lukel97 Thank you! That's exactly what I meant.

return SDValue();

SmallVector<SDValue, 6> Ops(N->op_values());
Ops[0] = Y;
Collaborator:

Isn't Ops[0] already Y?

EVT VT = N->getValueType(0);

SDValue WX = DAG.getNode(Opc, DL, VT, Ops, N->getFlags());
return DAG.getNode(RISCVISD::VMERGE_VL, DL, VT, Cond, WX, Y, Y, VL);
Collaborator:

This is still incorrect. You have to use N->getOperand(2) for the passthru operand to the vmerge.

You're also losing any mask that the VWADD_W_VL may have already had.
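
A sketch of that correction, reusing the locals from the snippet above (the lost-mask issue is a separate problem, picked up again below):

SDValue PassThru = N->getOperand(2); // the widening add's original passthru
SDValue WX = DAG.getNode(Opc, DL, VT, Ops, N->getFlags());
return DAG.getNode(RISCVISD::VMERGE_VL, DL, VT, Cond, WX, Y, PassThru, VL);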

// (vwadd y, (select cond, x, 0)) -> select cond (vwadd y, x), y
static SDValue combineVWADDSelect(SDNode *N, SelectionDAG &DAG) {
unsigned Opc = N->getOpcode();
assert(Opc == RISCVISD::VWADD_VL || Opc == RISCVISD::VWADD_W_VL ||
Collaborator:

It can't be VWADD_VL due to the check in performVWADD_VLCombine, right?

sun-jacobi (Member, Author):

The check in performVWADD_VLCombine is for RISCVISD::VWADD_W_VL and RISCVISD::VWADDU_W_VL.
We need to first do combineBinOp_VLToVWBinOp_VL on those.

Collaborator:

Oops. You're right. Sorry about that.

EVT VT = N->getValueType(0);

SDValue WX = DAG.getNode(Opc, DL, VT, Ops, N->getFlags());
return DAG.getNode(RISCVISD::VMERGE_VL, DL, VT, Cond, WX, Y, Y, VL);
Collaborator:

You don't need to create a VMERGE, you just need to change the Mask operand when you create WX. RISCVISD::VWADD_W_VL supports all the operands you need to describe this.
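
Concretely, the suggestion amounts to something like this sketch (assuming the operand layout LHS, RHS, Passthru, Mask, VL noted elsewhere in this review):

SmallVector<SDValue, 5> Ops(N->op_values());
Ops[1] = X;    // feed the vmerge's true operand directly into the add
Ops[3] = Cond; // and reuse the vmerge's condition as the add's mask
return DAG.getNode(Opc, SDLoc(N), N->getValueType(0), Ops, N->getFlags());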

sun-jacobi (Member, Author):

Thank you for the advice. It works.

sun-jacobi (Member, Author):

With this we get a normal masked instruction, but in this case I think we need the MASK_TIED.


SmallVector<SDValue, 6> Ops(N->op_values());
Ops[MergeID] = X;
Ops[3] = Cond;
Collaborator:

You can't replace operand 3 without checking that operand 3 was an all-1s mask or the passthru was undef originally. If the mask wasn't all 1s or the passthru wasn't undef, then the original add produced the passthru operand for masked-off elements.
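
A minimal sketch of that guard (assuming an all-ones mask reaches this combine as RISCVISD::VMSET_VL, as elsewhere in this file):

// Overwriting the mask is only safe if the add was unmasked or its
// passthru was undef; otherwise masked-off elements must keep coming
// from the original passthru.
SDValue AddMask = N->getOperand(3);
SDValue AddPassthru = N->getOperand(2);
if (AddMask.getOpcode() != RISCVISD::VMSET_VL && !AddPassthru.isUndef())
  return SDValue();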

SDValue X = Merge->getOperand(1);
SDValue Z = Merge->getOperand(2);

if (Z.getOpcode() != ISD::INSERT_SUBVECTOR ||
Collaborator:

This doesn't check what operand 0 of the insert is or the size of the insertion. So you only know some subvector of the input is 0. You don't know the whole vector is 0.
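
One way to cover both points is to require the whole false operand to be a zero splat instead of peering into the insert_subvector (a sketch; ISD::isConstantSplatVectorAllZeros does not look through insert_subvector, so the fixed-vector lowering may still need explicit per-operand checks):

// Bail out unless the vmerge's false operand is known to be all zeros.
SDValue Z = Merge->getOperand(2);
if (!ISD::isConstantSplatVectorAllZeros(Z.getNode()))
  return SDValue();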

if (!Merge.hasOneUse())
return SDValue();

SmallVector<SDValue, 6> Ops(N->op_values());
Collaborator:

Why 6? I think there are only 5 operands: LHS, RHS, Passthru, Mask, VL.

sun-jacobi (Member, Author) commented Jan 20, 2024

AFAIU, we may need MASK_TIED to use the same register for vd and vs2.

For vwadd.vv and vwaddu.vv, we cannot guarantee that vd and vs2 are the same, so this folding might not work for them.

sun-jacobi (Member, Author) commented Jan 26, 2024

Sorry for the ping.

topperc (Collaborator) left a comment:

LGTM

sun-jacobi merged commit 3855757 into llvm:main on Jan 27, 2024 (3 of 4 checks passed).
sun-jacobi added a commit that referenced this pull request Jan 31, 2024
…80079)

Similar to #78403, but for scalable `vwadd(u).wv`, given that #76785 is recommitted.

### Code
```
define <vscale x 8 x i64> @vwadd_wv_mask_v8i32(<vscale x 8 x i32> %x, <vscale x 8 x i64> %y) {
    %mask = icmp slt <vscale x 8 x i32> %x, shufflevector (<vscale x 8 x i32> insertelement (<vscale x 8 x i32> poison, i32 42, i64 0), <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer)
    %a = select <vscale x 8 x i1> %mask, <vscale x 8 x i32> %x, <vscale x 8 x i32> zeroinitializer
    %sa = sext <vscale x 8 x i32> %a to <vscale x 8 x i64>
    %ret = add <vscale x 8 x i64> %sa, %y
    ret <vscale x 8 x i64> %ret
}
```

### Before this patch
[Compiler Explorer](https://godbolt.org/z/xsoa5xPrd)
```
vwadd_wv_mask_v8i32:
        li      a0, 42
        vsetvli a1, zero, e32, m4, ta, ma
        vmslt.vx        v0, v8, a0
        vmv.v.i v12, 0
        vmerge.vvm      v24, v12, v8, v0
        vwadd.wv        v8, v16, v24
        ret
```

### After this patch
```
vwadd_wv_mask_v8i32:
        li a0, 42
        vsetvli a1, zero, e32, m4, ta, ma
        vmslt.vx v0, v8, a0
        vsetvli zero, zero, e32, m4, tu, mu
        vwadd.wv v16, v16, v8, v0.t
        vmv8r.v v8, v16
        ret
```
lukel97 added a commit that referenced this pull request Mar 27, 2024
Note we can't use vwaddu.wv because it will get combined away with #78403
sun-jacobi deleted the merge-vwadd branch on April 14, 2024.