
Conversation

Contributor

@lukel97 lukel97 commented Dec 3, 2025

In #156499 we taught the vmerge peephole to commute operands so that the passthru operands lined up. We can do the same for the vmv.v.v peephole, which allows us to fold more vmv.v.vs away.

This is needed to prevent a regression in an upcoming patch that adds a combine for vmerge.vvm to vmv.v.v.
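As a rough illustration of what the commute buys (a sketch based on the new vfmadd test below; the "before" registers are illustrative, only the "after" form is what the test checks):

    # before: the vmv.v.v survives because the vfmadd's passthru is not v8
    vsetvli zero, a0, e32, m2, ta, ma
    vfmadd.vv v10, v12, v8        # v10 = v10 * v12 + v8
    vsetvli zero, a0, e32, m2, tu, ma
    vmv.v.v v8, v10
    # after: commuting vfmadd to vfmacc makes v8 both the passthru and the
    # destination, so the copy folds away
    vsetvli zero, a0, e32, m2, tu, ma
    vfmacc.vv v8, v12, v10        # v8 = v10 * v12 + v8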

Member

llvmbot commented Dec 3, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

In #156499 we taught the vmerge peephole to commute operands so that the passthru operands lined up. We can do the same for the vmv.v.v peephole, which allows us to fold more vmv.v.vs away.

This is needed to prevent a regression in an upcoming patch that adds a combine for vmerge.vvm to vmv.v.v.


Full diff: https://github.com/llvm/llvm-project/pull/170536.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (+21-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll (+11)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir (+20)
diff --git a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
index 6ddca4a3e0909..a5385be0c011c 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
@@ -651,11 +651,23 @@ bool RISCVVectorPeephole::foldVMV_V_V(MachineInstr &MI) {
   if (!hasSameEEW(MI, *Src))
     return false;
 
+  std::optional<std::pair<unsigned, unsigned>> NeedsCommute;
+
   // Src needs to have the same passthru as VMV_V_V
   MachineOperand &SrcPassthru = Src->getOperand(Src->getNumExplicitDefs());
   if (SrcPassthru.getReg().isValid() &&
-      SrcPassthru.getReg() != Passthru.getReg())
-    return false;
+      SrcPassthru.getReg() != Passthru.getReg()) {
+    // If Src's passthru != Passthru, check if it uses Passthru in another
+    // operand and try to commute it.
+    int OtherIdx = Src->findRegisterUseOperandIdx(Passthru.getReg(), TRI);
+    if (OtherIdx == -1)
+      return false;
+    unsigned OpIdx1 = OtherIdx;
+    unsigned OpIdx2 = Src->getNumExplicitDefs();
+    if (!TII->findCommutedOpIndices(*Src, OpIdx1, OpIdx2))
+      return false;
+    NeedsCommute = {OpIdx1, OpIdx2};
+  }
 
   // Src VL will have already been reduced if legal (see tryToReduceVL),
   // so we don't need to handle a smaller source VL here.  However, the
@@ -668,6 +680,13 @@ bool RISCVVectorPeephole::foldVMV_V_V(MachineInstr &MI) {
   if (!ensureDominates(Passthru, *Src))
     return false;
 
+  if (NeedsCommute) {
+    auto [OpIdx1, OpIdx2] = *NeedsCommute;
+    [[maybe_unused]] bool Commuted =
+        TII->commuteInstruction(*Src, /*NewMI=*/false, OpIdx1, OpIdx2);
+    assert(Commuted && "Failed to commute Src?");
+  }
+
   if (SrcPassthru.getReg() != Passthru.getReg()) {
     SrcPassthru.setReg(Passthru.getReg());
     // If Src is masked then its passthru needs to be in VRNoV0.
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
index c2638127e47af..698d47f3be720 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.ll
@@ -245,3 +245,14 @@ define <vscale x 1 x i64> @vmerge(<vscale x 1 x i64> %passthru, <vscale x 1 x i6
   %b = call <vscale x 1 x i64> @llvm.riscv.vmv.v.v.nxv1i64(<vscale x 1 x i64> %passthru, <vscale x 1 x i64> %a, iXLen %avl)
   ret <vscale x 1 x i64> %b
 }
+
+define <vscale x 4 x float> @commute_vfmadd(<vscale x 4 x float> %passthru, <vscale x 4 x float> %a, <vscale x 4 x float> %b, iXLen %vl) {
+; CHECK-LABEL: commute_vfmadd:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vsetvli zero, a0, e32, m2, tu, ma
+; CHECK-NEXT:    vfmacc.vv v8, v12, v10
+; CHECK-NEXT:    ret
+  %v = call <vscale x 4 x float> @llvm.riscv.vfmadd(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x float> %passthru, iXLen 7, iXLen %vl, iXLen 3)
+  %w = call <vscale x 4 x float> @llvm.riscv.vmv.v.v(<vscale x 4 x float> %passthru, <vscale x 4 x float> %v, iXLen %vl)
+  ret <vscale x 4 x float> %w
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir
index 95232e734bb18..68e74ff6ba05b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/vmv.v.v-peephole.mir
@@ -168,3 +168,23 @@ body: |
     %x:vrnov0 = PseudoVMERGE_VVM_M1 $noreg, %passthru, $noreg, %mask, 4, 5 /* e32 */
     %z:vr = PseudoVMV_V_V_M1 %passthru, %x, 4, 5 /* e32 */, 0 /* tu, mu */
 ...
+---
+name: commute_vfmadd
+body: |
+  bb.0:
+    liveins: $x8, $v0, $v8, $v9, $v10
+    ; CHECK-LABEL: name: commute_vfmadd
+    ; CHECK: liveins: $x8, $v0, $v8, $v9, $v10
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %avl:gprnox0 = COPY $x8
+    ; CHECK-NEXT: %passthru:vrnov0 = COPY $v8
+    ; CHECK-NEXT: %x:vr = COPY $v9
+    ; CHECK-NEXT: %y:vr = COPY $v10
+    ; CHECK-NEXT: %vfmadd:vrnov0 = nofpexcept PseudoVFMACC_VV_M1_E32 %passthru, %y, %x, 7, %avl, 5 /* e32 */, 0 /* tu, mu */, implicit $frm
+    %avl:gprnox0 = COPY $x8
+    %passthru:vrnov0 = COPY $v8
+    %x:vr = COPY $v9
+    %y:vr = COPY $v10
+    %vfmadd:vrnov0 = nofpexcept PseudoVFMADD_VV_M1_E32 %x, %y, %passthru, 7, -1, 5 /* e32 */, 3 /* ta, ma */, implicit $frm
+    %vmerge:vrnov0 = PseudoVMV_V_V_M1 %passthru, %vfmadd, %avl, 5 /* e32 */, 0 /* tu, mu */
+...

lukel97 added a commit to lukel97/llvm-project that referenced this pull request Dec 3, 2025
…_v_x

Stacked on llvm#170536

An upcoming patch aims to remove the last use of @llvm.experimental.vp.splat in RISCVCodegenPrepare by replacing it with a vp_merge of a regular splat.

A vp_merge will get lowered to vmerge_vl, and if we combine vmerge_vl of a splat to vmv_v_x we can get the same behaviour as the vp.splat intrinsic.

This adds the two combines needed. It was easier to do the combines on _vl nodes rather than on vp_merge itself, since the types are already legal for _vl nodes.
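(For context, a hedged sketch of the equivalence that replacement relies on, written in LLVM IR with illustrative types; this is not taken from the patch itself. Both forms leave disabled and tail lanes poison, assuming poison is used as the vp.merge false operand:)

    ; a vp.splat...
    %v = call <vscale x 2 x i32> @llvm.experimental.vp.splat.nxv2i32(i32 %x, <vscale x 2 x i1> %m, i32 %evl)
    ; ...expressed as a vp_merge of a regular splat
    %head = insertelement <vscale x 2 x i32> poison, i32 %x, i64 0
    %splat = shufflevector <vscale x 2 x i32> %head, <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
    %v.2 = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> %m, <vscale x 2 x i32> %splat, <vscale x 2 x i32> poison, i32 %evl)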
Contributor

@wangpc-pp wangpc-pp left a comment

LGTM.

@lukel97 lukel97 merged commit 4fcb6e1 into llvm:main Dec 4, 2025
12 checks passed
kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
