[RISCV] Re-model RVV comparison instructions #88868

Open
wants to merge 2 commits into main

Conversation

wangpc-pp
Contributor

We remove the @earlyclobber constraint.

Instead, we add RegisterClasses which contain only the lowest-numbered
LMUL1 register of each register group for the different LMULs. We use
them as the output operand of comparison instructions to match the
constraint:

> The destination EEW is smaller than the source EEW and the overlap
> is in the lowest-numbered part of the source register group.

The benefits:

  • From the test diffs, we can see improvements (and there are some
    regressions in LMUL8 tests). Of course, this conclusion is from
    unit tests; we should evaluate it on some benchmarks.
  • This change makes [RISCV] Don't use V0 directly in patterns #88496 possible (though I think a fine-grained
    earlyclobber mechanism should be the right way).
  • If we agree, we can model other instructions in this way.

Created using spr 1.3.6-beta.1
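
For quick reference, the following condensed TableGen sketch is extracted and
trimmed from the RISCVRegisterInfo.td and RISCVInstrInfoVPseudos.td hunks in
the diff below; it shows the new mask-output register classes and how they are
wired into LMULInfo and the comparison pseudos (nothing here is beyond what the
patch itself contains):

// New register classes holding only the lowest-numbered LMUL1 register of
// each possible register group; V0 is placed first in the allocation order
// because it is likely to be used as a mask.
def VMM1 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31)), 1>;
def VMM2 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 2)), 1>;
def VMM4 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 4)), 1>;
def VMM8 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 8)), 1>;

// LMULInfo gains a mask-output class, defaulting to VMM1 and set to
// VMM2/VMM4/VMM8 for V_M2/V_M4/V_M8.
class LMULInfo<int lmul, int oct, VReg regclass, VReg wregclass,
               VReg f2regclass, VReg f4regclass, VReg f8regclass, string mx,
               VReg moutregclass = VMM1> {
  // ... existing fields, plus:
  VReg moutclass = moutregclass;
}

// The comparison pseudos then use it instead of VR plus @earlyclobber:
multiclass VPseudoBinaryM_VV<LMULInfo m, int TargetConstraintType = 1> {
  defm _VV : VPseudoBinaryM<m.moutclass, m.vrclass, m.vrclass, m, "",
                            TargetConstraintType>;
}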
@llvmbot
Collaborator

llvmbot commented Apr 16, 2024

@llvm/pr-subscribers-llvm-globalisel

@llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes

We remove the @earlyclobber constraint.

Instead, we add RegisterClasses which contain only the lowest-numbered
LMUL1 register of each register group for the different LMULs. We use
them as the output operand of comparison instructions to match the
constraint:

> The destination EEW is smaller than the source EEW and the overlap
> is in the lowest-numbered part of the source register group.

The benefits:

  • From the test diffs, we can see improvements (and there are some
    regressions in LMUL8 tests). Of course, this conclusion is from
    unit tests; we should evaluate it on some benchmarks.
  • This change makes #88496 possible (though I think a fine-grained
    earlyclobber mechanism should be the right way).
  • If we agree, we can model other instructions in this way.

Patch is 1.84 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88868.diff

90 Files Affected:

  • (modified) llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp (+2-14)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td (+10-12)
  • (modified) llvm/lib/Target/RISCV/RISCVRegisterInfo.td (+6)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir (+68-68)
  • (modified) llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/binop-splats.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/bitreverse-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/bswap-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll (+52-74)
  • (modified) llvm/test/CodeGen/RISCV/rvv/extractelt-i1.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-binop-splats.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitreverse-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-ceil-vp.ll (+46-67)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-i1.ll (+48-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-floor-vp.ll (+46-67)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fmaximum-vp.ll (+113-108)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fmaximum.ll (+42-32)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fminimum-vp.ll (+113-108)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fminimum.ll (+42-32)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-setcc.ll (+138-138)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fptosi-vp-mask.ll (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fptoui-vp-mask.ll (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-setcc.ll (+48-48)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-interleave-store.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-splat.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-load-fp.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-load-int.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-store-fp.ll (+32-120)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-store-int.ll (+39-103)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-nearbyint-vp.ll (+57-62)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll (+336-160)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int-vp.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-rint-vp.ll (+36-53)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-round-vp.ll (+46-67)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundeven-vp.ll (+46-67)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-roundtozero-vp.ll (+46-67)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-fp-vp.ll (+177-235)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-setcc-int-vp.ll (+164-295)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-trunc-vp.ll (+25-25)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfclass-vp.ll (+14-21)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfcmp-constrained-sdnode.ll (+1008-1260)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vfcmps-constrained-sdnode.ll (+486-486)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vpmerge.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vselect-vp.ll (+17-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/floor-vp.ll (+52-74)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-sdnode.ll (+129-69)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fmaximum-vp.ll (+245-336)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-sdnode.ll (+129-69)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fminimum-vp.ll (+245-336)
  • (modified) llvm/test/CodeGen/RISCV/rvv/mscatter-sdnode.ll (+34-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/nearbyint-vp.ll (+114-170)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rint-vp.ll (+78-124)
  • (modified) llvm/test/CodeGen/RISCV/rvv/round-vp.ll (+78-124)
  • (modified) llvm/test/CodeGen/RISCV/rvv/roundeven-vp.ll (+78-124)
  • (modified) llvm/test/CodeGen/RISCV/rvv/roundtozero-vp.ll (+78-124)
  • (modified) llvm/test/CodeGen/RISCV/rvv/select-int.ll (+18-18)
  • (modified) llvm/test/CodeGen/RISCV/rvv/setcc-fp-vp.ll (+347-371)
  • (modified) llvm/test/CodeGen/RISCV/rvv/setcc-fp.ll (+296-296)
  • (modified) llvm/test/CodeGen/RISCV/rvv/setcc-int-vp.ll (+179-291)
  • (modified) llvm/test/CodeGen/RISCV/rvv/setcc-integer.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/sshl_sat_vec.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-deinterleave-load.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave-store.ll (+13-22)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vector-interleave.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfcmp-constrained-sdnode.ll (+1452-1830)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfcmps-constrained-sdnode.ll (+729-729)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptoi-sdnode.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptosi-vp-mask.ll (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vfptoui-vp-mask.ll (+2-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmfeq.ll (+39-39)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmfge.ll (+39-39)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmfgt.ll (+39-39)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmfle.ll (+39-39)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmflt.ll (+39-39)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmfne.ll (+39-39)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmseq.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsge.ll (+118-122)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsgeu.ll (+118-122)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsgt.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsgtu.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsle.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsleu.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmslt.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsltu.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vmsne.ll (+82-82)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vp-reverse-mask.ll (+3-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vp-splice-mask-vectors.ll (+3-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vselect-fp.ll (+12-20)
  • (modified) llvm/test/CodeGen/RISCV/rvv/vtrunc-vp-mask.ll (+2-3)
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp b/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp
index 86e44343b50865..ca77a9729e03b9 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp
@@ -110,6 +110,8 @@ RISCVRegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
                                               LLT Ty) const {
   switch (RC.getID()) {
   default:
+    if (RISCVRI::isVRegClass(RC.TSFlags))
+      return getRegBank(RISCV::VRBRegBankID);
     llvm_unreachable("Register class not supported");
   case RISCV::GPRRegClassID:
   case RISCV::GPRF16RegClassID:
@@ -131,20 +133,6 @@ RISCVRegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
   case RISCV::FPR64CRegClassID:
   case RISCV::FPR32CRegClassID:
     return getRegBank(RISCV::FPRBRegBankID);
-  case RISCV::VMRegClassID:
-  case RISCV::VRRegClassID:
-  case RISCV::VRNoV0RegClassID:
-  case RISCV::VRM2RegClassID:
-  case RISCV::VRM2NoV0RegClassID:
-  case RISCV::VRM4RegClassID:
-  case RISCV::VRM4NoV0RegClassID:
-  case RISCV::VMV0RegClassID:
-  case RISCV::VRM2_with_sub_vrm1_0_in_VMV0RegClassID:
-  case RISCV::VRM4_with_sub_vrm1_0_in_VMV0RegClassID:
-  case RISCV::VRM8RegClassID:
-  case RISCV::VRM8NoV0RegClassID:
-  case RISCV::VRM8_with_sub_vrm1_0_in_VMV0RegClassID:
-    return getRegBank(RISCV::VRBRegBankID);
   }
 }
 
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index ad1821d57256bc..686bfd1af0d062 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -143,22 +143,24 @@ class PseudoToVInst<string PseudoInst> {
 
 // This class describes information associated to the LMUL.
 class LMULInfo<int lmul, int oct, VReg regclass, VReg wregclass,
-               VReg f2regclass, VReg f4regclass, VReg f8regclass, string mx> {
+               VReg f2regclass, VReg f4regclass, VReg f8regclass, string mx,
+               VReg moutregclass = VMM1> {
   bits<3> value = lmul; // This is encoded as the vlmul field of vtype.
   VReg vrclass = regclass;
   VReg wvrclass = wregclass;
   VReg f8vrclass = f8regclass;
   VReg f4vrclass = f4regclass;
   VReg f2vrclass = f2regclass;
+  VReg moutclass = moutregclass;
   string MX = mx;
   int octuple = oct;
 }
 
 // Associate LMUL with tablegen records of register classes.
 def V_M1  : LMULInfo<0b000,  8,   VR,        VRM2,   VR,   VR, VR, "M1">;
-def V_M2  : LMULInfo<0b001, 16, VRM2,        VRM4,   VR,   VR, VR, "M2">;
-def V_M4  : LMULInfo<0b010, 32, VRM4,        VRM8, VRM2,   VR, VR, "M4">;
-def V_M8  : LMULInfo<0b011, 64, VRM8,/*NoVReg*/VR, VRM4, VRM2, VR, "M8">;
+def V_M2  : LMULInfo<0b001, 16, VRM2,        VRM4,   VR,   VR, VR, "M2", VMM2>;
+def V_M4  : LMULInfo<0b010, 32, VRM4,        VRM8, VRM2,   VR, VR, "M4", VMM4>;
+def V_M8  : LMULInfo<0b011, 64, VRM8,/*NoVReg*/VR, VRM4, VRM2, VR, "M8", VMM8>;
 
 def V_MF8 : LMULInfo<0b101, 1, VR, VR,/*NoVReg*/VR,/*NoVReg*/VR,/*NoVReg*/VR, "MF8">;
 def V_MF4 : LMULInfo<0b110, 2, VR, VR,          VR,/*NoVReg*/VR,/*NoVReg*/VR, "MF4">;
@@ -2668,25 +2670,21 @@ multiclass PseudoVEXT_VF8 {
 // With LMUL<=1 the source and dest occupy a single register so any overlap
 // is in the lowest-numbered part.
 multiclass VPseudoBinaryM_VV<LMULInfo m, int TargetConstraintType = 1> {
-  defm _VV : VPseudoBinaryM<VR, m.vrclass, m.vrclass, m,
-                            !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+  defm _VV : VPseudoBinaryM<m.moutclass, m.vrclass, m.vrclass, m, "", TargetConstraintType>;
 }
 
 multiclass VPseudoBinaryM_VX<LMULInfo m, int TargetConstraintType = 1> {
   defm "_VX" :
-    VPseudoBinaryM<VR, m.vrclass, GPR, m,
-                   !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+    VPseudoBinaryM<m.moutclass, m.vrclass, GPR, m, "", TargetConstraintType>;
 }
 
 multiclass VPseudoBinaryM_VF<LMULInfo m, FPR_Info f, int TargetConstraintType = 1> {
   defm "_V" # f.FX :
-    VPseudoBinaryM<VR, m.vrclass, f.fprclass, m,
-                   !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+    VPseudoBinaryM<m.moutclass, m.vrclass, f.fprclass, m, "", TargetConstraintType>;
 }
 
 multiclass VPseudoBinaryM_VI<LMULInfo m, int TargetConstraintType = 1> {
-  defm _VI : VPseudoBinaryM<VR, m.vrclass, simm5, m,
-                            !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+  defm _VI : VPseudoBinaryM<m.moutclass, m.vrclass, simm5, m, "", TargetConstraintType>;
 }
 
 multiclass VPseudoVGTR_VV_VX_VI<Operand ImmType = simm5, string Constraint = ""> {
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index 316daf2763ca1e..1a0533c7072705 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -533,6 +533,12 @@ def VR : VReg<!listconcat(VM1VTs, VMaskVTs),
               (add (sequence "V%u", 8, 31),
                    (sequence "V%u", 7, 0)), 1>;
 
+// V0 is likely to be used as mask, so we move it in front of allocation order.
+def VMM1 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31)), 1>;
+def VMM2 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 2)), 1>;
+def VMM4 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 4)), 1>;
+def VMM8 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 8)), 1>;
+
 def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;
 
 def VRM2 : VReg<VM2VTs, (add (sequence "V%uM2", 8, 31, 2),
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir b/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir
index df0d48aac92551..0677232fa60677 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir
@@ -13,13 +13,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv1i8
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vr = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV32I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_MF8_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv1i8
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vr = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV64I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_MF8_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 1 x s8>) = G_IMPLICIT_DEF
@@ -37,13 +37,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv2i8
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV32I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_MF4_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv2i8
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV64I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_MF4_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 2 x s8>) = G_IMPLICIT_DEF
@@ -61,13 +61,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv4i8
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV32I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_MF2_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv4i8
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV64I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_MF2_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 4 x s8>) = G_IMPLICIT_DEF
@@ -85,13 +85,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv8i8
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vr = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV32I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_M1_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv8i8
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vr = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV64I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_M1_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 8 x s8>) = G_IMPLICIT_DEF
@@ -109,14 +109,14 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv16i8
     ; RV32I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
-    ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
-    ; RV32I-NEXT: $v8 = COPY %1
+    ; RV32I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv16i8
     ; RV64I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
-    ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
-    ; RV64I-NEXT: $v8 = COPY %1
+    ; RV64I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 16 x s8>) = G_IMPLICIT_DEF
     %1:vrb(<vscale x 16 x s1>) = G_ICMP intpred(ugt), %0(<vscale x 16 x s8>), %0
@@ -133,14 +133,14 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv32i8
     ; RV32I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
-    ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
-    ; RV32I-NEXT: $v8 = COPY %1
+    ; RV32I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv32i8
     ; RV64I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
-    ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
-    ; RV64I-NEXT: $v8 = COPY %1
+    ; RV64I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 32 x s8>) = G_IMPLICIT_DEF
     %1:vrb(<vscale x 32 x s1>) = G_ICMP intpred(sgt), %0(<vscale x 32 x s8>), %0
@@ -157,14 +157,14 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv64i8
     ; RV32I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
-    ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
-    ; RV32I-NEXT: $v8 = COPY %1
+    ; RV32I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv64i8
     ; RV64I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
-    ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
-    ; RV64I-NEXT: $v8 = COPY %1
+    ; RV64I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+    ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 64 x s8>) = G_IMPLICIT_DEF
     %1:vrb(<vscale x 64 x s1>) = G_ICMP intpred(ule), %0(<vscale x 64 x s8>), %0
@@ -181,13 +181,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv1i16
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV32I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF4_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv1i16
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV64I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF4_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 1 x s16>) = G_IMPLICIT_DEF
@@ -205,13 +205,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv2i16
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV32I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSNE_VV_MF2_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv2i16
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV64I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSNE_VV_MF2_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 2 x s16>) = G_IMPLICIT_DEF
@@ -229,13 +229,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv4i16
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vr = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV32I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSEQ_VV_M1_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv4i16
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vr = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV64I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSEQ_VV_M1_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 4 x s16>) = G_IMPLICIT_DEF
@@ -253,14 +253,14 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv8i16
     ; RV32I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
-    ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
-    ; RV32I-NEXT: $v8 = COPY %1
+    ; RV32I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv8i16
     ; RV64I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
-    ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
-    ; RV64I-NEXT: $v8 = COPY %1
+    ; RV64I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 8 x s16>) = G_IMPLICIT_DEF
     %1:vrb(<vscale x 8 x s1>) = G_ICMP intpred(ult), %0(<vscale x 8 x s16>), %0
@@ -277,14 +277,14 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv16i16
     ; RV32I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
-    ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
-    ; RV32I-NEXT: $v8 = COPY %1
+    ; RV32I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv16i16
     ; RV64I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
-    ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
-    ; RV64I-NEXT: $v8 = COPY %1
+    ; RV64I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 16 x s16>) = G_IMPLICIT_DEF
     %1:vrb(<vscale x 16 x s1>) = G_ICMP intpred(slt), %0(<vscale x 16 x s16>), %0
@@ -301,14 +301,14 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv32i16
     ; RV32I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
-    ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
-    ; RV32I-NEXT: $v8 = COPY %1
+    ; RV32I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv32i16
     ; RV64I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
-    ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
-    ; RV64I-NEXT: $v8 = COPY %1
+    ; RV64I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
+    ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 32 x s16>) = G_IMPLICIT_DEF
     %1:vrb(<vscale x 32 x s1>) = G_ICMP intpred(uge), %0(<vscale x 32 x s16>), %0
@@ -325,13 +325,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv1i32
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
+    ; RV32I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF2_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv1i32
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
+    ; RV64I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF2_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 1 x s32>) = G_IMPLICIT_DEF
@@ -349,13 +349,13 @@ body:             |
   bb.0.entry:
     ; RV32I-LABEL: name: icmp_nxv2i32
     ; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV32I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vr = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
+    ; RV32I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
     ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M1_]]
     ; RV32I-NEXT: PseudoRET implicit $v8
     ;
     ; RV64I-LABEL: name: icmp_nxv2i32
     ; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
-    ; RV64I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vr = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
+    ; RV64I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
     ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M1_]]
     ; RV64I-NEXT: PseudoRET implicit $v8
     %0:vrb(<vscale x 2 x s32>) = G_IMPLICIT_DEF
@@ -373,14 +373,14 @@ body:             |
   bb.0.entry:
   ...
[truncated]

; CHECK-NEXT: vmerge.vvm v24, v8, v16, v0
; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vl1r.v v0, (a0) # Unknown-size Folded Reload
Contributor Author

An example of a regression here.

; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
; CHECK-NEXT: fsrmi a0, 2
; CHECK-NEXT: vmv1r.v v0, v12
; CHECK-NEXT: vfcvt.x.f.v v16, v8, v0.t
; CHECK-NEXT: vfcvt.x.f.v v12, v8, v0.t
Contributor Author

An example of an improvement here.

Created using spr 1.3.6-beta.1
@topperc
Collaborator

topperc commented Apr 16, 2024

If we agree, we can model other instructions in this way.

What other instructions need this?

@topperc
Collaborator

topperc commented Apr 16, 2024

I think, a long time ago, I wondered if we should have a version of the compare pseudos that had the dest tied to v0 to avoid the early clobber, and let the 3-address instruction conversion break it if needed. Similar to the _TIED instructions we have for VWADD.WV.

@wangpc-pp
Contributor Author

I think, a long time ago, I wondered if we should have a version of the compare pseudos that had the dest tied to v0 to avoid the early clobber, and let the 3-address instruction conversion break it if needed. Similar to the _TIED instructions we have for VWADD.WV.

Thanks! I didn't know we could avoid the early clobber this way; I will give it a try!

@wangpc-pp
Contributor Author

I think, a long time ago, I wondered if we should have a version of the compare pseudos that had the dest tied to v0 to avoid the early clobber, and let the 3-address instruction conversion break it if needed. Similar to the _TIED instructions we have for VWADD.WV.

Thanks! I didn't know we could avoid the early clobber this way; I will give it a try!

I don't know if I understand correctly, but I think this TIED-pseudo approach doesn't work for compare instructions. :-(

wangpc-pp added a commit that referenced this pull request Apr 17, 2024
Created using spr 1.3.6-beta.1
@@ -88,11 +88,11 @@ define <vscale x 16 x i1> @nxv16i1(i1 %x, i1 %y) {
; CHECK-NEXT: andi a0, a0, 1
; CHECK-NEXT: vsetvli a2, zero, e8, m2, ta, ma
; CHECK-NEXT: vmv.v.x v8, a0
; CHECK-NEXT: vmsne.vi v10, v8, 0
; CHECK-NEXT: vmsne.vi v0, v8, 0
Contributor

For LMUL 8 the dest reg can only be v0, v8, v16, or v24 now, right? I think we would want to check that the register coalescer isn't propagating the VMM8 reg class to other uses. Or at least make sure that it gets inflated back to VR?

Contributor Author

Or at least make sure that it gets inflated back to VR?

Any thoughts about this?

Contributor

I was thinking that MachineRegisterInfo::recomputeRegClass might be able to change a VMM8 register to VR. But after reading the function, I don't think it will if the compare instruction is still around since that will still have the constraint.

So I presume we'll end up with spilling if we have more than four LMUL8 compare results live at the same time? It would be good to know how often that happens in practice.

@preames
Collaborator

preames commented Apr 22, 2024

It's not clear to me that this change is a net positive.

Using m2 as an example, with the early-clobber modeling the register allocator can pick any of 30 registers (for the vx forms). With your proposed form, we only get 16 possible registers. This is a net increase in register constraints and seems undesirable.

The ability to reuse a source register is only useful if that source has no other use; for cases where the operand has other uses, this patch would be a regression.

I think Craig's suggestion of a tied variant which is untied if needed might be reasonable. We could also consider biting the bullet and implementing the actual overlap constraint rules, though that would need some pretty major evidence of benefit to be worth the work.
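
To make the register counts discussed above explicit, the candidates implied by
the new classes follow directly from the sequence strides in the diff; the
30-of-32 figure for the old modeling is the one quoted in the comment above:

// VMM1: V0, V1, ..., V31   (32 candidates, LMUL <= 1)
// VMM2: V0, V2, ..., V30   (16 candidates, LMUL = 2)
// VMM4: V0, V4, ..., V28   ( 8 candidates, LMUL = 4)
// VMM8: V0, V8, V16, V24   ( 4 candidates, LMUL = 8)
// Old @earlyclobber modeling (vx/vi forms at LMUL=2): any VR not overlapping
// the single source group, i.e. 30 of the 32 registers.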
