[RISCV] Re-model RVV comparison instructions #88868
base: main
Conversation
Created using spr 1.3.6-beta.1
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-risc-v

Author: Pengcheng Wang (wangpc-pp)

Changes: We remove the `@earlyclobber` constraint. Instead, we add `RegisterClass`es which contain only the lowest LMUL1 registers of different LMULs, and use them as the output operand of comparison instructions to match the constraint:

> The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group.

The benefits are listed in the full description at the end of the conversation.
Patch is 1.84 MiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/88868.diff

90 Files Affected:
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp b/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp
index 86e44343b50865..ca77a9729e03b9 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVRegisterBankInfo.cpp
@@ -110,6 +110,8 @@ RISCVRegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
LLT Ty) const {
switch (RC.getID()) {
default:
+ if (RISCVRI::isVRegClass(RC.TSFlags))
+ return getRegBank(RISCV::VRBRegBankID);
llvm_unreachable("Register class not supported");
case RISCV::GPRRegClassID:
case RISCV::GPRF16RegClassID:
@@ -131,20 +133,6 @@ RISCVRegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
case RISCV::FPR64CRegClassID:
case RISCV::FPR32CRegClassID:
return getRegBank(RISCV::FPRBRegBankID);
- case RISCV::VMRegClassID:
- case RISCV::VRRegClassID:
- case RISCV::VRNoV0RegClassID:
- case RISCV::VRM2RegClassID:
- case RISCV::VRM2NoV0RegClassID:
- case RISCV::VRM4RegClassID:
- case RISCV::VRM4NoV0RegClassID:
- case RISCV::VMV0RegClassID:
- case RISCV::VRM2_with_sub_vrm1_0_in_VMV0RegClassID:
- case RISCV::VRM4_with_sub_vrm1_0_in_VMV0RegClassID:
- case RISCV::VRM8RegClassID:
- case RISCV::VRM8NoV0RegClassID:
- case RISCV::VRM8_with_sub_vrm1_0_in_VMV0RegClassID:
- return getRegBank(RISCV::VRBRegBankID);
}
}
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index ad1821d57256bc..686bfd1af0d062 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -143,22 +143,24 @@ class PseudoToVInst<string PseudoInst> {
// This class describes information associated to the LMUL.
class LMULInfo<int lmul, int oct, VReg regclass, VReg wregclass,
- VReg f2regclass, VReg f4regclass, VReg f8regclass, string mx> {
+ VReg f2regclass, VReg f4regclass, VReg f8regclass, string mx,
+ VReg moutregclass = VMM1> {
bits<3> value = lmul; // This is encoded as the vlmul field of vtype.
VReg vrclass = regclass;
VReg wvrclass = wregclass;
VReg f8vrclass = f8regclass;
VReg f4vrclass = f4regclass;
VReg f2vrclass = f2regclass;
+ VReg moutclass = moutregclass;
string MX = mx;
int octuple = oct;
}
// Associate LMUL with tablegen records of register classes.
def V_M1 : LMULInfo<0b000, 8, VR, VRM2, VR, VR, VR, "M1">;
-def V_M2 : LMULInfo<0b001, 16, VRM2, VRM4, VR, VR, VR, "M2">;
-def V_M4 : LMULInfo<0b010, 32, VRM4, VRM8, VRM2, VR, VR, "M4">;
-def V_M8 : LMULInfo<0b011, 64, VRM8,/*NoVReg*/VR, VRM4, VRM2, VR, "M8">;
+def V_M2 : LMULInfo<0b001, 16, VRM2, VRM4, VR, VR, VR, "M2", VMM2>;
+def V_M4 : LMULInfo<0b010, 32, VRM4, VRM8, VRM2, VR, VR, "M4", VMM4>;
+def V_M8 : LMULInfo<0b011, 64, VRM8,/*NoVReg*/VR, VRM4, VRM2, VR, "M8", VMM8>;
def V_MF8 : LMULInfo<0b101, 1, VR, VR,/*NoVReg*/VR,/*NoVReg*/VR,/*NoVReg*/VR, "MF8">;
def V_MF4 : LMULInfo<0b110, 2, VR, VR, VR,/*NoVReg*/VR,/*NoVReg*/VR, "MF4">;
@@ -2668,25 +2670,21 @@ multiclass PseudoVEXT_VF8 {
// With LMUL<=1 the source and dest occupy a single register so any overlap
// is in the lowest-numbered part.
multiclass VPseudoBinaryM_VV<LMULInfo m, int TargetConstraintType = 1> {
- defm _VV : VPseudoBinaryM<VR, m.vrclass, m.vrclass, m,
- !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+ defm _VV : VPseudoBinaryM<m.moutclass, m.vrclass, m.vrclass, m, "", TargetConstraintType>;
}
multiclass VPseudoBinaryM_VX<LMULInfo m, int TargetConstraintType = 1> {
defm "_VX" :
- VPseudoBinaryM<VR, m.vrclass, GPR, m,
- !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+ VPseudoBinaryM<m.moutclass, m.vrclass, GPR, m, "", TargetConstraintType>;
}
multiclass VPseudoBinaryM_VF<LMULInfo m, FPR_Info f, int TargetConstraintType = 1> {
defm "_V" # f.FX :
- VPseudoBinaryM<VR, m.vrclass, f.fprclass, m,
- !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+ VPseudoBinaryM<m.moutclass, m.vrclass, f.fprclass, m, "", TargetConstraintType>;
}
multiclass VPseudoBinaryM_VI<LMULInfo m, int TargetConstraintType = 1> {
- defm _VI : VPseudoBinaryM<VR, m.vrclass, simm5, m,
- !if(!ge(m.octuple, 16), "@earlyclobber $rd", ""), TargetConstraintType>;
+ defm _VI : VPseudoBinaryM<m.moutclass, m.vrclass, simm5, m, "", TargetConstraintType>;
}
multiclass VPseudoVGTR_VV_VX_VI<Operand ImmType = simm5, string Constraint = ""> {
diff --git a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
index 316daf2763ca1e..1a0533c7072705 100644
--- a/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVRegisterInfo.td
@@ -533,6 +533,12 @@ def VR : VReg<!listconcat(VM1VTs, VMaskVTs),
(add (sequence "V%u", 8, 31),
(sequence "V%u", 7, 0)), 1>;
+// V0 is likely to be used as mask, so we move it in front of allocation order.
+def VMM1 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31)), 1>;
+def VMM2 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 2)), 1>;
+def VMM4 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 4)), 1>;
+def VMM8 : VReg<VMaskVTs, (add (sequence "V%u", 0, 31, 8)), 1>;
+
def VRNoV0 : VReg<!listconcat(VM1VTs, VMaskVTs), (sub VR, V0), 1>;
def VRM2 : VReg<VM2VTs, (add (sequence "V%uM2", 8, 31, 2),
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir b/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir
index df0d48aac92551..0677232fa60677 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/instruction-select/rvv/icmp.mir
@@ -13,13 +13,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv1i8
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vr = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV32I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_MF8_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv1i8
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vr = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV64I-NEXT: [[PseudoVMSLTU_VV_MF8_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_MF8 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_MF8_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 1 x s8>) = G_IMPLICIT_DEF
@@ -37,13 +37,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv2i8
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV32I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_MF4_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv2i8
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV64I-NEXT: [[PseudoVMSLT_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLT_VV_MF4 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_MF4_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 2 x s8>) = G_IMPLICIT_DEF
@@ -61,13 +61,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv4i8
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV32I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_MF2_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv4i8
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV64I-NEXT: [[PseudoVMSLEU_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLEU_VV_MF2 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_MF2_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 4 x s8>) = G_IMPLICIT_DEF
@@ -85,13 +85,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv8i8
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vr = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV32I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_M1_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv8i8
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vr = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV64I-NEXT: [[PseudoVMSLE_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_M1 [[DEF]], [[DEF]], -1, 3 /* e8 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_M1_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 8 x s8>) = G_IMPLICIT_DEF
@@ -109,14 +109,14 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv16i8
; RV32I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
- ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
- ; RV32I-NEXT: $v8 = COPY %1
+ ; RV32I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv16i8
; RV64I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
- ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
- ; RV64I-NEXT: $v8 = COPY %1
+ ; RV64I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 16 x s8>) = G_IMPLICIT_DEF
%1:vrb(<vscale x 16 x s1>) = G_ICMP intpred(ugt), %0(<vscale x 16 x s8>), %0
@@ -133,14 +133,14 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv32i8
; RV32I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
- ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
- ; RV32I-NEXT: $v8 = COPY %1
+ ; RV32I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv32i8
; RV64I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
- ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
- ; RV64I-NEXT: $v8 = COPY %1
+ ; RV64I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 32 x s8>) = G_IMPLICIT_DEF
%1:vrb(<vscale x 32 x s1>) = G_ICMP intpred(sgt), %0(<vscale x 32 x s8>), %0
@@ -157,14 +157,14 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv64i8
; RV32I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
- ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
- ; RV32I-NEXT: $v8 = COPY %1
+ ; RV32I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv64i8
; RV64I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
- ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
- ; RV64I-NEXT: $v8 = COPY %1
+ ; RV64I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 3 /* e8 */
+ ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 64 x s8>) = G_IMPLICIT_DEF
%1:vrb(<vscale x 64 x s1>) = G_ICMP intpred(ule), %0(<vscale x 64 x s8>), %0
@@ -181,13 +181,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv1i16
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV32I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF4_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv1i16
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV64I-NEXT: [[PseudoVMSLE_VV_MF4_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF4 [[DEF]], [[DEF]], -1, 4 /* e16 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF4_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 1 x s16>) = G_IMPLICIT_DEF
@@ -205,13 +205,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv2i16
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV32I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSNE_VV_MF2_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv2i16
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV64I-NEXT: [[PseudoVMSNE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSNE_VV_MF2 [[DEF]], [[DEF]], -1, 4 /* e16 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSNE_VV_MF2_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 2 x s16>) = G_IMPLICIT_DEF
@@ -229,13 +229,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv4i16
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vr = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV32I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSEQ_VV_M1_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv4i16
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vr = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV64I-NEXT: [[PseudoVMSEQ_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSEQ_VV_M1 [[DEF]], [[DEF]], -1, 4 /* e16 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSEQ_VV_M1_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 4 x s16>) = G_IMPLICIT_DEF
@@ -253,14 +253,14 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv8i16
; RV32I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
- ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
- ; RV32I-NEXT: $v8 = COPY %1
+ ; RV32I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv8i16
; RV64I: [[DEF:%[0-9]+]]:vrm2 = IMPLICIT_DEF
- ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
- ; RV64I-NEXT: $v8 = COPY %1
+ ; RV64I-NEXT: [[PseudoVMSLTU_VV_M2_:%[0-9]+]]:vmm2 = PseudoVMSLTU_VV_M2 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M2_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 8 x s16>) = G_IMPLICIT_DEF
%1:vrb(<vscale x 8 x s1>) = G_ICMP intpred(ult), %0(<vscale x 8 x s16>), %0
@@ -277,14 +277,14 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv16i16
; RV32I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
- ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
- ; RV32I-NEXT: $v8 = COPY %1
+ ; RV32I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv16i16
; RV64I: [[DEF:%[0-9]+]]:vrm4 = IMPLICIT_DEF
- ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
- ; RV64I-NEXT: $v8 = COPY %1
+ ; RV64I-NEXT: [[PseudoVMSLT_VV_M4_:%[0-9]+]]:vmm4 = PseudoVMSLT_VV_M4 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLT_VV_M4_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 16 x s16>) = G_IMPLICIT_DEF
%1:vrb(<vscale x 16 x s1>) = G_ICMP intpred(slt), %0(<vscale x 16 x s16>), %0
@@ -301,14 +301,14 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv32i16
; RV32I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
- ; RV32I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
- ; RV32I-NEXT: $v8 = COPY %1
+ ; RV32I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV32I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv32i16
; RV64I: [[DEF:%[0-9]+]]:vrm8 = IMPLICIT_DEF
- ; RV64I-NEXT: early-clobber %1:vr = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
- ; RV64I-NEXT: $v8 = COPY %1
+ ; RV64I-NEXT: [[PseudoVMSLEU_VV_M8_:%[0-9]+]]:vmm8 = PseudoVMSLEU_VV_M8 [[DEF]], [[DEF]], -1, 4 /* e16 */
+ ; RV64I-NEXT: $v8 = COPY [[PseudoVMSLEU_VV_M8_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 32 x s16>) = G_IMPLICIT_DEF
%1:vrb(<vscale x 32 x s1>) = G_ICMP intpred(uge), %0(<vscale x 32 x s16>), %0
@@ -325,13 +325,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv1i32
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
+ ; RV32I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF2_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv1i32
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vr = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
+ ; RV64I-NEXT: [[PseudoVMSLE_VV_MF2_:%[0-9]+]]:vmm1 = PseudoVMSLE_VV_MF2 [[DEF]], [[DEF]], -1, 5 /* e32 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSLE_VV_MF2_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 1 x s32>) = G_IMPLICIT_DEF
@@ -349,13 +349,13 @@ body: |
bb.0.entry:
; RV32I-LABEL: name: icmp_nxv2i32
; RV32I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV32I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vr = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
+ ; RV32I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
; RV32I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M1_]]
; RV32I-NEXT: PseudoRET implicit $v8
;
; RV64I-LABEL: name: icmp_nxv2i32
; RV64I: [[DEF:%[0-9]+]]:vr = IMPLICIT_DEF
- ; RV64I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vr = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
+ ; RV64I-NEXT: [[PseudoVMSLTU_VV_M1_:%[0-9]+]]:vmm1 = PseudoVMSLTU_VV_M1 [[DEF]], [[DEF]], -1, 5 /* e32 */
; RV64I-NEXT: $v8 = COPY [[PseudoVMSLTU_VV_M1_]]
; RV64I-NEXT: PseudoRET implicit $v8
%0:vrb(<vscale x 2 x s32>) = G_IMPLICIT_DEF
@@ -373,14 +373,14 @@ body: |
bb.0.entry:
...
[truncated]
; CHECK-NEXT: vmerge.vvm v24, v8, v16, v0
; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vl1r.v v0, (a0) # Unknown-size Folded Reload
An example of regression here.
; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
; CHECK-NEXT: fsrmi a0, 2
; CHECK-NEXT: vmv1r.v v0, v12
; CHECK-NEXT: vfcvt.x.f.v v16, v8, v0.t
; CHECK-NEXT: vfcvt.x.f.v v12, v8, v0.t
An example of improvement here.
What other instructions need this?
I think a long time ago I wondered if we should have a version of the compare pseudos that has the dest tied to v0 to avoid the early clobber, and let the three-address instruction conversion break it if needed. Similar to the _TIED instructions we have for VWADD.WV.
Thanks! I didn't know we could avoid the early clobber this way; I will give it a try!
I don't know if I understand correctly, but I think the TIED-pseudo approach doesn't work for compare instructions. :-(
@@ -88,11 +88,11 @@ define <vscale x 16 x i1> @nxv16i1(i1 %x, i1 %y) {
; CHECK-NEXT: andi a0, a0, 1
; CHECK-NEXT: vsetvli a2, zero, e8, m2, ta, ma
; CHECK-NEXT: vmv.v.x v8, a0
; CHECK-NEXT: vmsne.vi v10, v8, 0
; CHECK-NEXT: vmsne.vi v0, v8, 0
For LMUL 8 the dest reg can only be v0,v8,v16,v24 now right? I think we would want to check that the register coalescer isn't propagating the VMM8 reg class to other uses. Or at least make sure that it gets inflated back to VR?
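For reference, the allocation candidates implied by the `sequence` strides in the new VMM classes can be enumerated with a quick sketch (plain Python for illustration, not LLVM code; the register names are illustrative):

```python
# Enumerate the hypothesized allocation candidates for each new VMM class,
# mirroring the `(add (sequence "V%u", 0, 31, stride))` definitions in the diff.
def vmm_regs(stride):
    """Registers v0..v31 taken with the given stride."""
    return [f"v{n}" for n in range(0, 32, stride)]

vmm1 = vmm_regs(1)  # all 32 v-registers
vmm2 = vmm_regs(2)  # even-numbered registers only (16 candidates)
vmm4 = vmm_regs(4)  # multiples of 4 (8 candidates)
vmm8 = vmm_regs(8)  # v0, v8, v16, v24 -- the four candidates noted above

print(vmm8)
```

So at LMUL=8 a compare result indeed has only four possible homes, which is why coalescer propagation of the VMM8 class to other uses would be costly.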
> Or at least make sure that it gets inflated back to VR?

Any thoughts about this?
I was thinking that `MachineRegisterInfo::recomputeRegClass` might be able to change a VMM8 register to VR. But after reading the function, I don't think it will if the compare instruction is still around, since that will still have the constraint.
So I presume we'll end up with spilling if we have more than four LMUL8 compare results live at the same time? It would be good to know how often that happens in practice.
It's not clear to me that this change is a net positive. Using m2 as an example: with the early-clobber modeling, the register allocator can pick any of 30 registers (for the vx forms); with your proposed form, we only get 16 possible registers. This is a net increase in register constraint and seems undesirable. The ability to reuse a source register is only useful if that source has no other use; for cases where the operand has other uses, this patch would be a regression. I think Craig's suggestion of a tied variant which is untied if needed might be reasonable. We could also consider biting the bullet and implementing the actual overlap constraint rules, though that would need some pretty major evidence of benefit to be worth the work.
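The register-count comparison above can be made concrete with a back-of-envelope sketch (an illustration of the argument, not how the register allocator actually counts):

```python
TOTAL_VREGS = 32

# Old model: an M2 compare writes a single-register VR destination with
# @earlyclobber, so any v-register outside the 2-register source group is a
# candidate (for the vx forms, which have one vector source group).
old_m2_vx_candidates = TOTAL_VREGS - 2             # 30

# New model: the destination must come from VMM2 (even-numbered registers);
# overlap with the lowest part of the source is allowed, so nothing is
# subtracted -- but only half the register file is eligible.
new_m2_candidates = len(range(0, TOTAL_VREGS, 2))  # 16

print(old_m2_vx_candidates, new_m2_candidates)
```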
We remove the `@earlyclobber` constraint. Instead, we add `RegisterClass`es which contain only the lowest LMUL1 registers of different LMULs. We use them as the output operand of comparison instructions to match the constraint:

> The destination EEW is smaller than the source EEW and the overlap is in the lowest-numbered part of the source register group.

The benefits:
- … (but there are regressions in LMUL8 tests). Of course, this conclusion is from unit tests; we should evaluate it on some benchmarks.
- … (the earlyclobber mechanism should be the right way).
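The quoted overlap rule can be sketched as a predicate (a hedged illustration of the spec wording, not compiler code): a destination narrower than the source may overlap the source register group only in its lowest-numbered register.

```python
def overlap_legal(dest, src_base, src_lmul):
    """Can a 1-register (mask) destination `dest` coexist with a source
    register group [src_base, src_base + src_lmul)?  Overlap is allowed
    only in the lowest-numbered part of the source group."""
    in_group = src_base <= dest < src_base + src_lmul
    return (not in_group) or dest == src_base

# e.g. a vector compare with an LMUL=4 source group v8..v11:
print(overlap_legal(8, 8, 4))   # dest v8 overlaps the lowest part: legal
print(overlap_legal(10, 8, 4))  # dest v10 overlaps mid-group: illegal
print(overlap_legal(12, 8, 4))  # no overlap at all: legal
```

This is exactly the case the new VMM classes carve out: by restricting the destination to group-aligned registers, any overlap that remains is automatically the lowest-numbered part.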