[RISCV] Remove custom instruction selection for VFCVT_RM and friends #72540
Conversation
We already have the pseudos for lowering these as MI nodes with rounding-mode operands, and the generic FRM insertion pass. Doing the insertion later in the backend allows SSA-level passes to avoid reasoning about physical register copies, and happens to produce better code in practice. The latter is mostly an accident of our insertion order; we happen to place the frm write after the vsetvli, and it's very common for a register to be killed at the vsetvli. The end result is that we get slightly better scalar register allocation.

I'm a bit unclear on the history here. I was surprised to find this code in ISel lowering at all, but was also surprised, once I found it, that all the patterns and pseudos already seem to exist. My best guess is that maybe we didn't do all the possible cleanup after introducing the HasRoundModeOp mechanism?
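As background for the test diffs below, the `fsrmi` immediates are RISC-V frm (floating-point rounding mode) encodings: the ceil tests use `fsrmi a0, 3` (RUP, round toward +inf) and the ctlz tests use `fsrmi a0, 1` (RTZ, round toward zero). A minimal Python model of the static modes (the tie-breaking modes RNE/RMM are omitted for brevity):

```python
import math

# RISC-V frm encodings, per the F extension.
# fsrmi rd, imm writes `imm` to frm and saves the old frm value in rd.
FRM = {
    0: ("RNE", None),        # round to nearest, ties to even
    1: ("RTZ", math.trunc),  # round toward zero
    2: ("RDN", math.floor),  # round down (toward -inf)
    3: ("RUP", math.ceil),   # round up (toward +inf)
    4: ("RMM", None),        # round to nearest, ties to max magnitude
    7: ("DYN", None),        # instruction defers to the dynamic frm value
}

def convert(x: float, frm: int) -> int:
    """Model a float-to-int conversion under a given static rounding mode."""
    name, fn = FRM[frm]
    if fn is None:
        raise NotImplementedError(f"{name} is not modeled in this sketch")
    return fn(x)

print(convert(2.5, 3))  # RUP -> 3
print(convert(2.5, 1))  # RTZ -> 2
```

This is only an illustrative sketch of the rounding semantics, not anything the backend executes.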
@llvm/pr-subscribers-backend-risc-v Author: Philip Reames (preames) Patch is 173.52 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/72540.diff 21 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index a4cd8327f45f82a..84726e61f320ae0 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -16239,47 +16239,6 @@ static MachineBasicBlock *emitSelectPseudo(MachineInstr &MI,
return TailMBB;
}
-static MachineBasicBlock *emitVFCVT_RM(MachineInstr &MI, MachineBasicBlock *BB,
- unsigned Opcode) {
- DebugLoc DL = MI.getDebugLoc();
-
- const TargetInstrInfo &TII = *BB->getParent()->getSubtarget().getInstrInfo();
-
- MachineRegisterInfo &MRI = BB->getParent()->getRegInfo();
- Register SavedFRM = MRI.createVirtualRegister(&RISCV::GPRRegClass);
-
- assert(MI.getNumOperands() == 8 || MI.getNumOperands() == 7);
- unsigned FRMIdx = MI.getNumOperands() == 8 ? 4 : 3;
-
- // Update FRM and save the old value.
- BuildMI(*BB, MI, DL, TII.get(RISCV::SwapFRMImm), SavedFRM)
- .addImm(MI.getOperand(FRMIdx).getImm());
-
- // Emit an VFCVT with the FRM == DYN
- auto MIB = BuildMI(*BB, MI, DL, TII.get(Opcode));
-
- for (unsigned I = 0; I < MI.getNumOperands(); I++)
- if (I != FRMIdx)
- MIB = MIB.add(MI.getOperand(I));
- else
- MIB = MIB.add(MachineOperand::CreateImm(7)); // frm = DYN
-
- MIB.add(MachineOperand::CreateReg(RISCV::FRM,
- /*IsDef*/ false,
- /*IsImp*/ true));
-
- if (MI.getFlag(MachineInstr::MIFlag::NoFPExcept))
- MIB->setFlag(MachineInstr::MIFlag::NoFPExcept);
-
- // Restore FRM.
- BuildMI(*BB, MI, DL, TII.get(RISCV::WriteFRM))
- .addReg(SavedFRM, RegState::Kill);
-
- // Erase the pseudoinstruction.
- MI.eraseFromParent();
- return BB;
-}
-
static MachineBasicBlock *emitVFROUND_NOEXCEPT_MASK(MachineInstr &MI,
MachineBasicBlock *BB,
unsigned CVTXOpc,
@@ -16524,43 +16483,6 @@ RISCVTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
return emitQuietFCMP(MI, BB, RISCV::FLT_D_IN32X, RISCV::FEQ_D_IN32X,
Subtarget);
-#define PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, LMUL) \
- case RISCV::RMOpc##_##LMUL: \
- return emitVFCVT_RM(MI, BB, RISCV::Opc##_##LMUL); \
- case RISCV::RMOpc##_##LMUL##_MASK: \
- return emitVFCVT_RM(MI, BB, RISCV::Opc##_##LMUL##_MASK);
-
-#define PseudoVFCVT_RM_CASE(RMOpc, Opc) \
- PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M1) \
- PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M2) \
- PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M4) \
- PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF2) \
- PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF4)
-
-#define PseudoVFCVT_RM_CASE_M8(RMOpc, Opc) \
- PseudoVFCVT_RM_CASE(RMOpc, Opc) \
- PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, M8)
-
-#define PseudoVFCVT_RM_CASE_MF8(RMOpc, Opc) \
- PseudoVFCVT_RM_CASE(RMOpc, Opc) \
- PseudoVFCVT_RM_LMUL_CASE(RMOpc, Opc, MF8)
-
- // VFCVT
- PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_X_F_V, PseudoVFCVT_X_F_V)
- PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_XU_F_V, PseudoVFCVT_XU_F_V)
- PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_F_XU_V, PseudoVFCVT_F_XU_V)
- PseudoVFCVT_RM_CASE_M8(PseudoVFCVT_RM_F_X_V, PseudoVFCVT_F_X_V)
-
- // VFWCVT
- PseudoVFCVT_RM_CASE(PseudoVFWCVT_RM_XU_F_V, PseudoVFWCVT_XU_F_V);
- PseudoVFCVT_RM_CASE(PseudoVFWCVT_RM_X_F_V, PseudoVFWCVT_X_F_V);
-
- // VFNCVT
- PseudoVFCVT_RM_CASE_MF8(PseudoVFNCVT_RM_XU_F_W, PseudoVFNCVT_XU_F_W);
- PseudoVFCVT_RM_CASE_MF8(PseudoVFNCVT_RM_X_F_W, PseudoVFNCVT_X_F_W);
- PseudoVFCVT_RM_CASE(PseudoVFNCVT_RM_F_XU_W, PseudoVFNCVT_F_XU_W);
- PseudoVFCVT_RM_CASE(PseudoVFNCVT_RM_F_X_W, PseudoVFNCVT_F_X_W);
-
case RISCV::PseudoVFROUND_NOEXCEPT_V_M1_MASK:
return emitVFROUND_NOEXCEPT_MASK(MI, BB, RISCV::PseudoVFCVT_X_F_V_M1_MASK,
RISCV::PseudoVFCVT_F_X_V_M1_MASK);
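For reference, both the deleted custom inserter and the later FRM insertion pass expand a rounding-mode pseudo into the same save/convert/restore shape; the codegen difference in the tests below is purely where the `fsrmi` lands relative to the `vsetvli`. A sketch of the new ordering (not exact compiler output):

```asm
vsetvli zero, zero, e16, mf4, ta, ma   # VL/VTYPE setup; often the last use (kill) of a scalar reg
fsrmi   a0, 3                          # swap frm: a0 <- old frm, frm <- 3 (RUP)
vfcvt.x.f.v v9, v8, v0.t               # conversion executes under the static rounding mode
fsrm    a0                             # restore the saved frm
```

Because the `fsrmi` now sits after the `vsetvli`, the register allocator can reuse a register killed at the `vsetvli` for the saved frm value, which is the scalar register allocation improvement the description mentions.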
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
index be50bb95c81164e..127d3080491d1aa 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
@@ -1127,7 +1127,6 @@ class VPseudoUnaryNoMask_FRM<VReg RetClass,
let HasSEWOp = 1;
let HasVecPolicyOp = 1;
let HasRoundModeOp = 1;
- let usesCustomInserter = 1;
}
class VPseudoUnaryMask_FRM<VReg RetClass,
@@ -1147,7 +1146,6 @@ class VPseudoUnaryMask_FRM<VReg RetClass,
let HasVecPolicyOp = 1;
let UsesMaskPolicy = 1;
let HasRoundModeOp = 1;
- let usesCustomInserter = 1;
}
class VPseudoUnaryNoMaskGPROut :
diff --git a/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll b/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
index 7ce167f8929736b..edc348ebc68ff3b 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
@@ -15,8 +15,8 @@ define <vscale x 1 x half> @vp_ceil_vv_nxv1f16(<vscale x 1 x half> %va, <vscale
; CHECK-NEXT: vfabs.v v9, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e16, mf4, ta, mu
; CHECK-NEXT: vmflt.vf v0, v9, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e16, mf4, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vfcvt.x.f.v v9, v8, v0.t
; CHECK-NEXT: fsrm a0
; CHECK-NEXT: vfcvt.f.x.v v9, v9, v0.t
@@ -59,8 +59,8 @@ define <vscale x 2 x half> @vp_ceil_vv_nxv2f16(<vscale x 2 x half> %va, <vscale
; CHECK-NEXT: vfabs.v v9, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, mu
; CHECK-NEXT: vmflt.vf v0, v9, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e16, mf2, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vfcvt.x.f.v v9, v8, v0.t
; CHECK-NEXT: fsrm a0
; CHECK-NEXT: vfcvt.f.x.v v9, v9, v0.t
@@ -103,8 +103,8 @@ define <vscale x 4 x half> @vp_ceil_vv_nxv4f16(<vscale x 4 x half> %va, <vscale
; CHECK-NEXT: vfabs.v v9, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, mu
; CHECK-NEXT: vmflt.vf v0, v9, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e16, m1, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vfcvt.x.f.v v9, v8, v0.t
; CHECK-NEXT: fsrm a0
; CHECK-NEXT: vfcvt.f.x.v v9, v9, v0.t
@@ -148,8 +148,8 @@ define <vscale x 8 x half> @vp_ceil_vv_nxv8f16(<vscale x 8 x half> %va, <vscale
; CHECK-NEXT: vfabs.v v12, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, mu
; CHECK-NEXT: vmflt.vf v10, v12, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e16, m2, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v10
; CHECK-NEXT: vfcvt.x.f.v v12, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -194,8 +194,8 @@ define <vscale x 16 x half> @vp_ceil_vv_nxv16f16(<vscale x 16 x half> %va, <vsca
; CHECK-NEXT: vfabs.v v16, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e16, m4, ta, mu
; CHECK-NEXT: vmflt.vf v12, v16, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e16, m4, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v12
; CHECK-NEXT: vfcvt.x.f.v v16, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -240,8 +240,8 @@ define <vscale x 32 x half> @vp_ceil_vv_nxv32f16(<vscale x 32 x half> %va, <vsca
; CHECK-NEXT: vfabs.v v24, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e16, m8, ta, mu
; CHECK-NEXT: vmflt.vf v16, v24, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e16, m8, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v16
; CHECK-NEXT: vfcvt.x.f.v v24, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -285,8 +285,8 @@ define <vscale x 1 x float> @vp_ceil_vv_nxv1f32(<vscale x 1 x float> %va, <vscal
; CHECK-NEXT: fmv.w.x fa5, a0
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, mu
; CHECK-NEXT: vmflt.vf v0, v9, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e32, mf2, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vfcvt.x.f.v v9, v8, v0.t
; CHECK-NEXT: fsrm a0
; CHECK-NEXT: vfcvt.f.x.v v9, v9, v0.t
@@ -329,8 +329,8 @@ define <vscale x 2 x float> @vp_ceil_vv_nxv2f32(<vscale x 2 x float> %va, <vscal
; CHECK-NEXT: fmv.w.x fa5, a0
; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, mu
; CHECK-NEXT: vmflt.vf v0, v9, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e32, m1, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vfcvt.x.f.v v9, v8, v0.t
; CHECK-NEXT: fsrm a0
; CHECK-NEXT: vfcvt.f.x.v v9, v9, v0.t
@@ -374,8 +374,8 @@ define <vscale x 4 x float> @vp_ceil_vv_nxv4f32(<vscale x 4 x float> %va, <vscal
; CHECK-NEXT: fmv.w.x fa5, a0
; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, mu
; CHECK-NEXT: vmflt.vf v10, v12, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e32, m2, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v10
; CHECK-NEXT: vfcvt.x.f.v v12, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -420,8 +420,8 @@ define <vscale x 8 x float> @vp_ceil_vv_nxv8f32(<vscale x 8 x float> %va, <vscal
; CHECK-NEXT: fmv.w.x fa5, a0
; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, mu
; CHECK-NEXT: vmflt.vf v12, v16, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e32, m4, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v12
; CHECK-NEXT: vfcvt.x.f.v v16, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -466,8 +466,8 @@ define <vscale x 16 x float> @vp_ceil_vv_nxv16f32(<vscale x 16 x float> %va, <vs
; CHECK-NEXT: fmv.w.x fa5, a0
; CHECK-NEXT: vsetvli zero, zero, e32, m8, ta, mu
; CHECK-NEXT: vmflt.vf v16, v24, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e32, m8, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v16
; CHECK-NEXT: vfcvt.x.f.v v24, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -511,8 +511,8 @@ define <vscale x 1 x double> @vp_ceil_vv_nxv1f64(<vscale x 1 x double> %va, <vsc
; CHECK-NEXT: vfabs.v v9, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m1, ta, mu
; CHECK-NEXT: vmflt.vf v0, v9, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e64, m1, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vfcvt.x.f.v v9, v8, v0.t
; CHECK-NEXT: fsrm a0
; CHECK-NEXT: vfcvt.f.x.v v9, v9, v0.t
@@ -556,8 +556,8 @@ define <vscale x 2 x double> @vp_ceil_vv_nxv2f64(<vscale x 2 x double> %va, <vsc
; CHECK-NEXT: vfabs.v v12, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m2, ta, mu
; CHECK-NEXT: vmflt.vf v10, v12, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e64, m2, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v10
; CHECK-NEXT: vfcvt.x.f.v v12, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -602,8 +602,8 @@ define <vscale x 4 x double> @vp_ceil_vv_nxv4f64(<vscale x 4 x double> %va, <vsc
; CHECK-NEXT: vfabs.v v16, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, mu
; CHECK-NEXT: vmflt.vf v12, v16, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e64, m4, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v12
; CHECK-NEXT: vfcvt.x.f.v v16, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -648,8 +648,8 @@ define <vscale x 7 x double> @vp_ceil_vv_nxv7f64(<vscale x 7 x double> %va, <vsc
; CHECK-NEXT: vfabs.v v24, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vmflt.vf v16, v24, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v16
; CHECK-NEXT: vfcvt.x.f.v v24, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -694,8 +694,8 @@ define <vscale x 8 x double> @vp_ceil_vv_nxv8f64(<vscale x 8 x double> %va, <vsc
; CHECK-NEXT: vfabs.v v24, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vmflt.vf v16, v24, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v16
; CHECK-NEXT: vfcvt.x.f.v v24, v8, v0.t
; CHECK-NEXT: fsrm a0
@@ -758,8 +758,8 @@ define <vscale x 16 x double> @vp_ceil_vv_nxv16f64(<vscale x 16 x double> %va, <
; CHECK-NEXT: vfabs.v v8, v16, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vmflt.vf v25, v8, fa5, v0.t
-; CHECK-NEXT: fsrmi a2, 3
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT: fsrmi a2, 3
; CHECK-NEXT: vmv1r.v v0, v25
; CHECK-NEXT: vfcvt.x.f.v v8, v16, v0.t
; CHECK-NEXT: fsrm a2
@@ -782,8 +782,8 @@ define <vscale x 16 x double> @vp_ceil_vv_nxv16f64(<vscale x 16 x double> %va, <
; CHECK-NEXT: vfabs.v v16, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vmflt.vf v24, v16, fa5, v0.t
-; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, ma
+; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v24
; CHECK-NEXT: vfcvt.x.f.v v16, v8, v0.t
; CHECK-NEXT: fsrm a0
diff --git a/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll b/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
index d78d67d5e359871..94cdbc9ed930474 100644
--- a/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/ctlz-sdnode.ll
@@ -806,8 +806,8 @@ define <vscale x 1 x i32> @ctlz_nxv1i32(<vscale x 1 x i32> %va) {
;
; CHECK-F-LABEL: ctlz_nxv1i32:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, mf2, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, mf2, ta, ma
; CHECK-F-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-F-NEXT: vsrl.vi v8, v8, 23
; CHECK-F-NEXT: li a1, 158
@@ -878,8 +878,8 @@ define <vscale x 2 x i32> @ctlz_nxv2i32(<vscale x 2 x i32> %va) {
;
; CHECK-F-LABEL: ctlz_nxv2i32:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m1, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m1, ta, ma
; CHECK-F-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-F-NEXT: vsrl.vi v8, v8, 23
; CHECK-F-NEXT: li a1, 158
@@ -950,8 +950,8 @@ define <vscale x 4 x i32> @ctlz_nxv4i32(<vscale x 4 x i32> %va) {
;
; CHECK-F-LABEL: ctlz_nxv4i32:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m2, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m2, ta, ma
; CHECK-F-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-F-NEXT: vsrl.vi v8, v8, 23
; CHECK-F-NEXT: li a1, 158
@@ -1022,8 +1022,8 @@ define <vscale x 8 x i32> @ctlz_nxv8i32(<vscale x 8 x i32> %va) {
;
; CHECK-F-LABEL: ctlz_nxv8i32:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m4, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m4, ta, ma
; CHECK-F-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-F-NEXT: vsrl.vi v8, v8, 23
; CHECK-F-NEXT: li a1, 158
@@ -1094,8 +1094,8 @@ define <vscale x 16 x i32> @ctlz_nxv16i32(<vscale x 16 x i32> %va) {
;
; CHECK-F-LABEL: ctlz_nxv16i32:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m8, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m8, ta, ma
; CHECK-F-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-F-NEXT: vsrl.vi v8, v8, 23
; CHECK-F-NEXT: li a1, 158
@@ -1107,8 +1107,8 @@ define <vscale x 16 x i32> @ctlz_nxv16i32(<vscale x 16 x i32> %va) {
;
; CHECK-D-LABEL: ctlz_nxv16i32:
; CHECK-D: # %bb.0:
+; CHECK-D-NEXT: vsetvli a0, zero, e32, m8, ta, ma
; CHECK-D-NEXT: fsrmi a0, 1
-; CHECK-D-NEXT: vsetvli a1, zero, e32, m8, ta, ma
; CHECK-D-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-D-NEXT: vsrl.vi v8, v8, 23
; CHECK-D-NEXT: li a1, 158
@@ -1231,8 +1231,8 @@ define <vscale x 1 x i64> @ctlz_nxv1i64(<vscale x 1 x i64> %va) {
;
; CHECK-F-LABEL: ctlz_nxv1i64:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, mf2, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, mf2, ta, ma
; CHECK-F-NEXT: vfncvt.f.xu.w v9, v8
; CHECK-F-NEXT: vsrl.vi v8, v9, 23
; CHECK-F-NEXT: vsetvli zero, zero, e64, m1, ta, ma
@@ -1246,8 +1246,8 @@ define <vscale x 1 x i64> @ctlz_nxv1i64(<vscale x 1 x i64> %va) {
;
; CHECK-D-LABEL: ctlz_nxv1i64:
; CHECK-D: # %bb.0:
+; CHECK-D-NEXT: vsetvli a0, zero, e64, m1, ta, ma
; CHECK-D-NEXT: fsrmi a0, 1
-; CHECK-D-NEXT: vsetvli a1, zero, e64, m1, ta, ma
; CHECK-D-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-D-NEXT: li a1, 52
; CHECK-D-NEXT: vsrl.vx v8, v8, a1
@@ -1371,8 +1371,8 @@ define <vscale x 2 x i64> @ctlz_nxv2i64(<vscale x 2 x i64> %va) {
;
; CHECK-F-LABEL: ctlz_nxv2i64:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m1, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m1, ta, ma
; CHECK-F-NEXT: vfncvt.f.xu.w v10, v8
; CHECK-F-NEXT: vsrl.vi v8, v10, 23
; CHECK-F-NEXT: vsetvli zero, zero, e64, m2, ta, ma
@@ -1386,8 +1386,8 @@ define <vscale x 2 x i64> @ctlz_nxv2i64(<vscale x 2 x i64> %va) {
;
; CHECK-D-LABEL: ctlz_nxv2i64:
; CHECK-D: # %bb.0:
+; CHECK-D-NEXT: vsetvli a0, zero, e64, m2, ta, ma
; CHECK-D-NEXT: fsrmi a0, 1
-; CHECK-D-NEXT: vsetvli a1, zero, e64, m2, ta, ma
; CHECK-D-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-D-NEXT: li a1, 52
; CHECK-D-NEXT: vsrl.vx v8, v8, a1
@@ -1511,8 +1511,8 @@ define <vscale x 4 x i64> @ctlz_nxv4i64(<vscale x 4 x i64> %va) {
;
; CHECK-F-LABEL: ctlz_nxv4i64:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m2, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m2, ta, ma
; CHECK-F-NEXT: vfncvt.f.xu.w v12, v8
; CHECK-F-NEXT: vsrl.vi v8, v12, 23
; CHECK-F-NEXT: vsetvli zero, zero, e64, m4, ta, ma
@@ -1526,8 +1526,8 @@ define <vscale x 4 x i64> @ctlz_nxv4i64(<vscale x 4 x i64> %va) {
;
; CHECK-D-LABEL: ctlz_nxv4i64:
; CHECK-D: # %bb.0:
+; CHECK-D-NEXT: vsetvli a0, zero, e64, m4, ta, ma
; CHECK-D-NEXT: fsrmi a0, 1
-; CHECK-D-NEXT: vsetvli a1, zero, e64, m4, ta, ma
; CHECK-D-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-D-NEXT: li a1, 52
; CHECK-D-NEXT: vsrl.vx v8, v8, a1
@@ -1651,8 +1651,8 @@ define <vscale x 8 x i64> @ctlz_nxv8i64(<vscale x 8 x i64> %va) {
;
; CHECK-F-LABEL: ctlz_nxv8i64:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m4, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m4, ta, ma
; CHECK-F-NEXT: vfncvt.f.xu.w v16, v8
; CHECK-F-NEXT: vsrl.vi v8, v16, 23
; CHECK-F-NEXT: vsetvli zero, zero, e64, m8, ta, ma
@@ -1666,8 +1666,8 @@ define <vscale x 8 x i64> @ctlz_nxv8i64(<vscale x 8 x i64> %va) {
;
; CHECK-D-LABEL: ctlz_nxv8i64:
; CHECK-D: # %bb.0:
+; CHECK-D-NEXT: vsetvli a0, zero, e64, m8, ta, ma
; CHECK-D-NEXT: fsrmi a0, 1
-; CHECK-D-NEXT: vsetvli a1, zero, e64, m8, ta, ma
; CHECK-D-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-D-NEXT: li a1, 52
; CHECK-D-NEXT: vsrl.vx v8, v8, a1
@@ -2433,8 +2433,8 @@ define <vscale x 1 x i32> @ctlz_zero_undef_nxv1i32(<vscale x 1 x i32> %va) {
;
; CHECK-F-LABEL: ctlz_zero_undef_nxv1i32:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, mf2, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, mf2, ta, ma
; CHECK-F-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-F-NEXT: vsrl.vi v8, v8, 23
; CHECK-F-NEXT: li a1, 158
@@ -2500,8 +2500,8 @@ define <vscale x 2 x i32> @ctlz_zero_undef_nxv2i32(<vscale x 2 x i32> %va) {
;
; CHECK-F-LABEL: ctlz_zero_undef_nxv2i32:
; CHECK-F: # %bb.0:
+; CHECK-F-NEXT: vsetvli a0, zero, e32, m1, ta, ma
; CHECK-F-NEXT: fsrmi a0, 1
-; CHECK-F-NEXT: vsetvli a1, zero, e32, m1, ta, ma
; CHECK-F-NEXT: vfcvt.f.xu.v v8, v8
; CHECK-F-NEXT: vsrl.vi v8, v8, 23
; CHECK-F-NEXT: li a1, 158
@@ -2567,8 +2567,8 @@ define <vscale x 4 x i32> @ctlz_zero_undef_...
[truncated]
That is exactly what happened. The ceil/floor/etc lowering is older than the rounding mode pass.
LGTM