[RISCV] Enable early if-conversion #92959
Conversation
@@ -879,7 +879,7 @@ bool EarlyIfConverter::shouldConvertIf() {
   // from a loop-invariant address predictable; we were unable to prove that it
   // doesn't alias any of the memory-writes in the loop, but it is likely to
   // read to same value multiple times.
-  if (CurrentLoop && any_of(IfConv.Cond, [&](MachineOperand &MO) {
+  if (CurrentLoop && all_of(IfConv.Cond, [&](MachineOperand &MO) {
I created a separate PR for this change: #92405
✅ With the latest revision this PR passed the C/C++ code formatter.
Tests?
                 NotCCReg)
        .addReg(LHSReg)
        .addReg(RHSReg);
    BuildMI(MBB, MI, DL, get(RISCV::XORI), DstReg).addReg(NotCCReg).addImm(1);
Can we swap the true/false of the select to avoid the XORI?
Yes, thanks!
I did the swap in insertSelect.
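A minimal C model of that swap, with czero_eqz/czero_nez standing in for the Zicond semantics (all names below are illustrative, not the patch's code):

static inline long czero_eqz(long rs1, long rs2) { return rs2 == 0 ? 0 : rs1; }
static inline long czero_nez(long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

// Before: materialize (lhs >= rhs) with SLT + XORI, then select.
long sel_ge_xori(long lhs, long rhs, long t, long f) {
  long lt = lhs < rhs;                        // SLT
  long ge = lt ^ 1;                           // XORI ..., 1
  return czero_eqz(t, ge) | czero_nez(f, ge); // ge ? t : f
}

// After: key both czero ops off SLT directly and swap the true/false
// operands; same result, one fewer instruction.
long sel_ge_swapped(long lhs, long rhs, long t, long f) {
  long lt = lhs < rhs;                        // SLT
  return czero_eqz(f, lt) | czero_nez(t, lt); // lt ? f : t == ge ? t : f
}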
  TargetSchedModel SchedModel;
  SchedModel.init(&STI);

  CondCycles = getICmpCost(Cond[0].getImm(), SchedModel);
I am not sure I understood correctly what CondCycles means. I need to double-check this.
  SchedModel.init(this);
  return RISCVForceEalyIfcvt ||
         (!RISCVDisableEarlyIfcvt &&
          (hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
TODO: Add support for xtheadcondmov? cc @zixuan-wu
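For context, a minimal C model of what that mapping could look like (assuming the XTheadCondMov semantics: th.mveqz rd, rs1, rs2 writes rs1 into rd when rs2 == 0 and otherwise leaves rd unchanged, with th.mvnez as the rs2 != 0 counterpart; the helper and function names below are illustrative):

// Models th.mveqz: rd is both a source and the destination.
static inline long th_mveqz(long rd, long rs1, long rs2) {
  return rs2 == 0 ? rs1 : rd;
}

// A select then needs only a copy plus one conditional move, instead of
// the czero.eqz + czero.nez + or triple used for Zicond.
long sel_condmov(long c, long t, long f) {
  long rd = t;              // start with the true value
  rd = th_mveqz(rd, f, c);  // overwrite with f when the condition is zero
  return rd;                // c ? t : f
}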
I have just evaluated this patch, and I think there is room for more optimization:
int loop(int m, int n, int *randArray) {
int t = 0;
for (int j = 0; j < n; j++) {
for (int i = 0; i < 4 + m; i++) {
if (randArray[i]) {
t += 3 + 3 * t;
} else {
t -= 1 - 5 * t;
}
}
}
return t;
}
For this case, I compared the results of two configurations:
1. clang -O2 -march=rv64gc_zba_zbb_zbc_zbs_zicond -mllvm -two-entry-phi-node-folding-threshold=16. I increased the threshold so that we can do if-conversion at the LLVM IR level via SimplifyCFG.
2. clang -O2 -march=rv64gc_zba_zbb_zbc_zbs_zicond -mllvm -riscv-disable-early-ifcvt=false -mllvm -riscv-force-early-ifcvt=true -mllvm -stress-early-ifcvt=true.
The diff is:
We can remove the xor+seqz here.
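A minimal C model of why the xor+seqz is removable (helper names are illustrative): czero.eqz/czero.nez only test whether their condition operand is zero, so a register that is zero exactly when the condition is false can feed them directly, without first being canonicalized to a 0/1 value:

static inline long czero_eqz(long rs1, long rs2) { return rs2 == 0 ? 0 : rs1; }
static inline long czero_nez(long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

// With canonicalization: (x == 0) costs an xor+seqz-style sequence first.
long sel_eqz_canonical(long x, long t, long f) {
  long c = (x == 0);                        // xor + seqz
  return czero_eqz(t, c) | czero_nez(f, c); // c ? t : f
}

// Without it: key off x itself and swap the eqz/nez roles.
long sel_eqz_direct(long x, long t, long f) {
  return czero_nez(t, x) | czero_eqz(f, x); // (x == 0) ? t : f
}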
    CC = (CC == RISCVCC::COND_GE) ? RISCVCC::COND_LT : RISCVCC::COND_LTU;
    std::swap(CondZeroEqzReg, CondZeroNezReg);
  }
  insertICmp(MBB, MI, DL, CCReg, CC, LHSReg, RHSReg);
We may not need this for some cases.
  Register TrueValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
  Register FalseValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
  BuildMI(MBB, MI, DL, get(CondZeroEqzOpc), TrueValOrZeroReg)
We may need to implement all the patterns from SDAG ISel here to reduce some instructions.
Or, we implement these ISel patterns in a MIR pass, and then run that pass after the early if-conversion pass.
These patterns apply when one of the sides is zero. In our case we are looking at a select between the results of two basic blocks. How often will it be that the result of a basic block is just zero?
I have no data to show how common these cases will be, but we have done a lot of peepholes in RISCVInstrInfoZicond.td and RISCVTargetLowering::lowerSELECT to reduce the comparison and Zicond instructions. After if-conversion, we need to do the same peepholes (which means we may not need czero.eqz+czero.nez+or to simulate select).
I see what you mean. This seems like a lot of work and code duplication to reimplement all of these here. Here's a list of all peepholes that we do in lowerSELECT:
(select c, t, 0) -> (czero_eqz t, c)
(select c, 0, f) -> (czero_nez f, c)
(select c, (and f, x), f) -> (or (and f, x), (czero_nez f, c))
(select c, t, (and t, x)) -> (or (czero_eqz t, c), (and t, x))
(select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)
(select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2)
(select c, -1, y) -> -c | y
(select c, y, -1) -> (c-1) | y
(select c, 0, y) -> (c-1) & y
(select c, y, 0) -> -c & y
select c, ~x, x --> xor -c, x
(select x, x, y) -> x | y
(select !x, x, y) -> x & y
(select x, y, x) -> x & y
(select !x, y, x) -> x | y
binOp ((select cond, x, c0), c1) -> select cond binOp(x, c1), binOp(c0, c1)
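To make one of these concrete, a minimal C model of the constant-select entry (czero_nez stands in for czero.nez; the names are illustrative):

static inline long czero_nez(long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

// (select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)
long sel_const(long c, long c1, long c2) {
  // c != 0: czero_nez yields 0, so the result is c1.
  // c == 0: the result is (c2 - c1) + c1 == c2.
  return czero_nez(c2 - c1, c) + c1;  // one czero + one add, no or needed
}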
But perhaps we can address this later in a different MR; this MR is just the basic functionality.
This seems like a lot of work and code duplication to reimplement all of these here.
Yes! So I said "Or, we implement these ISel patterns in a MIR pass and then run the pass after early if-conversion pass". :-)
But perhaps we can address this later in a different MR; this MR is just the basic functionality.
Agree. Can you make this PR non-draft? I ran llvm-testsuite and some internal tests and I haven't found any correctness issues, so I think it's ready for review.
This needs to be merged first: #95877. This MR uses a hack to bypass that. I also need to add some tests.
It's OK to be stacked on other PRs; make it non-draft so that more reviewers can see it.
@wangpc-pp Thanks for the test case, I'll look into that. I have only tested on the x264 train workload, and I got a 1% improvement in cycle count. Can you share your results?
This is a fix for detecting when the condition is loop-invariant: #95877. That needs to be merged before this patch.
I haven't run any particular benchmark; I just evaluated it on some internal microbenchmarks. We can see some cycle-count reduction.
  return RISCVForceEalyIfcvt ||
         (!RISCVDisableEarlyIfcvt &&
          (hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
          SchedModel.hasInstrSchedModelOrItineraries());
Do we really need hasInstrSchedModelOrItineraries here? We have an option to force early if-conversion; the scheduling model is not really needed here.
@llvm/pr-subscribers-backend-risc-v

Author: Mikhail Gudim (mgudim)

Changes

Implements the necessary target methods (insertSelect, canInsertSelect) and adds early if-conversion to the RISCV pipeline. Doing if-conversion increases the number of executed instructions, so it only makes sense if there is enough ILP.

Patch is 27.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92959.diff

7 Files Affected:
diff --git a/llvm/lib/CodeGen/EarlyIfConversion.cpp b/llvm/lib/CodeGen/EarlyIfConversion.cpp
index 2a7bee1618deb..2f8729fcbb945 100644
--- a/llvm/lib/CodeGen/EarlyIfConversion.cpp
+++ b/llvm/lib/CodeGen/EarlyIfConversion.cpp
@@ -879,7 +879,7 @@ bool EarlyIfConverter::shouldConvertIf() {
// from a loop-invariant address predictable; we were unable to prove that it
// doesn't alias any of the memory-writes in the loop, but it is likely to
// read to same value multiple times.
- if (CurrentLoop && any_of(IfConv.Cond, [&](MachineOperand &MO) {
+ if (CurrentLoop && all_of(IfConv.Cond, [&](MachineOperand &MO) {
if (!MO.isReg() || !MO.isUse())
return false;
Register Reg = MO.getReg();
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 444b9076005c2..748fc34cfe353 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1446,6 +1446,199 @@ RISCVInstrInfo::optimizeSelect(MachineInstr &MI,
return NewMI;
}
+int RISCVInstrInfo::getICmpCost(unsigned CC,
+ const TargetSchedModel &SchedModel) const {
+ switch (CC) {
+ default:
+ llvm_unreachable("Unknown condition code!");
+ case RISCVCC::COND_LT:
+ return SchedModel.computeInstrLatency(RISCV::SLT);
+ case RISCVCC::COND_LTU:
+ return SchedModel.computeInstrLatency(RISCV::SLTU);
+ case RISCVCC::COND_EQ:
+ return SchedModel.computeInstrLatency(RISCV::XOR) +
+ SchedModel.computeInstrLatency(RISCV::SLTIU);
+ case RISCVCC::COND_NE:
+ return SchedModel.computeInstrLatency(RISCV::XOR) +
+ SchedModel.computeInstrLatency(RISCV::SLTU);
+ case RISCVCC::COND_GE:
+ return SchedModel.computeInstrLatency(RISCV::XORI) +
+ SchedModel.computeInstrLatency(RISCV::SLT);
+ case RISCVCC::COND_GEU:
+ return SchedModel.computeInstrLatency(RISCV::XORI) +
+ SchedModel.computeInstrLatency(RISCV::SLTU);
+ }
+}
+
+void RISCVInstrInfo::insertICmp(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg,
+ unsigned CC, Register LHSReg,
+ Register RHSReg) const {
+ MachineRegisterInfo &MRI = MI->getParent()->getParent()->getRegInfo();
+
+ switch (CC) {
+ default:
+ llvm_unreachable("Unknown condition code!");
+ case RISCVCC::COND_LT:
+ case RISCVCC::COND_LTU: {
+ BuildMI(MBB, MI, DL, get(CC == RISCVCC::COND_LT ? RISCV::SLT : RISCV::SLTU),
+ DstReg)
+ .addReg(LHSReg)
+ .addReg(RHSReg);
+ return;
+ }
+ case RISCVCC::COND_EQ:
+ case RISCVCC::COND_NE: {
+ Register XorReg = LHSReg;
+ if (RHSReg != RISCV::X0) {
+ XorReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(RISCV::XOR), XorReg).addReg(LHSReg).addReg(RHSReg);
+ }
+ if (CC == RISCVCC::COND_EQ) {
+ BuildMI(MBB, MI, DL, get(RISCV::SLTIU), DstReg).addReg(XorReg).addImm(1);
+ return;
+ } else {
+ BuildMI(MBB, MI, DL, get(RISCV::SLTU), DstReg)
+ .addReg(RISCV::X0)
+ .addReg(XorReg);
+ return;
+ }
+ }
+ case RISCVCC::COND_GE:
+ case RISCVCC::COND_GEU: {
+ Register NotCCReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(CC == RISCVCC::COND_GE ? RISCV::SLT : RISCV::SLTU),
+ NotCCReg)
+ .addReg(LHSReg)
+ .addReg(RHSReg);
+ BuildMI(MBB, MI, DL, get(RISCV::XORI), DstReg).addReg(NotCCReg).addImm(1);
+ return;
+ }
+ }
+}
+
+void RISCVInstrInfo::insertSelect(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg,
+ ArrayRef<MachineOperand> Cond,
+ Register TrueReg, Register FalseReg) const {
+ MachineFunction &MF = *MI->getParent()->getParent();
+ const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
+ const TargetRegisterInfo &TRI = *ST.getRegisterInfo();
+ MachineRegisterInfo &MRI = MF.getRegInfo();
+ Register CCReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+
+ unsigned CC = Cond[0].getImm();
+ Register LHSReg = Cond[1].getReg();
+ Register RHSReg = Cond[2].getReg();
+
+ unsigned CondZeroEqzOpc =
+ ST.hasVendorXVentanaCondOps() ? RISCV::VT_MASKC : RISCV::CZERO_EQZ;
+ unsigned CondZeroNezOpc =
+ ST.hasVendorXVentanaCondOps() ? RISCV::VT_MASKCN : RISCV::CZERO_NEZ;
+
+ const TargetRegisterClass *DstRC = MRI.getRegClass(DstReg);
+ const TargetRegisterClass *CommonRC =
+ TRI.getCommonSubClass(DstRC, &RISCV::GPRRegClass);
+ bool NeedsRCCopies = (CommonRC != DstRC) && (CommonRC != &RISCV::GPRRegClass);
+
+ Register CondZeroEqzReg = TrueReg;
+ Register CondZeroNezReg = FalseReg;
+ if (NeedsRCCopies) {
+ CondZeroEqzReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), CondZeroEqzReg)
+ .addReg(TrueReg);
+ CondZeroNezReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), CondZeroNezReg)
+ .addReg(FalseReg);
+ }
+ if (CC == RISCVCC::COND_GE || CC == RISCVCC::COND_GEU) {
+ CC = (CC == RISCVCC::COND_GE) ? RISCVCC::COND_LT : RISCVCC::COND_LTU;
+ std::swap(CondZeroEqzReg, CondZeroNezReg);
+ }
+ insertICmp(MBB, MI, DL, CCReg, CC, LHSReg, RHSReg);
+
+ Register TrueValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ Register FalseValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(CondZeroEqzOpc), TrueValOrZeroReg)
+ .addReg(CondZeroEqzReg)
+ .addReg(CCReg);
+ BuildMI(MBB, MI, DL, get(CondZeroNezOpc), FalseValOrZeroReg)
+ .addReg(CondZeroNezReg)
+ .addReg(CCReg);
+
+ Register SelectOutReg = DstReg;
+ if (NeedsRCCopies) {
+ SelectOutReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ }
+ BuildMI(MBB, MI, DL, get(RISCV::OR), SelectOutReg)
+ .addReg(TrueValOrZeroReg)
+ .addReg(FalseValOrZeroReg);
+ if (NeedsRCCopies) {
+ BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), DstReg).addReg(SelectOutReg);
+ }
+
+ return;
+}
+
+bool RISCVInstrInfo::canInsertSelect(const MachineBasicBlock &MBB,
+ ArrayRef<MachineOperand> Cond,
+ Register DstReg, Register TrueReg,
+ Register FalseReg, int &CondCycles,
+ int &TrueCycles, int &FalseCycles) const {
+ const MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
+ const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
+ const TargetRegisterClass *TrueRC = MRI.getRegClass(TrueReg);
+ const TargetRegisterClass *DstRC = MRI.getRegClass(DstReg);
+ const TargetRegisterClass *CommonRC =
+ TRI.getCommonSubClass(DstRC, &RISCV::GPRRegClass);
+ bool NeedsRCCopies = (CommonRC != DstRC) && (CommonRC != &RISCV::GPRRegClass);
+
+ TargetSchedModel SchedModel;
+ SchedModel.init(&STI);
+
+ // this is used for testing only.
+ if (!SchedModel.hasInstrSchedModelOrItineraries())
+ return true;
+
+ CondCycles = getICmpCost(Cond[0].getImm(), SchedModel);
+ TrueCycles = SchedModel.computeInstrLatency(RISCV::OR) +
+ SchedModel.computeInstrLatency(STI.hasVendorXVentanaCondOps()
+ ? RISCV::VT_MASKC
+ : RISCV::CZERO_EQZ);
+ FalseCycles = SchedModel.computeInstrLatency(RISCV::OR) +
+ SchedModel.computeInstrLatency(STI.hasVendorXVentanaCondOps()
+ ? RISCV::VT_MASKCN
+ : RISCV::CZERO_NEZ);
+
+ if (NeedsRCCopies) {
+ unsigned CopyIntoGPROpc = 0;
+ if (TrueRC == &RISCV::FPR32RegClass) {
+ CopyIntoGPROpc = RISCV::FMV_X_W;
+ } else if (TrueRC == &RISCV::FPR64RegClass) {
+ CopyIntoGPROpc = RISCV::FMV_X_D;
+ } else {
+ llvm_unreachable("Unknown register class");
+ }
+ int CopyIntoGPRCycles = SchedModel.computeInstrLatency(CopyIntoGPROpc);
+
+    unsigned CopyFromGPROpc = 0;
+    if (DstRC == &RISCV::FPR32RegClass) {
+      CopyFromGPROpc = RISCV::FMV_W_X;
+    } else if (DstRC == &RISCV::FPR64RegClass) {
+      CopyFromGPROpc = RISCV::FMV_D_X;
+    } else {
+      llvm_unreachable("Unknown register class");
+    }
+    int CopyFromGPRCycles = SchedModel.computeInstrLatency(CopyFromGPROpc);
+
+ TrueCycles += (CopyIntoGPRCycles + CopyFromGPRCycles);
+ FalseCycles += (CopyIntoGPRCycles + CopyFromGPRCycles);
+ }
+ return true;
+}
+
unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
if (MI.isMetaInstruction())
return 0;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index e069717aaef23..a331f1e2af9db 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -149,6 +149,18 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
SmallPtrSetImpl<MachineInstr *> &SeenMIs,
bool) const override;
+ int getICmpCost(unsigned CC, const TargetSchedModel &SchedModel) const;
+ void insertICmp(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg, unsigned CC,
+ Register LHSReg, Register RHSReg) const;
+ bool canInsertSelect(const MachineBasicBlock &, ArrayRef<MachineOperand> Cond,
+ Register, Register, Register, int &, int &,
+ int &) const override;
+ void insertSelect(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg,
+ ArrayRef<MachineOperand> Cond, Register TrueReg,
+ Register FalseReg) const override;
+
bool isAsCheapAsAMove(const MachineInstr &MI) const override;
std::optional<DestSourcePair>
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index d3236bb07d56d..0b0347003f0d4 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -65,6 +65,14 @@ static cl::opt<unsigned> RISCVMinimumJumpTableEntries(
"riscv-min-jump-table-entries", cl::Hidden,
cl::desc("Set minimum number of entries to use a jump table on RISCV"));
+static cl::opt<bool>
+ RISCVDisableEarlyIfcvt("riscv-disable-early-ifcvt", cl::Hidden,
+ cl::desc("Disable early if-conversion"),
+ cl::init(true), cl::Hidden);
+
+static cl::opt<bool> RISCVForceEalyIfcvt("riscv-force-early-ifcvt", cl::Hidden,
+ cl::desc("Force early if-conversion"),
+ cl::init(false), cl::Hidden);
void RISCVSubtarget::anchor() {}
RISCVSubtarget &
@@ -203,3 +211,12 @@ unsigned RISCVSubtarget::getMinimumJumpTableEntries() const {
? RISCVMinimumJumpTableEntries
: TuneInfo->MinimumJumpTableEntries;
}
+
+bool RISCVSubtarget::enableEarlyIfConversion() const {
+ TargetSchedModel SchedModel;
+ SchedModel.init(this);
+ return RISCVForceEalyIfcvt ||
+ (!RISCVDisableEarlyIfcvt &&
+ (hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
+ SchedModel.hasInstrSchedModelOrItineraries());
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index c880c9e921e0e..fe8143942e938 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -303,6 +303,8 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
unsigned getMinimumJumpTableEntries() const;
bool supportsInitUndef() const override { return hasVInstructions(); }
+
+ bool enableEarlyIfConversion() const override;
};
} // End llvm namespace
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 5aab138dae408..a205738eeebd9 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -349,6 +349,7 @@ class RISCVPassConfig : public TargetPassConfig {
bool addPreISel() override;
void addCodeGenPrepare() override;
bool addInstSelector() override;
+ bool addILPOpts() override;
bool addIRTranslator() override;
void addPreLegalizeMachineIR() override;
bool addLegalizeMachineIR() override;
@@ -450,6 +451,13 @@ bool RISCVPassConfig::addInstSelector() {
return false;
}
+bool RISCVPassConfig::addILPOpts() {
+ if (getOptLevel() != CodeGenOptLevel::None) {
+ addPass(&EarlyIfConverterID);
+ }
+ return true;
+}
+
bool RISCVPassConfig::addIRTranslator() {
addPass(new IRTranslator(getOptLevel()));
return false;
diff --git a/llvm/test/CodeGen/RISCV/early-ifcvt.mir b/llvm/test/CodeGen/RISCV/early-ifcvt.mir
new file mode 100644
index 0000000000000..f4acd2e3a38ec
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/early-ifcvt.mir
@@ -0,0 +1,553 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=riscv64 -mcpu=veyron-v1 -riscv-disable-early-ifcvt=false -riscv-force-early-ifcvt=true -stress-early-ifcvt=true \
+# RUN: -run-pass=early-ifcvt -simplify-mir -o - %s | FileCheck %s
+
+---
+name: blt
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: blt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[SLT:%[0-9]+]]:gpr = SLT %x_i, %k
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLT]]
+ ; CHECK-NEXT: [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLT]]
+ ; CHECK-NEXT: %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+ ; CHECK-NEXT: SD %res, %x_i, 0
+ ; CHECK-NEXT: %i_inc:gpr = ADDI %i, 1
+ ; CHECK-NEXT: BEQ %i_inc, %n, %bb.5
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: PseudoRET
+ bb.0:
+ successors: %bb.1
+ liveins: $x10, $x11, $x12
+
+ %x:gpr = COPY $x10
+ %n:gpr = COPY $x11
+ %k:gpr = COPY $x12
+ %zero:gpr = COPY $x0
+ PseudoBR %bb.1
+
+ bb.1:
+ successors: %bb.2, %bb.3
+
+ %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+ %x_i_addr:gpr = ADD %x, %i
+ %x_i:gpr = LD %x_i_addr, 0
+ BLT %x_i, %k, %bb.2
+ PseudoBR %bb.3
+
+ bb.2:
+ successors: %bb.4
+
+ %add1:gpr = ADDI %x_i, 1
+ PseudoBR %bb.4
+
+ bb.3:
+ successors: %bb.4
+
+ %add2:gpr = ADDI %x_i, 2
+ PseudoBR %bb.4
+
+ bb.4:
+ successors: %bb.1, %bb.5
+
+ %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+ SD %res, %x_i, 0
+ %i_inc:gpr = ADDI %i, 1
+ BEQ %i_inc, %n, %bb.5
+ PseudoBR %bb.1
+
+ bb.5:
+ PseudoRET
+...
+
+---
+name: bltu
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: bltu
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[SLTU:%[0-9]+]]:gpr = SLTU %x_i, %k
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTU]]
+ ; CHECK-NEXT: [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLTU]]
+ ; CHECK-NEXT: %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+ ; CHECK-NEXT: SD %res, %x_i, 0
+ ; CHECK-NEXT: %i_inc:gpr = ADDI %i, 1
+ ; CHECK-NEXT: BEQ %i_inc, %n, %bb.5
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: PseudoRET
+ bb.0:
+ successors: %bb.1
+ liveins: $x10, $x11, $x12
+
+ %x:gpr = COPY $x10
+ %n:gpr = COPY $x11
+ %k:gpr = COPY $x12
+ %zero:gpr = COPY $x0
+ PseudoBR %bb.1
+
+ bb.1:
+ successors: %bb.2, %bb.3
+
+ %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+ %x_i_addr:gpr = ADD %x, %i
+ %x_i:gpr = LD %x_i_addr, 0
+ BLTU %x_i, %k, %bb.2
+ PseudoBR %bb.3
+
+ bb.2:
+ successors: %bb.4
+
+ %add1:gpr = ADDI %x_i, 1
+ PseudoBR %bb.4
+
+ bb.3:
+ successors: %bb.4
+
+ %add2:gpr = ADDI %x_i, 2
+ PseudoBR %bb.4
+
+ bb.4:
+ successors: %bb.1, %bb.5
+
+ %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+ SD %res, %x_i, 0
+ %i_inc:gpr = ADDI %i, 1
+ BEQ %i_inc, %n, %bb.5
+ PseudoBR %bb.1
+
+ bb.5:
+ PseudoRET
+...
+
+---
+name: beq
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: beq
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[XOR:%[0-9]+]]:gpr = XOR %x_i, %k
+ ; CHECK-NEXT: [[SLTIU:%[0-9]+]]:gpr = SLTIU [[XOR]], 1
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTIU]]
+ ; CHECK-NEXT: [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLTIU]]
+ ; CHECK-NEXT: %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+ ; CHECK-NEXT: SD %res, %x_i, 0
+ ; CHECK-NEXT: %i_inc:gpr = ADDI %i, 1
+ ; CHECK-NEXT: BEQ %i_inc, %n, %bb.5
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: PseudoRET
+ bb.0:
+ successors: %bb.1
+ liveins: $x10, $x11, $x12
+
+ %x:gpr = COPY $x10
+ %n:gpr = COPY $x11
+ %k:gpr = COPY $x12
+ %zero:gpr = COPY $x0
+ PseudoBR %bb.1
+
+ bb.1:
+ successors: %bb.2, %bb.3
+
+ %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+ %x_i_addr:gpr = ADD %x, %i
+ %x_i:gpr = LD %x_i_addr, 0
+ BEQ %x_i, %k, %bb.2
+ PseudoBR %bb.3
+
+ bb.2:
+ successors: %bb.4
+
+ %add1:gpr = ADDI %x_i, 1
+ PseudoBR %bb.4
+
+ bb.3:
+ successors: %bb.4
+
+ %add2:gpr = ADDI %x_i, 2
+ PseudoBR %bb.4
+
+ bb.4:
+ successors: %bb.1, %bb.5
+
+ %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+ SD %res, %x_i, 0
+ %i_inc:gpr = ADDI %i, 1
+ BEQ %i_inc, %n, %bb.5
+ PseudoBR %bb.1
+
+ bb.5:
+ PseudoRET
+...
+
+---
+name: bne
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: bne
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[XOR:%[0-9]+]]:gpr = XOR %x_i, %k
+ ; CHECK-NEXT: [[SLTU:%[0-9]+]]:gpr = SLTU $x0, [[XOR]]
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTU]]
+ ; CHECK-NEXT: ...
[truncated]
Updated cost estimate for these cases. Made `getICmpCost` and `insertICmp` accept `Cond` to make their signatures more like `insertSelect`.