[RISCV] Enable early if-conversion #92959


Status: Open. Wants to merge 5 commits into main from the riscv_early_ifcvt branch.

Conversation

mgudim (Contributor) commented May 21, 2024

Implements the necessary target methods (insertSelect, canInsertSelect) and adds early if-conversion to the RISCV pipeline. Doing if-conversion increases the number of executed instructions, so it only makes sense if there is enough ILP.
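As background for the approach: with Zicond (or the equivalent Ventana XVentanaCondOps instructions), a select is simulated by two conditional-zero operations combined with an OR. Below is a minimal Python model of that sequence, assuming the standard semantics where `czero.eqz` zeroes its result when the condition register is zero and `czero.nez` zeroes it when the condition is nonzero; it is a sketch of the idea, not the MIR the patch emits.

```python
def czero_eqz(rs1, cond):
    # czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0 else rs1
    return 0 if cond == 0 else rs1

def czero_nez(rs1, cond):
    # czero.nez rd, rs1, rs2: rd = 0 if rs2 != 0 else rs1
    return 0 if cond != 0 else rs1

def select(cond, t, f):
    # Keep t when cond is nonzero, keep f when cond is zero,
    # then OR the two halves together.
    return czero_eqz(t, cond) | czero_nez(f, cond)

assert select(1, 7, 9) == 7
assert select(0, 7, 9) == 9
```

This is also why if-conversion increases the instruction count: both arms of the branch are computed unconditionally, which only pays off when the extra instructions can execute in parallel.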

@mgudim requested review from topperc, Stylie777 and wangpc-pp on May 21, 2024 20:16
@@ -879,7 +879,7 @@ bool EarlyIfConverter::shouldConvertIf() {
// from a loop-invariant address predictable; we were unable to prove that it
// doesn't alias any of the memory-writes in the loop, but it is likely to
// read to same value multiple times.
-  if (CurrentLoop && any_of(IfConv.Cond, [&](MachineOperand &MO) {
+  if (CurrentLoop && all_of(IfConv.Cond, [&](MachineOperand &MO) {

mgudim (author):

I created a separate PR for this change: #92405

@mgudim force-pushed the riscv_early_ifcvt branch from ba0bf89 to 27d71e4 on May 21, 2024 20:19

github-actions bot commented May 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

topperc (Collaborator) commented May 21, 2024

Tests?

NotCCReg)
.addReg(LHSReg)
.addReg(RHSReg);
BuildMI(MBB, MI, DL, get(RISCV::XORI), DstReg).addReg(NotCCReg).addImm(1);

Collaborator:

Can we swap the true/false of the select to avoid the XORI?

mgudim (author):

Yes, thanks!

mgudim (author):

I did the swap in insertSelect.
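The swap trick can be sketched abstractly: since `a >= b` is the negation of `a < b`, the XORI that would invert the SLT result can be dropped by exchanging the select operands instead. A schematic model (not the actual MIR emitted):

```python
def select_ge(lhs, rhs, t, f):
    # cc = (lhs < rhs), i.e. the plain SLT result for the negated condition.
    cc = 1 if lhs < rhs else 0
    # (lhs >= rhs) ? t : f  ==  (lhs < rhs) ? f : t,
    # so swap the arms instead of inverting cc with XORI.
    return f if cc else t

assert select_ge(3, 2, "t", "f") == "t"
assert select_ge(2, 2, "t", "f") == "t"
assert select_ge(1, 2, "t", "f") == "f"
```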

TargetSchedModel SchedModel;
SchedModel.init(&STI);

CondCycles = getICmpCost(Cond[0].getImm(), SchedModel);

mgudim (author):

I am not sure I understood correctly what CondCycles means. I need to double-check this.

@mgudim force-pushed the riscv_early_ifcvt branch 2 times, most recently from b4a79a0 to e11b541 on May 22, 2024 00:12
@mgudim force-pushed the riscv_early_ifcvt branch from e11b541 to a93059c on May 29, 2024 22:22
SchedModel.init(this);
return RISCVForceEalyIfcvt ||
(!RISCVDisableEarlyIfcvt &&
(hasStdExtZicond() || hasVendorXVentanaCondOps()) &&

Contributor:

TODO: Add support for xtheadcondmov? cc @zixuan-wu

wangpc-pp (Contributor) left a comment:

I have just evaluated this patch, and I think there is room for more optimization:

int loop(int m, int n, int *randArray) {
  int t = 0;
  for (int j = 0; j < n; j++) {
    for (int i = 0; i < 4 + m; i++) {
      if (randArray[i]) {
        t += 3 + 3 * t;
      } else {
        t -= 1 - 5 * t;
      }
    }
  }
  return t;
}

For this case, I compared the results of two configurations:

  1. clang -O2 -march=rv64gc_zba_zbb_zbc_zbs_zicond -mllvm -two-entry-phi-node-folding-threshold=16. I increased the threshold so that we can do if conversion in LLVM IR level via SimplifyCFG.
  2. clang -O2 -march=rv64gc_zba_zbb_zbc_zbs_zicond -mllvm -riscv-disable-early-ifcvt=false -mllvm -riscv-force-early-ifcvt=true -mllvm -stress-early-ifcvt=true.
    The diff is: (screenshot not reproduced in this transcript)

We can remove the xor+seqz here.
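One way to see why the seqz can be folded away (a sketch assuming standard Zicond semantics, not the actual peephole): `czero.eqz`/`czero.nez` only distinguish zero from nonzero in the condition register, so the raw xor result of an equality test can feed them directly with the polarity flipped, instead of first being canonicalized to 0/1.

```python
def czero_eqz(rs1, cond):
    # rd = 0 if cond == 0 else rs1
    return 0 if cond == 0 else rs1

def czero_nez(rs1, cond):
    # rd = 0 if cond != 0 else rs1
    return 0 if cond != 0 else rs1

x, y = 11, 22
for a in (0, 3, 7):
    for b in (0, 3, 7):
        t = a ^ b                 # xor: nonzero iff a != b
        c = 1 if t == 0 else 0    # seqz: canonical 0/1 flag for a == b
        # select(a == b, x, y) with the canonical flag ...
        with_seqz = czero_eqz(x, c) | czero_nez(y, c)
        # ... and with the raw xor result, polarity flipped (no seqz)
        without_seqz = czero_nez(x, t) | czero_eqz(y, t)
        assert with_seqz == without_seqz == (x if a == b else y)
```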

CC = (CC == RISCVCC::COND_GE) ? RISCVCC::COND_LT : RISCVCC::COND_LTU;
std::swap(CondZeroEqzReg, CondZeroNezReg);
}
insertICmp(MBB, MI, DL, CCReg, CC, LHSReg, RHSReg);

Contributor:

We may not need this for some cases.


Register TrueValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
Register FalseValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
BuildMI(MBB, MI, DL, get(CondZeroEqzOpc), TrueValOrZeroReg)

Contributor:

We may need to implement all the patterns in SDAG ISel here to reduce some instructions.

Contributor:

Or, we implement these ISel patterns in a MIR pass, and then run the pass after early if-conversion pass.

mgudim (author):

These patterns apply when one of the sides is zero. In our case we are selecting between the results of two basic blocks. How often will the result of a basic block be exactly zero?

wangpc-pp (Contributor) commented Jun 26, 2024:

I have no data to show how common these cases will be, but we have done a lot of peepholes in RISCVInstrInfoZicond.td and RISCVTargetLowering::lowerSELECT to reduce the comparison and Zicond instructions.
After if-conversion, we need to do the same peepholes (which means we may not need czero.eqz+czero.nez+or to simulate select).

mgudim (author):

I see what you mean. This seems like a lot of work and code duplication to reimplement all of these here. Here's a list of all peepholes that we do in lowerSELECT:

(select c, t, 0) -> (czero_eqz t, c)
(select c, 0, f) -> (czero_nez f, c)
(select c, (and f, x), f) -> (or (and f, x), (czero_nez f, c))
(select c, t, (and t, x)) -> (or (czero_eqz t, c), (and t, x))
(select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)
(select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2)

(select c, -1, y) -> -c | y
(select c, y, -1) -> (c-1) | y
(select c, 0, y) -> (c-1) & y
(select c, y, 0) -> -c & y
select c, ~x, x --> xor -c, x
(select x, x, y) -> x | y
(select !x, x, y) -> x & y
(select x, y, x) -> x & y
(select !x, y, x) -> x | y
binOp ((select cond, x, c0), c1) -> select cond binOp(x, c1), binOp(c0, c1)

But perhaps we can address this later in a different MR; this MR is just the basic functionality.
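Several of the bitwise identities listed above rely only on two's-complement arithmetic with a 0/1 condition, so they are easy to sanity-check; Python's unbounded integers behave like two's complement under `&` and `|`:

```python
def select(c, t, f):
    # c is a canonical 0/1 flag
    return t if c else f

for c in (0, 1):
    for y in (0, 5, -7, 123456789):
        assert select(c, -1, y) == (-c | y)        # (select c, -1, y) -> -c | y
        assert select(c, y, -1) == ((c - 1) | y)   # (select c, y, -1) -> (c-1) | y
        assert select(c, 0, y) == ((c - 1) & y)    # (select c, 0, y)  -> (c-1) & y
        assert select(c, y, 0) == (-c & y)         # (select c, y, 0)  -> -c & y
```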

Contributor:

> This seems like a lot of work and code duplication to reimplement all of these here.

Yes! That is why I said "Or, we implement these ISel patterns in a MIR pass, and then run the pass after early if-conversion pass." :-)

> But perhaps we can address this later in a different MR, this MR is just the basic functionality.

Agree. Can you make this PR non-draft? I ran llvm-testsuite and some internal tests and I haven't found any correctness issues, so I think it's ready for review.

mgudim (author):

This needs to be merged first: #95877
This MR uses a hack to bypass that.

Also I need to add some tests too.

Contributor:

It's OK to be stacked on other PRs; make it non-draft so that more reviewers can see it.

mgudim (author) commented Jun 24, 2024

@wangpc-pp Thanks for the test case, I'll look into that. I have only tested on the x264 train workload and I got a 1% improvement in cycle count. Can you share your results?

mgudim (author) commented Jun 24, 2024

This is a fix for detecting when the condition is loop-invariant: #95877

That needs to be merged before this patch.

wangpc-pp (Contributor):

> @wangpc-pp Thanks for the test case, I'll look into that. I have only tested on the x264 train workload and I got a 1% improvement in cycle count. Can you share your results?

I haven't run any particular benchmark; I just evaluated it on some internal microbenchmarks. We can see some cycle reductions.


return RISCVForceEalyIfcvt ||
(!RISCVDisableEarlyIfcvt &&
(hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
SchedModel.hasInstrSchedModelOrItineraries());

Contributor:

Do we really need hasInstrSchedModelOrItineraries here? We have an option to force early if-conversion; the scheduling model is not really needed here.

@mgudim marked this pull request as ready for review on June 27, 2024 04:18
llvmbot (Member) commented Jun 27, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Mikhail Gudim (mgudim)

Changes

Implements the necessary target methods (insertSelect, canInsertSelect) and adds early if-conversion to the RISCV pipeline. Doing if-conversion increases the number of executed instructions, so it only makes sense if there is enough ILP.


Patch is 27.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92959.diff

7 Files Affected:

  • (modified) llvm/lib/CodeGen/EarlyIfConversion.cpp (+1-1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+193)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+12)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+17)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+2)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+8)
  • (added) llvm/test/CodeGen/RISCV/early-ifcvt.mir (+553)
diff --git a/llvm/lib/CodeGen/EarlyIfConversion.cpp b/llvm/lib/CodeGen/EarlyIfConversion.cpp
index 2a7bee1618deb..2f8729fcbb945 100644
--- a/llvm/lib/CodeGen/EarlyIfConversion.cpp
+++ b/llvm/lib/CodeGen/EarlyIfConversion.cpp
@@ -879,7 +879,7 @@ bool EarlyIfConverter::shouldConvertIf() {
   // from a loop-invariant address predictable; we were unable to prove that it
   // doesn't alias any of the memory-writes in the loop, but it is likely to
   // read to same value multiple times.
-  if (CurrentLoop && any_of(IfConv.Cond, [&](MachineOperand &MO) {
+  if (CurrentLoop && all_of(IfConv.Cond, [&](MachineOperand &MO) {
         if (!MO.isReg() || !MO.isUse())
           return false;
         Register Reg = MO.getReg();
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 444b9076005c2..748fc34cfe353 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1446,6 +1446,199 @@ RISCVInstrInfo::optimizeSelect(MachineInstr &MI,
   return NewMI;
 }
 
+int RISCVInstrInfo::getICmpCost(unsigned CC,
+                                const TargetSchedModel &SchedModel) const {
+  switch (CC) {
+  default:
+    llvm_unreachable("Unknown condition code!");
+  case RISCVCC::COND_LT:
+    return SchedModel.computeInstrLatency(RISCV::SLT);
+  case RISCVCC::COND_LTU:
+    return SchedModel.computeInstrLatency(RISCV::SLTU);
+  case RISCVCC::COND_EQ:
+    return SchedModel.computeInstrLatency(RISCV::XOR) +
+           SchedModel.computeInstrLatency(RISCV::SLTIU);
+  case RISCVCC::COND_NE:
+    return SchedModel.computeInstrLatency(RISCV::XOR) +
+           SchedModel.computeInstrLatency(RISCV::SLTU);
+  case RISCVCC::COND_GE:
+    return SchedModel.computeInstrLatency(RISCV::XORI) +
+           SchedModel.computeInstrLatency(RISCV::SLT);
+  case RISCVCC::COND_GEU:
+    return SchedModel.computeInstrLatency(RISCV::XORI) +
+           SchedModel.computeInstrLatency(RISCV::SLTU);
+  }
+}
+
+void RISCVInstrInfo::insertICmp(MachineBasicBlock &MBB,
+                                MachineBasicBlock::iterator MI,
+                                const DebugLoc &DL, Register DstReg,
+                                unsigned CC, Register LHSReg,
+                                Register RHSReg) const {
+  MachineRegisterInfo &MRI = MI->getParent()->getParent()->getRegInfo();
+
+  switch (CC) {
+  default:
+    llvm_unreachable("Unknown condition code!");
+  case RISCVCC::COND_LT:
+  case RISCVCC::COND_LTU: {
+    BuildMI(MBB, MI, DL, get(CC == RISCVCC::COND_LT ? RISCV::SLT : RISCV::SLTU),
+            DstReg)
+        .addReg(LHSReg)
+        .addReg(RHSReg);
+    return;
+  }
+  case RISCVCC::COND_EQ:
+  case RISCVCC::COND_NE: {
+    Register XorReg = LHSReg;
+    if (RHSReg != RISCV::X0) {
+      XorReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+      BuildMI(MBB, MI, DL, get(RISCV::XOR), XorReg).addReg(LHSReg).addReg(RHSReg);
+    }
+    if (CC == RISCVCC::COND_EQ) {
+      BuildMI(MBB, MI, DL, get(RISCV::SLTIU), DstReg).addReg(XorReg).addImm(1);
+      return;
+    } else {
+      BuildMI(MBB, MI, DL, get(RISCV::SLTU), DstReg)
+          .addReg(RISCV::X0)
+          .addReg(XorReg);
+      return;
+    }
+  }
+  case RISCVCC::COND_GE:
+  case RISCVCC::COND_GEU: {
+    Register NotCCReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+    BuildMI(MBB, MI, DL, get(CC == RISCVCC::COND_GE ? RISCV::SLT : RISCV::SLTU),
+            NotCCReg)
+        .addReg(LHSReg)
+        .addReg(RHSReg);
+    BuildMI(MBB, MI, DL, get(RISCV::XORI), DstReg).addReg(NotCCReg).addImm(1);
+    return;
+  }
+  }
+}
+
+void RISCVInstrInfo::insertSelect(MachineBasicBlock &MBB,
+                                  MachineBasicBlock::iterator MI,
+                                  const DebugLoc &DL, Register DstReg,
+                                  ArrayRef<MachineOperand> Cond,
+                                  Register TrueReg, Register FalseReg) const {
+  MachineFunction &MF = *MI->getParent()->getParent();
+  const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
+  const TargetRegisterInfo &TRI = *ST.getRegisterInfo();
+  MachineRegisterInfo &MRI = MF.getRegInfo();
+  Register CCReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+
+  unsigned CC = Cond[0].getImm();
+  Register LHSReg = Cond[1].getReg();
+  Register RHSReg = Cond[2].getReg();
+
+  unsigned CondZeroEqzOpc =
+      ST.hasVendorXVentanaCondOps() ? RISCV::VT_MASKC : RISCV::CZERO_EQZ;
+  unsigned CondZeroNezOpc =
+      ST.hasVendorXVentanaCondOps() ? RISCV::VT_MASKCN : RISCV::CZERO_NEZ;
+
+  const TargetRegisterClass *DstRC = MRI.getRegClass(DstReg);
+  const TargetRegisterClass *CommonRC =
+      TRI.getCommonSubClass(DstRC, &RISCV::GPRRegClass);
+  bool NeedsRCCopies = (CommonRC != DstRC) && (CommonRC != &RISCV::GPRRegClass);
+
+  Register CondZeroEqzReg = TrueReg;
+  Register CondZeroNezReg = FalseReg;
+  if (NeedsRCCopies) {
+    CondZeroEqzReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+    BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), CondZeroEqzReg)
+        .addReg(TrueReg);
+    CondZeroNezReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+    BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), CondZeroNezReg)
+        .addReg(FalseReg);
+  }
+  if (CC == RISCVCC::COND_GE || CC == RISCVCC::COND_GEU) {
+    CC = (CC == RISCVCC::COND_GE) ? RISCVCC::COND_LT : RISCVCC::COND_LTU;
+    std::swap(CondZeroEqzReg, CondZeroNezReg);
+  }
+  insertICmp(MBB, MI, DL, CCReg, CC, LHSReg, RHSReg);
+
+  Register TrueValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  Register FalseValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  BuildMI(MBB, MI, DL, get(CondZeroEqzOpc), TrueValOrZeroReg)
+      .addReg(CondZeroEqzReg)
+      .addReg(CCReg);
+  BuildMI(MBB, MI, DL, get(CondZeroNezOpc), FalseValOrZeroReg)
+      .addReg(CondZeroNezReg)
+      .addReg(CCReg);
+
+  Register SelectOutReg = DstReg;
+  if (NeedsRCCopies) {
+    SelectOutReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+  }
+  BuildMI(MBB, MI, DL, get(RISCV::OR), SelectOutReg)
+      .addReg(TrueValOrZeroReg)
+      .addReg(FalseValOrZeroReg);
+  if (NeedsRCCopies) {
+    BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), DstReg).addReg(SelectOutReg);
+  }
+
+  return;
+}
+
+bool RISCVInstrInfo::canInsertSelect(const MachineBasicBlock &MBB,
+                                     ArrayRef<MachineOperand> Cond,
+                                     Register DstReg, Register TrueReg,
+                                     Register FalseReg, int &CondCycles,
+                                     int &TrueCycles, int &FalseCycles) const {
+  const MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
+  const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
+  const TargetRegisterClass *TrueRC = MRI.getRegClass(TrueReg);
+  const TargetRegisterClass *DstRC = MRI.getRegClass(DstReg);
+  const TargetRegisterClass *CommonRC =
+      TRI.getCommonSubClass(DstRC, &RISCV::GPRRegClass);
+  bool NeedsRCCopies = (CommonRC != DstRC) && (CommonRC != &RISCV::GPRRegClass);
+
+  TargetSchedModel SchedModel;
+  SchedModel.init(&STI);
+
+  // this is used for testing only.
+  if (!SchedModel.hasInstrSchedModelOrItineraries())
+    return true;
+
+  CondCycles = getICmpCost(Cond[0].getImm(), SchedModel);
+  TrueCycles = SchedModel.computeInstrLatency(RISCV::OR) +
+               SchedModel.computeInstrLatency(STI.hasVendorXVentanaCondOps()
+                                                  ? RISCV::VT_MASKC
+                                                  : RISCV::CZERO_EQZ);
+  FalseCycles = SchedModel.computeInstrLatency(RISCV::OR) +
+                SchedModel.computeInstrLatency(STI.hasVendorXVentanaCondOps()
+                                                   ? RISCV::VT_MASKCN
+                                                   : RISCV::CZERO_NEZ);
+
+  if (NeedsRCCopies) {
+    unsigned CopyIntoGPROpc = 0;
+    if (TrueRC == &RISCV::FPR32RegClass) {
+      CopyIntoGPROpc = RISCV::FMV_X_W;
+    } else if (TrueRC == &RISCV::FPR64RegClass) {
+      CopyIntoGPROpc = RISCV::FMV_X_D;
+    } else {
+      llvm_unreachable("Unknown register class");
+    }
+    int CopyIntoGPRCycles = SchedModel.computeInstrLatency(CopyIntoGPROpc);
+
+    unsigned CopyFromGPROpc = 0;
+    if (DstRC == &RISCV::FPR32RegClass) {
+      CopyIntoGPROpc = RISCV::FMV_W_X;
+    } else if (DstRC == &RISCV::FPR64RegClass) {
+      CopyIntoGPROpc = RISCV::FMV_D_X;
+    } else {
+      llvm_unreachable("Unknown register class");
+    }
+    int CopyFromGPRCycles = SchedModel.computeInstrLatency(CopyFromGPROpc);
+
+    TrueCycles += (CopyIntoGPRCycles + CopyFromGPRCycles);
+    FalseCycles += (CopyIntoGPRCycles + CopyFromGPRCycles);
+  }
+  return true;
+}
+
 unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
   if (MI.isMetaInstruction())
     return 0;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index e069717aaef23..a331f1e2af9db 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -149,6 +149,18 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
                                SmallPtrSetImpl<MachineInstr *> &SeenMIs,
                                bool) const override;
 
+  int getICmpCost(unsigned CC, const TargetSchedModel &SchedModel) const;
+  void insertICmp(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
+                  const DebugLoc &DL, Register DstReg, unsigned CC,
+                  Register LHSReg, Register RHSReg) const;
+  bool canInsertSelect(const MachineBasicBlock &, ArrayRef<MachineOperand> Cond,
+                       Register, Register, Register, int &, int &,
+                       int &) const override;
+  void insertSelect(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
+                    const DebugLoc &DL, Register DstReg,
+                    ArrayRef<MachineOperand> Cond, Register TrueReg,
+                    Register FalseReg) const override;
+
   bool isAsCheapAsAMove(const MachineInstr &MI) const override;
 
   std::optional<DestSourcePair>
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index d3236bb07d56d..0b0347003f0d4 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -65,6 +65,14 @@ static cl::opt<unsigned> RISCVMinimumJumpTableEntries(
     "riscv-min-jump-table-entries", cl::Hidden,
     cl::desc("Set minimum number of entries to use a jump table on RISCV"));
 
+static cl::opt<bool>
+    RISCVDisableEarlyIfcvt("riscv-disable-early-ifcvt", cl::Hidden,
+                           cl::desc("Disable early if-conversion"),
+                           cl::init(true), cl::Hidden);
+
+static cl::opt<bool> RISCVForceEalyIfcvt("riscv-force-early-ifcvt", cl::Hidden,
+                                         cl::desc("Force early if-conversion"),
+                                         cl::init(false), cl::Hidden);
 void RISCVSubtarget::anchor() {}
 
 RISCVSubtarget &
@@ -203,3 +211,12 @@ unsigned RISCVSubtarget::getMinimumJumpTableEntries() const {
              ? RISCVMinimumJumpTableEntries
              : TuneInfo->MinimumJumpTableEntries;
 }
+
+bool RISCVSubtarget::enableEarlyIfConversion() const {
+  TargetSchedModel SchedModel;
+  SchedModel.init(this);
+  return RISCVForceEalyIfcvt ||
+         (!RISCVDisableEarlyIfcvt &&
+          (hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
+          SchedModel.hasInstrSchedModelOrItineraries());
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index c880c9e921e0e..fe8143942e938 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -303,6 +303,8 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   unsigned getMinimumJumpTableEntries() const;
 
   bool supportsInitUndef() const override { return hasVInstructions(); }
+
+  bool enableEarlyIfConversion() const override;
 };
 } // End llvm namespace
 
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 5aab138dae408..a205738eeebd9 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -349,6 +349,7 @@ class RISCVPassConfig : public TargetPassConfig {
   bool addPreISel() override;
   void addCodeGenPrepare() override;
   bool addInstSelector() override;
+  bool addILPOpts() override;
   bool addIRTranslator() override;
   void addPreLegalizeMachineIR() override;
   bool addLegalizeMachineIR() override;
@@ -450,6 +451,13 @@ bool RISCVPassConfig::addInstSelector() {
   return false;
 }
 
+bool RISCVPassConfig::addILPOpts() {
+  if (getOptLevel() != CodeGenOptLevel::None) {
+    addPass(&EarlyIfConverterID);
+  }
+  return true;
+}
+
 bool RISCVPassConfig::addIRTranslator() {
   addPass(new IRTranslator(getOptLevel()));
   return false;
diff --git a/llvm/test/CodeGen/RISCV/early-ifcvt.mir b/llvm/test/CodeGen/RISCV/early-ifcvt.mir
new file mode 100644
index 0000000000000..f4acd2e3a38ec
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/early-ifcvt.mir
@@ -0,0 +1,553 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=riscv64 -mcpu=veyron-v1 -riscv-disable-early-ifcvt=false -riscv-force-early-ifcvt=true -stress-early-ifcvt=true \
+# RUN: -run-pass=early-ifcvt -simplify-mir  -o - %s | FileCheck %s
+
+---
+name: blt
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: blt
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   liveins: $x10, $x11, $x12
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %x:gpr = COPY $x10
+  ; CHECK-NEXT:   %n:gpr = COPY $x11
+  ; CHECK-NEXT:   %k:gpr = COPY $x12
+  ; CHECK-NEXT:   %zero:gpr = COPY $x0
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1, %bb.5
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+  ; CHECK-NEXT:   %x_i_addr:gpr = ADD %x, %i
+  ; CHECK-NEXT:   %x_i:gpr = LD %x_i_addr, 0
+  ; CHECK-NEXT:   %add1:gpr = ADDI %x_i, 1
+  ; CHECK-NEXT:   %add2:gpr = ADDI %x_i, 2
+  ; CHECK-NEXT:   [[SLT:%[0-9]+]]:gpr = SLT %x_i, %k
+  ; CHECK-NEXT:   [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLT]]
+  ; CHECK-NEXT:   [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLT]]
+  ; CHECK-NEXT:   %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+  ; CHECK-NEXT:   SD %res, %x_i, 0
+  ; CHECK-NEXT:   %i_inc:gpr = ADDI %i, 1
+  ; CHECK-NEXT:   BEQ %i_inc, %n, %bb.5
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.5:
+  ; CHECK-NEXT:   PseudoRET
+  bb.0:
+    successors: %bb.1
+    liveins: $x10, $x11, $x12
+
+    %x:gpr = COPY $x10
+    %n:gpr = COPY $x11
+    %k:gpr = COPY $x12
+    %zero:gpr = COPY $x0
+    PseudoBR %bb.1
+
+  bb.1:
+    successors: %bb.2, %bb.3
+
+    %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+    %x_i_addr:gpr = ADD %x, %i
+    %x_i:gpr = LD %x_i_addr, 0
+    BLT %x_i, %k, %bb.2
+    PseudoBR %bb.3
+
+  bb.2:
+    successors: %bb.4
+
+    %add1:gpr = ADDI %x_i, 1
+    PseudoBR %bb.4
+
+  bb.3:
+    successors: %bb.4
+
+    %add2:gpr = ADDI %x_i, 2
+    PseudoBR %bb.4
+
+  bb.4:
+    successors: %bb.1, %bb.5
+
+    %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+    SD %res, %x_i, 0
+    %i_inc:gpr = ADDI %i, 1
+    BEQ %i_inc, %n, %bb.5
+    PseudoBR %bb.1
+
+  bb.5:
+    PseudoRET
+...
+
+---
+name: bltu
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: bltu
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   liveins: $x10, $x11, $x12
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %x:gpr = COPY $x10
+  ; CHECK-NEXT:   %n:gpr = COPY $x11
+  ; CHECK-NEXT:   %k:gpr = COPY $x12
+  ; CHECK-NEXT:   %zero:gpr = COPY $x0
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1, %bb.5
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+  ; CHECK-NEXT:   %x_i_addr:gpr = ADD %x, %i
+  ; CHECK-NEXT:   %x_i:gpr = LD %x_i_addr, 0
+  ; CHECK-NEXT:   %add1:gpr = ADDI %x_i, 1
+  ; CHECK-NEXT:   %add2:gpr = ADDI %x_i, 2
+  ; CHECK-NEXT:   [[SLTU:%[0-9]+]]:gpr = SLTU %x_i, %k
+  ; CHECK-NEXT:   [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTU]]
+  ; CHECK-NEXT:   [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLTU]]
+  ; CHECK-NEXT:   %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+  ; CHECK-NEXT:   SD %res, %x_i, 0
+  ; CHECK-NEXT:   %i_inc:gpr = ADDI %i, 1
+  ; CHECK-NEXT:   BEQ %i_inc, %n, %bb.5
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.5:
+  ; CHECK-NEXT:   PseudoRET
+  bb.0:
+    successors: %bb.1
+    liveins: $x10, $x11, $x12
+
+    %x:gpr = COPY $x10
+    %n:gpr = COPY $x11
+    %k:gpr = COPY $x12
+    %zero:gpr = COPY $x0
+    PseudoBR %bb.1
+
+  bb.1:
+    successors: %bb.2, %bb.3
+
+    %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+    %x_i_addr:gpr = ADD %x, %i
+    %x_i:gpr = LD %x_i_addr, 0
+    BLTU %x_i, %k, %bb.2
+    PseudoBR %bb.3
+
+  bb.2:
+    successors: %bb.4
+
+    %add1:gpr = ADDI %x_i, 1
+    PseudoBR %bb.4
+
+  bb.3:
+    successors: %bb.4
+
+    %add2:gpr = ADDI %x_i, 2
+    PseudoBR %bb.4
+
+  bb.4:
+    successors: %bb.1, %bb.5
+
+    %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+    SD %res, %x_i, 0
+    %i_inc:gpr = ADDI %i, 1
+    BEQ %i_inc, %n, %bb.5
+    PseudoBR %bb.1
+
+  bb.5:
+    PseudoRET
+...
+
+---
+name: beq
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: beq
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   liveins: $x10, $x11, $x12
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %x:gpr = COPY $x10
+  ; CHECK-NEXT:   %n:gpr = COPY $x11
+  ; CHECK-NEXT:   %k:gpr = COPY $x12
+  ; CHECK-NEXT:   %zero:gpr = COPY $x0
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1, %bb.5
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+  ; CHECK-NEXT:   %x_i_addr:gpr = ADD %x, %i
+  ; CHECK-NEXT:   %x_i:gpr = LD %x_i_addr, 0
+  ; CHECK-NEXT:   %add1:gpr = ADDI %x_i, 1
+  ; CHECK-NEXT:   %add2:gpr = ADDI %x_i, 2
+  ; CHECK-NEXT:   [[XOR:%[0-9]+]]:gpr = XOR %x_i, %k
+  ; CHECK-NEXT:   [[SLTIU:%[0-9]+]]:gpr = SLTIU [[XOR]], 1
+  ; CHECK-NEXT:   [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTIU]]
+  ; CHECK-NEXT:   [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLTIU]]
+  ; CHECK-NEXT:   %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+  ; CHECK-NEXT:   SD %res, %x_i, 0
+  ; CHECK-NEXT:   %i_inc:gpr = ADDI %i, 1
+  ; CHECK-NEXT:   BEQ %i_inc, %n, %bb.5
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.5:
+  ; CHECK-NEXT:   PseudoRET
+  bb.0:
+    successors: %bb.1
+    liveins: $x10, $x11, $x12
+
+    %x:gpr = COPY $x10
+    %n:gpr = COPY $x11
+    %k:gpr = COPY $x12
+    %zero:gpr = COPY $x0
+    PseudoBR %bb.1
+
+  bb.1:
+    successors: %bb.2, %bb.3
+
+    %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+    %x_i_addr:gpr = ADD %x, %i
+    %x_i:gpr = LD %x_i_addr, 0
+    BEQ %x_i, %k, %bb.2
+    PseudoBR %bb.3
+
+  bb.2:
+    successors: %bb.4
+
+    %add1:gpr = ADDI %x_i, 1
+    PseudoBR %bb.4
+
+  bb.3:
+    successors: %bb.4
+
+    %add2:gpr = ADDI %x_i, 2
+    PseudoBR %bb.4
+
+  bb.4:
+    successors: %bb.1, %bb.5
+
+    %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+    SD %res, %x_i, 0
+    %i_inc:gpr = ADDI %i, 1
+    BEQ %i_inc, %n, %bb.5
+    PseudoBR %bb.1
+
+  bb.5:
+    PseudoRET
+...
+
+---
+name: bne
+tracksRegLiveness: true
+body: |
+  ; CHECK-LABEL: name: bne
+  ; CHECK: bb.0:
+  ; CHECK-NEXT:   liveins: $x10, $x11, $x12
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %x:gpr = COPY $x10
+  ; CHECK-NEXT:   %n:gpr = COPY $x11
+  ; CHECK-NEXT:   %k:gpr = COPY $x12
+  ; CHECK-NEXT:   %zero:gpr = COPY $x0
+  ; CHECK-NEXT:   PseudoBR %bb.1
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT: bb.1:
+  ; CHECK-NEXT:   successors: %bb.1, %bb.5
+  ; CHECK-NEXT: {{  $}}
+  ; CHECK-NEXT:   %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+  ; CHECK-NEXT:   %x_i_addr:gpr = ADD %x, %i
+  ; CHECK-NEXT:   %x_i:gpr = LD %x_i_addr, 0
+  ; CHECK-NEXT:   %add1:gpr = ADDI %x_i, 1
+  ; CHECK-NEXT:   %add2:gpr = ADDI %x_i, 2
+  ; CHECK-NEXT:   [[XOR:%[0-9]+]]:gpr = XOR %x_i, %k
+  ; CHECK-NEXT:   [[SLTU:%[0-9]+]]:gpr = SLTU $x0, [[XOR]]
+  ; CHECK-NEXT:   [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTU]]
+  ; CHECK-NEXT:   ...
[truncated]

mgudim added 4 commits June 27, 2024 01:42
  • Implements the necessary target methods (insertSelect, canInsertSelect) and adds early if-conversion to the RISCV pipeline. Doing if-conversion increases the number of executed instructions, so it only makes sense if there is enough ILP.
  • Updated cost estimate for these cases.
  • Made `getICmpCost` and `insertICmp` accept `Cond` to make their signatures more like `insertSelect`.
@mgudim force-pushed the riscv_early_ifcvt branch from 356a6c7 to c03161d on June 27, 2024 05:42