[RISCV] Enable early if-conversion #92959
Conversation
@@ -879,7 +879,7 @@ bool EarlyIfConverter::shouldConvertIf() {
   // from a loop-invariant address predictable; we were unable to prove that it
   // doesn't alias any of the memory-writes in the loop, but it is likely to
   // read to same value multiple times.
-  if (CurrentLoop && any_of(IfConv.Cond, [&](MachineOperand &MO) {
+  if (CurrentLoop && all_of(IfConv.Cond, [&](MachineOperand &MO) {
I created a separate PR for this change: #92405
✅ With the latest revision this PR passed the C/C++ code formatter.
Tests?
                 NotCCReg)
        .addReg(LHSReg)
        .addReg(RHSReg);
    BuildMI(MBB, MI, DL, get(RISCV::XORI), DstReg).addReg(NotCCReg).addImm(1);
Can we swap the true/false of the select to avoid the XORI?
Yes, thanks!
I did the swap in insertSelect.
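A minimal C model of that swap, with czero_eqz/czero_nez standing in for the Zicond semantics (all names below are illustrative, not the patch's code):

static inline long czero_eqz(long rs1, long rs2) { return rs2 == 0 ? 0 : rs1; }
static inline long czero_nez(long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

// Before: materialize (lhs >= rhs) with SLT + XORI, then select.
long sel_ge_xori(long lhs, long rhs, long t, long f) {
  long lt = lhs < rhs;                        // SLT
  long ge = lt ^ 1;                           // XORI ..., 1
  return czero_eqz(t, ge) | czero_nez(f, ge); // ge ? t : f
}

// After: key both czero ops off SLT directly and swap the true/false
// operands; same result, one fewer instruction.
long sel_ge_swapped(long lhs, long rhs, long t, long f) {
  long lt = lhs < rhs;                        // SLT
  return czero_eqz(f, lt) | czero_nez(t, lt); // lt ? f : t == ge ? t : f
}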
  TargetSchedModel SchedModel;
  SchedModel.init(&STI);

  CondCycles = getICmpCost(Cond[0].getImm(), SchedModel);
I am not sure I understood correctly what CondCycles means. I need to double-check this.
  SchedModel.init(this);
  return RISCVForceEalyIfcvt ||
         (!RISCVDisableEarlyIfcvt &&
          (hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
TODO: Add support for xtheadcondmov? cc @zixuan-wu
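For context, a minimal C model of what that mapping could look like (assuming the XTheadCondMov semantics: th.mveqz rd, rs1, rs2 writes rs1 into rd when rs2 == 0 and otherwise leaves rd unchanged, with th.mvnez as the rs2 != 0 counterpart; the helper and function names below are illustrative):

// Models th.mveqz: rd is both a source and the destination.
static inline long th_mveqz(long rd, long rs1, long rs2) {
  return rs2 == 0 ? rs1 : rd;
}

// A select then needs only a copy plus one conditional move, instead of
// the czero.eqz + czero.nez + or triple used for Zicond.
long sel_condmov(long c, long t, long f) {
  long rd = t;              // start with the true value
  rd = th_mveqz(rd, f, c);  // overwrite with f when the condition is zero
  return rd;                // c ? t : f
}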
I have just evaluated this patch, and I think there is room for more optimization:
int loop(int m, int n, int *randArray) {
int t = 0;
for (int j = 0; j < n; j++) {
for (int i = 0; i < 4 + m; i++) {
if (randArray[i]) {
t += 3 + 3 * t;
} else {
t -= 1 - 5 * t;
}
}
}
return t;
}
For this case, I compared the results of two configurations:
1. clang -O2 -march=rv64gc_zba_zbb_zbc_zbs_zicond -mllvm -two-entry-phi-node-folding-threshold=16. I increased the threshold so that we can do if-conversion at the LLVM IR level via SimplifyCFG.
2. clang -O2 -march=rv64gc_zba_zbb_zbc_zbs_zicond -mllvm -riscv-disable-early-ifcvt=false -mllvm -riscv-force-early-ifcvt=true -mllvm -stress-early-ifcvt=true.
The diff is:
We can remove the xor+seqz here.
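A minimal C model of why the xor+seqz is removable (helper names are illustrative): czero.eqz/czero.nez only test whether their condition operand is zero, so a register that is zero exactly when the condition is false can feed them directly, without first being canonicalized to a 0/1 value:

static inline long czero_eqz(long rs1, long rs2) { return rs2 == 0 ? 0 : rs1; }
static inline long czero_nez(long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

// With canonicalization: (x == 0) costs an xor+seqz-style sequence first.
long sel_eqz_canonical(long x, long t, long f) {
  long c = (x == 0);                        // xor + seqz
  return czero_eqz(t, c) | czero_nez(f, c); // c ? t : f
}

// Without it: key off x itself and swap the eqz/nez roles.
long sel_eqz_direct(long x, long t, long f) {
  return czero_nez(t, x) | czero_eqz(f, x); // (x == 0) ? t : f
}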
    CC = (CC == RISCVCC::COND_GE) ? RISCVCC::COND_LT : RISCVCC::COND_LTU;
    std::swap(CondZeroEqzReg, CondZeroNezReg);
  }
  insertICmp(MBB, MI, DL, CCReg, CC, LHSReg, RHSReg);
We may not need this for some cases.
  Register TrueValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
  Register FalseValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
  BuildMI(MBB, MI, DL, get(CondZeroEqzOpc), TrueValOrZeroReg)
We may need to implement all the patterns from SDAG ISel here to reduce some instructions.
Or, we implement these ISel patterns in a MIR pass, and then run that pass after the early if-conversion pass.
These patterns apply when one of the sides is zero. In our case we are looking at a select between the results of two basic blocks. How often will it be that the result of a basic block is just zero?
I have no data to show how common these cases will be, but we have done a lot of peepholes in RISCVInstrInfoZicond.td and RISCVTargetLowering::lowerSELECT to reduce the comparison and Zicond instructions. After if-conversion, we need to do the same peepholes (which means we may not need czero.eqz+czero.nez+or to simulate select).
I see what you mean. This seems like a lot of work and code duplication to reimplement all of these here. Here's a list of all peepholes that we do in lowerSELECT:
(select c, t, 0) -> (czero_eqz t, c)
(select c, 0, f) -> (czero_nez f, c)
(select c, (and f, x), f) -> (or (and f, x), (czero_nez f, c))
(select c, t, (and t, x)) -> (or (czero_eqz t, c), (and t, x))
(select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)
(select c, c1, c2) -> (add (czero_eqz c1 - c2, c), c2)
(select c, -1, y) -> -c | y
(select c, y, -1) -> (c-1) | y
(select c, 0, y) -> (c-1) & y
(select c, y, 0) -> -c & y
select c, ~x, x --> xor -c, x
(select x, x, y) -> x | y
(select !x, x, y) -> x & y
(select x, y, x) -> x & y
(select !x, y, x) -> x | y
binOp ((select cond, x, c0), c1) -> select cond binOp(x, c1), binOp(c0, c1)
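To make one of these concrete, a minimal C model of the constant-select entry (czero_nez stands in for czero.nez; the names are illustrative):

static inline long czero_nez(long rs1, long rs2) { return rs2 != 0 ? 0 : rs1; }

// (select c, c1, c2) -> (add (czero_nez c2 - c1, c), c1)
long sel_const(long c, long c1, long c2) {
  // c != 0: czero_nez yields 0, so the result is c1.
  // c == 0: the result is (c2 - c1) + c1 == c2.
  return czero_nez(c2 - c1, c) + c1;  // one czero + one add, no or needed
}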
But perhaps we can address this later in a different MR; this MR is just the basic functionality.
This seems like a lot of work and code duplication to reimplement all of these here.
Yes! So I said "Or, we implement these ISel patterns in a MIR pass and then run the pass after early if-conversion pass". :-)
But perhaps we can address this later in a different MR; this MR is just the basic functionality.
Agree. Can you make this PR non-draft? I ran llvm-testsuite and some internal tests and I haven't found any correctness issues, so I think it's ready for review.
This needs to be merged first: #95877. This MR uses a hack to bypass that. I also need to add some tests.
It's OK to be stacked on other PRs; make it non-draft so that more reviewers can see it.
@wangpc-pp Thanks for the test case, I'll look into that. I have only tested on the x264 train workload, and I got a 1% improvement in cycle count. Can you share your results?
This is a fix for detecting when the condition is loop-invariant: #95877. That needs to be merged before this patch.
I haven't run any particular benchmark; I just evaluated it on some internal microbenchmarks. We can see some cycle-count reduction.
  return RISCVForceEalyIfcvt ||
         (!RISCVDisableEarlyIfcvt &&
          (hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
          SchedModel.hasInstrSchedModelOrItineraries());
Do we really need hasInstrSchedModelOrItineraries here? We have an option to force early if-conversion; the scheduling model is not really needed here.
@llvm/pr-subscribers-backend-risc-v

Author: Mikhail Gudim (mgudim)

Changes

Implements the necessary target methods (insertSelect, canInsertSelect) and adds early if-conversion to the RISCV pipeline. Doing if-conversion increases the number of executed instructions, so it only makes sense if there is enough ILP.

Patch is 27.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92959.diff

7 Files Affected:
diff --git a/llvm/lib/CodeGen/EarlyIfConversion.cpp b/llvm/lib/CodeGen/EarlyIfConversion.cpp
index 2a7bee1618deb..2f8729fcbb945 100644
--- a/llvm/lib/CodeGen/EarlyIfConversion.cpp
+++ b/llvm/lib/CodeGen/EarlyIfConversion.cpp
@@ -879,7 +879,7 @@ bool EarlyIfConverter::shouldConvertIf() {
// from a loop-invariant address predictable; we were unable to prove that it
// doesn't alias any of the memory-writes in the loop, but it is likely to
// read to same value multiple times.
- if (CurrentLoop && any_of(IfConv.Cond, [&](MachineOperand &MO) {
+ if (CurrentLoop && all_of(IfConv.Cond, [&](MachineOperand &MO) {
if (!MO.isReg() || !MO.isUse())
return false;
Register Reg = MO.getReg();
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index 444b9076005c2..748fc34cfe353 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -1446,6 +1446,199 @@ RISCVInstrInfo::optimizeSelect(MachineInstr &MI,
return NewMI;
}
+int RISCVInstrInfo::getICmpCost(unsigned CC,
+ const TargetSchedModel &SchedModel) const {
+ switch (CC) {
+ default:
+ llvm_unreachable("Unknown condition code!");
+ case RISCVCC::COND_LT:
+ return SchedModel.computeInstrLatency(RISCV::SLT);
+ case RISCVCC::COND_LTU:
+ return SchedModel.computeInstrLatency(RISCV::SLTU);
+ case RISCVCC::COND_EQ:
+ return SchedModel.computeInstrLatency(RISCV::XOR) +
+ SchedModel.computeInstrLatency(RISCV::SLTIU);
+ case RISCVCC::COND_NE:
+ return SchedModel.computeInstrLatency(RISCV::XOR) +
+ SchedModel.computeInstrLatency(RISCV::SLTU);
+ case RISCVCC::COND_GE:
+ return SchedModel.computeInstrLatency(RISCV::XORI) +
+ SchedModel.computeInstrLatency(RISCV::SLT);
+ case RISCVCC::COND_GEU:
+ return SchedModel.computeInstrLatency(RISCV::XORI) +
+ SchedModel.computeInstrLatency(RISCV::SLTU);
+ }
+}
+
+void RISCVInstrInfo::insertICmp(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg,
+ unsigned CC, Register LHSReg,
+ Register RHSReg) const {
+ MachineRegisterInfo &MRI = MI->getParent()->getParent()->getRegInfo();
+
+ switch (CC) {
+ default:
+ llvm_unreachable("Unknown condition code!");
+ case RISCVCC::COND_LT:
+ case RISCVCC::COND_LTU: {
+ BuildMI(MBB, MI, DL, get(CC == RISCVCC::COND_LT ? RISCV::SLT : RISCV::SLTU),
+ DstReg)
+ .addReg(LHSReg)
+ .addReg(RHSReg);
+ return;
+ }
+ case RISCVCC::COND_EQ:
+ case RISCVCC::COND_NE: {
+ Register XorReg = LHSReg;
+ if (RHSReg != RISCV::X0) {
+ XorReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(RISCV::XOR), XorReg).addReg(LHSReg).addReg(RHSReg);
+ }
+ if (CC == RISCVCC::COND_EQ) {
+ BuildMI(MBB, MI, DL, get(RISCV::SLTIU), DstReg).addReg(XorReg).addImm(1);
+ return;
+ } else {
+ BuildMI(MBB, MI, DL, get(RISCV::SLTU), DstReg)
+ .addReg(RISCV::X0)
+ .addReg(XorReg);
+ return;
+ }
+ }
+ case RISCVCC::COND_GE:
+ case RISCVCC::COND_GEU: {
+ Register NotCCReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(CC == RISCVCC::COND_GE ? RISCV::SLT : RISCV::SLTU),
+ NotCCReg)
+ .addReg(LHSReg)
+ .addReg(RHSReg);
+ BuildMI(MBB, MI, DL, get(RISCV::XORI), DstReg).addReg(NotCCReg).addImm(1);
+ return;
+ }
+ }
+}
+
+void RISCVInstrInfo::insertSelect(MachineBasicBlock &MBB,
+ MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg,
+ ArrayRef<MachineOperand> Cond,
+ Register TrueReg, Register FalseReg) const {
+ MachineFunction &MF = *MI->getParent()->getParent();
+ const RISCVSubtarget &ST = MF.getSubtarget<RISCVSubtarget>();
+ const TargetRegisterInfo &TRI = *ST.getRegisterInfo();
+ MachineRegisterInfo &MRI = MF.getRegInfo();
+ Register CCReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+
+ unsigned CC = Cond[0].getImm();
+ Register LHSReg = Cond[1].getReg();
+ Register RHSReg = Cond[2].getReg();
+
+ unsigned CondZeroEqzOpc =
+ ST.hasVendorXVentanaCondOps() ? RISCV::VT_MASKC : RISCV::CZERO_EQZ;
+ unsigned CondZeroNezOpc =
+ ST.hasVendorXVentanaCondOps() ? RISCV::VT_MASKCN : RISCV::CZERO_NEZ;
+
+ const TargetRegisterClass *DstRC = MRI.getRegClass(DstReg);
+ const TargetRegisterClass *CommonRC =
+ TRI.getCommonSubClass(DstRC, &RISCV::GPRRegClass);
+ bool NeedsRCCopies = (CommonRC != DstRC) && (CommonRC != &RISCV::GPRRegClass);
+
+ Register CondZeroEqzReg = TrueReg;
+ Register CondZeroNezReg = FalseReg;
+ if (NeedsRCCopies) {
+ CondZeroEqzReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), CondZeroEqzReg)
+ .addReg(TrueReg);
+ CondZeroNezReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), CondZeroNezReg)
+ .addReg(FalseReg);
+ }
+ if (CC == RISCVCC::COND_GE || CC == RISCVCC::COND_GEU) {
+ CC = (CC == RISCVCC::COND_GE) ? RISCVCC::COND_LT : RISCVCC::COND_LTU;
+ std::swap(CondZeroEqzReg, CondZeroNezReg);
+ }
+ insertICmp(MBB, MI, DL, CCReg, CC, LHSReg, RHSReg);
+
+ Register TrueValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ Register FalseValOrZeroReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ BuildMI(MBB, MI, DL, get(CondZeroEqzOpc), TrueValOrZeroReg)
+ .addReg(CondZeroEqzReg)
+ .addReg(CCReg);
+ BuildMI(MBB, MI, DL, get(CondZeroNezOpc), FalseValOrZeroReg)
+ .addReg(CondZeroNezReg)
+ .addReg(CCReg);
+
+ Register SelectOutReg = DstReg;
+ if (NeedsRCCopies) {
+ SelectOutReg = MRI.createVirtualRegister(&RISCV::GPRRegClass);
+ }
+ BuildMI(MBB, MI, DL, get(RISCV::OR), SelectOutReg)
+ .addReg(TrueValOrZeroReg)
+ .addReg(FalseValOrZeroReg);
+ if (NeedsRCCopies) {
+ BuildMI(MBB, MI, DL, get(TargetOpcode::COPY), DstReg).addReg(SelectOutReg);
+ }
+
+ return;
+}
+
+bool RISCVInstrInfo::canInsertSelect(const MachineBasicBlock &MBB,
+ ArrayRef<MachineOperand> Cond,
+ Register DstReg, Register TrueReg,
+ Register FalseReg, int &CondCycles,
+ int &TrueCycles, int &FalseCycles) const {
+ const MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
+ const TargetRegisterInfo &TRI = *STI.getRegisterInfo();
+ const TargetRegisterClass *TrueRC = MRI.getRegClass(TrueReg);
+ const TargetRegisterClass *DstRC = MRI.getRegClass(DstReg);
+ const TargetRegisterClass *CommonRC =
+ TRI.getCommonSubClass(DstRC, &RISCV::GPRRegClass);
+ bool NeedsRCCopies = (CommonRC != DstRC) && (CommonRC != &RISCV::GPRRegClass);
+
+ TargetSchedModel SchedModel;
+ SchedModel.init(&STI);
+
+ // this is used for testing only.
+ if (!SchedModel.hasInstrSchedModelOrItineraries())
+ return true;
+
+ CondCycles = getICmpCost(Cond[0].getImm(), SchedModel);
+ TrueCycles = SchedModel.computeInstrLatency(RISCV::OR) +
+ SchedModel.computeInstrLatency(STI.hasVendorXVentanaCondOps()
+ ? RISCV::VT_MASKC
+ : RISCV::CZERO_EQZ);
+ FalseCycles = SchedModel.computeInstrLatency(RISCV::OR) +
+ SchedModel.computeInstrLatency(STI.hasVendorXVentanaCondOps()
+ ? RISCV::VT_MASKCN
+ : RISCV::CZERO_NEZ);
+
+ if (NeedsRCCopies) {
+ unsigned CopyIntoGPROpc = 0;
+ if (TrueRC == &RISCV::FPR32RegClass) {
+ CopyIntoGPROpc = RISCV::FMV_X_W;
+ } else if (TrueRC == &RISCV::FPR64RegClass) {
+ CopyIntoGPROpc = RISCV::FMV_X_D;
+ } else {
+ llvm_unreachable("Unknown register class");
+ }
+ int CopyIntoGPRCycles = SchedModel.computeInstrLatency(CopyIntoGPROpc);
+
+    unsigned CopyFromGPROpc = 0;
+    if (DstRC == &RISCV::FPR32RegClass) {
+      CopyFromGPROpc = RISCV::FMV_W_X;
+    } else if (DstRC == &RISCV::FPR64RegClass) {
+      CopyFromGPROpc = RISCV::FMV_D_X;
+    } else {
+      llvm_unreachable("Unknown register class");
+    }
+    int CopyFromGPRCycles = SchedModel.computeInstrLatency(CopyFromGPROpc);
+
+ TrueCycles += (CopyIntoGPRCycles + CopyFromGPRCycles);
+ FalseCycles += (CopyIntoGPRCycles + CopyFromGPRCycles);
+ }
+ return true;
+}
+
unsigned RISCVInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
if (MI.isMetaInstruction())
return 0;
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index e069717aaef23..a331f1e2af9db 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -149,6 +149,18 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
SmallPtrSetImpl<MachineInstr *> &SeenMIs,
bool) const override;
+ int getICmpCost(unsigned CC, const TargetSchedModel &SchedModel) const;
+ void insertICmp(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg, unsigned CC,
+ Register LHSReg, Register RHSReg) const;
+ bool canInsertSelect(const MachineBasicBlock &, ArrayRef<MachineOperand> Cond,
+ Register, Register, Register, int &, int &,
+ int &) const override;
+ void insertSelect(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
+ const DebugLoc &DL, Register DstReg,
+ ArrayRef<MachineOperand> Cond, Register TrueReg,
+ Register FalseReg) const override;
+
bool isAsCheapAsAMove(const MachineInstr &MI) const override;
std::optional<DestSourcePair>
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index d3236bb07d56d..0b0347003f0d4 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -65,6 +65,14 @@ static cl::opt<unsigned> RISCVMinimumJumpTableEntries(
"riscv-min-jump-table-entries", cl::Hidden,
cl::desc("Set minimum number of entries to use a jump table on RISCV"));
+static cl::opt<bool>
+ RISCVDisableEarlyIfcvt("riscv-disable-early-ifcvt", cl::Hidden,
+ cl::desc("Disable early if-conversion"),
+ cl::init(true), cl::Hidden);
+
+static cl::opt<bool> RISCVForceEalyIfcvt("riscv-force-early-ifcvt", cl::Hidden,
+ cl::desc("Force early if-conversion"),
+ cl::init(false), cl::Hidden);
void RISCVSubtarget::anchor() {}
RISCVSubtarget &
@@ -203,3 +211,12 @@ unsigned RISCVSubtarget::getMinimumJumpTableEntries() const {
? RISCVMinimumJumpTableEntries
: TuneInfo->MinimumJumpTableEntries;
}
+
+bool RISCVSubtarget::enableEarlyIfConversion() const {
+ TargetSchedModel SchedModel;
+ SchedModel.init(this);
+ return RISCVForceEalyIfcvt ||
+ (!RISCVDisableEarlyIfcvt &&
+ (hasStdExtZicond() || hasVendorXVentanaCondOps()) &&
+ SchedModel.hasInstrSchedModelOrItineraries());
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index c880c9e921e0e..fe8143942e938 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -303,6 +303,8 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
unsigned getMinimumJumpTableEntries() const;
bool supportsInitUndef() const override { return hasVInstructions(); }
+
+ bool enableEarlyIfConversion() const override;
};
} // End llvm namespace
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index 5aab138dae408..a205738eeebd9 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -349,6 +349,7 @@ class RISCVPassConfig : public TargetPassConfig {
bool addPreISel() override;
void addCodeGenPrepare() override;
bool addInstSelector() override;
+ bool addILPOpts() override;
bool addIRTranslator() override;
void addPreLegalizeMachineIR() override;
bool addLegalizeMachineIR() override;
@@ -450,6 +451,13 @@ bool RISCVPassConfig::addInstSelector() {
return false;
}
+bool RISCVPassConfig::addILPOpts() {
+ if (getOptLevel() != CodeGenOptLevel::None) {
+ addPass(&EarlyIfConverterID);
+ }
+ return true;
+}
+
bool RISCVPassConfig::addIRTranslator() {
addPass(new IRTranslator(getOptLevel()));
return false;
diff --git a/llvm/test/CodeGen/RISCV/early-ifcvt.mir b/llvm/test/CodeGen/RISCV/early-ifcvt.mir
new file mode 100644
index 0000000000000..f4acd2e3a38ec
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/early-ifcvt.mir
@@ -0,0 +1,553 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=riscv64 -mcpu=veyron-v1 -riscv-disable-early-ifcvt=false -riscv-force-early-ifcvt=true -stress-early-ifcvt=true \
+# RUN: -run-pass=early-ifcvt -simplify-mir -o - %s | FileCheck %s
+
+---
+name: blt
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: blt
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[SLT:%[0-9]+]]:gpr = SLT %x_i, %k
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLT]]
+ ; CHECK-NEXT: [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLT]]
+ ; CHECK-NEXT: %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+ ; CHECK-NEXT: SD %res, %x_i, 0
+ ; CHECK-NEXT: %i_inc:gpr = ADDI %i, 1
+ ; CHECK-NEXT: BEQ %i_inc, %n, %bb.5
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: PseudoRET
+ bb.0:
+ successors: %bb.1
+ liveins: $x10, $x11, $x12
+
+ %x:gpr = COPY $x10
+ %n:gpr = COPY $x11
+ %k:gpr = COPY $x12
+ %zero:gpr = COPY $x0
+ PseudoBR %bb.1
+
+ bb.1:
+ successors: %bb.2, %bb.3
+
+ %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+ %x_i_addr:gpr = ADD %x, %i
+ %x_i:gpr = LD %x_i_addr, 0
+ BLT %x_i, %k, %bb.2
+ PseudoBR %bb.3
+
+ bb.2:
+ successors: %bb.4
+
+ %add1:gpr = ADDI %x_i, 1
+ PseudoBR %bb.4
+
+ bb.3:
+ successors: %bb.4
+
+ %add2:gpr = ADDI %x_i, 2
+ PseudoBR %bb.4
+
+ bb.4:
+ successors: %bb.1, %bb.5
+
+ %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+ SD %res, %x_i, 0
+ %i_inc:gpr = ADDI %i, 1
+ BEQ %i_inc, %n, %bb.5
+ PseudoBR %bb.1
+
+ bb.5:
+ PseudoRET
+...
+
+---
+name: bltu
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: bltu
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[SLTU:%[0-9]+]]:gpr = SLTU %x_i, %k
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTU]]
+ ; CHECK-NEXT: [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLTU]]
+ ; CHECK-NEXT: %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+ ; CHECK-NEXT: SD %res, %x_i, 0
+ ; CHECK-NEXT: %i_inc:gpr = ADDI %i, 1
+ ; CHECK-NEXT: BEQ %i_inc, %n, %bb.5
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: PseudoRET
+ bb.0:
+ successors: %bb.1
+ liveins: $x10, $x11, $x12
+
+ %x:gpr = COPY $x10
+ %n:gpr = COPY $x11
+ %k:gpr = COPY $x12
+ %zero:gpr = COPY $x0
+ PseudoBR %bb.1
+
+ bb.1:
+ successors: %bb.2, %bb.3
+
+ %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+ %x_i_addr:gpr = ADD %x, %i
+ %x_i:gpr = LD %x_i_addr, 0
+ BLTU %x_i, %k, %bb.2
+ PseudoBR %bb.3
+
+ bb.2:
+ successors: %bb.4
+
+ %add1:gpr = ADDI %x_i, 1
+ PseudoBR %bb.4
+
+ bb.3:
+ successors: %bb.4
+
+ %add2:gpr = ADDI %x_i, 2
+ PseudoBR %bb.4
+
+ bb.4:
+ successors: %bb.1, %bb.5
+
+ %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+ SD %res, %x_i, 0
+ %i_inc:gpr = ADDI %i, 1
+ BEQ %i_inc, %n, %bb.5
+ PseudoBR %bb.1
+
+ bb.5:
+ PseudoRET
+...
+
+---
+name: beq
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: beq
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[XOR:%[0-9]+]]:gpr = XOR %x_i, %k
+ ; CHECK-NEXT: [[SLTIU:%[0-9]+]]:gpr = SLTIU [[XOR]], 1
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTIU]]
+ ; CHECK-NEXT: [[VT_MASKCN:%[0-9]+]]:gpr = VT_MASKCN %add2, [[SLTIU]]
+ ; CHECK-NEXT: %res:gpr = OR [[VT_MASKC]], [[VT_MASKCN]]
+ ; CHECK-NEXT: SD %res, %x_i, 0
+ ; CHECK-NEXT: %i_inc:gpr = ADDI %i, 1
+ ; CHECK-NEXT: BEQ %i_inc, %n, %bb.5
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.5:
+ ; CHECK-NEXT: PseudoRET
+ bb.0:
+ successors: %bb.1
+ liveins: $x10, $x11, $x12
+
+ %x:gpr = COPY $x10
+ %n:gpr = COPY $x11
+ %k:gpr = COPY $x12
+ %zero:gpr = COPY $x0
+ PseudoBR %bb.1
+
+ bb.1:
+ successors: %bb.2, %bb.3
+
+ %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.4
+ %x_i_addr:gpr = ADD %x, %i
+ %x_i:gpr = LD %x_i_addr, 0
+ BEQ %x_i, %k, %bb.2
+ PseudoBR %bb.3
+
+ bb.2:
+ successors: %bb.4
+
+ %add1:gpr = ADDI %x_i, 1
+ PseudoBR %bb.4
+
+ bb.3:
+ successors: %bb.4
+
+ %add2:gpr = ADDI %x_i, 2
+ PseudoBR %bb.4
+
+ bb.4:
+ successors: %bb.1, %bb.5
+
+ %res:gpr = PHI %add1, %bb.2, %add2, %bb.3
+ SD %res, %x_i, 0
+ %i_inc:gpr = ADDI %i, 1
+ BEQ %i_inc, %n, %bb.5
+ PseudoBR %bb.1
+
+ bb.5:
+ PseudoRET
+...
+
+---
+name: bne
+tracksRegLiveness: true
+body: |
+ ; CHECK-LABEL: name: bne
+ ; CHECK: bb.0:
+ ; CHECK-NEXT: liveins: $x10, $x11, $x12
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %x:gpr = COPY $x10
+ ; CHECK-NEXT: %n:gpr = COPY $x11
+ ; CHECK-NEXT: %k:gpr = COPY $x12
+ ; CHECK-NEXT: %zero:gpr = COPY $x0
+ ; CHECK-NEXT: PseudoBR %bb.1
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: bb.1:
+ ; CHECK-NEXT: successors: %bb.1, %bb.5
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: %i:gpr = PHI %zero, %bb.0, %i_inc, %bb.1
+ ; CHECK-NEXT: %x_i_addr:gpr = ADD %x, %i
+ ; CHECK-NEXT: %x_i:gpr = LD %x_i_addr, 0
+ ; CHECK-NEXT: %add1:gpr = ADDI %x_i, 1
+ ; CHECK-NEXT: %add2:gpr = ADDI %x_i, 2
+ ; CHECK-NEXT: [[XOR:%[0-9]+]]:gpr = XOR %x_i, %k
+ ; CHECK-NEXT: [[SLTU:%[0-9]+]]:gpr = SLTU $x0, [[XOR]]
+ ; CHECK-NEXT: [[VT_MASKC:%[0-9]+]]:gpr = VT_MASKC %add1, [[SLTU]]
+ ; CHECK-NEXT: ...
[truncated]
Updated cost estimate for these cases. Made `getICmpCost` and `insertICmp` accept `Cond` to make their signatures more like `insertSelect`.