-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[RegAllocFast] Handle single-vdef instrs faster #96284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-regalloc Author: Alexis Engelke (aengelke) ChangesOn x86, many instructions have tied operands, so allocateInstruction uses the more complex assignment strategy, which computes the assignment order of virtual defs first. This involves iterating over all register classes (or register aliases for physical defs) to compute the possible number of defs per register class. However, this information is only used for sorting virtual defs and therefore not required when there's only one virtual def -- which is a very common case. As iterating over all register classes/aliases is not cheap, do this only when there's more than one virtual def. I'm wondering, how many instructions besides inlineasm actually need this analysis. When asserting Full diff: https://github.com/llvm/llvm-project/pull/96284.diff 1 Files Affected:
diff --git a/llvm/lib/CodeGen/RegAllocFast.cpp b/llvm/lib/CodeGen/RegAllocFast.cpp
index 09ce8c42a3850..d194445abbfc8 100644
--- a/llvm/lib/CodeGen/RegAllocFast.cpp
+++ b/llvm/lib/CodeGen/RegAllocFast.cpp
@@ -1289,10 +1289,6 @@ void RegAllocFastImpl::addRegClassDefCounts(
void RegAllocFastImpl::findAndSortDefOperandIndexes(const MachineInstr &MI) {
DefOperandIndexes.clear();
- // Track number of defs which may consume a register from the class.
- std::vector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0);
- assert(RegClassDefCounts[0] == 0);
-
LLVM_DEBUG(dbgs() << "Need to assign livethroughs\n");
for (unsigned I = 0, E = MI.getNumOperands(); I < E; ++I) {
const MachineOperand &MO = MI.getOperand(I);
@@ -1306,14 +1302,27 @@ void RegAllocFastImpl::findAndSortDefOperandIndexes(const MachineInstr &MI) {
}
}
- if (MO.isDef()) {
- if (Reg.isVirtual() && shouldAllocateRegister(Reg))
- DefOperandIndexes.push_back(I);
-
- addRegClassDefCounts(RegClassDefCounts, Reg);
- }
+ if (MO.isDef() && Reg.isVirtual() && shouldAllocateRegister(Reg))
+ DefOperandIndexes.push_back(I);
}
+ // Most instructions only have one virtual def, so there's no point in
+ // computing the possible number of defs for every register class.
+ if (DefOperandIndexes.size() <= 1)
+ return;
+
+ // Track number of defs which may consume a register from the class. This is
+ // used to assign registers for possibly-too-small classes first. Example:
+ // defs are eax, 3 * gr32_abcd, 2 * gr32 => we want to assign the gr32_abcd
+ // registers first so that the gr32 don't use the gr32_abcd registers before
+ // we assign these.
+ std::vector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0);
+ assert(RegClassDefCounts[0] == 0);
+
+ for (const MachineOperand &MO : MI.operands())
+ if (MO.isReg() && MO.isDef())
+ addRegClassDefCounts(RegClassDefCounts, MO.getReg());
+
llvm::sort(DefOperandIndexes, [&](uint16_t I0, uint16_t I1) {
const MachineOperand &MO0 = MI.getOperand(I0);
const MachineOperand &MO1 = MI.getOperand(I1);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering, how many instructions besides inlineasm actually need this analysis.
I know of a couple cases that have 2 tied vreg defs
llvm/lib/CodeGen/RegAllocFast.cpp
Outdated
// defs are eax, 3 * gr32_abcd, 2 * gr32 => we want to assign the gr32_abcd | ||
// registers first so that the gr32 don't use the gr32_abcd registers before | ||
// we assign these. | ||
std::vector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a SmallVector.
We also probably should have a typedef in MachineInstr for the maximum number of operands type. Currently it's an uint32_t with an implied maximum of 24-bits but that seems like it could change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SmallVector now and replaced operand index uint16_t with unsigned. Changing the type of NumOperands seems like a very invasive refactoring and should probably be done spearately?
I was wondering in how many cases we actually hit a |
llvm/lib/CodeGen/RegAllocFast.cpp
Outdated
// registers first so that the gr32 don't use the gr32_abcd registers before | ||
// we assign these. | ||
SmallVector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0); | ||
assert(RegClassDefCounts[0] == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not worth asserting the SmallVector constructor works correctly?
llvm/lib/CodeGen/RegAllocFast.cpp
Outdated
@@ -258,7 +258,7 @@ class RegAllocFastImpl { | |||
/// cannot be allocated. | |||
RegUnitSet UsedInInstr; | |||
RegUnitSet PhysRegUses; | |||
SmallVector<uint16_t, 8> DefOperandIndexes; | |||
SmallVector<unsigned, 8> DefOperandIndexes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should pre-commit this
On x86, most instructions have tied operands, so allocateInstruction uses the more complex assignment strategy which computes the assignment order of virtual defs first. This involves iterating over all register classes (or register aliases for physical defs) to compute the possible number of defs per register class. However, this information is only used for sorting virtual defs and therefore not required when there's only one virtual def -- which is a very common case. As iterating over all register classes/aliases is not cheap, do this only when there's more than one virtual def.
On x86, many instructions have tied operands, so allocateInstruction uses the more complex assignment strategy, which computes the assignment order of virtual defs first. This involves iterating over all register classes (or register aliases for physical defs) to compute the possible number of defs per register class. However, this information is only used for sorting virtual defs and therefore not required when there's only one virtual def -- which is a very common case. As iterating over all register classes/aliases is not cheap, do this only when there's more than one virtual def.
On x86, many instructions have tied operands, so allocateInstruction uses the more complex assignment strategy, which computes the assignment order of virtual defs first. This involves iterating over all register classes (or register aliases for physical defs) to compute the possible number of defs per register class.
However, this information is only used for sorting virtual defs and therefore not required when there's only one virtual def -- which is a very common case. As iterating over all register classes/aliases is not cheap, do this only when there's more than one virtual def.
I'm wondering, how many instructions besides inlineasm actually need this analysis. When asserting
!SmallClass
, there's only a single inlineasm test case where this case is actually hit. c-t-t indicates a ~0.8% performance improvement.