Skip to content

[RegAllocFast] Handle single-vdef instrs faster #96284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 21, 2024

Conversation

aengelke
Copy link
Contributor

On x86, many instructions have tied operands, so allocateInstruction uses the more complex assignment strategy, which computes the assignment order of virtual defs first. This involves iterating over all register classes (or register aliases for physical defs) to compute the possible number of defs per register class.

However, this information is only used for sorting virtual defs and therefore not required when there's only one virtual def -- which is a very common case. As iterating over all register classes/aliases is not cheap, do this only when there's more than one virtual def.


I'm wondering, how many instructions besides inlineasm actually need this analysis. When asserting !SmallClass, there's only a single inlineasm test case where this case is actually hit. c-t-t indicates a ~0.8% performance improvement.

@aengelke aengelke requested a review from arsenm June 21, 2024 08:11
@llvmbot
Copy link
Member

llvmbot commented Jun 21, 2024

@llvm/pr-subscribers-llvm-regalloc

Author: Alexis Engelke (aengelke)

Changes

On x86, many instructions have tied operands, so allocateInstruction uses the more complex assignment strategy, which computes the assignment order of virtual defs first. This involves iterating over all register classes (or register aliases for physical defs) to compute the possible number of defs per register class.

However, this information is only used for sorting virtual defs and therefore not required when there's only one virtual def -- which is a very common case. As iterating over all register classes/aliases is not cheap, do this only when there's more than one virtual def.


I'm wondering, how many instructions besides inlineasm actually need this analysis. When asserting !SmallClass, there's only a single inlineasm test case where this case is actually hit. c-t-t indicates a ~0.8% performance improvement.


Full diff: https://github.com/llvm/llvm-project/pull/96284.diff

1 Files Affected:

  • (modified) llvm/lib/CodeGen/RegAllocFast.cpp (+19-10)
diff --git a/llvm/lib/CodeGen/RegAllocFast.cpp b/llvm/lib/CodeGen/RegAllocFast.cpp
index 09ce8c42a3850..d194445abbfc8 100644
--- a/llvm/lib/CodeGen/RegAllocFast.cpp
+++ b/llvm/lib/CodeGen/RegAllocFast.cpp
@@ -1289,10 +1289,6 @@ void RegAllocFastImpl::addRegClassDefCounts(
 void RegAllocFastImpl::findAndSortDefOperandIndexes(const MachineInstr &MI) {
   DefOperandIndexes.clear();
 
-  // Track number of defs which may consume a register from the class.
-  std::vector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0);
-  assert(RegClassDefCounts[0] == 0);
-
   LLVM_DEBUG(dbgs() << "Need to assign livethroughs\n");
   for (unsigned I = 0, E = MI.getNumOperands(); I < E; ++I) {
     const MachineOperand &MO = MI.getOperand(I);
@@ -1306,14 +1302,27 @@ void RegAllocFastImpl::findAndSortDefOperandIndexes(const MachineInstr &MI) {
       }
     }
 
-    if (MO.isDef()) {
-      if (Reg.isVirtual() && shouldAllocateRegister(Reg))
-        DefOperandIndexes.push_back(I);
-
-      addRegClassDefCounts(RegClassDefCounts, Reg);
-    }
+    if (MO.isDef() && Reg.isVirtual() && shouldAllocateRegister(Reg))
+      DefOperandIndexes.push_back(I);
   }
 
+  // Most instructions only have one virtual def, so there's no point in
+  // computing the possible number of defs for every register class.
+  if (DefOperandIndexes.size() <= 1)
+    return;
+
+  // Track number of defs which may consume a register from the class. This is
+  // used to assign registers for possibly-too-small classes first. Example:
+  // defs are eax, 3 * gr32_abcd, 2 * gr32 => we want to assign the gr32_abcd
+  // registers first so that the gr32 don't use the gr32_abcd registers before
+  // we assign these.
+  std::vector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0);
+  assert(RegClassDefCounts[0] == 0);
+
+  for (const MachineOperand &MO : MI.operands())
+    if (MO.isReg() && MO.isDef())
+      addRegClassDefCounts(RegClassDefCounts, MO.getReg());
+
   llvm::sort(DefOperandIndexes, [&](uint16_t I0, uint16_t I1) {
     const MachineOperand &MO0 = MI.getOperand(I0);
     const MachineOperand &MO1 = MI.getOperand(I1);

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering, how many instructions besides inlineasm actually need this analysis.

I know of a couple cases that have 2 tied vreg defs

// defs are eax, 3 * gr32_abcd, 2 * gr32 => we want to assign the gr32_abcd
// registers first so that the gr32 don't use the gr32_abcd registers before
// we assign these.
std::vector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a SmallVector.

We also probably should have a typedef in MachineInstr for the maximum number of operands type. Currently it's an uint32_t with an implied maximum of 24-bits but that seems like it could change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using SmallVector now and replaced operand index uint16_t with unsigned. Changing the type of NumOperands seems like a very invasive refactoring and should probably be done spearately?

@aengelke
Copy link
Contributor Author

I know of a couple cases that have 2 tied vreg defs

I was wondering in how many cases we actually hit a ClassSize < RegClassDefCounts[RC] case -- so multiple virtual defs and a possibly-too-small regclass.

// registers first so that the gr32 don't use the gr32_abcd registers before
// we assign these.
SmallVector<unsigned> RegClassDefCounts(TRI->getNumRegClasses(), 0);
assert(RegClassDefCounts[0] == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not worth asserting the SmallVector constructor works correctly?

@@ -258,7 +258,7 @@ class RegAllocFastImpl {
/// cannot be allocated.
RegUnitSet UsedInInstr;
RegUnitSet PhysRegUses;
SmallVector<uint16_t, 8> DefOperandIndexes;
SmallVector<unsigned, 8> DefOperandIndexes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should pre-commit this

On x86, most instructions have tied operands, so allocateInstruction
uses the more complex assignment strategy which computes the assignment
order of virtual defs first. This involves iterating over all register
classes (or register aliases for physical defs) to compute the possible
number of defs per register class.

However, this information is only used for sorting virtual defs and
therefore not required when there's only one virtual def -- which is a
very common case. As iterating over all register classes/aliases is not
cheap, do this only when there's more than one virtual def.
@aengelke aengelke merged commit 0ae6cfc into llvm:main Jun 21, 2024
@aengelke aengelke deleted the perf/regalloc1 branch June 21, 2024 10:31
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
On x86, many instructions have tied operands, so allocateInstruction
uses the more complex assignment strategy, which computes the assignment
order of virtual defs first. This involves iterating over all register
classes (or register aliases for physical defs) to compute the possible
number of defs per register class.

However, this information is only used for sorting virtual defs and
therefore not required when there's only one virtual def -- which is a
very common case. As iterating over all register classes/aliases is not
cheap, do this only when there's more than one virtual def.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants