
[NFC][X86] Reorder the registers to reduce unnecessary iterations #70222

Merged: 8 commits into llvm:main on Nov 1, 2023

Conversation

@KanRobert (Contributor) commented Oct 25, 2023

  • Introduce a PositionOrder field for the Register and RegisterTuples classes
  • If register A's PositionOrder < register B's PositionOrder, then A is placed before B in the enum in X86GenRegisterInfo.inc
  • The new order of registers in the enum for X86 will be
    1. Registers before AVX512,
    2. AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
    3. AMX registers (TMM)
    4. APX registers (R16-R31)
  • Add a new target hook getNumSupportedRegs() that returns the number of registers supported for the function (it may overestimate)
  • Replace getNumRegs() with getNumSupportedRegs() in LiveVariables to eliminate iterations over unsupported registers

This patch reduces the 0.3% instruction count regression for sqlite3 during the compile stage (O3) from #67702 by not iterating over the APX registers.
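
A minimal sketch of the intended usage pattern (the helper name here is hypothetical; it mirrors the LiveVariables change in the diff below):

  #include "llvm/CodeGen/MachineFunction.h"
  #include "llvm/CodeGen/TargetRegisterInfo.h"
  #include "llvm/CodeGen/TargetSubtargetInfo.h"

  // Iterate only over registers the subtarget can actually have.
  static void forEachSupportedPhysReg(const llvm::MachineFunction &MF) {
    const llvm::TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
    const unsigned NumRegs = TRI->getNumSupportedRegs(MF); // may overestimate
    for (unsigned Reg = 1; Reg != NumRegs; ++Reg) {
      // On a subtarget without APX, R16-R31 are ordered past NumRegs in
      // the enum, so this loop never visits them.
    }
  }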

@github-actions bot commented Oct 25, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@KanRobert KanRobert marked this pull request as ready for review October 25, 2023 16:20
@llvmbot (Collaborator) commented Oct 25, 2023

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-regalloc

Author: Shengchen Kan (KanRobert)

Changes
  • Introduce a PositionOrder field for the Register and RegisterTuples classes
  • If register A's PositionOrder < register B's PositionOrder, then A is placed before B in the enum in X86GenRegisterInfo.inc
  • The new order of registers in the enum will be
    1. Registers before AVX512,
    2. AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
    3. AMX registers (TMM)
    4. APX registers (R16-R31)
  • Add a new target hook getNumSupportedRegs() that returns the number of registers supported for the function (it may overestimate)
  • Replace getNumRegs() with getNumSupportedRegs() in LiveVariables to eliminate iterations over unsupported registers

This patch reduces instruction count for sqlite3 by 0.2% during the compile stage (O3) by not iterating over AVX, AVX512, and AMX registers.

Motivation for this patch: reduce the regression from #67702.


Patch is 21.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70222.diff

13 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/LiveVariables.h (+2-2)
  • (modified) llvm/include/llvm/CodeGen/TargetRegisterInfo.h (+5)
  • (modified) llvm/include/llvm/TableGen/Record.h (+5)
  • (modified) llvm/include/llvm/Target/Target.td (+8)
  • (modified) llvm/lib/CodeGen/LiveVariables.cpp (+10-8)
  • (modified) llvm/lib/Target/X86/AsmParser/X86Operand.h (+8-8)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h (+18-2)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp (+3-3)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.cpp (+28-2)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.h (+3)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.td (+31-18)
  • (modified) llvm/test/CodeGen/X86/ipra-reg-usage.ll (+1-1)
  • (modified) llvm/utils/TableGen/CodeGenRegisters.cpp (+8-11)
diff --git a/llvm/include/llvm/CodeGen/LiveVariables.h b/llvm/include/llvm/CodeGen/LiveVariables.h
index 9ed4c7bdf7b1771..0c3414531ab1b0c 100644
--- a/llvm/include/llvm/CodeGen/LiveVariables.h
+++ b/llvm/include/llvm/CodeGen/LiveVariables.h
@@ -147,7 +147,7 @@ class LiveVariables : public MachineFunctionPass {
   bool HandlePhysRegKill(Register Reg, MachineInstr *MI);
 
   /// HandleRegMask - Call HandlePhysRegKill for all registers clobbered by Mask.
-  void HandleRegMask(const MachineOperand&);
+  void HandleRegMask(const MachineOperand&, unsigned);
 
   void HandlePhysRegUse(Register Reg, MachineInstr &MI);
   void HandlePhysRegDef(Register Reg, MachineInstr *MI,
@@ -170,7 +170,7 @@ class LiveVariables : public MachineFunctionPass {
   /// is coming from.
   void analyzePHINodes(const MachineFunction& Fn);
 
-  void runOnInstr(MachineInstr &MI, SmallVectorImpl<unsigned> &Defs);
+  void runOnInstr(MachineInstr &MI, SmallVectorImpl<unsigned> &Defs, unsigned NumRegs);
 
   void runOnBlock(MachineBasicBlock *MBB, unsigned NumRegs);
 public:
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 5bf27e40eee8909..337fab735a09522 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -266,6 +266,11 @@ class TargetRegisterInfo : public MCRegisterInfo {
   virtual ~TargetRegisterInfo();
 
 public:
+  /// Return the number of registers for the function. (may overestimate)
+  virtual unsigned getNumSupportedRegs(const MachineFunction &) const {
+    return getNumRegs();
+  }
+
   // Register numbers can represent physical registers, virtual registers, and
   // sometimes stack slots. The unsigned values are divided into these ranges:
   //
diff --git a/llvm/include/llvm/TableGen/Record.h b/llvm/include/llvm/TableGen/Record.h
index d1023a73d606847..ba0d0de79851b9d 100644
--- a/llvm/include/llvm/TableGen/Record.h
+++ b/llvm/include/llvm/TableGen/Record.h
@@ -2155,6 +2155,11 @@ struct LessRecordRegister {
   };
 
   bool operator()(const Record *Rec1, const Record *Rec2) const {
+    int64_t LHSPositionOrder = Rec1->getValueAsInt("PositionOrder");
+    int64_t RHSPositionOrder = Rec2->getValueAsInt("PositionOrder");
+    if (LHSPositionOrder != RHSPositionOrder)
+      return LHSPositionOrder < RHSPositionOrder;
+
     RecordParts LHSParts(StringRef(Rec1->getName()));
     RecordParts RHSParts(StringRef(Rec2->getName()));
 
diff --git a/llvm/include/llvm/Target/Target.td b/llvm/include/llvm/Target/Target.td
index 8ad2066aa83de9e..3b1d2f45267e9e0 100644
--- a/llvm/include/llvm/Target/Target.td
+++ b/llvm/include/llvm/Target/Target.td
@@ -205,6 +205,10 @@ class Register<string n, list<string> altNames = []> {
   // isConstant - This register always holds a constant value (e.g. the zero
   // register in architectures such as MIPS)
   bit isConstant = false;
+
+  /// PositionOrder - Indicate tablegen to place the newly added register at a later
+  /// position to avoid iterations on them on unsupported target.
+  int PositionOrder = 0;
 }
 
 // RegisterWithSubRegs - This can be used to define instances of Register which
@@ -417,6 +421,10 @@ class RegisterTuples<list<SubRegIndex> Indices, list<dag> Regs,
 
   // List of asm names for the generated tuple registers.
   list<string> RegAsmNames = RegNames;
+
+  // PositionOrder - Indicate tablegen to place the newly added register at a later
+  // position to avoid iterations on them on unsupported target.
+  int PositionOrder = 0;
 }
 
 // RegisterCategory - This class is a list of RegisterClasses that belong to a
diff --git a/llvm/lib/CodeGen/LiveVariables.cpp b/llvm/lib/CodeGen/LiveVariables.cpp
index 6b983b6320c711f..b85526cfb380b6a 100644
--- a/llvm/lib/CodeGen/LiveVariables.cpp
+++ b/llvm/lib/CodeGen/LiveVariables.cpp
@@ -406,11 +406,11 @@ bool LiveVariables::HandlePhysRegKill(Register Reg, MachineInstr *MI) {
   return true;
 }
 
-void LiveVariables::HandleRegMask(const MachineOperand &MO) {
+void LiveVariables::HandleRegMask(const MachineOperand &MO, unsigned NumRegs) {
   // Call HandlePhysRegKill() for all live registers clobbered by Mask.
   // Clobbered registers are always dead, sp there is no need to use
   // HandlePhysRegDef().
-  for (unsigned Reg = 1, NumRegs = TRI->getNumRegs(); Reg != NumRegs; ++Reg) {
+  for (unsigned Reg = 1; Reg != NumRegs; ++Reg) {
     // Skip dead regs.
     if (!PhysRegDef[Reg] && !PhysRegUse[Reg])
       continue;
@@ -421,7 +421,8 @@ void LiveVariables::HandleRegMask(const MachineOperand &MO) {
     // This avoids needless implicit operands.
     unsigned Super = Reg;
     for (MCPhysReg SR : TRI->superregs(Reg))
-      if ((PhysRegDef[SR] || PhysRegUse[SR]) && MO.clobbersPhysReg(SR))
+      if (SR < NumRegs && (PhysRegDef[SR] || PhysRegUse[SR]) &&
+          MO.clobbersPhysReg(SR))
         Super = SR;
     HandlePhysRegKill(Super, nullptr);
   }
@@ -478,7 +479,8 @@ void LiveVariables::UpdatePhysRegDefs(MachineInstr &MI,
 }
 
 void LiveVariables::runOnInstr(MachineInstr &MI,
-                               SmallVectorImpl<unsigned> &Defs) {
+                               SmallVectorImpl<unsigned> &Defs,
+                               unsigned NumRegs) {
   assert(!MI.isDebugOrPseudoInstr());
   // Process all of the operands of the instruction...
   unsigned NumOperandsToProcess = MI.getNumOperands();
@@ -527,7 +529,7 @@ void LiveVariables::runOnInstr(MachineInstr &MI,
 
   // Process all masked registers. (Call clobbers).
   for (unsigned Mask : RegMasks)
-    HandleRegMask(MI.getOperand(Mask));
+    HandleRegMask(MI.getOperand(Mask), NumRegs);
 
   // Process all defs.
   for (unsigned MOReg : DefRegs) {
@@ -539,7 +541,7 @@ void LiveVariables::runOnInstr(MachineInstr &MI,
   UpdatePhysRegDefs(MI, Defs);
 }
 
-void LiveVariables::runOnBlock(MachineBasicBlock *MBB, const unsigned NumRegs) {
+void LiveVariables::runOnBlock(MachineBasicBlock *MBB, unsigned NumRegs) {
   // Mark live-in registers as live-in.
   SmallVector<unsigned, 4> Defs;
   for (const auto &LI : MBB->liveins()) {
@@ -556,7 +558,7 @@ void LiveVariables::runOnBlock(MachineBasicBlock *MBB, const unsigned NumRegs) {
       continue;
     DistanceMap.insert(std::make_pair(&MI, Dist++));
 
-    runOnInstr(MI, Defs);
+    runOnInstr(MI, Defs, NumRegs);
   }
 
   // Handle any virtual assignments from PHI nodes which might be at the
@@ -597,7 +599,7 @@ bool LiveVariables::runOnMachineFunction(MachineFunction &mf) {
   MRI = &mf.getRegInfo();
   TRI = MF->getSubtarget().getRegisterInfo();
 
-  const unsigned NumRegs = TRI->getNumRegs();
+  const unsigned NumRegs = TRI->getNumSupportedRegs(mf);
   PhysRegDef.assign(NumRegs, nullptr);
   PhysRegUse.assign(NumRegs, nullptr);
   PHIVarInfo.resize(MF->getNumBlockIDs());
diff --git a/llvm/lib/Target/X86/AsmParser/X86Operand.h b/llvm/lib/Target/X86/AsmParser/X86Operand.h
index 4661e73c3ef8e86..641158cb351fc4f 100644
--- a/llvm/lib/Target/X86/AsmParser/X86Operand.h
+++ b/llvm/lib/Target/X86/AsmParser/X86Operand.h
@@ -357,28 +357,28 @@ struct X86Operand final : public MCParsedAsmOperand {
   }
 
   bool isMem64_RC128X() const {
-    return isMem64() && isMemIndexReg(X86::XMM0, X86::XMM31);
+    return isMem64() && X86II::isXMMReg(Mem.IndexReg);
   }
   bool isMem128_RC128X() const {
-    return isMem128() && isMemIndexReg(X86::XMM0, X86::XMM31);
+    return isMem128() && X86II::isXMMReg(Mem.IndexReg);
   }
   bool isMem128_RC256X() const {
-    return isMem128() && isMemIndexReg(X86::YMM0, X86::YMM31);
+    return isMem128() && X86II::isYMMReg(Mem.IndexReg);
   }
   bool isMem256_RC128X() const {
-    return isMem256() && isMemIndexReg(X86::XMM0, X86::XMM31);
+    return isMem256() && X86II::isXMMReg(Mem.IndexReg);
   }
   bool isMem256_RC256X() const {
-    return isMem256() && isMemIndexReg(X86::YMM0, X86::YMM31);
+    return isMem256() && X86II::isYMMReg(Mem.IndexReg);
   }
   bool isMem256_RC512() const {
-    return isMem256() && isMemIndexReg(X86::ZMM0, X86::ZMM31);
+    return isMem256() && X86II::isZMMReg(Mem.IndexReg);
   }
   bool isMem512_RC256X() const {
-    return isMem512() && isMemIndexReg(X86::YMM0, X86::YMM31);
+    return isMem512() && X86II::isYMMReg(Mem.IndexReg);
   }
   bool isMem512_RC512() const {
-    return isMem512() && isMemIndexReg(X86::ZMM0, X86::ZMM31);
+    return isMem512() && X86II::isZMMReg(Mem.IndexReg);
   }
   bool isMem512_GR16() const {
     if (!isMem512())
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h b/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
index 1e5a3606f33a6fc..4d35bc34cf6bfef 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
@@ -1182,11 +1182,27 @@ namespace X86II {
     }
   }
 
+  inline bool isXMMReg(unsigned RegNo) {
+    return (RegNo >= X86::XMM0 && RegNo <= X86::XMM15) ||
+           (RegNo >= X86::XMM16 && RegNo <= X86::XMM31);
+  }
+
+  inline bool isYMMReg(unsigned RegNo) {
+    return (RegNo >= X86::YMM0 && RegNo <= X86::YMM15) ||
+           (RegNo >= X86::YMM16 && RegNo <= X86::YMM31);
+  }
+
+  inline bool isZMMReg(unsigned RegNo) {
+    return RegNo >= X86::ZMM0 && RegNo <= X86::ZMM31;
+  }
+
   /// \returns true if the MachineOperand is a x86-64 extended (r8 or
   /// higher) register,  e.g. r8, xmm8, xmm13, etc.
   inline bool isX86_64ExtendedReg(unsigned RegNo) {
-    if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM31) ||
-        (RegNo >= X86::YMM8 && RegNo <= X86::YMM31) ||
+    if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM15) ||
+        (RegNo >= X86::XMM16 && RegNo <= X86::XMM31) ||
+        (RegNo >= X86::YMM8 && RegNo <= X86::YMM15) ||
+        (RegNo >= X86::YMM16 && RegNo <= X86::YMM31) ||
         (RegNo >= X86::ZMM8 && RegNo <= X86::ZMM31))
       return true;
 
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp
index 031ba9f87acbbfd..ee82faebb57e6ce 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp
@@ -234,11 +234,11 @@ using namespace llvm;
   CASE_AVX_INS_COMMON(Inst##SS4, , mr_Int)
 
 static unsigned getVectorRegSize(unsigned RegNo) {
-  if (X86::ZMM0 <= RegNo && RegNo <= X86::ZMM31)
+  if (X86II::isZMMReg(RegNo))
     return 512;
-  if (X86::YMM0 <= RegNo && RegNo <= X86::YMM31)
+  if (X86II::isYMMReg(RegNo))
     return 256;
-  if (X86::XMM0 <= RegNo && RegNo <= X86::XMM31)
+  if (X86II::isXMMReg(RegNo))
     return 128;
   if (X86::MM0 <= RegNo && RegNo <= X86::MM7)
     return 64;
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index 3504ca2b5743f88..28a134cff9beaee 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -604,8 +604,8 @@ BitVector X86RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
     }
   }
   if (!Is64Bit || !MF.getSubtarget<X86Subtarget>().hasAVX512()) {
-    for (unsigned n = 16; n != 32; ++n) {
-      for (MCRegAliasIterator AI(X86::XMM0 + n, this, true); AI.isValid(); ++AI)
+    for (unsigned n = 0; n != 16; ++n) {
+      for (MCRegAliasIterator AI(X86::XMM16 + n, this, true); AI.isValid(); ++AI)
         Reserved.set(*AI);
     }
   }
@@ -616,6 +616,32 @@ BitVector X86RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   return Reserved;
 }
 
+unsigned X86RegisterInfo::getNumSupportedRegs(const MachineFunction &MF) const {
+  const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
+  // All existing Intel CPUs that support AMX support AVX512 and all existing
+  // Intel CPUs that support APX support AMX. AVX512 implies AVX.
+  //
+  // We enumerate the registers in X86GenRegisterInfo.inc in this order:
+  //
+  // Registers before AVX512,
+  // AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
+  // AMX registers (TMM)
+  // APX registers (R16-R31)
+  //
+  // and try to return the minimum number of registers supported by the target.
+
+  bool HasAVX = ST.hasAVX();
+  bool HasAVX512 = ST.hasAVX512();
+  bool HasAMX = ST.hasAMXTILE();
+  if (HasAMX)
+    return X86::TMM7 + 1;
+  if (HasAVX512)
+    return X86::K6_K7 + 1;
+  if (HasAVX)
+    return X86::YMM15 + 1;
+  return X86::YMM0;
+}
+
 bool X86RegisterInfo::isArgumentRegister(const MachineFunction &MF,
                                          MCRegister Reg) const {
   const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.h b/llvm/lib/Target/X86/X86RegisterInfo.h
index 0671f79676009ee..7296a5f021e4ad4 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.h
+++ b/llvm/lib/Target/X86/X86RegisterInfo.h
@@ -51,6 +51,9 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
 public:
   explicit X86RegisterInfo(const Triple &TT);
 
+  /// Return the number of registers for the function.
+  unsigned getNumSupportedRegs(const MachineFunction &MF) const override;
+
   // FIXME: This should be tablegen'd like getDwarfRegNum is
   int getSEHRegNum(unsigned i) const;
 
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.td b/llvm/lib/Target/X86/X86RegisterInfo.td
index 81b7597cc8ea5c0..898a3f97e5236df 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.td
+++ b/llvm/lib/Target/X86/X86RegisterInfo.td
@@ -223,6 +223,8 @@ def XMM13: X86Reg<"xmm13", 13>, DwarfRegNum<[30, -2, -2]>;
 def XMM14: X86Reg<"xmm14", 14>, DwarfRegNum<[31, -2, -2]>;
 def XMM15: X86Reg<"xmm15", 15>, DwarfRegNum<[32, -2, -2]>;
 
+let PositionOrder = 2 in {
+// XMM16-31 registers, used by AVX-512 instructions.
 def XMM16:  X86Reg<"xmm16", 16>, DwarfRegNum<[67, -2, -2]>;
 def XMM17:  X86Reg<"xmm17", 17>, DwarfRegNum<[68, -2, -2]>;
 def XMM18:  X86Reg<"xmm18", 18>, DwarfRegNum<[69, -2, -2]>;
@@ -239,27 +241,51 @@ def XMM28:  X86Reg<"xmm28", 28>, DwarfRegNum<[79, -2, -2]>;
 def XMM29:  X86Reg<"xmm29", 29>, DwarfRegNum<[80, -2, -2]>;
 def XMM30:  X86Reg<"xmm30", 30>, DwarfRegNum<[81, -2, -2]>;
 def XMM31:  X86Reg<"xmm31", 31>, DwarfRegNum<[82, -2, -2]>;
+}
 
 // YMM0-15 registers, used by AVX instructions and
 // YMM16-31 registers, used by AVX-512 instructions.
-let SubRegIndices = [sub_xmm] in {
-  foreach  Index = 0-31 in {
+let SubRegIndices = [sub_xmm], PositionOrder = 1 in {
+  foreach  Index = 0-15 in {
+    def YMM#Index : X86Reg<"ymm"#Index, Index, [!cast<X86Reg>("XMM"#Index)]>,
+                    DwarfRegAlias<!cast<X86Reg>("XMM"#Index)>;
+  }
+}
+let SubRegIndices = [sub_xmm], PositionOrder = 2 in {
+  foreach  Index = 16-31 in {
     def YMM#Index : X86Reg<"ymm"#Index, Index, [!cast<X86Reg>("XMM"#Index)]>,
                     DwarfRegAlias<!cast<X86Reg>("XMM"#Index)>;
   }
 }
 
+
 // ZMM Registers, used by AVX-512 instructions.
-let SubRegIndices = [sub_ymm] in {
+let SubRegIndices = [sub_ymm], PositionOrder = 2 in {
   foreach  Index = 0-31 in {
     def ZMM#Index : X86Reg<"zmm"#Index, Index, [!cast<X86Reg>("YMM"#Index)]>,
                     DwarfRegAlias<!cast<X86Reg>("XMM"#Index)>;
   }
 }
 
+let PositionOrder = 2 in {
+// Mask Registers, used by AVX-512 instructions.
+def K0 : X86Reg<"k0", 0>, DwarfRegNum<[118,  93,  93]>;
+def K1 : X86Reg<"k1", 1>, DwarfRegNum<[119,  94,  94]>;
+def K2 : X86Reg<"k2", 2>, DwarfRegNum<[120,  95,  95]>;
+def K3 : X86Reg<"k3", 3>, DwarfRegNum<[121,  96,  96]>;
+def K4 : X86Reg<"k4", 4>, DwarfRegNum<[122,  97,  97]>;
+def K5 : X86Reg<"k5", 5>, DwarfRegNum<[123,  98,  98]>;
+def K6 : X86Reg<"k6", 6>, DwarfRegNum<[124,  99,  99]>;
+def K7 : X86Reg<"k7", 7>, DwarfRegNum<[125, 100, 100]>;
+// Mask register pairs
+def KPAIRS : RegisterTuples<[sub_mask_0, sub_mask_1],
+                             [(add K0, K2, K4, K6), (add K1, K3, K5, K7)]>;
+}
+
+// TMM registers, used by AMX instructions.
+let PositionOrder = 3 in {
 // Tile config registers.
 def TMMCFG: X86Reg<"tmmcfg", 0>;
-
 // Tile "registers".
 def TMM0:  X86Reg<"tmm0",   0>;
 def TMM1:  X86Reg<"tmm1",   1>;
@@ -269,16 +295,7 @@ def TMM4:  X86Reg<"tmm4",   4>;
 def TMM5:  X86Reg<"tmm5",   5>;
 def TMM6:  X86Reg<"tmm6",   6>;
 def TMM7:  X86Reg<"tmm7",   7>;
-
-// Mask Registers, used by AVX-512 instructions.
-def K0 : X86Reg<"k0", 0>, DwarfRegNum<[118,  93,  93]>;
-def K1 : X86Reg<"k1", 1>, DwarfRegNum<[119,  94,  94]>;
-def K2 : X86Reg<"k2", 2>, DwarfRegNum<[120,  95,  95]>;
-def K3 : X86Reg<"k3", 3>, DwarfRegNum<[121,  96,  96]>;
-def K4 : X86Reg<"k4", 4>, DwarfRegNum<[122,  97,  97]>;
-def K5 : X86Reg<"k5", 5>, DwarfRegNum<[123,  98,  98]>;
-def K6 : X86Reg<"k6", 6>, DwarfRegNum<[124,  99,  99]>;
-def K7 : X86Reg<"k7", 7>, DwarfRegNum<[125, 100, 100]>;
+}
 
 // Floating point stack registers. These don't map one-to-one to the FP
 // pseudo registers, but we still mark them as aliasing FP registers. That
@@ -627,10 +644,6 @@ def VK16    : RegisterClass<"X86", [v16i1], 16, (add VK8)> {let Size = 16;}
 def VK32    : RegisterClass<"X86", [v32i1], 32, (add VK16)> {let Size = 32;}
 def VK64    : RegisterClass<"X86", [v64i1], 64, (add VK32)> {let Size = 64;}
 
-// Mask register pairs
-def KPAIRS : RegisterTuples<[sub_mask_0, sub_mask_1],
-                             [(add K0, K2, K4, K6), (add K1, K3, K5, K7)]>;
-
 def VK1PAIR   : RegisterClass<"X86", [untyped], 16, (add KPAIRS)> {let Size = 32;}
 def VK2PAIR   : RegisterClass<"X86", [untyped], 16, (add KPAIRS)> {let Size = 32;}
 def VK4PAIR   : RegisterClass<"X86", [untyped], 16, (add KPAIRS)> {let Size = 32;}
diff --git a/llvm/test/CodeGen/X86/ipra-reg-usage.ll b/llvm/test/CodeGen/X86/ipra-reg-usage.ll
index 36c4d6eff001885..4d0c94125c761c9 100644
--- a/llvm/test/CodeGen/X86/ipra-reg-usage.ll
+++ b/llvm/test/CodeGen/X86/ipra-reg-usage.ll
@@ -3,7 +3,7 @@
 target triple = "x86_64-unknown-unknown"
 declare void @bar1()
 define preserve_allcc void @foo()#0 {
-; CHECK: foo Clobbered Registers: $cs $df $ds $eflags $eip $eiz $es $esp $fpcw $fpsw $fs $fs_base $gs $gs_base $hip $hsp $ip $mxcsr $rflags $rip $riz $rsp $sp $sph $spl $ss $ssp $tmmcfg $_eflags $cr0 $cr1 $cr2 $cr3 $cr4 $cr5 $cr6 $cr7 $cr8 $cr9 $cr10 $cr11 $cr12 $cr13 $cr14 $cr15 $dr0 $dr1 $dr2 $dr3 $dr4 $dr5 $dr6 $dr7 $dr8 $dr9 $dr10 $dr11 $dr12 $dr13 $dr14 $dr15 $fp0 $fp1 $fp2 $fp3 $fp4 $fp5 $fp6 $fp7 $k0 $k1 $k2 $k3 $k4 $k5 $k6 $k7 $mm0 $mm1 $mm2 $mm3 $mm4 $mm5 $mm6 $mm7 $r11 $st0 $st1 $st2 $st3 $st4 $st5 $st6 $st7 $tmm0 $tmm1 $tmm2 $tmm3 $tmm4 $tmm5 $tmm6 $tmm7 $xmm16 $xmm17 $xmm18 $xmm19 $xmm20 $xmm21 $xmm22 $xmm23 $xmm24 $xmm25 $xmm26 $xmm27 $xmm28 $xmm29 $xmm30 $xmm31 $ymm0 $ymm1 $ymm2 $ymm3 $ymm4 $ymm5 $ymm6 $ymm7 $ymm8 $ymm9 $ymm10 $ymm11 $ymm12 $ymm13 $ymm14 $ymm15 $ymm16 $ymm17 $ymm18 $ymm19 $ymm20 $ymm21 $ymm22 $ymm23 $ymm24 $ymm25 $ymm26 $ymm27 $ymm28 $ymm29 $ymm30 $ymm31 $zmm0 $zmm1 $zmm2 $zmm3 $zmm4 $zmm5 $zmm6 $zmm7 $zmm8 $zmm9 $zmm10 $zmm11 $zmm12 $zmm13 $zmm14 $zmm15 $zmm16 $zmm17 $zmm18 $zmm19 $zmm20 $zmm21 $zmm22 $zmm23 $zmm24 $zmm25 $zmm26 $zmm27 $zmm28 $zmm29 $zmm30 $zmm31 $r11b $r11bh $r11d $r11w $r11wh $k0_k1 $k2_k3 $k4_k5 $k6_k7
+; CHECK: foo Clobbered Registers: $cs $df $ds $eflags $eip $eiz $es $esp $fpcw $fpsw $fs $fs_base $gs $gs_base $hip $hsp $ip $mxcsr $rflags $rip $riz $rsp $sp $sph $spl $ss $ssp $_eflags $cr0 $cr1 $cr2 $cr3 $cr4 $cr5 $cr6 $cr7 $cr8 $cr9 $cr10 $cr11 $cr12 $cr13 $cr14 $cr15 $dr0 $dr1 $dr2 $dr3 $dr4 $dr5 $dr6 $dr7 $dr8 $dr9 $dr10 $dr11 $dr12 $dr13 $dr14 $dr15 $fp0 $fp1 $fp2 $fp3 $fp4 $fp5 $fp6 $fp7 $mm0 $mm1 $mm2 $mm3 $mm4 $mm5 $mm6 $mm7 $r11 $st0 $st1 $st2 $st3 $st4 $st5 $st6 $st7 $r11b $r11bh $r11d $r11w $r11wh $ymm0 $ymm1 $ymm2 $ymm3 $ymm4 $ymm5 $ymm6 $ymm7 $ymm8 $ymm9 $ymm10 $ymm11 $ymm12 $ymm13 $ymm14 $ymm15 $k0 $k1 $k2 $k3 $k4 $k5 $k6 $k7 $xmm16 $xmm17 $xmm18 $xmm19 $xmm20 $xmm21 $xmm22 $xmm23 $xmm24 $xmm25 $xmm26 $xmm27 $xmm28 $xmm29 $xmm30 $xmm31 $ymm16 $ymm17 $ymm18 $ymm19 $ymm20 $ymm21 $ymm22 $ymm23 $ymm24 $ymm25 $ymm26 $ymm27 $ymm28 $ymm29 $ymm30 $ymm31 $zmm0 $zmm1 $zmm2 $zmm3 $zmm4 $zmm5 $zmm6 $zmm7 $zmm8 $zmm9 $zmm10 $zmm11 $zmm12 $zmm13 $zmm14 $zmm15 $zmm16 $zmm17 $zmm18 $zmm19 $zmm20 $zmm21 $zmm22 $zmm23 $zmm24 $zmm25 $zmm26 $zmm27 $zmm28 $zmm29 $zmm30 $zmm31 $k0_k1 $k2_k3 $k4_k5 $k6_k7 $tmmcfg $tmm0 $tmm1 $tmm2 $tmm3 $tmm4 $tmm5 $tmm6 $tmm7
   call void @bar1()
   call void @bar2()
   ret void
diff --git a/llvm/utils/TableGen/CodeGenRegisters.cpp b/llvm/utils/TableGen/CodeGenRegisters.cpp
index 2d5f5c841a174af..b5162dda544f7fb 100644
--- a/llvm/utils/TableGen/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/CodeGenRegisters.cpp
@@ -1175,24 +1175,21 @@ CodeGenRegBank::CodeGenRegBank(RecordKeeper &Records,
   for (aut...
[truncated]
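
For quick reference, the TableGen side of the change is just an opt-in attribute (condensed from the X86RegisterInfo.td hunks above): a register with a larger PositionOrder sorts later in the generated enum.

  let PositionOrder = 2 in {  // AVX-512-only registers
    def XMM16 : X86Reg<"xmm16", 16>, DwarfRegNum<[67, -2, -2]>;
  }
  let PositionOrder = 3 in {  // AMX registers
    def TMM0 : X86Reg<"tmm0", 0>;
  }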

@RKSimon (Collaborator) commented Oct 25, 2023

Please can you fix the clang-format failures

@KanRobert (Contributor, Author):

Please can you fix the clang-format failures

Done.

//
// Registers before AVX512,
// AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
// AMX registers (TMM)
@KanRobert (Contributor, Author):

We can also see the difference in the update to llvm/test/CodeGen/X86/ipra-reg-usage.ll.

// Call HandlePhysRegKill() for all live registers clobbered by Mask.
// Clobbered registers are always dead, sp there is no need to use
// HandlePhysRegDef().
for (unsigned Reg = 1, NumRegs = TRI->getNumRegs(); Reg != NumRegs; ++Reg) {
for (unsigned Reg = 1; Reg != NumRegs; ++Reg) {
Collaborator:

I want to take a step back.

Have you tried checking MRI->isReserved(Reg) here? Does that do anything to improve compile time?

Where is the time spent?

Am I reading this right, that it calls HandlePhysRegKill on the super-register any time it finds one of the subregisters is clobbered? Even if it has already called it before?

@KanRobert (Contributor, Author) commented Oct 26, 2023:

I did an analysis of where the time is spent in #67702 (comment).

The extra time is spent iterating over the new registers and new register classes. LiveVariables is one of the passes with the greatest impact because it iterates over them for each basic block. Besides, for LiveVariables and some other passes, the new registers produce a bigger regmask and regset, and operations on the regmask and regset, including allocation and lookup, then take more time too.

MRI->isReserved(Reg)? I think it's an AArch64-only function and is not even used by X86 at all.

Am I reading this right, that it calls HandlePhysRegKill on the super-register any time it finds one of the subregisters is clobbered? Even if it has already called it before?

My understanding is the same as yours.

Collaborator:

MRI->isReserved(Reg) doesn't use the information from X86RegisterInfo::getReservedRegs?

@KanRobert (Contributor, Author) commented Oct 26, 2023:

I misunderstood your idea...

diff --git a/llvm/lib/CodeGen/LiveVariables.cpp b/llvm/lib/CodeGen/LiveVariables.cpp
index 6b983b6320c7..44e127151004 100644
--- a/llvm/lib/CodeGen/LiveVariables.cpp
+++ b/llvm/lib/CodeGen/LiveVariables.cpp
@@ -411,6 +411,9 @@ void LiveVariables::HandleRegMask(const MachineOperand &MO) {
   // Clobbered registers are always dead, sp there is no need to use
   // HandlePhysRegDef().
   for (unsigned Reg = 1, NumRegs = TRI->getNumRegs(); Reg != NumRegs; ++Reg) {
+    // Skip reserved regs.
+    if (MRI->isReserved(Reg))
+      continue;
     // Skip dead regs.
     if (!PhysRegDef[Reg] && !PhysRegUse[Reg])
       continue;

After testing, I find that checking MRI->isReserved(Reg) does not improve compile time; instead, it brings a 0.2% instruction count regression.

Contributor:

I did an analysis of where the time is spent in #67702 (comment).

The extra time is spent iterating over the new registers and new register classes. LiveVariables is one of the passes with the greatest impact because it iterates over them for each basic block. Besides, for LiveVariables and some other passes, the new registers produce a bigger regmask and regset, and operations on the regmask and regset, including allocation and lookup, then take more time too.

FYI LiveVariables is deprecated and we should work towards getting rid of it. LiveIntervals has been the replacement for ages.
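
For context, a hedged sketch of the migration target mentioned here (assumed legacy-pass idiom of that time, registration boilerplate omitted; the pass name is hypothetical):

  #include "llvm/CodeGen/LiveIntervals.h"
  #include "llvm/CodeGen/MachineFunctionPass.h"
  using namespace llvm;

  namespace {
  // A machine pass that depends on LiveIntervals instead of LiveVariables.
  struct ExamplePass : MachineFunctionPass {
    static char ID;
    ExamplePass() : MachineFunctionPass(ID) {}
    void getAnalysisUsage(AnalysisUsage &AU) const override {
      AU.addRequired<LiveIntervals>(); // rather than LiveVariables
      AU.setPreservesAll();
      MachineFunctionPass::getAnalysisUsage(AU);
    }
    bool runOnMachineFunction(MachineFunction &MF) override {
      LiveIntervals &LIS = getAnalysis<LiveIntervals>();
      (void)LIS; // query live ranges here, e.g. LIS.getInterval(VReg)
      return false;
    }
  };
  } // end anonymous namespace
  char ExamplePass::ID = 0;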

@KanRobert (Contributor, Author) commented Oct 26, 2023:

What do you mean by "deprecated"? I see this function is still called when compiling workloads.
And according to the comment #67702 (comment), we have not gotten rid of LiveVariables completely.

Contributor:

As in, all uses of it should be removed and replaced with LiveIntervals. You still see it because nobody ever completed the work to remove it.

@KanRobert (Contributor, Author):

I see. But currently LiveVariables brings about a 0.3% instruction count regression when I introduce the new registers in #67702. If the regression is intolerable, it will block my APX enabling work, so I have to resolve it before the migration is complete.

Collaborator:

How much work is left to remove LiveVariables?

@KanRobert (Contributor, Author):

@jayfoad commented in #67702. I believe he/she knows more about that than I do.

@KanRobert (Contributor, Author):
Ping?

@phoebewang (Contributor) left a comment:

I don't see objections to this patch, so I'll approve it to unblock our follow-up patches.
I think that, as a workaround, this patch solves the compile-time regression caused by adding new registers. We can remove it once we no longer need LiveVariables.

@KanRobert KanRobert changed the title [X86][NFC] Reorder the registers to reduce unnecessary iterations [NFC] Reorder the registers to reduce unnecessary iterations Nov 1, 2023
@KanRobert KanRobert changed the title [NFC] Reorder the registers to reduce unnecessary iterations [NFC][X86] Reorder the registers to reduce unnecessary iterations Nov 1, 2023
@KanRobert KanRobert merged commit 860f9e5 into llvm:main Nov 1, 2023
2 of 3 checks passed
nikic added a commit to nikic/llvm-project that referenced this pull request Nov 8, 2023
llvm#70222 introduces a hook
to return a more accurate number of registers supported for a
specific subtarget (rather than target). However, while x86 registers
were reordered to allow using this, the implementation still always
returned NUM_TARGET_REGS.

Adjust it to return a smaller number of registers depending on
availability of avx/avx512/amx.
nikic added a commit that referenced this pull request Nov 9, 2023
#70222 introduced a hook to
return a more accurate number of registers supported for a specific
subtarget (rather than target). However, while x86 registers were
reordered to allow using this, the implementation currently still always
returns NUM_TARGET_REGS.

Adjust it to return a smaller number of registers depending on
availability of avx/avx512/amx.

The actual impact of this seems to be pretty small, on the order of
0.05%.
nikic added a commit to nikic/llvm-project that referenced this pull request Nov 9, 2023
This makes use of the more accurate register number introduced in
PR llvm#70222 to avoid CFI calculations for unsupported registers.
nikic added a commit that referenced this pull request Nov 9, 2023
This makes use of the more accurate register number introduced in PR
#70222 to avoid CFI calculations for unsupported registers.

This has basically no impact right now, but results in a 0.2% compile-time
improvement at O0 when applied on top of #70958.

The reason is that the extra registers that PR adds push the `BitVector`
out of the `SmallVector` space, which results in an outsized impact.
(This does make me wonder whether `BitVector` should accept an `N`
template parameter to allow using a larger `SmallVector`...)
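
A minimal sketch of that idea (a hypothetical API; LLVM's BitVector takes no such parameter, this only illustrates the suggestion):

  #include "llvm/ADT/SmallVector.h"
  #include <cstdint>

  // Hypothetical: expose the inline-storage size, analogous to
  // SmallVector<T, N>, so a caller that knows the register count is large
  // can keep the bits inline instead of spilling to the heap.
  template <unsigned NumInlineWords = 1> class SizedBitVector {
    static constexpr unsigned WordBits = 64;
    llvm::SmallVector<uint64_t, NumInlineWords> Bits; // bit storage words
    unsigned NumBits = 0;                             // logical size in bits

  public:
    unsigned size() const { return NumBits; }
    void resize(unsigned N) {
      Bits.resize((N + WordBits - 1) / WordBits, 0);
      NumBits = N;
    }
    void set(unsigned Idx) {
      Bits[Idx / WordBits] |= uint64_t(1) << (Idx % WordBits);
    }
    bool test(unsigned Idx) const {
      return (Bits[Idx / WordBits] >> (Idx % WordBits)) & 1;
    }
  };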
KanRobert added a commit that referenced this pull request Nov 9, 2023
1. Map R16-R31 to DWARF registers 130-145.
2. Make R16-R31 caller-saved registers.
3. Make R16-R31 allocatable only when the EGPR feature is supported
4. Make R16-R31 available for instructions in legacy maps 0/1 and the EVEX
space, except XSAVE*/XRSTOR

RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

Explanations for some seemingly unrelated changes:

inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir:
The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is the
encoding for the register class in the enum generated by tablegen. This
encoding will change any time a new register class is added. Since the
number is part of the input, this means it can become stale.

seh-directive-errors.s:
Adding R16-R31 makes ".seh_pushreg 17" legal

musttail-varargs.ll:
It seems some LLVM passes use the number of registers, rather than the number
of allocatable registers, as a heuristic.

This PR is to reland #67702 after #70222 in order to reduce some
compile-time regression when EGPR is not used.
qihangkong pushed a commit to rvgpu/llvm that referenced this pull request Apr 18, 2024
llvm/llvm-project#70222 introduced a hook to
return a more accurate number of registers supported for a specific
subtarget (rather than target). However, while x86 registers were
reordered to allow using this, the implementation currently still always
returns NUM_TARGET_REGS.

Adjust it to return a smaller number of registers depending on
availability of avx/avx512/amx.

The actual impact of this seems to be pretty small, on the order of
0.05%.