
[NFC][X86] Reorder the registers to reduce unnecessary iterations #70222

Merged: 8 commits into llvm:main on Nov 1, 2023

Conversation

@KanRobert (Contributor) commented Oct 25, 2023

  • Introduce a PositionOrder field for the Register and RegisterTuples classes
  • If register A's PositionOrder < register B's PositionOrder, then A is placed before B in the enum in X86GenRegisterInfo.inc
  • The new order of registers in the enum for X86 will be
    1. Registers before AVX512,
    2. AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
    3. AMX registers (TMM)
    4. APX registers (R16-R31)
  • Add a new target hook getNumSupportedRegs() that returns the number of registers supported for the function (it may overestimate)
  • Replace getNumRegs() with getNumSupportedRegs() in LiveVariables to eliminate iterations over unsupported registers

This patch reduces the 0.3% instruction count regression for sqlite3 during the compile stage (O3) from #67702 by not iterating over the APX registers.
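
A minimal sketch of the intended usage pattern (the helper name here is hypothetical; it mirrors the LiveVariables change in the diff below):

  #include "llvm/CodeGen/MachineFunction.h"
  #include "llvm/CodeGen/TargetRegisterInfo.h"
  #include "llvm/CodeGen/TargetSubtargetInfo.h"

  // Iterate only over registers the subtarget can actually have.
  static void forEachSupportedPhysReg(const llvm::MachineFunction &MF) {
    const llvm::TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
    const unsigned NumRegs = TRI->getNumSupportedRegs(MF); // may overestimate
    for (unsigned Reg = 1; Reg != NumRegs; ++Reg) {
      // On a subtarget without APX, R16-R31 are ordered past NumRegs in
      // the enum, so this loop never visits them.
    }
  }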

@github-actions bot commented Oct 25, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@KanRobert KanRobert marked this pull request as ready for review October 25, 2023 16:20
@llvmbot (Collaborator) commented Oct 25, 2023

@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-regalloc

Author: Shengchen Kan (KanRobert)

Changes
  • Introduce a PositionOrder field for the Register and RegisterTuples classes
  • If register A's PositionOrder < register B's PositionOrder, then A is placed before B in the enum in X86GenRegisterInfo.inc
  • The new order of registers in the enum will be
    1. Registers before AVX512,
    2. AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
    3. AMX registers (TMM)
    4. APX registers (R16-R31)
  • Add a new target hook getNumSupportedRegs() that returns the number of registers supported for the function (it may overestimate)
  • Replace getNumRegs() with getNumSupportedRegs() in LiveVariables to eliminate iterations over unsupported registers

This patch reduces instruction count for sqlite3 by 0.2% during the compile stage (O3) by not iterating over AVX, AVX512, and AMX registers.

Motivation for this patch: reduce the regression from #67702.


Patch is 21.10 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70222.diff

13 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/LiveVariables.h (+2-2)
  • (modified) llvm/include/llvm/CodeGen/TargetRegisterInfo.h (+5)
  • (modified) llvm/include/llvm/TableGen/Record.h (+5)
  • (modified) llvm/include/llvm/Target/Target.td (+8)
  • (modified) llvm/lib/CodeGen/LiveVariables.cpp (+10-8)
  • (modified) llvm/lib/Target/X86/AsmParser/X86Operand.h (+8-8)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h (+18-2)
  • (modified) llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp (+3-3)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.cpp (+28-2)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.h (+3)
  • (modified) llvm/lib/Target/X86/X86RegisterInfo.td (+31-18)
  • (modified) llvm/test/CodeGen/X86/ipra-reg-usage.ll (+1-1)
  • (modified) llvm/utils/TableGen/CodeGenRegisters.cpp (+8-11)
diff --git a/llvm/include/llvm/CodeGen/LiveVariables.h b/llvm/include/llvm/CodeGen/LiveVariables.h
index 9ed4c7bdf7b1771..0c3414531ab1b0c 100644
--- a/llvm/include/llvm/CodeGen/LiveVariables.h
+++ b/llvm/include/llvm/CodeGen/LiveVariables.h
@@ -147,7 +147,7 @@ class LiveVariables : public MachineFunctionPass {
   bool HandlePhysRegKill(Register Reg, MachineInstr *MI);
 
   /// HandleRegMask - Call HandlePhysRegKill for all registers clobbered by Mask.
-  void HandleRegMask(const MachineOperand&);
+  void HandleRegMask(const MachineOperand&, unsigned);
 
   void HandlePhysRegUse(Register Reg, MachineInstr &MI);
   void HandlePhysRegDef(Register Reg, MachineInstr *MI,
@@ -170,7 +170,7 @@ class LiveVariables : public MachineFunctionPass {
   /// is coming from.
   void analyzePHINodes(const MachineFunction& Fn);
 
-  void runOnInstr(MachineInstr &MI, SmallVectorImpl<unsigned> &Defs);
+  void runOnInstr(MachineInstr &MI, SmallVectorImpl<unsigned> &Defs, unsigned NumRegs);
 
   void runOnBlock(MachineBasicBlock *MBB, unsigned NumRegs);
 public:
diff --git a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
index 5bf27e40eee8909..337fab735a09522 100644
--- a/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
+++ b/llvm/include/llvm/CodeGen/TargetRegisterInfo.h
@@ -266,6 +266,11 @@ class TargetRegisterInfo : public MCRegisterInfo {
   virtual ~TargetRegisterInfo();
 
 public:
+  /// Return the number of registers for the function. (may overestimate)
+  virtual unsigned getNumSupportedRegs(const MachineFunction &) const {
+    return getNumRegs();
+  }
+
   // Register numbers can represent physical registers, virtual registers, and
   // sometimes stack slots. The unsigned values are divided into these ranges:
   //
diff --git a/llvm/include/llvm/TableGen/Record.h b/llvm/include/llvm/TableGen/Record.h
index d1023a73d606847..ba0d0de79851b9d 100644
--- a/llvm/include/llvm/TableGen/Record.h
+++ b/llvm/include/llvm/TableGen/Record.h
@@ -2155,6 +2155,11 @@ struct LessRecordRegister {
   };
 
   bool operator()(const Record *Rec1, const Record *Rec2) const {
+    int64_t LHSPositionOrder = Rec1->getValueAsInt("PositionOrder");
+    int64_t RHSPositionOrder = Rec2->getValueAsInt("PositionOrder");
+    if (LHSPositionOrder != RHSPositionOrder)
+      return LHSPositionOrder < RHSPositionOrder;
+
     RecordParts LHSParts(StringRef(Rec1->getName()));
     RecordParts RHSParts(StringRef(Rec2->getName()));
 
diff --git a/llvm/include/llvm/Target/Target.td b/llvm/include/llvm/Target/Target.td
index 8ad2066aa83de9e..3b1d2f45267e9e0 100644
--- a/llvm/include/llvm/Target/Target.td
+++ b/llvm/include/llvm/Target/Target.td
@@ -205,6 +205,10 @@ class Register<string n, list<string> altNames = []> {
   // isConstant - This register always holds a constant value (e.g. the zero
   // register in architectures such as MIPS)
   bit isConstant = false;
+
+  /// PositionOrder - Indicate tablegen to place the newly added register at a later
+  /// position to avoid iterations on them on unsupported target.
+  int PositionOrder = 0;
 }
 
 // RegisterWithSubRegs - This can be used to define instances of Register which
@@ -417,6 +421,10 @@ class RegisterTuples<list<SubRegIndex> Indices, list<dag> Regs,
 
   // List of asm names for the generated tuple registers.
   list<string> RegAsmNames = RegNames;
+
+  // PositionOrder - Indicate tablegen to place the newly added register at a later
+  // position to avoid iterations on them on unsupported target.
+  int PositionOrder = 0;
 }
 
 // RegisterCategory - This class is a list of RegisterClasses that belong to a
diff --git a/llvm/lib/CodeGen/LiveVariables.cpp b/llvm/lib/CodeGen/LiveVariables.cpp
index 6b983b6320c711f..b85526cfb380b6a 100644
--- a/llvm/lib/CodeGen/LiveVariables.cpp
+++ b/llvm/lib/CodeGen/LiveVariables.cpp
@@ -406,11 +406,11 @@ bool LiveVariables::HandlePhysRegKill(Register Reg, MachineInstr *MI) {
   return true;
 }
 
-void LiveVariables::HandleRegMask(const MachineOperand &MO) {
+void LiveVariables::HandleRegMask(const MachineOperand &MO, unsigned NumRegs) {
   // Call HandlePhysRegKill() for all live registers clobbered by Mask.
   // Clobbered registers are always dead, sp there is no need to use
   // HandlePhysRegDef().
-  for (unsigned Reg = 1, NumRegs = TRI->getNumRegs(); Reg != NumRegs; ++Reg) {
+  for (unsigned Reg = 1; Reg != NumRegs; ++Reg) {
     // Skip dead regs.
     if (!PhysRegDef[Reg] && !PhysRegUse[Reg])
       continue;
@@ -421,7 +421,8 @@ void LiveVariables::HandleRegMask(const MachineOperand &MO) {
     // This avoids needless implicit operands.
     unsigned Super = Reg;
     for (MCPhysReg SR : TRI->superregs(Reg))
-      if ((PhysRegDef[SR] || PhysRegUse[SR]) && MO.clobbersPhysReg(SR))
+      if (SR < NumRegs && (PhysRegDef[SR] || PhysRegUse[SR]) &&
+          MO.clobbersPhysReg(SR))
         Super = SR;
     HandlePhysRegKill(Super, nullptr);
   }
@@ -478,7 +479,8 @@ void LiveVariables::UpdatePhysRegDefs(MachineInstr &MI,
 }
 
 void LiveVariables::runOnInstr(MachineInstr &MI,
-                               SmallVectorImpl<unsigned> &Defs) {
+                               SmallVectorImpl<unsigned> &Defs,
+                               unsigned NumRegs) {
   assert(!MI.isDebugOrPseudoInstr());
   // Process all of the operands of the instruction...
   unsigned NumOperandsToProcess = MI.getNumOperands();
@@ -527,7 +529,7 @@ void LiveVariables::runOnInstr(MachineInstr &MI,
 
   // Process all masked registers. (Call clobbers).
   for (unsigned Mask : RegMasks)
-    HandleRegMask(MI.getOperand(Mask));
+    HandleRegMask(MI.getOperand(Mask), NumRegs);
 
   // Process all defs.
   for (unsigned MOReg : DefRegs) {
@@ -539,7 +541,7 @@ void LiveVariables::runOnInstr(MachineInstr &MI,
   UpdatePhysRegDefs(MI, Defs);
 }
 
-void LiveVariables::runOnBlock(MachineBasicBlock *MBB, const unsigned NumRegs) {
+void LiveVariables::runOnBlock(MachineBasicBlock *MBB, unsigned NumRegs) {
   // Mark live-in registers as live-in.
   SmallVector<unsigned, 4> Defs;
   for (const auto &LI : MBB->liveins()) {
@@ -556,7 +558,7 @@ void LiveVariables::runOnBlock(MachineBasicBlock *MBB, const unsigned NumRegs) {
       continue;
     DistanceMap.insert(std::make_pair(&MI, Dist++));
 
-    runOnInstr(MI, Defs);
+    runOnInstr(MI, Defs, NumRegs);
   }
 
   // Handle any virtual assignments from PHI nodes which might be at the
@@ -597,7 +599,7 @@ bool LiveVariables::runOnMachineFunction(MachineFunction &mf) {
   MRI = &mf.getRegInfo();
   TRI = MF->getSubtarget().getRegisterInfo();
 
-  const unsigned NumRegs = TRI->getNumRegs();
+  const unsigned NumRegs = TRI->getNumSupportedRegs(mf);
   PhysRegDef.assign(NumRegs, nullptr);
   PhysRegUse.assign(NumRegs, nullptr);
   PHIVarInfo.resize(MF->getNumBlockIDs());
diff --git a/llvm/lib/Target/X86/AsmParser/X86Operand.h b/llvm/lib/Target/X86/AsmParser/X86Operand.h
index 4661e73c3ef8e86..641158cb351fc4f 100644
--- a/llvm/lib/Target/X86/AsmParser/X86Operand.h
+++ b/llvm/lib/Target/X86/AsmParser/X86Operand.h
@@ -357,28 +357,28 @@ struct X86Operand final : public MCParsedAsmOperand {
   }
 
   bool isMem64_RC128X() const {
-    return isMem64() && isMemIndexReg(X86::XMM0, X86::XMM31);
+    return isMem64() && X86II::isXMMReg(Mem.IndexReg);
   }
   bool isMem128_RC128X() const {
-    return isMem128() && isMemIndexReg(X86::XMM0, X86::XMM31);
+    return isMem128() && X86II::isXMMReg(Mem.IndexReg);
   }
   bool isMem128_RC256X() const {
-    return isMem128() && isMemIndexReg(X86::YMM0, X86::YMM31);
+    return isMem128() && X86II::isYMMReg(Mem.IndexReg);
   }
   bool isMem256_RC128X() const {
-    return isMem256() && isMemIndexReg(X86::XMM0, X86::XMM31);
+    return isMem256() && X86II::isXMMReg(Mem.IndexReg);
   }
   bool isMem256_RC256X() const {
-    return isMem256() && isMemIndexReg(X86::YMM0, X86::YMM31);
+    return isMem256() && X86II::isYMMReg(Mem.IndexReg);
   }
   bool isMem256_RC512() const {
-    return isMem256() && isMemIndexReg(X86::ZMM0, X86::ZMM31);
+    return isMem256() && X86II::isZMMReg(Mem.IndexReg);
   }
   bool isMem512_RC256X() const {
-    return isMem512() && isMemIndexReg(X86::YMM0, X86::YMM31);
+    return isMem512() && X86II::isYMMReg(Mem.IndexReg);
   }
   bool isMem512_RC512() const {
-    return isMem512() && isMemIndexReg(X86::ZMM0, X86::ZMM31);
+    return isMem512() && X86II::isZMMReg(Mem.IndexReg);
   }
   bool isMem512_GR16() const {
     if (!isMem512())
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h b/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
index 1e5a3606f33a6fc..4d35bc34cf6bfef 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h
@@ -1182,11 +1182,27 @@ namespace X86II {
     }
   }
 
+  inline bool isXMMReg(unsigned RegNo) {
+    return (RegNo >= X86::XMM0 && RegNo <= X86::XMM15) ||
+           (RegNo >= X86::XMM16 && RegNo <= X86::XMM31);
+  }
+
+  inline bool isYMMReg(unsigned RegNo) {
+    return (RegNo >= X86::YMM0 && RegNo <= X86::YMM15) ||
+           (RegNo >= X86::YMM16 && RegNo <= X86::YMM31);
+  }
+
+  inline bool isZMMReg(unsigned RegNo) {
+    return RegNo >= X86::ZMM0 && RegNo <= X86::ZMM31;
+  }
+
   /// \returns true if the MachineOperand is a x86-64 extended (r8 or
   /// higher) register,  e.g. r8, xmm8, xmm13, etc.
   inline bool isX86_64ExtendedReg(unsigned RegNo) {
-    if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM31) ||
-        (RegNo >= X86::YMM8 && RegNo <= X86::YMM31) ||
+    if ((RegNo >= X86::XMM8 && RegNo <= X86::XMM15) ||
+        (RegNo >= X86::XMM16 && RegNo <= X86::XMM31) ||
+        (RegNo >= X86::YMM8 && RegNo <= X86::YMM15) ||
+        (RegNo >= X86::YMM16 && RegNo <= X86::YMM31) ||
         (RegNo >= X86::ZMM8 && RegNo <= X86::ZMM31))
       return true;
 
diff --git a/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp b/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp
index 031ba9f87acbbfd..ee82faebb57e6ce 100644
--- a/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp
+++ b/llvm/lib/Target/X86/MCTargetDesc/X86InstComments.cpp
@@ -234,11 +234,11 @@ using namespace llvm;
   CASE_AVX_INS_COMMON(Inst##SS4, , mr_Int)
 
 static unsigned getVectorRegSize(unsigned RegNo) {
-  if (X86::ZMM0 <= RegNo && RegNo <= X86::ZMM31)
+  if (X86II::isZMMReg(RegNo))
     return 512;
-  if (X86::YMM0 <= RegNo && RegNo <= X86::YMM31)
+  if (X86II::isYMMReg(RegNo))
     return 256;
-  if (X86::XMM0 <= RegNo && RegNo <= X86::XMM31)
+  if (X86II::isXMMReg(RegNo))
     return 128;
   if (X86::MM0 <= RegNo && RegNo <= X86::MM7)
     return 64;
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.cpp b/llvm/lib/Target/X86/X86RegisterInfo.cpp
index 3504ca2b5743f88..28a134cff9beaee 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.cpp
+++ b/llvm/lib/Target/X86/X86RegisterInfo.cpp
@@ -604,8 +604,8 @@ BitVector X86RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
     }
   }
   if (!Is64Bit || !MF.getSubtarget<X86Subtarget>().hasAVX512()) {
-    for (unsigned n = 16; n != 32; ++n) {
-      for (MCRegAliasIterator AI(X86::XMM0 + n, this, true); AI.isValid(); ++AI)
+    for (unsigned n = 0; n != 16; ++n) {
+      for (MCRegAliasIterator AI(X86::XMM16 + n, this, true); AI.isValid(); ++AI)
         Reserved.set(*AI);
     }
   }
@@ -616,6 +616,32 @@ BitVector X86RegisterInfo::getReservedRegs(const MachineFunction &MF) const {
   return Reserved;
 }
 
+unsigned X86RegisterInfo::getNumSupportedRegs(const MachineFunction &MF) const {
+  const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
+  // All existing Intel CPUs that support AMX support AVX512 and all existing
+  // Intel CPUs that support APX support AMX. AVX512 implies AVX.
+  //
+  // We enumerate the registers in X86GenRegisterInfo.inc in this order:
+  //
+  // Registers before AVX512,
+  // AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
+  // AMX registers (TMM)
+  // APX registers (R16-R31)
+  //
+  // and try to return the minimum number of registers supported by the target.
+
+  bool HasAVX = ST.hasAVX();
+  bool HasAVX512 = ST.hasAVX512();
+  bool HasAMX = ST.hasAMXTILE();
+  if (HasAMX)
+    return X86::TMM7 + 1;
+  if (HasAVX512)
+    return X86::K6_K7 + 1;
+  if (HasAVX)
+    return X86::YMM15 + 1;
+  return X86::YMM0;
+}
+
 bool X86RegisterInfo::isArgumentRegister(const MachineFunction &MF,
                                          MCRegister Reg) const {
   const X86Subtarget &ST = MF.getSubtarget<X86Subtarget>();
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.h b/llvm/lib/Target/X86/X86RegisterInfo.h
index 0671f79676009ee..7296a5f021e4ad4 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.h
+++ b/llvm/lib/Target/X86/X86RegisterInfo.h
@@ -51,6 +51,9 @@ class X86RegisterInfo final : public X86GenRegisterInfo {
 public:
   explicit X86RegisterInfo(const Triple &TT);
 
+  /// Return the number of registers for the function.
+  unsigned getNumSupportedRegs(const MachineFunction &MF) const override;
+
   // FIXME: This should be tablegen'd like getDwarfRegNum is
   int getSEHRegNum(unsigned i) const;
 
diff --git a/llvm/lib/Target/X86/X86RegisterInfo.td b/llvm/lib/Target/X86/X86RegisterInfo.td
index 81b7597cc8ea5c0..898a3f97e5236df 100644
--- a/llvm/lib/Target/X86/X86RegisterInfo.td
+++ b/llvm/lib/Target/X86/X86RegisterInfo.td
@@ -223,6 +223,8 @@ def XMM13: X86Reg<"xmm13", 13>, DwarfRegNum<[30, -2, -2]>;
 def XMM14: X86Reg<"xmm14", 14>, DwarfRegNum<[31, -2, -2]>;
 def XMM15: X86Reg<"xmm15", 15>, DwarfRegNum<[32, -2, -2]>;
 
+let PositionOrder = 2 in {
+// XMM16-31 registers, used by AVX-512 instructions.
 def XMM16:  X86Reg<"xmm16", 16>, DwarfRegNum<[67, -2, -2]>;
 def XMM17:  X86Reg<"xmm17", 17>, DwarfRegNum<[68, -2, -2]>;
 def XMM18:  X86Reg<"xmm18", 18>, DwarfRegNum<[69, -2, -2]>;
@@ -239,27 +241,51 @@ def XMM28:  X86Reg<"xmm28", 28>, DwarfRegNum<[79, -2, -2]>;
 def XMM29:  X86Reg<"xmm29", 29>, DwarfRegNum<[80, -2, -2]>;
 def XMM30:  X86Reg<"xmm30", 30>, DwarfRegNum<[81, -2, -2]>;
 def XMM31:  X86Reg<"xmm31", 31>, DwarfRegNum<[82, -2, -2]>;
+}
 
 // YMM0-15 registers, used by AVX instructions and
 // YMM16-31 registers, used by AVX-512 instructions.
-let SubRegIndices = [sub_xmm] in {
-  foreach  Index = 0-31 in {
+let SubRegIndices = [sub_xmm], PositionOrder = 1 in {
+  foreach  Index = 0-15 in {
+    def YMM#Index : X86Reg<"ymm"#Index, Index, [!cast<X86Reg>("XMM"#Index)]>,
+                    DwarfRegAlias<!cast<X86Reg>("XMM"#Index)>;
+  }
+}
+let SubRegIndices = [sub_xmm], PositionOrder = 2 in {
+  foreach  Index = 16-31 in {
     def YMM#Index : X86Reg<"ymm"#Index, Index, [!cast<X86Reg>("XMM"#Index)]>,
                     DwarfRegAlias<!cast<X86Reg>("XMM"#Index)>;
   }
 }
 
+
 // ZMM Registers, used by AVX-512 instructions.
-let SubRegIndices = [sub_ymm] in {
+let SubRegIndices = [sub_ymm], PositionOrder = 2 in {
   foreach  Index = 0-31 in {
     def ZMM#Index : X86Reg<"zmm"#Index, Index, [!cast<X86Reg>("YMM"#Index)]>,
                     DwarfRegAlias<!cast<X86Reg>("XMM"#Index)>;
   }
 }
 
+let PositionOrder = 2 in {
+// Mask Registers, used by AVX-512 instructions.
+def K0 : X86Reg<"k0", 0>, DwarfRegNum<[118,  93,  93]>;
+def K1 : X86Reg<"k1", 1>, DwarfRegNum<[119,  94,  94]>;
+def K2 : X86Reg<"k2", 2>, DwarfRegNum<[120,  95,  95]>;
+def K3 : X86Reg<"k3", 3>, DwarfRegNum<[121,  96,  96]>;
+def K4 : X86Reg<"k4", 4>, DwarfRegNum<[122,  97,  97]>;
+def K5 : X86Reg<"k5", 5>, DwarfRegNum<[123,  98,  98]>;
+def K6 : X86Reg<"k6", 6>, DwarfRegNum<[124,  99,  99]>;
+def K7 : X86Reg<"k7", 7>, DwarfRegNum<[125, 100, 100]>;
+// Mask register pairs
+def KPAIRS : RegisterTuples<[sub_mask_0, sub_mask_1],
+                             [(add K0, K2, K4, K6), (add K1, K3, K5, K7)]>;
+}
+
+// TMM registers, used by AMX instructions.
+let PositionOrder = 3 in {
 // Tile config registers.
 def TMMCFG: X86Reg<"tmmcfg", 0>;
-
 // Tile "registers".
 def TMM0:  X86Reg<"tmm0",   0>;
 def TMM1:  X86Reg<"tmm1",   1>;
@@ -269,16 +295,7 @@ def TMM4:  X86Reg<"tmm4",   4>;
 def TMM5:  X86Reg<"tmm5",   5>;
 def TMM6:  X86Reg<"tmm6",   6>;
 def TMM7:  X86Reg<"tmm7",   7>;
-
-// Mask Registers, used by AVX-512 instructions.
-def K0 : X86Reg<"k0", 0>, DwarfRegNum<[118,  93,  93]>;
-def K1 : X86Reg<"k1", 1>, DwarfRegNum<[119,  94,  94]>;
-def K2 : X86Reg<"k2", 2>, DwarfRegNum<[120,  95,  95]>;
-def K3 : X86Reg<"k3", 3>, DwarfRegNum<[121,  96,  96]>;
-def K4 : X86Reg<"k4", 4>, DwarfRegNum<[122,  97,  97]>;
-def K5 : X86Reg<"k5", 5>, DwarfRegNum<[123,  98,  98]>;
-def K6 : X86Reg<"k6", 6>, DwarfRegNum<[124,  99,  99]>;
-def K7 : X86Reg<"k7", 7>, DwarfRegNum<[125, 100, 100]>;
+}
 
 // Floating point stack registers. These don't map one-to-one to the FP
 // pseudo registers, but we still mark them as aliasing FP registers. That
@@ -627,10 +644,6 @@ def VK16    : RegisterClass<"X86", [v16i1], 16, (add VK8)> {let Size = 16;}
 def VK32    : RegisterClass<"X86", [v32i1], 32, (add VK16)> {let Size = 32;}
 def VK64    : RegisterClass<"X86", [v64i1], 64, (add VK32)> {let Size = 64;}
 
-// Mask register pairs
-def KPAIRS : RegisterTuples<[sub_mask_0, sub_mask_1],
-                             [(add K0, K2, K4, K6), (add K1, K3, K5, K7)]>;
-
 def VK1PAIR   : RegisterClass<"X86", [untyped], 16, (add KPAIRS)> {let Size = 32;}
 def VK2PAIR   : RegisterClass<"X86", [untyped], 16, (add KPAIRS)> {let Size = 32;}
 def VK4PAIR   : RegisterClass<"X86", [untyped], 16, (add KPAIRS)> {let Size = 32;}
diff --git a/llvm/test/CodeGen/X86/ipra-reg-usage.ll b/llvm/test/CodeGen/X86/ipra-reg-usage.ll
index 36c4d6eff001885..4d0c94125c761c9 100644
--- a/llvm/test/CodeGen/X86/ipra-reg-usage.ll
+++ b/llvm/test/CodeGen/X86/ipra-reg-usage.ll
@@ -3,7 +3,7 @@
 target triple = "x86_64-unknown-unknown"
 declare void @bar1()
 define preserve_allcc void @foo()#0 {
-; CHECK: foo Clobbered Registers: $cs $df $ds $eflags $eip $eiz $es $esp $fpcw $fpsw $fs $fs_base $gs $gs_base $hip $hsp $ip $mxcsr $rflags $rip $riz $rsp $sp $sph $spl $ss $ssp $tmmcfg $_eflags $cr0 $cr1 $cr2 $cr3 $cr4 $cr5 $cr6 $cr7 $cr8 $cr9 $cr10 $cr11 $cr12 $cr13 $cr14 $cr15 $dr0 $dr1 $dr2 $dr3 $dr4 $dr5 $dr6 $dr7 $dr8 $dr9 $dr10 $dr11 $dr12 $dr13 $dr14 $dr15 $fp0 $fp1 $fp2 $fp3 $fp4 $fp5 $fp6 $fp7 $k0 $k1 $k2 $k3 $k4 $k5 $k6 $k7 $mm0 $mm1 $mm2 $mm3 $mm4 $mm5 $mm6 $mm7 $r11 $st0 $st1 $st2 $st3 $st4 $st5 $st6 $st7 $tmm0 $tmm1 $tmm2 $tmm3 $tmm4 $tmm5 $tmm6 $tmm7 $xmm16 $xmm17 $xmm18 $xmm19 $xmm20 $xmm21 $xmm22 $xmm23 $xmm24 $xmm25 $xmm26 $xmm27 $xmm28 $xmm29 $xmm30 $xmm31 $ymm0 $ymm1 $ymm2 $ymm3 $ymm4 $ymm5 $ymm6 $ymm7 $ymm8 $ymm9 $ymm10 $ymm11 $ymm12 $ymm13 $ymm14 $ymm15 $ymm16 $ymm17 $ymm18 $ymm19 $ymm20 $ymm21 $ymm22 $ymm23 $ymm24 $ymm25 $ymm26 $ymm27 $ymm28 $ymm29 $ymm30 $ymm31 $zmm0 $zmm1 $zmm2 $zmm3 $zmm4 $zmm5 $zmm6 $zmm7 $zmm8 $zmm9 $zmm10 $zmm11 $zmm12 $zmm13 $zmm14 $zmm15 $zmm16 $zmm17 $zmm18 $zmm19 $zmm20 $zmm21 $zmm22 $zmm23 $zmm24 $zmm25 $zmm26 $zmm27 $zmm28 $zmm29 $zmm30 $zmm31 $r11b $r11bh $r11d $r11w $r11wh $k0_k1 $k2_k3 $k4_k5 $k6_k7
+; CHECK: foo Clobbered Registers: $cs $df $ds $eflags $eip $eiz $es $esp $fpcw $fpsw $fs $fs_base $gs $gs_base $hip $hsp $ip $mxcsr $rflags $rip $riz $rsp $sp $sph $spl $ss $ssp $_eflags $cr0 $cr1 $cr2 $cr3 $cr4 $cr5 $cr6 $cr7 $cr8 $cr9 $cr10 $cr11 $cr12 $cr13 $cr14 $cr15 $dr0 $dr1 $dr2 $dr3 $dr4 $dr5 $dr6 $dr7 $dr8 $dr9 $dr10 $dr11 $dr12 $dr13 $dr14 $dr15 $fp0 $fp1 $fp2 $fp3 $fp4 $fp5 $fp6 $fp7 $mm0 $mm1 $mm2 $mm3 $mm4 $mm5 $mm6 $mm7 $r11 $st0 $st1 $st2 $st3 $st4 $st5 $st6 $st7 $r11b $r11bh $r11d $r11w $r11wh $ymm0 $ymm1 $ymm2 $ymm3 $ymm4 $ymm5 $ymm6 $ymm7 $ymm8 $ymm9 $ymm10 $ymm11 $ymm12 $ymm13 $ymm14 $ymm15 $k0 $k1 $k2 $k3 $k4 $k5 $k6 $k7 $xmm16 $xmm17 $xmm18 $xmm19 $xmm20 $xmm21 $xmm22 $xmm23 $xmm24 $xmm25 $xmm26 $xmm27 $xmm28 $xmm29 $xmm30 $xmm31 $ymm16 $ymm17 $ymm18 $ymm19 $ymm20 $ymm21 $ymm22 $ymm23 $ymm24 $ymm25 $ymm26 $ymm27 $ymm28 $ymm29 $ymm30 $ymm31 $zmm0 $zmm1 $zmm2 $zmm3 $zmm4 $zmm5 $zmm6 $zmm7 $zmm8 $zmm9 $zmm10 $zmm11 $zmm12 $zmm13 $zmm14 $zmm15 $zmm16 $zmm17 $zmm18 $zmm19 $zmm20 $zmm21 $zmm22 $zmm23 $zmm24 $zmm25 $zmm26 $zmm27 $zmm28 $zmm29 $zmm30 $zmm31 $k0_k1 $k2_k3 $k4_k5 $k6_k7 $tmmcfg $tmm0 $tmm1 $tmm2 $tmm3 $tmm4 $tmm5 $tmm6 $tmm7
   call void @bar1()
   call void @bar2()
   ret void
diff --git a/llvm/utils/TableGen/CodeGenRegisters.cpp b/llvm/utils/TableGen/CodeGenRegisters.cpp
index 2d5f5c841a174af..b5162dda544f7fb 100644
--- a/llvm/utils/TableGen/CodeGenRegisters.cpp
+++ b/llvm/utils/TableGen/CodeGenRegisters.cpp
@@ -1175,24 +1175,21 @@ CodeGenRegBank::CodeGenRegBank(RecordKeeper &Records,
   for (aut...
[truncated]
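
For quick reference, the TableGen side of the change is just an opt-in attribute (condensed from the X86RegisterInfo.td hunks above): a register with a larger PositionOrder sorts later in the generated enum.

  let PositionOrder = 2 in {  // AVX-512-only registers
    def XMM16 : X86Reg<"xmm16", 16>, DwarfRegNum<[67, -2, -2]>;
  }
  let PositionOrder = 3 in {  // AMX registers
    def TMM0 : X86Reg<"tmm0", 0>;
  }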

@RKSimon (Collaborator) commented Oct 25, 2023

Please can you fix the clang-format failures

@KanRobert (Contributor, Author):

Please can you fix the clang-format failures

Done.

//
// Registers before AVX512,
// AVX512 registers (X/YMM16-31, ZMM0-31, K registers)
// AMX registers (TMM)
@KanRobert (Contributor, Author):

We can also see the difference in the update to llvm/test/CodeGen/X86/ipra-reg-usage.ll.

// Call HandlePhysRegKill() for all live registers clobbered by Mask.
// Clobbered registers are always dead, sp there is no need to use
// HandlePhysRegDef().
for (unsigned Reg = 1, NumRegs = TRI->getNumRegs(); Reg != NumRegs; ++Reg) {
for (unsigned Reg = 1; Reg != NumRegs; ++Reg) {
Collaborator:

I want to take a step back.

Have you tried checking MRI->isReserved(Reg) here? Does that do anything to improve compile time?

Where is the time spent?

Am I reading this right, that it calls HandlePhysRegKill on the super-register any time it finds one of the subregisters is clobbered? Even if it has already called it before?

@KanRobert (Contributor, Author) commented Oct 26, 2023:

I did an analysis of where the time is spent in #67702 (comment).

The extra time is spent iterating over the new registers and new register classes. LiveVariables is one of the passes with the greatest impact because it iterates over them for each basic block. Besides, for LiveVariables and some other passes, the new registers produce a bigger regmask and regset, and operations on the regmask and regset, including allocation and lookup, then take more time too.

MRI->isReserved(Reg)? I think it's an AArch64-only function and is not even used by X86 at all.

Am I reading this right, that it calls HandlePhysRegKill on the super-register any time it finds one of the subregisters is clobbered? Even if it has already called it before?

My understanding is the same as yours.

Collaborator:

MRI->isReserved(Reg) doesn't use the information from X86RegisterInfo::getReservedRegs?

@KanRobert (Contributor, Author) commented Oct 26, 2023:

I misunderstood your idea...

diff --git a/llvm/lib/CodeGen/LiveVariables.cpp b/llvm/lib/CodeGen/LiveVariables.cpp
index 6b983b6320c7..44e127151004 100644
--- a/llvm/lib/CodeGen/LiveVariables.cpp
+++ b/llvm/lib/CodeGen/LiveVariables.cpp
@@ -411,6 +411,9 @@ void LiveVariables::HandleRegMask(const MachineOperand &MO) {
   // Clobbered registers are always dead, sp there is no need to use
   // HandlePhysRegDef().
   for (unsigned Reg = 1, NumRegs = TRI->getNumRegs(); Reg != NumRegs; ++Reg) {
+    // Skip reserved regs.
+    if (MRI->isReserved(Reg))
+      continue;
     // Skip dead regs.
     if (!PhysRegDef[Reg] && !PhysRegUse[Reg])
       continue;

After testing, I find that checking MRI->isReserved(Reg) does not improve compile time; instead, it brings a 0.2% instruction count regression.

Contributor:

I did an analysis of where the time is spent in #67702 (comment).

The extra time is spent iterating over the new registers and new register classes. LiveVariables is one of the passes with the greatest impact because it iterates over them for each basic block. Besides, for LiveVariables and some other passes, the new registers produce a bigger regmask and regset, and operations on the regmask and regset, including allocation and lookup, then take more time too.

FYI LiveVariables is deprecated and we should work towards getting rid of it. LiveIntervals has been the replacement for ages.
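
For context, a hedged sketch of the migration target mentioned here (assumed legacy-pass idiom of that time, registration boilerplate omitted; the pass name is hypothetical):

  #include "llvm/CodeGen/LiveIntervals.h"
  #include "llvm/CodeGen/MachineFunctionPass.h"
  using namespace llvm;

  namespace {
  // A machine pass that depends on LiveIntervals instead of LiveVariables.
  struct ExamplePass : MachineFunctionPass {
    static char ID;
    ExamplePass() : MachineFunctionPass(ID) {}
    void getAnalysisUsage(AnalysisUsage &AU) const override {
      AU.addRequired<LiveIntervals>(); // rather than LiveVariables
      AU.setPreservesAll();
      MachineFunctionPass::getAnalysisUsage(AU);
    }
    bool runOnMachineFunction(MachineFunction &MF) override {
      LiveIntervals &LIS = getAnalysis<LiveIntervals>();
      (void)LIS; // query live ranges here, e.g. LIS.getInterval(VReg)
      return false;
    }
  };
  } // end anonymous namespace
  char ExamplePass::ID = 0;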

@KanRobert (Contributor, Author) commented Oct 26, 2023:

What do you mean by "deprecated"? I see this function is still called when compiling workloads.
And according to the comment #67702 (comment), we have not gotten rid of LiveVariables completely.

Contributor:

As in, all uses of it should be removed and replaced with LiveIntervals. You still see it because nobody ever completed the work to remove it.

@KanRobert (Contributor, Author):

I see. But currently LiveVariables brings about a 0.3% instruction count regression when I introduce the new registers in #67702. If the regression is intolerable, it will block my APX enabling work, so I have to resolve it before the migration is complete.

Collaborator:

How much work is left to remove LiveVariables?

@KanRobert (Contributor, Author):

@jayfoad commented in #67702. I believe he/she knows more about that than I do.

@KanRobert (Contributor, Author):
Ping?

@phoebewang (Contributor) left a comment:

I don't see objections to this patch, so I'll approve it to unblock our follow-up patches.
I think that, as a workaround, this patch solves the compile-time regression caused by adding new registers. We can remove it once we no longer need LiveVariables.

@KanRobert KanRobert changed the title [X86][NFC] Reorder the registers to reduce unnecessary iterations [NFC] Reorder the registers to reduce unnecessary iterations Nov 1, 2023
@KanRobert KanRobert changed the title [NFC] Reorder the registers to reduce unnecessary iterations [NFC][X86] Reorder the registers to reduce unnecessary iterations Nov 1, 2023
@KanRobert KanRobert merged commit 860f9e5 into llvm:main Nov 1, 2023
2 of 3 checks passed
nikic added a commit to nikic/llvm-project that referenced this pull request Nov 8, 2023
llvm#70222 introduces a hook
to return a more accurate number of registers supported for a
specific subtarget (rather than target). However, while x86 registers
were reordered to allow using this, the implementation still always
returned NUM_TARGET_REGS.

Adjust it to return a smaller number of registers depending on
availability of avx/avx512/amx.
nikic added a commit that referenced this pull request Nov 9, 2023
#70222 introduced a hook to
return a more accurate number of registers supported for a specific
subtarget (rather than target). However, while x86 registers were
reordered to allow using this, the implementation currently still always
returns NUM_TARGET_REGS.

Adjust it to return a smaller number of registers depending on
availability of avx/avx512/amx.

The actual impact of this seems to be pretty small, on the order of
0.05%.
nikic added a commit to nikic/llvm-project that referenced this pull request Nov 9, 2023
This makes use of the more accurate register number introduced in
PR llvm#70222 to avoid CFI calculations for unsupported registers.
nikic added a commit that referenced this pull request Nov 9, 2023
This makes use of the more accurate register number introduced in PR
#70222 to avoid CFI calculations for unsupported registers.

This has basically no impact right now, but results in a 0.2% compile-time
improvement at O0 when applied on top of #70958.

The reason is that the extra registers that PR adds push the `BitVector`
out of the `SmallVector` space, which results in an outsized impact.
(This does make me wonder whether `BitVector` should accept an `N`
template parameter to allow using a larger `SmallVector`...)
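
A minimal sketch of that idea (a hypothetical API; LLVM's BitVector takes no such parameter, this only illustrates the suggestion):

  #include "llvm/ADT/SmallVector.h"
  #include <cstdint>

  // Hypothetical: expose the inline-storage size, analogous to
  // SmallVector<T, N>, so a caller that knows the register count is large
  // can keep the bits inline instead of spilling to the heap.
  template <unsigned NumInlineWords = 1> class SizedBitVector {
    static constexpr unsigned WordBits = 64;
    llvm::SmallVector<uint64_t, NumInlineWords> Bits; // bit storage words
    unsigned NumBits = 0;                             // logical size in bits

  public:
    unsigned size() const { return NumBits; }
    void resize(unsigned N) {
      Bits.resize((N + WordBits - 1) / WordBits, 0);
      NumBits = N;
    }
    void set(unsigned Idx) {
      Bits[Idx / WordBits] |= uint64_t(1) << (Idx % WordBits);
    }
    bool test(unsigned Idx) const {
      return (Bits[Idx / WordBits] >> (Idx % WordBits)) & 1;
    }
  };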
KanRobert added a commit that referenced this pull request Nov 9, 2023
1. Map R16-R31 to DWARF registers 130-145.
2. Make R16-R31 caller-saved registers.
3. Make R16-R31 allocatable only when the EGPR feature is supported
4. Make R16-R31 available for instructions in legacy maps 0/1 and the EVEX
space, except XSAVE*/XRSTOR

RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031/4

Explanations for some seemingly unrelated changes:

inline-asm-registers.mir, statepoint-invoke-ra-enter-at-end.mir:
The immediate (TargetInstrInfo.cpp:1612) used for the regdef/reguse is the
encoding for the register class in the enum generated by tablegen. This
encoding will change any time a new register class is added. Since the
number is part of the input, this means it can become stale.

seh-directive-errors.s:
Adding R16-R31 makes ".seh_pushreg 17" legal

musttail-varargs.ll:
It seems some LLVM passes use the number of registers, rather than the number
of allocatable registers, as a heuristic.

This PR is to reland #67702 after #70222 in order to reduce some
compile-time regression when EGPR is not used.
qihangkong pushed a commit to rvgpu/llvm that referenced this pull request Apr 18, 2024
llvm/llvm-project#70222 introduced a hook to
return a more accurate number of registers supported for a specific
subtarget (rather than target). However, while x86 registers were
reordered to allow using this, the implementation currently still always
returns NUM_TARGET_REGS.

Adjust it to return a smaller number of registers depending on
availability of avx/avx512/amx.

The actual impact of this seems to be pretty small, on the order of
0.05%.