Skip to content

Conversation

@kosarev
Copy link
Collaborator

@kosarev kosarev commented Nov 12, 2025

Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.

AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.

Splitting .inc files also helps avoiding extra ccache misses
where changes in .td files don't cause changes in all parts of
what previously was a single .inc file.

It is thought that rather than building on top of the current
single-output-file design of TableGen, e.g., using split-file,
it would be more preferable to recognise the need for multi-file
outputs and give it a proper first-class support directly in
TableGen.

@kosarev kosarev marked this pull request as ready for review November 12, 2025 15:22
Copy link
Collaborator Author

kosarev commented Nov 12, 2025

@llvmbot
Copy link
Member

llvmbot commented Nov 12, 2025

@llvm/pr-subscribers-tablegen
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-mlir-core

@llvm/pr-subscribers-mlir

Author: Ivan Kosarev (kosarev)

Changes

Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.

AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.

Splitting .inc files also helps avoiding extra ccache misses
where changes in .td files don't cause changes in all parts of
what previously was a single .inc file.

It is thought that rather than building on top of the current
single-output-file design of TableGen, e.g., using split-file,
it would be more preferable to recognise the need for multi-file
outputs and give it a proper first-class support directly in
TableGen.


Patch is 21.52 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/167700.diff

13 Files Affected:

  • (modified) llvm/include/llvm/TableGen/Main.h (+16-1)
  • (modified) llvm/include/llvm/TableGen/TableGenBackend.h (+40-3)
  • (modified) llvm/lib/TableGen/Main.cpp (+52-24)
  • (modified) llvm/lib/TableGen/TableGenBackend.cpp (+14-5)
  • (modified) llvm/lib/Target/AMDGPU/SIFoldOperands.cpp (+1-1)
  • (modified) llvm/test/TableGen/ArtificialSubregs.td (+1-1)
  • (modified) llvm/test/TableGen/ConcatenatedSubregs.td (+1-1)
  • (modified) llvm/test/TableGen/HwModeBitSet.td (+1-1)
  • (modified) llvm/test/TableGen/HwModeSubRegs.td (+1-1)
  • (modified) llvm/test/TableGen/SubRegsAndAliases.td (+1-1)
  • (modified) llvm/utils/TableGen/Basic/TableGen.cpp (+5-2)
  • (modified) llvm/utils/TableGen/RegisterInfoEmitter.cpp (+55-31)
  • (modified) mlir/lib/Tools/mlir-tblgen/MlirTblgenMain.cpp (+8-1)
diff --git a/llvm/include/llvm/TableGen/Main.h b/llvm/include/llvm/TableGen/Main.h
index 5f68be188de78..bafce3a463acc 100644
--- a/llvm/include/llvm/TableGen/Main.h
+++ b/llvm/include/llvm/TableGen/Main.h
@@ -15,19 +15,34 @@
 
 #include "llvm/Support/CommandLine.h"
 #include <functional>
+#include <map>
 
 namespace llvm {
 
 class raw_ostream;
 class RecordKeeper;
 
-/// Perform the action using Records, and write output to OS.
+struct TableGenOutputFiles {
+  std::string MainFile;
+
+  // Translates additional output file names to their contents.
+  std::map<StringRef, std::string> AdditionalFiles;
+};
+
 /// Returns true on error, false otherwise.
 using TableGenMainFn = bool(raw_ostream &OS, const RecordKeeper &Records);
 
+/// Perform the action using Records, and store output in OutFiles.
+/// Returns true on error, false otherwise.
+using MultiFileTableGenMainFn = bool(TableGenOutputFiles &OutFiles,
+                                     const RecordKeeper &Records);
+
 int TableGenMain(const char *argv0,
                  std::function<TableGenMainFn> MainFn = nullptr);
 
+int TableGenMain(const char *argv0,
+                 std::function<MultiFileTableGenMainFn> MainFn = nullptr);
+
 /// Controls emitting large character arrays as strings or character arrays.
 /// Typically set to false when building with MSVC.
 extern cl::opt<bool> EmitLongStrLiterals;
diff --git a/llvm/include/llvm/TableGen/TableGenBackend.h b/llvm/include/llvm/TableGen/TableGenBackend.h
index 7cb540f66ec70..d3545c1d18cd9 100644
--- a/llvm/include/llvm/TableGen/TableGenBackend.h
+++ b/llvm/include/llvm/TableGen/TableGenBackend.h
@@ -15,6 +15,7 @@
 
 #include "llvm/ADT/STLFunctionalExtras.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/TableGen/Main.h"
 #include "llvm/TableGen/Record.h"
 
 namespace llvm {
@@ -23,7 +24,27 @@ class RecordKeeper;
 class raw_ostream;
 
 namespace TableGen::Emitter {
-using FnT = function_ref<void(const RecordKeeper &Records, raw_ostream &OS)>;
+
+/// Represents the emitting function. Can produce a single or multple output
+/// files.
+struct FnT {
+  using SingleFileGeneratorType = void(const RecordKeeper &Records,
+                                       raw_ostream &OS);
+  using MultiFileGeneratorType = TableGenOutputFiles(
+      StringRef FilenamePrefix, const RecordKeeper &Records);
+
+  SingleFileGeneratorType *SingleFileGenerator = nullptr;
+  MultiFileGeneratorType *MultiFileGenerator = nullptr;
+
+  FnT() {}
+  FnT(SingleFileGeneratorType *Gen) : SingleFileGenerator(Gen) {}
+  FnT(MultiFileGeneratorType *Gen) : MultiFileGenerator(Gen) {}
+
+  bool operator==(const FnT &Other) const {
+    return SingleFileGenerator == Other.SingleFileGenerator &&
+           MultiFileGenerator == Other.MultiFileGenerator;
+  }
+};
 
 /// Creating an `Opt` object registers the command line option \p Name with
 /// TableGen backend and associates the callback \p CB with that option. If
@@ -36,17 +57,33 @@ struct Opt {
 /// Convienence wrapper around `Opt` that registers `EmitterClass::run` as the
 /// callback.
 template <class EmitterC> class OptClass : Opt {
-  static void run(const RecordKeeper &RK, raw_ostream &OS) {
+  static TableGenOutputFiles run(StringRef /*FilenamePrefix*/,
+                                 const RecordKeeper &RK) {
+    std::string S;
+    raw_string_ostream OS(S);
     EmitterC(RK).run(OS);
+    return {S, {}};
   }
 
 public:
   OptClass(StringRef Name, StringRef Desc) : Opt(Name, run, Desc) {}
 };
 
+/// A version of the wrapper for backends emitting multiple files.
+template <class EmitterC> class MultiFileOptClass : Opt {
+  static TableGenOutputFiles run(StringRef FilenamePrefix,
+                                 const RecordKeeper &RK) {
+    return EmitterC(RK).run(FilenamePrefix);
+  }
+
+public:
+  MultiFileOptClass(StringRef Name, StringRef Desc) : Opt(Name, run, Desc) {}
+};
+
 /// Apply callback for any command line option registered above. Returns false
 /// is no callback was applied.
-bool ApplyCallback(const RecordKeeper &Records, raw_ostream &OS);
+bool ApplyCallback(const RecordKeeper &Records, TableGenOutputFiles &OutFiles,
+                   StringRef FilenamePrefix);
 
 } // namespace TableGen::Emitter
 
diff --git a/llvm/lib/TableGen/Main.cpp b/llvm/lib/TableGen/Main.cpp
index b1024a8d39e00..dadbb81299dcc 100644
--- a/llvm/lib/TableGen/Main.cpp
+++ b/llvm/lib/TableGen/Main.cpp
@@ -23,6 +23,7 @@
 #include "llvm/Support/ErrorOr.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Path.h"
 #include "llvm/Support/SMLoc.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/ToolOutputFile.h"
@@ -104,8 +105,30 @@ static int createDependencyFile(const TGParser &Parser, const char *argv0) {
   return 0;
 }
 
+static int WriteOutput(const TGParser &Parser, const char *argv0,
+                       StringRef Filename, StringRef Content) {
+  if (WriteIfChanged) {
+    // Only updates the real output file if there are any differences.
+    // This prevents recompilation of all the files depending on it if there
+    // aren't any.
+    if (auto ExistingOrErr = MemoryBuffer::getFile(Filename, /*IsText=*/true))
+      if (std::move(ExistingOrErr.get())->getBuffer() == Content)
+        return 0;
+  }
+  std::error_code EC;
+  ToolOutputFile OutFile(Filename, EC, sys::fs::OF_Text);
+  if (EC)
+    return reportError(argv0, "error opening " + Filename + ": " +
+                                  EC.message() + "\n");
+  OutFile.os() << Content;
+  if (ErrorsPrinted == 0)
+    OutFile.keep();
+
+  return 0;
+}
+
 int llvm::TableGenMain(const char *argv0,
-                       std::function<TableGenMainFn> MainFn) {
+                       std::function<MultiFileTableGenMainFn> MainFn) {
   RecordKeeper Records;
   TGTimer &Timer = Records.getTimer();
 
@@ -144,13 +167,14 @@ int llvm::TableGenMain(const char *argv0,
 
   // Write output to memory.
   Timer.startBackendTimer("Backend overall");
-  std::string OutString;
-  raw_string_ostream Out(OutString);
+  SmallString<128> FilenamePrefix = StringRef(OutputFilename);
+  sys::path::replace_extension(FilenamePrefix, "");
+  TableGenOutputFiles OutFiles;
   unsigned status = 0;
   // ApplyCallback will return true if it did not apply any callback. In that
   // case, attempt to apply the MainFn.
-  if (TableGen::Emitter::ApplyCallback(Records, Out))
-    status = MainFn ? MainFn(Out, Records) : 1;
+  if (TableGen::Emitter::ApplyCallback(Records, OutFiles, FilenamePrefix))
+    status = MainFn ? MainFn(OutFiles, Records) : 1;
   Timer.stopBackendTimer();
   if (status)
     return 1;
@@ -165,25 +189,17 @@ int llvm::TableGenMain(const char *argv0,
   }
 
   Timer.startTimer("Write output");
-  bool WriteFile = true;
-  if (WriteIfChanged) {
-    // Only updates the real output file if there are any differences.
-    // This prevents recompilation of all the files depending on it if there
-    // aren't any.
-    if (auto ExistingOrErr =
-            MemoryBuffer::getFile(OutputFilename, /*IsText=*/true))
-      if (std::move(ExistingOrErr.get())->getBuffer() == OutString)
-        WriteFile = false;
-  }
-  if (WriteFile) {
-    std::error_code EC;
-    ToolOutputFile OutFile(OutputFilename, EC, sys::fs::OF_Text);
-    if (EC)
-      return reportError(argv0, "error opening " + OutputFilename + ": " +
-                                    EC.message() + "\n");
-    OutFile.os() << OutString;
-    if (ErrorsPrinted == 0)
-      OutFile.keep();
+  if (int Ret = WriteOutput(Parser, argv0, OutputFilename, OutFiles.MainFile))
+    return Ret;
+  for (auto [Suffix, Content] : OutFiles.AdditionalFiles) {
+    SmallString<128> Filename = StringRef(OutputFilename);
+    // TODO: Format using the split-flit convention when writing to stdout?
+    if (Filename != "-") {
+      Filename = FilenamePrefix;
+      Filename.append(Suffix);
+    }
+    if (int Ret = WriteOutput(Parser, argv0, Filename, Content))
+      return Ret;
   }
 
   Timer.stopTimer();
@@ -193,3 +209,15 @@ int llvm::TableGenMain(const char *argv0,
     return reportError(argv0, Twine(ErrorsPrinted) + " errors.\n");
   return 0;
 }
+
+int llvm::TableGenMain(const char *argv0,
+                       std::function<TableGenMainFn> MainFn) {
+  return TableGenMain(argv0, [&MainFn](TableGenOutputFiles &OutFiles,
+                                       const RecordKeeper &Records) {
+    std::string S;
+    raw_string_ostream OS(S);
+    auto Res = MainFn(OS, Records);
+    OutFiles = {S, {}};
+    return Res;
+  });
+}
diff --git a/llvm/lib/TableGen/TableGenBackend.cpp b/llvm/lib/TableGen/TableGenBackend.cpp
index 153ca39cfba08..116c3fcca8bd8 100644
--- a/llvm/lib/TableGen/TableGenBackend.cpp
+++ b/llvm/lib/TableGen/TableGenBackend.cpp
@@ -61,12 +61,21 @@ Opt::Opt(StringRef Name, FnT CB, StringRef Desc, bool ByDefault) {
 /// Apply callback specified on the command line. Returns true if no callback
 /// was applied.
 bool llvm::TableGen::Emitter::ApplyCallback(const RecordKeeper &Records,
-                                            raw_ostream &OS) {
+                                            TableGenOutputFiles &OutFiles,
+                                            StringRef FilenamePrefix) {
   FnT Fn = CallbackFunction->getValue();
-  if (!Fn)
-    return true;
-  Fn(Records, OS);
-  return false;
+  if (Fn.SingleFileGenerator) {
+    std::string S;
+    raw_string_ostream OS(S);
+    Fn.SingleFileGenerator(Records, OS);
+    OutFiles = {S, {}};
+    return false;
+  }
+  if (Fn.MultiFileGenerator) {
+    OutFiles = Fn.MultiFileGenerator(FilenamePrefix, Records);
+    return false;
+  }
+  return true;
 }
 
 static void printLine(raw_ostream &OS, const Twine &Prefix, char Fill,
diff --git a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
index 2c00e23d113cb..a4faf9ae7376e 100644
--- a/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
@@ -1358,7 +1358,7 @@ void SIFoldOperandsImpl::foldOperand(
       // Remove this if 16-bit SGPRs (i.e. SGPR_LO16) are added to the
       // VS_16RegClass
       //
-      // Excerpt from AMDGPUGenRegisterInfo.inc
+      // Excerpt from AMDGPUGenRegisterInfoEnums.inc
       // NoSubRegister, //0
       // hi16, // 1
       // lo16, // 2
diff --git a/llvm/test/TableGen/ArtificialSubregs.td b/llvm/test/TableGen/ArtificialSubregs.td
index b8d5686dee51b..e1e83e504137e 100644
--- a/llvm/test/TableGen/ArtificialSubregs.td
+++ b/llvm/test/TableGen/ArtificialSubregs.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK
+// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o - 2>&1 >/dev/null | FileCheck %s --check-prefix=CHECK
 include "llvm/Target/Target.td"
 
 // This file tests that when using `isArtificial` for subregisters in
diff --git a/llvm/test/TableGen/ConcatenatedSubregs.td b/llvm/test/TableGen/ConcatenatedSubregs.td
index ea4e7f01a2e2d..8b0bd2a691cc8 100644
--- a/llvm/test/TableGen/ConcatenatedSubregs.td
+++ b/llvm/test/TableGen/ConcatenatedSubregs.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o /dev/null 2>&1 | FileCheck %s
+// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o - 2>&1 >/dev/null | FileCheck %s
 // Checks that tablegen correctly and completely infers subregister relations.
 include "llvm/Target/Target.td"
 
diff --git a/llvm/test/TableGen/HwModeBitSet.td b/llvm/test/TableGen/HwModeBitSet.td
index 869b09b3c6316..70cf789e4a9c7 100644
--- a/llvm/test/TableGen/HwModeBitSet.td
+++ b/llvm/test/TableGen/HwModeBitSet.td
@@ -1,5 +1,5 @@
 // This is to test the scenario where different HwMode attributes coexist.
-// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-REG
+// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o - 2>&1 >/dev/null | FileCheck %s --check-prefix=CHECK-REG
 // RUN: llvm-tblgen -gen-subtarget -I %p/../../include %s -o - 2>&1 | FileCheck %s --check-prefix=CHECK-SUBTARGET
 
 
diff --git a/llvm/test/TableGen/HwModeSubRegs.td b/llvm/test/TableGen/HwModeSubRegs.td
index 31a569fcdaad1..c71e963fca13b 100644
--- a/llvm/test/TableGen/HwModeSubRegs.td
+++ b/llvm/test/TableGen/HwModeSubRegs.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o /dev/null 2>&1 | FileCheck %s
+// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o - 2>&1 >/dev/null | FileCheck %s
 include "llvm/Target/Target.td"
 
 def HasFeat : Predicate<"Subtarget->hasFeat()">;
diff --git a/llvm/test/TableGen/SubRegsAndAliases.td b/llvm/test/TableGen/SubRegsAndAliases.td
index ec450f8945b49..bd062a7ab6f53 100644
--- a/llvm/test/TableGen/SubRegsAndAliases.td
+++ b/llvm/test/TableGen/SubRegsAndAliases.td
@@ -1,4 +1,4 @@
-// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o /dev/null 2>&1 | FileCheck %s
+// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o - 2>&1 >/dev/null | FileCheck %s
 include "llvm/Target/Target.td"
 def TestTarget : Target;
 
diff --git a/llvm/utils/TableGen/Basic/TableGen.cpp b/llvm/utils/TableGen/Basic/TableGen.cpp
index edb7791500699..b79ae93dab4f7 100644
--- a/llvm/utils/TableGen/Basic/TableGen.cpp
+++ b/llvm/utils/TableGen/Basic/TableGen.cpp
@@ -60,7 +60,9 @@ static TableGen::Emitter::Opt X[] = {
      true},
     {"print-detailed-records", EmitDetailedRecords,
      "Print full details of all records to stdout"},
-    {"null-backend", [](const RecordKeeper &Records, raw_ostream &OS) {},
+    {"null-backend",
+     TableGen::Emitter::FnT(
+         [](const RecordKeeper &Records, raw_ostream &OS) {}),
      "Do nothing after parsing (useful for timing)"},
     {"dump-json", EmitJSON, "Dump all records as machine-readable JSON"},
     {"print-enums", printEnums, "Print enum values for a class"},
@@ -71,7 +73,8 @@ int tblgen_main(int argc, char **argv) {
   InitLLVM X(argc, argv);
   cl::ParseCommandLineOptions(argc, argv);
 
-  return TableGenMain(argv[0]);
+  std::function<MultiFileTableGenMainFn> MainFn = nullptr;
+  return TableGenMain(argv[0], MainFn);
 }
 
 #ifndef __has_feature
diff --git a/llvm/utils/TableGen/RegisterInfoEmitter.cpp b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
index 3e6e23ffec115..ee5381b228224 100644
--- a/llvm/utils/TableGen/RegisterInfoEmitter.cpp
+++ b/llvm/utils/TableGen/RegisterInfoEmitter.cpp
@@ -68,19 +68,22 @@ class RegisterInfoEmitter {
   }
 
   // runEnums - Print out enum values for all of the registers.
-  void runEnums(raw_ostream &OS);
+  void runEnums(raw_ostream &OS, raw_ostream &MainOS, StringRef FilenamePrefix);
 
   // runMCDesc - Print out MC register descriptions.
-  void runMCDesc(raw_ostream &OS);
+  void runMCDesc(raw_ostream &OS, raw_ostream &MainOS,
+                 StringRef FilenamePrefix);
 
   // runTargetHeader - Emit a header fragment for the register info emitter.
-  void runTargetHeader(raw_ostream &OS);
+  void runTargetHeader(raw_ostream &OS, raw_ostream &MainOS,
+                       StringRef FilenamePrefix);
 
   // runTargetDesc - Output the target register and register file descriptions.
-  void runTargetDesc(raw_ostream &OS);
+  void runTargetDesc(raw_ostream &OS, raw_ostream &MainOS,
+                     StringRef FilenamePrefix);
 
   // run - Output the register file description.
-  void run(raw_ostream &OS);
+  TableGenOutputFiles run(StringRef FilenamePrefix);
 
   void debugDump(raw_ostream &OS);
 
@@ -97,8 +100,19 @@ class RegisterInfoEmitter {
 
 } // end anonymous namespace
 
+static void emitInclude(StringRef FilenamePrefix, StringRef IncludeFile,
+                        StringRef GuardMacro, raw_ostream &OS) {
+  OS << "#ifdef " << GuardMacro << '\n';
+  OS << "#undef " << GuardMacro << '\n';
+  OS << "#include \"" << FilenamePrefix << IncludeFile << "\"\n";
+  OS << "#endif\n\n";
+}
+
 // runEnums - Print out enum values for all of the registers.
-void RegisterInfoEmitter::runEnums(raw_ostream &OS) {
+void RegisterInfoEmitter::runEnums(raw_ostream &OS, raw_ostream &MainOS,
+                                   StringRef FilenamePrefix) {
+  emitInclude(FilenamePrefix, "Enums.inc", "GET_REGINFO_ENUM", MainOS);
+
   const auto &Registers = RegBank.getRegisters();
 
   // Register enums are stored as uint16_t in the tables. Make sure we'll fit.
@@ -108,9 +122,6 @@ void RegisterInfoEmitter::runEnums(raw_ostream &OS) {
 
   emitSourceFileHeader("Target Register Enum Values", OS);
 
-  OS << "\n#ifdef GET_REGINFO_ENUM\n";
-  OS << "#undef GET_REGINFO_ENUM\n\n";
-
   OS << "namespace llvm {\n\n";
 
   OS << "class MCRegisterClass;\n"
@@ -194,7 +205,6 @@ void RegisterInfoEmitter::runEnums(raw_ostream &OS) {
   OS << '\n';
 
   OS << "} // end namespace llvm\n\n";
-  OS << "#endif // GET_REGINFO_ENUM\n\n";
 }
 
 static void printInt(raw_ostream &OS, int Val) { OS << Val; }
@@ -901,11 +911,11 @@ void RegisterInfoEmitter::emitComposeSubRegIndexLaneMask(raw_ostream &OS,
 //
 // runMCDesc - Print out MC register descriptions.
 //
-void RegisterInfoEmitter::runMCDesc(raw_ostream &OS) {
-  emitSourceFileHeader("MC Register Information", OS);
+void RegisterInfoEmitter::runMCDesc(raw_ostream &OS, raw_ostream &MainOS,
+                                    StringRef FilenamePrefix) {
+  emitInclude(FilenamePrefix, "MCDesc.inc", "GET_REGINFO_MC_DESC", MainOS);
 
-  OS << "\n#ifdef GET_REGINFO_MC_DESC\n";
-  OS << "#undef GET_REGINFO_MC_DESC\n\n";
+  emitSourceFileHeader("MC Register Information", OS);
 
   const auto &Regs = RegBank.getRegisters();
 
@@ -1130,14 +1140,13 @@ void RegisterInfoEmitter::runMCDesc(raw_ostream &OS) {
   OS << "}\n\n";
 
   OS << "} // end namespace llvm\n\n";
-  OS << "#endif // GET_REGINFO_MC_DESC\n\n";
 }
 
-void RegisterInfoEmitter::runTargetHeader(raw_ostream &OS) {
-  emitSourceFileHeader("Register Information Header Fragment", OS);
+void RegisterInfoEmitter::runTargetHeader(raw_ostream &OS, raw_ostream &MainOS,
+                                          StringRef FilenamePrefix) {
+  emitInclude(FilenamePrefix, "Header.inc", "GET_REGINFO_HEADER", MainOS);
 
-  OS << "\n#ifdef GET_REGINFO_HEADER\n";
-  OS << "#undef GET_REGINFO_HEADER\n\n";
+  emitSourceFileHeader("Register Information Header Fragment", OS);
 
   const std::string &TargetName = Target.getName().str();
   std::string ClassName = TargetName + "GenRegisterInfo";
@@ -1214,17 +1223,17 @@ void RegisterInfoEmitter::runTargetHeader(raw_ostream &OS) {
     OS << "} // end namespace " << RegisterClasses.front().Namespace << "\n\n";
   }
   OS << "} // end namespace llvm\n\n";
-  OS << "#endif // GET_REGINFO_HEADER\n\n";
 }
 
 //
 // runTargetDesc - Output the target register and register file descriptions.
 //
-void RegisterInfoEmitter::runTargetDesc(raw_ostream &OS) {
-  emitSourceFileHeader("Target Register and Register Classes Information", OS);
+void RegisterInfoEmitter::runTargetDesc(raw_ostream &OS, raw_ostream &MainOS,
+                                        StringRef FilenamePrefix) {
+  emitInclude(FilenamePrefix, "TargetDesc.inc", "GET_REGINFO_TARGET_DESC",
+              MainOS);
 
-  OS << "\n#ifdef GET_REGINFO_TARGET_DESC\n";
-  OS << "#undef GET_REGINFO_TARGET_DESC\n\n";
+  emitSourceFileHeader("Target Register and Register Classes Information", OS);
 
   OS << "namespace llvm {\n\n";
 
@@ -1839,25 +1848,40 @@ void RegisterInfoEmitter::runTargetDesc(raw_ostream &OS) {
      << "}\n\n";
 
   OS << "} // end namespace llvm\n\n";
-  OS << "#endif // GET_REGINFO_TARGET_DESC\n\n";
 }
 
-void RegisterInfoEmitter::run(raw_ostream &OS) {
+TableGenOutputFiles RegisterInfoEmitter::run(StringRef FilenamePrefix) {
   TGTimer &Timer = Records.getTimer();
   Timer.startTimer("Print enums");
-  runEnums(OS);
+  std::string Main;
+  raw_string_ostream MainOS(Main);
+  std::string Enums;
+  raw_string_ostream EnumsOS(Enums);
+  runEnums(EnumsOS, MainOS, FilenamePrefix);
 
   Timer.startTimer("Print MC registers");
-  runMCDesc(OS);
+  std::string MCDesc;
+  raw_string_ostream MCDescOS(MCDesc);
+  runMCDesc(MCDescOS, MainOS, FilenamePrefix);
 
   Timer.startTimer("Print header fragment");
-  runTargetHeader(OS);
+  std::string Header;
...
[truncated]

std::function<TableGenMainFn> MainFn = nullptr);

int TableGenMain(const char *argv0,
std::function<MultiFileTableGenMainFn> MainFn = nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function_ref?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #167888.


/// Perform the action using Records, and store output in OutFiles.
/// Returns true on error, false otherwise.
using MultiFileTableGenMainFn = bool(TableGenOutputFiles &OutFiles,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function_ref?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #167888.

SingleFileGeneratorType *SingleFileGenerator = nullptr;
MultiFileGeneratorType *MultiFileGenerator = nullptr;

FnT() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FnT() {}
FnT() = default;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

if (int Ret = WriteOutput(Parser, argv0, OutputFilename, OutFiles.MainFile))
return Ret;
for (auto [Suffix, Content] : OutFiles.AdditionalFiles) {
SmallString<128> Filename = StringRef(OutputFilename);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallString<128> Filename = StringRef(OutputFilename);
SmallString<128> Filename(OutputFilename);

Not sure what the intermediate StringRef is for

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return Ret;
for (auto [Suffix, Content] : OutFiles.AdditionalFiles) {
SmallString<128> Filename = StringRef(OutputFilename);
// TODO: Format using the split-flit convention when writing to stdout?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split-flit?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


static void emitInclude(StringRef FilenamePrefix, StringRef IncludeFile,
StringRef GuardMacro, raw_ostream &OS) {
OS << "#ifdef " << GuardMacro << '\n';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use IncludeGuardEmitter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does a different thing.

Copy link
Contributor

@s-barannikov s-barannikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work with dependency file generation (-d option)?

@@ -1,4 +1,4 @@
// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o /dev/null 2>&1 | FileCheck %s
// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o - 2>&1 >/dev/null | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand, -o /dev/null is no longer useful because it will try to write to files like /dev/nullEnums.inc, right? Does this patch need any documentation updates for the tool and its command line options?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But -o - doesn't try to write files like -Enums.inc?
I don't think it is a good idea to abuse -o option this way. It specifies the single output file, not a prefix.
Maybe prohibit it when generating multiple files and add another option specifying the prefix? Or require -o (or its multi-file replacement) be specified multiple times

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-o still specifies the name of the main .inc file.

Updated documentation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I misunderstood the concept. I thought it would generate several files which are supposed to be included directly and not via the "main" include that includes everything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I should have explained it. The general idea is to make the change as transparent to existing users as possible. They still include the same old .inc file but memory usage goes down anyway.

Comment on lines +1881 to +1885
{{"Enums.inc", Enums},
{"MCDesc.inc", MCDesc},
{"Header.inc", Header},
{"TargetDesc.inc", TargetDesc}}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some way to apply "don't repeat yourself" here? These strings are already mentioned earlier in this file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that, but thought that even just giving them names would look rather silly:

const StringLiteral EnumsIncludeSuffix = "Enums.inc";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we could have a single static table somewhere with entries something like:

  { "Header.inc", "GET_REGINFO_HEADER", &runTargetHeader }

and use it both to drive the code that calls each of the run* functions in turn, and to create this TableGenOutputFiles struct.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious to me whether the extra complexity and rigidness is worth it. If implemented, I'd like to assess it as a separate change.

@@ -1,4 +1,4 @@
// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o /dev/null 2>&1 | FileCheck %s
// RUN: llvm-tblgen -gen-register-info -register-info-debug -I %p/../../include %s -o - 2>&1 >/dev/null | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand, -o /dev/null is no longer useful because it will try to write to files like /dev/nullEnums.inc, right? Does this patch need any documentation updates for the tool and its command line options?

@kosarev
Copy link
Collaborator Author

kosarev commented Nov 13, 2025

Does it work with dependency file generation (-d option)?

Yes. Every main output file remains where it was, with dependecies still pointing to it.

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general idea is to make the change as transparent to existing users as possible. They still include the same old .inc file but memory usage goes down anyway.

That is a very nice property. LGTM but please wait another day for any more comments.

const RecordKeeper &Records) {
std::string S;
raw_string_ostream OS(S);
auto Res = MainFn(OS, Records);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto Res = MainFn(OS, Records);
int Res = MainFn(OS, Records);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Reduces memory usage compiling backend sources, most notably for
AMDGPU by ~98 MB per source on average.

AMDGPUGenRegisterInfo.inc is tens of megabytes in size now, and
is even larger downstream. At the same time, it is included in
nearly all backend sources, typically just for a small portion of
its content, resulting in compilation being unnecessarily
memory-hungry, which in turn stresses buildbots and wastes their
resources.

Splitting .inc files also helps avoiding extra ccache misses
where changes in .td files don't cause changes in all parts of
what previously was a single .inc file.

It is thought that rather than building on top of the current
single-output-file design of TableGen, e.g., using `split-file`,
it would be more preferable to recognise the need for multi-file
outputs and give it a proper first-class support directly in
TableGen.
@kosarev kosarev force-pushed the users/kosarev/split-registerinfo-inc branch from f1caa96 to ad20628 Compare November 14, 2025 10:21
@jayfoad
Copy link
Contributor

jayfoad commented Nov 14, 2025

I was pondering the implications of this PR and I came up with:

$ ninja llc # works
$ rm lib/Target/AMDGPU/AMDGPUGenRegisterInfo.inc && ninja llc # works
$ rm lib/Target/AMDGPU/AMDGPUGenRegisterInfoEnums.inc && ninja llc # fails!

The problem is that ninja does not know how to rebuild AMDGPUGenRegisterInfoEnums.inc.

I am not even sure if this is a real problem. Is the build system supposed to be able to cope with random files disappearing randomly?

@kosarev
Copy link
Collaborator Author

kosarev commented Nov 14, 2025

Worst case, it just won't build. We'll know if that's a practical problem.

@kosarev
Copy link
Collaborator Author

kosarev commented Nov 14, 2025

@jayfoad Jay, do you mind merging this as-is?

@jayfoad
Copy link
Contributor

jayfoad commented Nov 14, 2025

do you mind merging this as-is?

That's fine with me.

keith added a commit to keith/llvm-project that referenced this pull request Nov 15, 2025
keith added a commit to keith/llvm-project that referenced this pull request Nov 15, 2025
@nico
Copy link
Contributor

nico commented Nov 15, 2025

This generates lines like

#include "gen/llvm/lib/Target/X86/MCTargetDesc/X86GenRegisterInfoEnums.inc"

WDYT of generating

#include "X86GenRegisterInfoEnums.inc"

instead? Clients include the main inc header as #include "X86GenRegisterInfo.inc", so they need "gen/llvm/lib/Target/X86/MCTargetDesc/" on the include path already. Before this change, they didn't need the root of the build dir on the include search list.

@chapuni
Copy link
Contributor

chapuni commented Nov 15, 2025

Does this introduce an implicit byproduct? I am not a fan of it.

I suggest we could split the rule into "enums" and "bodies" with two actions rather than byproduct.

@kosarev kosarev restored the users/kosarev/split-registerinfo-inc branch November 17, 2025 11:39
kosarev added a commit that referenced this pull request Nov 17, 2025
Fixes #167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
@kosarev kosarev deleted the users/kosarev/split-registerinfo-inc branch November 17, 2025 11:44
kosarev added a commit that referenced this pull request Nov 17, 2025
Fixes #167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
kosarev added a commit that referenced this pull request Nov 17, 2025
Fixes #167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
@kosarev
Copy link
Collaborator Author

kosarev commented Nov 17, 2025

This generates lines like

#include "gen/llvm/lib/Target/X86/MCTargetDesc/X86GenRegisterInfoEnums.inc"

Interesting, what kind of build is that? I don't see full paths be passed to TableGen in my Linux builds, but I see #168158 suffers from the same issue.

Does #168355 fix this?

I suggest we could split the rule into "enums" and "bodies" with two actions rather than byproduct.

That'd be the best to do, but generating AMDGPUGenRegisterInfo.inc takes a lot of time, so is problematic in practice.

@kosarev
Copy link
Collaborator Author

kosarev commented Nov 17, 2025

Worst case, it just won't build. We'll know if that's a practical problem.

That does seem like an issue for robustness.

Will look into it. I understand there is a way to specify multiple outputs with CMake.

It also occurred to me... if we don't overwrite the main .inc file because it doesn't change, do we end up re-running TableGen all the time now?

Why 'now'? This patch doesn't change anything in this regard.

From what I see, where possible we ask TableGen to --write-if-changed and then instruct Ninja to restat.

kosarev added a commit that referenced this pull request Nov 17, 2025
This tells the build system to check and regenerate the
*GenRegisterInfo*.inc files, should any of them be missing for
whatever reason.

A follow-up from
<#167700>.
@kosarev
Copy link
Collaborator Author

kosarev commented Nov 17, 2025

$ ninja llc # works
$ rm lib/Target/AMDGPU/AMDGPUGenRegisterInfo.inc && ninja llc # works
$ rm lib/Target/AMDGPU/AMDGPUGenRegisterInfoEnums.inc && ninja llc # fails!

Addressed in #168405.

keith added a commit to keith/llvm-project that referenced this pull request Nov 17, 2025
Fixes llvm#167700 to not rely on
the full output path of the file to be part of the include path later.
keith added a commit to keith/llvm-project that referenced this pull request Nov 17, 2025
Fixes llvm#167700 to not rely on
the full output path of the file to be part of the include path later.
kosarev added a commit that referenced this pull request Nov 17, 2025
Fixes #167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
kosarev added a commit that referenced this pull request Nov 17, 2025
Fixes #167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
keith added a commit that referenced this pull request Nov 17, 2025
Required after #167700

This adds yet another format for `tbl_outs` where you pass the list of
opts, and a list of outputs (where previously you could only have 1
output). In that case all outputs must be produced, but the first is
used for the `-o` arg since tblgen is generating the other names based
on that single argument.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 17, 2025
…355)

Fixes llvm/llvm-project#167700 to support
builds where TableGen's output file is specified as full path
rather than just filename.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 17, 2025
Required after llvm/llvm-project#167700

This adds yet another format for `tbl_outs` where you pass the list of
opts, and a list of outputs (where previously you could only have 1
output). In that case all outputs must be produced, but the first is
used for the `-o` arg since tblgen is generating the other names based
on that single argument.
kosarev added a commit that referenced this pull request Nov 18, 2025
This tells the build system to check and regenerate the
*GenRegisterInfo*.inc files, should any of them be missing for
whatever reason.

A follow-up from
<#167700>.
kosarev added a commit that referenced this pull request Nov 18, 2025
This tells the build system to check and regenerate the
*GenRegisterInfo*.inc files, should any of them be missing for
whatever reason.

A follow-up from
<#167700>.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 18, 2025
…ts. (#168405)

This tells the build system to check and regenerate the
*GenRegisterInfo*.inc files, should any of them be missing for
whatever reason.

A follow-up from
<llvm/llvm-project#167700>.
@danilaml
Copy link
Collaborator

Looks like after this PR the newly generated .inc files are not getting installed with the original one so this breaks consuming LLVM as a library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.