Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option #88024

Merged
merged 24 commits into from
May 20, 2024

Conversation

minglotus-6
Copy link
Contributor

@minglotus-6 minglotus-6 commented Apr 8, 2024

The goal is to populate declaration import status if a new flag -import-declaration is on.

  • For in-process ThinLTO, the declaration status is visible to backend function-import pass, so FunctionImporter::importFunctions should read the import status and be no-op for declaration summaries. Basically, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle declaration summaries. Two use cases (better call-graph sort or cross-module auto-init) would use this bit differently.

  • For distributed ThinLTO, the declaration status is not serialized to bitcode. As discussed, [ThinLTO][Bitcode] Generate import type in bitcode #87600 will do this.

@minglotus-6 minglotus-6 marked this pull request as ready for review May 6, 2024 23:08
@llvmbot llvmbot added LTO Link time optimization (regular/full LTO or ThinLTO) llvm:ir llvm:transforms labels May 6, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 6, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-ir

Author: Mingming Liu (minglotus-6)

Changes

The goal is to generate declaration import status in the per-module combined summaries, when the function's definition is not imported.

Implementation notes:

  • ThinLink
    1. Adds -import-declaration (default false). selectCallee stores the summary corresponding to {noinline,too-large} callees as output parameter, and the result is used to populate {Import,Export}List only if import-declaration is true.
    2. When function foo of a module needs the declaration of callee while function bar in the same module needs the definition of callee, definition will take precedence over declaration.
  • Postlink
    • With this patch, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle declaration summaries. Two use cases (better call-graph sort or cross-module auto-init) would use this bit differently.

Patch is 43.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88024.diff

12 Files Affected:

  • (modified) llvm/include/llvm/IR/ModuleSummaryIndex.h (+5-1)
  • (modified) llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h (+2-1)
  • (modified) llvm/include/llvm/Transforms/IPO/FunctionImport.h (+15-6)
  • (modified) llvm/lib/LTO/LTO.cpp (+11-7)
  • (modified) llvm/lib/LTO/LTOBackend.cpp (+8-1)
  • (modified) llvm/lib/LTO/ThinLTOCodeGenerator.cpp (+10-3)
  • (modified) llvm/lib/Transforms/IPO/FunctionImport.cpp (+207-70)
  • (modified) llvm/test/ThinLTO/X86/funcimport-stats.ll (+2-2)
  • (added) llvm/test/ThinLTO/X86/import_callee_declaration.ll (+165)
  • (modified) llvm/test/Transforms/FunctionImport/funcimport.ll (+6-2)
  • (modified) llvm/tools/llvm-link/llvm-link.cpp (+5-1)
  • (modified) llvm/tools/llvm-lto/llvm-lto.cpp (+5-2)
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 66209b8cf3ea2d..29d13cf4f4fae6 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -585,7 +585,11 @@ class GlobalValueSummary {
     return Flags.ImportType == GlobalValueSummary::ImportKind::Declaration;
   }
 
-  void setImportKind(ImportKind IK) { Flags.ImportType = IK; }
+  GlobalValueSummary::ImportKind importType() const {
+    return static_cast<ImportKind>(Flags.ImportType);
+  }
+
+  void setImportType(ImportKind IK) { Flags.ImportType = IK; }
 
   GlobalValue::VisibilityTypes getVisibility() const {
     return (GlobalValue::VisibilityTypes)Flags.Visibility;
diff --git a/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h b/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h
index c450acda82ad06..6e197ae2f279c9 100644
--- a/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h
+++ b/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h
@@ -276,7 +276,8 @@ class ThinLTOCodeGenerator {
   void gatherImportedSummariesForModule(
       Module &Module, ModuleSummaryIndex &Index,
       std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex,
-      const lto::InputFile &File);
+      const lto::InputFile &File,
+      ModuleToGVSummaryPtrSet &ModuleToDecSummaries);
 
   /**
    * Perform internalization. Index is updated to reflect linkage changes.
diff --git a/llvm/include/llvm/Transforms/IPO/FunctionImport.h b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
index c4d19e8641eca2..fb38c68b56df0e 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionImport.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
@@ -31,9 +31,9 @@ class Module;
 /// based on the provided summary informations.
 class FunctionImporter {
 public:
-  /// Set of functions to import from a source module. Each entry is a set
-  /// containing all the GUIDs of all functions to import for a source module.
-  using FunctionsToImportTy = std::unordered_set<GlobalValue::GUID>;
+  /// The functions to import from a source module and their import type.
+  using FunctionsToImportTy =
+      DenseMap<GlobalValue::GUID, GlobalValueSummary::ImportKind>;
 
   /// The different reasons selectCallee will chose not to import a
   /// candidate.
@@ -99,8 +99,12 @@ class FunctionImporter {
   /// index's module path string table).
   using ImportMapTy = DenseMap<StringRef, FunctionsToImportTy>;
 
-  /// The set contains an entry for every global value the module exports.
-  using ExportSetTy = DenseSet<ValueInfo>;
+  /// The map contains an entry for every global value the module exports.
+  /// The key is ValueInfo, and the value indicates whether the definition
+  /// or declaration is visible to another module. If a function's definition is
+  /// visible to other modules, the global values this function referenced are
+  /// visible and shouldn't be internalized.
+  using ExportSetTy = DenseMap<ValueInfo, GlobalValueSummary::ImportKind>;
 
   /// A function of this type is used to load modules referenced by the index.
   using ModuleLoaderTy =
@@ -207,11 +211,16 @@ bool convertToDeclaration(GlobalValue &GV);
 /// \p ModuleToSummariesForIndex will be populated with the needed summaries
 /// from each required module path. Use a std::map instead of StringMap to get
 /// stable order for bitcode emission.
+///
+/// \p ModuleToDecSummaries will be populated with the set of declarations \p
+/// ModulePath need from other modules. They key is module path, and the value
+/// is a set of summary pointers.
 void gatherImportedSummariesForModule(
     StringRef ModulePath,
     const DenseMap<StringRef, GVSummaryMapTy> &ModuleToDefinedGVSummaries,
     const FunctionImporter::ImportMapTy &ImportList,
-    std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex);
+    std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex,
+    ModuleToGVSummaryPtrSet &ModuleToDecSummaries);
 
 /// Emit into \p OutputFilename the files module \p ModulePath will import from.
 std::error_code EmitImportsFiles(
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index 53060df7f503e0..e7e5a6ab392774 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -158,7 +158,7 @@ void llvm::computeLTOCacheKey(
 
   std::vector<uint64_t> ExportsGUID;
   ExportsGUID.reserve(ExportList.size());
-  for (const auto &VI : ExportList) {
+  for (const auto &[VI, UnusedImportType] : ExportList) {
     auto GUID = VI.getGUID();
     ExportsGUID.push_back(GUID);
   }
@@ -204,8 +204,8 @@ void llvm::computeLTOCacheKey(
     Hasher.update(ArrayRef<uint8_t>((uint8_t *)&ModHash[0], sizeof(ModHash)));
 
     AddUint64(Entry.getFunctions().size());
-    for (auto &Fn : Entry.getFunctions())
-      AddUint64(Fn);
+    for (auto &[GUID, UnusedImportType] : Entry.getFunctions())
+      AddUint64(GUID);
   }
 
   // Include the hash for the resolved ODR.
@@ -275,9 +275,9 @@ void llvm::computeLTOCacheKey(
   // Imported functions may introduce new uses of type identifier resolutions,
   // so we need to collect their used resolutions as well.
   for (const ImportModule &ImpM : ImportModulesVector)
-    for (auto &ImpF : ImpM.getFunctions()) {
+    for (auto &[GUID, UnusedImportType] : ImpM.getFunctions()) {
       GlobalValueSummary *S =
-          Index.findSummaryInModule(ImpF, ImpM.getIdentifier());
+          Index.findSummaryInModule(GUID, ImpM.getIdentifier());
       AddUsedThings(S);
       // If this is an alias, we also care about any types/etc. that the aliasee
       // may reference.
@@ -1389,15 +1389,19 @@ class lto::ThinBackendProc {
                   llvm::StringRef ModulePath,
                   const std::string &NewModulePath) {
     std::map<std::string, GVSummaryMapTy> ModuleToSummariesForIndex;
+    ModuleToGVSummaryPtrSet ModuleToDeclarationSummaries;
     std::error_code EC;
     gatherImportedSummariesForModule(ModulePath, ModuleToDefinedGVSummaries,
-                                     ImportList, ModuleToSummariesForIndex);
+                                     ImportList, ModuleToSummariesForIndex,
+                                     ModuleToDeclarationSummaries);
 
     raw_fd_ostream OS(NewModulePath + ".thinlto.bc", EC,
                       sys::fs::OpenFlags::OF_None);
     if (EC)
       return errorCodeToError(EC);
-    writeIndexToFile(CombinedIndex, OS, &ModuleToSummariesForIndex);
+
+    writeIndexToFile(CombinedIndex, OS, &ModuleToSummariesForIndex,
+                     &ModuleToDeclarationSummaries);
 
     if (ShouldEmitImportsFiles) {
       EC = EmitImportsFiles(ModulePath, NewModulePath + ".imports",
diff --git a/llvm/lib/LTO/LTOBackend.cpp b/llvm/lib/LTO/LTOBackend.cpp
index 71e8849dc3cc91..a2cd672e0f0a18 100644
--- a/llvm/lib/LTO/LTOBackend.cpp
+++ b/llvm/lib/LTO/LTOBackend.cpp
@@ -716,7 +716,14 @@ bool lto::initImportList(const Module &M,
       if (Summary->modulePath() == M.getModuleIdentifier())
         continue;
       // Add an entry to provoke importing by thinBackend.
-      ImportList[Summary->modulePath()].insert(GUID);
+      // Try emplace the entry first. If an entry with the same key already
+      // exists, set the value to 'std::min(existing-value, new-value)' to make
+      // sure a definition takes precedence over a declaration.
+      auto [Iter, Inserted] = ImportList[Summary->modulePath()].try_emplace(
+          GUID, Summary->importType());
+
+      if (!Inserted)
+        Iter->second = std::min(Iter->second, Summary->importType());
     }
   }
   return true;
diff --git a/llvm/lib/LTO/ThinLTOCodeGenerator.cpp b/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
index 8f517eb50dc76f..1227f26cc42805 100644
--- a/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
+++ b/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
@@ -766,7 +766,7 @@ void ThinLTOCodeGenerator::crossModuleImport(Module &TheModule,
 void ThinLTOCodeGenerator::gatherImportedSummariesForModule(
     Module &TheModule, ModuleSummaryIndex &Index,
     std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex,
-    const lto::InputFile &File) {
+    const lto::InputFile &File, ModuleToGVSummaryPtrSet &ModuleToDecSummaries) {
   auto ModuleCount = Index.modulePaths().size();
   auto ModuleIdentifier = TheModule.getModuleIdentifier();
 
@@ -796,7 +796,8 @@ void ThinLTOCodeGenerator::gatherImportedSummariesForModule(
 
   llvm::gatherImportedSummariesForModule(
       ModuleIdentifier, ModuleToDefinedGVSummaries,
-      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex);
+      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex,
+      ModuleToDecSummaries);
 }
 
 /**
@@ -833,9 +834,15 @@ void ThinLTOCodeGenerator::emitImports(Module &TheModule, StringRef OutputName,
                            ExportLists);
 
   std::map<std::string, GVSummaryMapTy> ModuleToSummariesForIndex;
+  // 'EmitImportsFiles' emits the list of modules from which to import from, and
+  // the set of keys in `ModuleToSummariesForIndex` should be a superset of keys
+  // in `ModuleToDecSummaries`, so no need to use `ModuleToDecSummaries` in
+  // `EmitImportFiles`.
+  ModuleToGVSummaryPtrSet ModuleToDecSummaries;
   llvm::gatherImportedSummariesForModule(
       ModuleIdentifier, ModuleToDefinedGVSummaries,
-      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex);
+      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex,
+      ModuleToDecSummaries);
 
   std::error_code EC;
   if ((EC = EmitImportsFiles(ModuleIdentifier, OutputName,
diff --git a/llvm/lib/Transforms/IPO/FunctionImport.cpp b/llvm/lib/Transforms/IPO/FunctionImport.cpp
index 68f9799616ae6d..77d644dc97ab7d 100644
--- a/llvm/lib/Transforms/IPO/FunctionImport.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionImport.cpp
@@ -140,6 +140,17 @@ static cl::opt<bool>
     ImportAllIndex("import-all-index",
                    cl::desc("Import all external functions in index."));
 
+/// This is a test-only option.
+/// If this option is enabled, the ThinLTO indexing step will import each
+/// function declaration as a fallback. In a real build this may increase ram
+/// usage of the indexing step unnecessarily.
+/// TODO: Implement selective import (based on combined summary analysis) to
+/// ensure the imported function has a use case in the postlink pipeline.
+static cl::opt<bool> ImportDeclaration(
+    "import-declaration", cl::init(false), cl::Hidden,
+    cl::desc("If true, import function declaration as fallback if the function "
+             "definition is not imported."));
+
 /// Pass a workload description file - an example of workload would be the
 /// functions executed to satisfy a RPC request. A workload is defined by a root
 /// function and the list of functions that are (frequently) needed to satisfy
@@ -245,8 +256,10 @@ static auto qualifyCalleeCandidates(
 }
 
 /// Given a list of possible callee implementation for a call site, select one
-/// that fits the \p Threshold. If none are found, the Reason will give the last
-/// reason for the failure (last, in the order of CalleeSummaryList entries).
+/// that fits the \p Threshold for function definition import. If none are
+/// found, the Reason will give the last reason for the failure (last, in the
+/// order of CalleeSummaryList entries). If caller wants to select eligible
+/// summary
 ///
 /// FIXME: select "best" instead of first that fits. But what is "best"?
 /// - The smallest: more likely to be inlined.
@@ -259,24 +272,32 @@ static const GlobalValueSummary *
 selectCallee(const ModuleSummaryIndex &Index,
              ArrayRef<std::unique_ptr<GlobalValueSummary>> CalleeSummaryList,
              unsigned Threshold, StringRef CallerModulePath,
+             const GlobalValueSummary *&TooLargeOrNoInlineSummary,
              FunctionImporter::ImportFailureReason &Reason) {
+  // Records the last summary with reason noinline or too-large.
+  TooLargeOrNoInlineSummary = nullptr;
   auto QualifiedCandidates =
       qualifyCalleeCandidates(Index, CalleeSummaryList, CallerModulePath);
   for (auto QualifiedValue : QualifiedCandidates) {
     Reason = QualifiedValue.first;
+    // Skip a summary if its import is not (proved to be) legal.
     if (Reason != FunctionImporter::ImportFailureReason::None)
       continue;
     auto *Summary =
         cast<FunctionSummary>(QualifiedValue.second->getBaseObject());
 
+    // Don't bother importing the definition if the chance of inlining it is
+    // not high enough (except under `--force-import-all`).
     if ((Summary->instCount() > Threshold) && !Summary->fflags().AlwaysInline &&
         !ForceImportAll) {
+      TooLargeOrNoInlineSummary = Summary;
       Reason = FunctionImporter::ImportFailureReason::TooLarge;
       continue;
     }
 
-    // Don't bother importing if we can't inline it anyway.
+    // Don't bother importing the definition if we can't inline it anyway.
     if (Summary->fflags().NoInline && !ForceImportAll) {
+      TooLargeOrNoInlineSummary = Summary;
       Reason = FunctionImporter::ImportFailureReason::NoInline;
       continue;
     }
@@ -358,17 +379,28 @@ class GlobalsImporter final {
         if (!GVS || !Index.canImportGlobalVar(GVS, /* AnalyzeRefs */ true) ||
             LocalNotInModule(GVS))
           continue;
-        auto ILI = ImportList[RefSummary->modulePath()].insert(VI.getGUID());
+
+        // If there isn't an entry for GUID, insert <GUID, Definition> pair.
+        // Otherwise, definition should take precedence over declaration.
+        auto [Iter, Inserted] =
+            ImportList[RefSummary->modulePath()].try_emplace(
+                VI.getGUID(), GlobalValueSummary::Definition);
         // Only update stat and exports if we haven't already imported this
         // variable.
-        if (!ILI.second)
+        if (!Inserted) {
+          // FIXME: Introduce a wrapper struct around ImportType, and provide
+          // an `updateType` method for better readability, just like how we
+          // update the hotness of a call edge.
+          Iter->second = std::min(GlobalValueSummary::Definition, Iter->second);
           break;
+        }
         NumImportedGlobalVarsThinLink++;
         // Any references made by this variable will be marked exported
         // later, in ComputeCrossModuleImport, after import decisions are
         // complete, which is more efficient than adding them here.
         if (ExportLists)
-          (*ExportLists)[RefSummary->modulePath()].insert(VI);
+          (*ExportLists)[RefSummary->modulePath()][VI] =
+              GlobalValueSummary::Definition;
 
         // If variable is not writeonly we attempt to recursively analyze
         // its references in order to import referenced constants.
@@ -545,10 +577,11 @@ class WorkloadImportsManager : public ModuleImportsManager {
       LLVM_DEBUG(dbgs() << "[Workload][Including]" << VI.name() << " from "
                         << ExportingModule << " : "
                         << Function::getGUID(VI.name()) << "\n");
-      ImportList[ExportingModule].insert(VI.getGUID());
+      ImportList[ExportingModule][VI.getGUID()] =
+          GlobalValueSummary::Definition;
       GVI.onImportingSummary(*GVS);
       if (ExportLists)
-        (*ExportLists)[ExportingModule].insert(VI);
+        (*ExportLists)[ExportingModule][VI] = GlobalValueSummary::Definition;
     }
     LLVM_DEBUG(dbgs() << "[Workload] Done\n");
   }
@@ -769,9 +802,28 @@ static void computeImportForFunction(
       }
 
       FunctionImporter::ImportFailureReason Reason{};
-      CalleeSummary = selectCallee(Index, VI.getSummaryList(), NewThreshold,
-                                   Summary.modulePath(), Reason);
+
+      // `SummaryForDeclImport` is an summary eligible for declaration import.
+      const GlobalValueSummary *SummaryForDeclImport = nullptr;
+      CalleeSummary =
+          selectCallee(Index, VI.getSummaryList(), NewThreshold,
+                       Summary.modulePath(), SummaryForDeclImport, Reason);
       if (!CalleeSummary) {
+        // There isn't a callee for definition import but one for declaration
+        // import.
+        if (ImportDeclaration && SummaryForDeclImport) {
+          StringRef DeclSourceModule = SummaryForDeclImport->modulePath();
+
+          // Since definition takes precedence over declaration for the same VI,
+          // try emplace <VI, declaration> pair without checking insert result.
+          // If insert doesn't happen, there must be an existing entry keyed by
+          // VI.
+          if (ExportLists)
+            (*ExportLists)[DeclSourceModule].try_emplace(
+                VI, GlobalValueSummary::Declaration);
+          ImportList[DeclSourceModule].try_emplace(
+              VI.getGUID(), GlobalValueSummary::Declaration);
+        }
         // Update with new larger threshold if this was a retry (otherwise
         // we would have already inserted with NewThreshold above). Also
         // update failure info if requested.
@@ -791,6 +843,7 @@ static void computeImportForFunction(
           FailureInfo = std::make_unique<FunctionImporter::ImportFailureInfo>(
               VI, Edge.second.getHotness(), Reason, 1);
         }
+
         if (ForceImportAll) {
           std::string Msg = std::string("Failed to import function ") +
                             VI.name().str() + " due to " +
@@ -816,11 +869,15 @@ static void computeImportForFunction(
              "selectCallee() didn't honor the threshold");
 
       auto ExportModulePath = ResolvedCalleeSummary->modulePath();
-      auto ILI = ImportList[ExportModulePath].insert(VI.getGUID());
+
+      // Try emplace the definition entry, and update stats based on insertion
+      // status.
+      auto [Iter, Inserted] = ImportList[ExportModulePath].try_emplace(
+          VI.getGUID(), GlobalValueSummary::Definition);
+
       // We previously decided to import this GUID definition if it was already
       // inserted in the set of imports from the exporting module.
-      bool PreviouslyImported = !ILI.second;
-      if (!PreviouslyImported) {
+      if (Inserted || Iter->second == GlobalValueSummary::Declaration) {
         NumImportedFunctionsThinLink++;
         if (IsHotCallsite)
           NumImportedHotFunctionsThinLink++;
@@ -828,11 +885,14 @@ static void computeImportForFunction(
           NumImportedCriticalFunctionsThinLink++;
       }
 
+      if (Iter->second == GlobalValueSummary::Declaration)
+        Iter->second = GlobalValueSummary::Definition;
+
       // Any calls/references made by this function will be marked exported
       // later, in ComputeCrossModuleImport, after import decisions are
       // complete, which is more efficient than adding them here.
       if (ExportLists)
-        (*ExportLists)[ExportModulePath].insert(VI);
+        (*ExportLists)[ExportModulePath][VI] = GlobalValueSummary::Definition;
     }
 
     auto GetAdjustedThreshold = [](unsigned Threshold, bool IsHotCallsite) {
@@ -939,12 +999,20 @@ static bool isGlobalVarSummary(const ModuleSummaryIndex &Index,
 }
 
 template <class T>
-static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index,
-                                      T &Cont) {
+static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index, T &Cont,
+                                      unsigned &DefinedGVS,
+                                      unsigned &DefinedFS) {
   unsigned NumGVS = 0;
-  for (auto &V : Cont)
-    if (isGlobalVarSummary(Index, V))
+  DefinedGVS = 0;
+  DefinedFS = 0;
+  for (auto &[GUID, Type] : Cont) {
+    if (isGlobalVarSummary(Index, GUID)) {
+      if (Type == GlobalValueSummary::Definition)
+        ++DefinedGVS;
       ++NumGVS;
+    } else if (Type == GlobalValueSummary::Definition)
+      ++DefinedFS;
+  }
   return NumGVS;
 }
 #endif
@@ -954,13 +1022,12 @@ static bool checkVariableImport(
     const ModuleSummaryIndex &Index,
     DenseMap<StringRef, FunctionImporter::ImportMapTy> &ImportLists,
     DenseMap<StringRef, FunctionImporter::ExportSetTy> &ExportLists) {
-
   DenseSet<GlobalValue::GUID> FlattenedImports;
 
   for (auto &ImportPerModule : ImportLists)
     for (auto &ExportPerModule : ImportPerModule.second)
-      FlattenedImports.insert(ExportPerModule.second.begin(),
-                              ExportPerModule.second.end());
+      for (auto &[GUID, Type] : ExportPerModule.second)
+        FlattenedImports.insert(GUID);
 
   // Checks that all GUIDs of read/writeonly var...
[truncated]

@llvmbot
Copy link
Collaborator

llvmbot commented May 6, 2024

@llvm/pr-subscribers-lto

Author: Mingming Liu (minglotus-6)

Changes

The goal is to generate declaration import status in the per-module combined summaries, when the function's definition is not imported.

Implementation notes:

  • ThinLink
    1. Adds -import-declaration (default false). selectCallee stores the summary corresponding to {noinline,too-large} callees as output parameter, and the result is used to populate {Import,Export}List only if import-declaration is true.
    2. When function foo of a module needs the declaration of callee while function bar in the same module needs the definition of callee, definition will take precedence over declaration.
  • Postlink
    • With this patch, the postlink pipeline is updated to keep its current behavior (import definitions), but not updated to handle declaration summaries. Two use cases (better call-graph sort or cross-module auto-init) would use this bit differently.

Patch is 43.68 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88024.diff

12 Files Affected:

  • (modified) llvm/include/llvm/IR/ModuleSummaryIndex.h (+5-1)
  • (modified) llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h (+2-1)
  • (modified) llvm/include/llvm/Transforms/IPO/FunctionImport.h (+15-6)
  • (modified) llvm/lib/LTO/LTO.cpp (+11-7)
  • (modified) llvm/lib/LTO/LTOBackend.cpp (+8-1)
  • (modified) llvm/lib/LTO/ThinLTOCodeGenerator.cpp (+10-3)
  • (modified) llvm/lib/Transforms/IPO/FunctionImport.cpp (+207-70)
  • (modified) llvm/test/ThinLTO/X86/funcimport-stats.ll (+2-2)
  • (added) llvm/test/ThinLTO/X86/import_callee_declaration.ll (+165)
  • (modified) llvm/test/Transforms/FunctionImport/funcimport.ll (+6-2)
  • (modified) llvm/tools/llvm-link/llvm-link.cpp (+5-1)
  • (modified) llvm/tools/llvm-lto/llvm-lto.cpp (+5-2)
diff --git a/llvm/include/llvm/IR/ModuleSummaryIndex.h b/llvm/include/llvm/IR/ModuleSummaryIndex.h
index 66209b8cf3ea2d..29d13cf4f4fae6 100644
--- a/llvm/include/llvm/IR/ModuleSummaryIndex.h
+++ b/llvm/include/llvm/IR/ModuleSummaryIndex.h
@@ -585,7 +585,11 @@ class GlobalValueSummary {
     return Flags.ImportType == GlobalValueSummary::ImportKind::Declaration;
   }
 
-  void setImportKind(ImportKind IK) { Flags.ImportType = IK; }
+  GlobalValueSummary::ImportKind importType() const {
+    return static_cast<ImportKind>(Flags.ImportType);
+  }
+
+  void setImportType(ImportKind IK) { Flags.ImportType = IK; }
 
   GlobalValue::VisibilityTypes getVisibility() const {
     return (GlobalValue::VisibilityTypes)Flags.Visibility;
diff --git a/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h b/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h
index c450acda82ad06..6e197ae2f279c9 100644
--- a/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h
+++ b/llvm/include/llvm/LTO/legacy/ThinLTOCodeGenerator.h
@@ -276,7 +276,8 @@ class ThinLTOCodeGenerator {
   void gatherImportedSummariesForModule(
       Module &Module, ModuleSummaryIndex &Index,
       std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex,
-      const lto::InputFile &File);
+      const lto::InputFile &File,
+      ModuleToGVSummaryPtrSet &ModuleToDecSummaries);
 
   /**
    * Perform internalization. Index is updated to reflect linkage changes.
diff --git a/llvm/include/llvm/Transforms/IPO/FunctionImport.h b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
index c4d19e8641eca2..fb38c68b56df0e 100644
--- a/llvm/include/llvm/Transforms/IPO/FunctionImport.h
+++ b/llvm/include/llvm/Transforms/IPO/FunctionImport.h
@@ -31,9 +31,9 @@ class Module;
 /// based on the provided summary informations.
 class FunctionImporter {
 public:
-  /// Set of functions to import from a source module. Each entry is a set
-  /// containing all the GUIDs of all functions to import for a source module.
-  using FunctionsToImportTy = std::unordered_set<GlobalValue::GUID>;
+  /// The functions to import from a source module and their import type.
+  using FunctionsToImportTy =
+      DenseMap<GlobalValue::GUID, GlobalValueSummary::ImportKind>;
 
   /// The different reasons selectCallee will chose not to import a
   /// candidate.
@@ -99,8 +99,12 @@ class FunctionImporter {
   /// index's module path string table).
   using ImportMapTy = DenseMap<StringRef, FunctionsToImportTy>;
 
-  /// The set contains an entry for every global value the module exports.
-  using ExportSetTy = DenseSet<ValueInfo>;
+  /// The map contains an entry for every global value the module exports.
+  /// The key is ValueInfo, and the value indicates whether the definition
+  /// or declaration is visible to another module. If a function's definition is
+  /// visible to other modules, the global values this function referenced are
+  /// visible and shouldn't be internalized.
+  using ExportSetTy = DenseMap<ValueInfo, GlobalValueSummary::ImportKind>;
 
   /// A function of this type is used to load modules referenced by the index.
   using ModuleLoaderTy =
@@ -207,11 +211,16 @@ bool convertToDeclaration(GlobalValue &GV);
 /// \p ModuleToSummariesForIndex will be populated with the needed summaries
 /// from each required module path. Use a std::map instead of StringMap to get
 /// stable order for bitcode emission.
+///
+/// \p ModuleToDecSummaries will be populated with the set of declarations \p
+/// ModulePath need from other modules. They key is module path, and the value
+/// is a set of summary pointers.
 void gatherImportedSummariesForModule(
     StringRef ModulePath,
     const DenseMap<StringRef, GVSummaryMapTy> &ModuleToDefinedGVSummaries,
     const FunctionImporter::ImportMapTy &ImportList,
-    std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex);
+    std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex,
+    ModuleToGVSummaryPtrSet &ModuleToDecSummaries);
 
 /// Emit into \p OutputFilename the files module \p ModulePath will import from.
 std::error_code EmitImportsFiles(
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index 53060df7f503e0..e7e5a6ab392774 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -158,7 +158,7 @@ void llvm::computeLTOCacheKey(
 
   std::vector<uint64_t> ExportsGUID;
   ExportsGUID.reserve(ExportList.size());
-  for (const auto &VI : ExportList) {
+  for (const auto &[VI, UnusedImportType] : ExportList) {
     auto GUID = VI.getGUID();
     ExportsGUID.push_back(GUID);
   }
@@ -204,8 +204,8 @@ void llvm::computeLTOCacheKey(
     Hasher.update(ArrayRef<uint8_t>((uint8_t *)&ModHash[0], sizeof(ModHash)));
 
     AddUint64(Entry.getFunctions().size());
-    for (auto &Fn : Entry.getFunctions())
-      AddUint64(Fn);
+    for (auto &[GUID, UnusedImportType] : Entry.getFunctions())
+      AddUint64(GUID);
   }
 
   // Include the hash for the resolved ODR.
@@ -275,9 +275,9 @@ void llvm::computeLTOCacheKey(
   // Imported functions may introduce new uses of type identifier resolutions,
   // so we need to collect their used resolutions as well.
   for (const ImportModule &ImpM : ImportModulesVector)
-    for (auto &ImpF : ImpM.getFunctions()) {
+    for (auto &[GUID, UnusedImportType] : ImpM.getFunctions()) {
       GlobalValueSummary *S =
-          Index.findSummaryInModule(ImpF, ImpM.getIdentifier());
+          Index.findSummaryInModule(GUID, ImpM.getIdentifier());
       AddUsedThings(S);
       // If this is an alias, we also care about any types/etc. that the aliasee
       // may reference.
@@ -1389,15 +1389,19 @@ class lto::ThinBackendProc {
                   llvm::StringRef ModulePath,
                   const std::string &NewModulePath) {
     std::map<std::string, GVSummaryMapTy> ModuleToSummariesForIndex;
+    ModuleToGVSummaryPtrSet ModuleToDeclarationSummaries;
     std::error_code EC;
     gatherImportedSummariesForModule(ModulePath, ModuleToDefinedGVSummaries,
-                                     ImportList, ModuleToSummariesForIndex);
+                                     ImportList, ModuleToSummariesForIndex,
+                                     ModuleToDeclarationSummaries);
 
     raw_fd_ostream OS(NewModulePath + ".thinlto.bc", EC,
                       sys::fs::OpenFlags::OF_None);
     if (EC)
       return errorCodeToError(EC);
-    writeIndexToFile(CombinedIndex, OS, &ModuleToSummariesForIndex);
+
+    writeIndexToFile(CombinedIndex, OS, &ModuleToSummariesForIndex,
+                     &ModuleToDeclarationSummaries);
 
     if (ShouldEmitImportsFiles) {
       EC = EmitImportsFiles(ModulePath, NewModulePath + ".imports",
diff --git a/llvm/lib/LTO/LTOBackend.cpp b/llvm/lib/LTO/LTOBackend.cpp
index 71e8849dc3cc91..a2cd672e0f0a18 100644
--- a/llvm/lib/LTO/LTOBackend.cpp
+++ b/llvm/lib/LTO/LTOBackend.cpp
@@ -716,7 +716,14 @@ bool lto::initImportList(const Module &M,
       if (Summary->modulePath() == M.getModuleIdentifier())
         continue;
       // Add an entry to provoke importing by thinBackend.
-      ImportList[Summary->modulePath()].insert(GUID);
+      // Try emplace the entry first. If an entry with the same key already
+      // exists, set the value to 'std::min(existing-value, new-value)' to make
+      // sure a definition takes precedence over a declaration.
+      auto [Iter, Inserted] = ImportList[Summary->modulePath()].try_emplace(
+          GUID, Summary->importType());
+
+      if (!Inserted)
+        Iter->second = std::min(Iter->second, Summary->importType());
     }
   }
   return true;
diff --git a/llvm/lib/LTO/ThinLTOCodeGenerator.cpp b/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
index 8f517eb50dc76f..1227f26cc42805 100644
--- a/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
+++ b/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
@@ -766,7 +766,7 @@ void ThinLTOCodeGenerator::crossModuleImport(Module &TheModule,
 void ThinLTOCodeGenerator::gatherImportedSummariesForModule(
     Module &TheModule, ModuleSummaryIndex &Index,
     std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex,
-    const lto::InputFile &File) {
+    const lto::InputFile &File, ModuleToGVSummaryPtrSet &ModuleToDecSummaries) {
   auto ModuleCount = Index.modulePaths().size();
   auto ModuleIdentifier = TheModule.getModuleIdentifier();
 
@@ -796,7 +796,8 @@ void ThinLTOCodeGenerator::gatherImportedSummariesForModule(
 
   llvm::gatherImportedSummariesForModule(
       ModuleIdentifier, ModuleToDefinedGVSummaries,
-      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex);
+      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex,
+      ModuleToDecSummaries);
 }
 
 /**
@@ -833,9 +834,15 @@ void ThinLTOCodeGenerator::emitImports(Module &TheModule, StringRef OutputName,
                            ExportLists);
 
   std::map<std::string, GVSummaryMapTy> ModuleToSummariesForIndex;
+  // 'EmitImportsFiles' emits the list of modules from which to import from, and
+  // the set of keys in `ModuleToSummariesForIndex` should be a superset of keys
+  // in `ModuleToDecSummaries`, so no need to use `ModuleToDecSummaries` in
+  // `EmitImportFiles`.
+  ModuleToGVSummaryPtrSet ModuleToDecSummaries;
   llvm::gatherImportedSummariesForModule(
       ModuleIdentifier, ModuleToDefinedGVSummaries,
-      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex);
+      ImportLists[ModuleIdentifier], ModuleToSummariesForIndex,
+      ModuleToDecSummaries);
 
   std::error_code EC;
   if ((EC = EmitImportsFiles(ModuleIdentifier, OutputName,
diff --git a/llvm/lib/Transforms/IPO/FunctionImport.cpp b/llvm/lib/Transforms/IPO/FunctionImport.cpp
index 68f9799616ae6d..77d644dc97ab7d 100644
--- a/llvm/lib/Transforms/IPO/FunctionImport.cpp
+++ b/llvm/lib/Transforms/IPO/FunctionImport.cpp
@@ -140,6 +140,17 @@ static cl::opt<bool>
     ImportAllIndex("import-all-index",
                    cl::desc("Import all external functions in index."));
 
+/// This is a test-only option.
+/// If this option is enabled, the ThinLTO indexing step will import each
+/// function declaration as a fallback. In a real build this may increase ram
+/// usage of the indexing step unnecessarily.
+/// TODO: Implement selective import (based on combined summary analysis) to
+/// ensure the imported function has a use case in the postlink pipeline.
+static cl::opt<bool> ImportDeclaration(
+    "import-declaration", cl::init(false), cl::Hidden,
+    cl::desc("If true, import function declaration as fallback if the function "
+             "definition is not imported."));
+
 /// Pass a workload description file - an example of workload would be the
 /// functions executed to satisfy a RPC request. A workload is defined by a root
 /// function and the list of functions that are (frequently) needed to satisfy
@@ -245,8 +256,10 @@ static auto qualifyCalleeCandidates(
 }
 
 /// Given a list of possible callee implementation for a call site, select one
-/// that fits the \p Threshold. If none are found, the Reason will give the last
-/// reason for the failure (last, in the order of CalleeSummaryList entries).
+/// that fits the \p Threshold for function definition import. If none are
+/// found, the Reason will give the last reason for the failure (last, in the
+/// order of CalleeSummaryList entries). If caller wants to select eligible
+/// summary
 ///
 /// FIXME: select "best" instead of first that fits. But what is "best"?
 /// - The smallest: more likely to be inlined.
@@ -259,24 +272,32 @@ static const GlobalValueSummary *
 selectCallee(const ModuleSummaryIndex &Index,
              ArrayRef<std::unique_ptr<GlobalValueSummary>> CalleeSummaryList,
              unsigned Threshold, StringRef CallerModulePath,
+             const GlobalValueSummary *&TooLargeOrNoInlineSummary,
              FunctionImporter::ImportFailureReason &Reason) {
+  // Records the last summary with reason noinline or too-large.
+  TooLargeOrNoInlineSummary = nullptr;
   auto QualifiedCandidates =
       qualifyCalleeCandidates(Index, CalleeSummaryList, CallerModulePath);
   for (auto QualifiedValue : QualifiedCandidates) {
     Reason = QualifiedValue.first;
+    // Skip a summary if its import is not (proved to be) legal.
     if (Reason != FunctionImporter::ImportFailureReason::None)
       continue;
     auto *Summary =
         cast<FunctionSummary>(QualifiedValue.second->getBaseObject());
 
+    // Don't bother importing the definition if the chance of inlining it is
+    // not high enough (except under `--force-import-all`).
     if ((Summary->instCount() > Threshold) && !Summary->fflags().AlwaysInline &&
         !ForceImportAll) {
+      TooLargeOrNoInlineSummary = Summary;
       Reason = FunctionImporter::ImportFailureReason::TooLarge;
       continue;
     }
 
-    // Don't bother importing if we can't inline it anyway.
+    // Don't bother importing the definition if we can't inline it anyway.
     if (Summary->fflags().NoInline && !ForceImportAll) {
+      TooLargeOrNoInlineSummary = Summary;
       Reason = FunctionImporter::ImportFailureReason::NoInline;
       continue;
     }
@@ -358,17 +379,28 @@ class GlobalsImporter final {
         if (!GVS || !Index.canImportGlobalVar(GVS, /* AnalyzeRefs */ true) ||
             LocalNotInModule(GVS))
           continue;
-        auto ILI = ImportList[RefSummary->modulePath()].insert(VI.getGUID());
+
+        // If there isn't an entry for GUID, insert <GUID, Definition> pair.
+        // Otherwise, definition should take precedence over declaration.
+        auto [Iter, Inserted] =
+            ImportList[RefSummary->modulePath()].try_emplace(
+                VI.getGUID(), GlobalValueSummary::Definition);
         // Only update stat and exports if we haven't already imported this
         // variable.
-        if (!ILI.second)
+        if (!Inserted) {
+          // FIXME: Introduce a wrapper struct around ImportType, and provide
+          // an `updateType` method for better readability, just like how we
+          // update the hotness of a call edge.
+          Iter->second = std::min(GlobalValueSummary::Definition, Iter->second);
           break;
+        }
         NumImportedGlobalVarsThinLink++;
         // Any references made by this variable will be marked exported
         // later, in ComputeCrossModuleImport, after import decisions are
         // complete, which is more efficient than adding them here.
         if (ExportLists)
-          (*ExportLists)[RefSummary->modulePath()].insert(VI);
+          (*ExportLists)[RefSummary->modulePath()][VI] =
+              GlobalValueSummary::Definition;
 
         // If variable is not writeonly we attempt to recursively analyze
         // its references in order to import referenced constants.
@@ -545,10 +577,11 @@ class WorkloadImportsManager : public ModuleImportsManager {
       LLVM_DEBUG(dbgs() << "[Workload][Including]" << VI.name() << " from "
                         << ExportingModule << " : "
                         << Function::getGUID(VI.name()) << "\n");
-      ImportList[ExportingModule].insert(VI.getGUID());
+      ImportList[ExportingModule][VI.getGUID()] =
+          GlobalValueSummary::Definition;
       GVI.onImportingSummary(*GVS);
       if (ExportLists)
-        (*ExportLists)[ExportingModule].insert(VI);
+        (*ExportLists)[ExportingModule][VI] = GlobalValueSummary::Definition;
     }
     LLVM_DEBUG(dbgs() << "[Workload] Done\n");
   }
@@ -769,9 +802,28 @@ static void computeImportForFunction(
       }
 
       FunctionImporter::ImportFailureReason Reason{};
-      CalleeSummary = selectCallee(Index, VI.getSummaryList(), NewThreshold,
-                                   Summary.modulePath(), Reason);
+
+      // `SummaryForDeclImport` is an summary eligible for declaration import.
+      const GlobalValueSummary *SummaryForDeclImport = nullptr;
+      CalleeSummary =
+          selectCallee(Index, VI.getSummaryList(), NewThreshold,
+                       Summary.modulePath(), SummaryForDeclImport, Reason);
       if (!CalleeSummary) {
+        // There isn't a callee for definition import but one for declaration
+        // import.
+        if (ImportDeclaration && SummaryForDeclImport) {
+          StringRef DeclSourceModule = SummaryForDeclImport->modulePath();
+
+          // Since definition takes precedence over declaration for the same VI,
+          // try emplace <VI, declaration> pair without checking insert result.
+          // If insert doesn't happen, there must be an existing entry keyed by
+          // VI.
+          if (ExportLists)
+            (*ExportLists)[DeclSourceModule].try_emplace(
+                VI, GlobalValueSummary::Declaration);
+          ImportList[DeclSourceModule].try_emplace(
+              VI.getGUID(), GlobalValueSummary::Declaration);
+        }
         // Update with new larger threshold if this was a retry (otherwise
         // we would have already inserted with NewThreshold above). Also
         // update failure info if requested.
@@ -791,6 +843,7 @@ static void computeImportForFunction(
           FailureInfo = std::make_unique<FunctionImporter::ImportFailureInfo>(
               VI, Edge.second.getHotness(), Reason, 1);
         }
+
         if (ForceImportAll) {
           std::string Msg = std::string("Failed to import function ") +
                             VI.name().str() + " due to " +
@@ -816,11 +869,15 @@ static void computeImportForFunction(
              "selectCallee() didn't honor the threshold");
 
       auto ExportModulePath = ResolvedCalleeSummary->modulePath();
-      auto ILI = ImportList[ExportModulePath].insert(VI.getGUID());
+
+      // Try emplace the definition entry, and update stats based on insertion
+      // status.
+      auto [Iter, Inserted] = ImportList[ExportModulePath].try_emplace(
+          VI.getGUID(), GlobalValueSummary::Definition);
+
       // We previously decided to import this GUID definition if it was already
       // inserted in the set of imports from the exporting module.
-      bool PreviouslyImported = !ILI.second;
-      if (!PreviouslyImported) {
+      if (Inserted || Iter->second == GlobalValueSummary::Declaration) {
         NumImportedFunctionsThinLink++;
         if (IsHotCallsite)
           NumImportedHotFunctionsThinLink++;
@@ -828,11 +885,14 @@ static void computeImportForFunction(
           NumImportedCriticalFunctionsThinLink++;
       }
 
+      if (Iter->second == GlobalValueSummary::Declaration)
+        Iter->second = GlobalValueSummary::Definition;
+
       // Any calls/references made by this function will be marked exported
       // later, in ComputeCrossModuleImport, after import decisions are
       // complete, which is more efficient than adding them here.
       if (ExportLists)
-        (*ExportLists)[ExportModulePath].insert(VI);
+        (*ExportLists)[ExportModulePath][VI] = GlobalValueSummary::Definition;
     }
 
     auto GetAdjustedThreshold = [](unsigned Threshold, bool IsHotCallsite) {
@@ -939,12 +999,20 @@ static bool isGlobalVarSummary(const ModuleSummaryIndex &Index,
 }
 
 template <class T>
-static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index,
-                                      T &Cont) {
+static unsigned numGlobalVarSummaries(const ModuleSummaryIndex &Index, T &Cont,
+                                      unsigned &DefinedGVS,
+                                      unsigned &DefinedFS) {
   unsigned NumGVS = 0;
-  for (auto &V : Cont)
-    if (isGlobalVarSummary(Index, V))
+  DefinedGVS = 0;
+  DefinedFS = 0;
+  for (auto &[GUID, Type] : Cont) {
+    if (isGlobalVarSummary(Index, GUID)) {
+      if (Type == GlobalValueSummary::Definition)
+        ++DefinedGVS;
       ++NumGVS;
+    } else if (Type == GlobalValueSummary::Definition)
+      ++DefinedFS;
+  }
   return NumGVS;
 }
 #endif
@@ -954,13 +1022,12 @@ static bool checkVariableImport(
     const ModuleSummaryIndex &Index,
     DenseMap<StringRef, FunctionImporter::ImportMapTy> &ImportLists,
     DenseMap<StringRef, FunctionImporter::ExportSetTy> &ExportLists) {
-
   DenseSet<GlobalValue::GUID> FlattenedImports;
 
   for (auto &ImportPerModule : ImportLists)
     for (auto &ExportPerModule : ImportPerModule.second)
-      FlattenedImports.insert(ExportPerModule.second.begin(),
-                              ExportPerModule.second.end());
+      for (auto &[GUID, Type] : ExportPerModule.second)
+        FlattenedImports.insert(GUID);
 
   // Checks that all GUIDs of read/writeonly var...
[truncated]

@minglotus-6
Copy link
Contributor Author

Tagging @jvoung since I cannot add you into reviewer.

@minglotus-6 minglotus-6 changed the title [nfc][ThinLTO] Generate import status in per-module combined summary [ThinLTO] Generate import status in per-module combined summary May 6, 2024
@minglotus-6
Copy link
Contributor Author

#87600 is a functional change and the diffbase of this patch, and llvm/test/ThinLTO/X86/import_callee_declaration.ll should be a test case for both patches.

In the diffbase, bitcode writer takes maps as additional parameters to populate import status, and it's not straightforward to construct regression tests there without this patch. I wonder if I shall introduce cl::list<std::string> in llvm-lto/llvm-lto2 (as a repeated arg) to specify filename:GUID to test the diffbase alone.

@minglotus-6 minglotus-6 requested a review from aeubanks May 6, 2024 23:26
Copy link
Contributor Author

@minglotus-6 minglotus-6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptal, thanks!

@@ -276,7 +276,7 @@ class ThinLTOCodeGenerator {
void gatherImportedSummariesForModule(
Module &Module, ModuleSummaryIndex &Index,
std::map<std::string, GVSummaryMapTy> &ModuleToSummariesForIndex,
const lto::InputFile &File);
const lto::InputFile &File, GVSummaryPtrSet &DecSummaries);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. I'll do this in #87600 mostly because this patch doesn't use DecSummaries.

@@ -207,11 +211,16 @@ bool convertToDeclaration(GlobalValue &GV);
/// \p ModuleToSummariesForIndex will be populated with the needed summaries
/// from each required module path. Use a std::map instead of StringMap to get
/// stable order for bitcode emission.
///
/// \p ModuleToDecSummaries will be populated with the set of declarations \p
/// ModulePath need from other modules. They key is module path, and the value
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the catch. I'll update gatherImportedSummariesForModule in the next change (#87600) mostly because DecSummaries is used by bitcode writer.

@@ -833,9 +833,14 @@ void ThinLTOCodeGenerator::emitImports(Module &TheModule, StringRef OutputName,
ExportLists);

std::map<std::string, GVSummaryMapTy> ModuleToSummariesForIndex;
// 'EmitImportsFiles' emits the list of modules from which to import from, and
// the set of keys in `ModuleToSummariesForIndex` should be a superset of keys
// in `ModuleToDecSummaries`, so no need to use `ModuleToDecSummaries` in
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted in the other two comments I'll do this in the next patch.

llvm/lib/Transforms/IPO/FunctionImport.cpp Outdated Show resolved Hide resolved
if (Import) {
auto ImportType = maybeGetImportType(ImportGUIDs, GUID);

if (!ImportType) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 'declarations' into debug message and kept the similar flow as before (no new continue path).

Added IMPORTDUMP: lines to regression test.

<< SrcModule->getSourceFileName() << "\n");
if (Import) {
auto ImportType = maybeGetImportType(ImportGUIDs, GUID);
if (!ImportType)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -158,7 +158,7 @@ void llvm::computeLTOCacheKey(

std::vector<uint64_t> ExportsGUID;
ExportsGUID.reserve(ExportList.size());
for (const auto &VI : ExportList) {
for (const auto &[VI, UnusedImportType] : ExportList) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done for ExportList and ImportList.

@minglotus-6
Copy link
Contributor Author

#87600 is a functional change and the diffbase of this patch, and llvm/test/ThinLTO/X86/import_callee_declaration.ll should be a test case for both patches.
In the diffbase, bitcode writer takes maps as additional parameters to populate import status, and it's not straightforward to construct regression tests there without this patch. I wonder if I shall introduce cl::list<std::string> in llvm-lto/llvm-lto2 (as a repeated arg) to specify filename:GUID to test the diffbase alone.

Rather than add an option just for testing that one alone, I have a suggestion for splitting up the PRs slightly differently. What if you submitted this one first, minus the modified calls to writeIndexToFile and the part of the test that checks the disassembled index (just have the testing for this one check the number of declarations imported and other debug messages). Then move the modified calls to writeIndexToFile and the index disassembly checking to PR87600 that can be committed as a follow on? That way each change comes with a test.

This way of splitting makes a lot of sense. For in-process ThinLTO, function-import pass in the backend can see thedeclaration status (i.e., without updating bitcode writer), so I updated import_callee_declaration.ll test to test the debugging logs are expected.

@minglotus-6 minglotus-6 changed the title [ThinLTO] Populate declaration import status except for distributed ThinLTO [ThinLTO] Populate declaration import status except for distributed ThinLTO under a default-off new option May 11, 2024
@@ -121,6 +121,9 @@ void llvm::computeLTOCacheKey(
support::endian::write64le(Data, I);
Hasher.update(Data);
};
auto AddUint8 = [&](const uint8_t &I) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could pass by value, since uint8_t is small (similar to AddUnsigned)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

// module. We do this after the above insertion since we may hit the same
// ref/call target multiple times in above loop, and it is more efficient to
// avoid a set lookup each time.
// Prune list computed above to only include values defined in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can keep the original single space in the comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

@teresajohnson teresajohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with a couple minor comments below


; The following information are printed from `FunctionImporter::importFunctions`
; in the postlink pipeline.
; TODO: Update debug information for postlink pipeline.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about printing counters for both definitions and declarations around these lines https://github.com/minglotus-6/llvm-project/blob/f72611b40de8c060c3901dab0ee9f8424dd3012b/llvm/lib/Transforms/IPO/FunctionImport.cpp#L1757-L1762

I removed the comments in this test case and added TODO in the source code to make it clearer.

; IMPORT-DAG: define available_externally void @small_func
; IMPORT-DAG: define available_externally hidden void @small_indirect_callee
; IMPORT-DAG: declare void @large_func
; IMPORT-NOT: large_indirect_callee
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a TODO or other comment that this is temporary until the corresponding change to do the actual declaration import goes in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. I somehow didn't find the existingTODO when we discussed the follow-up plans (import declaration implementation and test in my llvm fork), and yesterday I thought I deleted them mistakenly when updating PRs.

Turns out there are two TODOs on line 25 and line 57, so gathered them at line 83 - line 85.

@minglotus-6
Copy link
Contributor Author

I had tested the proof-of-concept (along with #87600 and declaration import) on internal large workloads. I'll do a sanity check with the reviewed one and land this after that.

minglotus-6 added a commit that referenced this pull request May 19, 2024
Add 'sort' here since it's helpful when container type
changes (for example, #88024
wants to change container type from `unordered_set` to `DenseMap)

@MaskRay points out `std::` doesn't randomize the iteration order of
`unordered_{set,map}`, and the iteration order for single build is
deterministic.
Copy link

github-actions bot commented May 19, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@minglotus-6 minglotus-6 merged commit 8de7890 into main May 20, 2024
3 of 4 checks passed
@minglotus-6 minglotus-6 deleted the users/minglotus-6/spr/summary2 branch May 20, 2024 05:22
@minglotus-6 minglotus-6 restored the users/minglotus-6/spr/summary2 branch May 20, 2024 05:23
minglotus-6 added a commit that referenced this pull request May 20, 2024
…ibuted ThinLTO under a default-off new option" (#92718)

The original PR is reviewed in
#88024, and this PR adds one
line (b9f04d1)
to fix test

Limit to one thread for in-process ThinLTO to test `LLVM_DEBUG` log.
- This should fix build bot failure like
https://lab.llvm.org/buildbot/#/builders/259/builds/4727 and
https://lab.llvm.org/buildbot/#/builders/9/builds/43876
- I could repro the failure and see interleaved log messages by using
`-thinlto-threads=all`

**Original Commit Message:**

The goal is to populate `declaration` import status if a new flag
`-import-declaration` is on.

* For in-process ThinLTO, the `declaration` status is visible to backend
`function-import` pass, so `FunctionImporter::importFunctions` should
read the import status and be no-op for declaration summaries.
Basically, the postlink pipeline is updated to keep its current behavior
(import definitions), but not updated to handle `declaration` summaries.
Two use cases ([better call-graph
sort](https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5)
or [cross-module
auto-init](#87597 (comment)))
would use this bit differently.

* For distributed ThinLTO, the `declaration` status is not serialized to
bitcode. As discussed, #87600
will do this.
@minglotus-6
Copy link
Contributor Author

I reverted this patch and its follow-up in 707f4de and 53061ee. Despite testing on two server applications, this change caused OOM on some other applications.

I'll need to profile offline to bring the memory usage down.

@jvoung
Copy link
Contributor

jvoung commented Jun 6, 2024

FWIW, if I remember correctly from trying some versions of this, I think DenseMap for FunctionsToImportTy might have used much more memory than unordered_map (and the original unordered_set).

@minglotus-6
Copy link
Contributor Author

FWIW, if I remember correctly from trying some versions of this, I think DenseMap for FunctionsToImportTy might have used much more memory than unordered_map (and the original unordered_set).

Thanks for this info! To optimize the memory usage of this patch, I'm thinking about to storing per-module imported declaration set as another DenseMap<StringRef, unordered_set<uint64_t>>. With selective declaration import, this map should be small.

Note this will duplicate module paths (StringRef as key) in the map. It might be possible to have a separate map (DenseMap<StringRef, int>) to map module path into integers for ImportLists and ExportLists, and store integers in other map keys.

@teresajohnson
Copy link
Contributor

FWIW, if I remember correctly from trying some versions of this, I think DenseMap for FunctionsToImportTy might have used much more memory than unordered_map (and the original unordered_set).

Thanks for this info! To optimize the memory usage of this patch, I'm thinking about to storing per-module imported declaration set as another DenseMap<StringRef, unordered_set<uint64_t>>. With selective declaration import, this map should be small.

Note this will duplicate module paths (StringRef as key) in the map.

This would only duplicate the StringRef, not the whole string though I think? The module path strings should only be stored in the ModulePathStringTable.

It might be possible to have a separate map (DenseMap<StringRef, int>) to map module path into integers for ImportLists and ExportLists, and store integers in other map keys.

@minglotus-6
Copy link
Contributor Author

This would only duplicate the StringRef, not the whole string though I think?

Yes. I understand that StringRef is not std::string. StringRef consists of a pointer and a size_t. If we could use uint32_t to represent module ID (and lazily create entries in <ModulePath, ModuleID-as-uint32_t> when ImportLists or ExportLists needs a module), it might give a moderate saving (say if a third map keyed by StringRef is added besides ImportLists or ExportLists)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:ir llvm:transforms LTO Link time optimization (regular/full LTO or ThinLTO)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants