
[Serialization] Load Specializations Lazily #76774

Closed
wants to merge 4 commits

Conversation

@ChuanqiXu9 (Member) commented Jan 3, 2024

The idea comes from @vgvassilev, who had a patch for it on Phabricator. Unfortunately, Phabricator is closed now and I forgot the Dxxx number of that patch. But I remember the last comment from @vgvassilev was that we should use MultiOnDiskHashTable for it. So I followed that suggestion and rewrote the whole thing from scratch in the new year.

Background

Currently, all the specializations of a template (including instantiations, explicit specializations, and partial specializations) are loaded at once when we want to instantiate another instance of the template, find an existing instantiation of the template, or simply complete its redecl chain.

This basically means we need to load every specialization of the template once the template declaration is loaded. This is bad, because loading a specialization requires loading all of its template arguments, so we end up deserializing a lot of unnecessary declarations.

For example,

// M.cppm
export module M;
export template <class T>
class A {};

export class ShouldNotBeLoaded {};

export class Temp {
   A<ShouldNotBeLoaded> AS;
};

// use.cpp
import M;
A<int> a;

There is a specialization A<ShouldNotBeLoaded> in M.cppm, and we instantiate the template A in use.cpp. Then, surprisingly, ShouldNotBeLoaded gets deserialized when compiling use.cpp. This patch tries to avoid that.

Given how heavily templates are used in C++, this is a performance pain point.

What this patch did

This patch adds a MultiOnDiskHashTable for specializations in the ASTReader, so we only deserialize the specializations whose template arguments match. We achieve this by using the ODRHash of the template arguments as the key of the hash table.
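
Roughly, the key computation looks like the following sketch. This is only an illustration of the idea, not the exact code in the patch, and ComputeSpecializationKey is a made-up helper name:

#include "clang/AST/ODRHash.h"
#include "clang/AST/TemplateBase.h"
#include "llvm/ADT/ArrayRef.h"

// Fold the template arguments of a specialization into a single unsigned
// value; the on-disk hash table is keyed by values computed like this.
static unsigned ComputeSpecializationKey(
    llvm::ArrayRef<clang::TemplateArgument> Args) {
  clang::ODRHash Hash;
  for (const clang::TemplateArgument &Arg : Args)
    Hash.AddTemplateArgument(Arg);
  return Hash.CalculateHash();
}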

The partial specializations are not added to the MultiOnDiskHashTable, since we can't know whether a partial specialization is needed before resolving the template declaration for an instantiation request. There may be room for further optimization, but let's leave that for the future.

To review this patch, I think ASTReaderDecl::AddLazySpecializations may be a good entry point.

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label Jan 3, 2024
@ChuanqiXu9 ChuanqiXu9 self-assigned this Jan 3, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 3, 2024
@llvmbot (Collaborator) commented Jan 3, 2024

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Changes

What this patch did not do

This patch doesn't solve the problem completely, since we still add specialization update records when new specializations appear in a different module:

void RegisterTemplateSpecialization(const Decl *Template,
                                    const Decl *Specialization) {
  Template = Template->getCanonicalDecl();
  // If the canonical template is local, we'll write out this specialization
  // when we emit it.
  // FIXME: We can do the same thing if there is any local declaration of
  // the template, to avoid emitting an update record.
  if (!Template->isFromASTFile())
    return;
  // We only need to associate the first local declaration of the
  // specialization. The other declarations will get pulled in by it.
  if (Writer.getFirstLocalDecl(Specialization) != Specialization)
    return;
  Writer.DeclUpdates[Template].push_back(ASTWriter::DeclUpdate(
      UPD_CXX_ADDED_TEMPLATE_SPECIALIZATION, Specialization));
}

As a result, we can't handle this case yet:

// M.cppm
export module M;
export template <class T>
class A {};

// N.cppm
export module N;
export import M;
export class ShouldNotBeLoaded {};

export class Temp {
   A<ShouldNotBeLoaded> AS;
};

// use.cpp
import N;
A<int> a;

Now ShouldNotBeLoaded will still be loaded.

But the current patch is already relatively big, so I want to address that in a follow-up patch. I think the current patch is self-contained.


Patch is 53.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76774.diff

20 Files Affected:

  • (modified) clang/include/clang/AST/DeclTemplate.h (+41-10)
  • (modified) clang/include/clang/AST/ExternalASTSource.h (+5)
  • (modified) clang/include/clang/AST/ODRHash.h (+3)
  • (modified) clang/include/clang/Sema/MultiplexExternalSemaSource.h (+6)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
  • (modified) clang/include/clang/Serialization/ASTReader.h (+19)
  • (modified) clang/include/clang/Serialization/ASTWriter.h (+6)
  • (modified) clang/lib/AST/DeclTemplate.cpp (+45-21)
  • (modified) clang/lib/AST/ExternalASTSource.cpp (+5)
  • (modified) clang/lib/AST/ODRHash.cpp (+2)
  • (modified) clang/lib/Sema/MultiplexExternalSemaSource.cpp (+6)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+103-6)
  • (modified) clang/lib/Serialization/ASTReaderDecl.cpp (+28-5)
  • (modified) clang/lib/Serialization/ASTReaderInternals.h (+80)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+148-1)
  • (modified) clang/lib/Serialization/ASTWriterDecl.cpp (+55-20)
  • (modified) clang/test/Modules/odr_hash.cpp (+2-2)
  • (added) clang/test/Modules/static-member-in-templates.cppm (+52)
  • (modified) clang/unittests/Serialization/CMakeLists.txt (+1)
  • (added) clang/unittests/Serialization/LoadSpecLazily.cpp (+159)
diff --git a/clang/include/clang/AST/DeclTemplate.h b/clang/include/clang/AST/DeclTemplate.h
index 832ad2de6b08a8..ab380f55c038ee 100644
--- a/clang/include/clang/AST/DeclTemplate.h
+++ b/clang/include/clang/AST/DeclTemplate.h
@@ -30,6 +30,7 @@
 #include "llvm/ADT/FoldingSet.h"
 #include "llvm/ADT/PointerIntPair.h"
 #include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/iterator.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/Casting.h"
@@ -525,8 +526,11 @@ class FunctionTemplateSpecializationInfo final
     return Function.getInt();
   }
 
+  void loadExternalRedecls();
+
 public:
   friend TrailingObjects;
+  friend class ASTReader;
 
   static FunctionTemplateSpecializationInfo *
   Create(ASTContext &C, FunctionDecl *FD, FunctionTemplateDecl *Template,
@@ -789,13 +793,15 @@ class RedeclarableTemplateDecl : public TemplateDecl,
     return SpecIterator<EntryType>(isEnd ? Specs.end() : Specs.begin());
   }
 
-  void loadLazySpecializationsImpl() const;
+  void loadExternalSpecializations() const;
 
   template <class EntryType, typename ...ProfileArguments>
   typename SpecEntryTraits<EntryType>::DeclType*
   findSpecializationImpl(llvm::FoldingSetVector<EntryType> &Specs,
                          void *&InsertPos, ProfileArguments &&...ProfileArgs);
 
+  void loadLazySpecializationsWithArgs(ArrayRef<TemplateArgument> TemplateArgs);
+
   template <class Derived, class EntryType>
   void addSpecializationImpl(llvm::FoldingSetVector<EntryType> &Specs,
                              EntryType *Entry, void *InsertPos);
@@ -814,9 +820,13 @@ class RedeclarableTemplateDecl : public TemplateDecl,
     /// If non-null, points to an array of specializations (including
     /// partial specializations) known only by their external declaration IDs.
     ///
+    /// These specializations needs to be loaded at once in
+    /// loadExternalSpecializations to complete the redecl chain or be preparing
+    /// for template resolution.
+    ///
     /// The first value in the array is the number of specializations/partial
     /// specializations that follow.
-    uint32_t *LazySpecializations = nullptr;
+    uint32_t *ExternalSpecializations = nullptr;
 
     /// The set of "injected" template arguments used within this
     /// template.
@@ -850,6 +860,8 @@ class RedeclarableTemplateDecl : public TemplateDecl,
   friend class ASTDeclWriter;
   friend class ASTReader;
   template <class decl_type> friend class RedeclarableTemplate;
+  friend class ClassTemplateSpecializationDecl;
+  friend class VarTemplateSpecializationDecl;
 
   /// Retrieves the canonical declaration of this template.
   RedeclarableTemplateDecl *getCanonicalDecl() override {
@@ -977,6 +989,12 @@ SpecEntryTraits<FunctionTemplateSpecializationInfo> {
 class FunctionTemplateDecl : public RedeclarableTemplateDecl {
 protected:
   friend class FunctionDecl;
+  friend class FunctionTemplateSpecializationInfo;
+
+  template <typename DeclTy>
+  friend void GetSpecializationsImpl(const DeclTy *,
+                                     llvm::SmallPtrSetImpl<const NamedDecl *> &,
+                                     ASTReader *Reader);
 
   /// Data that is common to all of the declarations of a given
   /// function template.
@@ -1012,13 +1030,13 @@ class FunctionTemplateDecl : public RedeclarableTemplateDecl {
   void addSpecialization(FunctionTemplateSpecializationInfo* Info,
                          void *InsertPos);
 
+  /// Load any lazily-loaded specializations from the external source.
+  void LoadLazySpecializations() const;
+
 public:
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
 
-  /// Load any lazily-loaded specializations from the external source.
-  void LoadLazySpecializations() const;
-
   /// Get the underlying function declaration of the template.
   FunctionDecl *getTemplatedDecl() const {
     return static_cast<FunctionDecl *>(TemplatedDecl);
@@ -1839,6 +1857,8 @@ class ClassTemplateSpecializationDecl
   LLVM_PREFERRED_TYPE(TemplateSpecializationKind)
   unsigned SpecializationKind : 3;
 
+  void loadExternalRedecls();
+
 protected:
   ClassTemplateSpecializationDecl(ASTContext &Context, Kind DK, TagKind TK,
                                   DeclContext *DC, SourceLocation StartLoc,
@@ -1852,6 +1872,7 @@ class ClassTemplateSpecializationDecl
 public:
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
+  friend class ASTReader;
 
   static ClassTemplateSpecializationDecl *
   Create(ASTContext &Context, TagKind TK, DeclContext *DC,
@@ -2238,6 +2259,11 @@ class ClassTemplatePartialSpecializationDecl
 /// Declaration of a class template.
 class ClassTemplateDecl : public RedeclarableTemplateDecl {
 protected:
+  template <typename DeclTy>
+  friend void GetSpecializationsImpl(const DeclTy *,
+                                     llvm::SmallPtrSetImpl<const NamedDecl *> &,
+                                     ASTReader *Reader);
+
   /// Data that is common to all of the declarations of a given
   /// class template.
   struct Common : CommonBase {
@@ -2285,9 +2311,7 @@ class ClassTemplateDecl : public RedeclarableTemplateDecl {
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
   friend class TemplateDeclInstantiator;
-
-  /// Load any lazily-loaded specializations from the external source.
-  void LoadLazySpecializations() const;
+  friend class ClassTemplateSpecializationDecl;
 
   /// Get the underlying class declarations of the template.
   CXXRecordDecl *getTemplatedDecl() const {
@@ -2651,6 +2675,8 @@ class VarTemplateSpecializationDecl : public VarDecl,
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsCompleteDefinition : 1;
 
+  void loadExternalRedecls();
+
 protected:
   VarTemplateSpecializationDecl(Kind DK, ASTContext &Context, DeclContext *DC,
                                 SourceLocation StartLoc, SourceLocation IdLoc,
@@ -2664,6 +2690,7 @@ class VarTemplateSpecializationDecl : public VarDecl,
 public:
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
+  friend class ASTReader;
   friend class VarDecl;
 
   static VarTemplateSpecializationDecl *
@@ -3018,6 +3045,11 @@ class VarTemplatePartialSpecializationDecl
 /// Declaration of a variable template.
 class VarTemplateDecl : public RedeclarableTemplateDecl {
 protected:
+  template <typename DeclTy>
+  friend void GetSpecializationsImpl(const DeclTy *,
+                                     llvm::SmallPtrSetImpl<const NamedDecl *> &,
+                                     ASTReader *Reader);
+
   /// Data that is common to all of the declarations of a given
   /// variable template.
   struct Common : CommonBase {
@@ -3057,8 +3089,7 @@ class VarTemplateDecl : public RedeclarableTemplateDecl {
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
 
-  /// Load any lazily-loaded specializations from the external source.
-  void LoadLazySpecializations() const;
+  friend class VarTemplatePartialSpecializationDecl;
 
   /// Get the underlying variable declarations of the template.
   VarDecl *getTemplatedDecl() const {
diff --git a/clang/include/clang/AST/ExternalASTSource.h b/clang/include/clang/AST/ExternalASTSource.h
index 8e573965b0a336..7f26afd53106ba 100644
--- a/clang/include/clang/AST/ExternalASTSource.h
+++ b/clang/include/clang/AST/ExternalASTSource.h
@@ -150,6 +150,11 @@ class ExternalASTSource : public RefCountedBase<ExternalASTSource> {
   virtual bool
   FindExternalVisibleDeclsByName(const DeclContext *DC, DeclarationName Name);
 
+  /// Load all the external specialzations for the Decl and the corresponding
+  /// template arguments.
+  virtual void LoadExternalSpecs(const Decl *D,
+                                 ArrayRef<TemplateArgument> TemplateArgs);
+
   /// Ensures that the table of all visible declarations inside this
   /// context is up to date.
   ///
diff --git a/clang/include/clang/AST/ODRHash.h b/clang/include/clang/AST/ODRHash.h
index cedf644520fc32..ddd1bb0f095e75 100644
--- a/clang/include/clang/AST/ODRHash.h
+++ b/clang/include/clang/AST/ODRHash.h
@@ -101,6 +101,9 @@ class ODRHash {
   // Save booleans until the end to lower the size of data to process.
   void AddBoolean(bool value);
 
+  // Add intergers to ID.
+  void AddInteger(unsigned Value);
+
   static bool isSubDeclToBeProcessed(const Decl *D, const DeclContext *Parent);
 
 private:
diff --git a/clang/include/clang/Sema/MultiplexExternalSemaSource.h b/clang/include/clang/Sema/MultiplexExternalSemaSource.h
index 2bf91cb5212c5e..886c3854adac6e 100644
--- a/clang/include/clang/Sema/MultiplexExternalSemaSource.h
+++ b/clang/include/clang/Sema/MultiplexExternalSemaSource.h
@@ -97,6 +97,12 @@ class MultiplexExternalSemaSource : public ExternalSemaSource {
   bool FindExternalVisibleDeclsByName(const DeclContext *DC,
                                       DeclarationName Name) override;
 
+  /// Load all the external specialzations for the Decl and the corresponding
+  /// template args.
+  virtual void
+  LoadExternalSpecs(const Decl *D,
+                    ArrayRef<TemplateArgument> TemplateArgs) override;
+
   /// Ensures that the table of all visible declarations inside this
   /// context is up to date.
   void completeVisibleDeclsMap(const DeclContext *DC) override;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index fdd64f2abbe937..a1bf3659e91f3e 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -1523,6 +1523,9 @@ enum DeclCode {
   /// An ImplicitConceptSpecializationDecl record.
   DECL_IMPLICIT_CONCEPT_SPECIALIZATION,
 
+  // A decls specilization record.
+  DECL_SPECS,
+
   DECL_LAST = DECL_IMPLICIT_CONCEPT_SPECIALIZATION
 };
 
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 21d791f5cd89a2..52ca6c76db8e37 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -340,6 +340,9 @@ class ASTIdentifierLookupTrait;
 /// The on-disk hash table(s) used for DeclContext name lookup.
 struct DeclContextLookupTable;
 
+/// The on-disk hash table(s) used for specialization decls.
+struct SpecializedDeclsLookupTable;
+
 } // namespace reader
 
 } // namespace serialization
@@ -599,6 +602,11 @@ class ASTReader
   llvm::DenseMap<const DeclContext *,
                  serialization::reader::DeclContextLookupTable> Lookups;
 
+  /// Map from decls to specialized decls.
+  llvm::DenseMap<const Decl *,
+                 serialization::reader::SpecializedDeclsLookupTable>
+      SpecLookups;
+
   // Updates for visible decls can occur for other contexts than just the
   // TU, and when we read those update records, the actual context may not
   // be available yet, so have this pending map using the ID as a key. It
@@ -640,6 +648,9 @@ class ASTReader
                                      llvm::BitstreamCursor &Cursor,
                                      uint64_t Offset, serialization::DeclID ID);
 
+  bool ReadDeclsSpecs(ModuleFile &M, llvm::BitstreamCursor &Cursor,
+                      uint64_t Offset, Decl *D);
+
   /// A vector containing identifiers that have already been
   /// loaded.
   ///
@@ -1343,6 +1354,11 @@ class ASTReader
   const serialization::reader::DeclContextLookupTable *
   getLoadedLookupTables(DeclContext *Primary) const;
 
+  /// Get the loaded specializations lookup tables for \p D,
+  /// if any.
+  serialization::reader::SpecializedDeclsLookupTable *
+  getLoadedSpecLookupTables(Decl *D);
+
 private:
   struct ImportedModule {
     ModuleFile *Mod;
@@ -1982,6 +1998,9 @@ class ASTReader
   bool FindExternalVisibleDeclsByName(const DeclContext *DC,
                                       DeclarationName Name) override;
 
+  void LoadExternalSpecs(const Decl *D,
+                         ArrayRef<TemplateArgument> TemplateArgs) override;
+
   /// Read all of the declarations lexically stored in a
   /// declaration context.
   ///
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index de69f99003d827..c98beaa1a24dc0 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -527,6 +527,10 @@ class ASTWriter : public ASTDeserializationListener,
   bool isLookupResultExternal(StoredDeclsList &Result, DeclContext *DC);
   bool isLookupResultEntirelyExternal(StoredDeclsList &Result, DeclContext *DC);
 
+  uint64_t
+  WriteSpecsLookupTable(NamedDecl *D,
+                        llvm::SmallVectorImpl<const NamedDecl *> &Specs);
+
   void GenerateNameLookupTable(const DeclContext *DC,
                                llvm::SmallVectorImpl<char> &LookupTable);
   uint64_t WriteDeclContextLexicalBlock(ASTContext &Context, DeclContext *DC);
@@ -564,6 +568,8 @@ class ASTWriter : public ASTDeserializationListener,
   unsigned DeclEnumAbbrev = 0;
   unsigned DeclObjCIvarAbbrev = 0;
   unsigned DeclCXXMethodAbbrev = 0;
+  unsigned DeclSpecsAbbrev = 0;
+
   unsigned DeclDependentNonTemplateCXXMethodAbbrev = 0;
   unsigned DeclTemplateCXXMethodAbbrev = 0;
   unsigned DeclMemberSpecializedCXXMethodAbbrev = 0;
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index 7d7556e670f951..43c9158fb40413 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -331,14 +331,14 @@ RedeclarableTemplateDecl::CommonBase *RedeclarableTemplateDecl::getCommonPtr() c
   return Common;
 }
 
-void RedeclarableTemplateDecl::loadLazySpecializationsImpl() const {
+void RedeclarableTemplateDecl::loadExternalSpecializations() const {
   // Grab the most recent declaration to ensure we've loaded any lazy
   // redeclarations of this template.
   CommonBase *CommonBasePtr = getMostRecentDecl()->getCommonPtr();
-  if (CommonBasePtr->LazySpecializations) {
+  if (CommonBasePtr->ExternalSpecializations) {
     ASTContext &Context = getASTContext();
-    uint32_t *Specs = CommonBasePtr->LazySpecializations;
-    CommonBasePtr->LazySpecializations = nullptr;
+    uint32_t *Specs = CommonBasePtr->ExternalSpecializations;
+    CommonBasePtr->ExternalSpecializations = nullptr;
     for (uint32_t I = 0, N = *Specs++; I != N; ++I)
       (void)Context.getExternalSource()->GetExternalDecl(Specs[I]);
   }
@@ -358,6 +358,15 @@ RedeclarableTemplateDecl::findSpecializationImpl(
   return Entry ? SETraits::getDecl(Entry)->getMostRecentDecl() : nullptr;
 }
 
+void RedeclarableTemplateDecl::loadLazySpecializationsWithArgs(
+    ArrayRef<TemplateArgument> TemplateArgs) {
+  auto *ExternalSource = getASTContext().getExternalSource();
+  if (!ExternalSource)
+    return;
+
+  ExternalSource->LoadExternalSpecs(this->getCanonicalDecl(), TemplateArgs);
+}
+
 template<class Derived, class EntryType>
 void RedeclarableTemplateDecl::addSpecializationImpl(
     llvm::FoldingSetVector<EntryType> &Specializations, EntryType *Entry,
@@ -430,24 +439,23 @@ FunctionTemplateDecl::newCommon(ASTContext &C) const {
   return CommonPtr;
 }
 
-void FunctionTemplateDecl::LoadLazySpecializations() const {
-  loadLazySpecializationsImpl();
-}
-
 llvm::FoldingSetVector<FunctionTemplateSpecializationInfo> &
 FunctionTemplateDecl::getSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->Specializations;
 }
 
 FunctionDecl *
 FunctionTemplateDecl::findSpecialization(ArrayRef<TemplateArgument> Args,
                                          void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getSpecializations(), InsertPos, Args);
 }
 
 void FunctionTemplateDecl::addSpecialization(
       FunctionTemplateSpecializationInfo *Info, void *InsertPos) {
+  using SETraits = SpecEntryTraits<FunctionTemplateSpecializationInfo>;
+  loadLazySpecializationsWithArgs(SETraits::getTemplateArgs(Info));
   addSpecializationImpl<FunctionTemplateDecl>(getSpecializations(), Info,
                                               InsertPos);
 }
@@ -508,19 +516,15 @@ ClassTemplateDecl *ClassTemplateDecl::CreateDeserialized(ASTContext &C,
                                        DeclarationName(), nullptr, nullptr);
 }
 
-void ClassTemplateDecl::LoadLazySpecializations() const {
-  loadLazySpecializationsImpl();
-}
-
 llvm::FoldingSetVector<ClassTemplateSpecializationDecl> &
 ClassTemplateDecl::getSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->Specializations;
 }
 
 llvm::FoldingSetVector<ClassTemplatePartialSpecializationDecl> &
 ClassTemplateDecl::getPartialSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->PartialSpecializations;
 }
 
@@ -534,11 +538,14 @@ ClassTemplateDecl::newCommon(ASTContext &C) const {
 ClassTemplateSpecializationDecl *
 ClassTemplateDecl::findSpecialization(ArrayRef<TemplateArgument> Args,
                                       void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getSpecializations(), InsertPos, Args);
 }
 
 void ClassTemplateDecl::AddSpecialization(ClassTemplateSpecializationDecl *D,
                                           void *InsertPos) {
+  using SETraits = SpecEntryTraits<ClassTemplateSpecializationDecl>;
+  loadLazySpecializationsWithArgs(SETraits::getTemplateArgs(D));
   addSpecializationImpl<ClassTemplateDecl>(getSpecializations(), D, InsertPos);
 }
 
@@ -546,6 +553,7 @@ ClassTemplatePartialSpecializationDecl *
 ClassTemplateDecl::findPartialSpecialization(
     ArrayRef<TemplateArgument> Args,
     TemplateParameterList *TPL, void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getPartialSpecializations(), InsertPos, Args,
                                 TPL);
 }
@@ -900,6 +908,11 @@ FunctionTemplateSpecializationInfo *FunctionTemplateSpecializationInfo::Create(
       FD, Template, TSK, TemplateArgs, ArgsAsWritten, POI, MSInfo);
 }
 
+void FunctionTemplateSpecializationInfo::loadExternalRedecls() {
+  getTemplate()->loadExternalSpecializations();
+  getTemplate()->loadLazySpecializationsWithArgs(TemplateArguments->asArray());
+}
+
 //===----------------------------------------------------------------------===//
 // ClassTemplateSpecializationDecl Implementation
 //===----------------------------------------------------------------------===//
@@ -1024,6 +1037,12 @@ ClassTemplateSpecializationDecl::getSourceRange() const {
   }
 }
 
+void ClassTemplateSpecializationDecl::loadExternalRedecls() {
+  getSpecializedTemplate()->loadExternalSpecializations();
+  getSpecializedTemplate()->loadLazySpecializationsWithArgs(
+      getTemplateArgs().asArray());
+}
+
 //===----------------------------------------------------------------------===//
 // ConceptDecl Implementation
 //===----------------------------------------------------------------------===//
@@ -1226,19 +1245,15 @@ VarTemplateDecl *VarTemplateDecl::CreateDeserialized(ASTContext &C,
                                      DeclarationName(), nullptr, nullptr);
 }
 
-void VarTemplateDecl::LoadLazySpecializations() const {
-  loadLazySpecializationsImpl();
-}
-
 llvm::FoldingSetVector<VarTemplateSpecializationDecl> &
 VarTemplateDecl::getSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->Specializations;
 }
 
 llvm::FoldingSetVector<VarTemplatePartialSpecializationDecl> &
 VarTemplateDecl::getPartialSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->PartialSpecializations;
 }
 
@@ -1252,17 +1267,21 @@ VarTemplateDecl::newCommon(ASTContext &C) const {
 VarTemplateSpecializationDecl *
 VarTemplateDecl::findSpecialization(ArrayRef<TemplateArgument> Args,
                                     void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getSpecializations(), InsertPos, Args);
 }
 
 void VarTemplateDecl::AddSpecialization(VarTemplateSpecializationDecl *D,
                                         void *InsertPos) {
+  using SETraits = SpecEntryTraits<VarTemplateSpecializationDecl>;
+  loadLazySpecializationsWithArgs(SETraits::getTemplateArgs(D));
   addSpecializationImpl<VarTemplateDecl>(getSpecializations(), D, InsertPos);
 }
 
 VarTemplatePartialSpecializationD...
[truncated]

github-actions bot commented Jan 3, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 4118082f651a05cca258c684ab1199578b57afac 22c9d1145eb57d9c2cb2ef490b7c474598dd5d12 -- clang/unittests/Serialization/LoadSpecLazilyTest.cpp clang/include/clang/AST/DeclTemplate.h clang/include/clang/AST/ExternalASTSource.h clang/include/clang/AST/ODRHash.h clang/include/clang/Sema/MultiplexExternalSemaSource.h clang/include/clang/Serialization/ASTBitCodes.h clang/include/clang/Serialization/ASTReader.h clang/include/clang/Serialization/ASTWriter.h clang/lib/AST/DeclTemplate.cpp clang/lib/AST/ExternalASTSource.cpp clang/lib/AST/ODRHash.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Sema/MultiplexExternalSemaSource.cpp clang/lib/Serialization/ASTCommon.h clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTReaderDecl.cpp clang/lib/Serialization/ASTReaderInternals.h clang/lib/Serialization/ASTWriter.cpp clang/lib/Serialization/ASTWriterDecl.cpp
View the diff from clang-format here.
diff --git a/clang/include/clang/AST/DeclTemplate.h b/clang/include/clang/AST/DeclTemplate.h
index 515c60e51e..26a9ebc468 100644
--- a/clang/include/clang/AST/DeclTemplate.h
+++ b/clang/include/clang/AST/DeclTemplate.h
@@ -802,7 +802,7 @@ protected:
   template <class EntryType, typename... ProfileArguments>
   typename SpecEntryTraits<EntryType>::DeclType *
   findLocalSpecialization(llvm::FoldingSetVector<EntryType> &Specs,
-                          void *&InsertPos, ProfileArguments &&... ProfileArgs);
+                          void *&InsertPos, ProfileArguments &&...ProfileArgs);
 
   bool loadLazySpecializationsWithArgs(ArrayRef<TemplateArgument> TemplateArgs);
 
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index f7d513b096..c2bbf29f61 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -344,7 +344,7 @@ void RedeclarableTemplateDecl::loadExternalSpecializations() const {
   }
 
   // We still load all the external specializations explicitly in the case
-  // the writer specified `-fload-external-specializations-lazily`. 
+  // the writer specified `-fload-external-specializations-lazily`.
   if (!getASTContext().getLangOpts().LoadExternalSpecializationsLazily &&
       getASTContext().getExternalSource())
     getASTContext().getExternalSource()->LoadAllExternalSpecializations(
@@ -355,7 +355,7 @@ template <class EntryType, typename... ProfileArguments>
 typename RedeclarableTemplateDecl::SpecEntryTraits<EntryType>::DeclType *
 RedeclarableTemplateDecl::findLocalSpecialization(
     llvm::FoldingSetVector<EntryType> &Specs, void *&InsertPos,
-    ProfileArguments &&... ProfileArgs) {
+    ProfileArguments &&...ProfileArgs) {
   using SETraits = SpecEntryTraits<EntryType>;
 
   llvm::FoldingSetNodeID ID;
@@ -370,7 +370,7 @@ template <class EntryType, typename... ProfileArguments>
 typename RedeclarableTemplateDecl::SpecEntryTraits<EntryType>::DeclType *
 RedeclarableTemplateDecl::findSpecializationImpl(
     llvm::FoldingSetVector<EntryType> &Specs, void *&InsertPos,
-    ProfileArguments &&... ProfileArgs) {
+    ProfileArguments &&...ProfileArgs) {
   if (auto *Ret = findLocalSpecialization(
           Specs, InsertPos, std::forward<ProfileArguments>(ProfileArgs)...))
     return Ret;
diff --git a/clang/lib/AST/ODRHash.cpp b/clang/lib/AST/ODRHash.cpp
index 9e274ff596..72a9a870ea 100644
--- a/clang/lib/AST/ODRHash.cpp
+++ b/clang/lib/AST/ODRHash.cpp
@@ -1318,4 +1318,3 @@ void ODRHash::AddStructuralValue(const APValue &Value) {
 }
 
 void ODRHash::AddInteger(unsigned Value) { ID.AddInteger(Value); }
-
diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 99b02f3987..1facfd4865 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -265,9 +265,10 @@ namespace clang {
         : Reader(Reader), Record(Record), Loc(Loc), ThisDeclID(thisDeclID),
           ThisDeclLoc(ThisDeclLoc) {}
 
-    template <typename T> static
-    void AddExternalSpecializations(T *D,
-                                SmallVectorImpl<serialization::DeclID>& IDs) {
+    template <typename T>
+    static void
+    AddExternalSpecializations(T *D,
+                               SmallVectorImpl<serialization::DeclID> &IDs) {
       if (IDs.empty())
         return;
 
@@ -4273,11 +4274,14 @@ void ASTReader::loadDeclUpdateRecords(PendingUpdateRecord &Record) {
           isa<ClassTemplateDecl, VarTemplateDecl, FunctionTemplateDecl>(D)) &&
          "Must not have pending specializations");
   if (auto *CTD = dyn_cast<ClassTemplateDecl>(D))
-    ASTDeclReader::AddExternalSpecializations(CTD, PendingExternalSpecializationIDs);
+    ASTDeclReader::AddExternalSpecializations(CTD,
+                                              PendingExternalSpecializationIDs);
   else if (auto *FTD = dyn_cast<FunctionTemplateDecl>(D))
-    ASTDeclReader::AddExternalSpecializations(FTD, PendingExternalSpecializationIDs);
+    ASTDeclReader::AddExternalSpecializations(FTD,
+                                              PendingExternalSpecializationIDs);
   else if (auto *VTD = dyn_cast<VarTemplateDecl>(D))
-    ASTDeclReader::AddExternalSpecializations(VTD, PendingExternalSpecializationIDs);
+    ASTDeclReader::AddExternalSpecializations(VTD,
+                                              PendingExternalSpecializationIDs);
   PendingExternalSpecializationIDs.clear();
 
   // Load the pending visible updates for this decl context, if it has any.
diff --git a/clang/unittests/Serialization/LoadSpecLazilyTest.cpp b/clang/unittests/Serialization/LoadSpecLazilyTest.cpp
index 39b183f774..58ffd1ca38 100644
--- a/clang/unittests/Serialization/LoadSpecLazilyTest.cpp
+++ b/clang/unittests/Serialization/LoadSpecLazilyTest.cpp
@@ -46,8 +46,7 @@ public:
     OS << Contents;
   }
 
-  std::string GenerateModuleInterface(StringRef ModuleName,
-                                      StringRef Contents,
+  std::string GenerateModuleInterface(StringRef ModuleName, StringRef Contents,
                                       bool WriteExternalSpecsTable) {
     std::string FileName = llvm::Twine(ModuleName + ".cppm").str();
     addFile(FileName, Contents);
@@ -65,9 +64,9 @@ public:
     const char *Args[] = {"clang++",
                           "-std=c++20",
                           "--precompile",
-                          (WriteExternalSpecsTable ?
-                           "-fload-external-specializations-lazily" :
-                           ""),
+                          (WriteExternalSpecsTable
+                               ? "-fload-external-specializations-lazily"
+                               : ""),
                           PrebuiltModulePath.c_str(),
                           "-working-directory",
                           TestDir.c_str(),
@@ -159,7 +158,8 @@ export class ShouldNotBeLoaded {};
 export class Temp {
    A<ShouldNotBeLoaded> AS;
 };
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   const char *test_file_contents = R"cpp(
 import M;
@@ -185,7 +185,8 @@ TEST_F(LoadSpecLazilyTest, ChainedTest) {
 export module M;
 export template <class T>
 class A {};
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   GenerateModuleInterface("N", R"cpp(
 export module N;
@@ -195,7 +196,8 @@ export class ShouldNotBeLoaded {};
 export class Temp {
    A<ShouldNotBeLoaded> AS;
 };
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   const char *test_file_contents = R"cpp(
 import N;
@@ -223,7 +225,8 @@ TEST_F(LoadSpecLazilyTest, LoadAllTest) {
 export module M;
 export template <class T>
 class A {};
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   GenerateModuleInterface("N", R"cpp(
 export module N;
@@ -233,7 +236,8 @@ export class ShouldBeLoaded {};
 export class Temp {
    A<ShouldBeLoaded> AS;
 };
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   const char *test_file_contents = R"cpp(
 import N;

@vgvassilev (Contributor) left a comment

This is a great way to start a new year ;)

The phab link is https://reviews.llvm.org/D41416.

In general, I was wondering whether we could simplify the implementation by loading the specialization hash table upon module load. That should be relatively cheap, as we would read 2 integers per specialization.

Perhaps we should put both patches together; that would allow us to test whether they are on par with https://reviews.llvm.org/D41416, which we use downstream.

Thanks for working on this!

@@ -150,6 +150,11 @@ class ExternalASTSource : public RefCountedBase<ExternalASTSource> {
virtual bool
FindExternalVisibleDeclsByName(const DeclContext *DC, DeclarationName Name);

/// Load all the external specialzations for the Decl and the corresponding
/// template arguments.
virtual void LoadExternalSpecs(const Decl *D,
Contributor:

Suggested change
virtual void LoadExternalSpecs(const Decl *D,
virtual void FindExternalSpecialization(const Decl *D,

sounds more consistent to the surroundings here.

Member Author:

I feel Load may be a better name, since judging from the signature it doesn't find anything. And if we want consistency, I'd suggest renaming FindExternalVisibleDeclsByName to LoadExternalVisibleDeclsByName.

@@ -527,6 +527,10 @@ class ASTWriter : public ASTDeserializationListener,
bool isLookupResultExternal(StoredDeclsList &Result, DeclContext *DC);
bool isLookupResultEntirelyExternal(StoredDeclsList &Result, DeclContext *DC);

uint64_t
WriteSpecsLookupTable(NamedDecl *D,
Contributor:

Generally, spec reads as specification rather than specialization. Maybe we should use the full word.

Member Author:

Got it. Will do in the next round.

Member Author:

Done

SmallVector<serialization::DeclID, 32> SpecIDs;
readDeclIDList(SpecIDs);

if (Record.readInt())
ReadDeclsSpecs(*Loc.F, D, Loc.F->DeclsCursor);
Contributor:

What if the TemplateDecl came from a different module file and this module file contains only specializations?

Member Author:

Then it won't take this path. That is the job of the follow-up patch (ChuanqiXu9@7f027f0).

@@ -1249,3 +1249,5 @@ void ODRHash::AddQualType(QualType T) {
void ODRHash::AddBoolean(bool Value) {
Bools.push_back(Value);
}

void ODRHash::AddInteger(unsigned Value) { ID.AddInteger(Value); }
Contributor:

I remember @hahnjo and @zygoloid discussing that the ODR hasher is probably not the best way to hash template arguments, because the hasher would not take into account semantic aspects of template arguments. For example, a fully qualified template argument would not compare the same as a non-qualified one. We might need to implement our own folding-set logic.

@hahnjo, could you help me dig up that discussion?

Member Author:

Interesting. I wasn't aware of this. If this is true, we need to decide whether we can leave a FIXME here or must fix it before proceeding.

Member:

The review related to ODRHash is this one: https://reviews.llvm.org/D153003

In short, my understanding is that ODRHash gives the following guarantee: if the hashes are different, there is guaranteed to be an ODR violation. In the other direction, if two hashes are the same, the declarations have to be compared in more detail, i.e. there may or may not be an ODR violation.

For the specializations, we need the opposite: if two template arguments are semantically the same (*), they must hash to the same value, or otherwise we will not find the correct bucket. On the other hand, two different specialization arguments may have the same hash; that's fine for the map data structure.

Now the additional caveat (*) is that "semantically the same" is not the same congruence as "no ODR violation". In https://reviews.llvm.org/D153003 we discuss using declarations, but IIRC it's also possible to construct problematic cases with (nested) namespaces, top-level :: prefixes, and template template parameters. Taken together, my conclusion from the discussion above is that ODRHash is simply not the right method to find template specialization parameters in a map.
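
As a small illustration of the using-declaration point (this exact snippet is my own sketch, not a test from D153003), both variables below have the same type, so their template arguments must land in the same bucket even though they are spelled differently:

namespace ns { struct S {}; }
using ns::S;

template <class T> struct A {};

A<ns::S> a1; // argument spelled with its namespace qualifier
A<S> a2;     // same argument spelled via the using-declaration; both spellings
             // must hash to the same key or the lookup will miss A<ns::S>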

Member Author:

Great analysis. Fair enough, let's find a method to proceed.

Member Author:

I tried to add a test case to show the problem in 9b808a4, but the current patch handles it well. While I agree that ODRHash may be too aggressive for the problem we're solving, I don't want to write things that can't be well tested. I am wondering if we can proceed by leaving a FIXME here if we can't find a good test in time? Or maybe we can add an option, -fload-specialization-lazily, so that we can fall back smoothly if there are any problems.

@hahnjo @vgvassilev

Member Author:

It looks like the qualifier-related problems in ODRHash (at least some of them) were fixed in https://reviews.llvm.org/D156210.

@vgvassilev (Contributor) Jan 8, 2024:

I guess the comment we are discussing is here: https://reviews.llvm.org/D154324#4524368 by @zygoloid:

"
...

For D41416, ODR hashing may not be the best mechanism to hash the template arguments, unfortunately. ODR hashing is (or perhaps, should be) about determining whether two things are spelled the same way and have the same meaning (as required by the C++ ODR), whereas I think what you're looking for is whether they have the same meaning regardless of spelling. Maybe we can get away with reusing ODR hashing anyway, on the basis that any canonical, non-dependent template argument should have the same (invented) spelling in every translation unit, but I'm not certain that's true in all cases. There may still be cases where the canonical type includes some aspect of "whatever we saw first", in which case the ODR hash can differ across translation units for non-dependent, canonical template arguments that are spelled differently but have the same meaning, though I can't think of one off-hand.
"

Member Author:

Yeah, I just saw it. My concern with inventing a new hash mechanism is how we can make sure it is correct. It may not be hard to invent a new hasher, but I am worried it may not be well tested. I prefer to proceed step by step.

Contributor:

If the example of @hahnjo works, perhaps a FIXME referring to this discussion should be sufficient and we can revisit the issue once we have an example that breaks.

Member Author:

Done

@ChuanqiXu9 (Member, Author)

Hi Vassilev, for testing purposes I pushed https://github.com/ChuanqiXu9/llvm-project/tree/LoadSpecializationUpdatesLazily. I didn't create a stacked review since I feel a standalone branch may be sufficient.

In general, I was wondering whether we could simplify the implementation by loading the specialization hash table upon module load. That should be relatively cheap, as we would read 2 integers per specialization.

IIUC, it looks like what I do in ChuanqiXu9@7f027f0#diff-c61a3cce4bfa099b5af032fa83cbf1563f0af4bf58dc112b39571d74b6b681c1R3487-R3499. But I don't want to do that in this patch, since we can avoid loading the hash table at all if the template decl is not loaded.

ChuanqiXu9 added a commit that referenced this pull request Jan 8, 2024
…iased template args

This is a test for #76774. In the review comments, we're concerned about the case where ODRHash may produce different hash values for semantically identical template arguments. For example, if the template argument in a specialization is not qualified and the semantically identical template argument at the instantiation point is qualified, we should still be able to select that template specialization. This patch tests that behavior: we should be able to select the correct specialization given semantically identical template arguments.
@vgvassilev (Contributor)

@ChuanqiXu9, I'd prefer to review both patches at the same time. Otherwise we risk missing some important details.

@ChuanqiXu9 (Member, Author)

Got it. I can try to create a stacked review. But from what I know about the current state of stacked reviews, it would require us to lose the current context...

And it would still be pretty valuable if you could test this with your internal workloads; maybe we can find something important at the high level before going into the details. I've tested this in our local workloads, and it looks good and the performance improvement remains. But I know our use of modules may not be as complex as yours.

@vgvassilev (Contributor)

I would just push the second commit here. It should be good enough.

@ChuanqiXu9 (Member, Author) commented Jan 9, 2024

I failed to use spr to create a stacked review... So I just created the stacked PR manually: #77417. Luckily the context is preserved. I heard the current context might be lost if we switched to spr now.

@ChuanqiXu9 changed the title from "[Serialization] Load Specializations Lazily (1/2)" to "[Serialization] Load Specializations Lazily" Jan 9, 2024
@vgvassilev (Contributor) left a comment

Overall this looks quite promising to me. Have you run that patch on bigger workflows? Do we have some performance numbers to compare?

I will run some tests on our infrastructure and report back.

@@ -100,6 +100,11 @@ ExternalASTSource::FindExternalVisibleDeclsByName(const DeclContext *DC,
return false;
}

void ExternalASTSource::LoadExternalSpecializations(
const Decl *D, ArrayRef<TemplateArgument> TemplateArgs) {
return;
Contributor:

Suggested change
return;

Member Author:

Will do in the next round.

@dwblaikie (Collaborator)

@ilya-biryukov any chance you/your folks could test this change for performance implications in google? It's especially helpful to CERN, but the last iteration of this direction had some regressions that stalled out progress on that version a few years ago, so it'd be good to help poke this along while making sure it doesn't cause release hiccups/etc for google.

@ChuanqiXu9 (Member, Author)

Have you run that patch on bigger workflows? Do we have some performance numbers to compare?

I've tested its functionality in our largest modules workload and it runs well. But our use of modules doesn't have a lot of complexity, even though it is large in scale. For performance numbers, I plan to get them this week. It is a bit of additional work, since I need to build the compiler with different optimization settings to get a fair comparison.

@vgvassilev (Contributor)

@ChuanqiXu9, this PR does not seem to compile. Can you make the second commit work before I start testing?

@ChuanqiXu9 (Member, Author) commented Jan 11, 2024

@ChuanqiXu9, this PR does not seem to compile. Can you make the second commit work before I start testing?

Oh, sorry. It should work now.

@ChuanqiXu9 (Member, Author) commented Jan 11, 2024

Update:

Previously we would always try to load the specializations with the corresponding arguments before searching for a specialization. This requires hashing the template arguments.

This update tries to improve that by loading the external specializations only if we can't find the specialization locally.

But I didn't observe a significant improvement from this change locally.

@ChuanqiXu9 (Member, Author)

[do not merge] [runtime-cxxmodules] Rework our lazy template specialization deserialization mechanism root-project/root#14495

From root-project/root#14495, I see there is a new reply saying the testing is actually fine. Do you think we still need to split the patch?

That comment was concerning the version of the patch that had the lazy template deserialization turned off by default. Yes, I still think that this patch should implement the on-disk hash table on top of D41416.

OK. And would you like to send a PR for D41416? I've already fixed the issue mentioned on the review page. Then I'd like to send small, incremental patches on top of that.

Do you mean that I should open a PR for D41416 and you will apply your patch there? I have no problem if we do everything here as part of this PR. This way we will have the full history of how this was born in one place ;)

Yeah, and please create a branch under llvm/llvm-project directly. Then I can create a stacked PR on top of that.

There it is: https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003

Is there a PR, so that we can comment on it? For example, is D153003 necessary? I remember that during the review process we thought it might not be wanted.

I can create a PR. My understanding is that D153003 is there because of the intricacies of the ODR hashing approach. If we create an alternative we can take it out. For now, we can keep it and later we can drop it.

I'd feel better dropping D153003 if it is not a blocking issue. I feel it makes things more complicated...

@vgvassilev (Contributor)

If I drop it then our tests will break. IIUC, that's somewhere deep in the hasher and should not impact this PR. Does this make the work on the on-disk hash table more complicated in some way?

@ChuanqiXu9
Copy link
Member Author

[do not merge] [runtime-cxxmodules] Rework our lazy template specialization deserialization mechanism root-project/root#14495

From root-project/root#14495, I see there is a new reply saying the testing is actually fine. Do you think we still need to split the patch?

That comment was concerning the version of the patch that had the lazy template deserialization turned off by default. Yes, I still think that this patch should implement the on-disk hash table on top of D41416

OK. And would you like to send a PR for D41416? I've already fixed the issue mentioned in the review page. Then I'd like to send small and incremental patches on that.

Do you mean that I should open a PR for D41416 and you will apply your patch there? I have no problem if we do everything here as part of this PR. This way we will have the full history of how this was born in one place ;)

Yeah, and please create a branch under llvm/llvm-project directly. Then I can perform stacked PR on that.

There it is: https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003

If I drop it then our tests will break. IIUC that's somewhere deep in the hasher and should not impact this PR. Does this make the work on the on-disk hash table more complicated in some way?

No, it won't block the work on the on-disk hash table. But if we want to land that, we must understand what actually happened...

@vgvassilev
Copy link
Contributor

[do not merge] [runtime-cxxmodules] Rework our lazy template specialization deserialization mechanism root-project/root#14495

From root-project/root#14495, I see there is a new reply saying the testing is actually fine. Do you think we still need to split the patch?

That comment was concerning the version of the patch that had the lazy template deserialization turned off by default. Yes, I still think that this patch should implement the on-disk hash table on top of D41416

OK. And would you like to send a PR for D41416? I've already fixed the issue mentioned in the review page. Then I'd like to send small and incremental patches on that.

Do you mean that I should open a PR for D41416 and you will apply your patch there? I have no problem if we do everything here as part of this PR. This way we will have the full history of how this was born in one place ;)

Yeah, and please create a branch under llvm/llvm-project directly. Then I can perform stacked PR on that.

There it is: https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003

If I drop it then our tests will break. IIUC that's somewhere deep in the hasher and should not impact this PR. Does this make the work on the on-disk hash table more complicated in some way?

No, it won't block the work on the on-disk hash table. But if we want to land that, we must understand what actually happened...

We can’t land that without attaching your on-disk hash table implementation as part of this PR, because of what’s mentioned here: #76774 (comment)

@ChuanqiXu9
Copy link
Member Author

[do not merge] [runtime-cxxmodules] Rework our lazy template specialization deserialization mechanism root-project/root#14495

From root-project/root#14495, I see there is a new reply saying the testing is actually fine. Do you think we still need to split the patch?

That comment was concerning the version of the patch that had the lazy template deserialization turned off by default. Yes, I still think that this patch should implement the on-disk hash table on top of D41416

OK. And would you like to send a PR for D41416? I've already fixed the issue mentioned in the review page. Then I'd like to send small and incremental patches on that.

Do you mean that I should open a PR for D41416 and you will apply your patch there? I have no problem if we do everything here as part of this PR. This way we will have the full history of how this was born in one place ;)

Yeah, and please create a branch under llvm/llvm-project directly. Then I can perform stacked PR on that.

There it is: https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003

If I drop it then our tests will break. IIUC that's somewhere deep in the hasher and should not impact this PR. Does this make the work on the on-disk hash table more complicated in some way?

No, it won't block the work on the on-disk hash table. But if we want to land that, we must understand what actually happened...

We can’t land that without attaching your on-disk hash table implementation as part of this PR, because of what’s mentioned here: #76774 (comment)

I know that. But we're not talking about the same thing. This is one of the reasons that we can't land that. But my point is that we can't land that if we don't understand what's going wrong without that patch.

@hahnjo
Copy link
Member

hahnjo commented Feb 18, 2024

But my point is that we can't land that if we don't understand what's going wrong without that patch.

We understand that very well and it's described in https://reviews.llvm.org/D153003 as well as the surrounding discussions: because of the way that ODRHash works, template template arguments A and B will hash to different values, even if using A = B. However, for template specializations, we require them to hash to the same value (with some form of normalization) or we won't find nor load the right specializations. That's why I said that IMHO ODRHash is not the right tool for the job here, which follows directly from an old comment of yours: https://reviews.llvm.org/D153003#4427412

An important note here is that ODRHash is used to check that the AST nodes stay the same across compilations. There is a gap between that and using ODRHash to check semantic equality.

(and IIRC that's the same direction that Richard was going)
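
A simplified illustration of the aliasing pitfall, using an ordinary type alias instead of a template template argument (the names S, A and B below are made up for this example; the shape of the problem is the same): both declarations name the single specialization S<B>, yet a hash computed from the spelling of the written argument sees A in one place and B in the other.

// Simplified illustration; S, A and B are hypothetical names.
template <class T> struct S {};

struct B {};
using A = B; // A and B denote the same type

S<A> X; // spelled with the alias ...
S<B> Y; // ... and with the underlying type: both are the one specialization S<B>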

@ChuanqiXu9
Copy link
Member Author

But my point is that we can't land that if we don't understand what's going wrong without that patch.

We understand that very well and it's described in https://reviews.llvm.org/D153003 as well as the surrounding discussions: because of the way that ODRHash works, template template arguments A and B will hash to different values, even if using A = B.

Yeah, so I tried to fix that in the following patches. And if that works, I expect it can fix the internal errors in your workloads.

However, for template specializations, we require them to hash to the same value (with some form of normalization) or we won't find nor load the right specializations. That's why I said that IMHO ODRHash is not the right tool for the job here, which follows directly from an old comment of yours: https://reviews.llvm.org/D153003#4427412

An important note here is that ODRHash is used to check that the AST nodes stay the same across compilations. There is a gap between that and using ODRHash to check semantic equality.

(and IIRC that's the same direction that Richard was going)

@vgvassilev
Copy link
Contributor

Let's zoom out a little. The approach in D41416 shows that it is feasible to store a hash of the template arguments to delay eager deserializations. The ODR hash approach is a second order problem because we can swap it with something better once we need to. In order to make progress we have introduced D153003 which allows our infrastructure to work. The way I see moving forward here is:

  • Base this PR on D41416's approach to modeling the lazy deserialization of templates. That'd mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.
  • Test the implementation on our infrastructure for correctness
  • Test the implementation on the Google infrastructure for scalability
  • Think of a better approach to replace ODR hashing if we see more pathological problems.
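
For concreteness, here is a rough sketch of the keying idea behind that replacement, with placeholder names rather than the real MultiOnDiskHashTable and serialization types: instead of a per-template LazySpecializations array, a centralized reader-side table maps (template, argument hash) to only the decl IDs that need to be deserialized.

// Rough sketch with placeholder names (TemplateID, ArgHash, DeclID);
// the real patch works through the ASTReader's serialization machinery.
#include <cstdint>
#include <unordered_map>
#include <vector>

using TemplateID = std::uint64_t; // stand-in for the canonical template decl
using ArgHash    = unsigned;      // stand-in for the ODR hash of the arguments
using DeclID     = std::uint32_t; // stand-in for a serialized decl ID

struct LazySpecializationTable {
  std::unordered_map<TemplateID,
                     std::unordered_map<ArgHash, std::vector<DeclID>>>
      Table;

  // Return only the decls whose arguments hashed to H, rather than the
  // template's entire specialization list.
  const std::vector<DeclID> *lookup(TemplateID T, ArgHash H) const {
    auto TI = Table.find(T);
    if (TI == Table.end())
      return nullptr;
    auto HI = TI->second.find(H);
    return HI == TI->second.end() ? nullptr : &HI->second;
  }
};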

@ChuanqiXu9
Copy link
Member Author

Let's zoom out a little. The approach in D41416 shows that it is feasible to store a hash of the template arguments to delay eager deserializations. The ODR hash approach is a second order problem because we can swap it with something better once we need to. In order to make progress we have introduced D153003 which allows our infrastructure to work. The way I see moving forward here is:

  • Base this PR on D41416's approach to modeling the lazy deserialization of templates. That'd mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.
  • Test the implementation on our infrastructure for correctness
  • Test the implementation on the Google infrastructure for scalability
  • Think of a better approach to replace ODR hashing if we see more pathological problems.

Yeah, no problem at all. This is what I want at a higher level too. What I am confused about is the status of D153003. If it is true that we've described the problem completely on the review page, then c31d6b4 should be a proper fix for that.

@vgvassilev
Copy link
Contributor

Let's zoom out a little. The approach in D41416 shows that it is feasible to store a hash of the template arguments to delay eager deserializations. The ODR hash approach is a second order problem because we can swap it with something better once we need to. In order to make progress we have introduced D153003 which allows our infrastructure to work. The way I see moving forward here is:

  • Base this PR on D41416's approach to modeling the lazy deserialization of templates. That'd mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.
  • Test the implementation on our infrastructure for correctness
  • Test the implementation on the Google infrastructure for scalability
  • Think of a better approach to replace ODR hashing if we see more pathological problems.

Yeah, no problem at all. This is what I want at a higher level too. What I am confused about is the status of D153003. If it is true that we've described the problem completely on the review page, then c31d6b4 should be a proper fix for that.

I can try it on our infrastructure and if it works I will remove D153003.

@ilya-biryukov
Copy link
Contributor

Sorry for losing track of the discussion here. What is the current status here? Should we run another round of testing?

Also, I see proposals to land the new behaviour under a flag and have it off by default.
If that does not add a lot of complexity, that would definitely be something that makes testing easier on our side. Our compiler is built from revisions close to head, so we don't need to wait for the next Clang release to reap the benefits of this approach.

@vgvassilev
Copy link
Contributor

Sorry for losing track of the discussion here. What is the current status here? Should we run another round of testing?

Also, I see proposals to land the new behaviour under a flag and have it off by default. If that does not add a lot of complexity, that would definitely be something that makes testing easier on our side. Our compiler is built from revisions close to head, so we don't need to wait for the next Clang release to reap the benefits of this approach.

@ilya-biryukov, this PR is not ready to test. However, I'd appreciate it if you could test our baseline patch located here: https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003 on your infrastructure.

@vgvassilev
Copy link
Contributor

Let's zoom out a little. The approach in D41416 shows that it is feasible to store a hash of the template arguments to delay eager deserializations. The ODR hash approach is a second order problem because we can swap it with something better once we need to. In order to make progress we have introduced D153003 which allows our infrastructure to work. The way I see moving forward here is:

  • Base this PR on D41416's approach to modeling the lazy deserialization of templates. That'd mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.
  • Test the implementation on our infrastructure for correctness
  • Test the implementation on the Google infrastructure for scalability
  • Think of a better approach to replace ODR hashing if we see more pathological problems.

Yeah, no problem at all. This is what I want at a higher level too. What I am confused about is the status of D153003. If it is true that we've described the problem completely on the review page, then c31d6b4 should be a proper fix for that.

I can try it on our infrastructure and if it works I will remove D153003.

@ChuanqiXu9, you were right. We seem to not need D153003 and I have removed it from the branch.

@ChuanqiXu9
Copy link
Member Author

Let's zoom out a little. The approach in D41416 shows that it is feasible to store a hash of the template arguments to delay eager deserializations. The ODR hash approach is a second order problem because we can swap it with something better once we need to. In order to make progress we have introduced D153003 which allows our infrastructure to work. The way I see moving forward here is:

  • Base this PR on D41416's approach to modeling the lazy deserialization of templates. That'd mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.
  • Test the implementation on our infrastructure for correctness
  • Test the implementation on the Google infrastructure for scalability
  • Think of a better approach to replace ODR hashing if we see more pathological problems.

Yeah, no problem at all. This is what I want at a higher level too. What I am confused about is the status of D153003. If it is true that we've described the problem completely on the review page, then c31d6b4 should be a proper fix for that.

I can try it on our infrastructure and if it works I will remove D153003.

@ChuanqiXu9, you were right. We seem to not need D153003 and I have removed it from the branch.

Yeah, then let's create a new branch (the existing [D41416_D153003](https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003) doesn't sound like a good name) and a PR for that. Then I can start a stacked PR on that.

@ChuanqiXu9
Copy link
Member Author

Oh, I didn't notice you'd already removed D153003. But the branch name doesn't look good, so I've created a PR in #83108

@ChuanqiXu9
Copy link
Member Author

That'd mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.

@vgvassilev Let me try to double-check your advice. In your suggestion, do you suggest replacing LazySpecializationInfo *LazySpecializations in D41416 with an on-disk hash map from an integer (the hash value of the template arguments) to LazySpecializationInfo, instead of to another integer (a DeclID, as in my patch)?
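
In conceptual form, the two layouts being contrasted look like this (placeholder names only; neither alias is a real serialization type):

// Placeholder sketch of the two alternatives discussed above.
#include <cstdint>
#include <unordered_map>
#include <vector>

using ArgHash = unsigned;      // ODR hash of the template arguments
using DeclID  = std::uint32_t; // serialized declaration ID

// (a) D41416-style: the table's value is a LazySpecializationInfo-like record.
struct LazySpecializationInfoSketch { DeclID ID; /* plus any extra bookkeeping */ };
using HashToInfo = std::unordered_map<ArgHash, std::vector<LazySpecializationInfoSketch>>;

// (b) This patch: the table's value is just the declaration ID itself.
using HashToDeclID = std::unordered_map<ArgHash, std::vector<DeclID>>;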

ChuanqiXu9 added a commit that referenced this pull request Feb 28, 2024
Following up for #83108

This follows the suggestion literally from
#76774 (comment)

which introduces OnDiskHashTable for specializations based on D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.
ChuanqiXu9 added a commit that referenced this pull request Mar 5, 2024
Following up for #83108

This follows the suggestion literally from
#76774 (comment)

which introduces OnDiskHashTable for specializations based on D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.
Copy link
Contributor

@iains iains left a comment


I am happy to defer to @vgvassilev et al. on this one.

ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this pull request Apr 7, 2024
Following up for llvm#83108

This follows the suggestion literally from
llvm#76774 (comment)

which introduces OnDiskHashTable for specializations based on D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.
@ChuanqiXu9
Copy link
Member Author

Given we're pursuing the #83237 series, I'll close this one.

@ChuanqiXu9 ChuanqiXu9 closed this Apr 25, 2024
ChuanqiXu9 added a commit that referenced this pull request Apr 25, 2024
Following up for #83108

This follows the suggestion literally from
#76774 (comment)

which introduces OnDiskHashTable for specializations based on
D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.
ChuanqiXu9 added a commit that referenced this pull request Jul 5, 2024
Following up for #83108

This follows the suggestion literally from
#76774 (comment)

which introduces OnDiskHashTable for specializations based on
D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.
ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this pull request Jul 10, 2024
Following up for llvm#83108

This follows the suggestion literally from
llvm#76774 (comment)

which introduces OnDiskHashTable for specializations based on
D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.
ilya-biryukov pushed a commit to ilya-biryukov/llvm-project that referenced this pull request Oct 16, 2024
…zations when looking for one.

fmt

[Serialization] Introduce OnDiskHashTable for specializations

Following up for llvm#83108

This follows the suggestion literally from
llvm#76774 (comment)

which introduces OnDiskHashTable for specializations based on
D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.

[Serialization] Code cleanups and polish 83233

fmt

load specializations before writing specialization decls

address comments

Revert "load specializations before writing specialization decls"

This reverts commit 61c451d.

Do not omit data from imported modules with same key

Handle merging spec info manually
ilya-biryukov pushed a commit to ilya-biryukov/llvm-project that referenced this pull request Oct 16, 2024
…zations when looking for one.

fmt

[Serialization] Introduce OnDiskHashTable for specializations

Following up for llvm#83108

This follows the suggestion literally from
llvm#76774 (comment)

which introduces OnDiskHashTable for specializations based on
D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.

[Serialization] Code cleanups and polish 83233

fmt

load specializations before writing specialization decls

address comments

Revert "load specializations before writing specialization decls"

This reverts commit 61c451d.

Do not omit data from imported modules with same key

Handle merging spec info manually
ChuanqiXu9 added a commit that referenced this pull request Oct 17, 2024
Following up for #83108

This follows the suggestion literally from
#76774 (comment)

which introduces OnDiskHashTable for specializations based on
D41416.

Note that I didn't polish this patch, to keep the diff from D41416 small
and easier to review. I'll make the polishing patch later, so that we can
focus on what we're doing in this patch and on the style in the next
patch.