[Serialization] Load Specializations Lazily #76774

ChuanqiXu9 · 2024-01-03T03:26:51Z

The idea comes from @vgvassilev, who had a patch for it on phab. Unfortunately phab is closed and I forgot the Dxxx number of that patch. But I remember that the last comment from @vgvassilev was that we should use MultiOnDiskHashTable for it. So I followed that advice and rewrote the whole thing from scratch in the new year.

Background

Currently, all the specializations of a template (including instantiations, explicit specializations and partial specializations) are loaded at once whenever we want to instantiate another instance of the template, find an instantiation of the template, or just complete the redecl chain.

This basically means we need to load every specialization of a template once the template declaration gets loaded. This is bad because loading a specialization requires loading all of its template arguments, so we end up deserializing a lot of unnecessary declarations.

For example,

// M.cppm
export module M;
export template <class T>
class A {};

export class ShouldNotBeLoaded {};

export class Temp {
   A<ShouldNotBeLoaded> AS;
};

// use.cpp
import M;
A<int> a;

We have a specialization A<ShouldNotBeLoaded> in M.cppm, and we instantiate the template A in use.cpp. Surprisingly, we then deserialize ShouldNotBeLoaded when compiling use.cpp. This patch tries to avoid that.

Given that templates are heavily used in C++, this is a performance pain point.

What this patch does

This patch adds a MultiOnDiskHashTable for specializations in the ASTReader, so that we only deserialize the specializations with the same template arguments. We achieve that by using the ODRHash of the template arguments as the key of the hash table.
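To illustrate the lookup scheme described above, here is a minimal, self-contained sketch. It is not the patch's implementation: `LazySpecTable`, `hashArgs`, `ArgList` and `DeclID` are hypothetical stand-ins, an in-memory multimap plays the role of the on-disk table, and a simple string hash plays the role of ODRHash over template arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a real reader would hash TemplateArgument values
// and store serialized declaration IDs in an on-disk table.
using ArgList = std::vector<std::string>;
using DeclID = std::uint32_t;

// Deterministic hash of an argument list (stand-in for an ODRHash-based key).
inline std::uint32_t hashArgs(const ArgList &Args) {
  std::uint32_t H = 0;
  for (const std::string &Arg : Args) {
    for (unsigned char C : Arg)
      H = H * 31u + C;
    H = H * 31u + 0xFFu; // separator between arguments
  }
  return H;
}

class LazySpecTable {
  // Entries that are known but not yet "deserialized", keyed by argument hash.
  std::unordered_multimap<std::uint32_t, DeclID> OnDisk;
  // Entries that have been "deserialized" so far.
  std::vector<DeclID> Loaded;

public:
  void addSerialized(const ArgList &Args, DeclID ID) {
    OnDisk.emplace(hashArgs(Args), ID);
  }

  // Load only the entries whose key matches Args; returns how many were loaded.
  std::size_t loadWithArgs(const ArgList &Args) {
    auto Range = OnDisk.equal_range(hashArgs(Args));
    std::size_t N = 0;
    for (auto It = Range.first; It != Range.second; ++It, ++N)
      Loaded.push_back(It->second);
    OnDisk.erase(Range.first, Range.second);
    return N;
  }

  const std::vector<DeclID> &loaded() const { return Loaded; }
  std::size_t pending() const { return OnDisk.size(); }
};
```

In this sketch, a request for the argument list `{"int"}` only touches entries stored under the same hash; entries for other argument lists stay pending. A hash collision would merely load an extra entry, which costs time but does not affect correctness.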

Partial specializations are not added to the MultiOnDiskHashTable, since we can't know whether a partial specialization is needed before deciding the template declaration for an instantiation request. There may be room for further optimization, but let's do that in the future.
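One way to read the limitation above, shown as a small sketch (my interpretation, not code from the patch): the arguments of an instantiation request generally differ from the arguments written on the partial specialization it ends up selecting, so a key computed from the request cannot be used to look the partial specialization up directly.

```cpp
#include <cassert>

// Primary template.
template <class T> struct A { static constexpr int Kind = 0; };

// Partial specialization; its written argument list is {T *}.
template <class T> struct A<T *> { static constexpr int Kind = 1; };

// The request A<int *> carries the argument list {int *}, yet it selects the
// pattern A<T *>. Matching requires template argument deduction, not an
// equality comparison (or a hash) of the two argument lists.
static_assert(A<int>::Kind == 0, "primary template is used for A<int>");
static_assert(A<int *>::Kind == 1, "partial specialization is used for A<int *>");
```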

To review this patch, I think ASTReaderDecl::AddLazySpecializations may be a good entry point.

llvmbot · 2024-01-03T03:27:07Z

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Changes

What this patch does not do

This patch doesn't solve the problem completely, since we still emit update records for a template when there are new specializations of it in a different module:

llvm-project/clang/lib/Serialization/ASTWriterDecl.cpp
Lines 251 to 269 in 8ae73fe

    void RegisterTemplateSpecialization(const Decl *Template,
                                        const Decl *Specialization) {
      Template = Template->getCanonicalDecl();

      // If the canonical template is local, we'll write out this specialization
      // when we emit it.
      // FIXME: We can do the same thing if there is any local declaration of
      // the template, to avoid emitting an update record.
      if (!Template->isFromASTFile())
        return;

      // We only need to associate the first local declaration of the
      // specialization. The other declarations will get pulled in by it.
      if (Writer.getFirstLocalDecl(Specialization) != Specialization)
        return;

      Writer.DeclUpdates[Template].push_back(ASTWriter::DeclUpdate(
          UPD_CXX_ADDED_TEMPLATE_SPECIALIZATION, Specialization));
    }

That is, we can't handle this case yet:

// M.cppm
export module M;
export template <class T>
class A {};

// N.cppm
export module N;
export import M;
export class ShouldNotBeLoaded {};

export class Temp {
   A<ShouldNotBeLoaded> AS;
};

// use.cpp
import N;
A<int> a;

Here ShouldNotBeLoaded will still be loaded.

But the current patch is already relatively big, so I want to leave that case to the next patch. I think the current patch is already self-contained.

Patch is 53.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/76774.diff

20 Files Affected:

(modified) clang/include/clang/AST/DeclTemplate.h (+41-10)
(modified) clang/include/clang/AST/ExternalASTSource.h (+5)
(modified) clang/include/clang/AST/ODRHash.h (+3)
(modified) clang/include/clang/Sema/MultiplexExternalSemaSource.h (+6)
(modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
(modified) clang/include/clang/Serialization/ASTReader.h (+19)
(modified) clang/include/clang/Serialization/ASTWriter.h (+6)
(modified) clang/lib/AST/DeclTemplate.cpp (+45-21)
(modified) clang/lib/AST/ExternalASTSource.cpp (+5)
(modified) clang/lib/AST/ODRHash.cpp (+2)
(modified) clang/lib/Sema/MultiplexExternalSemaSource.cpp (+6)
(modified) clang/lib/Serialization/ASTReader.cpp (+103-6)
(modified) clang/lib/Serialization/ASTReaderDecl.cpp (+28-5)
(modified) clang/lib/Serialization/ASTReaderInternals.h (+80)
(modified) clang/lib/Serialization/ASTWriter.cpp (+148-1)
(modified) clang/lib/Serialization/ASTWriterDecl.cpp (+55-20)
(modified) clang/test/Modules/odr_hash.cpp (+2-2)
(added) clang/test/Modules/static-member-in-templates.cppm (+52)
(modified) clang/unittests/Serialization/CMakeLists.txt (+1)
(added) clang/unittests/Serialization/LoadSpecLazily.cpp (+159)

diff --git a/clang/include/clang/AST/DeclTemplate.h b/clang/include/clang/AST/DeclTemplate.h
index 832ad2de6b08a8..ab380f55c038ee 100644
--- a/clang/include/clang/AST/DeclTemplate.h
+++ b/clang/include/clang/AST/DeclTemplate.h
@@ -30,6 +30,7 @@
 #include "llvm/ADT/FoldingSet.h"
 #include "llvm/ADT/PointerIntPair.h"
 #include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/iterator.h"
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Support/Casting.h"
@@ -525,8 +526,11 @@ class FunctionTemplateSpecializationInfo final
     return Function.getInt();
   }
 
+  void loadExternalRedecls();
+
 public:
   friend TrailingObjects;
+  friend class ASTReader;
 
   static FunctionTemplateSpecializationInfo *
   Create(ASTContext &C, FunctionDecl *FD, FunctionTemplateDecl *Template,
@@ -789,13 +793,15 @@ class RedeclarableTemplateDecl : public TemplateDecl,
     return SpecIterator<EntryType>(isEnd ? Specs.end() : Specs.begin());
   }
 
-  void loadLazySpecializationsImpl() const;
+  void loadExternalSpecializations() const;
 
   template <class EntryType, typename ...ProfileArguments>
   typename SpecEntryTraits<EntryType>::DeclType*
   findSpecializationImpl(llvm::FoldingSetVector<EntryType> &Specs,
                          void *&InsertPos, ProfileArguments &&...ProfileArgs);
 
+  void loadLazySpecializationsWithArgs(ArrayRef<TemplateArgument> TemplateArgs);
+
   template <class Derived, class EntryType>
   void addSpecializationImpl(llvm::FoldingSetVector<EntryType> &Specs,
                              EntryType *Entry, void *InsertPos);
@@ -814,9 +820,13 @@ class RedeclarableTemplateDecl : public TemplateDecl,
     /// If non-null, points to an array of specializations (including
     /// partial specializations) known only by their external declaration IDs.
     ///
+    /// These specializations needs to be loaded at once in
+    /// loadExternalSpecializations to complete the redecl chain or be preparing
+    /// for template resolution.
+    ///
     /// The first value in the array is the number of specializations/partial
     /// specializations that follow.
-    uint32_t *LazySpecializations = nullptr;
+    uint32_t *ExternalSpecializations = nullptr;
 
     /// The set of "injected" template arguments used within this
     /// template.
@@ -850,6 +860,8 @@ class RedeclarableTemplateDecl : public TemplateDecl,
   friend class ASTDeclWriter;
   friend class ASTReader;
   template <class decl_type> friend class RedeclarableTemplate;
+  friend class ClassTemplateSpecializationDecl;
+  friend class VarTemplateSpecializationDecl;
 
   /// Retrieves the canonical declaration of this template.
   RedeclarableTemplateDecl *getCanonicalDecl() override {
@@ -977,6 +989,12 @@ SpecEntryTraits<FunctionTemplateSpecializationInfo> {
 class FunctionTemplateDecl : public RedeclarableTemplateDecl {
 protected:
   friend class FunctionDecl;
+  friend class FunctionTemplateSpecializationInfo;
+
+  template <typename DeclTy>
+  friend void GetSpecializationsImpl(const DeclTy *,
+                                     llvm::SmallPtrSetImpl<const NamedDecl *> &,
+                                     ASTReader *Reader);
 
   /// Data that is common to all of the declarations of a given
   /// function template.
@@ -1012,13 +1030,13 @@ class FunctionTemplateDecl : public RedeclarableTemplateDecl {
   void addSpecialization(FunctionTemplateSpecializationInfo* Info,
                          void *InsertPos);
 
+  /// Load any lazily-loaded specializations from the external source.
+  void LoadLazySpecializations() const;
+
 public:
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
 
-  /// Load any lazily-loaded specializations from the external source.
-  void LoadLazySpecializations() const;
-
   /// Get the underlying function declaration of the template.
   FunctionDecl *getTemplatedDecl() const {
     return static_cast<FunctionDecl *>(TemplatedDecl);
@@ -1839,6 +1857,8 @@ class ClassTemplateSpecializationDecl
   LLVM_PREFERRED_TYPE(TemplateSpecializationKind)
   unsigned SpecializationKind : 3;
 
+  void loadExternalRedecls();
+
 protected:
   ClassTemplateSpecializationDecl(ASTContext &Context, Kind DK, TagKind TK,
                                   DeclContext *DC, SourceLocation StartLoc,
@@ -1852,6 +1872,7 @@ class ClassTemplateSpecializationDecl
 public:
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
+  friend class ASTReader;
 
   static ClassTemplateSpecializationDecl *
   Create(ASTContext &Context, TagKind TK, DeclContext *DC,
@@ -2238,6 +2259,11 @@ class ClassTemplatePartialSpecializationDecl
 /// Declaration of a class template.
 class ClassTemplateDecl : public RedeclarableTemplateDecl {
 protected:
+  template <typename DeclTy>
+  friend void GetSpecializationsImpl(const DeclTy *,
+                                     llvm::SmallPtrSetImpl<const NamedDecl *> &,
+                                     ASTReader *Reader);
+
   /// Data that is common to all of the declarations of a given
   /// class template.
   struct Common : CommonBase {
@@ -2285,9 +2311,7 @@ class ClassTemplateDecl : public RedeclarableTemplateDecl {
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
   friend class TemplateDeclInstantiator;
-
-  /// Load any lazily-loaded specializations from the external source.
-  void LoadLazySpecializations() const;
+  friend class ClassTemplateSpecializationDecl;
 
   /// Get the underlying class declarations of the template.
   CXXRecordDecl *getTemplatedDecl() const {
@@ -2651,6 +2675,8 @@ class VarTemplateSpecializationDecl : public VarDecl,
   LLVM_PREFERRED_TYPE(bool)
   unsigned IsCompleteDefinition : 1;
 
+  void loadExternalRedecls();
+
 protected:
   VarTemplateSpecializationDecl(Kind DK, ASTContext &Context, DeclContext *DC,
                                 SourceLocation StartLoc, SourceLocation IdLoc,
@@ -2664,6 +2690,7 @@ class VarTemplateSpecializationDecl : public VarDecl,
 public:
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
+  friend class ASTReader;
   friend class VarDecl;
 
   static VarTemplateSpecializationDecl *
@@ -3018,6 +3045,11 @@ class VarTemplatePartialSpecializationDecl
 /// Declaration of a variable template.
 class VarTemplateDecl : public RedeclarableTemplateDecl {
 protected:
+  template <typename DeclTy>
+  friend void GetSpecializationsImpl(const DeclTy *,
+                                     llvm::SmallPtrSetImpl<const NamedDecl *> &,
+                                     ASTReader *Reader);
+
   /// Data that is common to all of the declarations of a given
   /// variable template.
   struct Common : CommonBase {
@@ -3057,8 +3089,7 @@ class VarTemplateDecl : public RedeclarableTemplateDecl {
   friend class ASTDeclReader;
   friend class ASTDeclWriter;
 
-  /// Load any lazily-loaded specializations from the external source.
-  void LoadLazySpecializations() const;
+  friend class VarTemplatePartialSpecializationDecl;
 
   /// Get the underlying variable declarations of the template.
   VarDecl *getTemplatedDecl() const {
diff --git a/clang/include/clang/AST/ExternalASTSource.h b/clang/include/clang/AST/ExternalASTSource.h
index 8e573965b0a336..7f26afd53106ba 100644
--- a/clang/include/clang/AST/ExternalASTSource.h
+++ b/clang/include/clang/AST/ExternalASTSource.h
@@ -150,6 +150,11 @@ class ExternalASTSource : public RefCountedBase<ExternalASTSource> {
   virtual bool
   FindExternalVisibleDeclsByName(const DeclContext *DC, DeclarationName Name);
 
+  /// Load all the external specialzations for the Decl and the corresponding
+  /// template arguments.
+  virtual void LoadExternalSpecs(const Decl *D,
+                                 ArrayRef<TemplateArgument> TemplateArgs);
+
   /// Ensures that the table of all visible declarations inside this
   /// context is up to date.
   ///
diff --git a/clang/include/clang/AST/ODRHash.h b/clang/include/clang/AST/ODRHash.h
index cedf644520fc32..ddd1bb0f095e75 100644
--- a/clang/include/clang/AST/ODRHash.h
+++ b/clang/include/clang/AST/ODRHash.h
@@ -101,6 +101,9 @@ class ODRHash {
   // Save booleans until the end to lower the size of data to process.
   void AddBoolean(bool value);
 
+  // Add intergers to ID.
+  void AddInteger(unsigned Value);
+
   static bool isSubDeclToBeProcessed(const Decl *D, const DeclContext *Parent);
 
 private:
diff --git a/clang/include/clang/Sema/MultiplexExternalSemaSource.h b/clang/include/clang/Sema/MultiplexExternalSemaSource.h
index 2bf91cb5212c5e..886c3854adac6e 100644
--- a/clang/include/clang/Sema/MultiplexExternalSemaSource.h
+++ b/clang/include/clang/Sema/MultiplexExternalSemaSource.h
@@ -97,6 +97,12 @@ class MultiplexExternalSemaSource : public ExternalSemaSource {
   bool FindExternalVisibleDeclsByName(const DeclContext *DC,
                                       DeclarationName Name) override;
 
+  /// Load all the external specialzations for the Decl and the corresponding
+  /// template args.
+  virtual void
+  LoadExternalSpecs(const Decl *D,
+                    ArrayRef<TemplateArgument> TemplateArgs) override;
+
   /// Ensures that the table of all visible declarations inside this
   /// context is up to date.
   void completeVisibleDeclsMap(const DeclContext *DC) override;
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index fdd64f2abbe937..a1bf3659e91f3e 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -1523,6 +1523,9 @@ enum DeclCode {
   /// An ImplicitConceptSpecializationDecl record.
   DECL_IMPLICIT_CONCEPT_SPECIALIZATION,
 
+  // A decls specilization record.
+  DECL_SPECS,
+
   DECL_LAST = DECL_IMPLICIT_CONCEPT_SPECIALIZATION
 };
 
diff --git a/clang/include/clang/Serialization/ASTReader.h b/clang/include/clang/Serialization/ASTReader.h
index 21d791f5cd89a2..52ca6c76db8e37 100644
--- a/clang/include/clang/Serialization/ASTReader.h
+++ b/clang/include/clang/Serialization/ASTReader.h
@@ -340,6 +340,9 @@ class ASTIdentifierLookupTrait;
 /// The on-disk hash table(s) used for DeclContext name lookup.
 struct DeclContextLookupTable;
 
+/// The on-disk hash table(s) used for specialization decls.
+struct SpecializedDeclsLookupTable;
+
 } // namespace reader
 
 } // namespace serialization
@@ -599,6 +602,11 @@ class ASTReader
   llvm::DenseMap<const DeclContext *,
                  serialization::reader::DeclContextLookupTable> Lookups;
 
+  /// Map from decls to specialized decls.
+  llvm::DenseMap<const Decl *,
+                 serialization::reader::SpecializedDeclsLookupTable>
+      SpecLookups;
+
   // Updates for visible decls can occur for other contexts than just the
   // TU, and when we read those update records, the actual context may not
   // be available yet, so have this pending map using the ID as a key. It
@@ -640,6 +648,9 @@ class ASTReader
                                      llvm::BitstreamCursor &Cursor,
                                      uint64_t Offset, serialization::DeclID ID);
 
+  bool ReadDeclsSpecs(ModuleFile &M, llvm::BitstreamCursor &Cursor,
+                      uint64_t Offset, Decl *D);
+
   /// A vector containing identifiers that have already been
   /// loaded.
   ///
@@ -1343,6 +1354,11 @@ class ASTReader
   const serialization::reader::DeclContextLookupTable *
   getLoadedLookupTables(DeclContext *Primary) const;
 
+  /// Get the loaded specializations lookup tables for \p D,
+  /// if any.
+  serialization::reader::SpecializedDeclsLookupTable *
+  getLoadedSpecLookupTables(Decl *D);
+
 private:
   struct ImportedModule {
     ModuleFile *Mod;
@@ -1982,6 +1998,9 @@ class ASTReader
   bool FindExternalVisibleDeclsByName(const DeclContext *DC,
                                       DeclarationName Name) override;
 
+  void LoadExternalSpecs(const Decl *D,
+                         ArrayRef<TemplateArgument> TemplateArgs) override;
+
   /// Read all of the declarations lexically stored in a
   /// declaration context.
   ///
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index de69f99003d827..c98beaa1a24dc0 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -527,6 +527,10 @@ class ASTWriter : public ASTDeserializationListener,
   bool isLookupResultExternal(StoredDeclsList &Result, DeclContext *DC);
   bool isLookupResultEntirelyExternal(StoredDeclsList &Result, DeclContext *DC);
 
+  uint64_t
+  WriteSpecsLookupTable(NamedDecl *D,
+                        llvm::SmallVectorImpl<const NamedDecl *> &Specs);
+
   void GenerateNameLookupTable(const DeclContext *DC,
                                llvm::SmallVectorImpl<char> &LookupTable);
   uint64_t WriteDeclContextLexicalBlock(ASTContext &Context, DeclContext *DC);
@@ -564,6 +568,8 @@ class ASTWriter : public ASTDeserializationListener,
   unsigned DeclEnumAbbrev = 0;
   unsigned DeclObjCIvarAbbrev = 0;
   unsigned DeclCXXMethodAbbrev = 0;
+  unsigned DeclSpecsAbbrev = 0;
+
   unsigned DeclDependentNonTemplateCXXMethodAbbrev = 0;
   unsigned DeclTemplateCXXMethodAbbrev = 0;
   unsigned DeclMemberSpecializedCXXMethodAbbrev = 0;
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index 7d7556e670f951..43c9158fb40413 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -331,14 +331,14 @@ RedeclarableTemplateDecl::CommonBase *RedeclarableTemplateDecl::getCommonPtr() c
   return Common;
 }
 
-void RedeclarableTemplateDecl::loadLazySpecializationsImpl() const {
+void RedeclarableTemplateDecl::loadExternalSpecializations() const {
   // Grab the most recent declaration to ensure we've loaded any lazy
   // redeclarations of this template.
   CommonBase *CommonBasePtr = getMostRecentDecl()->getCommonPtr();
-  if (CommonBasePtr->LazySpecializations) {
+  if (CommonBasePtr->ExternalSpecializations) {
     ASTContext &Context = getASTContext();
-    uint32_t *Specs = CommonBasePtr->LazySpecializations;
-    CommonBasePtr->LazySpecializations = nullptr;
+    uint32_t *Specs = CommonBasePtr->ExternalSpecializations;
+    CommonBasePtr->ExternalSpecializations = nullptr;
     for (uint32_t I = 0, N = *Specs++; I != N; ++I)
       (void)Context.getExternalSource()->GetExternalDecl(Specs[I]);
   }
@@ -358,6 +358,15 @@ RedeclarableTemplateDecl::findSpecializationImpl(
   return Entry ? SETraits::getDecl(Entry)->getMostRecentDecl() : nullptr;
 }
 
+void RedeclarableTemplateDecl::loadLazySpecializationsWithArgs(
+    ArrayRef<TemplateArgument> TemplateArgs) {
+  auto *ExternalSource = getASTContext().getExternalSource();
+  if (!ExternalSource)
+    return;
+
+  ExternalSource->LoadExternalSpecs(this->getCanonicalDecl(), TemplateArgs);
+}
+
 template<class Derived, class EntryType>
 void RedeclarableTemplateDecl::addSpecializationImpl(
     llvm::FoldingSetVector<EntryType> &Specializations, EntryType *Entry,
@@ -430,24 +439,23 @@ FunctionTemplateDecl::newCommon(ASTContext &C) const {
   return CommonPtr;
 }
 
-void FunctionTemplateDecl::LoadLazySpecializations() const {
-  loadLazySpecializationsImpl();
-}
-
 llvm::FoldingSetVector<FunctionTemplateSpecializationInfo> &
 FunctionTemplateDecl::getSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->Specializations;
 }
 
 FunctionDecl *
 FunctionTemplateDecl::findSpecialization(ArrayRef<TemplateArgument> Args,
                                          void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getSpecializations(), InsertPos, Args);
 }
 
 void FunctionTemplateDecl::addSpecialization(
       FunctionTemplateSpecializationInfo *Info, void *InsertPos) {
+  using SETraits = SpecEntryTraits<FunctionTemplateSpecializationInfo>;
+  loadLazySpecializationsWithArgs(SETraits::getTemplateArgs(Info));
   addSpecializationImpl<FunctionTemplateDecl>(getSpecializations(), Info,
                                               InsertPos);
 }
@@ -508,19 +516,15 @@ ClassTemplateDecl *ClassTemplateDecl::CreateDeserialized(ASTContext &C,
                                        DeclarationName(), nullptr, nullptr);
 }
 
-void ClassTemplateDecl::LoadLazySpecializations() const {
-  loadLazySpecializationsImpl();
-}
-
 llvm::FoldingSetVector<ClassTemplateSpecializationDecl> &
 ClassTemplateDecl::getSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->Specializations;
 }
 
 llvm::FoldingSetVector<ClassTemplatePartialSpecializationDecl> &
 ClassTemplateDecl::getPartialSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->PartialSpecializations;
 }
 
@@ -534,11 +538,14 @@ ClassTemplateDecl::newCommon(ASTContext &C) const {
 ClassTemplateSpecializationDecl *
 ClassTemplateDecl::findSpecialization(ArrayRef<TemplateArgument> Args,
                                       void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getSpecializations(), InsertPos, Args);
 }
 
 void ClassTemplateDecl::AddSpecialization(ClassTemplateSpecializationDecl *D,
                                           void *InsertPos) {
+  using SETraits = SpecEntryTraits<ClassTemplateSpecializationDecl>;
+  loadLazySpecializationsWithArgs(SETraits::getTemplateArgs(D));
   addSpecializationImpl<ClassTemplateDecl>(getSpecializations(), D, InsertPos);
 }
 
@@ -546,6 +553,7 @@ ClassTemplatePartialSpecializationDecl *
 ClassTemplateDecl::findPartialSpecialization(
     ArrayRef<TemplateArgument> Args,
     TemplateParameterList *TPL, void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getPartialSpecializations(), InsertPos, Args,
                                 TPL);
 }
@@ -900,6 +908,11 @@ FunctionTemplateSpecializationInfo *FunctionTemplateSpecializationInfo::Create(
       FD, Template, TSK, TemplateArgs, ArgsAsWritten, POI, MSInfo);
 }
 
+void FunctionTemplateSpecializationInfo::loadExternalRedecls() {
+  getTemplate()->loadExternalSpecializations();
+  getTemplate()->loadLazySpecializationsWithArgs(TemplateArguments->asArray());
+}
+
 //===----------------------------------------------------------------------===//
 // ClassTemplateSpecializationDecl Implementation
 //===----------------------------------------------------------------------===//
@@ -1024,6 +1037,12 @@ ClassTemplateSpecializationDecl::getSourceRange() const {
   }
 }
 
+void ClassTemplateSpecializationDecl::loadExternalRedecls() {
+  getSpecializedTemplate()->loadExternalSpecializations();
+  getSpecializedTemplate()->loadLazySpecializationsWithArgs(
+      getTemplateArgs().asArray());
+}
+
 //===----------------------------------------------------------------------===//
 // ConceptDecl Implementation
 //===----------------------------------------------------------------------===//
@@ -1226,19 +1245,15 @@ VarTemplateDecl *VarTemplateDecl::CreateDeserialized(ASTContext &C,
                                      DeclarationName(), nullptr, nullptr);
 }
 
-void VarTemplateDecl::LoadLazySpecializations() const {
-  loadLazySpecializationsImpl();
-}
-
 llvm::FoldingSetVector<VarTemplateSpecializationDecl> &
 VarTemplateDecl::getSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->Specializations;
 }
 
 llvm::FoldingSetVector<VarTemplatePartialSpecializationDecl> &
 VarTemplateDecl::getPartialSpecializations() const {
-  LoadLazySpecializations();
+  loadExternalSpecializations();
   return getCommonPtr()->PartialSpecializations;
 }
 
@@ -1252,17 +1267,21 @@ VarTemplateDecl::newCommon(ASTContext &C) const {
 VarTemplateSpecializationDecl *
 VarTemplateDecl::findSpecialization(ArrayRef<TemplateArgument> Args,
                                     void *&InsertPos) {
+  loadLazySpecializationsWithArgs(Args);
   return findSpecializationImpl(getSpecializations(), InsertPos, Args);
 }
 
 void VarTemplateDecl::AddSpecialization(VarTemplateSpecializationDecl *D,
                                         void *InsertPos) {
+  using SETraits = SpecEntryTraits<VarTemplateSpecializationDecl>;
+  loadLazySpecializationsWithArgs(SETraits::getTemplateArgs(D));
   addSpecializationImpl<VarTemplateDecl>(getSpecializations(), D, InsertPos);
 }
 
 VarTemplatePartialSpecializationD...
[truncated]

github-actions · 2024-01-03T03:49:08Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 4118082f651a05cca258c684ab1199578b57afac 22c9d1145eb57d9c2cb2ef490b7c474598dd5d12 -- clang/unittests/Serialization/LoadSpecLazilyTest.cpp clang/include/clang/AST/DeclTemplate.h clang/include/clang/AST/ExternalASTSource.h clang/include/clang/AST/ODRHash.h clang/include/clang/Sema/MultiplexExternalSemaSource.h clang/include/clang/Serialization/ASTBitCodes.h clang/include/clang/Serialization/ASTReader.h clang/include/clang/Serialization/ASTWriter.h clang/lib/AST/DeclTemplate.cpp clang/lib/AST/ExternalASTSource.cpp clang/lib/AST/ODRHash.cpp clang/lib/Driver/ToolChains/Clang.cpp clang/lib/Sema/MultiplexExternalSemaSource.cpp clang/lib/Serialization/ASTCommon.h clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTReaderDecl.cpp clang/lib/Serialization/ASTReaderInternals.h clang/lib/Serialization/ASTWriter.cpp clang/lib/Serialization/ASTWriterDecl.cpp

View the diff from clang-format here.

diff --git a/clang/include/clang/AST/DeclTemplate.h b/clang/include/clang/AST/DeclTemplate.h
index 515c60e51e..26a9ebc468 100644
--- a/clang/include/clang/AST/DeclTemplate.h
+++ b/clang/include/clang/AST/DeclTemplate.h
@@ -802,7 +802,7 @@ protected:
   template <class EntryType, typename... ProfileArguments>
   typename SpecEntryTraits<EntryType>::DeclType *
   findLocalSpecialization(llvm::FoldingSetVector<EntryType> &Specs,
-                          void *&InsertPos, ProfileArguments &&... ProfileArgs);
+                          void *&InsertPos, ProfileArguments &&...ProfileArgs);
 
   bool loadLazySpecializationsWithArgs(ArrayRef<TemplateArgument> TemplateArgs);
 
diff --git a/clang/lib/AST/DeclTemplate.cpp b/clang/lib/AST/DeclTemplate.cpp
index f7d513b096..c2bbf29f61 100644
--- a/clang/lib/AST/DeclTemplate.cpp
+++ b/clang/lib/AST/DeclTemplate.cpp
@@ -344,7 +344,7 @@ void RedeclarableTemplateDecl::loadExternalSpecializations() const {
   }
 
   // We still load all the external specializations explicitly in the case
-  // the writer specified `-fload-external-specializations-lazily`. 
+  // the writer specified `-fload-external-specializations-lazily`.
   if (!getASTContext().getLangOpts().LoadExternalSpecializationsLazily &&
       getASTContext().getExternalSource())
     getASTContext().getExternalSource()->LoadAllExternalSpecializations(
@@ -355,7 +355,7 @@ template <class EntryType, typename... ProfileArguments>
 typename RedeclarableTemplateDecl::SpecEntryTraits<EntryType>::DeclType *
 RedeclarableTemplateDecl::findLocalSpecialization(
     llvm::FoldingSetVector<EntryType> &Specs, void *&InsertPos,
-    ProfileArguments &&... ProfileArgs) {
+    ProfileArguments &&...ProfileArgs) {
   using SETraits = SpecEntryTraits<EntryType>;
 
   llvm::FoldingSetNodeID ID;
@@ -370,7 +370,7 @@ template <class EntryType, typename... ProfileArguments>
 typename RedeclarableTemplateDecl::SpecEntryTraits<EntryType>::DeclType *
 RedeclarableTemplateDecl::findSpecializationImpl(
     llvm::FoldingSetVector<EntryType> &Specs, void *&InsertPos,
-    ProfileArguments &&... ProfileArgs) {
+    ProfileArguments &&...ProfileArgs) {
   if (auto *Ret = findLocalSpecialization(
           Specs, InsertPos, std::forward<ProfileArguments>(ProfileArgs)...))
     return Ret;
diff --git a/clang/lib/AST/ODRHash.cpp b/clang/lib/AST/ODRHash.cpp
index 9e274ff596..72a9a870ea 100644
--- a/clang/lib/AST/ODRHash.cpp
+++ b/clang/lib/AST/ODRHash.cpp
@@ -1318,4 +1318,3 @@ void ODRHash::AddStructuralValue(const APValue &Value) {
 }
 
 void ODRHash::AddInteger(unsigned Value) { ID.AddInteger(Value); }
-
diff --git a/clang/lib/Serialization/ASTReaderDecl.cpp b/clang/lib/Serialization/ASTReaderDecl.cpp
index 99b02f3987..1facfd4865 100644
--- a/clang/lib/Serialization/ASTReaderDecl.cpp
+++ b/clang/lib/Serialization/ASTReaderDecl.cpp
@@ -265,9 +265,10 @@ namespace clang {
         : Reader(Reader), Record(Record), Loc(Loc), ThisDeclID(thisDeclID),
           ThisDeclLoc(ThisDeclLoc) {}
 
-    template <typename T> static
-    void AddExternalSpecializations(T *D,
-                                SmallVectorImpl<serialization::DeclID>& IDs) {
+    template <typename T>
+    static void
+    AddExternalSpecializations(T *D,
+                               SmallVectorImpl<serialization::DeclID> &IDs) {
       if (IDs.empty())
         return;
 
@@ -4273,11 +4274,14 @@ void ASTReader::loadDeclUpdateRecords(PendingUpdateRecord &Record) {
           isa<ClassTemplateDecl, VarTemplateDecl, FunctionTemplateDecl>(D)) &&
          "Must not have pending specializations");
   if (auto *CTD = dyn_cast<ClassTemplateDecl>(D))
-    ASTDeclReader::AddExternalSpecializations(CTD, PendingExternalSpecializationIDs);
+    ASTDeclReader::AddExternalSpecializations(CTD,
+                                              PendingExternalSpecializationIDs);
   else if (auto *FTD = dyn_cast<FunctionTemplateDecl>(D))
-    ASTDeclReader::AddExternalSpecializations(FTD, PendingExternalSpecializationIDs);
+    ASTDeclReader::AddExternalSpecializations(FTD,
+                                              PendingExternalSpecializationIDs);
   else if (auto *VTD = dyn_cast<VarTemplateDecl>(D))
-    ASTDeclReader::AddExternalSpecializations(VTD, PendingExternalSpecializationIDs);
+    ASTDeclReader::AddExternalSpecializations(VTD,
+                                              PendingExternalSpecializationIDs);
   PendingExternalSpecializationIDs.clear();
 
   // Load the pending visible updates for this decl context, if it has any.
diff --git a/clang/unittests/Serialization/LoadSpecLazilyTest.cpp b/clang/unittests/Serialization/LoadSpecLazilyTest.cpp
index 39b183f774..58ffd1ca38 100644
--- a/clang/unittests/Serialization/LoadSpecLazilyTest.cpp
+++ b/clang/unittests/Serialization/LoadSpecLazilyTest.cpp
@@ -46,8 +46,7 @@ public:
     OS << Contents;
   }
 
-  std::string GenerateModuleInterface(StringRef ModuleName,
-                                      StringRef Contents,
+  std::string GenerateModuleInterface(StringRef ModuleName, StringRef Contents,
                                       bool WriteExternalSpecsTable) {
     std::string FileName = llvm::Twine(ModuleName + ".cppm").str();
     addFile(FileName, Contents);
@@ -65,9 +64,9 @@ public:
     const char *Args[] = {"clang++",
                           "-std=c++20",
                           "--precompile",
-                          (WriteExternalSpecsTable ?
-                           "-fload-external-specializations-lazily" :
-                           ""),
+                          (WriteExternalSpecsTable
+                               ? "-fload-external-specializations-lazily"
+                               : ""),
                           PrebuiltModulePath.c_str(),
                           "-working-directory",
                           TestDir.c_str(),
@@ -159,7 +158,8 @@ export class ShouldNotBeLoaded {};
 export class Temp {
    A<ShouldNotBeLoaded> AS;
 };
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   const char *test_file_contents = R"cpp(
 import M;
@@ -185,7 +185,8 @@ TEST_F(LoadSpecLazilyTest, ChainedTest) {
 export module M;
 export template <class T>
 class A {};
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   GenerateModuleInterface("N", R"cpp(
 export module N;
@@ -195,7 +196,8 @@ export class ShouldNotBeLoaded {};
 export class Temp {
    A<ShouldNotBeLoaded> AS;
 };
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   const char *test_file_contents = R"cpp(
 import N;
@@ -223,7 +225,8 @@ TEST_F(LoadSpecLazilyTest, LoadAllTest) {
 export module M;
 export template <class T>
 class A {};
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   GenerateModuleInterface("N", R"cpp(
 export module N;
@@ -233,7 +236,8 @@ export class ShouldBeLoaded {};
 export class Temp {
    A<ShouldBeLoaded> AS;
 };
-  )cpp", /*WriteExternalSpecsTable=*/true);
+  )cpp",
+                          /*WriteExternalSpecsTable=*/true);
 
   const char *test_file_contents = R"cpp(
 import N;

vgvassilev

This is a great way to start a new year ;)

The phab link is https://reviews.llvm.org/D41416.

In general I was wondering could we simplify the implementation by loading the specialization hash table upon module load. That should be relatively cheap as we will read 2 integers per specialization.
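As a rough illustration of why eager loading of such a table could be cheap, here is a minimal sketch. It assumes the two integers per specialization are a hash of the template arguments and a declaration ID; the flat record layout, the `DeclID` alias, and `readSpecializationTable` are hypothetical names for illustration, not Clang's actual serialization format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using DeclID = std::uint32_t; // hypothetical stand-in for a serialized decl ID

// Reads a flat record of (hash, DeclID) pairs into an in-memory table.
// Nothing is deserialized here; only the two integers per entry are copied.
inline std::unordered_multimap<std::uint32_t, DeclID>
readSpecializationTable(const std::vector<std::uint32_t> &Record) {
  assert(Record.size() % 2 == 0 && "record must hold (hash, DeclID) pairs");
  std::unordered_multimap<std::uint32_t, DeclID> Table;
  Table.reserve(Record.size() / 2);
  for (std::size_t I = 0; I + 1 < Record.size(); I += 2)
    Table.emplace(Record[I], Record[I + 1]);
  return Table;
}
```

The cost of this approach grows with the number of specializations in the module file, whether or not the owning template is ever used.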

Perhaps we should put both patches together and that'd allow us to test them if they are on par with https://reviews.llvm.org/D41416 which we use downstream.

Thanks for working on this!

vgvassilev · 2024-01-03T07:25:25Z

clang/include/clang/AST/ExternalASTSource.h

@@ -150,6 +150,11 @@ class ExternalASTSource : public RefCountedBase<ExternalASTSource> {
  virtual bool
  FindExternalVisibleDeclsByName(const DeclContext *DC, DeclarationName Name);

+  /// Load all the external specialzations for the Decl and the corresponding
+  /// template arguments.
+  virtual void LoadExternalSpecs(const Decl *D,


Suggested change

virtual void LoadExternalSpecs(const Decl *D,

virtual void FindExternalSpecialization(const Decl *D,

sounds more consistent to the surroundings here.

I feel Load is the better name, since judging from the signature it doesn't find anything. If we want consistency, I suggest renaming FindExternalVisibleDeclsByName to LoadExternalVisibleDeclsByName.

vgvassilev · 2024-01-03T07:26:47Z

clang/include/clang/Serialization/ASTWriter.h

@@ -527,6 +527,10 @@ class ASTWriter : public ASTDeserializationListener,
  bool isLookupResultExternal(StoredDeclsList &Result, DeclContext *DC);
  bool isLookupResultEntirelyExternal(StoredDeclsList &Result, DeclContext *DC);

+  uint64_t
+  WriteSpecsLookupTable(NamedDecl *D,


Generally spec would read as specification not specialization. Maybe we should use the full word.

Got it. Will do in the next iteration.

vgvassilev · 2024-01-03T07:54:57Z

clang/lib/Serialization/ASTReaderDecl.cpp

    SmallVector<serialization::DeclID, 32> SpecIDs;
    readDeclIDList(SpecIDs);
+
+    if (Record.readInt())
+      ReadDeclsSpecs(*Loc.F, D, Loc.F->DeclsCursor);


What if the TemplateDecl came from a different module file and this module file contains only specializations?

Then we won't reach this code path. That case is handled by the later patch (ChuanqiXu9@7f027f0).

vgvassilev · 2024-01-03T08:02:54Z

clang/lib/AST/ODRHash.cpp

@@ -1249,3 +1249,5 @@ void ODRHash::AddQualType(QualType T) {
 void ODRHash::AddBoolean(bool Value) {
  Bools.push_back(Value);
 }
+
+void ODRHash::AddInteger(unsigned Value) { ID.AddInteger(Value); }


I remember @hahnjo and @zygoloid discussing that the ODR hasher is probably not the best way to hash template arguments, because the hasher would not take into account semantic aspects of template arguments. For example, a fully qualified template argument would not compare the same as a non-qualified one. We might need to implement our own folding set logic.

@hahnjo, could you help me dig up that discussion?

Interesting. I wasn't aware of this. If this is true, we need to decide whether we can leave a FIXME here or must fix it before proceeding.

The review related to ODRHash is this one: https://reviews.llvm.org/D153003

In short, my understanding is that ODRHash gives the following guarantee: If the hashes are different, there is guaranteed to be an ODR violation. In the other direction, if two hashes are the same, the declarations have to be compared in more detail, i.e. there may or may not be an ODR violation.

For the specializations, we need the opposite: If two template arguments are semantically the same (*), they must hash to the same value or otherwise we will not find the correct bucket. On the other hand, two different specialization arguments may have the same hash, that's fine for the map data structure.

Now the additional caveat (*) is that "semantically the same" is not the same congruence as "no ODR violation". In https://reviews.llvm.org/D153003 we discuss using declarations, but IIRC it's also possible to construct problematic cases with (nested) namespaces, top-level :: prefixes, and template template parameters. Taken together, my conclusion from the discussion above is that ODRHash is simply not the right method to find template specialization parameters in a map.
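The bucket requirement described above (semantically identical arguments must share a hash, while collisions between different arguments are harmless) can be sketched with a toy table. This is not Clang code: `SpecializationBuckets`, `toyHash`, and the string keys standing in for template arguments are all assumptions made for illustration, and the hash is deliberately trivial so that collisions are easy to produce.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using DeclID = std::uint32_t; // hypothetical stand-in for a serialized decl ID

// Deliberately trivial hash (sum of bytes) so collisions are easy to produce.
inline unsigned toyHash(const std::string &Key) {
  unsigned H = 0;
  for (unsigned char C : Key)
    H += C;
  return H;
}

// Maps hash(template argument) -> IDs of specializations worth deserializing.
class SpecializationBuckets {
  std::unordered_map<unsigned, std::vector<DeclID>> Buckets;

public:
  void insert(const std::string &ArgKey, DeclID ID) {
    assert(!ArgKey.empty() && "argument key must not be empty");
    Buckets[toyHash(ArgKey)].push_back(ID);
  }

  // Returns every candidate in the bucket. The caller still has to compare
  // the real template arguments, so a collision only costs extra work.
  std::vector<DeclID> candidates(const std::string &ArgKey) const {
    auto It = Buckets.find(toyHash(ArgKey));
    return It == Buckets.end() ? std::vector<DeclID>{} : It->second;
  }
};
```

In this sketch the keys "ab" and "ba" collide and both IDs come back as candidates, which is fine. The keys "ns::X" and "::ns::X" hash differently, so a lookup by one spelling misses an entry stored under the other; that is the failure mode a spelling-sensitive hash would cause unless the key is normalized first.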

Great analysis. Fair enough, let's find a method to proceed.

I tried to add a test case to show the problem in 9b808a4, but the current patch handles it correctly. While I agree that ODRHash may be too aggressive for the problem we're solving, I don't want to write things that can't be well tested. I am wondering if we can proceed by leaving a FIXME here if we can't find a good test in time. Or maybe we can add an option -fload-specialization-lazily, so that we can fall back smoothly if there are any problems.

@hahnjo @vgvassilev

It looks like the qualifier-related problems in ODRHash (at least some of them) are fixed in https://reviews.llvm.org/D156210

I guess the comment we are discussing is here: https://reviews.llvm.org/D154324#4524368 by @zygoloid:

"
...

For D41416, ODR hashing may not be the best mechanism to hash the template arguments, unfortunately. ODR hashing is (or perhaps, should be) about determining whether two things are spelled the same way and have the same meaning (as required by the C++ ODR), whereas I think what you're looking for is whether they have the same meaning regardless of spelling. Maybe we can get away with reusing ODR hashing anyway, on the basis that any canonical, non-dependent template argument should have the same (invented) spelling in every translation unit, but I'm not certain that's true in all cases. There may still be cases where the canonical type includes some aspect of "whatever we saw first", in which case the ODR hash can differ across translation units for non-dependent, canonical template arguments that are spelled differently but have the same meaning, though I can't think of one off-hand.
"

Yeah, I just saw it. My concern with inventing a new hash mechanism is how we can make sure it is correct. It may not be hard to invent a new hasher, but I am worried it may not be well tested. I prefer to proceed step by step.

If the example of @hahnjo works, perhaps a FIXME referring to this discussion should be sufficient and we can revisit the issue once we have an example that breaks.

ChuanqiXu9 · 2024-01-05T03:51:24Z

This is a great way to start a new year ;)

The phab link is https://reviews.llvm.org/D41416.

In general I was wondering could we simplify the implementation by loading the specialization hash table upon module load. That should be relatively cheap as we will read 2 integers per specialization.

Perhaps we should put both patches together and that'd allow us to test them if they are on par with https://reviews.llvm.org/D41416 which we use downstream.

Thanks for working on this!

Hi Vassilev, for testing purposes I pushed https://github.com/ChuanqiXu9/llvm-project/tree/LoadSpecializationUpdatesLazily. I didn't create a stacked review since I feel a standalone branch may be sufficient.

In general I was wondering could we simplify the implementation by loading the specialization hash table upon module load. That should be relatively cheap as we will read 2 integers per specialization.

IIUC, this looks like what I do in ChuanqiXu9@7f027f0#diff-c61a3cce4bfa099b5af032fa83cbf1563f0af4bf58dc112b39571d74b6b681c1R3487-R3499. But I don't want to do that in this patch, since we can avoid loading the hash table if the template decl is not loaded.

…iased template args This is a test for #76774. In the review comments, we were concerned that ODRHash may produce different hash values for semantically identical template arguments. For example, if the template argument in a specialization is unqualified and the semantically identical template argument at the point of instantiation is qualified, we should still be able to select that template specialization. This patch tests that behavior: we should be able to select the correct specialization given semantically identical template arguments.
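For reference, the language-level fact this test relies on can be checked at compile time. The snippet below is a stand-alone sketch, not the test from the commit; the names `ns`, `inner`, `X`, `Y`, `A`, and `sameType` are hypothetical. It shows that differently spelled arguments naming the same type select the same specialization, so any hash used to look up specializations has to agree for all of these spellings.

```cpp
#include <type_traits>

namespace ns {
struct X {};
namespace inner {
using Y = X; // alias declared in a nested namespace
} // namespace inner
} // namespace ns

template <class T> struct A {};

using ns::X; // using-declaration: unqualified X now names ns::X

// True when L and R are the same type, regardless of how they were spelled.
template <class L, class R> constexpr bool sameType() {
  return std::is_same<L, R>::value;
}

static_assert(sameType<A<X>, A<ns::X>>(), "unqualified vs. qualified");
static_assert(sameType<A<X>, A<::ns::X>>(), "top-level :: prefix");
static_assert(sameType<A<ns::inner::Y>, A<ns::X>>(),
              "alias in a nested namespace");
```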

vgvassilev · 2024-01-08T08:06:02Z

@ChuanqiXu9, I'd prefer to review both patches at the same time. Otherwise we risk of missing some important details.

ChuanqiXu9 · 2024-01-08T08:10:13Z

Got it. I can try to create a stacked review. But from what I know about the current state of stacked reviews, it would require us to lose the current context...

It would still be very valuable if you could test this with your internal workloads; then maybe we can find any important high-level issues before going into the details. I've tested this on our local workloads, and it looks good and the performance improvement remains. But I know our use of modules may not be as complex as yours.

vgvassilev · 2024-01-08T08:42:53Z

I would just push the second commit here. It should be good enough.

ChuanqiXu9 · 2024-01-09T07:05:08Z

I failed to use spr to create a stacked review, so I just created the stacked PR manually: #77417. Luckily the context is preserved. I heard the current context may be lost if we switch to spr now.

vgvassilev

Overall this looks quite promising to me. Have you run that patch on bigger workflows? Do we have some performance numbers to compare?

I will run some tests on our infrastructure and report back.

vgvassilev · 2024-01-09T15:40:56Z

clang/lib/AST/ExternalASTSource.cpp

@@ -100,6 +100,11 @@ ExternalASTSource::FindExternalVisibleDeclsByName(const DeclContext *DC,
  return false;
 }

+void ExternalASTSource::LoadExternalSpecializations(
+    const Decl *D, ArrayRef<TemplateArgument> TemplateArgs) {
+  return;


Suggested change

return;

Will do in the next iteration.

dwblaikie · 2024-01-09T19:59:03Z

@ilya-biryukov any chance you/your folks could test this change for performance implications in google? It's especially helpful to CERN, but the last iteration of this direction had some regressions that stalled out progress on that version a few years ago, so it'd be good to help poke this along while making sure it doesn't cause release hiccups/etc for google.

ChuanqiXu9 · 2024-01-10T01:55:01Z

Have you run that patch on bigger workflows? Do we have some performance numbers to compare?

I've tested its functionality on our largest modules workload. It runs well. But our use of modules is not very complex, although it is large in scale. For performance, I plan to measure it this week. That takes a little additional work, since I need to build the compiler with different optimizations to have a fair comparison.

vgvassilev · 2024-01-10T14:14:18Z

@ChuanqiXu9, this PR does not seem to compile. Can you make the second commit work before I start testing?

ChuanqiXu9 · 2024-01-11T01:55:24Z

@ChuanqiXu9, this PR does not seem to compile. Can you make the second commit work before I start testing?

Oh, sorry. It should work now.

ChuanqiXu9 · 2024-01-11T09:04:02Z

Update:

Previously we always tried to load the specializations with the
corresponding arguments before looking up the specialization. This
requires hashing the template arguments.

This patch tries to improve on that by loading the specializations
only if we can't find one locally.

But I didn't observe a significant improvement from this change locally.
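The lookup order described in this update can be sketched as follows. This is a simplified model under stated assumptions, not the patch itself: `SpecializationCache`, `Spec`, the string keys standing in for template arguments, and the `Loader` callback standing in for the external source are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

struct Spec {
  int DeclID;
};

class SpecializationCache {
public:
  // Returns true and fills Out when the external source has a match.
  using Loader = std::function<bool(const std::string &Args, Spec &Out)>;

  explicit SpecializationCache(Loader L) : LoadExternal(std::move(L)) {
    assert(LoadExternal && "an external loader is required");
  }

  void addLocal(const std::string &Args, Spec S) { Local[Args] = S; }

  const Spec *find(const std::string &Args) {
    // Fast path: already known locally, so the external source is not asked.
    auto It = Local.find(Args);
    if (It != Local.end())
      return &It->second;

    // Slow path: consult the external source only after the local miss.
    ++ExternalQueries;
    Spec Loaded{};
    if (!LoadExternal(Args, Loaded))
      return nullptr;
    return &Local.emplace(Args, Loaded).first->second;
  }

  unsigned externalQueries() const { return ExternalQueries; }

private:
  std::map<std::string, Spec> Local;
  Loader LoadExternal;
  unsigned ExternalQueries = 0;
};
```

A loaded specialization is cached locally, so repeated lookups with the same arguments reach the external source at most once in this model.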

ChuanqiXu9 · 2024-02-18T09:00:52Z

[do not merge] [runtime-cxxmodules] Rework our lazy template specialization deserialization mechanism root-project/root#14495

From root-project/root#14495, I see there is new reply saying the testing is actually fine. Do you think we still need to split the patch?

That comment was concerning the version of the patch that had the lazy template deserialization turned off by default. Yes, I still think that this patch should implement the on-disk hash table on top of D41416.

OK. And would you like to send a PR for D41416? I've already fixed the issue mentioned in the review page. Then I'd like to send small and incremental patches on that.

Do you mean that I should open a PR for D41416 and you will apply your patch there? I have no problem if we do everything here as part of this PR. This way we will have the full history of how this was born in one place ;)

Yeah, and please create a branch under llvm/llvm-project directly. Then I can perform stacked PR on that.

There it is: https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003

If I drop it then our tests will break. IIUC that's somewhere deep in the hasher and should not impact this PR. Does this make the work on the on-disk hashtable more complicated in some way?

No, it won't block the work for on-disk hashtable. But if we want to land that, we must understand what happened actually...

vgvassilev · 2024-02-18T09:25:42Z

We can’t land that without attaching your on-disk hashtable implementation part of this PR because of what’s mentioned here #76774 (comment)

ChuanqiXu9 · 2024-02-18T09:40:54Z

I know that. But we're not talking about the same thing. This is one of the reasons that we can't land that. But my point is that we can't land that if we don't understand what's going wrong without that patch.

hahnjo · 2024-02-18T09:58:20Z

But my point is that we can't land that if we don't understand what's going wrong without that patch.

We understand that very well and it's described in https://reviews.llvm.org/D153003 as well as the surrounding discussions: because of the way that ODRHash works, template template arguments A and B will hash to different values, even if using A = B. However, for template specializations, we require them to hash to the same value (with some form of normalization) or we won't find nor load the right specializations. That's why I said that IMHO ODRHash is not the right tool for the job here, which follows directly from an old comment of yours: https://reviews.llvm.org/D153003#4427412

An important node here is that ODRHash is used to check the AST Nodes are keeping the same across compilations. There is gap to use ODRHash to check the semantical equality.

(and IIRC that's the same direction that Richard was going)

ChuanqiXu9 · 2024-02-18T10:28:27Z

But my point is that we can't land that if we don't understand what's going wrong without that patch.

We understand that very well and it's described in https://reviews.llvm.org/D153003 as well as the surrounding discussions: because of the way that ODRHash works, template template arguments A and B will hash to different values, even if using A = B.

Yeah, so I tried to fix that in the following patches. And if that works, I expect that can fix internal errors in your workloads.

vgvassilev · 2024-02-18T11:14:48Z

Let's zoom out a little. The approach in D41416 shows that it is feasible to store a hash of the template arguments to delay eager deserializations. The ODR hash approach is a second order problem because we can swap it with something better once we need to. In order to make progress we have introduced D153003 which allows our infrastructure to work. The way I see moving forward here is:

1. Base this PR on D41416 in how we model the lazy deserialization of templates. That would mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.
2. Test the implementation on our infrastructure for correctness.
3. Test the implementation on the Google infrastructure for scalability.
4. Think about a better approach to replace ODR hashing if we see more pathological problems.

ChuanqiXu9 · 2024-02-19T01:56:41Z

Yeah, no problem at all. This is what I want at the high level too. What I am confused about is the status of D153003. If it is true that we've described the problem completely in the review page, then c31d6b4 should be a proper fix for that.

vgvassilev · 2024-02-19T06:43:40Z

I can try it on our infrastructure and if it works I will remove D153003.

ilya-biryukov · 2024-02-21T15:03:28Z

Sorry for losing track of the discussion here. What is the current status here? Should we run another round of testing?

Also, I see proposals to land the new behaviour under a flag and have it off by default.
If that does not add a lot of complexity, that would definitely make testing easier on our side. Our compiler is built from revisions close to head, so we don't need to wait for the next Clang release to reap the benefits of this approach.

vgvassilev · 2024-02-21T19:50:38Z

@ilya-biryukov, this PR is not ready to test. However, I'd appreciate if you could test our baseline patch located here: https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003 on you

vgvassilev · 2024-02-21T19:51:07Z

@ChuanqiXu9, you were right. We seem to not need D153003 and I have removed it from the branch.

ChuanqiXu9 · 2024-02-22T01:54:56Z

Yeah, then let's create a new branch (the existing [D41416_D153003](https://github.com/llvm/llvm-project/tree/users/vgvassilev/D41416_D153003) does not sound like a good name) and a PR for that. Then I can start a stacked PR on top of it.

ChuanqiXu9 · 2024-02-27T07:26:02Z

Oh, I didn't notice you'd removed D153003 already. But the branch name does not look good, so I've created a PR in #83108.

ChuanqiXu9 · 2024-02-27T08:30:02Z

> That'd mean that we "just" need to replace LazySpecializationInfo *LazySpecializations = nullptr; with the on-disk hash table approach. That would probably require centralizing that logic somewhere in the ASTReader (the way this PR does) but with minimal changes wrt D41416.

@vgvassilev Let me double-check your advice. Are you suggesting that, in D41416, we replace LazySpecializationInfo *LazySpecializations with an on-disk hash map from an integer (the hash value of the template args) to LazySpecializationInfo, instead of to another integer (a DeclID, as in my patch)?
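
To make the two layouts in this question concrete, here is a small sketch of both value types. The field set of `LazySpecializationInfo` below is an assumption made for illustration and may not match D41416 exactly; `InfoTable`, `IDTable`, and `toIDTable` are hypothetical names, and `std::unordered_map` stands in for the on-disk hash table.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using DeclID = std::uint32_t;

// Assumed shape of the per-specialization record; illustrative only.
struct LazySpecializationInfo {
  DeclID ID = 0;
  unsigned ODRHash = 0;
  bool IsPartial = false;
};

// Option A: hash of template args -> full info records.
using InfoTable =
    std::unordered_map<unsigned, std::vector<LazySpecializationInfo>>;

// Option B: hash of template args -> bare declaration IDs.
using IDTable = std::unordered_map<unsigned, std::vector<DeclID>>;

// Project option A onto option B by keeping only the declaration IDs.
inline IDTable toIDTable(const InfoTable &Infos) {
  IDTable Out;
  for (const auto &[Hash, Bucket] : Infos)
    for (const LazySpecializationInfo &I : Bucket) {
      // In option A the key is repeated inside every record of the bucket.
      assert(I.ODRHash == Hash && "bucket key must match the stored hash");
      Out[Hash].push_back(I.ID);
    }
  return Out;
}
```

In this model the hash stored in each record duplicates the table key, so option B carries the same lookup information in less space, while option A keeps extra per-entry data (here only `IsPartial`) available without deserializing the declaration.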

Following up on #83108. This follows the suggestion in #76774 (comment) literally: it introduces OnDiskHashTable for specializations based on D41416. Note that I deliberately didn't polish this patch, to keep the diff from D41416 small and make it easier to review. I'll make the polishing patch later, so that we can focus on what we're doing in this patch and on the style in the next one.

iains

I am happy to defer to @vgvassilev et al. on this one.

ChuanqiXu9 · 2024-04-25T08:16:48Z

Given that we're pursuing the #83237 series, I'll close this one.

- …zations when looking for one.
- fmt
- [Serialization] Introduce OnDiskHashTable for specializations
- [Serialization] Code cleanups and polish 83233
- fmt
- load specializations before writing specialization decls
- address comments
- Revert "load specializations before writing specialization decls" (This reverts commit 61c451d.)
- Do not omit data from imported modules with same key
- Handle merging spec info manually

ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label Jan 3, 2024

ChuanqiXu9 requested review from zygoloid, iains, vgvassilev and dwblaikie January 3, 2024 03:26

ChuanqiXu9 self-assigned this Jan 3, 2024

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Jan 3, 2024

ChuanqiXu9 force-pushed the LoadSpecLazily branch from f419166 to af6f8ca Compare January 3, 2024 03:46

ChuanqiXu9 force-pushed the LoadSpecLazily branch from af6f8ca to 79cefc9 Compare January 3, 2024 03:51

vgvassilev reviewed Jan 3, 2024

ChuanqiXu9 force-pushed the LoadSpecLazily branch from 79cefc9 to 50fd47f Compare January 9, 2024 06:56

ChuanqiXu9 mentioned this pull request Jan 9, 2024

[Serialization] Load Specialization Lazily (2/2) #77417

Closed

ChuanqiXu9 changed the title ~~[Serialization] Load Specializations Lazily (1/2)~~ [Serialization] Load Specializations Lazily Jan 9, 2024

vgvassilev reviewed Jan 9, 2024

ChuanqiXu9 force-pushed the LoadSpecLazily branch from fd2d753 to 43648e5 Compare January 11, 2024 01:54

ChuanqiXu9 mentioned this pull request Feb 27, 2024

D41416: [modules] [pch] Do not deserialize all lazy template specializations when looking for one. #83108

Open

ChuanqiXu9 mentioned this pull request Feb 28, 2024

[Serialization] Introduce OnDiskHashTable for specializations #83233

Open

iains reviewed Mar 29, 2024

ChuanqiXu9 closed this Apr 25, 2024

	virtual void LoadExternalSpecs(const Decl *D,
	virtual void FindExternalSpecialization(const Decl *D,
