Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][modules] Adopt FileEntryRef in the HeaderFileInfo block in PCM files #67383

Merged
merged 2 commits into from
Sep 28, 2023

Conversation

jansvoboda11
Copy link
Contributor

@jansvoboda11 jansvoboda11 commented Sep 25, 2023

This patch adopts FileEntryRef in the HeaderFileInfo-writing part of ASTWriter.

First, this patch removes the loop over FileManager::VirtualFileEntries. It's redundant, since all virtual file entries are also present in SeenFileEntries and thus already in UIDToFiles.

Second, since we now no longer rely on FileEntry::getLastRef()/FileEntry::getName(), this patch takes care to establish which path gets used for each UID by picking the FileEntryRef with the most "<" name (instead of just relying on the StringMap iteration order).

Note that which FileEntry/FileEntryRef objects we pick for each UID for serialization into the llvm::OnDiskChainedHashTable doesn't really matter. The hash function only includes the file size and modification time. The file name only plays role during resolution of hash collisions, in which case it goes through FileManager and resolves to a FileEntry that gets pointer-compared with the queried FileEntry.

(Reincarnation of D143414 and D142780.)

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Sep 25, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 25, 2023

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Changes

This patch adopts FileEntryRef in the HeaderFileInfo-writing part of ASTWriter.

First, this patch removes the loop over FileManager::VirtualFileEntries. It's redundant, since all virtual file entries have a non-virtual/real counterpart that's already present in UIDToFiles.

Second, since we now no longer rely on FileEntry::getLastRef()/FileEntry::getName(), this patch takes care to establish which path gets used for each UID by picking the FileEntryRef with the most "&lt;" name (instead of just relying on the StringMap iteration order).

Note that which FileEntry/FileEntryRef objects we pick for each UID for serialization into the llvm::OnDiskChainedHashTable doesn't really matter. The hash function only includes the file size and modification time. The file name only plays role during resolution of hash collisions, in which case it goes through FileManager and resolves to a FileEntry that gets pointer-compared with the queried FileEntry.

(Reincarnation of D143414 and D142780.)


Full diff: https://github.com/llvm/llvm-project/pull/67383.diff

3 Files Affected:

  • (modified) clang/include/clang/Basic/FileManager.h (+3-3)
  • (modified) clang/lib/Basic/FileManager.cpp (+12-15)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+10-10)
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index 115558bfeee4ed5..56cb093dd8c376f 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -311,9 +311,9 @@ class FileManager : public RefCountedBase<FileManager> {
   bool makeAbsolutePath(SmallVectorImpl<char> &Path) const;
 
   /// Produce an array mapping from the unique IDs assigned to each
-  /// file to the corresponding FileEntry pointer.
-  void GetUniqueIDMapping(
-                    SmallVectorImpl<const FileEntry *> &UIDToFiles) const;
+  /// file to the corresponding FileEntryRef.
+  void
+  GetUniqueIDMapping(SmallVectorImpl<OptionalFileEntryRef> &UIDToFiles) const;
 
   /// Retrieve the canonical name for a given directory.
   ///
diff --git a/clang/lib/Basic/FileManager.cpp b/clang/lib/Basic/FileManager.cpp
index c80fbfd7433f5be..3b834610dd55ef8 100644
--- a/clang/lib/Basic/FileManager.cpp
+++ b/clang/lib/Basic/FileManager.cpp
@@ -612,24 +612,21 @@ FileManager::getNoncachedStatValue(StringRef Path,
 }
 
 void FileManager::GetUniqueIDMapping(
-    SmallVectorImpl<const FileEntry *> &UIDToFiles) const {
+    SmallVectorImpl<OptionalFileEntryRef> &UIDToFiles) const {
   UIDToFiles.clear();
   UIDToFiles.resize(NextFileUID);
 
-  // Map file entries
-  for (llvm::StringMap<llvm::ErrorOr<FileEntryRef::MapValue>,
-                       llvm::BumpPtrAllocator>::const_iterator
-           FE = SeenFileEntries.begin(),
-           FEEnd = SeenFileEntries.end();
-       FE != FEEnd; ++FE)
-    if (llvm::ErrorOr<FileEntryRef::MapValue> Entry = FE->getValue()) {
-      if (const auto *FE = Entry->V.dyn_cast<FileEntry *>())
-        UIDToFiles[FE->getUID()] = FE;
-    }
-
-  // Map virtual file entries
-  for (const auto &VFE : VirtualFileEntries)
-    UIDToFiles[VFE->getUID()] = VFE;
+  for (const auto &Entry : SeenFileEntries) {
+    // Only return existing non-virtual files.
+    if (!Entry.getValue() || !Entry.getValue()->V.is<FileEntry *>())
+      continue;
+    FileEntryRef FE(Entry);
+    // Add this file if it's the first one with the UID, or if its name is
+    // better than the existing one.
+    OptionalFileEntryRef &ExistingFE = UIDToFiles[FE.getUID()];
+    if (!ExistingFE || FE.getName() < ExistingFE->getName())
+      ExistingFE = FE;
+  }
 }
 
 StringRef FileManager::getCanonicalName(DirectoryEntryRef Dir) {
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 216ca94111e156b..1d939a3f5e7a819 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -167,23 +167,23 @@ std::set<const FileEntry *> GetAffectingModuleMaps(const Preprocessor &PP,
 
   const HeaderSearch &HS = PP.getHeaderSearchInfo();
 
-  SmallVector<const FileEntry *, 16> FilesByUID;
+  SmallVector<OptionalFileEntryRef, 16> FilesByUID;
   HS.getFileMgr().GetUniqueIDMapping(FilesByUID);
 
   if (FilesByUID.size() > HS.header_file_size())
     FilesByUID.resize(HS.header_file_size());
 
   for (unsigned UID = 0, LastUID = FilesByUID.size(); UID != LastUID; ++UID) {
-    const FileEntry *File = FilesByUID[UID];
+    OptionalFileEntryRef File = FilesByUID[UID];
     if (!File)
       continue;
 
     const HeaderFileInfo *HFI =
-        HS.getExistingFileInfo(File, /*WantExternal*/ false);
+        HS.getExistingFileInfo(*File, /*WantExternal*/ false);
     if (!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader))
       continue;
 
-    for (const auto &KH : HS.findResolvedModulesForHeader(File)) {
+    for (const auto &KH : HS.findResolvedModulesForHeader(*File)) {
       if (!KH.getModule())
         continue;
       ModulesToProcess.push_back(KH.getModule());
@@ -2003,14 +2003,14 @@ void ASTWriter::WriteHeaderSearch(const HeaderSearch &HS) {
     }
   }
 
-  SmallVector<const FileEntry *, 16> FilesByUID;
+  SmallVector<OptionalFileEntryRef, 16> FilesByUID;
   HS.getFileMgr().GetUniqueIDMapping(FilesByUID);
 
   if (FilesByUID.size() > HS.header_file_size())
     FilesByUID.resize(HS.header_file_size());
 
   for (unsigned UID = 0, LastUID = FilesByUID.size(); UID != LastUID; ++UID) {
-    const FileEntry *File = FilesByUID[UID];
+    OptionalFileEntryRef File = FilesByUID[UID];
     if (!File)
       continue;
 
@@ -2021,7 +2021,7 @@ void ASTWriter::WriteHeaderSearch(const HeaderSearch &HS) {
     // from a different module; in that case, we rely on the module(s)
     // containing the header to provide this information.
     const HeaderFileInfo *HFI =
-        HS.getExistingFileInfo(File, /*WantExternal*/!Chain);
+        HS.getExistingFileInfo(*File, /*WantExternal*/!Chain);
     if (!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader))
       continue;
 
@@ -2035,13 +2035,13 @@ void ASTWriter::WriteHeaderSearch(const HeaderSearch &HS) {
       SavedStrings.push_back(Filename.data());
     }
 
-    bool Included = PP->alreadyIncluded(File);
+    bool Included = PP->alreadyIncluded(*File);
 
     HeaderFileInfoTrait::key_type Key = {
-      Filename, File->getSize(), getTimestampForOutput(File)
+      Filename, File->getSize(), getTimestampForOutput(*File)
     };
     HeaderFileInfoTrait::data_type Data = {
-      *HFI, Included, HS.getModuleMap().findResolvedModulesForHeader(File), {}
+      *HFI, Included, HS.getModuleMap().findResolvedModulesForHeader(*File), {}
     };
     Generator.insert(Key, Data, GeneratorTrait);
     ++NumHeaderSearchEntries;

@github-actions
Copy link

github-actions bot commented Sep 25, 2023

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff feb7b1914d513c709b9e024dfed709bb889cc853 86f275550976e443f11252debd552e72a0d9e0c8 -- clang/include/clang/Basic/FileManager.h clang/lib/Basic/FileManager.cpp clang/lib/Serialization/ASTWriter.cpp
View the diff from clang-format here.
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 1d939a3f5e7a..ef9d5462f1c7 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -2021,7 +2021,7 @@ void ASTWriter::WriteHeaderSearch(const HeaderSearch &HS) {
     // from a different module; in that case, we rely on the module(s)
     // containing the header to provide this information.
     const HeaderFileInfo *HFI =
-        HS.getExistingFileInfo(*File, /*WantExternal*/!Chain);
+        HS.getExistingFileInfo(*File, /*WantExternal*/ !Chain);
     if (!HFI || (HFI->isModuleHeader && !HFI->isCompilingModuleHeader))
       continue;
 
@@ -2037,12 +2037,13 @@ void ASTWriter::WriteHeaderSearch(const HeaderSearch &HS) {
 
     bool Included = PP->alreadyIncluded(*File);
 
-    HeaderFileInfoTrait::key_type Key = {
-      Filename, File->getSize(), getTimestampForOutput(*File)
-    };
+    HeaderFileInfoTrait::key_type Key = {Filename, File->getSize(),
+                                         getTimestampForOutput(*File)};
     HeaderFileInfoTrait::data_type Data = {
-      *HFI, Included, HS.getModuleMap().findResolvedModulesForHeader(*File), {}
-    };
+        *HFI,
+        Included,
+        HS.getModuleMap().findResolvedModulesForHeader(*File),
+        {}};
     Generator.insert(Key, Data, GeneratorTrait);
     ++NumHeaderSearchEntries;
   }

@rmaz
Copy link
Contributor

rmaz commented Sep 26, 2023

LGTM, possible to come up with some test case with multiple virtual entries to the same file that would now be serialized once?

@jansvoboda11
Copy link
Contributor Author

jansvoboda11 commented Sep 26, 2023

test case with multiple virtual entries to the same file that would now be serialized once?

If you're asking because of this:

First, this patch removes the loop over FileManager::VirtualFileEntries. It's redundant, since all virtual file entries have a non-virtual/real counterpart that's already present in UIDToFiles.

then I should probably clarify that the number of file entries we serialize doesn't change. The loop I removed didn't do anything that would affect the output: it was overwriting existing entry in UIDToFiles with the same const FileEntry *.

I could create a test that checks that we only serialize one entry for each UID if you think that adds value. But I would do that as either a prep or a follow-up patch, since this PR doesn't affect that.

I think that the only thing that makes sense to test in this patch is that we now serialize the most "<" name instead of whatever getLastRef().getName() used to resolve to.

for (const auto &VFE : VirtualFileEntries)
UIDToFiles[VFE->getUID()] = VFE;
for (const auto &Entry : SeenFileEntries) {
// Only return existing non-virtual files.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"virtual" is an overloaded term here; maybe "VFS-mapped"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say "not redirected", that seems to be more commonly used.

@jansvoboda11 jansvoboda11 merged commit 6bbccc0 into llvm:main Sep 28, 2023
1 of 3 checks passed
@jansvoboda11 jansvoboda11 deleted the astwriter-hfi-file-ref branch September 28, 2023 16:28
legrosbuffle pushed a commit to legrosbuffle/llvm-project that referenced this pull request Sep 29, 2023
…n PCM files (llvm#67383)

This patch adopts `FileEntryRef` in the `HeaderFileInfo`-writing part of
`ASTWriter`.

First, this patch removes the loop over
`FileManager::VirtualFileEntries`. It's redundant, since all virtual
file entries are also present in `SeenFileEntries` and thus already in
`UIDToFiles`.

Second, since we now no longer rely on
`FileEntry::getLastRef()`/`FileEntry::getName()`, this patch takes care
to establish which path gets used for each UID by picking the
`FileEntryRef` with the most "`<`" name (instead of just relying on the
`StringMap` iteration order).

Note that which `FileEntry`/`FileEntryRef` objects we pick for each UID
for serialization into the `llvm::OnDiskChainedHashTable` doesn't really
matter. The hash function only includes the file size and modification
time. The file name only plays role during resolution of hash
collisions, in which case it goes through `FileManager` and resolves to
a `FileEntry` that gets pointer-compared with the queried `FileEntry`.

(Reincarnation of [D143414](https://reviews.llvm.org/D143414) and
[D142780](https://reviews.llvm.org/D142780).)
jansvoboda11 added a commit that referenced this pull request Sep 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants