Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][DependencyScanner] Remove unused -ivfsoverlay files #73734

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

Bigcheese
Copy link
Contributor

-ivfsoverlay files are unused when building most modules. Enable
removing them by,

  • adding a way to visit the filesystem tree with extensible RTTI to
    access each RedirectingFileSystem.
  • Adding tracking to RedirectingFileSystem to record when it
    actually redirects a file access.
  • Storing this information in each PCM.

Usage tracking is disabled during implicit modulemap search as this
ends up touching a lot of files that aren't actually used. The used
files are later touched by other parts of Clang so relevant VFS
overlays get marked as used.

@Bigcheese Bigcheese self-assigned this Nov 29, 2023
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules llvm:support labels Nov 29, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 29, 2023

@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-llvm-support

Author: Michael Spencer (Bigcheese)

Changes

-ivfsoverlay files are unused when building most modules. Enable
removing them by,

  • adding a way to visit the filesystem tree with extensible RTTI to
    access each RedirectingFileSystem.
  • Adding tracking to RedirectingFileSystem to record when it
    actually redirects a file access.
  • Storing this information in each PCM.

Usage tracking is disabled during implicit modulemap search as this
ends up touching a lot of files that aren't actually used. The used
files are later touched by other parts of Clang so relevant VFS
overlays get marked as used.


Patch is 34.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73734.diff

18 Files Affected:

  • (modified) clang/include/clang/Basic/FileManager.h (+4)
  • (modified) clang/include/clang/Lex/HeaderSearch.h (+6)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+3)
  • (modified) clang/include/clang/Serialization/ModuleFile.h (+3)
  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+5-1)
  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h (+4-1)
  • (modified) clang/lib/Basic/FileManager.cpp (+7)
  • (modified) clang/lib/Frontend/CompilerInvocation.cpp (+1)
  • (modified) clang/lib/Lex/HeaderSearch.cpp (+20)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+14-1)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+18-6)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp (+5-1)
  • (modified) clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp (+53-21)
  • (added) clang/test/ClangScanDeps/optimize-vfs.m (+193)
  • (modified) clang/tools/clang-scan-deps/ClangScanDeps.cpp (+1)
  • (modified) llvm/include/llvm/Support/VirtualFileSystem.h (+49-8)
  • (modified) llvm/lib/Support/VirtualFileSystem.cpp (+26)
  • (modified) llvm/unittests/Support/VirtualFileSystemTest.cpp (+84)
diff --git a/clang/include/clang/Basic/FileManager.h b/clang/include/clang/Basic/FileManager.h
index 56cb093dd8c376f..997c17a0ffcfcce 100644
--- a/clang/include/clang/Basic/FileManager.h
+++ b/clang/include/clang/Basic/FileManager.h
@@ -248,6 +248,10 @@ class FileManager : public RefCountedBase<FileManager> {
     return FS;
   }
 
+  /// Enable or disable tracking of VFS usage. Used to not track full header
+  /// search and implicit modulemap lookup.
+  void trackVFSUsage(bool Active);
+
   void setVirtualFileSystem(IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS) {
     this->FS = std::move(FS);
   }
diff --git a/clang/include/clang/Lex/HeaderSearch.h b/clang/include/clang/Lex/HeaderSearch.h
index a100598c80155fa..7c21796b0460238 100644
--- a/clang/include/clang/Lex/HeaderSearch.h
+++ b/clang/include/clang/Lex/HeaderSearch.h
@@ -579,6 +579,12 @@ class HeaderSearch {
   /// Note: implicit module maps don't contribute to entry usage.
   std::vector<bool> computeUserEntryUsage() const;
 
+  /// Determine which HeaderSearchOptions::VFSOverlayFiles have been
+  /// successfully used so far and mark their index with 'true' in the resulting
+  /// bit vector.
+  /// Note: implicit module maps don't contribute to entry usage.
+  std::vector<bool> computeVFSUsage() const;
+
   /// This method returns a HeaderMap for the specified
   /// FileEntry, uniquing them through the 'HeaderMaps' datastructure.
   const HeaderMap *CreateHeaderMap(FileEntryRef FE);
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index fdd64f2abbe9375..f4abfe6f560664f 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -405,6 +405,9 @@ enum UnhashedControlBlockRecordTypes {
 
   /// Record code for the indices of used header search entries.
   HEADER_SEARCH_ENTRY_USAGE,
+
+  /// Record code for the indices of used VFSs.
+  VFS_USAGE,
 };
 
 /// Record code for extension blocks.
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 48be8676cc26a4c..a2d49507a579427 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -189,6 +189,9 @@ class ModuleFile {
   /// The bit vector denoting usage of each header search entry (true = used).
   llvm::BitVector SearchPathUsage;
 
+  /// The bit vector denoting usage of each VFS entry (true = used).
+  llvm::BitVector VFSUsage;
+
   /// Whether this module has been directly imported by the
   /// user.
   bool DirectlyImported = false;
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index 9a2aea5d6efa170..846fdc7253977f9 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -280,8 +280,12 @@ class EntryRef {
 /// This is not a thread safe VFS. A single instance is meant to be used only in
 /// one thread. Multiple instances are allowed to service multiple threads
 /// running in parallel.
-class DependencyScanningWorkerFilesystem : public llvm::vfs::ProxyFileSystem {
+class DependencyScanningWorkerFilesystem
+    : public llvm::RTTIExtends<DependencyScanningWorkerFilesystem,
+                               llvm::vfs::ProxyFileSystem> {
 public:
+  static const char ID;
+
   DependencyScanningWorkerFilesystem(
       DependencyScanningFilesystemSharedCache &SharedCache,
       IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS);
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
index dcdf1c171f6d731..e953e2dee48fc0a 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningService.h
@@ -54,7 +54,10 @@ enum class ScanningOptimizations {
   /// Remove warnings from system modules.
   SystemWarnings = 2,
 
-  LLVM_MARK_AS_BITMASK_ENUM(SystemWarnings),
+  /// Remove unused -ivfsoverlay arguments.
+  VFS = 4,
+
+  LLVM_MARK_AS_BITMASK_ENUM(VFS),
   All = HeaderSearch | SystemWarnings,
   Default = All
 };
diff --git a/clang/lib/Basic/FileManager.cpp b/clang/lib/Basic/FileManager.cpp
index d16626b10652136..86f90b26145e9dc 100644
--- a/clang/lib/Basic/FileManager.cpp
+++ b/clang/lib/Basic/FileManager.cpp
@@ -387,6 +387,13 @@ llvm::Expected<FileEntryRef> FileManager::getSTDIN() {
   return *STDIN;
 }
 
+void FileManager::trackVFSUsage(bool Active) {
+  FS->visit([Active](llvm::vfs::FileSystem &FileSys) {
+    if (auto *RFS = dyn_cast<llvm::vfs::RedirectingFileSystem>(&FileSys))
+      RFS->setUsageTrackingActive(Active);
+  });
+}
+
 const FileEntry *FileManager::getVirtualFile(StringRef Filename, off_t Size,
                                              time_t ModificationTime) {
   return &getVirtualFileRef(Filename, Size, ModificationTime).getFileEntry();
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index 5a8e4cf9843de2b..f1787e152fd056a 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -4739,6 +4739,7 @@ std::string CompilerInvocation::getModuleHash() const {
   if (hsOpts.ModulesStrictContextHash) {
     HBuilder.addRange(hsOpts.SystemHeaderPrefixes);
     HBuilder.addRange(hsOpts.UserEntries);
+    HBuilder.addRange(hsOpts.VFSOverlayFiles);
 
     const DiagnosticOptions &diagOpts = getDiagnosticOpts();
 #define DIAGOPT(Name, Bits, Default) HBuilder.add(diagOpts.Name);
diff --git a/clang/lib/Lex/HeaderSearch.cpp b/clang/lib/Lex/HeaderSearch.cpp
index cf1c0cc5284f316..78f9f014dc06fe1 100644
--- a/clang/lib/Lex/HeaderSearch.cpp
+++ b/clang/lib/Lex/HeaderSearch.cpp
@@ -142,6 +142,21 @@ std::vector<bool> HeaderSearch::computeUserEntryUsage() const {
   return UserEntryUsage;
 }
 
+std::vector<bool> HeaderSearch::computeVFSUsage() const {
+  std::vector<bool> VFSUsage;
+  llvm::vfs::FileSystem &RootFS = FileMgr.getVirtualFileSystem();
+  // TODO: This only works if the `RedirectingFileSystem`s were all created by
+  //       `createVFSFromOverlayFiles`.
+  RootFS.visit([&](const llvm::vfs::FileSystem &FS) {
+    if (auto *RFS = dyn_cast<const llvm::vfs::RedirectingFileSystem>(&FS)) {
+      VFSUsage.push_back(RFS->hasBeenUsed());
+    }
+  });
+  // VFS visit order is the opposite of VFSOverlayFiles order.
+  std::reverse(VFSUsage.begin(), VFSUsage.end());
+  return VFSUsage;
+}
+
 /// CreateHeaderMap - This method returns a HeaderMap for the specified
 /// FileEntry, uniquing them through the 'HeaderMaps' datastructure.
 const HeaderMap *HeaderSearch::CreateHeaderMap(FileEntryRef FE) {
@@ -303,6 +318,10 @@ Module *HeaderSearch::lookupModule(StringRef ModuleName, StringRef SearchName,
                                    bool AllowExtraModuleMapSearch) {
   Module *Module = nullptr;
 
+  // Modulemap search can touch lots of files that aren't actually relavant to
+  // if a VFS is used or not.
+  FileMgr.trackVFSUsage(false);
+
   // Look through the various header search paths to load any available module
   // maps, searching for a module map that describes this module.
   for (DirectoryLookup &Dir : search_dir_range()) {
@@ -371,6 +390,7 @@ Module *HeaderSearch::lookupModule(StringRef ModuleName, StringRef SearchName,
       break;
   }
 
+  FileMgr.trackVFSUsage(true);
   return Module;
 }
 
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index f22da838424b415..4cee84ca00e5062 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -4985,7 +4985,7 @@ ASTReader::ASTReadResult ASTReader::readUnhashedControlBlockImpl(
         F->PragmaDiagMappings.insert(F->PragmaDiagMappings.end(),
                                      Record.begin(), Record.end());
       break;
-    case HEADER_SEARCH_ENTRY_USAGE:
+    case HEADER_SEARCH_ENTRY_USAGE: {
       if (!F)
         break;
       unsigned Count = Record[0];
@@ -4997,6 +4997,19 @@ ASTReader::ASTReadResult ASTReader::readUnhashedControlBlockImpl(
             F->SearchPathUsage[I] = true;
       break;
     }
+    case VFS_USAGE: {
+      if (!F)
+        break;
+      unsigned Count = Record[0];
+      const char *Byte = Blob.data();
+      F->VFSUsage = llvm::BitVector(Count, false);
+      for (unsigned I = 0; I < Count; ++Byte)
+        for (unsigned Bit = 0; Bit < 8 && I < Count; ++Bit, ++I)
+          if (*Byte & (1 << Bit))
+            F->VFSUsage[I] = true;
+      break;
+    }
+    }
   }
 }
 
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 6df815234e235fb..bc26e7c68720a40 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -1265,18 +1265,30 @@ void ASTWriter::writeUnhashedControlBlock(Preprocessor &PP,
     WritePragmaDiagnosticMappings(Diags, /* isModule = */ WritingModule);
 
   // Header search entry usage.
-  auto HSEntryUsage = PP.getHeaderSearchInfo().computeUserEntryUsage();
-  auto Abbrev = std::make_shared<BitCodeAbbrev>();
-  Abbrev->Add(BitCodeAbbrevOp(HEADER_SEARCH_ENTRY_USAGE));
-  Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // Number of bits.
-  Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));      // Bit vector.
-  unsigned HSUsageAbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
   {
+    auto HSEntryUsage = PP.getHeaderSearchInfo().computeUserEntryUsage();
+    auto Abbrev = std::make_shared<BitCodeAbbrev>();
+    Abbrev->Add(BitCodeAbbrevOp(HEADER_SEARCH_ENTRY_USAGE));
+    Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // Number of bits.
+    Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));      // Bit vector.
+    unsigned HSUsageAbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
     RecordData::value_type Record[] = {HEADER_SEARCH_ENTRY_USAGE,
                                        HSEntryUsage.size()};
     Stream.EmitRecordWithBlob(HSUsageAbbrevCode, Record, bytes(HSEntryUsage));
   }
 
+  // VFS usage.
+  {
+    auto VFSUsage = PP.getHeaderSearchInfo().computeVFSUsage();
+    auto Abbrev = std::make_shared<BitCodeAbbrev>();
+    Abbrev->Add(BitCodeAbbrevOp(VFS_USAGE));
+    Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // Number of bits.
+    Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));      // Bit vector.
+    unsigned VFSUsageAbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
+    RecordData::value_type Record[] = {VFS_USAGE, VFSUsage.size()};
+    Stream.EmitRecordWithBlob(VFSUsageAbbrevCode, Record, bytes(VFSUsage));
+  }
+
   // Leave the options block.
   Stream.ExitBlock();
   UnhashedControlBlockRange.second = Stream.GetCurrentBitNo() >> 3;
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 3e53c8fc5740875..bea52c906cab856 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -194,7 +194,9 @@ static bool shouldCacheStatFailures(StringRef Filename) {
 DependencyScanningWorkerFilesystem::DependencyScanningWorkerFilesystem(
     DependencyScanningFilesystemSharedCache &SharedCache,
     IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS)
-    : ProxyFileSystem(std::move(FS)), SharedCache(SharedCache),
+    : llvm::RTTIExtends<DependencyScanningWorkerFilesystem,
+                        llvm::vfs::ProxyFileSystem>(std::move(FS)),
+      SharedCache(SharedCache),
       WorkingDirForCacheLookup(llvm::errc::invalid_argument) {
   updateWorkingDirForCacheLookup();
 }
@@ -379,3 +381,5 @@ void DependencyScanningWorkerFilesystem::updateWorkingDirForCacheLookup() {
   assert(!WorkingDirForCacheLookup ||
          llvm::sys::path::is_absolute_gnu(*WorkingDirForCacheLookup));
 }
+
+const char DependencyScanningWorkerFilesystem::ID = 0;
diff --git a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
index 9099c18391e4d29..ac981a092d62dbe 100644
--- a/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
+++ b/clang/lib/Tooling/DependencyScanning/ModuleDepCollector.cpp
@@ -31,25 +31,55 @@ const std::vector<std::string> &ModuleDeps::getBuildArguments() {
 
 static void optimizeHeaderSearchOpts(HeaderSearchOptions &Opts,
                                      ASTReader &Reader,
-                                     const serialization::ModuleFile &MF) {
-  // Only preserve search paths that were used during the dependency scan.
-  std::vector<HeaderSearchOptions::Entry> Entries = Opts.UserEntries;
-  Opts.UserEntries.clear();
-
-  llvm::BitVector SearchPathUsage(Entries.size());
-  llvm::DenseSet<const serialization::ModuleFile *> Visited;
-  std::function<void(const serialization::ModuleFile *)> VisitMF =
-      [&](const serialization::ModuleFile *MF) {
-        SearchPathUsage |= MF->SearchPathUsage;
-        Visited.insert(MF);
-        for (const serialization::ModuleFile *Import : MF->Imports)
-          if (!Visited.contains(Import))
-            VisitMF(Import);
-      };
-  VisitMF(&MF);
-
-  for (auto Idx : SearchPathUsage.set_bits())
-    Opts.UserEntries.push_back(Entries[Idx]);
+                                     const serialization::ModuleFile &MF,
+                                     ScanningOptimizations OptimizeArgs) {
+  if (any(OptimizeArgs & ScanningOptimizations::HeaderSearch)) {
+    // Only preserve search paths that were used during the dependency scan.
+    std::vector<HeaderSearchOptions::Entry> Entries;
+    std::swap(Opts.UserEntries, Entries);
+
+    llvm::BitVector SearchPathUsage(Entries.size());
+    llvm::DenseSet<const serialization::ModuleFile *> Visited;
+    std::function<void(const serialization::ModuleFile *)> VisitMF =
+        [&](const serialization::ModuleFile *MF) {
+          SearchPathUsage |= MF->SearchPathUsage;
+          Visited.insert(MF);
+          for (const serialization::ModuleFile *Import : MF->Imports)
+            if (!Visited.contains(Import))
+              VisitMF(Import);
+        };
+    VisitMF(&MF);
+
+    if (SearchPathUsage.size() != Entries.size())
+      llvm::report_fatal_error(
+          "Inconsistent search path options between modules detected");
+
+    for (auto Idx : SearchPathUsage.set_bits())
+      Opts.UserEntries.push_back(std::move(Entries[Idx]));
+  }
+  if (any(OptimizeArgs & ScanningOptimizations::VFS)) {
+    std::vector<std::string> VFSOverlayFiles;
+    std::swap(Opts.VFSOverlayFiles, VFSOverlayFiles);
+
+    llvm::BitVector VFSUsage(VFSOverlayFiles.size());
+    llvm::DenseSet<const serialization::ModuleFile *> Visited;
+    std::function<void(const serialization::ModuleFile *)> VisitMF =
+        [&](const serialization::ModuleFile *MF) {
+          VFSUsage |= MF->VFSUsage;
+          Visited.insert(MF);
+          for (const serialization::ModuleFile *Import : MF->Imports)
+            if (!Visited.contains(Import))
+              VisitMF(Import);
+        };
+    VisitMF(&MF);
+
+    if (VFSUsage.size() != VFSOverlayFiles.size())
+      llvm::report_fatal_error(
+          "Inconsistent -ivfsoverlay options between modules detected");
+
+    for (auto Idx : VFSUsage.set_bits())
+      Opts.VFSOverlayFiles.push_back(std::move(VFSOverlayFiles[Idx]));
+  }
 }
 
 static void optimizeDiagnosticOpts(DiagnosticOptions &Opts,
@@ -551,9 +581,11 @@ ModuleDepCollectorPP::handleTopLevelModule(const Module *M) {
   CowCompilerInvocation CI =
       MDC.getInvocationAdjustedForModuleBuildWithoutOutputs(
           MD, [&](CowCompilerInvocation &BuildInvocation) {
-            if (any(MDC.OptimizeArgs & ScanningOptimizations::HeaderSearch))
+            if (any(MDC.OptimizeArgs & (ScanningOptimizations::HeaderSearch |
+                                        ScanningOptimizations::VFS)))
               optimizeHeaderSearchOpts(BuildInvocation.getMutHeaderSearchOpts(),
-                                       *MDC.ScanInstance.getASTReader(), *MF);
+                                       *MDC.ScanInstance.getASTReader(), *MF,
+                                       MDC.OptimizeArgs);
             if (any(MDC.OptimizeArgs & ScanningOptimizations::SystemWarnings))
               optimizeDiagnosticOpts(
                   BuildInvocation.getMutDiagnosticOpts(),
diff --git a/clang/test/ClangScanDeps/optimize-vfs.m b/clang/test/ClangScanDeps/optimize-vfs.m
new file mode 100644
index 000000000000000..20c97956087d2d8
--- /dev/null
+++ b/clang/test/ClangScanDeps/optimize-vfs.m
@@ -0,0 +1,193 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+// RUN: sed -e "s|DIR|%/t|g" %t/build/compile-commands.json.in > %t/build/compile-commands.json
+// RUN: sed -e "s|DIR|%/t|g" %t/build/vfs.yaml.in > %t/build/vfs.yaml
+// RUN: sed -e "s|DIR|%/t|g" %t/build/unused-vfs.yaml.in > %t/build/unused-vfs.yaml
+// RUN: sed -e "s|DIR|%/t|g" %t/build/unused-vfs.yaml.in > %t/build/unused2-vfs.yaml
+// RUN: clang-scan-deps -compilation-database %t/build/compile-commands.json \
+// RUN:   -j 1 -format experimental-full --optimize-args=vfs,header-search > %t/deps.db
+// RUN: cat %t/deps.db | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t
+
+// Check that unused -ivfsoverlay arguments are removed, and that used ones are
+// not.
+
+// CHECK:      {
+// CHECK-NEXT:   "modules": [
+// CHECK-NEXT:     {
+// CHECK-NEXT:       "clang-module-deps": [],
+// CHECK-NEXT:       "clang-modulemap-file": "[[PREFIX]]/modules/A/module.modulemap",
+// CHECK-NEXT:       "command-line": [
+// CHECK-NOT:          "build/unused-vfs.yaml"
+// CHECK:              "-ivfsoverlay"
+// CHECK-NEXT:         "build/vfs.yaml"
+// CHECK-NOT:          "build/unused-vfs.yaml"
+// CHECK:            ],
+// CHECK-NEXT:       "context-hash": "{{.*}}",
+// CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/build/A.h",
+// CHECK-NEXT:         "[[PREFIX]]/build/module.modulemap"
+// CHECK-NEXT:       ],
+// CHECK-NEXT:       "name": "A"
+// CHECK-NEXT:     },
+// CHECK-NEXT:     {
+// CHECK-NEXT:       "clang-module-deps": [],
+// CHECK-NEXT:       "clang-modulemap-file": "[[PREFIX]]/modules/B/module.modulemap",
+// CHECK-NEXT:       "command-line": [
+// CHECK-NOT:          "-ivfsoverlay"
+// CHECK:            ],
+// CHECK-NEXT:       "context-hash": "{{.*}}",
+// CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/modules/B/B.h",
+// CHECK-NEXT:         "[[PREFIX]]/modules/B/module.modulemap"
+// CHECK-NEXT:       ],
+// CHECK-NEXT:       "name": "B"
+// CHECK-NEXT:     },
+// CHECK-NEXT:     {
+// CHECK-NEXT:       "clang-module-deps": [
+// CHECK-NEXT:         {
+// CHECK-NEXT:           "context-hash": "{{.*}}",
+// CHECK-NEXT:           "module-name": "B"
+// CHECK-NEXT:         }
+// CHECK-NEXT:       ],
+// CHECK-NEXT:       "clang-modulemap-file": "[[PREFIX]]/modules/C/module.modulemap",
+// CHECK-NEXT:       "command-line": [
+// CHECK-NOT:          "-ivfsoverlay"
+// CHECK:            ],
+// CHECK-NEXT:       "context-hash": "{{.*}}",
+// CHECK-NEXT:       "file-deps": [
+// CHECK-NEXT:         "[[PREFIX]]/modules/B/module.modulemap",
+// CHECK-NEXT:         "[[PREFIX]]/modules/C/C.h",
+// CHECK-NEXT:         "[[PREFIX]]/modules/C/module.modulemap"
+// CHECK-NEXT:       ],
+// CHECK-NEXT:       "name": "C"
+// CHECK-NEXT:     }
+// CHECK-NEXT:   ],
+// CHECK-NEXT:   "translation-units": [
+// CHECK:        ]
+// CHECK:      }
+
+//--- build/compile-commands.json.in
+
+[
+{
+  "directory": "DIR",
+  "command": "clang -c DIR/0.m -Imodules/A -Imodules/B -fmodules -fmodules-cache-path=DIR/module-cache -fimplicit-module-maps -ivfsoverlay build/unused-vfs.yaml -ivfsoverlay build/unused2-vfs.yaml -ivfsoverlay build/vfs.yaml",
+  "file": "DIR/0.m"
+},
+{
+  "directory": "DIR",
+  "command": "clang -c DIR/A.m -Imodules/A -Imodules/B -fmodules -fmodules-cache-path=DIR/module-cache -fimplicit-module-maps -ivfsoverlay build/vfs.yaml -ivfsoverlay build/unused-vfs.yaml",
+  "file": "DIR/A.m"
+},
+{
+  "directory": "DIR",
+  "command": "clang -c DIR/B.m -Imodules/B -fmodules -fmodules-cache-path=DIR/module-cache -fimplicit-module-maps -ivfsoverlay build/unused-vfs.yaml -ivfs...
[truncated]

std::vector<bool> HeaderSearch::computeVFSUsage() const {
std::vector<bool> VFSUsage;
llvm::vfs::FileSystem &RootFS = FileMgr.getVirtualFileSystem();
// TODO: This only works if the `RedirectingFileSystem`s were all created by
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could all VFS created via createVFSFromOverlayFiles be marked with a flag and then assert here if the bit is not set? Or does it not matter and we only use the optimization potential?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could set a bit, and then just skip over VFSs without the bit. That would work even if people manually pass in a VFS with an extra RedirectingFileSystem, and we have a check later that would fire if createVFSFromOverlayFiles wasn't used at all.

Copy link
Collaborator

@benlangmuir benlangmuir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd to me that tracking is enabled by default. I would have expected tracking be off by default and enabled explicitly for scanning. Similarly, in the modulemap case it could save-and-restore rather than enable the tracking if it was previously off.

clang/lib/Serialization/ASTReader.cpp Outdated Show resolved Hide resolved
llvm/include/llvm/Support/VirtualFileSystem.h Outdated Show resolved Hide resolved
@Bigcheese
Copy link
Contributor Author

It's odd to me that tracking is enabled by default. I would have expected tracking be off by default and enabled explicitly for scanning. Similarly, in the modulemap case it could save-and-restore rather than enable the tracking if it was previously off.

It has to be on by default because of PCH. You never know if a PCM will be used for scanning.

@jansvoboda11
Copy link
Contributor

We set RedirectingFileSystem::HasBeenUsed to true while implicitly compiling one module, but then hand off the same VFS object to implicit compile of another module. This will cause all modules discovered later to incorrectly inherit that bit from previous modules. This will also introduce non-determinism based on the TU that reached a module first (and which modules it imported on the way). Here's a test case:

// This test checks that VFS usage doesn't leak between modules.

// RUN: rm -rf %t && split-file %s %t
// RUN: sed -e "s|DIR|%/t|g" %t/build/cdb.json.in > %t/build/cdb.json
// RUN: sed -e "s|DIR|%/t|g" %t/build/vfs.yaml.in > %t/build/vfs.yaml
// RUN: clang-scan-deps -compilation-database %t/build/cdb.json \
// RUN:   -format experimental-full --optimize-args=vfs > %t/deps.json
// RUN: cat %t/deps.json | sed 's:\\\\\?:/:g' | FileCheck %s -DPREFIX=%/t

// CHECK:      {
// CHECK-NEXT:   "modules": [
// CHECK-NEXT:     {
// CHECK-NEXT:       "clang-module-deps": [
// CHECK-NEXT:         {
// CHECK-NEXT:           "context-hash": "{{.*}}",
// CHECK-NEXT:           "module-name": "B"
// CHECK-NEXT:         },
// CHECK-NEXT:         {
// CHECK-NEXT:           "context-hash": "{{.*}}",
// CHECK-NEXT:           "module-name": "C"
// CHECK-NEXT:         }
// CHECK-NEXT:       ],
// CHECK-NEXT:       "clang-modulemap-file": "[[PREFIX]]/moduleA/module.modulemap",
// CHECK-NEXT:       "command-line": [
// Module A needs the VFS overlay because its dependency, module B, needs it.
// CHECK:              "-ivfsoverlay"
// CHECK-NEXT:         "[[PREFIX]]/build/vfs.yaml"
// CHECK:            ],
// CHECK-NEXT:       "context-hash": "{{.*}}",
// CHECK-NEXT:       "file-deps": [
// CHECK:            ],
// CHECK-NEXT:       "name": "A"
// CHECK-NEXT:     },
// CHECK-NEXT:     {
// CHECK-NEXT:       "clang-module-deps": [],
// CHECK-NEXT:       "clang-modulemap-file": "[[PREFIX]]/moduleB/module.modulemap",
// CHECK-NEXT:       "command-line": [
// Module B needs the VFS overlay because it provides the header referred to by the module map.
// CHECK:              "-ivfsoverlay"
// CHECK-NEXT:         "[[PREFIX]]/build/vfs.yaml"
// CHECK:            ],
// CHECK-NEXT:       "context-hash": "{{.*}}",
// CHECK-NEXT:       "file-deps": [
// CHECK:            ],
// CHECK-NEXT:       "name": "B"
// CHECK-NEXT:     },
// CHECK-NEXT:     {
// CHECK-NEXT:       "clang-module-deps": [],
// CHECK-NEXT:       "clang-modulemap-file": "[[PREFIX]]/moduleC/module.modulemap",
// CHECK-NEXT:       "command-line": [
// Module C doesn't need the VFS overlay.
// CHECK-NOT:          "-ivfsoverlay"
// CHECK:            ],
// CHECK-NEXT:       "context-hash": "{{.*}}",
// CHECK-NEXT:       "file-deps": [
// CHECK:            ],
// CHECK-NEXT:       "name": "C"
// CHECK-NEXT:     }
// CHECK-NEXT:   ],
// CHECK-NEXT:   "translation-units": [
// CHECK:        ]
// CHECK:      }

//--- build/cdb.json.in
[{
  "directory": "DIR",
  "command": "clang -c DIR/tu.m -I DIR/moduleA -I DIR/moduleB -I DIR/moduleC -fmodules -fmodules-cache-path=DIR/cache -fimplicit-module-maps -ivfsoverlay DIR/build/vfs.yaml",
  "file": "DIR/tu.m"
}]

//--- build/vfs.yaml.in
{
  "version": 0,
  "case-sensitive": "false",
  "roots": [
    {
      "contents": [
        {
          "external-contents": "DIR/build/B.h",
          "name": "B.h",
          "type": "file"
        }
      ],
      "name": "DIR/moduleB",
      "type": "directory"
    }
  ]
}

//--- tu.m
@import A;

//--- moduleA/module.modulemap
module A { header "A.h" }
//--- moduleA/A.h
@import B;
@import C;

//--- moduleB/module.modulemap
module B { header "B.h" }
//--- build/B.h

//--- moduleC/module.modulemap
module C { header "C.h" }
//--- moduleC/C.h

@jansvoboda11
Copy link
Contributor

jansvoboda11 commented Nov 30, 2023

I'm not thrilled by the chosen implementation strategy. Disabling tracking during parts of header search does not sound obviously correct to me. Module map search can have subtle side-effects, and I wouldn't be surprised if this feature was affected by one. That said, I haven't been able to come up with a case where this would yield incorrect results.

An alternative implementation I'd be more confident in would be to grab the set of files we know affect the compilation (built on top of ASTWriter::collectNonAffectingInputFiles()) and call FileManager::getOptionalFileRef() with their FileEntryRef::getNameAsRequested().

Copy link

github-actions bot commented Jan 25, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Bigcheese
Copy link
Contributor Author

I've updated the patch to use the alternative implementation Jan suggested.

Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good! I have just a couple of minor comments.

clang/lib/Serialization/ASTWriter.cpp Outdated Show resolved Hide resolved
clang/lib/Lex/HeaderSearch.cpp Outdated Show resolved Hide resolved
clang/lib/Lex/HeaderSearch.cpp Show resolved Hide resolved
clang/include/clang/Serialization/ASTReader.h Outdated Show resolved Hide resolved
clang/lib/Serialization/ASTReader.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@jansvoboda11 jansvoboda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working through this!

clang/include/clang/Lex/HeaderSearch.h Outdated Show resolved Hide resolved
clang/lib/Serialization/ASTReader.cpp Outdated Show resolved Hide resolved
`-ivfsoverlay` files are unused when building most modules. Enable
removing them by,
* adding a way to visit the filesystem tree with extensible RTTI to
  access each `RedirectingFileSystem`.
* Adding tracking to `RedirectingFileSystem` to record when it
  actually redirects a file access.
* Storing this information in each PCM.

Usage tracking is only enabled when iterating over the source manager
and affecting modulemaps. Here each path is stated to cause an access.
During scanning these stats all hit the cache.
@Bigcheese Bigcheese merged commit 7847e44 into llvm:main Jan 30, 2024
3 of 4 checks passed
Bigcheese added a commit to Bigcheese/llvm-project that referenced this pull request Feb 1, 2024
`-ivfsoverlay` files are unused when building most modules. Enable
removing them by,
* adding a way to visit the filesystem tree with extensible RTTI to
  access each `RedirectingFileSystem`.
* Adding tracking to `RedirectingFileSystem` to record when it
  actually redirects a file access.
* Storing this information in each PCM.

Usage tracking is only enabled when iterating over the source manager
and affecting modulemaps. Here each path is stated to cause an access.
During scanning these stats all hit the cache.

llvm#73734
rdar://118030978
Bigcheese added a commit to apple/llvm-project that referenced this pull request Feb 7, 2024
`-ivfsoverlay` files are unused when building most modules. Enable
removing them by,
* adding a way to visit the filesystem tree with extensible RTTI to
  access each `RedirectingFileSystem`.
* Adding tracking to `RedirectingFileSystem` to record when it
  actually redirects a file access.
* Storing this information in each PCM.

Usage tracking is only enabled when iterating over the source manager
and affecting modulemaps. Here each path is stated to cause an access.
During scanning these stats all hit the cache.

llvm#73734
rdar://118030978
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category llvm:support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants