Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][deps] Only bypass scanning VFS for the module cache #88800

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jansvoboda11
Copy link
Contributor

The scanning VFS doesn't cache stat failures of paths with no extension. This was originally implemented to avoid caching the non-existence of the modules cache directory that the modular scanner will eventually create if it does not exist.

However, this prevents caching of the non-existence of all directories and notably also header files from the standard C++ library, which can lead to sub-par performance.

This patch adds an API to the scanning VFS that allows clients to configure path prefix for which to bypass the scanning VFS and use the underlying VFS directly.

The scanning VFS doesn't cache stat failures of paths with no extension. This was originally implemented to avoid caching the non-existence of the modules cache directory that the modular scanner will eventually create if it does not exist.

However, this prevents caching of the non-existence of all directories and notably also header files from the standard C++ library, which can lead to sub-par performance.

This patch adds an API to the scanning VFS that allows clients to configure path prefix for which to bypass the scanning VFS and use the underlying VFS directly.
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 15, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 15, 2024

@llvm/pr-subscribers-clang

Author: Jan Svoboda (jansvoboda11)

Changes

The scanning VFS doesn't cache stat failures of paths with no extension. This was originally implemented to avoid caching the non-existence of the modules cache directory that the modular scanner will eventually create if it does not exist.

However, this prevents caching of the non-existence of all directories and notably also header files from the standard C++ library, which can lead to sub-par performance.

This patch adds an API to the scanning VFS that allows clients to configure path prefix for which to bypass the scanning VFS and use the underlying VFS directly.


Full diff: https://github.com/llvm/llvm-project/pull/88800.diff

4 Files Affected:

  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+13)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp (+7-9)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp (+20-12)
  • (modified) clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp (+27)
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index f7b4510d7f7beb..d12814e7c9253e 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -353,6 +353,12 @@ class DependencyScanningWorkerFilesystem
 
   std::error_code setCurrentWorkingDirectory(const Twine &Path) override;
 
+  /// Make it so that no paths bypass this VFS.
+  void resetBypassedPathPrefix() { BypassedPathPrefix.reset(); }
+  /// Set the prefix for paths that should bypass this VFS and go straight to
+  /// the underlying VFS.
+  void setBypassedPathPrefix(StringRef Prefix) { BypassedPathPrefix = Prefix; }
+
   /// Returns entry for the given filename.
   ///
   /// Attempts to use the local and shared caches first, then falls back to
@@ -450,12 +456,19 @@ class DependencyScanningWorkerFilesystem
     getUnderlyingFS().print(OS, Type, IndentLevel + 1);
   }
 
+  /// Whether this path should bypass this VFS and go straight to the underlying
+  /// VFS.
+  bool shouldBypass(StringRef Path) const;
+
   /// The global cache shared between worker threads.
   DependencyScanningFilesystemSharedCache &SharedCache;
   /// The local cache is used by the worker thread to cache file system queries
   /// locally instead of querying the global cache every time.
   DependencyScanningFilesystemLocalCache LocalCache;
 
+  /// Prefix of paths that should go straight to the underlying VFS.
+  std::optional<std::string> BypassedPathPrefix;
+
   /// The working directory to use for making relative paths absolute before
   /// using them for cache lookups.
   llvm::ErrorOr<std::string> WorkingDirForCacheLookup;
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 0cab17a3424406..4d738e4bea41a6 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -201,11 +201,8 @@ const CachedRealPath &DependencyScanningFilesystemSharedCache::CacheShard::
   return *StoredRealPath;
 }
 
-static bool shouldCacheStatFailures(StringRef Filename) {
-  StringRef Ext = llvm::sys::path::extension(Filename);
-  if (Ext.empty())
-    return false; // This may be the module cache directory.
-  return true;
+bool DependencyScanningWorkerFilesystem::shouldBypass(StringRef Path) const {
+  return BypassedPathPrefix && Path.starts_with(*BypassedPathPrefix);
 }
 
 DependencyScanningWorkerFilesystem::DependencyScanningWorkerFilesystem(
@@ -244,8 +241,6 @@ DependencyScanningWorkerFilesystem::computeAndStoreResult(
   llvm::ErrorOr<llvm::vfs::Status> Stat =
       getUnderlyingFS().status(OriginalFilename);
   if (!Stat) {
-    if (!shouldCacheStatFailures(OriginalFilename))
-      return Stat.getError();
     const auto &Entry =
         getOrEmplaceSharedEntryForFilename(FilenameForLookup, Stat.getError());
     return insertLocalEntryForFilename(FilenameForLookup, Entry);
@@ -291,7 +286,7 @@ DependencyScanningWorkerFilesystem::status(const Twine &Path) {
   SmallString<256> OwnedFilename;
   StringRef Filename = Path.toStringRef(OwnedFilename);
 
-  if (Filename.ends_with(".pcm"))
+  if (shouldBypass(Filename))
     return getUnderlyingFS().status(Path);
 
   llvm::ErrorOr<EntryRef> Result = getOrCreateFileSystemEntry(Filename);
@@ -362,7 +357,7 @@ DependencyScanningWorkerFilesystem::openFileForRead(const Twine &Path) {
   SmallString<256> OwnedFilename;
   StringRef Filename = Path.toStringRef(OwnedFilename);
 
-  if (Filename.ends_with(".pcm"))
+  if (shouldBypass(Filename))
     return getUnderlyingFS().openFileForRead(Path);
 
   llvm::ErrorOr<EntryRef> Result = getOrCreateFileSystemEntry(Filename);
@@ -377,6 +372,9 @@ DependencyScanningWorkerFilesystem::getRealPath(const Twine &Path,
   SmallString<256> OwnedFilename;
   StringRef OriginalFilename = Path.toStringRef(OwnedFilename);
 
+  if (shouldBypass(OriginalFilename))
+    return getUnderlyingFS().getRealPath(Path, Output);
+
   SmallString<256> PathBuf;
   auto FilenameForLookup = tryGetFilenameForLookup(OriginalFilename, PathBuf);
   if (!FilenameForLookup)
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
index 32850f5eea92a9..4fad825ca242ba 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
@@ -347,6 +347,26 @@ class DependencyScanningAction : public tooling::ToolAction {
         ScanInstance.getInvocation(), ScanInstance.getDiagnostics(),
         DriverFileMgr->getVirtualFileSystemPtr());
 
+    // Use the dependency scanning optimized file system if requested to do so.
+    if (DepFS) {
+      StringRef ModulesCachePath =
+          ScanInstance.getHeaderSearchOpts().ModuleCachePath;
+
+      DepFS->resetBypassedPathPrefix();
+      if (!ModulesCachePath.empty())
+        DepFS->setBypassedPathPrefix(ModulesCachePath);
+
+      ScanInstance.getPreprocessorOpts().DependencyDirectivesForFile =
+          [LocalDepFS = DepFS](FileEntryRef File)
+          -> std::optional<ArrayRef<dependency_directives_scan::Directive>> {
+        if (llvm::ErrorOr<EntryRef> Entry =
+                LocalDepFS->getOrCreateFileSystemEntry(File.getName()))
+          if (LocalDepFS->ensureDirectiveTokensArePopulated(*Entry))
+            return Entry->getDirectiveTokens();
+        return std::nullopt;
+      };
+    }
+
     // Create a new FileManager to match the invocation's FileSystemOptions.
     auto *FileMgr = ScanInstance.createFileManager(FS);
     ScanInstance.createSourceManager(*FileMgr);
@@ -363,18 +383,6 @@ class DependencyScanningAction : public tooling::ToolAction {
               PrebuiltModuleVFSMap, ScanInstance.getDiagnostics()))
         return false;
 
-    // Use the dependency scanning optimized file system if requested to do so.
-    if (DepFS)
-      ScanInstance.getPreprocessorOpts().DependencyDirectivesForFile =
-          [LocalDepFS = DepFS](FileEntryRef File)
-          -> std::optional<ArrayRef<dependency_directives_scan::Directive>> {
-        if (llvm::ErrorOr<EntryRef> Entry =
-                LocalDepFS->getOrCreateFileSystemEntry(File.getName()))
-          if (LocalDepFS->ensureDirectiveTokensArePopulated(*Entry))
-            return Entry->getDirectiveTokens();
-        return std::nullopt;
-      };
-
     // Create the dependency collector that will collect the produced
     // dependencies.
     //
diff --git a/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp b/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
index 87bb67cfd9327c..69843d3ca50b0e 100644
--- a/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
+++ b/clang/unittests/Tooling/DependencyScanning/DependencyScanningFilesystemTest.cpp
@@ -174,3 +174,30 @@ TEST(DependencyScanningFilesystem, CacheStatOnExists) {
   EXPECT_EQ(InstrumentingFS->NumStatusCalls, 2u);
   EXPECT_EQ(InstrumentingFS->NumExistsCalls, 0u);
 }
+
+TEST(DependencyScanningFilesystem, CacheStatFailures) {
+  auto InMemoryFS = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
+  InMemoryFS->setCurrentWorkingDirectory("/");
+  InMemoryFS->addFile("/dir/vector", 0, llvm::MemoryBuffer::getMemBuffer(""));
+  InMemoryFS->addFile("/cache/a.pcm", 0, llvm::MemoryBuffer::getMemBuffer(""));
+
+  auto InstrumentingFS =
+      llvm::makeIntrusiveRefCnt<InstrumentingFilesystem>(InMemoryFS);
+
+  DependencyScanningFilesystemSharedCache SharedCache;
+  DependencyScanningWorkerFilesystem DepFS(SharedCache, InstrumentingFS);
+
+  DepFS.status("/dir");
+  DepFS.status("/dir");
+  EXPECT_EQ(InstrumentingFS->NumStatusCalls, 1u);
+
+  DepFS.status("/dir/vector");
+  DepFS.status("/dir/vector");
+  EXPECT_EQ(InstrumentingFS->NumStatusCalls, 2u);
+
+  DepFS.setBypassedPathPrefix("/cache");
+  DepFS.exists("/cache/a.pcm");
+  EXPECT_EQ(InstrumentingFS->NumStatusCalls, 3u);
+  DepFS.exists("/cache/a.pcm");
+  EXPECT_EQ(InstrumentingFS->NumStatusCalls, 4u);
+}

return false; // This may be the module cache directory.
return true;
bool DependencyScanningWorkerFilesystem::shouldBypass(StringRef Path) const {
return BypassedPathPrefix && Path.starts_with(*BypassedPathPrefix);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always expect canonical paths here? Are paths always rendered using the style of the target platform?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that the rest of the compiler just appends to the Path we pass in here. But looking into that a bit closer I see that pruneModuleCache() in CompilerInstance.cpp calls llvm::sys::path::native() on this and HeaderSearch::getCachedModuleFileNameImpl() calls llvm::sys::fs::make_absolute(). I'll investigate if these are necessary and if so, I'll try to unify things so that just doing string prefix match is enough in all situations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseHeaderSearchArgs() in CompilerInvocation.cpp already calls llvm::sys::fs::make_absolute() with either the process CWD or -working-directory, so the call to the same function in HeaderSearch::getCachedModuleFileNameImpl() seems to be redundant for Clang instances that were constructed from command line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't canonicalize if it's already absolute, which is commonly the case for the module cache dir. If we're going to canonicalize on each access anyway, I think it's fine to canonicalize in CompilerInstance at the start.

@@ -362,7 +357,7 @@ DependencyScanningWorkerFilesystem::openFileForRead(const Twine &Path) {
SmallString<256> OwnedFilename;
StringRef Filename = Path.toStringRef(OwnedFilename);

if (Filename.ends_with(".pcm"))
if (shouldBypass(Filename))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the .pcm files to be generated in the modules cache path?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All implicitly-built PCM files, yes. We'll now start caching explicitly-built PCM files (that were either passed on the command line or extracted from a PCH file), but these are immutable anyways, so that should be fine.

DepFS.exists("/cache/a.pcm");
EXPECT_EQ(InstrumentingFS->NumStatusCalls, 3u);
DepFS.exists("/cache/a.pcm");
EXPECT_EQ(InstrumentingFS->NumStatusCalls, 4u);
Copy link
Member

@aganea aganea Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe test usage of DepFS.resetBypassedPathPrefix() also here?

@aganea
Copy link
Member

aganea commented May 6, 2024

Ping @jansvoboda11! Are you able to get back to this soon? LG generally, just missing a test case above. Thanks!

@jansvoboda11
Copy link
Contributor Author

I don't think I'll have time to work on this PR in the short-term.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants