[clang][deps] Lazy dependency directives #86347

Merged on Mar 22, 2024 (1 commit)

Conversation

jansvoboda11
Contributor

Since b4c83a1, `Preprocessor` and `Lexer` are aware of the concept of scanning dependency directives. This makes it possible to scan for them on demand rather than eagerly on the first filesystem operation (open, or even just stat).

This might improve performance, but it is also necessary for the "PCH as module" mode. Some precompiled header sources use the ".pch" file extension, which is not on the scanner's extension allowlist, so they were not getting scanned for dependency directives. This was okay when the PCH was the main input file in a separate scan step, because there we just lex the file in a scanning-specific frontend action. But when such a source is treated as a module implicitly loaded from a TU, it is compiled like any other module, with Sema. If the ".pch" source is not reduced to its dependency directives while the headers it includes are, Sema sees declarations that use symbols it has never seen, which results in compilation errors. (See the attached test case.)
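
To illustrate why ".pch" sources were skipped before this change, here is a minimal standalone sketch of an extension allowlist, modeled on the `shouldScanForDirectivesBasedOnExtension` helper that this PR removes. It is an approximation under stated assumptions: it uses only the C++17 standard library instead of `llvm::StringSwitch` and `llvm::sys::path::extension`, and the function name `wasScannedBeforeThisChange` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Hypothetical stand-in for the removed extension check; not LLVM code.
// Returns true if a file with this name would have been scanned for
// dependency directives under the old extension allowlist.
inline bool wasScannedBeforeThisChange(std::string_view Filename) {
  // Consider only the last path component.
  const std::size_t Slash = Filename.find_last_of("/\\");
  const std::string_view Base =
      Slash == std::string_view::npos ? Filename : Filename.substr(Slash + 1);

  // No extension: treated as a source file (e.g. C++ standard library headers).
  const std::size_t Dot = Base.find_last_of('.');
  if (Dot == std::string_view::npos)
    return true;

  // Lowercase the extension (dot included) for a case-insensitive match.
  std::string Ext(Base.substr(Dot));
  assert(!Ext.empty() && "extension always includes the leading dot");
  std::transform(Ext.begin(), Ext.end(), Ext.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });

  // The same extensions as the removed StringSwitch cases.
  static constexpr std::string_view Allowed[] = {
      ".c",   ".cc", ".cpp", ".c++", ".cxx", ".h",  ".hh",  ".hpp",  ".h++",
      ".hxx", ".m",  ".mm",  ".i",   ".ii",  ".mi", ".mmi", ".def", ".inc"};
  const std::string_view ExtView = Ext;
  return std::find(std::begin(Allowed), std::end(Allowed), ExtView) !=
         std::end(Allowed);
}
```

Under this sketch, `b.h` passes the check while `header.pch` does not, which is the mismatch the attached test exercises. With this PR the check is gone and scanning happens only when the preprocessor asks for a file's directives.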
llvmbot added the `clang` label (Clang issues not falling into any other category) on Mar 22, 2024
@llvmbot
Collaborator

llvmbot commented Mar 22, 2024

@llvm/pr-subscribers-clang

Author: Jan Svoboda (jansvoboda11)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/86347.diff

4 Files Affected:

  • (modified) clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h (+7-11)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp (+15-47)
  • (modified) clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp (+3-1)
  • (added) clang/test/ClangScanDeps/modules-extension.c (+33)
diff --git a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
index 846fdc7253977f..870ef8b2f45bb0 100644
--- a/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
+++ b/clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
@@ -242,6 +242,8 @@ class EntryRef {
   /// The underlying cached entry.
   const CachedFileSystemEntry &Entry;
 
+  friend class DependencyScanningWorkerFilesystem;
+
 public:
   EntryRef(StringRef Name, const CachedFileSystemEntry &Entry)
       : Filename(Name), Entry(Entry) {}
@@ -300,14 +302,13 @@ class DependencyScanningWorkerFilesystem
   ///
   /// Attempts to use the local and shared caches first, then falls back to
   /// using the underlying filesystem.
-  llvm::ErrorOr<EntryRef>
-  getOrCreateFileSystemEntry(StringRef Filename,
-                             bool DisableDirectivesScanning = false);
+  llvm::ErrorOr<EntryRef> getOrCreateFileSystemEntry(StringRef Filename);
 
-private:
-  /// Check whether the file should be scanned for preprocessor directives.
-  bool shouldScanForDirectives(StringRef Filename);
+  /// Scan for preprocessor directives for the given entry if necessary and
+  /// returns a wrapper object with reference semantics.
+  bool scanForDirectives(EntryRef Entry);
 
+private:
   /// For a filename that's not yet associated with any entry in the caches,
   /// uses the underlying filesystem to either look up the entry based in the
   /// shared cache indexed by unique ID, or creates new entry from scratch.
@@ -317,11 +318,6 @@ class DependencyScanningWorkerFilesystem
   computeAndStoreResult(StringRef OriginalFilename,
                         StringRef FilenameForLookup);
 
-  /// Scan for preprocessor directives for the given entry if necessary and
-  /// returns a wrapper object with reference semantics.
-  EntryRef scanForDirectivesIfNecessary(const CachedFileSystemEntry &Entry,
-                                        StringRef Filename, bool Disable);
-
   /// Represents a filesystem entry that has been stat-ed (and potentially read)
   /// and that's about to be inserted into the cache as `CachedFileSystemEntry`.
   struct TentativeEntry {
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
index 1b750cec41e1cc..8fd1990440bbb2 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
@@ -41,24 +41,24 @@ DependencyScanningWorkerFilesystem::readFile(StringRef Filename) {
   return TentativeEntry(Stat, std::move(Buffer));
 }
 
-EntryRef DependencyScanningWorkerFilesystem::scanForDirectivesIfNecessary(
-    const CachedFileSystemEntry &Entry, StringRef Filename, bool Disable) {
-  if (Entry.isError() || Entry.isDirectory() || Disable ||
-      !shouldScanForDirectives(Filename))
-    return EntryRef(Filename, Entry);
+bool DependencyScanningWorkerFilesystem::scanForDirectives(EntryRef Ref) {
+  auto &Entry = Ref.Entry;
+
+  if (Entry.isError() || Entry.isDirectory())
+    return false;
 
   CachedFileContents *Contents = Entry.getCachedContents();
   assert(Contents && "contents not initialized");
 
   // Double-checked locking.
   if (Contents->DepDirectives.load())
-    return EntryRef(Filename, Entry);
+    return true;
 
   std::lock_guard<std::mutex> GuardLock(Contents->ValueLock);
 
   // Double-checked locking.
   if (Contents->DepDirectives.load())
-    return EntryRef(Filename, Entry);
+    return true;
 
   SmallVector<dependency_directives_scan::Directive, 64> Directives;
   // Scan the file for preprocessor directives that might affect the
@@ -69,16 +69,16 @@ EntryRef DependencyScanningWorkerFilesystem::scanForDirectivesIfNecessary(
     Contents->DepDirectiveTokens.clear();
     // FIXME: Propagate the diagnostic if desired by the client.
     Contents->DepDirectives.store(new std::optional<DependencyDirectivesTy>());
-    return EntryRef(Filename, Entry);
+    return false;
   }
 
   // This function performed double-checked locking using `DepDirectives`.
   // Assigning it must be the last thing this function does, otherwise other
-  // threads may skip the
-  // critical section (`DepDirectives != nullptr`), leading to a data race.
+  // threads may skip the critical section (`DepDirectives != nullptr`), leading
+  // to a data race.
   Contents->DepDirectives.store(
       new std::optional<DependencyDirectivesTy>(std::move(Directives)));
-  return EntryRef(Filename, Entry);
+  return true;
 }
 
 DependencyScanningFilesystemSharedCache::
@@ -161,34 +161,11 @@ DependencyScanningFilesystemSharedCache::CacheShard::
   return *EntriesByFilename.insert({Filename, &Entry}).first->getValue();
 }
 
-/// Whitelist file extensions that should be minimized, treating no extension as
-/// a source file that should be minimized.
-///
-/// This is kinda hacky, it would be better if we knew what kind of file Clang
-/// was expecting instead.
-static bool shouldScanForDirectivesBasedOnExtension(StringRef Filename) {
-  StringRef Ext = llvm::sys::path::extension(Filename);
-  if (Ext.empty())
-    return true; // C++ standard library
-  return llvm::StringSwitch<bool>(Ext)
-      .CasesLower(".c", ".cc", ".cpp", ".c++", ".cxx", true)
-      .CasesLower(".h", ".hh", ".hpp", ".h++", ".hxx", true)
-      .CasesLower(".m", ".mm", true)
-      .CasesLower(".i", ".ii", ".mi", ".mmi", true)
-      .CasesLower(".def", ".inc", true)
-      .Default(false);
-}
-
 static bool shouldCacheStatFailures(StringRef Filename) {
   StringRef Ext = llvm::sys::path::extension(Filename);
   if (Ext.empty())
     return false; // This may be the module cache directory.
-  // Only cache stat failures on files that are not expected to change during
-  // the build.
-  StringRef FName = llvm::sys::path::filename(Filename);
-  if (FName == "module.modulemap" || FName == "module.map")
-    return true;
-  return shouldScanForDirectivesBasedOnExtension(Filename);
+  return true;
 }
 
 DependencyScanningWorkerFilesystem::DependencyScanningWorkerFilesystem(
@@ -201,11 +178,6 @@ DependencyScanningWorkerFilesystem::DependencyScanningWorkerFilesystem(
   updateWorkingDirForCacheLookup();
 }
 
-bool DependencyScanningWorkerFilesystem::shouldScanForDirectives(
-    StringRef Filename) {
-  return shouldScanForDirectivesBasedOnExtension(Filename);
-}
-
 const CachedFileSystemEntry &
 DependencyScanningWorkerFilesystem::getOrEmplaceSharedEntryForUID(
     TentativeEntry TEntry) {
@@ -259,7 +231,7 @@ DependencyScanningWorkerFilesystem::computeAndStoreResult(
 
 llvm::ErrorOr<EntryRef>
 DependencyScanningWorkerFilesystem::getOrCreateFileSystemEntry(
-    StringRef OriginalFilename, bool DisableDirectivesScanning) {
+    StringRef OriginalFilename) {
   StringRef FilenameForLookup;
   SmallString<256> PathBuf;
   if (llvm::sys::path::is_absolute_gnu(OriginalFilename)) {
@@ -276,15 +248,11 @@ DependencyScanningWorkerFilesystem::getOrCreateFileSystemEntry(
   assert(llvm::sys::path::is_absolute_gnu(FilenameForLookup));
   if (const auto *Entry =
           findEntryByFilenameWithWriteThrough(FilenameForLookup))
-    return scanForDirectivesIfNecessary(*Entry, OriginalFilename,
-                                        DisableDirectivesScanning)
-        .unwrapError();
+    return EntryRef(OriginalFilename, *Entry).unwrapError();
   auto MaybeEntry = computeAndStoreResult(OriginalFilename, FilenameForLookup);
   if (!MaybeEntry)
     return MaybeEntry.getError();
-  return scanForDirectivesIfNecessary(*MaybeEntry, OriginalFilename,
-                                      DisableDirectivesScanning)
-      .unwrapError();
+  return EntryRef(OriginalFilename, *MaybeEntry).unwrapError();
 }
 
 llvm::ErrorOr<llvm::vfs::Status>
diff --git a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
index 76f3d950a13b81..f240573c3437ba 100644
--- a/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
+++ b/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
@@ -371,8 +371,10 @@ class DependencyScanningAction : public tooling::ToolAction {
           [LocalDepFS = std::move(LocalDepFS)](FileEntryRef File)
           -> std::optional<ArrayRef<dependency_directives_scan::Directive>> {
         if (llvm::ErrorOr<EntryRef> Entry =
-                LocalDepFS->getOrCreateFileSystemEntry(File.getName()))
+                LocalDepFS->getOrCreateFileSystemEntry(File.getName())) {
+          LocalDepFS->scanForDirectives(*Entry);
           return Entry->getDirectiveTokens();
+        }
         return std::nullopt;
       };
     }
diff --git a/clang/test/ClangScanDeps/modules-extension.c b/clang/test/ClangScanDeps/modules-extension.c
new file mode 100644
index 00000000000000..0f27f608440f45
--- /dev/null
+++ b/clang/test/ClangScanDeps/modules-extension.c
@@ -0,0 +1,33 @@
+// RUN: rm -rf %t
+// RUN: split-file %s %t
+
+// This test checks that source files with uncommon extensions still undergo
+// dependency directives scan. If header.pch would not and b.h would, the scan
+// would fail when parsing `void function(B)` and not knowing the symbol B.
+
+//--- module.modulemap
+module __PCH { header "header.pch" }
+module B { header "b.h" }
+
+//--- header.pch
+#include "b.h"
+void function(B);
+
+//--- b.h
+typedef int B;
+
+//--- tu.c
+int main() {
+  function(0);
+  return 0;
+}
+
+//--- cdb.json.in
+[{
+  "directory": "DIR",
+  "file": "DIR/tu.c",
+  "command": "clang -c DIR/tu.c -fmodules -fmodules-cache-path=DIR/cache -fimplicit-module-maps -include DIR/header.pch"
+}]
+
+// RUN: sed -e "s|DIR|%/t|g" %t/cdb.json.in > %t/cdb.json
+// RUN: clang-scan-deps -compilation-database %t/cdb.json -format experimental-full > %t/deps.json
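
The `scanForDirectives` change in the diff above keeps the existing double-checked locking on `DepDirectives` and only moves the call site, so the scan runs the first time the preprocessor asks for a file's directives. The following is a minimal standalone sketch of that pattern, not the LLVM implementation: `LazyContents`, `scanForDirectivesLazily`, and `ScanCount` are hypothetical names, the "scanner" just collects lines that start with `#`, and the failure path of the real code is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for CachedFileContents.
struct LazyContents {
  std::string Source;
  std::mutex ValueLock;
  // Null until the scan result has been published.
  std::atomic<const std::vector<std::string> *> Directives{nullptr};
  int ScanCount = 0; // Written only under ValueLock; for illustration.

  explicit LazyContents(std::string S) : Source(std::move(S)) {}
  ~LazyContents() { delete Directives.load(); }
};

// Scans on first use and returns the cached result afterwards.
inline const std::vector<std::string> &
scanForDirectivesLazily(LazyContents &C) {
  // First check, without the lock: the common case after the first scan.
  if (const auto *Done = C.Directives.load(std::memory_order_acquire))
    return *Done;

  std::lock_guard<std::mutex> Guard(C.ValueLock);

  // Second check, under the lock: another thread may have finished the scan
  // while this one was waiting.
  if (const auto *Done = C.Directives.load(std::memory_order_acquire))
    return *Done;

  ++C.ScanCount;
  auto Result = std::make_unique<std::vector<std::string>>();
  std::istringstream In(C.Source);
  for (std::string Line; std::getline(In, Line);) {
    const std::size_t First = Line.find_first_not_of(" \t");
    if (First != std::string::npos && Line[First] == '#')
      Result->push_back(Line.substr(First));
  }

  // Publishing the pointer must be the last step, otherwise other threads
  // could take the fast path and read a partially built result.
  const std::vector<std::string> *Published = Result.release();
  assert(Published && "scan result must be allocated");
  C.Directives.store(Published, std::memory_order_release);
  return *Published;
}
```

Calling `scanForDirectivesLazily` twice on the same `LazyContents` returns the same object and performs the scan once. Nothing is scanned for entries that are only stat-ed or opened, which is the on-demand behavior this PR introduces.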

✅ With the latest revision this PR passed the C/C++ code formatter.

✅ With the latest revision this PR passed the Python code formatter.

Contributor

@akyrtzi akyrtzi left a comment

Nice! Note that you implemented rdar://107663951 🎉

@jansvoboda11 jansvoboda11 merged commit b768a8c into llvm:main Mar 22, 2024
3 of 4 checks passed
@jansvoboda11 jansvoboda11 deleted the lazy-dependency-directives branch March 22, 2024 23:09
jansvoboda11 added a commit to apple/llvm-project that referenced this pull request Mar 23, 2024
The CAS counterpart to the upstream llvm#86347.
jansvoboda11 added a commit to apple/llvm-project that referenced this pull request Mar 25, 2024
…tives

[clang][deps][CAS] Lazy dependency directives

The CAS counterpart to the upstream llvm#86347.
jansvoboda11 added a commit to apple/llvm-project that referenced this pull request Mar 27, 2024
rdar://107663951
(cherry picked from commit b768a8c)
jansvoboda11 added a commit to apple/llvm-project that referenced this pull request Mar 27, 2024
The CAS counterpart to the upstream llvm#86347.

(cherry picked from commit 0d92d1c)
jansvoboda11 added a commit to apple/llvm-project that referenced this pull request Mar 28, 2024
rdar://107663951
(cherry picked from commit b768a8c)
jansvoboda11 added a commit to apple/llvm-project that referenced this pull request Mar 28, 2024
The CAS counterpart to the upstream llvm#86347.

(cherry picked from commit 0d92d1c)