@jpporto jpporto commented Nov 24, 2025

StringMap insertion may cause elements to be rehashed, or the underlying storage to be reallocated, so it is generally unsafe to take pointers to StringMap "values". This in turn caused clangd to fail to load its on-disk cache, as the code depended on pointer equality (and immutability) to work.

After this change, clangd will successfully re-open its index for large projects (e.g., LLVM itself), thus improving the developer experience.
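
For illustration, the indirection this PR introduces can be sketched outside of clangd. This is a minimal sketch, not the PR's code: `Info` and `firstEntryAddressSurvivesGrowth` are hypothetical names, and `std::vector` is used only as a stand-in for a container that is known to relocate its elements when it grows. Owning each value through `std::unique_ptr` keeps the pointee's address fixed even when the container's own storage moves.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for clangd's DirInfo; not the real type.
struct Info {
  std::string Name;
  explicit Info(std::string N) : Name(std::move(N)) {}
};

// Appends Count heap-allocated entries and returns true if the address of the
// first entry is unchanged afterwards. The vector may reallocate its buffer
// of unique_ptrs, but the Info objects they own are never moved.
bool firstEntryAddressSurvivesGrowth(unsigned Count) {
  assert(Count > 0 && "need at least one entry");
  std::vector<std::unique_ptr<Info>> Entries;
  Entries.push_back(std::make_unique<Info>("dir0"));
  const Info *First = Entries.front().get();
  for (unsigned I = 1; I < Count; ++I)
    Entries.push_back(std::make_unique<Info>("dir" + std::to_string(I)));
  return Entries.front().get() == First && First->Name == "dir0";
}
```

The cost of this pattern is one extra heap allocation and one extra pointer hop per value, which only pays off if the container really does relocate its values.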

@github-actions

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Member

llvmbot commented Nov 24, 2025

@llvm/pr-subscribers-clang-tools-extra

@llvm/pr-subscribers-clangd

Author: John Porto (jpporto)

Full diff: https://github.com/llvm/llvm-project/pull/169339.diff

2 Files Affected:

  • (modified) clang-tools-extra/clangd/GlobalCompilationDatabase.cpp (+22-10)
  • (modified) clang-tools-extra/clangd/GlobalCompilationDatabase.h (+1-1)
diff --git a/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp b/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
index c6afd0bc07cbd..43e27b3f21d1a 100644
--- a/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
+++ b/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
@@ -398,8 +398,12 @@ DirectoryBasedGlobalCompilationDatabase::getDirectoryCaches(
   Ret.reserve(Dirs.size());
 
   std::lock_guard<std::mutex> Lock(DirCachesMutex);
-  for (unsigned I = 0; I < Dirs.size(); ++I)
-    Ret.push_back(&DirCaches.try_emplace(FoldedDirs[I], Dirs[I]).first->second);
+  for (unsigned I = 0; I < Dirs.size(); ++I) {
+    std::unique_ptr<DirectoryCache> &DC = DirCaches[FoldedDirs[I]];
+    if (!DC)
+      DC = std::make_unique<DirectoryCache>(Dirs[I]);
+    Ret.push_back(DC.get());
+  }
   return Ret;
 }
 
@@ -571,7 +575,7 @@ class DirectoryBasedGlobalCompilationDatabase::BroadcastThread::Filter {
     enum { Unknown, Missing, TargetCDB, OtherCDB } State = Unknown;
     DirInfo *Parent = nullptr;
   };
-  llvm::StringMap<DirInfo> Dirs;
+  llvm::StringMap<std::unique_ptr<DirInfo>> Dirs;
 
   // A search path starts at a directory, and either includes ancestors or not.
   using SearchPath = llvm::PointerIntPair<DirInfo *, 1>;
@@ -583,16 +587,18 @@ class DirectoryBasedGlobalCompilationDatabase::BroadcastThread::Filter {
     DirInfo *Child = nullptr;
     actOnAllParentDirectories(FilePath, [&](llvm::StringRef Dir) {
       auto &Info = Dirs[Dir];
+      if (!Info)
+        Info = std::make_unique<DirInfo>();
       // If this is the first iteration, then this node is the overall result.
       if (!Leaf)
-        Leaf = &Info;
+        Leaf = Info.get();
       // Fill in the parent link from the previous iteration to this parent.
       if (Child)
-        Child->Parent = &Info;
+        Child->Parent = Info.get();
       // Keep walking, whether we inserted or not, if parent link is missing.
       // (If it's present, parent links must be present up to the root, so stop)
-      Child = &Info;
-      return Info.Parent != nullptr;
+      Child = Info.get();
+      return Info->Parent != nullptr;
     });
     return Leaf;
   }
@@ -609,7 +615,7 @@ class DirectoryBasedGlobalCompilationDatabase::BroadcastThread::Filter {
     DirValues.reserve(Dirs.size());
     for (auto &E : Dirs) {
       DirKeys.push_back(E.first());
-      DirValues.push_back(&E.second);
+      DirValues.push_back(E.second.get());
     }
 
     // Also look up the cache entry for the CDB we're broadcasting.
@@ -677,7 +683,10 @@ class DirectoryBasedGlobalCompilationDatabase::BroadcastThread::Filter {
     std::vector<SearchPath> SearchPaths(AllFiles.size());
     for (unsigned I = 0; I < AllFiles.size(); ++I) {
       if (Parent.Opts.CompileCommandsDir) { // FIXME: unify with config
-        SearchPaths[I].setPointer(&Dirs[*Parent.Opts.CompileCommandsDir]);
+        std::unique_ptr<DirInfo> &Dir = Dirs[*Parent.Opts.CompileCommandsDir];
+        if (!Dir)
+          Dir = std::make_unique<DirInfo>();
+        SearchPaths[I].setPointer(Dir.get());
         continue;
       }
       if (ExitEarly()) // loading config may be slow
@@ -693,7 +702,10 @@ class DirectoryBasedGlobalCompilationDatabase::BroadcastThread::Filter {
         SearchPaths[I].setPointer(addParents(AllFiles[I]));
         break;
       case Config::CDBSearchSpec::FixedDir:
-        SearchPaths[I].setPointer(&Dirs[*Spec.FixedCDBPath]);
+        std::unique_ptr<DirInfo> &Dir = Dirs[*Spec.FixedCDBPath];
+        if (!Dir)
+          Dir = std::make_unique<DirInfo>();
+        SearchPaths[I].setPointer(Dir.get());
         break;
       }
     }
diff --git a/clang-tools-extra/clangd/GlobalCompilationDatabase.h b/clang-tools-extra/clangd/GlobalCompilationDatabase.h
index 1d636d73664be..2a8e3821c4596 100644
--- a/clang-tools-extra/clangd/GlobalCompilationDatabase.h
+++ b/clang-tools-extra/clangd/GlobalCompilationDatabase.h
@@ -143,7 +143,7 @@ class DirectoryBasedGlobalCompilationDatabase
   class DirectoryCache;
   // Keyed by possibly-case-folded directory path.
   // We can hand out pointers as they're stable and entries are never removed.
-  mutable llvm::StringMap<DirectoryCache> DirCaches;
+  mutable llvm::StringMap<std::unique_ptr<DirectoryCache>> DirCaches;
   mutable std::mutex DirCachesMutex;
 
   std::vector<DirectoryCache *>

@bcardosolopes
Member

Nice! Can you please add a testcase?

@github-actions

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.

@jpporto
Author

jpporto commented Nov 24, 2025

Nice! Can you please add a testcase?

I am unsure how to add a test for this, because the error is non-deterministic. Moreover, the original commit didn't add a test, so there is no existing test to base one on. I am open to suggestions, though.

@jpporto
Author

jpporto commented Nov 25, 2025

@bcardosolopes pointed me to clang-tools-extra/clangd/unittests/GlobalCompilationDatabaseTests.cpp, so I spent some time looking at that code. I still don't know how I can write a test.

The scenario I want to repro is what happens during clangd's initialization when it lists all files in the compile_commands.json, and decides whether or not it should rebuild the index. In that scenario the DirCaches may be rehashed, and thus some of the DirectoryCache pointers stored in the DirInfo objects will no longer point to the appropriate entry.

@HighCommander4
Collaborator

StringMap insertion may cause elements to be rehashed, or the underlying storage to be reallocated, so it is generally unsafe to take pointers to StringMap "values".

Are you sure about this?

The comment above the declaration of DirCaches specifically says "We can hand out pointers as they're stable and entries are never removed."

And while I haven't studied the implementation of StringMap in detail, I at least can't find an obvious place where values would get moved after initial insertion. The hashtable stores pointers to entries, suggesting that you can reallocate/rearrange the table without moving the entry objects themselves. The entries are created via placement-new, and I'm not seeing what would call that during a rehash or reallocation.
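
The design described above (a bucket array of pointers to separately allocated entries) can be sketched with a toy table. This is an illustrative sketch under stated assumptions, not llvm::StringMap's implementation: `ToyStringTable`, `Entry`, `getOrCreate`, and `bucketCount` are hypothetical names, and the probing and growth policy are arbitrary choices made for brevity. The point it shows is that a rehash only re-places pointers, so an `Entry *` handed out earlier stays valid.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy table, NOT llvm::StringMap: buckets hold pointers to separately
// allocated entries, so growing the bucket array never moves an entry.
class ToyStringTable {
public:
  struct Entry {
    std::string Key;
    int Value;
  };

  ToyStringTable() : Buckets(4, nullptr) {}
  ToyStringTable(const ToyStringTable &) = delete;
  ToyStringTable &operator=(const ToyStringTable &) = delete;
  ~ToyStringTable() {
    for (Entry *E : Buckets)
      delete E;
  }

  // Returns the entry for Key, creating it with Value 0 if absent.
  Entry *getOrCreate(const std::string &Key) {
    // Keep the load factor below 3/4 so probing always finds an empty slot.
    if ((Size + 1) * 4 > Buckets.size() * 3)
      grow();
    std::size_t I = findSlot(Buckets, Key);
    if (!Buckets[I]) {
      Buckets[I] = new Entry{Key, 0};
      ++Size;
    }
    return Buckets[I];
  }

  std::size_t bucketCount() const { return Buckets.size(); }

private:
  // Linear probing: first slot that is empty or already holds Key.
  static std::size_t findSlot(const std::vector<Entry *> &Table,
                              const std::string &Key) {
    assert(!Table.empty() && "table always has buckets");
    std::size_t I = std::hash<std::string>{}(Key) % Table.size();
    while (Table[I] && Table[I]->Key != Key)
      I = (I + 1) % Table.size();
    return I;
  }

  // Rehash: only the pointers are re-placed; the Entry objects stay put.
  void grow() {
    std::vector<Entry *> Bigger(Buckets.size() * 2, nullptr);
    for (Entry *E : Buckets)
      if (E)
        Bigger[findSlot(Bigger, E->Key)] = E;
    Buckets = std::move(Bigger);
  }

  std::vector<Entry *> Buckets;
  std::size_t Size = 0;
};
```

With a table like this, taking `getOrCreate("a/b")`, inserting a thousand more keys, and calling `getOrCreate("a/b")` again yields the same pointer, even though the bucket array has been reallocated several times in between.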

@HighCommander4
Collaborator

HighCommander4 commented Nov 30, 2025

In fact, I can give DirectoryCache = delete'ed copy and move constructors, and the code compiles, suggesting that the StringMap never copies or moves the values.
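
The same compile-time check can be tried in isolation. The sketch below is an assumption-laden illustration rather than the experiment run on clangd: `PinnedCache` and `pointerSurvivesRehash` are hypothetical names, and `std::unordered_map` (a node-based container that the C++ standard guarantees keeps element addresses across rehashes) stands in for llvm::StringMap. Because the value type has deleted copy and move operations, the code would not compile if the container needed to relocate values.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache type; copy and move are deleted, so any container
// operation that tried to relocate a value would fail to compile.
struct PinnedCache {
  explicit PinnedCache(std::string Path) : Path(std::move(Path)) {}
  PinnedCache(const PinnedCache &) = delete;
  PinnedCache(PinnedCache &&) = delete;
  PinnedCache &operator=(const PinnedCache &) = delete;
  PinnedCache &operator=(PinnedCache &&) = delete;
  std::string Path;
};

// std::unordered_map stands in for llvm::StringMap here (an assumption for
// illustration). Returns true if a pointer taken before Extra further
// insertions still refers to the same element afterwards.
bool pointerSurvivesRehash(unsigned Extra) {
  std::unordered_map<std::string, PinnedCache> Caches;
  // try_emplace constructs the value in place from its constructor argument.
  PinnedCache *First = &Caches.try_emplace("/src", "/src").first->second;
  for (unsigned I = 0; I < Extra; ++I) {
    std::string Dir = "/src/dir" + std::to_string(I);
    Caches.try_emplace(Dir, Dir);
  }
  assert(Caches.size() == Extra + 1);
  return &Caches.at("/src") == First && First->Path == "/src";
}
```

This requires C++17 for `try_emplace`. Swapping the map for a `std::vector<PinnedCache>` and calling `push_back` would be rejected by the compiler, which is the kind of signal the deleted constructors are meant to provide.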

@jpporto
Author

jpporto commented Nov 30, 2025

Thanks for the pointers, truly appreciated.

I am not sure what was happening on my machine (my clangd was built from a revision a few weeks old, so it was probably a broken version). I am closing the PR for now until I can repro the issue on my side (hopefully never).

Cheers.

@jpporto jpporto closed this Nov 30, 2025