
Conversation

@zhyty
Contributor

@zhyty zhyty commented Nov 5, 2025

Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered (his previous attempt).

This change can be summarized as the following:

  • Plumb a boolean flag that forces no preload through Target::GetOrCreateModule all the way up to LoadModuleAtAddress.
  • Parallelize Module::PreloadSymbols separately from Target::GetOrCreateModule and its caller LoadModuleAtAddress (this is what avoids the deadlock).

These changes roughly maintain the performance characteristics of the previous implementation of parallel module loading. Testing on targets with between 5,000 and 14,000 modules, I saw numbers similar to before, often more than 10% faster with the new implementation across multiple trials on these massive targets. I suspect this is because there is less lock contention with this approach.

The deadlock

See bt.txt for a sample backtrace of LLDB when the deadlock occurs.

As @GeorgeHuyubo explains in his PR, this is an ABBA deadlock. It happens when a thread context-switches out of Module::PreloadSymbols and goes into Target::GetOrCreateModule for another module, possibly entering this block:

      if (!module_sp) {
        // The platform is responsible for finding and caching an appropriate
        // module in the shared module cache.
        if (m_platform_sp) {
          error = m_platform_sp->GetSharedModule(
              module_spec, m_process_sp.get(), module_sp, &search_paths,
              &old_modules, &did_create_module);
        } else {
          error = Status::FromErrorString("no platform is currently set");
        }
      }

Module::PreloadSymbols holds a module-level mutex, and then GetSharedModule attempts to hold the mutex of the global shared ModuleList. So, this thread holds the module mutex, and waits on the global shared ModuleList mutex.

A competing thread may execute Target::GetOrCreateModule, enter the same block as above, and grab the global shared ModuleList mutex. Then, in ModuleList::GetSharedModule, we eventually call ModuleList::FindModules, which ends up waiting for the Module mutex held by the first thread (via Module::GetUUID). Thus, we deadlock.
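
For illustration, here is a minimal standalone sketch of that lock ordering. The mutexes and functions below are stand-ins, not the real LLDB classes:

  #include <mutex>
  #include <thread>

  // Stand-ins for the per-Module mutex and the shared ModuleList mutex.
  std::recursive_mutex module_mutex;
  std::recursive_mutex module_list_mutex;

  // PreloadSymbols-like path: Module mutex first, then the ModuleList mutex
  // (via GetSharedModule).
  void thread_a() {
    std::lock_guard<std::recursive_mutex> m(module_mutex);
    std::lock_guard<std::recursive_mutex> l(module_list_mutex);
  }

  // GetSharedModule-like path: ModuleList mutex first, then the Module mutex
  // (via FindModules -> Module::GetUUID).
  void thread_b() {
    std::lock_guard<std::recursive_mutex> l(module_list_mutex);
    std::lock_guard<std::recursive_mutex> m(module_mutex);
  }

  int main() {
    std::thread a(thread_a), b(thread_b);
    a.join(); // with the wrong interleaving, neither join ever returns
    b.join();
  }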

Reproducing the deadlock

It might be worth noting that I've never been able to observe this deadlock during live debugging (e.g. launching or attaching to processes). However, we were able to consistently reproduce it with coredumps when using the following settings:

(lldb) settings set target.parallel-module-load true
(lldb) settings set target.preload-symbols true
(lldb) settings set symbols.load-on-demand false
(lldb) target create --core /some/core/file/here
# deadlock happens

How this change avoids this deadlock

This change avoids running Module::PreloadSymbols concurrently with Target::GetOrCreateModule: it waits until all the Target::GetOrCreateModule calls have finished, and only then runs Module::PreloadSymbols in parallel. That eliminates the ordering of holding a Module lock and then the ModuleList lock, because the Target::GetOrCreateModule calls always take the shared ModuleList lock first (from what I've read and tested).
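
As a self-contained sketch of that two-phase ordering (the Module type and helpers below are hypothetical stand-ins, not LLDB code; the real changes live in DynamicLoaderPOSIXDYLD and Target in the diff):

  #include <future>
  #include <vector>

  struct Module {
    void PreloadSymbols() { /* expensive symbol indexing */ }
  };

  // Stand-in for Target::GetOrCreateModule with the new defer_preload flag:
  // when deferring, it only creates/registers the module.
  Module *GetOrCreateModule(std::vector<Module> &images, int id,
                            bool defer_preload) {
    Module *m = &images[id];
    if (!defer_preload)
      m->PreloadSymbols(); // old behavior: preload during module creation
    return m;
  }

  int main() {
    std::vector<Module> images(16);
    std::vector<Module *> new_modules;

    // Phase 1: create all modules with preloading deferred, so nothing holds
    // a Module lock while ModuleList-level work is still running.
    for (int id = 0; id < 16; ++id)
      new_modules.push_back(GetOrCreateModule(images, id, /*defer_preload=*/true));

    // Phase 2: only after every GetOrCreateModule call has returned, preload
    // symbols for the new modules in parallel.
    std::vector<std::future<void>> jobs;
    for (Module *m : new_modules)
      jobs.push_back(std::async(std::launch::async, [m] { m->PreloadSymbols(); }));
    for (auto &j : jobs)
      j.get();
  }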

Why not read-write lock?

Some feedback in #160225 was to replace the mutexes used in these components with read-write locks. This might be a good idea overall, but I don't think it would easily resolve this specific deadlock. Module::PreloadSymbols would probably need a write lock on the Module, so even if Module::GetUUID took a read lock we would still contend. Maybe the ModuleList lock could be a read lock that converts to a write lock if it chooses to update the module, but it seems likely that some thread would try to update the shared module list and then the write lock would contend again.

Perhaps with deeper architectural changes, we could fix this issue?

Other attempts

One downside of this approach (and the former approach of parallel module loading) is that each DYLD would need to implement this pattern itself. With @clayborg's help, I looked at a few other approaches:

  • In Target::GetOrCreateModule, backgrounding the Module::PreloadSymbols call by adding it directly to the thread pool via Debugger::GetThreadPool().async(). This required adding a lock to Module::SetLoadAddress (there probably should be one already) since ObjectFileELF::SetLoadAddress is not thread-safe (it updates sections). Unfortunately, during execution this causes the symbol preloading to run synchronously with Target::GetOrCreateModule, preventing us from truly parallelizing the work.
  • In Module::PreloadSymbols, backgrounding the symtab and sym_file PreloadSymbols calls individually, but this ran into similar issues as the above.
  • Passing a callback function, as in swiftlang/llvm-project#10746 ([LLDB] Deterministic module order in Target::SetExecutableModule), instead of the boolean I use in this change. It's functionally the same change IMO, with some design tradeoffs:
    • Pro: the caller doesn't need to explicitly call Module::PreloadSymbols itself, and can instead call whatever function is passed into the callback.
    • Con: the caller needs to delay the execution of the callback so that it occurs after the GetOrCreateModule logic, otherwise we run into the same issue. I thought this would be trickier for the caller, requiring some kind of condition variable or otherwise storing the calls to execute afterwards.

Test Plan:

ninja check-lldb

@github-actions

github-actions bot commented Nov 5, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@dmpots
Contributor

dmpots commented Nov 5, 2025

I wonder if instead of a boolean we should pass down a function that schedules the module loading as suggested in swiftlang#10746.

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR), or for your use case we could accumulate the preload callbacks into a vector and then schedule all the preload jobs to run in parallel once all the modules have been loaded.

Any thoughts @DmT021?

@zhyty
Contributor Author

zhyty commented Nov 5, 2025

I wonder if instead of a boolean we should pass down a function that schedules the module loading as suggested in #10746.

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR), or for your use case we could accumulate the preload callbacks into a vector and then schedule all the preload jobs to run in parallel once all the modules have been loaded.

Any thoughts @DmT021?

(Proper link would be swiftlang#10746. #10746 in this repo refers to something unrelated)

@zhyty zhyty marked this pull request as ready for review November 5, 2025 19:20
@zhyty zhyty requested a review from JDevlieghere as a code owner November 5, 2025 19:20
@llvmbot llvmbot added the lldb label Nov 5, 2025
@llvmbot
Member

llvmbot commented Nov 5, 2025

@llvm/pr-subscribers-lldb

Author: Tom Yang (zhyty)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/166480.diff

10 Files Affected:

  • (modified) lldb/include/lldb/Core/ModuleList.h (+3)
  • (modified) lldb/include/lldb/Target/DynamicLoader.h (+15-2)
  • (modified) lldb/include/lldb/Target/Target.h (+9-1)
  • (modified) lldb/source/Core/DynamicLoader.cpp (+8-4)
  • (modified) lldb/source/Core/ModuleList.cpp (+13)
  • (modified) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp (+25-15)
  • (modified) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h (+5-4)
  • (modified) lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp (+4-2)
  • (modified) lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h (+5-4)
  • (modified) lldb/source/Target/Target.cpp (+5-3)
diff --git a/lldb/include/lldb/Core/ModuleList.h b/lldb/include/lldb/Core/ModuleList.h
index e71f3b2bad6b4..8e87a2cb01df6 100644
--- a/lldb/include/lldb/Core/ModuleList.h
+++ b/lldb/include/lldb/Core/ModuleList.h
@@ -511,6 +511,9 @@ class ModuleList {
   /// Atomically swaps the contents of this module list with \a other.
   void Swap(ModuleList &other);
 
+  /// For each module in this ModuleList, preload its symbols.
+  void PreloadSymbols() const;
+
 protected:
   // Class typedefs.
   typedef std::vector<lldb::ModuleSP>
diff --git a/lldb/include/lldb/Target/DynamicLoader.h b/lldb/include/lldb/Target/DynamicLoader.h
index 75bb6cb6bb907..271d5f761696c 100644
--- a/lldb/include/lldb/Target/DynamicLoader.h
+++ b/lldb/include/lldb/Target/DynamicLoader.h
@@ -207,10 +207,17 @@ class DynamicLoader : public PluginInterface {
   /// resulting module at the virtual base address \p base_addr.
   /// Note that this calls Target::GetOrCreateModule with notify being false,
   /// so it is necessary to call Target::ModulesDidLoad afterwards.
+  ///
+  /// \param[in] defer_module_preload
+  ///     If the module needs to be loaded, prevent the module from being
+  ///     preloaded even if the user sets the preload symbols option. This is
+  ///     used as a performance optimization should the caller preload the
+  ///     modules in parallel after calling this function.
   virtual lldb::ModuleSP LoadModuleAtAddress(const lldb_private::FileSpec &file,
                                              lldb::addr_t link_map_addr,
                                              lldb::addr_t base_addr,
-                                             bool base_addr_is_offset);
+                                             bool base_addr_is_offset,
+                                             bool defer_module_preload = false);
 
   /// Find/load a binary into lldb given a UUID and the address where it is
   /// loaded in memory, or a slide to be applied to the file address.
@@ -352,7 +359,13 @@ class DynamicLoader : public PluginInterface {
 protected:
   // Utility methods for derived classes
 
-  lldb::ModuleSP FindModuleViaTarget(const FileSpec &file);
+  /// Find a module in the target that matches the given file.
+  ///
+  /// \param[in] defer_module_preload
+  ///     If the module needs to be loaded, prevent the module from being
+  ///     preloaded even if the user sets the preload symbols option.
+  lldb::ModuleSP FindModuleViaTarget(const FileSpec &file,
+                                     bool defer_module_preload = false);
 
   /// Checks to see if the target module has changed, updates the target
   /// accordingly and returns the target executable module.
diff --git a/lldb/include/lldb/Target/Target.h b/lldb/include/lldb/Target/Target.h
index c375df248154f..af755508cbe74 100644
--- a/lldb/include/lldb/Target/Target.h
+++ b/lldb/include/lldb/Target/Target.h
@@ -649,12 +649,20 @@ class Target : public std::enable_shared_from_this<Target>,
   ///     will handle / summarize the failures in a custom way and
   ///     don't use these messages.
   ///
+  /// \param[in] defer_preload
+  ///     If true, the module will not be preloaded even if
+  ///     Target::GetPreloadSymbols() is true. This is useful when the caller
+  ///     wishes to preload loaded modules in parallel after calling this
+  ///     function for better performance. This is because it's currently not
+  ///     thread-safe to do so during the execution of this function.
+  ///
   /// \return
   ///     An empty ModuleSP will be returned if no matching file
   ///     was found.  If error_ptr was non-nullptr, an error message
   ///     will likely be provided.
   lldb::ModuleSP GetOrCreateModule(const ModuleSpec &module_spec, bool notify,
-                                   Status *error_ptr = nullptr);
+                                   Status *error_ptr = nullptr,
+                                   bool defer_preload = false);
 
   // Settings accessors
 
diff --git a/lldb/source/Core/DynamicLoader.cpp b/lldb/source/Core/DynamicLoader.cpp
index 7580b15c02ce1..d1f2ae33af75e 100644
--- a/lldb/source/Core/DynamicLoader.cpp
+++ b/lldb/source/Core/DynamicLoader.cpp
@@ -154,7 +154,8 @@ DynamicLoader::GetSectionListFromModule(const ModuleSP module) const {
   return sections;
 }
 
-ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file) {
+ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file,
+                                            bool defer_module_preload) {
   Target &target = m_process->GetTarget();
   ModuleSpec module_spec(file, target.GetArchitecture());
   if (UUID uuid = m_process->FindModuleUUID(file.GetPath())) {
@@ -165,7 +166,9 @@ ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file) {
   if (ModuleSP module_sp = target.GetImages().FindFirstModule(module_spec))
     return module_sp;
 
-  if (ModuleSP module_sp = target.GetOrCreateModule(module_spec, false))
+  if (ModuleSP module_sp = target.GetOrCreateModule(
+          module_spec, false /* notify */, nullptr /* error_ptr */,
+          defer_module_preload))
     return module_sp;
 
   return nullptr;
@@ -174,8 +177,9 @@ ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file) {
 ModuleSP DynamicLoader::LoadModuleAtAddress(const FileSpec &file,
                                             addr_t link_map_addr,
                                             addr_t base_addr,
-                                            bool base_addr_is_offset) {
-  if (ModuleSP module_sp = FindModuleViaTarget(file)) {
+                                            bool base_addr_is_offset,
+                                            bool defer_module_preload) {
+  if (ModuleSP module_sp = FindModuleViaTarget(file, defer_module_preload)) {
     UpdateLoadedSections(module_sp, link_map_addr, base_addr,
                          base_addr_is_offset);
     return module_sp;
diff --git a/lldb/source/Core/ModuleList.cpp b/lldb/source/Core/ModuleList.cpp
index c40612c1ced5e..c8594b4aa803a 100644
--- a/lldb/source/Core/ModuleList.cpp
+++ b/lldb/source/Core/ModuleList.cpp
@@ -7,6 +7,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "lldb/Core/ModuleList.h"
+#include "lldb/Core/Debugger.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/ModuleSpec.h"
 #include "lldb/Core/PluginManager.h"
@@ -26,6 +27,7 @@
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/UUID.h"
 #include "lldb/lldb-defines.h"
+#include "llvm/Support/ThreadPool.h"
 
 #if defined(_WIN32)
 #include "lldb/Host/windows/PosixApi.h"
@@ -1357,3 +1359,14 @@ void ModuleList::Swap(ModuleList &other) {
       m_modules_mutex, other.m_modules_mutex);
   m_modules.swap(other.m_modules);
 }
+
+void ModuleList::PreloadSymbols() const {
+  std::lock_guard<std::recursive_mutex> guard(m_modules_mutex);
+  llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+  for (const ModuleSP &module_sp : m_modules)
+    task_group.async([module_sp] {
+      if (module_sp)
+        module_sp->PreloadSymbols();
+    });
+  // task group destructor waits for all tasks to complete
+}
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
index 326b6910b5267..2ca3f1eb6e27a 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
@@ -453,7 +453,8 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
     // exists for the duration of this call in `m_rendezvous`.
     auto load_module_fn =
         [this, &loaded_modules, &new_modules,
-         &interpreter_module_mutex](const DYLDRendezvous::SOEntry &so_entry) {
+         &interpreter_module_mutex](const DYLDRendezvous::SOEntry &so_entry,
+                                    bool defer_module_preload) {
           // Don't load a duplicate copy of ld.so if we have already loaded it
           // earlier in LoadInterpreterModule. If we instead loaded then
           // unloaded it later, the section information for ld.so would be
@@ -469,7 +470,8 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
           }
 
           ModuleSP module_sp = LoadModuleAtAddress(
-              so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
+              so_entry.file_spec, so_entry.link_addr, so_entry.base_addr,
+              true /* base_addr_is_offset */, defer_module_preload);
           if (!module_sp.get())
             return;
 
@@ -503,11 +505,14 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
     if (m_process->GetTarget().GetParallelModuleLoad()) {
       llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
       for (; I != E; ++I)
-        task_group.async(load_module_fn, *I);
+        task_group.async(load_module_fn, *I, true /* defer_module_preload */);
       task_group.wait();
+
+      if (m_process->GetTarget().GetPreloadSymbols())
+        new_modules.PreloadSymbols();
     } else {
       for (; I != E; ++I)
-        load_module_fn(*I);
+        load_module_fn(*I, false /* defer_module_preload */);
     }
 
     m_process->GetTarget().ModulesDidLoad(new_modules);
@@ -644,12 +649,12 @@ ModuleSP DynamicLoaderPOSIXDYLD::LoadInterpreterModule() {
   return nullptr;
 }
 
-ModuleSP DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(const FileSpec &file,
-                                                     addr_t link_map_addr,
-                                                     addr_t base_addr,
-                                                     bool base_addr_is_offset) {
+ModuleSP DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(
+    const FileSpec &file, addr_t link_map_addr, addr_t base_addr,
+    bool base_addr_is_offset, bool defer_module_preload) {
   if (ModuleSP module_sp = DynamicLoader::LoadModuleAtAddress(
-          file, link_map_addr, base_addr, base_addr_is_offset))
+          file, link_map_addr, base_addr, base_addr_is_offset,
+          defer_module_preload))
     return module_sp;
 
   // This works around an dynamic linker "bug" on android <= 23, where the
@@ -668,7 +673,7 @@ ModuleSP DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(const FileSpec &file,
         !(memory_info.GetName().IsEmpty())) {
       if (ModuleSP module_sp = DynamicLoader::LoadModuleAtAddress(
               FileSpec(memory_info.GetName().GetStringRef()), link_map_addr,
-              base_addr, base_addr_is_offset))
+              base_addr, base_addr_is_offset, defer_module_preload))
         return module_sp;
     }
   }
@@ -704,9 +709,11 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
       module_names, m_process->GetTarget().GetArchitecture().GetTriple());
 
   auto load_module_fn = [this, &module_list,
-                         &log](const DYLDRendezvous::SOEntry &so_entry) {
-    ModuleSP module_sp = LoadModuleAtAddress(
-        so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
+                         &log](const DYLDRendezvous::SOEntry &so_entry,
+                               bool defer_module_preload) {
+    ModuleSP module_sp =
+        LoadModuleAtAddress(so_entry.file_spec, so_entry.link_addr,
+                            so_entry.base_addr, true, defer_module_preload);
     if (module_sp.get()) {
       LLDB_LOG(log, "LoadAllCurrentModules loading module: {0}",
                so_entry.file_spec.GetFilename());
@@ -723,11 +730,14 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
   if (m_process->GetTarget().GetParallelModuleLoad()) {
     llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
     for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
-      task_group.async(load_module_fn, *I);
+      task_group.async(load_module_fn, *I, true /* defer_module_preload */);
     task_group.wait();
+
+    if (m_process->GetTarget().GetPreloadSymbols())
+      module_list.PreloadSymbols();
   } else {
     for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
-      load_module_fn(*I);
+      load_module_fn(*I, false /* defer_module_preload */);
     }
   }
 
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
index 6efb92673a13c..2d9dba9bca9fa 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
@@ -55,10 +55,11 @@ class DynamicLoaderPOSIXDYLD : public lldb_private::DynamicLoader {
   // PluginInterface protocol
   llvm::StringRef GetPluginName() override { return GetPluginNameStatic(); }
 
-  lldb::ModuleSP LoadModuleAtAddress(const lldb_private::FileSpec &file,
-                                     lldb::addr_t link_map_addr,
-                                     lldb::addr_t base_addr,
-                                     bool base_addr_is_offset) override;
+  lldb::ModuleSP
+  LoadModuleAtAddress(const lldb_private::FileSpec &file,
+                      lldb::addr_t link_map_addr, lldb::addr_t base_addr,
+                      bool base_addr_is_offset,
+                      bool defer_module_preload = false) override;
 
   void CalculateDynamicSaveCoreRanges(
       lldb_private::Process &process,
diff --git a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp
index d019415cb67a6..09cde464ccdc8 100644
--- a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp
+++ b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp
@@ -67,9 +67,11 @@ ThreadPlanSP DynamicLoaderWasmDYLD::GetStepThroughTrampolinePlan(Thread &thread,
 
 lldb::ModuleSP DynamicLoaderWasmDYLD::LoadModuleAtAddress(
     const lldb_private::FileSpec &file, lldb::addr_t link_map_addr,
-    lldb::addr_t base_addr, bool base_addr_is_offset) {
+    lldb::addr_t base_addr, bool base_addr_is_offset,
+    bool defer_module_preload) {
   if (ModuleSP module_sp = DynamicLoader::LoadModuleAtAddress(
-          file, link_map_addr, base_addr, base_addr_is_offset))
+          file, link_map_addr, base_addr, base_addr_is_offset,
+          defer_module_preload))
     return module_sp;
 
   if (ModuleSP module_sp = m_process->ReadModuleFromMemory(file, base_addr)) {
diff --git a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h
index 5ed855395cca7..9c13bdff0279a 100644
--- a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h
+++ b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h
@@ -33,10 +33,11 @@ class DynamicLoaderWasmDYLD : public DynamicLoader {
   Status CanLoadImage() override { return Status(); }
   lldb::ThreadPlanSP GetStepThroughTrampolinePlan(Thread &thread,
                                                   bool stop) override;
-  lldb::ModuleSP LoadModuleAtAddress(const lldb_private::FileSpec &file,
-                                     lldb::addr_t link_map_addr,
-                                     lldb::addr_t base_addr,
-                                     bool base_addr_is_offset) override;
+  lldb::ModuleSP
+  LoadModuleAtAddress(const lldb_private::FileSpec &file,
+                      lldb::addr_t link_map_addr, lldb::addr_t base_addr,
+                      bool base_addr_is_offset,
+                      bool defer_module_preload = false) override;
 
   /// \}
 
diff --git a/lldb/source/Target/Target.cpp b/lldb/source/Target/Target.cpp
index d070c3d953d4a..ea685992043e7 100644
--- a/lldb/source/Target/Target.cpp
+++ b/lldb/source/Target/Target.cpp
@@ -2342,7 +2342,8 @@ bool Target::ReadPointerFromMemory(const Address &addr, Status &error,
 }
 
 ModuleSP Target::GetOrCreateModule(const ModuleSpec &orig_module_spec,
-                                   bool notify, Status *error_ptr) {
+                                   bool notify, Status *error_ptr,
+                                   bool defer_preload) {
   ModuleSP module_sp;
 
   Status error;
@@ -2406,7 +2407,7 @@ ModuleSP Target::GetOrCreateModule(const ModuleSpec &orig_module_spec,
                 module_spec.GetFileSpec().GetDirectory(), transformed_dir)) {
           transformed_spec.GetFileSpec().SetDirectory(transformed_dir);
           transformed_spec.GetFileSpec().SetFilename(
-                module_spec.GetFileSpec().GetFilename());
+              module_spec.GetFileSpec().GetFilename());
           error = ModuleList::GetSharedModule(transformed_spec, module_sp,
                                               &search_paths, &old_modules,
                                               &did_create_module);
@@ -2510,8 +2511,9 @@ ModuleSP Target::GetOrCreateModule(const ModuleSpec &orig_module_spec,
 
         // Preload symbols outside of any lock, so hopefully we can do this for
         // each library in parallel.
-        if (GetPreloadSymbols())
+        if (GetPreloadSymbols() && !defer_preload)
           module_sp->PreloadSymbols();
+
         llvm::SmallVector<ModuleSP, 1> replaced_modules;
         for (ModuleSP &old_module_sp : old_modules) {
           if (m_images.GetIndexForModule(old_module_sp.get()) !=

@zhyty
Contributor Author

zhyty commented Nov 5, 2025

I wonder if instead of a boolean we should pass down a function that schedules the module loading as suggested in #10746.

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR), or for your use case we could accumulate the preload callbacks into a vector and then schedule all the preload jobs to run in parallel once all the modules have been loaded.

Any thoughts @DmT021?

I just finished writing a small blurb explaining why I went with this approach, so maybe that helps justify this approach over that one?

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR)

Immediately scheduling the preload to happen in parallel wouldn't be thread-safe, due to the ABBA deadlock. I don't think that flexibility is good, then, if it leads callers into this deadlock.

@JDevlieghere
Member

Since you can easily reproduce the deadlock, can you include a backtrace to help me better understand the call stack? I'm particularly interested in understanding why PreloadSymbols needs to call GetOrCreateModule.

At first glance, the current solution feels a bit ad-hoc, especially because DynamicLoaderPOSIXDYLD is the only one taking this approach. If this makes sense for the Linux case, why not defer this work unconditionally, and do it after loading all the modules?

Lastly, adopting a read-write lock in the module list would benefit more than just resolving this deadlock, so I would really like us to explore that option first. The problem here is again that we are calling GetOrCreateModule and that's the part I don't understand (yet). Assuming there's a good reason we need to get (but not create) a module, then a read lock should do.

@zhyty
Contributor Author

zhyty commented Nov 6, 2025

Since you can easily reproduce the deadlock, can you include a backtrace to help me better understand the call stack. I'm particularly interested in understanding why PreloadSymbols needs to call GetOrCreateModule.

Just included @GeorgeHuyubo's backtrace (bt.txt) in the description. It's not that PreloadSymbols calls GetOrCreateModule. Looking at the attached backtrace as an example:

  • Note that DynamicLoaderDumpWithModuleList is some Meta-specific implementation of a DYLD for coredumps.
  • In thread 5, frames 59 to 51 show how we call LoadModuleAtAddress eventually all the way down to ManualDWARFIndex::Index, which also schedules tasks on the thread pool. When waiting for these tasks to finish, thread 5 context switches to another call of LoadModuleAtAddress at frame 12, which this time executes ModuleList::GetSharedModule. This is how thread 5 holds a module mutex (which would be a write lock), then attempts to hold the global shared module list mutex (which may be a read lock that converts to a write lock).
  • In thread 42, it's simpler -- it goes straight into ModuleList::GetSharedModule, holding the global shared module list mutex (earlier than thread 5), then at frame 8 calls Module::GetUUID, which blocks on the module mutex held by thread 5. Thus, ABBA.

So, PreloadSymbols doesn't call GetOrCreateModule, but a thread may context switch such that it ends up holding the mutexes in the [Module, ModuleList] order.

At first glance, the current solution feels a bit ad-hoc, especially because DynamicLoaderPOSIXDYLD is the only one taking this approach. If this makes sense for the Linux case, why not defer this work unconditionally, and do it after loading all the modules?

Yeah, I think that would be fine as well. I am probably missing a nice clean way of writing this solution out :/. Any ideas how we would defer this work unconditionally? Should I refactor LoadModuleAtAddress or GetOrCreateModule in some way?

I think we would actually need to change at least the Darwin DYLD as well since this could theoretically take place for it too (though I haven't reproduced on Mac yet).

Lastly, adopting a read-write lock in the module list would benefit more than just resolving this deadlock, so I would really like us to explore that option first.

That's true, and I think it would benefit the overall design of the module list! I think the part that I don't agree on is that it would do anything to resolve this deadlock -- looking at the execution, would we not run into the same deadlock due to lock contention between readers and writers?

@JDevlieghere
Member

Thanks for the backtrace and the clarification. I now also finally understand why you mentioned "context switching", which confused me as that's not something the LLVM ThreadPool supports. What's going on is that one of the worker threads is submitting new tasks (belonging to a different task group) and waiting on them.

From https://llvm.org/doxygen/classllvm_1_1ThreadPoolInterface.html:

It is also possible for worker threads to submit new tasks and wait for them. Note that this may result in a deadlock in cases such as when a task (directly or indirectly) tries to wait for its own completion, or when all available threads are used up by tasks waiting for a task that has no thread left to run on (this includes waiting on the returned future). It should be generally safe to wait() for a group as long as groups do not form a cycle.

However, that last statement doesn't account for the situation we're in. IIUC, we have a task group X and a task group Y. We create task group Y from a task belonging to X, and when we call wait, the thread pool picks up the next task from X, not from Y. This is what you referred to as "context switching". It also explains why @GeorgeHuyubo's switch to a separate ThreadPool avoided the issue.
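
For reference, the shape of that situation is roughly the following. This assumes the DefaultThreadPool/ThreadPoolTaskGroup names from llvm/Support/ThreadPool.h in recent LLVM, and whether and how the nested wait completes depends on the pool's internals, which is exactly the fragility at issue:

  #include "llvm/Support/ThreadPool.h"
  #include "llvm/Support/Threading.h"

  int main() {
    // A small pool makes the hazard easier to see.
    llvm::DefaultThreadPool Pool(llvm::hardware_concurrency(2));
    llvm::ThreadPoolTaskGroup GroupX(Pool);

    for (int i = 0; i < 8; ++i)
      GroupX.async([&Pool] {
        // A task from group X creates group Y and waits on it. While waiting,
        // the worker thread may be handed other queued work instead of
        // sitting idle -- the behavior the PR description called
        // "context switching".
        llvm::ThreadPoolTaskGroup GroupY(Pool);
        GroupY.async([] { /* nested indexing work */ });
        GroupY.wait();
      });

    GroupX.wait();
  }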

With that information, I would argue that the actual mutexes involved here are irrelevant. Nothing can make this operation inherently safe. Even without any mutexes, with a sufficiently small group of threads, this may never complete. The only solution here is to not create (and wait on) a new task group from within a task. Your patch is doing exactly that by deferring the preload until after we've processed the modules in parallel.

Alright, I've convinced myself that this is the right approach. Thanks for bearing with me. I'm now also convinced that we should do this unconditionally. The only reason we're not seeing this on Darwin is because we always use the accelerator tables and don't need to manually index. However, it inherently suffers from the same problem. Let's change this globally and we should be good everywhere.

@zhyty
Contributor Author

zhyty commented Nov 7, 2025

@JDevlieghere no worries, "context switching" was a bad (and incorrect) way of describing it. Your explanation helps me understand it better too!

The only reason we're not seeing this on Darwin is because we always use the accelerator tables and don't need to manually index.

Ah right, I always forget Darwin has that luxury!

However, it inherently suffers from the same problem. Let's change this globally and we should be good everywhere.

By "change it globally", I was initially going to change it on the caller side, i.e. do something similar to DynamicLoaderPosixDYLD in DynamicLoaderDarwin. Was this what you meant by "change it globally", or were you suggesting another kind of change?

@JDevlieghere
Member

By "change it globally", I was initially going to change it on the caller side, i.e. do something similar to DynamicLoaderPosixDYLD in DynamicLoaderDarwin. Was this what you meant by "change it globally", or were you suggesting another kind of change?

I meant removing the flag, and always doing the "deferred" preloading after we've loaded all the modules.

@zhyty
Contributor Author

zhyty commented Nov 11, 2025

By "change it globally", I was initially going to change it on the caller side, i.e. do something similar to DynamicLoaderPosixDYLD in DynamicLoaderDarwin. Was this what you meant by "change it globally", or were you suggesting another kind of change?

I meant removing the flag, and always doing the "deferred" preloading after we've loaded all the modules.

Right, sorry, just to double-check: there are a bunch of direct calls to Target::GetOrCreateModule or LoadModuleAtAddress scattered around. I'm interpreting your suggestion as "remove the Module::PreloadSymbols call from Target::GetOrCreateModule entirely, and explicitly call it from the DYLD callers or wherever seems appropriate"? I'll include a commit to confirm we're on the same page.

Tom Yang added 3 commits November 12, 2025 12:14
@zhyty zhyty force-pushed the parallel-module-load-fix-oss branch from cb250f6 to fff56d6 on November 12, 2025 20:48
@zhyty
Contributor Author

zhyty commented Nov 12, 2025

Rebased, applied feedback from @JDevlieghere

Member

@JDevlieghere JDevlieghere left a comment

Thank you, this LGTM modulo nits.

Tom Yang added 4 commits November 12, 2025 16:07
@zhyty
Contributor Author

zhyty commented Nov 14, 2025

@JDevlieghere

Just a heads up: I made a change since you approved. It moves the preloading logic completely into Target::ModulesDidLoad. @clayborg suggested this, and we felt it might be more intuitive than having each DYLD implementer explicitly call preload. It also better preserves the existing preload behavior for previous callers of GetOrCreateModule.

The one downside is that calling Target::GetOrCreateModule with notify=true is not thread-safe with this change. However, the API already suggests that notify=false should be used when multiple modules are loaded in bulk, and no current callers of Target::GetOrCreateModule attempt to call it in parallel except for the DYLDs, which call it with notify=false.

Collaborator

@clayborg clayborg left a comment

LGTM. @JDevlieghere let us know if you like having Target::ModulesDidLoad do the preloading while obeying the target settings.

Member

@JDevlieghere JDevlieghere left a comment

Thanks for the ping. Doing this in ModulesDidLoad makes sense. LGTM.

@JDevlieghere JDevlieghere changed the title fix parallel module loading deadlock for Linux DYLD [lldb] fix parallel module loading deadlock for Linux DYLD Nov 14, 2025
@zhyty zhyty merged commit 66d5f6a into llvm:main Nov 14, 2025
8 of 9 checks passed
@zhyty zhyty deleted the parallel-module-load-fix-oss branch November 14, 2025 23:58