
Conversation

@zhyty
Contributor

@zhyty zhyty commented Nov 5, 2025

Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered (his previous attempt).

This change can be summarized as the following:

  • Plumb a boolean flag that forces no preload through Target::GetOrCreateModule all the way up to LoadModuleAtAddress.
  • Parallelize Module::PreloadSymbols separately from Target::GetOrCreateModule and its caller LoadModuleAtAddress (this is what avoids the deadlock).

These changes roughly maintain the performance characteristics of the previous implementation of parallel module loading. Testing on targets with between 5,000 and 14,000 modules, I saw numbers similar to before, often more than 10% faster with the new implementation across multiple trials on these massive targets. I suspect this is because there is less lock contention with this approach.

The deadlock

See bt.txt for a sample backtrace of LLDB when the deadlock occurs.

As @GeorgeHuyubo explains in his PR, this is an ABBA deadlock. It happens when a thread context-switches out of Module::PreloadSymbols and goes into Target::GetOrCreateModule for another module, possibly entering this block:

      if (!module_sp) {
        // The platform is responsible for finding and caching an appropriate
        // module in the shared module cache.
        if (m_platform_sp) {
          error = m_platform_sp->GetSharedModule(
              module_spec, m_process_sp.get(), module_sp, &search_paths,
              &old_modules, &did_create_module);
        } else {
          error = Status::FromErrorString("no platform is currently set");
        }
      }

Module::PreloadSymbols holds a module-level mutex, and then GetSharedModule attempts to hold the mutex of the global shared ModuleList. So, this thread holds the module mutex, and waits on the global shared ModuleList mutex.

A competing thread may execute Target::GetOrCreateModule, enter the same block as above, and grab the global shared ModuleList mutex. Then, in ModuleList::GetSharedModule, we eventually call ModuleList::FindModules, which ends up waiting for the Module mutex held by the first thread (via Module::GetUUID). Thus, we deadlock.
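
For illustration, here is a minimal standalone sketch of that lock ordering. The mutexes and functions below are stand-ins, not the real LLDB classes:

  #include <mutex>
  #include <thread>

  // Stand-ins for the per-Module mutex and the shared ModuleList mutex.
  std::recursive_mutex module_mutex;
  std::recursive_mutex module_list_mutex;

  // PreloadSymbols-like path: Module mutex first, then the ModuleList mutex
  // (via GetSharedModule).
  void thread_a() {
    std::lock_guard<std::recursive_mutex> m(module_mutex);
    std::lock_guard<std::recursive_mutex> l(module_list_mutex);
  }

  // GetSharedModule-like path: ModuleList mutex first, then the Module mutex
  // (via FindModules -> Module::GetUUID).
  void thread_b() {
    std::lock_guard<std::recursive_mutex> l(module_list_mutex);
    std::lock_guard<std::recursive_mutex> m(module_mutex);
  }

  int main() {
    std::thread a(thread_a), b(thread_b);
    a.join(); // with the wrong interleaving, neither join ever returns
    b.join();
  }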

Reproducing the deadlock

It might be worth noting that I've never been able to observe this deadlock during live debugging (e.g. launching or attaching to processes). However, we were able to consistently reproduce it with coredumps when using the following settings:

(lldb) settings set target.parallel-module-load true
(lldb) settings set target.preload-symbols true
(lldb) settings set symbols.load-on-demand false
(lldb) target create --core /some/core/file/here
# deadlock happens

How this change avoids this deadlock

This change avoids running Module::PreloadSymbols concurrently with Target::GetOrCreateModule: it waits until all the Target::GetOrCreateModule calls have finished, and only then runs Module::PreloadSymbols in parallel. That eliminates the ordering of holding a Module lock and then the ModuleList lock, because the Target::GetOrCreateModule calls always take the shared ModuleList lock first (from what I've read and tested).
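
As a self-contained sketch of that two-phase ordering (the Module type and helpers below are hypothetical stand-ins, not LLDB code; the real changes live in DynamicLoaderPOSIXDYLD and Target in the diff):

  #include <future>
  #include <vector>

  struct Module {
    void PreloadSymbols() { /* expensive symbol indexing */ }
  };

  // Stand-in for Target::GetOrCreateModule with the new defer_preload flag:
  // when deferring, it only creates/registers the module.
  Module *GetOrCreateModule(std::vector<Module> &images, int id,
                            bool defer_preload) {
    Module *m = &images[id];
    if (!defer_preload)
      m->PreloadSymbols(); // old behavior: preload during module creation
    return m;
  }

  int main() {
    std::vector<Module> images(16);
    std::vector<Module *> new_modules;

    // Phase 1: create all modules with preloading deferred, so nothing holds
    // a Module lock while ModuleList-level work is still running.
    for (int id = 0; id < 16; ++id)
      new_modules.push_back(GetOrCreateModule(images, id, /*defer_preload=*/true));

    // Phase 2: only after every GetOrCreateModule call has returned, preload
    // symbols for the new modules in parallel.
    std::vector<std::future<void>> jobs;
    for (Module *m : new_modules)
      jobs.push_back(std::async(std::launch::async, [m] { m->PreloadSymbols(); }));
    for (auto &j : jobs)
      j.get();
  }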

Why not read-write lock?

Some feedback in #160225 was to replace the mutexes used in these components with read-write locks. This might be a good idea overall, but I don't think it would easily resolve this specific deadlock. Module::PreloadSymbols would probably need a write lock on the Module, so even if Module::GetUUID took a read lock we would still contend. Maybe the ModuleList lock could be a read lock that converts to a write lock if it chooses to update the module, but it seems likely that some thread would try to update the shared module list and then the write lock would contend again.

Perhaps with deeper architectural changes, we could fix this issue?

Other attempts

One downside of this approach (and the former approach of parallel module loading) is that each DYLD would need to implement this pattern itself. With @clayborg's help, I looked at a few other approaches:

  • In Target::GetOrCreateModule, backgrounding the Module::PreloadSymbols call by adding it directly to the thread pool via Debugger::GetThreadPool().async(). This required adding a lock to Module::SetLoadAddress (there probably should be one already) since ObjectFileELF::SetLoadAddress is not thread-safe (it updates sections). Unfortunately, during execution this causes the symbol preloading to run synchronously with Target::GetOrCreateModule, preventing us from truly parallelizing the work.
  • In Module::PreloadSymbols, backgrounding the symtab and sym_file PreloadSymbols calls individually, but this ran into similar issues as the above.
  • Passing a callback function, as in swiftlang/llvm-project#10746 ([LLDB] Deterministic module order in Target::SetExecutableModule), instead of the boolean I use in this change. It's functionally the same change IMO, with some design tradeoffs:
    • Pro: the caller doesn't need to explicitly call Module::PreloadSymbols itself, and can instead call whatever function is passed into the callback.
    • Con: the caller needs to delay the execution of the callback so that it occurs after the GetOrCreateModule logic, otherwise we run into the same issue. I thought this would be trickier for the caller, requiring some kind of condition variable or otherwise storing the calls to execute afterwards.

Test Plan:

ninja check-lldb

@github-actions

github-actions bot commented Nov 5, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@dmpots
Contributor

dmpots commented Nov 5, 2025

I wonder if instead of a boolean we should pass down a function that schedules the module loading as suggested in swiftlang#10746.

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR), or for your use case we could accumulate the preload callbacks into a vector and then schedule all the preload jobs to run in parallel once all the modules have been loaded.

Any thoughts @DmT021?

@zhyty
Contributor Author

zhyty commented Nov 5, 2025

I wonder if instead of a boolean we should pass down a function that schedules the module loading as suggested in #10746.

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR), or for your use case we could accumulate the preload callbacks into a vector and then schedule all the preload jobs to run in parallel once all the modules have been loaded.

Any thoughts @DmT021?

(Proper link would be swiftlang#10746. #10746 in this repo refers to something unrelated)

@zhyty zhyty marked this pull request as ready for review November 5, 2025 19:20
@zhyty zhyty requested a review from JDevlieghere as a code owner November 5, 2025 19:20
@llvmbot llvmbot added the lldb label Nov 5, 2025
@llvmbot
Member

llvmbot commented Nov 5, 2025

@llvm/pr-subscribers-lldb

Author: Tom Yang (zhyty)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/166480.diff

10 Files Affected:

  • (modified) lldb/include/lldb/Core/ModuleList.h (+3)
  • (modified) lldb/include/lldb/Target/DynamicLoader.h (+15-2)
  • (modified) lldb/include/lldb/Target/Target.h (+9-1)
  • (modified) lldb/source/Core/DynamicLoader.cpp (+8-4)
  • (modified) lldb/source/Core/ModuleList.cpp (+13)
  • (modified) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp (+25-15)
  • (modified) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h (+5-4)
  • (modified) lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp (+4-2)
  • (modified) lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h (+5-4)
  • (modified) lldb/source/Target/Target.cpp (+5-3)
diff --git a/lldb/include/lldb/Core/ModuleList.h b/lldb/include/lldb/Core/ModuleList.h
index e71f3b2bad6b4..8e87a2cb01df6 100644
--- a/lldb/include/lldb/Core/ModuleList.h
+++ b/lldb/include/lldb/Core/ModuleList.h
@@ -511,6 +511,9 @@ class ModuleList {
   /// Atomically swaps the contents of this module list with \a other.
   void Swap(ModuleList &other);
 
+  /// For each module in this ModuleList, preload its symbols.
+  void PreloadSymbols() const;
+
 protected:
   // Class typedefs.
   typedef std::vector<lldb::ModuleSP>
diff --git a/lldb/include/lldb/Target/DynamicLoader.h b/lldb/include/lldb/Target/DynamicLoader.h
index 75bb6cb6bb907..271d5f761696c 100644
--- a/lldb/include/lldb/Target/DynamicLoader.h
+++ b/lldb/include/lldb/Target/DynamicLoader.h
@@ -207,10 +207,17 @@ class DynamicLoader : public PluginInterface {
   /// resulting module at the virtual base address \p base_addr.
   /// Note that this calls Target::GetOrCreateModule with notify being false,
   /// so it is necessary to call Target::ModulesDidLoad afterwards.
+  ///
+  /// \param[in] defer_module_preload
+  ///     If the module needs to be loaded, prevent the module from being
+  ///     preloaded even if the user sets the preload symbols option. This is
+  ///     used as a performance optimization should the caller preload the
+  ///     modules in parallel after calling this function.
   virtual lldb::ModuleSP LoadModuleAtAddress(const lldb_private::FileSpec &file,
                                              lldb::addr_t link_map_addr,
                                              lldb::addr_t base_addr,
-                                             bool base_addr_is_offset);
+                                             bool base_addr_is_offset,
+                                             bool defer_module_preload = false);
 
   /// Find/load a binary into lldb given a UUID and the address where it is
   /// loaded in memory, or a slide to be applied to the file address.
@@ -352,7 +359,13 @@ class DynamicLoader : public PluginInterface {
 protected:
   // Utility methods for derived classes
 
-  lldb::ModuleSP FindModuleViaTarget(const FileSpec &file);
+  /// Find a module in the target that matches the given file.
+  ///
+  /// \param[in] defer_module_preload
+  ///     If the module needs to be loaded, prevent the module from being
+  ///     preloaded even if the user sets the preload symbols option.
+  lldb::ModuleSP FindModuleViaTarget(const FileSpec &file,
+                                     bool defer_module_preload = false);
 
   /// Checks to see if the target module has changed, updates the target
   /// accordingly and returns the target executable module.
diff --git a/lldb/include/lldb/Target/Target.h b/lldb/include/lldb/Target/Target.h
index c375df248154f..af755508cbe74 100644
--- a/lldb/include/lldb/Target/Target.h
+++ b/lldb/include/lldb/Target/Target.h
@@ -649,12 +649,20 @@ class Target : public std::enable_shared_from_this<Target>,
   ///     will handle / summarize the failures in a custom way and
   ///     don't use these messages.
   ///
+  /// \param[in] defer_preload
+  ///     If true, the module will not be preloaded even if
+  ///     Target::GetPreloadSymbols() is true. This is useful when the caller
+  ///     wishes to preload loaded modules in parallel after calling this
+  ///     function for better performance. This is because it's currently not
+  ///     thread-safe to do so during the execution of this function.
+  ///
   /// \return
   ///     An empty ModuleSP will be returned if no matching file
   ///     was found.  If error_ptr was non-nullptr, an error message
   ///     will likely be provided.
   lldb::ModuleSP GetOrCreateModule(const ModuleSpec &module_spec, bool notify,
-                                   Status *error_ptr = nullptr);
+                                   Status *error_ptr = nullptr,
+                                   bool defer_preload = false);
 
   // Settings accessors
 
diff --git a/lldb/source/Core/DynamicLoader.cpp b/lldb/source/Core/DynamicLoader.cpp
index 7580b15c02ce1..d1f2ae33af75e 100644
--- a/lldb/source/Core/DynamicLoader.cpp
+++ b/lldb/source/Core/DynamicLoader.cpp
@@ -154,7 +154,8 @@ DynamicLoader::GetSectionListFromModule(const ModuleSP module) const {
   return sections;
 }
 
-ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file) {
+ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file,
+                                            bool defer_module_preload) {
   Target &target = m_process->GetTarget();
   ModuleSpec module_spec(file, target.GetArchitecture());
   if (UUID uuid = m_process->FindModuleUUID(file.GetPath())) {
@@ -165,7 +166,9 @@ ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file) {
   if (ModuleSP module_sp = target.GetImages().FindFirstModule(module_spec))
     return module_sp;
 
-  if (ModuleSP module_sp = target.GetOrCreateModule(module_spec, false))
+  if (ModuleSP module_sp = target.GetOrCreateModule(
+          module_spec, false /* notify */, nullptr /* error_ptr */,
+          defer_module_preload))
     return module_sp;
 
   return nullptr;
@@ -174,8 +177,9 @@ ModuleSP DynamicLoader::FindModuleViaTarget(const FileSpec &file) {
 ModuleSP DynamicLoader::LoadModuleAtAddress(const FileSpec &file,
                                             addr_t link_map_addr,
                                             addr_t base_addr,
-                                            bool base_addr_is_offset) {
-  if (ModuleSP module_sp = FindModuleViaTarget(file)) {
+                                            bool base_addr_is_offset,
+                                            bool defer_module_preload) {
+  if (ModuleSP module_sp = FindModuleViaTarget(file, defer_module_preload)) {
     UpdateLoadedSections(module_sp, link_map_addr, base_addr,
                          base_addr_is_offset);
     return module_sp;
diff --git a/lldb/source/Core/ModuleList.cpp b/lldb/source/Core/ModuleList.cpp
index c40612c1ced5e..c8594b4aa803a 100644
--- a/lldb/source/Core/ModuleList.cpp
+++ b/lldb/source/Core/ModuleList.cpp
@@ -7,6 +7,7 @@
 //===----------------------------------------------------------------------===//
 
 #include "lldb/Core/ModuleList.h"
+#include "lldb/Core/Debugger.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/ModuleSpec.h"
 #include "lldb/Core/PluginManager.h"
@@ -26,6 +27,7 @@
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/UUID.h"
 #include "lldb/lldb-defines.h"
+#include "llvm/Support/ThreadPool.h"
 
 #if defined(_WIN32)
 #include "lldb/Host/windows/PosixApi.h"
@@ -1357,3 +1359,14 @@ void ModuleList::Swap(ModuleList &other) {
       m_modules_mutex, other.m_modules_mutex);
   m_modules.swap(other.m_modules);
 }
+
+void ModuleList::PreloadSymbols() const {
+  std::lock_guard<std::recursive_mutex> guard(m_modules_mutex);
+  llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+  for (const ModuleSP &module_sp : m_modules)
+    task_group.async([module_sp] {
+      if (module_sp)
+        module_sp->PreloadSymbols();
+    });
+  // task group destructor waits for all tasks to complete
+}
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
index 326b6910b5267..2ca3f1eb6e27a 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
@@ -453,7 +453,8 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
     // exists for the duration of this call in `m_rendezvous`.
     auto load_module_fn =
         [this, &loaded_modules, &new_modules,
-         &interpreter_module_mutex](const DYLDRendezvous::SOEntry &so_entry) {
+         &interpreter_module_mutex](const DYLDRendezvous::SOEntry &so_entry,
+                                    bool defer_module_preload) {
           // Don't load a duplicate copy of ld.so if we have already loaded it
           // earlier in LoadInterpreterModule. If we instead loaded then
           // unloaded it later, the section information for ld.so would be
@@ -469,7 +470,8 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
           }
 
           ModuleSP module_sp = LoadModuleAtAddress(
-              so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
+              so_entry.file_spec, so_entry.link_addr, so_entry.base_addr,
+              true /* base_addr_is_offset */, defer_module_preload);
           if (!module_sp.get())
             return;
 
@@ -503,11 +505,14 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
     if (m_process->GetTarget().GetParallelModuleLoad()) {
       llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
       for (; I != E; ++I)
-        task_group.async(load_module_fn, *I);
+        task_group.async(load_module_fn, *I, true /* defer_module_preload */);
       task_group.wait();
+
+      if (m_process->GetTarget().GetPreloadSymbols())
+        new_modules.PreloadSymbols();
     } else {
       for (; I != E; ++I)
-        load_module_fn(*I);
+        load_module_fn(*I, false /* defer_module_preload */);
     }
 
     m_process->GetTarget().ModulesDidLoad(new_modules);
@@ -644,12 +649,12 @@ ModuleSP DynamicLoaderPOSIXDYLD::LoadInterpreterModule() {
   return nullptr;
 }
 
-ModuleSP DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(const FileSpec &file,
-                                                     addr_t link_map_addr,
-                                                     addr_t base_addr,
-                                                     bool base_addr_is_offset) {
+ModuleSP DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(
+    const FileSpec &file, addr_t link_map_addr, addr_t base_addr,
+    bool base_addr_is_offset, bool defer_module_preload) {
   if (ModuleSP module_sp = DynamicLoader::LoadModuleAtAddress(
-          file, link_map_addr, base_addr, base_addr_is_offset))
+          file, link_map_addr, base_addr, base_addr_is_offset,
+          defer_module_preload))
     return module_sp;
 
   // This works around an dynamic linker "bug" on android <= 23, where the
@@ -668,7 +673,7 @@ ModuleSP DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(const FileSpec &file,
         !(memory_info.GetName().IsEmpty())) {
       if (ModuleSP module_sp = DynamicLoader::LoadModuleAtAddress(
               FileSpec(memory_info.GetName().GetStringRef()), link_map_addr,
-              base_addr, base_addr_is_offset))
+              base_addr, base_addr_is_offset, defer_module_preload))
         return module_sp;
     }
   }
@@ -704,9 +709,11 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
       module_names, m_process->GetTarget().GetArchitecture().GetTriple());
 
   auto load_module_fn = [this, &module_list,
-                         &log](const DYLDRendezvous::SOEntry &so_entry) {
-    ModuleSP module_sp = LoadModuleAtAddress(
-        so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
+                         &log](const DYLDRendezvous::SOEntry &so_entry,
+                               bool defer_module_preload) {
+    ModuleSP module_sp =
+        LoadModuleAtAddress(so_entry.file_spec, so_entry.link_addr,
+                            so_entry.base_addr, true, defer_module_preload);
     if (module_sp.get()) {
       LLDB_LOG(log, "LoadAllCurrentModules loading module: {0}",
                so_entry.file_spec.GetFilename());
@@ -723,11 +730,14 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
   if (m_process->GetTarget().GetParallelModuleLoad()) {
     llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
     for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
-      task_group.async(load_module_fn, *I);
+      task_group.async(load_module_fn, *I, true /* defer_module_preload */);
     task_group.wait();
+
+    if (m_process->GetTarget().GetPreloadSymbols())
+      module_list.PreloadSymbols();
   } else {
     for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
-      load_module_fn(*I);
+      load_module_fn(*I, false /* defer_module_preload */);
     }
   }
 
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
index 6efb92673a13c..2d9dba9bca9fa 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
@@ -55,10 +55,11 @@ class DynamicLoaderPOSIXDYLD : public lldb_private::DynamicLoader {
   // PluginInterface protocol
   llvm::StringRef GetPluginName() override { return GetPluginNameStatic(); }
 
-  lldb::ModuleSP LoadModuleAtAddress(const lldb_private::FileSpec &file,
-                                     lldb::addr_t link_map_addr,
-                                     lldb::addr_t base_addr,
-                                     bool base_addr_is_offset) override;
+  lldb::ModuleSP
+  LoadModuleAtAddress(const lldb_private::FileSpec &file,
+                      lldb::addr_t link_map_addr, lldb::addr_t base_addr,
+                      bool base_addr_is_offset,
+                      bool defer_module_preload = false) override;
 
   void CalculateDynamicSaveCoreRanges(
       lldb_private::Process &process,
diff --git a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp
index d019415cb67a6..09cde464ccdc8 100644
--- a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp
+++ b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.cpp
@@ -67,9 +67,11 @@ ThreadPlanSP DynamicLoaderWasmDYLD::GetStepThroughTrampolinePlan(Thread &thread,
 
 lldb::ModuleSP DynamicLoaderWasmDYLD::LoadModuleAtAddress(
     const lldb_private::FileSpec &file, lldb::addr_t link_map_addr,
-    lldb::addr_t base_addr, bool base_addr_is_offset) {
+    lldb::addr_t base_addr, bool base_addr_is_offset,
+    bool defer_module_preload) {
   if (ModuleSP module_sp = DynamicLoader::LoadModuleAtAddress(
-          file, link_map_addr, base_addr, base_addr_is_offset))
+          file, link_map_addr, base_addr, base_addr_is_offset,
+          defer_module_preload))
     return module_sp;
 
   if (ModuleSP module_sp = m_process->ReadModuleFromMemory(file, base_addr)) {
diff --git a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h
index 5ed855395cca7..9c13bdff0279a 100644
--- a/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h
+++ b/lldb/source/Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h
@@ -33,10 +33,11 @@ class DynamicLoaderWasmDYLD : public DynamicLoader {
   Status CanLoadImage() override { return Status(); }
   lldb::ThreadPlanSP GetStepThroughTrampolinePlan(Thread &thread,
                                                   bool stop) override;
-  lldb::ModuleSP LoadModuleAtAddress(const lldb_private::FileSpec &file,
-                                     lldb::addr_t link_map_addr,
-                                     lldb::addr_t base_addr,
-                                     bool base_addr_is_offset) override;
+  lldb::ModuleSP
+  LoadModuleAtAddress(const lldb_private::FileSpec &file,
+                      lldb::addr_t link_map_addr, lldb::addr_t base_addr,
+                      bool base_addr_is_offset,
+                      bool defer_module_preload = false) override;
 
   /// \}
 
diff --git a/lldb/source/Target/Target.cpp b/lldb/source/Target/Target.cpp
index d070c3d953d4a..ea685992043e7 100644
--- a/lldb/source/Target/Target.cpp
+++ b/lldb/source/Target/Target.cpp
@@ -2342,7 +2342,8 @@ bool Target::ReadPointerFromMemory(const Address &addr, Status &error,
 }
 
 ModuleSP Target::GetOrCreateModule(const ModuleSpec &orig_module_spec,
-                                   bool notify, Status *error_ptr) {
+                                   bool notify, Status *error_ptr,
+                                   bool defer_preload) {
   ModuleSP module_sp;
 
   Status error;
@@ -2406,7 +2407,7 @@ ModuleSP Target::GetOrCreateModule(const ModuleSpec &orig_module_spec,
                 module_spec.GetFileSpec().GetDirectory(), transformed_dir)) {
           transformed_spec.GetFileSpec().SetDirectory(transformed_dir);
           transformed_spec.GetFileSpec().SetFilename(
-                module_spec.GetFileSpec().GetFilename());
+              module_spec.GetFileSpec().GetFilename());
           error = ModuleList::GetSharedModule(transformed_spec, module_sp,
                                               &search_paths, &old_modules,
                                               &did_create_module);
@@ -2510,8 +2511,9 @@ ModuleSP Target::GetOrCreateModule(const ModuleSpec &orig_module_spec,
 
         // Preload symbols outside of any lock, so hopefully we can do this for
         // each library in parallel.
-        if (GetPreloadSymbols())
+        if (GetPreloadSymbols() && !defer_preload)
           module_sp->PreloadSymbols();
+
         llvm::SmallVector<ModuleSP, 1> replaced_modules;
         for (ModuleSP &old_module_sp : old_modules) {
           if (m_images.GetIndexForModule(old_module_sp.get()) !=

@zhyty
Contributor Author

zhyty commented Nov 5, 2025

I wonder if instead of a boolean we should pass down a function that schedules the module loading as suggested in #10746.

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR), or for your use case we could accumulate the preload callbacks into a vector and then schedule all the preload jobs to run in parallel once all the modules have been loaded.

Any thoughts @DmT021?

I just finished writing a small blurb explaining why I went with this approach, so maybe that helps justify this approach over that one?

That gives more flexibility in that we could immediately schedule the preload to happen in parallel (as done in the other PR)

Immediately scheduling the preload to happen in parallel wouldn't be thread-safe, due to the ABBA deadlock. I don't think that flexibility is good, then, if it leads callers into this deadlock.

@JDevlieghere
Member

Since you can easily reproduce the deadlock, can you include a backtrace to help me better understand the call stack? I'm particularly interested in understanding why PreloadSymbols needs to call GetOrCreateModule.

At first glance, the current solution feels a bit ad-hoc, especially because DynamicLoaderPOSIXDYLD is the only one taking this approach. If this makes sense for the Linux case, why not defer this work unconditionally, and do it after loading all the modules?

Lastly, adopting a read-write lock in the module list would benefit more than just resolving this deadlock, so I would really like us to explore that option first. The problem here is again that we are calling GetOrCreateModule and that's the part I don't understand (yet). Assuming there's a good reason we need to get (but not create) a module, then a read lock should do.

@zhyty
Contributor Author

zhyty commented Nov 6, 2025

Since you can easily reproduce the deadlock, can you include a backtrace to help me better understand the call stack. I'm particularly interested in understanding why PreloadSymbols needs to call GetOrCreateModule.

Just included @GeorgeHuyubo's backtrace (bt.txt) in the description. It's not that PreloadSymbols calls GetOrCreateModule. Looking at the attached backtrace as an example:

  • Note that DynamicLoaderDumpWithModuleList is some Meta-specific implementation of a DYLD for coredumps.
  • In thread 5, frames 59 to 51 show how we call LoadModuleAtAddress eventually all the way down to ManualDWARFIndex::Index, which also schedules tasks on the thread pool. When waiting for these tasks to finish, thread 5 context switches to another call of LoadModuleAtAddress at frame 12, which this time executes ModuleList::GetSharedModule. This is how thread 5 holds a module mutex (which would be a write lock), then attempts to hold the global shared module list mutex (which may be a read lock that converts to a write lock).
  • In thread 42, it's simpler -- it goes straight into ModuleList::GetSharedModule, holding the global shared module list mutex (earlier than thread 5), then at frame 8 calls Module::GetUUID, which blocks on the module mutex held by thread 5. Thus, ABBA.

So, PreloadSymbols doesn't call GetOrCreateModule, but a thread may context switch such that it ends up holding the mutexes in the [Module, ModuleList] order.

At first glance, the current solution feels a bit ad-hoc, especially because DynamicLoaderPOSIXDYLD is the only one taking this approach. If this makes sense for the Linux case, why not defer this work unconditionally, and do it after loading all the modules?

Yeah, I think that would be fine as well. I am probably missing a nice clean way of writing this solution out :/. Any ideas how we would defer this work unconditionally? Should I refactor LoadModuleAtAddress or GetOrCreateModule in some way?

I think we would actually need to change at least the Darwin DYLD as well since this could theoretically take place for it too (though I haven't reproduced on Mac yet).

Lastly, adopting a read-write lock in the module list would benefit more than just resolving this deadlock, so I would really like us to explore that option first.

That's true, and I think it would benefit the overall design of the module list! I think the part that I don't agree on is that it would do anything to resolve this deadlock -- looking at the execution, would we not run into the same deadlock due to lock contention between readers and writers?

@JDevlieghere
Member

Thanks for the backtrace and the clarification. I now also finally understand why you mentioned "context switching", which confused me as that's not something the LLVM ThreadPool supports. What's going on is that one of the worker threads is submitting new tasks (belonging to a different task group) and waiting on them.

From https://llvm.org/doxygen/classllvm_1_1ThreadPoolInterface.html:

It is also possible for worker threads to submit new tasks and wait for them. Note that this may result in a deadlock in cases such as when a task (directly or indirectly) tries to wait for its own completion, or when all available threads are used up by tasks waiting for a task that has no thread left to run on (this includes waiting on the returned future). It should be generally safe to wait() for a group as long as groups do not form a cycle.

However, that last statement doesn't account for the situation we're in. IIUC, we have a task group X and a task group Y. We create task group Y from a task belonging to X, and when we call wait, the thread pool picks up the next task from X, not from Y. This is what you referred to as "context switching". It also explains why @GeorgeHuyubo's switch to a separate ThreadPool avoided the issue.
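
For reference, the shape of that situation is roughly the following. This assumes the DefaultThreadPool/ThreadPoolTaskGroup names from llvm/Support/ThreadPool.h in recent LLVM, and whether and how the nested wait completes depends on the pool's internals, which is exactly the fragility at issue:

  #include "llvm/Support/ThreadPool.h"
  #include "llvm/Support/Threading.h"

  int main() {
    // A small pool makes the hazard easier to see.
    llvm::DefaultThreadPool Pool(llvm::hardware_concurrency(2));
    llvm::ThreadPoolTaskGroup GroupX(Pool);

    for (int i = 0; i < 8; ++i)
      GroupX.async([&Pool] {
        // A task from group X creates group Y and waits on it. While waiting,
        // the worker thread may be handed other queued work instead of
        // sitting idle -- the behavior the PR description called
        // "context switching".
        llvm::ThreadPoolTaskGroup GroupY(Pool);
        GroupY.async([] { /* nested indexing work */ });
        GroupY.wait();
      });

    GroupX.wait();
  }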

With that information, I would argue that the actual mutexes involved here are irrelevant. Nothing can make this operation inherently safe. Even without any mutexes, with a sufficiently small group of threads, this may never complete. The only solution here is to not create (and wait on) a new task group from within a task. Your patch is doing exactly that by deferring the preload until after we've processed the modules in parallel.

Alright, I've convinced myself that this is the right approach. Thanks for bearing with me. I'm now also convinced that we should do this unconditionally. The only reason we're not seeing this on Darwin is because we always use the accelerator tables and don't need to manually index. However, it inherently suffers from the same problem. Let's change this globally and we should be good everywhere.

@zhyty
Contributor Author

zhyty commented Nov 7, 2025

@JDevlieghere no worries, "context switching" was a bad (and incorrect) way of describing it. Your explanation helps me understand it better too!

The only reason we're not seeing this on Darwin is because we always use the accelerator tables and don't need to manually index.

Ah right, I always forget Darwin has that luxury!

However, it inherently suffers from the same problem. Let's change this globally and we should be good everywhere.

By "change it globally", I was initially going to change it on the caller side, i.e. do something similar to DynamicLoaderPosixDYLD in DynamicLoaderDarwin. Was this what you meant by "change it globally", or were you suggesting another kind of change?

@JDevlieghere
Member

By "change it globally", I was initially going to change it on the caller side, i.e. do something similar to DynamicLoaderPosixDYLD in DynamicLoaderDarwin. Was this what you meant by "change it globally", or were you suggesting another kind of change?

I meant removing the flag, and always doing the "deferred" preloading after we've loaded all the modules.

@zhyty
Contributor Author

zhyty commented Nov 11, 2025

By "change it globally", I was initially going to change it on the caller side, i.e. do something similar to DynamicLoaderPosixDYLD in DynamicLoaderDarwin. Was this what you meant by "change it globally", or were you suggesting another kind of change?

I meant removing the flag, and always doing the "deferred" preloading after we've loaded all the modules.

Right, sorry, just to double-check: there are a bunch of direct calls to Target::GetOrCreateModule or LoadModuleAtAddress scattered around. I'm interpreting your suggestion as "remove the Module::PreloadSymbols call from Target::GetOrCreateModule entirely, and explicitly call it from the DYLD callers or wherever seems appropriate"? I'll include a commit to confirm we're on the same page.

Tom Yang added 3 commits November 12, 2025 12:14
@zhyty zhyty force-pushed the parallel-module-load-fix-oss branch from cb250f6 to fff56d6 on November 12, 2025 20:48
@zhyty
Contributor Author

zhyty commented Nov 12, 2025

Rebased, applied feedback from @JDevlieghere

Member

@JDevlieghere JDevlieghere left a comment

Thank you, this LGTM modulo nits.

Tom Yang added 4 commits November 12, 2025 16:07
@zhyty
Contributor Author

zhyty commented Nov 14, 2025

@JDevlieghere

Just a heads up: I made a change since you approved. It moves the preloading logic completely into Target::ModulesDidLoad. @clayborg suggested this, and we felt it might be more intuitive than having each DYLD implementer explicitly call preload. It also better preserves the existing preload behavior for previous callers of GetOrCreateModule.

The one downside is that calling Target::GetOrCreateModule with notify=true is not thread-safe with this change. However, the API already suggests that notify=false should be used when multiple modules are loaded in bulk, and no current callers of Target::GetOrCreateModule attempt to call it in parallel except for the DYLDs, which call it with notify=false.

Collaborator

@clayborg clayborg left a comment

LGTM. @JDevlieghere let us know if you like having Target::ModulesDidLoad do the preloading while obeying the target settings.

Member

@JDevlieghere JDevlieghere left a comment

Thanks for the ping. Doing this in ModulesDidLoad makes sense. LGTM.

@JDevlieghere JDevlieghere changed the title fix parallel module loading deadlock for Linux DYLD [lldb] fix parallel module loading deadlock for Linux DYLD Nov 14, 2025
@zhyty zhyty merged commit 66d5f6a into llvm:main Nov 14, 2025
8 of 9 checks passed
@zhyty zhyty deleted the parallel-module-load-fix-oss branch November 14, 2025 23:58