Skip to content

Conversation

@jimingham
Copy link
Collaborator

Turns out there's a bug in the current lldb sources that if you fork, set the stdio file handles to close on exec and then exec lldb with some commands and the --batch flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort, it's just as easy to redirect them to /dev/null, which does work.

But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails.

We were running the Write to the interrupt pipe wrapped in llvm::cantFail, but in a no asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls

I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. All the places where code was requesting termination, I added checks for that failure, and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs all the API's I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.

@jimingham jimingham removed the request for review from JDevlieghere October 23, 2025 21:58
@jimingham jimingham requested a review from ashgti October 23, 2025 21:58
@llvmbot
Copy link
Member

llvmbot commented Oct 23, 2025

@llvm/pr-subscribers-lldb

Author: None (jimingham)

Changes

Turns out there's a bug in the current lldb sources that if you fork, set the stdio file handles to close on exec and then exec lldb with some commands and the --batch flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort, it's just as easy to redirect them to /dev/null, which does work.

But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails.

We were running the Write to the interrupt pipe wrapped in llvm::cantFail, but in a no asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls

I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. All the places where code was requesting termination, I added checks for that failure, and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs all the API's I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.


Full diff: https://github.com/llvm/llvm-project/pull/164905.diff

14 Files Affected:

  • (modified) lldb/include/lldb/Host/MainLoopBase.h (+14-8)
  • (modified) lldb/include/lldb/Host/posix/MainLoopPosix.h (+1-1)
  • (modified) lldb/include/lldb/Host/windows/MainLoopWindows.h (+1-1)
  • (modified) lldb/source/Host/common/MainLoopBase.cpp (+4-2)
  • (modified) lldb/source/Host/posix/MainLoopPosix.cpp (+6-3)
  • (modified) lldb/source/Host/windows/MainLoopWindows.cpp (+4-1)
  • (modified) lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp (+4-3)
  • (added) lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py (+46)
  • (modified) lldb/tools/driver/Driver.cpp (+4-3)
  • (modified) lldb/tools/lldb-dap/DAP.cpp (+5-3)
  • (modified) lldb/unittests/DAP/TestBase.cpp (+2-1)
  • (modified) lldb/unittests/Host/JSONTransportTest.cpp (+13-6)
  • (modified) lldb/unittests/Host/MainLoopTest.cpp (+22-9)
  • (modified) lldb/unittests/Protocol/ProtocolMCPServerTest.cpp (+2-1)
diff --git a/lldb/include/lldb/Host/MainLoopBase.h b/lldb/include/lldb/Host/MainLoopBase.h
index be9a2676e7443..9529f2c214784 100644
--- a/lldb/include/lldb/Host/MainLoopBase.h
+++ b/lldb/include/lldb/Host/MainLoopBase.h
@@ -57,18 +57,23 @@ class MainLoopBase {
   // Add a pending callback that will be executed once after all the pending
   // events are processed. The callback will be executed even if termination
   // was requested.
-  void AddPendingCallback(const Callback &callback) {
-    AddCallback(callback, std::chrono::steady_clock::time_point());
+  // Returns false if an interrupt was needed to get the loop to act on the new
+  // callback, but the interrupt failed, true otherwise.  Mostly used when the
+  // pending callback is a RequestTermination, since if the interrupt fails for
+  // that callback, waiting for the MainLoop thread to terminate could stall.
+  bool AddPendingCallback(const Callback &callback) {
+    return AddCallback(callback, std::chrono::steady_clock::time_point());
   }
 
   // Add a callback that will be executed after a certain amount of time has
-  // passed.
-  void AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
-    AddCallback(callback, std::chrono::steady_clock::now() + delay);
+  // passed.  See AddPendingCallback comment for the return value.
+  bool AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
+    return AddCallback(callback, std::chrono::steady_clock::now() + delay);
   }
 
   // Add a callback that will be executed after a given point in time.
-  void AddCallback(const Callback &callback, TimePoint point);
+  // See AddPendingCallback comment for the return value.
+  bool AddCallback(const Callback &callback, TimePoint point);
 
   // Waits for registered events and invoke the proper callbacks. Returns when
   // all callbacks deregister themselves or when someone requests termination.
@@ -85,8 +90,9 @@ class MainLoopBase {
 
   virtual void UnregisterReadObject(IOObject::WaitableHandle handle) = 0;
 
-  // Interrupt the loop that is currently waiting for events.
-  virtual void Interrupt() = 0;
+  /// Interrupt the loop that is currently waiting for events.  Return true if
+  /// the interrupt succeeded, false if it failed.
+  virtual bool Interrupt() = 0;
 
   void ProcessCallbacks();
 
diff --git a/lldb/include/lldb/Host/posix/MainLoopPosix.h b/lldb/include/lldb/Host/posix/MainLoopPosix.h
index e9ac798b948df..92cdbe9d87ec3 100644
--- a/lldb/include/lldb/Host/posix/MainLoopPosix.h
+++ b/lldb/include/lldb/Host/posix/MainLoopPosix.h
@@ -54,7 +54,7 @@ class MainLoopPosix : public MainLoopBase {
   void UnregisterReadObject(IOObject::WaitableHandle handle) override;
   void UnregisterSignal(int signo, std::list<Callback>::iterator callback_it);
 
-  void Interrupt() override;
+  bool Interrupt() override;
 
 private:
   void ProcessReadObject(IOObject::WaitableHandle handle);
diff --git a/lldb/include/lldb/Host/windows/MainLoopWindows.h b/lldb/include/lldb/Host/windows/MainLoopWindows.h
index 705e7e78ba48a..65b44aa1582c3 100644
--- a/lldb/include/lldb/Host/windows/MainLoopWindows.h
+++ b/lldb/include/lldb/Host/windows/MainLoopWindows.h
@@ -50,7 +50,7 @@ class MainLoopWindows : public MainLoopBase {
 protected:
   void UnregisterReadObject(IOObject::WaitableHandle handle) override;
 
-  void Interrupt() override;
+  bool Interrupt() override;
 
 private:
   llvm::Expected<size_t> Poll();
diff --git a/lldb/source/Host/common/MainLoopBase.cpp b/lldb/source/Host/common/MainLoopBase.cpp
index 64a57e65849e9..232b9bc0aa354 100644
--- a/lldb/source/Host/common/MainLoopBase.cpp
+++ b/lldb/source/Host/common/MainLoopBase.cpp
@@ -12,8 +12,9 @@
 using namespace lldb;
 using namespace lldb_private;
 
-void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
+bool MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
   bool interrupt_needed;
+  bool interrupt_succeeded = true;
   {
     std::lock_guard<std::mutex> lock{m_callback_mutex};
     // We need to interrupt the main thread if this callback is scheduled to
@@ -22,7 +23,8 @@ void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
     m_callbacks.emplace(point, callback);
   }
   if (interrupt_needed)
-    Interrupt();
+    interrupt_succeeded = Interrupt();
+  return interrupt_succeeded;
 }
 
 void MainLoopBase::ProcessCallbacks() {
diff --git a/lldb/source/Host/posix/MainLoopPosix.cpp b/lldb/source/Host/posix/MainLoopPosix.cpp
index 19a7128fbe407..e5fa32e4da2b7 100644
--- a/lldb/source/Host/posix/MainLoopPosix.cpp
+++ b/lldb/source/Host/posix/MainLoopPosix.cpp
@@ -387,10 +387,13 @@ void MainLoopPosix::ProcessSignal(int signo) {
   }
 }
 
-void MainLoopPosix::Interrupt() {
+bool MainLoopPosix::Interrupt() {
   if (m_interrupting.exchange(true))
-    return;
+    return true;
 
   char c = '.';
-  cantFail(m_interrupt_pipe.Write(&c, 1));
+  llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);
+  if (result && *result != 0)
+    return true;
+  return false;
 }
diff --git a/lldb/source/Host/windows/MainLoopWindows.cpp b/lldb/source/Host/windows/MainLoopWindows.cpp
index 9b7df10258bcd..c973e1c61162f 100644
--- a/lldb/source/Host/windows/MainLoopWindows.cpp
+++ b/lldb/source/Host/windows/MainLoopWindows.cpp
@@ -272,4 +272,7 @@ Status MainLoopWindows::Run() {
   return Status();
 }
 
-void MainLoopWindows::Interrupt() { WSASetEvent(m_interrupt_event); }
+bool MainLoopWindows::Interrupt() {
+  WSASetEvent(m_interrupt_event);
+  return true;
+}
diff --git a/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp b/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
index 390cf3eeb16a5..77a3ba6574cde 100644
--- a/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
+++ b/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
@@ -133,11 +133,12 @@ llvm::Error ProtocolServerMCP::Stop() {
   }
 
   // Stop the main loop.
-  m_loop.AddPendingCallback(
+  bool addition_succeeded = m_loop.AddPendingCallback(
       [](lldb_private::MainLoopBase &loop) { loop.RequestTermination(); });
 
-  // Wait for the main loop to exit.
-  if (m_loop_thread.joinable())
+  // Wait for the main loop to exit, but not if we didn't succeed in inserting
+  // our pending callback or we'll wait forever.
+  if (addition_succeeded && m_loop_thread.joinable())
     m_loop_thread.join();
 
   m_accept_handles.clear();
diff --git a/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py b/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py
new file mode 100644
index 0000000000000..304e548c8c8a1
--- /dev/null
+++ b/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py
@@ -0,0 +1,46 @@
+"""
+Test that if you exec lldb with the stdio file handles
+closed, it is able to exit without hanging.
+"""
+
+
+import lldb
+import os
+import sys
+import socket
+import fcntl
+
+import lldbsuite.test.lldbutil as lldbutil
+from lldbsuite.test.lldbtest import *
+
+
+class RenameThisSampleTestTestCase(TestBase):
+    # If your test case doesn't stress debug info, then
+    # set this to true.  That way it won't be run once for
+    # each debug info format.
+    NO_DEBUG_INFO_TESTCASE = True
+
+    def test_run_lldb_and_wait(self):
+        """This test forks, closes the stdio channels and exec's lldb.
+           Then it waits for it to exit and asserts it did that successfully"""
+        pid = os.fork()
+        if pid == 0:
+            fcntl.fcntl(sys.stdin, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+            fcntl.fcntl(sys.stdout, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+            fcntl.fcntl(sys.stderr, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+            lldb = lldbtest_config.lldbExec
+            print(f"About to run: {lldb}")
+            os.execlp(lldb, lldb, "-x", "-b", "-o", "script print(lldb.debugger.GetNumTargets())", "--batch")
+        else:
+            if pid == -1:
+                print("Couldn't fork a process.")
+                return
+            ret_pid, status = os.waitpid(pid, 0)
+            # We're really just checking that lldb doesn't stall.
+            # At the time this test was written, if you close stdin
+            # in an asserts build, lldb aborts.  So handle both
+            # of those cases.  The failure will just be that the
+            # waitpid doesn't return, and the test times out.
+            self.assertFalse(os.WIFSTOPPED(status), "We either exited or crashed.")
+
+
diff --git a/lldb/tools/driver/Driver.cpp b/lldb/tools/driver/Driver.cpp
index ba0041111045b..f9cf4c41bf682 100644
--- a/lldb/tools/driver/Driver.cpp
+++ b/lldb/tools/driver/Driver.cpp
@@ -902,9 +902,10 @@ int main(int argc, char const *argv[]) {
   }
 
 #if !defined(_WIN32)
-  signal_loop.AddPendingCallback(
-      [](MainLoopBase &loop) { loop.RequestTermination(); });
-  signal_thread.join();
+  // Try to interrupt the signal thread.  If that succeeds, wait for it to exit.
+  if (signal_loop.AddPendingCallback(
+      [](MainLoopBase &loop) { loop.RequestTermination(); }))
+    signal_thread.join();
 #endif
 
   return exit_code;
diff --git a/lldb/tools/lldb-dap/DAP.cpp b/lldb/tools/lldb-dap/DAP.cpp
index 3c4f2253d1ad5..c54d1a2b43092 100644
--- a/lldb/tools/lldb-dap/DAP.cpp
+++ b/lldb/tools/lldb-dap/DAP.cpp
@@ -1104,9 +1104,11 @@ llvm::Error DAP::Loop() {
                                      "unhandled packet");
   }
 
-  m_loop.AddPendingCallback(
-      [](MainLoopBase &loop) { loop.RequestTermination(); });
-  thread.join();
+  // Don't wait to join the mainloop thread if our callback wasn't added
+  // successfully, or we'll wait forever.
+  if (m_loop.AddPendingCallback(
+      [](MainLoopBase &loop) { loop.RequestTermination(); }))
+    thread.join();
 
   if (m_error_occurred)
     return llvm::createStringError(llvm::inconvertibleErrorCode(),
diff --git a/lldb/unittests/DAP/TestBase.cpp b/lldb/unittests/DAP/TestBase.cpp
index 83a303554ad6b..8cb459964f7d8 100644
--- a/lldb/unittests/DAP/TestBase.cpp
+++ b/lldb/unittests/DAP/TestBase.cpp
@@ -55,8 +55,9 @@ void TransportBase::SetUp() {
 }
 
 void TransportBase::Run() {
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](lldb_private::MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   EXPECT_THAT_ERROR(loop.Run().takeError(), llvm::Succeeded());
 }
 
diff --git a/lldb/unittests/Host/JSONTransportTest.cpp b/lldb/unittests/Host/JSONTransportTest.cpp
index 54f1372ca0fff..26d8913b20898 100644
--- a/lldb/unittests/Host/JSONTransportTest.cpp
+++ b/lldb/unittests/Host/JSONTransportTest.cpp
@@ -269,12 +269,13 @@ template <typename T> class JSONTransportTest : public PipePairTest {
         loop.RequestTermination();
       });
     }
-    loop.AddCallback(
+    bool addition_succeeded = loop.AddCallback(
         [](MainLoopBase &loop) {
           loop.RequestTermination();
           FAIL() << "timeout";
         },
         timeout);
+    EXPECT_TRUE(addition_succeeded);
     auto handle = transport->RegisterMessageHandler(loop, message_handler);
     if (!handle)
       return handle.takeError();
@@ -367,7 +368,9 @@ class TransportBinderTest : public testing::Test {
   }
 
   void Run() {
-    loop.AddPendingCallback([](auto &loop) { loop.RequestTermination(); });
+    bool addition_succeeded = loop.AddPendingCallback([](auto &loop) 
+        { loop.RequestTermination(); });
+    EXPECT_TRUE(addition_succeeded);
     EXPECT_THAT_ERROR(loop.Run().takeError(), Succeeded());
   }
 };
@@ -435,8 +438,9 @@ TEST_F(HTTPDelimitedJSONTransportTest, ReadPartialMessage) {
   EXPECT_CALL(message_handler, Received(Request{5, "foo", std::nullopt}));
 
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
   ASSERT_THAT_EXPECTED(input.Write(part2.data(), part2.size()), Succeeded());
   input.CloseWriteFileDescriptor();
@@ -454,15 +458,17 @@ TEST_F(HTTPDelimitedJSONTransportTest, ReadWithZeroByteWrites) {
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
 
   // Run the main loop once for the initial read.
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
 
   // zero-byte write.
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), 0),
                        Succeeded()); // zero-byte write.
-  loop.AddPendingCallback(
+  addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
 
   // Write the remaining part of the message.
@@ -569,8 +575,9 @@ TEST_F(JSONRPCTransportTest, ReadPartialMessage) {
   EXPECT_CALL(message_handler, Received(Request{42, "foo", std::nullopt}));
 
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_input=*/false), Succeeded());
 
   ASSERT_THAT_EXPECTED(input.Write(part2.data(), part2.size()), Succeeded());
diff --git a/lldb/unittests/Host/MainLoopTest.cpp b/lldb/unittests/Host/MainLoopTest.cpp
index ae16d02101819..b546b0f71286a 100644
--- a/lldb/unittests/Host/MainLoopTest.cpp
+++ b/lldb/unittests/Host/MainLoopTest.cpp
@@ -179,9 +179,13 @@ TEST_F(MainLoopTest, PipeDelayBetweenRegisterAndRun) {
     ASSERT_THAT_EXPECTED(pipe.Write(&X, len), llvm::HasValue(1));
   };
   // Add a write that triggers a read events.
-  loop.AddCallback(cb, std::chrono::milliseconds(500));
-  loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+  bool addition_succeeded = loop.AddCallback(cb, 
+      std::chrono::milliseconds(500));
+  ASSERT_TRUE(addition_succeeded);
+  addition_succeeded = loop.AddCallback([](MainLoopBase &loop) 
+      { loop.RequestTermination(); },
                    std::chrono::milliseconds(1000));
+  ASSERT_TRUE(addition_succeeded);
   ASSERT_TRUE(error.Success());
   ASSERT_TRUE(handle);
 
@@ -310,8 +314,10 @@ TEST_F(MainLoopTest, NoSpuriousSocketReads) {
       error);
   ASSERT_THAT_ERROR(error.ToError(), llvm::Succeeded());
   // Terminate the loop after one second.
-  loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+  bool addition_succeeded = loop.AddCallback([](MainLoopBase &loop) 
+      { loop.RequestTermination(); },
                    std::chrono::seconds(1));
+  ASSERT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(loop.Run().ToError(), llvm::Succeeded());
 
   // Make sure the callback was called only once.
@@ -388,10 +394,11 @@ TEST_F(MainLoopTest, PendingCallbackTrigger) {
   MainLoop loop;
   std::promise<void> add_callback2;
   bool callback1_called = false;
-  loop.AddPendingCallback([&](MainLoopBase &loop) {
+  bool addition_succeeded = loop.AddPendingCallback([&](MainLoopBase &loop) {
     callback1_called = true;
     add_callback2.set_value();
   });
+  EXPECT_TRUE(addition_succeeded);
   Status error;
   ASSERT_THAT_ERROR(error.ToError(), llvm::Succeeded());
   bool callback2_called = false;
@@ -416,9 +423,11 @@ TEST_F(MainLoopTest, ManyPendingCallbacks) {
   // caused a deadlock when the pipe filled up (either because the main loop was
   // not running, because it was slow, or because it was busy/blocked doing
   // something else).
-  for (int i = 0; i < 65536; ++i)
-    loop.AddPendingCallback(
+  for (int i = 0; i < 65536; ++i) {
+    bool addition_succeeded = loop.AddPendingCallback(
         [&](MainLoopBase &loop) { loop.RequestTermination(); });
+    EXPECT_TRUE(addition_succeeded);
+  }
   ASSERT_TRUE(loop.Run().Success());
 }
 
@@ -444,8 +453,10 @@ TEST_F(MainLoopTest, TimedCallbacksRunInOrder) {
   add_cb(2);
   add_cb(4);
   add_cb(1);
-  loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+  bool addition_succeeded = loop.AddCallback([](MainLoopBase &loop) 
+      { loop.RequestTermination(); },
                    start + 5 * epsilon);
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(loop.Run().takeError(), llvm::Succeeded());
   EXPECT_GE(std::chrono::steady_clock::now() - start, 5 * epsilon);
   ASSERT_THAT(order, testing::ElementsAre(1, 2, 3, 4));
@@ -455,22 +466,24 @@ TEST_F(MainLoopTest, TimedCallbackShortensSleep) {
   MainLoop loop;
   auto start = std::chrono::steady_clock::now();
   bool long_callback_called = false;
-  loop.AddCallback(
+  bool addition_succeeded = loop.AddCallback(
       [&](MainLoopBase &loop) {
         long_callback_called = true;
         loop.RequestTermination();
       },
       std::chrono::seconds(30));
+  EXPECT_TRUE(addition_succeeded);
   std::future<Status> async_run =
       std::async(std::launch::async, &MainLoop::Run, std::ref(loop));
   std::this_thread::sleep_for(std::chrono::milliseconds(100));
   bool short_callback_called = false;
-  loop.AddCallback(
+  addition_succeeded = loop.AddCallback(
       [&](MainLoopBase &loop) {
         short_callback_called = true;
         loop.RequestTermination();
       },
       std::chrono::seconds(1));
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(async_run.get().takeError(), llvm::Succeeded());
   EXPECT_LT(std::chrono::steady_clock::now() - start, std::chrono::seconds(10));
   EXPECT_TRUE(short_callback_called);
diff --git a/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp b/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
index 45464db958e04..97f32e2fbb1bf 100644
--- a/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
+++ b/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
@@ -150,8 +150,9 @@ class ProtocolServerMCPTest : public testing::Test {
 
   /// Runs the MainLoop a single time, executing any pending callbacks.
   void Run() {
-    loop.AddPendingCallback(
+    bool addition_succeeded = loop.AddPendingCallback(
         [](MainLoopBase &loop) { loop.RequestTermination(); });
+    EXPECT_TRUE(addition_succeeded);
     EXPECT_THAT_ERROR(loop.Run().takeError(), Succeeded());
   }
 

@jimingham jimingham requested a review from bulbazord October 23, 2025 21:59
@github-actions
Copy link

github-actions bot commented Oct 23, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions h,cpp -- lldb/include/lldb/Host/MainLoopBase.h lldb/include/lldb/Host/posix/MainLoopPosix.h lldb/include/lldb/Host/windows/MainLoopWindows.h lldb/source/Host/common/MainLoopBase.cpp lldb/source/Host/posix/MainLoopPosix.cpp lldb/source/Host/windows/MainLoopWindows.cpp lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp lldb/tools/driver/Driver.cpp lldb/tools/lldb-dap/DAP.cpp lldb/unittests/DAP/TestBase.cpp lldb/unittests/Host/JSONTransportTest.cpp lldb/unittests/Host/MainLoopTest.cpp lldb/unittests/Protocol/ProtocolMCPServerTest.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/lldb/source/Host/windows/MainLoopWindows.cpp b/lldb/source/Host/windows/MainLoopWindows.cpp
index 5e5888aee..96e871c8a 100644
--- a/lldb/source/Host/windows/MainLoopWindows.cpp
+++ b/lldb/source/Host/windows/MainLoopWindows.cpp
@@ -272,6 +272,4 @@ Status MainLoopWindows::Run() {
   return Status();
 }
 
-bool MainLoopWindows::Interrupt() {
-  return WSASetEvent(m_interrupt_event);
-}
+bool MainLoopWindows::Interrupt() { return WSASetEvent(m_interrupt_event); }

@github-actions
Copy link

github-actions bot commented Oct 23, 2025

✅ With the latest revision this PR passed the Python code formatter.

from lldbsuite.test.lldbtest import *


class RenameThisSampleTestTestCase(TestBase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the test name be updated?

lldb,
lldb,
"-x",
"-b",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter that -b and --batch are both present? Should we remove one?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter to lldb, each time we see it we set a boolean to true... But it's more regular to mention it once. I'll fix that.


char c = '.';
cantFail(m_interrupt_pipe.Write(&c, 1));
llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a errno EAGAIN or EINTR during this write? Looking at the ::write

ssize_t result = ::write(fd, buf, size);
we're not checking for those and trying again, should we? I think the m_interrupt_pipe is set to non-blocking.

Maybe we can wrap the write call in

inline decltype(auto) RetryAfterSignal(const FailT &Fail, const Fun &F,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error in the particular case I was addressing was that the interrupt pipe that we were trying to write to got closed out from under us. The error was "bad file handle". Since Write wasn't actually at fault here, I wasn't setting out to make the Write more robust. In this patch, I'm just trying to make sure the error gets reported so we don't stall waiting to join a thread that's never going to exit.

Making Write more robust in against interrupts would be a fine thing to do but is outside the scope of this patch.


void MainLoopWindows::Interrupt() { WSASetEvent(m_interrupt_event); }
bool MainLoopWindows::Interrupt() {
WSASetEvent(m_interrupt_event);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WSASetEvent returns a bool that indicates if it succeeded or not, so we could return that here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made that change.

Copy link
Member

@bulbazord bulbazord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Premise makes sense to me. With the limited scope of this change, I don't have any objections. I left one minor suggestion otherwise.

llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);
if (result && *result != 0)
return true;
return false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor Suggestion: return result && *result != 0

@jimingham jimingham merged commit 88558d5 into llvm:main Oct 27, 2025
9 of 10 checks passed
@jimingham jimingham deleted the avoid-mainloop-stall branch October 27, 2025 18:38
@slydiman
Copy link
Contributor

slydiman commented Oct 27, 2025

Note the buildbot lldb-x86_64-win is broken after this patch.
Please fix ASAP.

  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\packages\Python\lldbsuite\test\dotest.py", line 659, in visit_file
    module = __import__(base)
             ^^^^^^^^^^^^^^^^
  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\test\API\driver\stdio_closed\TestDriverWithClosedSTDIO.py", line 11, in <module>
    import fcntl
ModuleNotFoundError: No module named 'fcntl'

The fcntl module is not available on Windows.
Add @skipIf(oslist=["windows"]).

@jimingham
Copy link
Collaborator Author

Note the buildbot lldb-x86_64-win is broken after this patch. Please fix ASAP.

  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\packages\Python\lldbsuite\test\dotest.py", line 659, in visit_file
    module = __import__(base)
             ^^^^^^^^^^^^^^^^
  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\test\API\driver\stdio_closed\TestDriverWithClosedSTDIO.py", line 11, in <module>
    import fcntl
ModuleNotFoundError: No module named 'fcntl'

The fcntl module is not available on Windows. Add @skipIf(oslist=["windows"]).

https://github.com/jimingham/from-apple-llvm-project/pull/new/no-fcntl-on-windows

It's okay to apply this sort of fix yourself if you know the reason for it...

dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
…lvm#164905)

Turns out there's a bug in the current lldb sources that if you fork,
set the stdio file handles to close on exec and then exec lldb with some
commands and the `--batch` flag, lldb will stall on exit. The first
cause of the bug is that the Python session handler - and probably other
places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles,
and open and close and dup them as needed. NB: I am NOT trying to fix
that bug. I'm not convinced running the lldb driver headless is worth a
lot of effort, it's just as easy to redirect them to /dev/null, which
does work.

But I would like to keep lldb from stalling on the way out when this
happens. The reason we stall is that we have a MainLoop waiting for
signals, and we try to Interrupt it, but because stdio was closed, the
interrupt pipe for the MainLoop gets the file descriptor 0, which gets
closed by the Python session handler if you run some script command. So
the Interrupt fails.

We were running the Write to the interrupt pipe wrapped in
`llvm::cantFail`, but in a no asserts build that just drops the error on
the floor. So then lldb went on to call std::thread::join on the still
active MainLoop, and that stalls

I made Interrupt (and AddCallback & AddPendingCallback) return a bool
for "interrupt success" instead. All the places where code was
requesting termination, I added checks for that failure, and skip the
std::thread::join call on the MainLoop thread, since that is almost
certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when
the WSASetEvent call can fail, so I always return true here. I also
didn't turn the test off for Windows. According to the Python docs all
the API's I used should work on Windows... If that turns out not to be
true I'll make the test Darwin/Unix only.
DavidSpickett added a commit that referenced this pull request Oct 28, 2025
Test added in #164905, skipped by #165318, but that change did not
guard the import. This import is run when we parse the file which
happens before any skips are applied.
@slydiman
Copy link
Contributor

Please approve #165417 to fix the buildbot lldb-remote-linux-win.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants