Avoid stalls when MainLoop::Interrupt fails to wake up the MainLoop #164905

jimingham · 2025-10-23T21:58:13Z

Turns out there's a bug in the current lldb sources that if you fork, set the stdio file handles to close on exec and then exec lldb with some commands and the --batch flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort, it's just as easy to redirect them to /dev/null, which does work.

But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails.

We were running the Write to the interrupt pipe wrapped in llvm::cantFail, but in a no asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls

I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. All the places where code was requesting termination, I added checks for that failure, and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs all the API's I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.

main loop.

llvmbot · 2025-10-23T21:58:59Z

@llvm/pr-subscribers-lldb

Author: None (jimingham)

Changes

Turns out there's a bug in the current lldb sources that if you fork, set the stdio file handles to close on exec and then exec lldb with some commands and the --batch flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort, it's just as easy to redirect them to /dev/null, which does work.

But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails.

We were running the Write to the interrupt pipe wrapped in llvm::cantFail, but in a no asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls

I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. All the places where code was requesting termination, I added checks for that failure, and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point.

I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs all the API's I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.

Full diff: https://github.com/llvm/llvm-project/pull/164905.diff

14 Files Affected:

(modified) lldb/include/lldb/Host/MainLoopBase.h (+14-8)
(modified) lldb/include/lldb/Host/posix/MainLoopPosix.h (+1-1)
(modified) lldb/include/lldb/Host/windows/MainLoopWindows.h (+1-1)
(modified) lldb/source/Host/common/MainLoopBase.cpp (+4-2)
(modified) lldb/source/Host/posix/MainLoopPosix.cpp (+6-3)
(modified) lldb/source/Host/windows/MainLoopWindows.cpp (+4-1)
(modified) lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp (+4-3)
(added) lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py (+46)
(modified) lldb/tools/driver/Driver.cpp (+4-3)
(modified) lldb/tools/lldb-dap/DAP.cpp (+5-3)
(modified) lldb/unittests/DAP/TestBase.cpp (+2-1)
(modified) lldb/unittests/Host/JSONTransportTest.cpp (+13-6)
(modified) lldb/unittests/Host/MainLoopTest.cpp (+22-9)
(modified) lldb/unittests/Protocol/ProtocolMCPServerTest.cpp (+2-1)

diff --git a/lldb/include/lldb/Host/MainLoopBase.h b/lldb/include/lldb/Host/MainLoopBase.h
index be9a2676e7443..9529f2c214784 100644
--- a/lldb/include/lldb/Host/MainLoopBase.h
+++ b/lldb/include/lldb/Host/MainLoopBase.h
@@ -57,18 +57,23 @@ class MainLoopBase {
   // Add a pending callback that will be executed once after all the pending
   // events are processed. The callback will be executed even if termination
   // was requested.
-  void AddPendingCallback(const Callback &callback) {
-    AddCallback(callback, std::chrono::steady_clock::time_point());
+  // Returns false if an interrupt was needed to get the loop to act on the new
+  // callback, but the interrupt failed, true otherwise.  Mostly used when the
+  // pending callback is a RequestTermination, since if the interrupt fails for
+  // that callback, waiting for the MainLoop thread to terminate could stall.
+  bool AddPendingCallback(const Callback &callback) {
+    return AddCallback(callback, std::chrono::steady_clock::time_point());
   }
 
   // Add a callback that will be executed after a certain amount of time has
-  // passed.
-  void AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
-    AddCallback(callback, std::chrono::steady_clock::now() + delay);
+  // passed.  See AddPendingCallback comment for the return value.
+  bool AddCallback(const Callback &callback, std::chrono::nanoseconds delay) {
+    return AddCallback(callback, std::chrono::steady_clock::now() + delay);
   }
 
   // Add a callback that will be executed after a given point in time.
-  void AddCallback(const Callback &callback, TimePoint point);
+  // See AddPendingCallback comment for the return value.
+  bool AddCallback(const Callback &callback, TimePoint point);
 
   // Waits for registered events and invoke the proper callbacks. Returns when
   // all callbacks deregister themselves or when someone requests termination.
@@ -85,8 +90,9 @@ class MainLoopBase {
 
   virtual void UnregisterReadObject(IOObject::WaitableHandle handle) = 0;
 
-  // Interrupt the loop that is currently waiting for events.
-  virtual void Interrupt() = 0;
+  /// Interrupt the loop that is currently waiting for events.  Return true if
+  /// the interrupt succeeded, false if it failed.
+  virtual bool Interrupt() = 0;
 
   void ProcessCallbacks();
 
diff --git a/lldb/include/lldb/Host/posix/MainLoopPosix.h b/lldb/include/lldb/Host/posix/MainLoopPosix.h
index e9ac798b948df..92cdbe9d87ec3 100644
--- a/lldb/include/lldb/Host/posix/MainLoopPosix.h
+++ b/lldb/include/lldb/Host/posix/MainLoopPosix.h
@@ -54,7 +54,7 @@ class MainLoopPosix : public MainLoopBase {
   void UnregisterReadObject(IOObject::WaitableHandle handle) override;
   void UnregisterSignal(int signo, std::list<Callback>::iterator callback_it);
 
-  void Interrupt() override;
+  bool Interrupt() override;
 
 private:
   void ProcessReadObject(IOObject::WaitableHandle handle);
diff --git a/lldb/include/lldb/Host/windows/MainLoopWindows.h b/lldb/include/lldb/Host/windows/MainLoopWindows.h
index 705e7e78ba48a..65b44aa1582c3 100644
--- a/lldb/include/lldb/Host/windows/MainLoopWindows.h
+++ b/lldb/include/lldb/Host/windows/MainLoopWindows.h
@@ -50,7 +50,7 @@ class MainLoopWindows : public MainLoopBase {
 protected:
   void UnregisterReadObject(IOObject::WaitableHandle handle) override;
 
-  void Interrupt() override;
+  bool Interrupt() override;
 
 private:
   llvm::Expected<size_t> Poll();
diff --git a/lldb/source/Host/common/MainLoopBase.cpp b/lldb/source/Host/common/MainLoopBase.cpp
index 64a57e65849e9..232b9bc0aa354 100644
--- a/lldb/source/Host/common/MainLoopBase.cpp
+++ b/lldb/source/Host/common/MainLoopBase.cpp
@@ -12,8 +12,9 @@
 using namespace lldb;
 using namespace lldb_private;
 
-void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
+bool MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
   bool interrupt_needed;
+  bool interrupt_succeeded = true;
   {
     std::lock_guard<std::mutex> lock{m_callback_mutex};
     // We need to interrupt the main thread if this callback is scheduled to
@@ -22,7 +23,8 @@ void MainLoopBase::AddCallback(const Callback &callback, TimePoint point) {
     m_callbacks.emplace(point, callback);
   }
   if (interrupt_needed)
-    Interrupt();
+    interrupt_succeeded = Interrupt();
+  return interrupt_succeeded;
 }
 
 void MainLoopBase::ProcessCallbacks() {
diff --git a/lldb/source/Host/posix/MainLoopPosix.cpp b/lldb/source/Host/posix/MainLoopPosix.cpp
index 19a7128fbe407..e5fa32e4da2b7 100644
--- a/lldb/source/Host/posix/MainLoopPosix.cpp
+++ b/lldb/source/Host/posix/MainLoopPosix.cpp
@@ -387,10 +387,13 @@ void MainLoopPosix::ProcessSignal(int signo) {
   }
 }
 
-void MainLoopPosix::Interrupt() {
+bool MainLoopPosix::Interrupt() {
   if (m_interrupting.exchange(true))
-    return;
+    return true;
 
   char c = '.';
-  cantFail(m_interrupt_pipe.Write(&c, 1));
+  llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);
+  if (result && *result != 0)
+    return true;
+  return false;
 }
diff --git a/lldb/source/Host/windows/MainLoopWindows.cpp b/lldb/source/Host/windows/MainLoopWindows.cpp
index 9b7df10258bcd..c973e1c61162f 100644
--- a/lldb/source/Host/windows/MainLoopWindows.cpp
+++ b/lldb/source/Host/windows/MainLoopWindows.cpp
@@ -272,4 +272,7 @@ Status MainLoopWindows::Run() {
   return Status();
 }
 
-void MainLoopWindows::Interrupt() { WSASetEvent(m_interrupt_event); }
+bool MainLoopWindows::Interrupt() {
+  WSASetEvent(m_interrupt_event);
+  return true;
+}
diff --git a/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp b/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
index 390cf3eeb16a5..77a3ba6574cde 100644
--- a/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
+++ b/lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp
@@ -133,11 +133,12 @@ llvm::Error ProtocolServerMCP::Stop() {
   }
 
   // Stop the main loop.
-  m_loop.AddPendingCallback(
+  bool addition_succeeded = m_loop.AddPendingCallback(
       [](lldb_private::MainLoopBase &loop) { loop.RequestTermination(); });
 
-  // Wait for the main loop to exit.
-  if (m_loop_thread.joinable())
+  // Wait for the main loop to exit, but not if we didn't succeed in inserting
+  // our pending callback or we'll wait forever.
+  if (addition_succeeded && m_loop_thread.joinable())
     m_loop_thread.join();
 
   m_accept_handles.clear();
diff --git a/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py b/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py
new file mode 100644
index 0000000000000..304e548c8c8a1
--- /dev/null
+++ b/lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py
@@ -0,0 +1,46 @@
+"""
+Test that if you exec lldb with the stdio file handles
+closed, it is able to exit without hanging.
+"""
+
+
+import lldb
+import os
+import sys
+import socket
+import fcntl
+
+import lldbsuite.test.lldbutil as lldbutil
+from lldbsuite.test.lldbtest import *
+
+
+class RenameThisSampleTestTestCase(TestBase):
+    # If your test case doesn't stress debug info, then
+    # set this to true.  That way it won't be run once for
+    # each debug info format.
+    NO_DEBUG_INFO_TESTCASE = True
+
+    def test_run_lldb_and_wait(self):
+        """This test forks, closes the stdio channels and exec's lldb.
+           Then it waits for it to exit and asserts it did that successfully"""
+        pid = os.fork()
+        if pid == 0:
+            fcntl.fcntl(sys.stdin, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+            fcntl.fcntl(sys.stdout, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+            fcntl.fcntl(sys.stderr, fcntl.F_SETFD, fcntl.FD_CLOEXEC)
+            lldb = lldbtest_config.lldbExec
+            print(f"About to run: {lldb}")
+            os.execlp(lldb, lldb, "-x", "-b", "-o", "script print(lldb.debugger.GetNumTargets())", "--batch")
+        else:
+            if pid == -1:
+                print("Couldn't fork a process.")
+                return
+            ret_pid, status = os.waitpid(pid, 0)
+            # We're really just checking that lldb doesn't stall.
+            # At the time this test was written, if you close stdin
+            # in an asserts build, lldb aborts.  So handle both
+            # of those cases.  The failure will just be that the
+            # waitpid doesn't return, and the test times out.
+            self.assertFalse(os.WIFSTOPPED(status), "We either exited or crashed.")
+
+
diff --git a/lldb/tools/driver/Driver.cpp b/lldb/tools/driver/Driver.cpp
index ba0041111045b..f9cf4c41bf682 100644
--- a/lldb/tools/driver/Driver.cpp
+++ b/lldb/tools/driver/Driver.cpp
@@ -902,9 +902,10 @@ int main(int argc, char const *argv[]) {
   }
 
 #if !defined(_WIN32)
-  signal_loop.AddPendingCallback(
-      [](MainLoopBase &loop) { loop.RequestTermination(); });
-  signal_thread.join();
+  // Try to interrupt the signal thread.  If that succeeds, wait for it to exit.
+  if (signal_loop.AddPendingCallback(
+      [](MainLoopBase &loop) { loop.RequestTermination(); }))
+    signal_thread.join();
 #endif
 
   return exit_code;
diff --git a/lldb/tools/lldb-dap/DAP.cpp b/lldb/tools/lldb-dap/DAP.cpp
index 3c4f2253d1ad5..c54d1a2b43092 100644
--- a/lldb/tools/lldb-dap/DAP.cpp
+++ b/lldb/tools/lldb-dap/DAP.cpp
@@ -1104,9 +1104,11 @@ llvm::Error DAP::Loop() {
                                      "unhandled packet");
   }
 
-  m_loop.AddPendingCallback(
-      [](MainLoopBase &loop) { loop.RequestTermination(); });
-  thread.join();
+  // Don't wait to join the mainloop thread if our callback wasn't added
+  // successfully, or we'll wait forever.
+  if (m_loop.AddPendingCallback(
+      [](MainLoopBase &loop) { loop.RequestTermination(); }))
+    thread.join();
 
   if (m_error_occurred)
     return llvm::createStringError(llvm::inconvertibleErrorCode(),
diff --git a/lldb/unittests/DAP/TestBase.cpp b/lldb/unittests/DAP/TestBase.cpp
index 83a303554ad6b..8cb459964f7d8 100644
--- a/lldb/unittests/DAP/TestBase.cpp
+++ b/lldb/unittests/DAP/TestBase.cpp
@@ -55,8 +55,9 @@ void TransportBase::SetUp() {
 }
 
 void TransportBase::Run() {
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](lldb_private::MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   EXPECT_THAT_ERROR(loop.Run().takeError(), llvm::Succeeded());
 }
 
diff --git a/lldb/unittests/Host/JSONTransportTest.cpp b/lldb/unittests/Host/JSONTransportTest.cpp
index 54f1372ca0fff..26d8913b20898 100644
--- a/lldb/unittests/Host/JSONTransportTest.cpp
+++ b/lldb/unittests/Host/JSONTransportTest.cpp
@@ -269,12 +269,13 @@ template <typename T> class JSONTransportTest : public PipePairTest {
         loop.RequestTermination();
       });
     }
-    loop.AddCallback(
+    bool addition_succeeded = loop.AddCallback(
         [](MainLoopBase &loop) {
           loop.RequestTermination();
           FAIL() << "timeout";
         },
         timeout);
+    EXPECT_TRUE(addition_succeeded);
     auto handle = transport->RegisterMessageHandler(loop, message_handler);
     if (!handle)
       return handle.takeError();
@@ -367,7 +368,9 @@ class TransportBinderTest : public testing::Test {
   }
 
   void Run() {
-    loop.AddPendingCallback([](auto &loop) { loop.RequestTermination(); });
+    bool addition_succeeded = loop.AddPendingCallback([](auto &loop) 
+        { loop.RequestTermination(); });
+    EXPECT_TRUE(addition_succeeded);
     EXPECT_THAT_ERROR(loop.Run().takeError(), Succeeded());
   }
 };
@@ -435,8 +438,9 @@ TEST_F(HTTPDelimitedJSONTransportTest, ReadPartialMessage) {
   EXPECT_CALL(message_handler, Received(Request{5, "foo", std::nullopt}));
 
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
   ASSERT_THAT_EXPECTED(input.Write(part2.data(), part2.size()), Succeeded());
   input.CloseWriteFileDescriptor();
@@ -454,15 +458,17 @@ TEST_F(HTTPDelimitedJSONTransportTest, ReadWithZeroByteWrites) {
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
 
   // Run the main loop once for the initial read.
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
 
   // zero-byte write.
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), 0),
                        Succeeded()); // zero-byte write.
-  loop.AddPendingCallback(
+  addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_stdin=*/false), Succeeded());
 
   // Write the remaining part of the message.
@@ -569,8 +575,9 @@ TEST_F(JSONRPCTransportTest, ReadPartialMessage) {
   EXPECT_CALL(message_handler, Received(Request{42, "foo", std::nullopt}));
 
   ASSERT_THAT_EXPECTED(input.Write(part1.data(), part1.size()), Succeeded());
-  loop.AddPendingCallback(
+  bool addition_succeeded = loop.AddPendingCallback(
       [](MainLoopBase &loop) { loop.RequestTermination(); });
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(Run(/*close_input=*/false), Succeeded());
 
   ASSERT_THAT_EXPECTED(input.Write(part2.data(), part2.size()), Succeeded());
diff --git a/lldb/unittests/Host/MainLoopTest.cpp b/lldb/unittests/Host/MainLoopTest.cpp
index ae16d02101819..b546b0f71286a 100644
--- a/lldb/unittests/Host/MainLoopTest.cpp
+++ b/lldb/unittests/Host/MainLoopTest.cpp
@@ -179,9 +179,13 @@ TEST_F(MainLoopTest, PipeDelayBetweenRegisterAndRun) {
     ASSERT_THAT_EXPECTED(pipe.Write(&X, len), llvm::HasValue(1));
   };
   // Add a write that triggers a read events.
-  loop.AddCallback(cb, std::chrono::milliseconds(500));
-  loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+  bool addition_succeeded = loop.AddCallback(cb, 
+      std::chrono::milliseconds(500));
+  ASSERT_TRUE(addition_succeeded);
+  addition_succeeded = loop.AddCallback([](MainLoopBase &loop) 
+      { loop.RequestTermination(); },
                    std::chrono::milliseconds(1000));
+  ASSERT_TRUE(addition_succeeded);
   ASSERT_TRUE(error.Success());
   ASSERT_TRUE(handle);
 
@@ -310,8 +314,10 @@ TEST_F(MainLoopTest, NoSpuriousSocketReads) {
       error);
   ASSERT_THAT_ERROR(error.ToError(), llvm::Succeeded());
   // Terminate the loop after one second.
-  loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+  bool addition_succeeded = loop.AddCallback([](MainLoopBase &loop) 
+      { loop.RequestTermination(); },
                    std::chrono::seconds(1));
+  ASSERT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(loop.Run().ToError(), llvm::Succeeded());
 
   // Make sure the callback was called only once.
@@ -388,10 +394,11 @@ TEST_F(MainLoopTest, PendingCallbackTrigger) {
   MainLoop loop;
   std::promise<void> add_callback2;
   bool callback1_called = false;
-  loop.AddPendingCallback([&](MainLoopBase &loop) {
+  bool addition_succeeded = loop.AddPendingCallback([&](MainLoopBase &loop) {
     callback1_called = true;
     add_callback2.set_value();
   });
+  EXPECT_TRUE(addition_succeeded);
   Status error;
   ASSERT_THAT_ERROR(error.ToError(), llvm::Succeeded());
   bool callback2_called = false;
@@ -416,9 +423,11 @@ TEST_F(MainLoopTest, ManyPendingCallbacks) {
   // caused a deadlock when the pipe filled up (either because the main loop was
   // not running, because it was slow, or because it was busy/blocked doing
   // something else).
-  for (int i = 0; i < 65536; ++i)
-    loop.AddPendingCallback(
+  for (int i = 0; i < 65536; ++i) {
+    bool addition_succeeded = loop.AddPendingCallback(
         [&](MainLoopBase &loop) { loop.RequestTermination(); });
+    EXPECT_TRUE(addition_succeeded);
+  }
   ASSERT_TRUE(loop.Run().Success());
 }
 
@@ -444,8 +453,10 @@ TEST_F(MainLoopTest, TimedCallbacksRunInOrder) {
   add_cb(2);
   add_cb(4);
   add_cb(1);
-  loop.AddCallback([](MainLoopBase &loop) { loop.RequestTermination(); },
+  bool addition_succeeded = loop.AddCallback([](MainLoopBase &loop) 
+      { loop.RequestTermination(); },
                    start + 5 * epsilon);
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(loop.Run().takeError(), llvm::Succeeded());
   EXPECT_GE(std::chrono::steady_clock::now() - start, 5 * epsilon);
   ASSERT_THAT(order, testing::ElementsAre(1, 2, 3, 4));
@@ -455,22 +466,24 @@ TEST_F(MainLoopTest, TimedCallbackShortensSleep) {
   MainLoop loop;
   auto start = std::chrono::steady_clock::now();
   bool long_callback_called = false;
-  loop.AddCallback(
+  bool addition_succeeded = loop.AddCallback(
       [&](MainLoopBase &loop) {
         long_callback_called = true;
         loop.RequestTermination();
       },
       std::chrono::seconds(30));
+  EXPECT_TRUE(addition_succeeded);
   std::future<Status> async_run =
       std::async(std::launch::async, &MainLoop::Run, std::ref(loop));
   std::this_thread::sleep_for(std::chrono::milliseconds(100));
   bool short_callback_called = false;
-  loop.AddCallback(
+  addition_succeeded = loop.AddCallback(
       [&](MainLoopBase &loop) {
         short_callback_called = true;
         loop.RequestTermination();
       },
       std::chrono::seconds(1));
+  EXPECT_TRUE(addition_succeeded);
   ASSERT_THAT_ERROR(async_run.get().takeError(), llvm::Succeeded());
   EXPECT_LT(std::chrono::steady_clock::now() - start, std::chrono::seconds(10));
   EXPECT_TRUE(short_callback_called);
diff --git a/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp b/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
index 45464db958e04..97f32e2fbb1bf 100644
--- a/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
+++ b/lldb/unittests/Protocol/ProtocolMCPServerTest.cpp
@@ -150,8 +150,9 @@ class ProtocolServerMCPTest : public testing::Test {
 
   /// Runs the MainLoop a single time, executing any pending callbacks.
   void Run() {
-    loop.AddPendingCallback(
+    bool addition_succeeded = loop.AddPendingCallback(
         [](MainLoopBase &loop) { loop.RequestTermination(); });
+    EXPECT_TRUE(addition_succeeded);
     EXPECT_THAT_ERROR(loop.Run().takeError(), Succeeded());
   }

github-actions · 2025-10-23T22:00:11Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions h,cpp -- lldb/include/lldb/Host/MainLoopBase.h lldb/include/lldb/Host/posix/MainLoopPosix.h lldb/include/lldb/Host/windows/MainLoopWindows.h lldb/source/Host/common/MainLoopBase.cpp lldb/source/Host/posix/MainLoopPosix.cpp lldb/source/Host/windows/MainLoopWindows.cpp lldb/source/Plugins/Protocol/MCP/ProtocolServerMCP.cpp lldb/tools/driver/Driver.cpp lldb/tools/lldb-dap/DAP.cpp lldb/unittests/DAP/TestBase.cpp lldb/unittests/Host/JSONTransportTest.cpp lldb/unittests/Host/MainLoopTest.cpp lldb/unittests/Protocol/ProtocolMCPServerTest.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.

diff --git a/lldb/source/Host/windows/MainLoopWindows.cpp b/lldb/source/Host/windows/MainLoopWindows.cpp
index 5e5888aee..96e871c8a 100644
--- a/lldb/source/Host/windows/MainLoopWindows.cpp
+++ b/lldb/source/Host/windows/MainLoopWindows.cpp
@@ -272,6 +272,4 @@ Status MainLoopWindows::Run() {
   return Status();
 }
 
-bool MainLoopWindows::Interrupt() {
-  return WSASetEvent(m_interrupt_event);
-}
+bool MainLoopWindows::Interrupt() { return WSASetEvent(m_interrupt_event); }

github-actions · 2025-10-23T22:00:11Z

✅ With the latest revision this PR passed the Python code formatter.

ashgti · 2025-10-23T22:55:21Z

lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py

+from lldbsuite.test.lldbtest import *
+
+
+class RenameThisSampleTestTestCase(TestBase):


Should the test name be updated?

ashgti · 2025-10-23T22:58:29Z

lldb/test/API/driver/stdio_closed/TestDriverWithClosedSTDIO.py

+                lldb,
+                lldb,
+                "-x",
+                "-b",


Does it matter that -b and --batch are both present? Should we remove one?

It doesn't matter to lldb, each time we see it we set a boolean to true... But it's more regular to mention it once. I'll fix that.

ashgti · 2025-10-23T23:06:53Z

lldb/source/Host/posix/MainLoopPosix.cpp


  char c = '.';
-  cantFail(m_interrupt_pipe.Write(&c, 1));
+  llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);


Can we get a errno EAGAIN or EINTR during this write? Looking at the ::write

llvm-project/lldb/source/Host/posix/PipePosix.cpp

Line 339 in 09cf301

ssize_t result = ::write(fd, buf, size);

we're not checking for those and trying again, should we? I think the m_interrupt_pipe is set to non-blocking.

Maybe we can wrap the write call in

llvm-project/llvm/include/llvm/Support/Errno.h

Line 33 in 6dd78f6

inline decltype(auto) RetryAfterSignal(const FailT &Fail, const Fun &F,

The error in the particular case I was addressing was that the interrupt pipe that we were trying to write to got closed out from under us. The error was "bad file handle". Since Write wasn't actually at fault here, I wasn't setting out to make the Write more robust. In this patch, I'm just trying to make sure the error gets reported so we don't stall waiting to join a thread that's never going to exit.

Making Write more robust in against interrupts would be a fine thing to do but is outside the scope of this patch.

ashgti · 2025-10-23T23:09:56Z

lldb/source/Host/windows/MainLoopWindows.cpp


-void MainLoopWindows::Interrupt() { WSASetEvent(m_interrupt_event); }
+bool MainLoopWindows::Interrupt() {
+  WSASetEvent(m_interrupt_event);


WSASetEvent returns a bool that indicates if it succeeded or not, so we could return that here.

I made that change.

bulbazord

Premise makes sense to me. With the limited scope of this change, I don't have any objections. I left one minor suggestion otherwise.

bulbazord · 2025-10-24T18:14:08Z

lldb/source/Host/posix/MainLoopPosix.cpp

+  llvm::Expected<size_t> result = m_interrupt_pipe.Write(&c, 1);
+  if (result && *result != 0)
+    return true;
+  return false;


Minor Suggestion: return result && *result != 0

slydiman · 2025-10-27T19:17:23Z

Note the buildbot lldb-x86_64-win is broken after this patch.
Please fix ASAP.

  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\packages\Python\lldbsuite\test\dotest.py", line 659, in visit_file
    module = __import__(base)
             ^^^^^^^^^^^^^^^^
  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\test\API\driver\stdio_closed\TestDriverWithClosedSTDIO.py", line 11, in <module>
    import fcntl
ModuleNotFoundError: No module named 'fcntl'

The fcntl module is not available on Windows.
Add @skipIf(oslist=["windows"]).

jimingham · 2025-10-27T21:00:34Z

Note the buildbot lldb-x86_64-win is broken after this patch. Please fix ASAP.

  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\packages\Python\lldbsuite\test\dotest.py", line 659, in visit_file
    module = __import__(base)
             ^^^^^^^^^^^^^^^^
  File "C:\buildbot\as-builder-10\lldb-x86-64\llvm-project\lldb\test\API\driver\stdio_closed\TestDriverWithClosedSTDIO.py", line 11, in <module>
    import fcntl
ModuleNotFoundError: No module named 'fcntl'

The fcntl module is not available on Windows. Add @skipIf(oslist=["windows"]).

https://github.com/jimingham/from-apple-llvm-project/pull/new/no-fcntl-on-windows

It's okay to apply this sort of fix yourself if you know the reason for it...

…lvm#164905) Turns out there's a bug in the current lldb sources that if you fork, set the stdio file handles to close on exec and then exec lldb with some commands and the `--batch` flag, lldb will stall on exit. The first cause of the bug is that the Python session handler - and probably other places in lldb - think 0, 1, and 2 HAVE TO BE the stdio file handles, and open and close and dup them as needed. NB: I am NOT trying to fix that bug. I'm not convinced running the lldb driver headless is worth a lot of effort, it's just as easy to redirect them to /dev/null, which does work. But I would like to keep lldb from stalling on the way out when this happens. The reason we stall is that we have a MainLoop waiting for signals, and we try to Interrupt it, but because stdio was closed, the interrupt pipe for the MainLoop gets the file descriptor 0, which gets closed by the Python session handler if you run some script command. So the Interrupt fails. We were running the Write to the interrupt pipe wrapped in `llvm::cantFail`, but in a no asserts build that just drops the error on the floor. So then lldb went on to call std::thread::join on the still active MainLoop, and that stalls I made Interrupt (and AddCallback & AddPendingCallback) return a bool for "interrupt success" instead. All the places where code was requesting termination, I added checks for that failure, and skip the std::thread::join call on the MainLoop thread, since that is almost certainly going to stall at this point. I didn't do the same for the Windows MainLoop, as I don't know if/when the WSASetEvent call can fail, so I always return true here. I also didn't turn the test off for Windows. According to the Python docs all the API's I used should work on Windows... If that turns out not to be true I'll make the test Darwin/Unix only.

Test added in #164905, skipped by #165318, but that change did not guard the import. This import is run when we parse the file which happens before any skips are applied.

slydiman · 2025-10-28T15:21:39Z

Please approve #165417 to fix the buildbot lldb-remote-linux-win.

Avoid stalls when MainLoop::Interrupt fails to wake up the

64947d4

main loop.

jimingham requested a review from JDevlieghere as a code owner October 23, 2025 21:58

jimingham removed the request for review from JDevlieghere October 23, 2025 21:58

llvmbot added lldb lldb-dap labels Oct 23, 2025

jimingham requested a review from ashgti October 23, 2025 21:58

jimingham requested a review from bulbazord October 23, 2025 21:59

Formatting

c5990ba

ashgti reviewed Oct 23, 2025

View reviewed changes

Address review comments.

37b37c2

bulbazord approved these changes Oct 24, 2025

View reviewed changes

jimingham added 2 commits October 27, 2025 10:06

Review suggestion.

cc83cd7

Merge branch 'main' into avoid-mainloop-stall

f600f1d

jimingham merged commit 88558d5 into llvm:main Oct 27, 2025
9 of 10 checks passed

jimingham deleted the avoid-mainloop-stall branch October 27, 2025 18:38

		from lldbsuite.test.lldbtest import *


		class RenameThisSampleTestTestCase(TestBase):

Uh oh!

Avoid stalls when MainLoop::Interrupt fails to wake up the MainLoop #164905

Avoid stalls when MainLoop::Interrupt fails to wake up the MainLoop #164905

Conversation

jimingham commented Oct 23, 2025

Uh oh!

llvmbot commented Oct 23, 2025

Uh oh!

github-actions bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Oct 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bulbazord left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

slydiman commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jimingham commented Oct 27, 2025

Uh oh!

slydiman commented Oct 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

github-actions bot commented Oct 23, 2025 •

edited

Loading

github-actions bot commented Oct 23, 2025 •

edited

Loading

slydiman commented Oct 27, 2025 •

edited

Loading