New ThreadPlanSingleThreadTimeout to resolve potential deadlock in single thread stepping #90930

jeffreytan81 · 2024-05-03T02:06:08Z

This PR introduces a new ThreadPlanSingleThreadTimeout that will be used to address potential deadlock during single-thread stepping.

While debugging a target with a non-trivial number of threads (around 5000 threads in one example target), we noticed that a simple step over can take as long as 10 seconds. Enabling single-thread stepping mode significantly reduces the stepping time to around 3 seconds. However, this can introduce deadlock if we try to step over a method that depends on other threads to release a lock.

To address this issue, we introduce a new ThreadPlanSingleThreadTimeout that can be controlled by the target.process.thread.single-thread-plan-timeout setting during single-thread stepping mode. The concept involves counting the elapsed time since the last internal stop to detect overall stepping progress. Once a timeout occurs, we assume the target is not making progress due to a potential deadlock, as mentioned above. We then send a new async interrupt, resume all threads, and ThreadPlanSingleThreadTimeout completes its task.

To support this design, the major changes made in this PR are:

ThreadPlanSingleThreadTimeout is popped during every internal stop and reset (re-pushed) to the top of the stack (as a leaf node) during resume. This is achieved by always returning true from ThreadPlanSingleThreadTimeout::DoPlanExplainsStop() and ThreadPlanSingleThreadTimeout::MischiefManaged().
A new thread-specific async interrupt stop is introduced, which can be detected/consumed by ThreadPlanSingleThreadTimeout.
The clearing of branch breakpoints in the range thread plan has been moved from DoPlanExplainsStop() to ShouldStop(), as it is not guaranteed that it will be called.

The detailed design is discussed in the RFC below:
https://discourse.llvm.org/t/improve-single-thread-stepping/74599

github-actions · 2024-05-06T18:17:26Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2024-05-06T21:57:16Z

@llvm/pr-subscribers-lldb

Author: None (jeffreytan81)

Changes

This PR introduces a new ThreadPlanSingleThreadTimeout that will be used to address potential deadlock during single-thread stepping.

While debugging a target with a non-trivial number of threads (around 5000 threads in one example target), we noticed that a simple step over can take as long as 10 seconds. Enabling single-thread stepping mode significantly reduces the stepping time to around 3 seconds. However, this can introduce deadlock if we try to step over a method that depends on other threads to release a lock.

To address this issue, we introduce a new ThreadPlanSingleThreadTimeout that can be controlled by the target.process.thread.single-thread-plan-timeout setting during single-thread stepping mode. The concept involves counting the elapsed time since the last internal stop to detect overall stepping progress. Once a timeout occurs, we assume the target is not making progress due to a potential deadlock, as mentioned above. We then send a new async interrupt, resume all threads, and ThreadPlanSingleThreadTimeout completes its task.

To support this design, the major changes made in this PR are:

ThreadPlanSingleThreadTimeout is popped during every internal stop and reset (re-pushed) to the top of the stack (as a leaf node) during resume. This is achieved by always returning true from ThreadPlanSingleThreadTimeout::DoPlanExplainsStop() and ThreadPlanSingleThreadTimeout::MischiefManaged().
A new thread-specific async interrupt stop is introduced, which can be detected/consumed by ThreadPlanSingleThreadTimeout.
The clearing of branch breakpoints in the range thread plan has been moved from DoPlanExplainsStop() to ShouldStop(), as it is not guaranteed that it will be called.

The detailed design is discussed in the RFC below:
https://discourse.llvm.org/t/improve-single-thread-stepping/74599

Patch is 49.90 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90930.diff

29 Files Affected:

(modified) lldb/include/lldb/Target/Process.h (+9-2)
(modified) lldb/include/lldb/Target/StopInfo.h (+4)
(modified) lldb/include/lldb/Target/Thread.h (+2)
(modified) lldb/include/lldb/Target/ThreadPlan.h (+5-3)
(added) lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h (+93)
(modified) lldb/include/lldb/Target/ThreadPlanStepOut.h (+1)
(modified) lldb/include/lldb/Target/ThreadPlanStepOverRange.h (+6)
(modified) lldb/include/lldb/Target/ThreadPlanStepRange.h (+5)
(modified) lldb/include/lldb/lldb-enumerations.h (+1)
(modified) lldb/source/API/SBThread.cpp (+6)
(modified) lldb/source/Interpreter/CommandInterpreter.cpp (+1-1)
(modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp (+2)
(modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp (+50-11)
(modified) lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h (+6)
(modified) lldb/source/Target/CMakeLists.txt (+1)
(modified) lldb/source/Target/Process.cpp (+18-5)
(modified) lldb/source/Target/StopInfo.cpp (+37)
(modified) lldb/source/Target/TargetProperties.td (+4)
(modified) lldb/source/Target/Thread.cpp (+8-1)
(modified) lldb/source/Target/ThreadPlan.cpp (+1)
(added) lldb/source/Target/ThreadPlanSingleThreadTimeout.cpp (+234)
(modified) lldb/source/Target/ThreadPlanStepInRange.cpp (+1)
(modified) lldb/source/Target/ThreadPlanStepOverRange.cpp (+18-2)
(modified) lldb/source/Target/ThreadPlanStepRange.cpp (+45-26)
(added) lldb/test/API/functionalities/single-thread-step/Makefile (+4)
(added) lldb/test/API/functionalities/single-thread-step/TestSingleThreadStepTimeout.py (+123)
(added) lldb/test/API/functionalities/single-thread-step/main.cpp (+62)
(modified) lldb/tools/lldb-dap/JSONUtils.cpp (+3)
(modified) lldb/tools/lldb-dap/LLDBUtils.cpp (+1)

diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index aac0cf51680a9e..7e758dbb9f6457 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -1312,11 +1312,13 @@ class Process : public std::enable_shared_from_this<Process>,
 
   size_t GetThreadStatus(Stream &ostrm, bool only_threads_with_stop_reason,
                          uint32_t start_frame, uint32_t num_frames,
-                         uint32_t num_frames_with_source,
-                         bool stop_format);
+                         uint32_t num_frames_with_source, bool stop_format);
 
   void SendAsyncInterrupt();
 
+  // Send an async interrupt and receive stop from a specific /p thread.
+  void SendAsyncInterrupt(Thread *thread);
+
   // Notify this process class that modules got loaded.
   //
   // If subclasses override this method, they must call this version before
@@ -3102,6 +3104,11 @@ void PruneThreadPlans();
                            // Resume will only request a resume, using this
                            // flag to check.
 
+  lldb::tid_t m_interrupt_tid; /// The tid of the thread that issued the async
+                               /// interrupt, used by thread plan timeout. It
+                               /// can be LLDB_INVALID_THREAD_ID to indicate
+                               /// user level async interrupt.
+
   /// This is set at the beginning of Process::Finalize() to stop functions
   /// from looking up or creating things during or after a finalize call.
   std::atomic<bool> m_finalizing;
diff --git a/lldb/include/lldb/Target/StopInfo.h b/lldb/include/lldb/Target/StopInfo.h
index d1848fcbbbdb19..fae90364deaf0a 100644
--- a/lldb/include/lldb/Target/StopInfo.h
+++ b/lldb/include/lldb/Target/StopInfo.h
@@ -123,6 +123,10 @@ class StopInfo : public std::enable_shared_from_this<StopInfo> {
                              const char *description = nullptr,
                              std::optional<int> code = std::nullopt);
 
+  static lldb::StopInfoSP
+  CreateStopReasonWithInterrupt(Thread &thread, int signo,
+                                const char *description);
+
   static lldb::StopInfoSP CreateStopReasonToTrace(Thread &thread);
 
   static lldb::StopInfoSP
diff --git a/lldb/include/lldb/Target/Thread.h b/lldb/include/lldb/Target/Thread.h
index c17bddf4d98b85..584093348b4c6a 100644
--- a/lldb/include/lldb/Target/Thread.h
+++ b/lldb/include/lldb/Target/Thread.h
@@ -57,6 +57,8 @@ class ThreadProperties : public Properties {
   bool GetStepOutAvoidsNoDebug() const;
 
   uint64_t GetMaxBacktraceDepth() const;
+
+  uint64_t GetSingleThreadPlanTimeout() const;
 };
 
 class Thread : public std::enable_shared_from_this<Thread>,
diff --git a/lldb/include/lldb/Target/ThreadPlan.h b/lldb/include/lldb/Target/ThreadPlan.h
index bf68a42e54d18f..640e997caf7b3d 100644
--- a/lldb/include/lldb/Target/ThreadPlan.h
+++ b/lldb/include/lldb/Target/ThreadPlan.h
@@ -302,7 +302,8 @@ class ThreadPlan : public std::enable_shared_from_this<ThreadPlan>,
     eKindStepInRange,
     eKindRunToAddress,
     eKindStepThrough,
-    eKindStepUntil
+    eKindStepUntil,
+    eKindSingleThreadTimeout,
   };
 
   virtual ~ThreadPlan();
@@ -483,6 +484,8 @@ class ThreadPlan : public std::enable_shared_from_this<ThreadPlan>,
     return m_takes_iteration_count;
   }
 
+  virtual lldb::StateType GetPlanRunState() = 0;
+
 protected:
   // Constructors and Destructors
   ThreadPlan(ThreadPlanKind kind, const char *name, Thread &thread,
@@ -522,8 +525,6 @@ class ThreadPlan : public std::enable_shared_from_this<ThreadPlan>,
     GetThread().SetStopInfo(stop_reason_sp);
   }
 
-  virtual lldb::StateType GetPlanRunState() = 0;
-
   bool IsUsuallyUnexplainedStopReason(lldb::StopReason);
 
   Status m_status;
@@ -590,6 +591,7 @@ class ThreadPlanNull : public ThreadPlan {
   const Status &GetStatus() { return m_status; }
 
 protected:
+  friend class ThreadPlanSingleThreadTimeout;
   bool DoPlanExplainsStop(Event *event_ptr) override;
 
   lldb::StateType GetPlanRunState() override;
diff --git a/lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h b/lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h
new file mode 100644
index 00000000000000..89db966b29bff3
--- /dev/null
+++ b/lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h
@@ -0,0 +1,93 @@
+//===-- ThreadPlanSingleThreadTimeout.h -------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_TARGET_THREADPLANSINGLETHREADTIMEOUT_H
+#define LLDB_TARGET_THREADPLANSINGLETHREADTIMEOUT_H
+
+#include "lldb/Target/Thread.h"
+#include "lldb/Target/ThreadPlan.h"
+#include "lldb/Utility/Event.h"
+#include "lldb/Utility/LLDBLog.h"
+#include "lldb/Utility/State.h"
+
+#include <thread>
+
+namespace lldb_private {
+
+//
+// Thread plan used by single thread execution to issue timeout. This is useful
+// to detect potential deadlock in single thread execution. The timeout measures
+// the elapsed time from the last internal stop and got reset by each internal
+// stops to ensure we are accurately detecting execution not moving forward.
+// This means this thread plan  can be created/destroyed multiple times by the
+// parent execution plan.
+//
+// When timeout happens, the thread plan resolves the potential deadlock by
+// issuing a thread specific async interrupt to enter stop state, then all
+// threads execution are resumed to resolve the potential deadlock.
+//
+class ThreadPlanSingleThreadTimeout : public ThreadPlan {
+public:
+  ~ThreadPlanSingleThreadTimeout() override;
+
+  // Create a new instance from fresh new state.
+  static void CreateNew(Thread &thread);
+  // Reset and create a new instance from the previous state.
+  static void ResetFromPrevState(Thread &thread);
+
+  void GetDescription(Stream *s, lldb::DescriptionLevel level) override;
+  bool ValidatePlan(Stream *error) override { return true; }
+  bool WillStop() override;
+  void DidPop() override;
+
+  bool DoPlanExplainsStop(Event *event_ptr) override;
+
+  lldb::StateType GetPlanRunState() override;
+  static void TimeoutThreadFunc(ThreadPlanSingleThreadTimeout *self);
+
+  bool MischiefManaged() override;
+
+  bool ShouldStop(Event *event_ptr) override;
+  void SetStopOthers(bool new_value) override;
+  bool StopOthers() override;
+
+private:
+  ThreadPlanSingleThreadTimeout(Thread &thread);
+
+  static bool IsAlive();
+
+  enum class State {
+    WaitTimeout,    // Waiting for timeout.
+    AsyncInterrupt, // Async interrupt has been issued.
+    Done,           // Finished resume all threads.
+  };
+
+  static std::mutex s_mutex;
+  static ThreadPlanSingleThreadTimeout *s_instance;
+  static State s_prev_state;
+
+  bool HandleEvent(Event *event_ptr);
+  void HandleTimeout();
+
+  static std::string StateToString(State state);
+
+  ThreadPlanSingleThreadTimeout(const ThreadPlanSingleThreadTimeout &) = delete;
+  const ThreadPlanSingleThreadTimeout &
+  operator=(const ThreadPlanSingleThreadTimeout &) = delete;
+
+  State m_state;
+  std::mutex m_mutex;
+  std::condition_variable m_wakeup_cv;
+  // Whether the timer thread should exit or not.
+  bool m_exit_flag;
+  std::thread m_timer_thread;
+};
+
+} // namespace lldb_private
+
+#endif // LLDB_TARGET_THREADPLANSINGLETHREADTIMEOUT_H
diff --git a/lldb/include/lldb/Target/ThreadPlanStepOut.h b/lldb/include/lldb/Target/ThreadPlanStepOut.h
index b1d8769f7c546e..013c675afc33d0 100644
--- a/lldb/include/lldb/Target/ThreadPlanStepOut.h
+++ b/lldb/include/lldb/Target/ThreadPlanStepOut.h
@@ -30,6 +30,7 @@ class ThreadPlanStepOut : public ThreadPlan, public ThreadPlanShouldStopHere {
   bool ValidatePlan(Stream *error) override;
   bool ShouldStop(Event *event_ptr) override;
   bool StopOthers() override;
+  void SetStopOthers(bool new_value) override { m_stop_others = new_value; }
   lldb::StateType GetPlanRunState() override;
   bool WillStop() override;
   bool MischiefManaged() override;
diff --git a/lldb/include/lldb/Target/ThreadPlanStepOverRange.h b/lldb/include/lldb/Target/ThreadPlanStepOverRange.h
index 8585ac62f09b35..faa0ab596a2836 100644
--- a/lldb/include/lldb/Target/ThreadPlanStepOverRange.h
+++ b/lldb/include/lldb/Target/ThreadPlanStepOverRange.h
@@ -27,7 +27,9 @@ class ThreadPlanStepOverRange : public ThreadPlanStepRange,
   ~ThreadPlanStepOverRange() override;
 
   void GetDescription(Stream *s, lldb::DescriptionLevel level) override;
+  void SetStopOthers(bool new_value) override;
   bool ShouldStop(Event *event_ptr) override;
+  void DidPush() override;
 
 protected:
   bool DoPlanExplainsStop(Event *event_ptr) override;
@@ -42,8 +44,12 @@ class ThreadPlanStepOverRange : public ThreadPlanStepRange,
 
   void SetupAvoidNoDebug(LazyBool step_out_avoids_code_without_debug_info);
   bool IsEquivalentContext(const SymbolContext &context);
+  // Clear and create a new ThreadPlanSingleThreadTimeout to detect if program
+  // is moving forward.
+  void ResetSingleThreadTimeout();
 
   bool m_first_resume;
+  lldb::RunMode m_run_mode;
 
   ThreadPlanStepOverRange(const ThreadPlanStepOverRange &) = delete;
   const ThreadPlanStepOverRange &
diff --git a/lldb/include/lldb/Target/ThreadPlanStepRange.h b/lldb/include/lldb/Target/ThreadPlanStepRange.h
index 2fe88527710006..ced84c03e83cf5 100644
--- a/lldb/include/lldb/Target/ThreadPlanStepRange.h
+++ b/lldb/include/lldb/Target/ThreadPlanStepRange.h
@@ -58,8 +58,13 @@ class ThreadPlanStepRange : public ThreadPlan {
   // run' plan, then just single step.
   bool SetNextBranchBreakpoint();
 
+  // Whether the input stop info is caused by the next branch breakpoint.
+  bool IsNextBranchBreakpointStop(lldb::StopInfoSP stop_info_sp);
+
   void ClearNextBranchBreakpoint();
 
+  void ClearNextBranchBreakpointExplainedStop();
+
   bool NextRangeBreakpointExplainsStop(lldb::StopInfoSP stop_info_sp);
 
   SymbolContext m_addr_context;
diff --git a/lldb/include/lldb/lldb-enumerations.h b/lldb/include/lldb/lldb-enumerations.h
index 15e45857186091..2000931913c81b 100644
--- a/lldb/include/lldb/lldb-enumerations.h
+++ b/lldb/include/lldb/lldb-enumerations.h
@@ -253,6 +253,7 @@ enum StopReason {
   eStopReasonFork,
   eStopReasonVFork,
   eStopReasonVForkDone,
+  eStopReasonInterrupt, ///< Thread requested interrupt
 };
 
 /// Command Return Status Types.
diff --git a/lldb/source/API/SBThread.cpp b/lldb/source/API/SBThread.cpp
index ac3e2cd25daa94..c6925096e3cb1a 100644
--- a/lldb/source/API/SBThread.cpp
+++ b/lldb/source/API/SBThread.cpp
@@ -192,6 +192,9 @@ size_t SBThread::GetStopReasonDataCount() {
         case eStopReasonSignal:
           return 1;
 
+        case eStopReasonInterrupt:
+          return 1;
+
         case eStopReasonException:
           return 1;
 
@@ -261,6 +264,9 @@ uint64_t SBThread::GetStopReasonDataAtIndex(uint32_t idx) {
         case eStopReasonSignal:
           return stop_info_sp->GetValue();
 
+        case eStopReasonInterrupt:
+          return stop_info_sp->GetValue();
+
         case eStopReasonException:
           return stop_info_sp->GetValue();
 
diff --git a/lldb/source/Interpreter/CommandInterpreter.cpp b/lldb/source/Interpreter/CommandInterpreter.cpp
index 4c58ecc3c1848f..d327822db7bd37 100644
--- a/lldb/source/Interpreter/CommandInterpreter.cpp
+++ b/lldb/source/Interpreter/CommandInterpreter.cpp
@@ -2500,7 +2500,7 @@ bool CommandInterpreter::DidProcessStopAbnormally() const {
     const StopReason reason = stop_info->GetStopReason();
     if (reason == eStopReasonException ||
         reason == eStopReasonInstrumentation ||
-        reason == eStopReasonProcessorTrace)
+        reason == eStopReasonProcessorTrace || reason == eStopReasonInterrupt)
       return true;
 
     if (reason == eStopReasonSignal) {
diff --git a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
index ae1a77e5be8321..b3c0359b7dcf7b 100644
--- a/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
@@ -714,6 +714,8 @@ static const char *GetStopReasonString(StopReason stop_reason) {
     return "vfork";
   case eStopReasonVForkDone:
     return "vforkdone";
+  case eStopReasonInterrupt:
+    return "async interrupt";
   case eStopReasonInstrumentation:
   case eStopReasonInvalid:
   case eStopReasonPlanComplete:
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index 871683a605686f..b93aad8808f5bd 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -1730,14 +1730,24 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
       thread_sp = memory_thread_sp;
 
     if (exc_type != 0) {
-      const size_t exc_data_size = exc_data.size();
-
-      thread_sp->SetStopInfo(
-          StopInfoMachException::CreateStopReasonWithMachException(
-              *thread_sp, exc_type, exc_data_size,
-              exc_data_size >= 1 ? exc_data[0] : 0,
-              exc_data_size >= 2 ? exc_data[1] : 0,
-              exc_data_size >= 3 ? exc_data[2] : 0));
+      // For thread plan async interrupt, creating stop info on the
+      // original async interrupt request thread instead. If interrupt thread
+      // does not exist anymore we fallback to current signal receiving thread
+      // instead.
+      ThreadSP interrupt_thread;
+      if (m_interrupt_tid != LLDB_INVALID_THREAD_ID)
+        interrupt_thread = HandleThreadAsyncInterrupt(signo, description);
+      if (interrupt_thread)
+        thread_sp = interrupt_thread;
+      else {
+        const size_t exc_data_size = exc_data.size();
+        thread_sp->SetStopInfo(
+            StopInfoMachException::CreateStopReasonWithMachException(
+                *thread_sp, exc_type, exc_data_size,
+                exc_data_size >= 1 ? exc_data[0] : 0,
+                exc_data_size >= 2 ? exc_data[1] : 0,
+                exc_data_size >= 3 ? exc_data[2] : 0));
+      }
     } else {
       bool handled = false;
       bool did_exec = false;
@@ -1936,9 +1946,20 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
                   *thread_sp, signo, description.c_str()));
           }
         }
-        if (!handled)
-          thread_sp->SetStopInfo(StopInfo::CreateStopReasonWithSignal(
-              *thread_sp, signo, description.c_str()));
+        if (!handled) {
+          // For thread plan async interrupt, creating stop info on the
+          // original async interrupt request thread instead. If interrupt
+          // thread does not exist anymore we fallback to current signal
+          // receiving thread instead.
+          ThreadSP interrupt_thread;
+          if (m_interrupt_tid != LLDB_INVALID_THREAD_ID)
+            interrupt_thread = HandleThreadAsyncInterrupt(signo, description);
+          if (interrupt_thread)
+            thread_sp = interrupt_thread;
+          else
+            thread_sp->SetStopInfo(StopInfo::CreateStopReasonWithSignal(
+                *thread_sp, signo, description.c_str()));
+        }
       }
 
       if (!description.empty()) {
@@ -1957,6 +1978,24 @@ ThreadSP ProcessGDBRemote::SetThreadStopInfo(
   return thread_sp;
 }
 
+ThreadSP
+ProcessGDBRemote::HandleThreadAsyncInterrupt(uint8_t signo,
+                                             const std::string &description) {
+  ThreadSP thread_sp;
+  {
+    std::lock_guard<std::recursive_mutex> guard(m_thread_list_real.GetMutex());
+    thread_sp = m_thread_list_real.FindThreadByProtocolID(m_interrupt_tid,
+                                                          /*can_update=*/false);
+  }
+  if (thread_sp)
+    thread_sp->SetStopInfo(StopInfo::CreateStopReasonWithInterrupt(
+        *thread_sp, signo, description.c_str()));
+  // Clear m_interrupt_tid regardless we can find original interrupt thread or
+  // not.
+  m_interrupt_tid = LLDB_INVALID_THREAD_ID;
+  return thread_sp;
+}
+
 lldb::ThreadSP
 ProcessGDBRemote::SetThreadStopInfo(StructuredData::Dictionary *thread_dict) {
   static constexpr llvm::StringLiteral g_key_tid("tid");
diff --git a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
index 610a1ee0b34d2f..615831dd7e6f60 100644
--- a/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
+++ b/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h
@@ -440,6 +440,12 @@ class ProcessGDBRemote : public Process,
   void HandleStopReply() override;
   void HandleAsyncStructuredDataPacket(llvm::StringRef data) override;
 
+  // Handle thread specific async interrupt and return the original thread
+  // that requested the async interrupt. It can be null if original thread
+  // has exited.
+  lldb::ThreadSP HandleThreadAsyncInterrupt(uint8_t signo,
+                                            const std::string &description);
+
   void SetThreadPc(const lldb::ThreadSP &thread_sp, uint64_t index);
   using ModuleCacheKey = std::pair<std::string, std::string>;
   // KeyInfo for the cached module spec DenseMap.
diff --git a/lldb/source/Target/CMakeLists.txt b/lldb/source/Target/CMakeLists.txt
index cf4818eae3eb8b..d1209d29245d54 100644
--- a/lldb/source/Target/CMakeLists.txt
+++ b/lldb/source/Target/CMakeLists.txt
@@ -59,6 +59,7 @@ add_lldb_library(lldbTarget
   ThreadPlanCallUserExpression.cpp
   ThreadPlanPython.cpp
   ThreadPlanRunToAddress.cpp
+  ThreadPlanSingleThreadTimeout.cpp
   ThreadPlanShouldStopHere.cpp
   ThreadPlanStepInRange.cpp
   ThreadPlanStepInstruction.cpp
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 30c240b064b59c..3e5e4fe58b4b3b 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -446,7 +446,8 @@ Process::Process(lldb::TargetSP target_sp, ListenerSP listener_sp,
       m_memory_cache(*this), m_allocated_memory_cache(*this),
       m_should_detach(false), m_next_event_action_up(), m_public_run_lock(),
       m_private_run_lock(), m_currently_handling_do_on_removals(false),
-      m_resume_requested(false), m_finalizing(false), m_destructing(false),
+      m_resume_requested(false), m_interrupt_tid(LLDB_INVALID_THREAD_ID),
+      m_finalizing(false), m_destructing(false),
       m_clear_thread_plans_on_stop(false), m_force_next_event_delivery(false),
       m_last_broadcast_state(eStateInvalid), m_destroy_in_process(false),
       m_can_interpret_function_calls(false), m_run_thread_plan_lock(),
@@ -863,6 +864,7 @@ bool Process::HandleProcessStateChangedEvent(
             case eStopReasonThreadExiting:
             case eStopReasonInstrumentation:
             case eStopReasonProcessorTrace:
+            case eStopReasonInterrupt:
               if (!other_thread)
                 other_thread = thread;
               break;
@@ -3679,7 +3681,13 @@ void Process::ControlPrivateStateThread(uint32_t signal) {
   }
 }
 
-void Process::SendAsyncInterrupt() {
+void Process::SendAsyncInterrupt() { SendAsyncInterrupt(nullptr); }
+
+void Process::SendAsyncInterrupt(Thread *thread) {
+  if (thread != nullptr)
+    m_interrupt_tid = thread->GetProtocolID();
+  else
+    m_interrupt_tid = LLDB_INVALID_THREAD_ID;
   if (PrivateStateThreadIsValid())
     m_private_state_broadcaster.BroadcastEvent(Process::eBroadcastBitInterrupt,
                                                nullptr);
@@ -3905,9 +3913,14 @@ thread_result_t Process::RunPrivateStateThread(bool is_secondary_thread) {
 
       if (interrupt_requested) {
         if (StateIsStoppedState(internal_state, true)) {
-          // We requested the interrupt, so mark this as such in the stop event
-          // so clients can tell an interrupted process from a natural stop
-          ProcessEventData::SetInterruptedInEvent(event_sp.get(), true);
+          // Oly mark interrupt event if it is not thread specific async
+          // interrupt.
+          if (m_interrupt_tid == LLDB_INVALID_THREAD_ID) {
+            // We requested the interrupt, so mark this as such in the stop
+            // event so clients can tell an interrupted process from ...
[truncated]

jeffreytan81 · 2024-05-10T16:38:24Z

@jimingham, friendly ping. @clayborg mentioned that you are the sole domain expert on ThreadPlan. Can you help to review this change? Thanks

jimingham · 2024-05-10T17:36:02Z

Thanks for working up this patch. I should be able to get time to look at this next week.

jimingham · 2024-05-16T00:52:02Z

lldb/include/lldb/Target/Process.h


  void SendAsyncInterrupt();

+  // Send an async interrupt and receive stop from a specific /p thread.
+  void SendAsyncInterrupt(Thread *thread);


Do we need two API's here? Why not have thread default to nullptr and document that that means we let the system choose which thread gets the interrupt? It's also worth noting here that the "receive stop from specific thread" is artificial, so you might say something like "and attribute the stop to a specific thread" instead to make it clear we don't require the ability to actually have that thread receive the signal.

I chose overloading because I remember reading some guideline to prefer overloading than default parameter. Google finds one:
https://stackoverflow.com/questions/15876590/overload-a-method-or-use-default-values-c

But I do not care that much really, so changed and made the comment more clear.

jimingham · 2024-05-16T00:55:44Z

lldb/include/lldb/Target/ThreadPlan.h

@@ -590,6 +591,7 @@ class ThreadPlanNull : public ThreadPlan {
  const Status &GetStatus() { return m_status; }

 protected:
+  friend class ThreadPlanSingleThreadTimeout;


What is this needed for? It's a little odd having a derived class friended to the base class, since you could make whatever the derived class needed protected rather than private.

Hmm, I thought there is a reason I can't remember, apparently not. Removed. Thanks.

jimingham · 2024-05-16T00:56:36Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+//
+// Thread plan used by single thread execution to issue timeout. This is useful
+// to detect potential deadlock in single thread execution. The timeout measures
+// the elapsed time from the last internal stop and got reset by each internal


"gets reset by each internal stop"

jimingham · 2024-05-16T00:57:13Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+// to detect potential deadlock in single thread execution. The timeout measures
+// the elapsed time from the last internal stop and got reset by each internal
+// stops to ensure we are accurately detecting execution not moving forward.
+// This means this thread plan  can be created/destroyed multiple times by the


"may" is better than "can" and there's one too many spaces.

jimingham · 2024-05-16T01:02:06Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+    Done,           // Finished resume all threads.
+  };
+
+  static std::mutex s_mutex;


I'll have to see how you use this as I go along, but having statics here worries me. How are you going to support the case where lldb has two targets, each of which are running using this timeout feature?

This is a good point. I was storing some static states so that we can ensure:

Singleton (only one ThreadPlanSingleThreadTimeout is alive/used anytime).

If there is ever a race that, the ThreadPlanSingleThreadTimeout is in State::AsyncInterrupt but it got popped by another internal stop before async interrupt received. I want to remember the previous/last state, and the newly created ThreadPlanSingleThreadTimeout can be re-created from previous state.

Instead, I will look into storing these static states into parent controlling ThreadPlan (ThreadPlanStepOverRange in this case).

That would be better, but if you find you do need a singleton, you could also store that either in the Thread or the Process, depending on the scope at which you need them.

jeffreytan81 · 2024-05-21T13:19:33Z

@jimingham, I have revised the PR, let me know when you can review again. Thanks.

jimingham · 2024-05-22T21:25:28Z

lldb/include/lldb/Target/Process.h

-  void SendAsyncInterrupt();
+  /// Send an async interrupt request.
+  ///
+  /// If \a thread is specified the async interrupt stop will be attributed the


"will be attributed to the" - missing "to"

jimingham · 2024-05-22T21:26:17Z

lldb/include/lldb/Target/Process.h

+  /// specified thread.
+  ///
+  /// \param[in] thread
+  ///     The thread from which to attribute the async interrupt stop to.


"The thread the async interrupt will be attributed to"

jimingham · 2024-05-23T00:20:50Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+
+  ~ThreadPlanSingleThreadTimeout() override;
+
+  // Create a new instance from fresh new state.


Since this API doesn't return the Created thread plan, then this API has to do more than just create it - it also pushes the plan, and sometimes doesn't. So CreateNew is not a good name, and the description of what the function does is too abbreviated.

jimingham · 2024-05-23T00:25:41Z

lldb/include/lldb/Target/ThreadPlanStepOverRange.h

@@ -42,8 +45,13 @@ class ThreadPlanStepOverRange : public ThreadPlanStepRange,

  void SetupAvoidNoDebug(LazyBool step_out_avoids_code_without_debug_info);
  bool IsEquivalentContext(const SymbolContext &context);
+  // Clear and create a new ThreadPlanSingleThreadTimeout to detect if program


ThreadPlanStepOverRange isn't the only thread plan that we want to have this behavior for. For instance, it would be great to do this for ThreadPlanStepOut, which we don't even try to run on one thread, IIRC, and in order not to duplicate code, we will want to switch the ThreadPlanCallUserExpression over to this method as well. So the code to handle the interrupt has to be shareable - maybe a mix-in class that does the management of the interrupt?

jimingham · 2024-05-23T00:29:42Z

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.h

+  // Handle thread specific async interrupt and return the original thread
+  // that requested the async interrupt. It can be null if original thread
+  // has exited.
+  lldb::ThreadSP HandleThreadAsyncInterrupt(uint8_t signo,


I can't see any reason why the ability to manage this interrupt event is GDBRemote specific. So this should be a virtual method in Process, and overridden here.

jimingham · 2024-05-23T00:30:28Z

lldb/source/Target/Process.cpp

-          // We requested the interrupt, so mark this as such in the stop event
-          // so clients can tell an interrupted process from a natural stop
-          ProcessEventData::SetInterruptedInEvent(event_sp.get(), true);
+          // Oly mark interrupt event if it is not thread specific async


jimingham · 2024-05-23T00:33:14Z

lldb/source/Target/StopInfo.cpp

+  bool ShouldStop(Event *event_ptr) override {
+    ThreadSP thread_sp(m_thread_wp.lock());
+    if (thread_sp)
+      return thread_sp->GetProcess()->GetUnixSignals()->GetShouldStop(m_value);


I don't think this is right. This is a signal that you are using to implement your async interrupt. So you shouldn't be governed by whether the user would or would not want to stop for that signal if it had been generated by some other means.

jeffreytan81 · 2024-06-06T15:26:45Z

@jimingham , let me know any other comments/thoughts you have.

jimingham · 2024-06-06T16:54:12Z

I will get to this, WWDC is close upon us so we're all rather busy at present. Jim

…

On Jun 6, 2024, at 8:27 AM, jeffreytan81 ***@***.***> wrote: @jimingham <https://github.com/jimingham> , let me know any other comments/thoughts you have. — Reply to this email directly, view it on GitHub <#90930 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW5QZIZR4GYAKULSRITZGB5U3AVCNFSM6AAAAABHEXT7I6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJSHAYTIOJZGM>. You are receiving this because you were mentioned.

jeffreytan81 · 2024-06-19T00:35:55Z

@jimingham, hope WWDC is going well. Do you have time to review this now? Thanks.

jimingham · 2024-06-27T18:34:30Z

lldb/include/lldb/Target/Process.h

@@ -2838,6 +2844,14 @@ void PruneThreadPlans();
    return std::nullopt;
  }

+  /// Handle thread specific async interrupt and return the original thread
+  /// that requested the async interrupt. It can be null if original thread
+  /// has exited.


You should say what the description string is for.

jimingham · 2024-06-27T20:44:54Z

lldb/include/lldb/Target/TimeoutResumeAll.h

+
+namespace lldb_private {
+
+// Mixin class that provides capability for ThreadPlan that supports single


Grammar:

// Mixin class that provides the capability for ThreadPlan to support single
// thread execution that resumes all threads after a timeout.

jimingham · 2024-06-28T22:07:42Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+// This means this thread plan may be created/destroyed multiple times by the
+// parent execution plan.
+//
+// When timeout happens, the thread plan resolves the potential deadlock by


"When a timeout happens"

jimingham · 2024-06-28T22:08:09Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+//
+// When timeout happens, the thread plan resolves the potential deadlock by
+// issuing a thread specific async interrupt to enter stop state, then all
+// threads execution are resumed to resolve the potential deadlock.


"then execution is resumed with all threads running to resolve the potential deadlock"

jimingham · 2024-06-28T22:16:17Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+  };
+
+public:
+  struct TimeoutInfo {


This isn't something you need to do in this patch, but it would be good to allow different thread plan invocations to use different timeouts, and not just depend on the thread timeout. For instance, we should be able to reimplement RunThreadPlan with this infrastructure. But the timeouts you want for expression evaluation and stepping are likely to be very different (and different executions also use different timeouts). So being able to pass in the timeout, rather than relying on one thread specific one will be necessary. We could for instance add the timeout to the TimeoutInfo and plumb that so that the ThreadPlan that mixes in TimeoutResumeAll can pass in a timeout.

jimingham · 2024-06-28T22:17:32Z

lldb/include/lldb/Target/ThreadPlanSingleThreadTimeout.h

+  // If input \param thread is running in single thread mode, push a
+  // new ThreadPlanSingleThreadTimeout based on timeout setting from fresh new
+  // state. The reference of \param info is passed in so that when
+  // ThreadPlanSingleThreadTimeout got popped out its last state can be stored


"popped out" is confusing. We just say "popped" for the most part.

jimingham · 2024-06-28T22:21:13Z