[lldb/linux] Make sure the process continues running after a detach #88494

labath · 2024-04-12T10:05:35Z

Whenever an inferior thread stops, lldb-server sends a SIGSTOP to all other threads in the process to force them to stop as well. If those threads stop on their own before they get a signal, this SIGSTOP will remain pending and be delivered the next time the process resumes.

Normally, this is not a problem, because lldb-server will detect this stale SIGSTOP and resume the process. However, if we detach from the process while it has these SIGSTOPs pending, they will get immediately delivered, and the process will remain stopped (most likely forever).

This patch fixes that by sending a SIGCONT just before detaching from the process. This signal cancels out any pending SIGSTOPs and ensures the process is able to run after we detach. It does have one somewhat unfortunate side effect: the process's SIGCONT handler (if it has one) will get executed spuriously (from the process's POV).
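As a rough illustration of both effects (this is not the lldb-server code, and it uses an ordinary untraced child rather than a ptraced one), the Python sketch below shows that a SIGCONT lets a stopped process run again, and that a SIGCONT handler fires even when the process was never stopped. The function names are made up for this example, and it assumes a POSIX system and a call from the main thread.

```python
import os
import signal
import time


def sigcont_resumes_stopped_child():
    """Stop a child with SIGSTOP, then show that SIGCONT lets it run again."""
    pid = os.fork()
    if pid == 0:
        # Child: idle until the parent kills it.
        while True:
            time.sleep(0.05)
    try:
        os.kill(pid, signal.SIGSTOP)
        _, status = os.waitpid(pid, os.WUNTRACED)
        was_stopped = os.WIFSTOPPED(status)

        os.kill(pid, signal.SIGCONT)
        _, status = os.waitpid(pid, os.WCONTINUED)
        was_continued = os.WIFCONTINUED(status)
    finally:
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
    return was_stopped, was_continued


def sigcont_handler_runs_when_not_stopped():
    """Send SIGCONT to a running process; return how often its handler ran."""
    calls = []
    previous = signal.signal(
        signal.SIGCONT, lambda signum, frame: calls.append(signum)
    )
    try:
        os.kill(os.getpid(), signal.SIGCONT)
        # Give the interpreter a moment to run the Python-level handler.
        deadline = time.monotonic() + 5
        while not calls and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # signal.signal() returns None for handlers not installed from Python.
        signal.signal(
            signal.SIGCONT, previous if previous is not None else signal.SIG_DFL
        )
    return len(calls)
```

The second function is the side effect described above in miniature: from the process's point of view nothing was stopped, yet its handler still runs.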

This could _sometimes_ be avoided by tracking which threads were sent a SIGSTOP, and whether those threads stopped due to it. From what I could tell by observing its behavior, this is what gdb does. I have not tried to replicate that behavior here because it adds a nontrivial amount of complexity and the result is still uncertain -- we still need to send a SIGCONT (and execute the handler) when any thread stops for some other reason (and leaves our SIGSTOP hanging). Furthermore, since SIGSTOPs don't stack, it's also possible that our SIGSTOP/SIGCONT combination will cancel a genuine SIGSTOP being sent to the debugged application (by someone else), and there is nothing we can do about that. For this reason I think it's simplest and most predictable to just always send a SIGCONT when detaching, but if it turns out this is breaking something, we can consider implementing something more elaborate.
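The "don't stack" point is a general property of standard (non-realtime) POSIX signals: the pending set holds at most one instance of each signal. The sketch below is only an analogy. SIGSTOP cannot be blocked or caught, so it uses SIGUSR1 as a stand-in, and the function name is invented for the example. It assumes a single-threaded script on a POSIX system, called from the main thread.

```python
import os
import signal


def count_deliveries_of_two_blocked_signals():
    """Send SIGUSR1 twice while it is blocked; report what gets delivered."""
    deliveries = []
    previous = signal.signal(
        signal.SIGUSR1, lambda signum, frame: deliveries.append(signum)
    )
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    try:
        # Both signals arrive while SIGUSR1 is blocked, so they can only
        # be recorded as pending.
        os.kill(os.getpid(), signal.SIGUSR1)
        os.kill(os.getpid(), signal.SIGUSR1)
        was_pending = signal.SIGUSR1 in signal.sigpending()
    finally:
        # Restoring the mask delivers whatever is pending.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(
            signal.SIGUSR1, previous if previous is not None else signal.SIG_DFL
        )
    return was_pending, len(deliveries)
```

Two sends collapse into one delivery. By the same logic, a SIGSTOP sent by someone else while the debugger's own SIGSTOP is still pending cannot be told apart from it, so the single SIGCONT cancels both.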

One alternative I did try is to use PTRACE_INTERRUPT to suspend the threads instead of a SIGSTOP. PTRACE_INTERRUPT requires using PTRACE_SEIZE to attach to the process, which also made this solution somewhat complicated, but the main problem with that approach is that a PTRACE_INTERRUPT stop is not considered to be a signal-delivery-stop, which means it's not possible to resume the thread while injecting another signal into the inferior (which some of our tests expect to be able to do). This limitation could be worked around by forcing the thread into a signal-delivery-stop whenever we need to do this, but this additional complication is what made me think this approach is also not worthwhile.

This patch should fix (at least some of) the problems with TestConcurrentVFork, but I've also added a dedicated test for checking that a process keeps running after we detach. Although the problem I'm fixing here is Linux-specific, the core functionality of not stopping after a detach should function the same way everywhere.
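The dedicated test relies on a file-based handshake: the inferior creates one file when it is ready to be attached and another once it has run to completion after the detach. A minimal standalone sketch of that idea follows. `wait_for_file` and `run_handshake` are hypothetical helpers written for this illustration, not the `lldbutil` functions the test uses, and a short Python one-liner stands in for the inferior.

```python
import os
import subprocess
import sys
import tempfile
import time


def wait_for_file(path, timeout=10.0, interval=0.05):
    """Poll until `path` exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)


def run_handshake():
    """Start a stand-in inferior and wait for the file it creates."""
    with tempfile.TemporaryDirectory() as tmp:
        exit_file = os.path.join(tmp, "exit_file")
        # Stand-in for the inferior: create the file, then exit.
        child = subprocess.Popen(
            [
                sys.executable,
                "-c",
                "import sys; open(sys.argv[1], 'w').close()",
                exit_file,
            ]
        )
        try:
            return wait_for_file(exit_file)
        finally:
            child.wait()
```

If the process were left stopped after the detach, the file would never appear and the wait would time out, which is how the failure would show up in the test.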

llvmbot · 2024-04-12T10:06:08Z

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Full diff: https://github.com/llvm/llvm-project/pull/88494.diff

5 Files Affected:

(modified) lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp (+4)
(added) lldb/test/API/commands/process/detach-resumes/Makefile (+4)
(added) lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py (+57)
(added) lldb/test/API/commands/process/detach-resumes/main.cpp (+46)
(modified) lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py (-16)

diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index 5d2b4b03fe60cb..59fc8726b76739 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -1089,6 +1089,10 @@ Status NativeProcessLinux::Detach() {
   if (GetID() == LLDB_INVALID_PROCESS_ID)
     return error;
 
+  // Cancel out any SIGSTOPs we may have sent while stopping the process.
+  // Otherwise, the process may stop as soon as we detach from it.
+  kill(GetID(), SIGCONT);
+
   for (const auto &thread : m_threads) {
     Status e = Detach(thread->GetID());
     if (e.Fail())
diff --git a/lldb/test/API/commands/process/detach-resumes/Makefile b/lldb/test/API/commands/process/detach-resumes/Makefile
new file mode 100644
index 00000000000000..c46619c6623481
--- /dev/null
+++ b/lldb/test/API/commands/process/detach-resumes/Makefile
@@ -0,0 +1,4 @@
+CXX_SOURCES := main.cpp
+ENABLE_THREADS := YES
+
+include Makefile.rules
diff --git a/lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py b/lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py
new file mode 100644
index 00000000000000..ab2ed8a6d24c85
--- /dev/null
+++ b/lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py
@@ -0,0 +1,57 @@
+"""
+Test that the process continues running after we detach from it.
+"""
+
+
+import lldb
+import time
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+class DetachResumesTestCase(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+
+    def test_detach_resumes(self):
+        self.build()
+        exe = self.getBuildArtifact()
+
+        # The inferior will use this file to let us know it is ready to be
+        # attached.
+        sync_file_path = lldbutil.append_to_process_working_directory(
+            self, "sync_file_%d" % (int(time.time()))
+        )
+
+        # And this one to let us know it is running after we've detached from
+        # it.
+        exit_file_path = lldbutil.append_to_process_working_directory(
+            self, "exit_file_%d" % (int(time.time()))
+        )
+
+        popen = self.spawnSubprocess(self.getBuildArtifact(exe), [sync_file_path, exit_file_path])
+        lldbutil.wait_for_file_on_target(self, sync_file_path)
+
+        self.runCmd("process attach -p " + str(popen.pid))
+
+        # Set a breakpoint at a place that will be called by multiple threads
+        # simultaneously. On systems (e.g. linux) where the debugger needs to
+        # send signals to suspend threads, these signals will race with threads
+        # hitting the breakpoint (and stopping on their own).
+        bpno = lldbutil.run_break_set_by_symbol(self, "break_here")
+
+        # And let the inferior know it can call the function.
+        self.runCmd("expr -- wait_for_debugger_flag = false")
+
+        self.runCmd("continue")
+
+        self.expect(
+            "thread list",
+            STOPPED_DUE_TO_BREAKPOINT,
+            substrs=["stopped", "stop reason = breakpoint"],
+        )
+
+        # Detach, the process should keep running after this, and not be stopped
+        # by the signals that the debugger may have used to suspend the threads.
+        self.runCmd("detach")
+
+        lldbutil.wait_for_file_on_target(self, exit_file_path)
diff --git a/lldb/test/API/commands/process/detach-resumes/main.cpp b/lldb/test/API/commands/process/detach-resumes/main.cpp
new file mode 100644
index 00000000000000..728ebbff586c36
--- /dev/null
+++ b/lldb/test/API/commands/process/detach-resumes/main.cpp
@@ -0,0 +1,46 @@
+#include <stdio.h>
+#include <fcntl.h>
+#include <chrono>
+#include <thread>
+#include "pseudo_barrier.h"
+#include <fstream>
+#include <vector>
+
+pseudo_barrier_t barrier;
+
+constexpr size_t nthreads = 5;
+volatile bool wait_for_debugger_flag = true;
+
+void break_here() {}
+
+void tfunc() {
+  pseudo_barrier_wait(barrier);
+
+  break_here();
+}
+
+int main(int argc, char const *argv[])
+{
+  lldb_enable_attach();
+
+  if (argc<3) return 1;
+
+  // Create a file to signal that this process has started up.
+  std::ofstream(argv[1]).close();
+
+  // And wait for it to attach.
+  for (int i = 0; i < 100 && wait_for_debugger_flag; ++i)
+    std::this_thread::sleep_for(std::chrono::seconds(1));
+
+  // Fire up the threads and have them call break_here() simultaneously.
+  pseudo_barrier_init(barrier, nthreads);
+  std::vector<std::thread> threads;
+  for(size_t i = 0; i < nthreads; ++i) threads.emplace_back(tfunc);
+
+  for(std::thread &t: threads) t.join();
+
+  // Create the file to let the debugger know we're running.
+  std::ofstream(argv[2]).close();
+
+  return 0;
+}
diff --git a/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py b/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py
index 1790bd497f4e6b..2dcbb728549fb4 100644
--- a/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py
+++ b/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py
@@ -48,8 +48,6 @@ def follow_child_helper(self, use_fork, call_exec):
         self.expect("continue", patterns=[r"exited with status = 1[0-4]"])
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_vfork_no_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-parent.
@@ -58,8 +56,6 @@ def test_follow_parent_vfork_no_exec(self):
         self.follow_parent_helper(use_fork=False, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_fork_no_exec(self):
         """
         Make sure that debugging concurrent fork() from multiple threads won't crash lldb during follow-parent.
@@ -68,8 +64,6 @@ def test_follow_parent_fork_no_exec(self):
         self.follow_parent_helper(use_fork=True, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_vfork_call_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-parent.
@@ -78,8 +72,6 @@ def test_follow_parent_vfork_call_exec(self):
         self.follow_parent_helper(use_fork=False, call_exec=True)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_fork_call_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-parent.
@@ -88,8 +80,6 @@ def test_follow_parent_fork_call_exec(self):
         self.follow_parent_helper(use_fork=True, call_exec=True)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_vfork_no_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-child.
@@ -98,8 +88,6 @@ def test_follow_child_vfork_no_exec(self):
         self.follow_child_helper(use_fork=False, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_fork_no_exec(self):
         """
         Make sure that debugging concurrent fork() from multiple threads won't crash lldb during follow-child.
@@ -108,8 +96,6 @@ def test_follow_child_fork_no_exec(self):
         self.follow_child_helper(use_fork=True, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_vfork_call_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-child.
@@ -118,8 +104,6 @@ def test_follow_child_vfork_call_exec(self):
         self.follow_child_helper(use_fork=False, call_exec=True)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_fork_call_exec(self):
         """
         Make sure that debugging concurrent fork() from multiple threads won't crash lldb during follow-child.

github-actions · 2024-04-12T10:10:45Z

✅ With the latest revision this PR passed the C/C++ code formatter.

github-actions · 2024-04-12T10:10:46Z

✅ With the latest revision this PR passed the Python code formatter.

DavidSpickett · 2024-04-16T14:24:57Z

Will fix #85084.

lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py

DavidSpickett · 2024-04-16T14:26:36Z

Thanks for looking into this. I had no idea what it might be but your explanation makes a lot of sense.

jeffreytan81 · 2024-04-16T16:25:59Z

lldb/test/API/commands/process/detach-resumes/main.cpp

+}
+
+int main(int argc, char const *argv[]) {
+  lldb_enable_attach();


Just curious, why is this needed?
Is it that some special OS may restrict the default attaching behavior for security reasons unless lldb_enable_attach() is explicitly called?

labath · 2024-04-16

Yes, but it's not really a "special" OS. Most Linux systems are configured like that by default, where you're only allowed to debug your own children (unless they explicitly allow that, or you're root, etc.). Check out https://www.kernel.org/doc/html/v4.15/admin-guide/LSM/Yama.html. lldb_enable_attach is our own invention, which calls the appropriate OS interface to enable attaching if necessary. Currently, it's only implemented on Linux.
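For readers who want to check how their own machine is configured, here is a small sketch that reads the Yama `ptrace_scope` sysctl described in the linked kernel documentation. The function names are invented for this example, and the descriptions paraphrase that documentation. The same documentation describes `prctl(PR_SET_PTRACER, ...)` as the way a process can opt in to being attached under the restricted mode.

```python
YAMA_PTRACE_SCOPE = "/proc/sys/kernel/yama/ptrace_scope"

SCOPE_MEANINGS = {
    0: "classic: a process may attach to other processes running under the same uid",
    1: "restricted: only descendants, or tracers the target declared via prctl(PR_SET_PTRACER)",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled",
}


def read_ptrace_scope(path=YAMA_PTRACE_SCOPE):
    """Return the Yama ptrace_scope value, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def describe_ptrace_scope(value):
    """Map a ptrace_scope value to a short human-readable description."""
    if value is None:
        return "Yama ptrace_scope not available on this system"
    return SCOPE_MEANINGS.get(value, "unknown ptrace_scope value %d" % value)


print(describe_ptrace_scope(read_ptrace_scope()))
```

A value of 1 corresponds to the "only allowed to debug your own children" default mentioned above, which is why a test that attaches to an already running process needs the inferior to opt in first.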

DavidSpickett

LGTM

labath requested review from DavidSpickett and jeffreytan81 April 12, 2024 10:05

labath requested a review from JDevlieghere as a code owner April 12, 2024 10:05

llvmbot added the lldb label Apr 12, 2024

labath force-pushed the sigcont branch from 9e1aca0 to 8a9837b Compare April 12, 2024 11:33

labath force-pushed the sigcont branch from 8a9837b to c6b2c5e Compare April 12, 2024 12:40

DavidSpickett reviewed Apr 16, 2024

jeffreytan81 approved these changes Apr 16, 2024

DavidSpickett approved these changes Apr 17, 2024

labath merged commit 5f3e106 into llvm:main Apr 17, 2024
4 checks passed

labath deleted the sigcont branch April 17, 2024 11:05

labath mentioned this pull request Apr 17, 2024

TestDetachResumes fails on windows (arm?) #89077

Open
