Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb/linux] Make sure the process continues running after a detach #88494

Merged
merged 1 commit into from
Apr 17, 2024

Conversation

labath
Copy link
Collaborator

@labath labath commented Apr 12, 2024

Fixes #85084

Whenever an inferior thread stops, lldb-server sends a SIGSTOP to all other threads in the process to force them to stop as well. If those threads stop on their own before they get a signal, this SIGSTOP will remain pending and be delivered the next time the process resumes.

Normally, this is not a problem, because lldb-server will detect this stale SIGSTOP and resume the process. However, if we detach from the process while it has these SIGSTOPs pending, they will get immediately delivered, and the process will remain stopped (most likely forever).

This patch fixes that by sending a SIGCONT just before detaching from the process. This signal cancels out any pending SIGSTOPs, and ensures it is able to run after we detach. It does have one somewhat unfortunate side-effect that in that the process's SIGCONT handler (if it has one) will get executed spuriously (from the process's POV).

This could be sometimes avoided by tracking which threads got send a SIGSTOP, and whether those threads stopped due to it. From what I could tell by observing its behavior, this is what gdb does. I have not tried to replicate that behavior here because it adds a nontrivial amount of complexity and the result is still uncertain -- we still need to send a SIGCONT (and execute the handler) when any thread stops for some other reason (and leaves our SIGSTOP hanging). Furthermore, since SIGSTOPs don't stack, it's also possible that our SIGSTOP/SIGCONT combination will cancel a genuine SIGSTOP being sent to the debugger application (by someone else), and there is nothing we can do about that. For this reason I think it's simplest and most predictible to just always send a SIGCONT when detaching, but if it turns out this is breaking something, we can consider implementing something more elaborate.

One alternative I did try is to use PTRACE_INTERRUPT to suspend the threads instead of a SIGSTOP. PTRACE_INTERUPT requires using PTRACE_SEIZE to attach to the process, which also made this solution somewhat complicated, but the main problem with that approach is that PTRACE_INTERRUPT is not considered to be a signal-delivery-stop, which means it's not possible to resume it while injecting another signal to the inferior (which some of our tests expect to be able to do). This limitation could be worked around by forcing the thread into a signal delivery stop whenever we need to do this, but this additional complication is what made me think this approach is also not worthwhile.

This patch should fix (at least some of) the problems with TestConcurrentVFork, but I've also added a dedicated test for checking that a process keeps running after we detach. Although the problem I'm fixing here is linux-specific, the core functinoality of not stopping after a detach should function the same way everywhere.

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 12, 2024

@llvm/pr-subscribers-lldb

Author: Pavel Labath (labath)

Changes

Whenever an inferior thread stops, lldb-server sends a SIGSTOP to all other threads in the process to force them to stop as well. If those threads stop on their own before they get a signal, this SIGSTOP will remain pending and be delivered the next time the process resumes.

Normally, this is not a problem, because lldb-server will detect this stale SIGSTOP and resume the process. However, if we detach from the process while it has these SIGSTOPs pending, they will get immediately delivered, and the process will remain stopped (most likely forever).

This patch fixes that by sending a SIGCONT just before detaching from the process. This signal cancels out any pending SIGSTOPs, and ensures it is able to run after we detach. It does have one somewhat unfortunate side-effect that in that the process's SIGCONT handler (if it has one) will get executed spuriously (from the process's POV).

This could be sometimes avoided by tracking which threads got send a SIGSTOP, and whether those threads stopped due to it. From what I could tell by observing its behavior, this is what gdb does. I have not tried to replicate that behavior here because it adds a nontrivial amount of complexity and the result is still uncertain -- we still need to send a SIGCONT (and execute the handler) when any thread stops for some other reason (and leaves our SIGSTOP hanging). Furthermore, since SIGSTOPs don't stack, it's also possible that our SIGSTOP/SIGCONT combination will cancel a genuine SIGSTOP being sent to the debugger application (by someone else), and there is nothing we can do about that. For this reason I think it's simplest and most predictible to just always send a SIGCONT when detaching, but if it turns out this is breaking something, we can consider implementing something more elaborate.

One alternative I did try is to use PTRACE_INTERRUPT to suspend the threads instead of a SIGSTOP. PTRACE_INTERUPT requires using PTRACE_SEIZE to attach to the process, which also made this solution somewhat complicated, but the main problem with that approach is that PTRACE_INTERRUPT is not considered to be a signal-delivery-stop, which means it's not possible to resume it while injecting another signal to the inferior (which some of our tests expect to be able to do). This limitation could be worked around by forcing the thread into a signal delivery stop whenever we need to do this, but this additional complication is what made me think this approach is also not worthwhile.

This patch should fix (at least some of) the problems with TestConcurrentVFork, but I've also added a dedicated test for checking that a process keeps running after we detach. Although the problem I'm fixing here is linux-specific, the core functinoality of not stopping after a detach should function the same way everywhere.


Full diff: https://github.com/llvm/llvm-project/pull/88494.diff

5 Files Affected:

  • (modified) lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp (+4)
  • (added) lldb/test/API/commands/process/detach-resumes/Makefile (+4)
  • (added) lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py (+57)
  • (added) lldb/test/API/commands/process/detach-resumes/main.cpp (+46)
  • (modified) lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py (-16)
diff --git a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
index 5d2b4b03fe60cb..59fc8726b76739 100644
--- a/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
+++ b/lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
@@ -1089,6 +1089,10 @@ Status NativeProcessLinux::Detach() {
   if (GetID() == LLDB_INVALID_PROCESS_ID)
     return error;
 
+  // Cancel out any SIGSTOPs we may have sent while stopping the process.
+  // Otherwise, the process may stop as soon as we detach from it.
+  kill(GetID(), SIGCONT);
+
   for (const auto &thread : m_threads) {
     Status e = Detach(thread->GetID());
     if (e.Fail())
diff --git a/lldb/test/API/commands/process/detach-resumes/Makefile b/lldb/test/API/commands/process/detach-resumes/Makefile
new file mode 100644
index 00000000000000..c46619c6623481
--- /dev/null
+++ b/lldb/test/API/commands/process/detach-resumes/Makefile
@@ -0,0 +1,4 @@
+CXX_SOURCES := main.cpp
+ENABLE_THREADS := YES
+
+include Makefile.rules
diff --git a/lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py b/lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py
new file mode 100644
index 00000000000000..ab2ed8a6d24c85
--- /dev/null
+++ b/lldb/test/API/commands/process/detach-resumes/TestDetachResumes.py
@@ -0,0 +1,57 @@
+"""
+Test that the process continues running after we detach from it.
+"""
+
+
+import lldb
+import time
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+class DetachResumesTestCase(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+
+    def test_detach_resumes(self):
+        self.build()
+        exe = self.getBuildArtifact()
+
+        # The inferior will use this file to let us know it is ready to be
+        # attached.
+        sync_file_path = lldbutil.append_to_process_working_directory(
+            self, "sync_file_%d" % (int(time.time()))
+        )
+
+        # And this one to let us know it is running after we've detached from
+        # it.
+        exit_file_path = lldbutil.append_to_process_working_directory(
+            self, "exit_file_%d" % (int(time.time()))
+        )
+
+        popen = self.spawnSubprocess(self.getBuildArtifact(exe), [sync_file_path, exit_file_path])
+        lldbutil.wait_for_file_on_target(self, sync_file_path)
+
+        self.runCmd("process attach -p " + str(popen.pid))
+
+        # Set a breakpoint at a place that will be called by multiple threads
+        # simultaneously. On systems (e.g. linux) where the debugger needs to
+        # send signals to suspend threads, these signals will race with threads
+        # hitting the breakpoint (and stopping on their own).
+        bpno = lldbutil.run_break_set_by_symbol(self, "break_here")
+
+        # And let the inferior know it can call the function.
+        self.runCmd("expr -- wait_for_debugger_flag = false")
+
+        self.runCmd("continue")
+
+        self.expect(
+            "thread list",
+            STOPPED_DUE_TO_BREAKPOINT,
+            substrs=["stopped", "stop reason = breakpoint"],
+        )
+
+        # Detach, the process should keep running after this, and not be stopped
+        # by the signals that the debugger may have used to suspend the threads.
+        self.runCmd("detach")
+
+        lldbutil.wait_for_file_on_target(self, exit_file_path)
diff --git a/lldb/test/API/commands/process/detach-resumes/main.cpp b/lldb/test/API/commands/process/detach-resumes/main.cpp
new file mode 100644
index 00000000000000..728ebbff586c36
--- /dev/null
+++ b/lldb/test/API/commands/process/detach-resumes/main.cpp
@@ -0,0 +1,46 @@
+#include <stdio.h>
+#include <fcntl.h>
+#include <chrono>
+#include <thread>
+#include "pseudo_barrier.h"
+#include <fstream>
+#include <vector>
+
+pseudo_barrier_t barrier;
+
+constexpr size_t nthreads = 5;
+volatile bool wait_for_debugger_flag = true;
+
+void break_here() {}
+
+void tfunc() {
+  pseudo_barrier_wait(barrier);
+
+  break_here();
+}
+
+int main(int argc, char const *argv[])
+{
+  lldb_enable_attach();
+
+  if (argc<3) return 1;
+
+  // Create a file to signal that this process has started up.
+  std::ofstream(argv[1]).close();
+
+  // And wait for it to attach.
+  for (int i = 0; i < 100 && wait_for_debugger_flag; ++i)
+    std::this_thread::sleep_for(std::chrono::seconds(1));
+
+  // Fire up the threads and have them call break_here() simultaneously.
+  pseudo_barrier_init(barrier, nthreads);
+  std::vector<std::thread> threads;
+  for(size_t i = 0; i < nthreads; ++i) threads.emplace_back(tfunc);
+
+  for(std::thread &t: threads) t.join();
+
+  // Create the file to let the debugger know we're running.
+  std::ofstream(argv[2]).close();
+
+  return 0;
+}
diff --git a/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py b/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py
index 1790bd497f4e6b..2dcbb728549fb4 100644
--- a/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py
+++ b/lldb/test/API/functionalities/fork/concurrent_vfork/TestConcurrentVFork.py
@@ -48,8 +48,6 @@ def follow_child_helper(self, use_fork, call_exec):
         self.expect("continue", patterns=[r"exited with status = 1[0-4]"])
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_vfork_no_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-parent.
@@ -58,8 +56,6 @@ def test_follow_parent_vfork_no_exec(self):
         self.follow_parent_helper(use_fork=False, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_fork_no_exec(self):
         """
         Make sure that debugging concurrent fork() from multiple threads won't crash lldb during follow-parent.
@@ -68,8 +64,6 @@ def test_follow_parent_fork_no_exec(self):
         self.follow_parent_helper(use_fork=True, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_vfork_call_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-parent.
@@ -78,8 +72,6 @@ def test_follow_parent_vfork_call_exec(self):
         self.follow_parent_helper(use_fork=False, call_exec=True)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_parent_fork_call_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-parent.
@@ -88,8 +80,6 @@ def test_follow_parent_fork_call_exec(self):
         self.follow_parent_helper(use_fork=True, call_exec=True)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_vfork_no_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-child.
@@ -98,8 +88,6 @@ def test_follow_child_vfork_no_exec(self):
         self.follow_child_helper(use_fork=False, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_fork_no_exec(self):
         """
         Make sure that debugging concurrent fork() from multiple threads won't crash lldb during follow-child.
@@ -108,8 +96,6 @@ def test_follow_child_fork_no_exec(self):
         self.follow_child_helper(use_fork=True, call_exec=False)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_vfork_call_exec(self):
         """
         Make sure that debugging concurrent vfork() from multiple threads won't crash lldb during follow-child.
@@ -118,8 +104,6 @@ def test_follow_child_vfork_call_exec(self):
         self.follow_child_helper(use_fork=False, call_exec=True)
 
     @skipUnlessPlatform(["linux"])
-    # See https://github.com/llvm/llvm-project/issues/85084.
-    @skipIf(oslist=["linux"], archs=["aarch64", "arm"])
     def test_follow_child_fork_call_exec(self):
         """
         Make sure that debugging concurrent fork() from multiple threads won't crash lldb during follow-child.

Copy link

github-actions bot commented Apr 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link

github-actions bot commented Apr 12, 2024

✅ With the latest revision this PR passed the Python code formatter.

Whenever an inferior thread stops, lldb-server sends a SIGSTOP to all
other threads in the process to force them to stop as well. If those
threads stop on their own before they get a signal, this SIGSTOP will
remain pending and be delivered the next time the process resumes.

Normally, this is not a problem, because lldb-server will detect this
stale SIGSTOP and resume the process. However, if we detach from the
process while it has these SIGSTOPs pending, they will get immediately
delivered, and the process will remain stopped (most likely forever).

This patch fixes that by sending a SIGCONT just before detaching from
the process. This signal cancels out any pending SIGSTOPs, and ensures
it is able to run after we detach. It does have one somewhat unfortunate
side-effect that in that the process's SIGCONT handler (if it has one)
will get executed spuriously (from the process's POV).

This could be _sometimes_ avoided by tracking which threads got send a
SIGSTOP, and whether those threads stopped due to it. From what I could
tell by observing its behavior, this is what gdb does. I have not tried
to replicate that behavior here because it adds a nontrivial amount of
complexity and the result is still uncertain -- we still need to send a
SIGCONT (and execute the handler) when any thread stops for some other
reason (and leaves our SIGSTOP hanging). Furthermore, since SIGSTOPs
don't stack, it's also possible that our SIGSTOP/SIGCONT combination
will cancel a genuine SIGSTOP being sent to the debugger application (by
someone else), and there is nothing we can do about that. For this
reason I think it's simplest and most predictible to just always send a
SIGCONT when detaching, but if it turns out this is breaking something,
we can consider implementing something more elaborate.

One alternative I did try is to use PTRACE_INTERRUPT to suspend the
threads instead of a SIGSTOP. PTRACE_INTERUPT requires using
PTRACE_SEIZE to attach to the process, which also made this solution
somewhat complicated, but the main problem with that approach is that
PTRACE_INTERRUPT is not considered to be a signal-delivery-stop, which
means it's not possible to resume it while injecting another signal to
the inferior (which some of our tests expect to be able to do). This
limitation could be worked around by forcing the thread into a signal
delivery stop whenever we need to do this, but this additional
complication is what made me think this approach is also not worthwhile.

This patch should fix (at least some of) the problems with
TestConcurrentVFork, but I've also added a dedicated test for checking
that a process keeps running after we detach. Although the problem I'm
fixing here is linux-specific, the core functinoality of not stopping
after a detach should function the same way everywhere.
@DavidSpickett
Copy link
Collaborator

Will fix #85084.

@DavidSpickett
Copy link
Collaborator

Thanks for looking into this. I had no idea what it might be but your explanation makes a lot of sense.

}

int main(int argc, char const *argv[]) {
lldb_enable_attach();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why this is needed?
Is it that some special OS may restrict the default attaching behavior for security reason unless lldb_enable_attach() is explicitly called?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it's not really a "special" OS. Most linux systems are configured like that by default, where you're only allowed to debug your own children (unless they explicitly allow that, or you're root, etc.). Check out https://www.kernel.org/doc/html/v4.15/admin-guide/LSM/Yama.html. lldb_enable_attach is our own invention, which calls the appropriate os interface to enable attaching if necessary. Currently, it's only implemented on linux.

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@labath labath merged commit 5f3e106 into llvm:main Apr 17, 2024
4 checks passed
@labath labath deleted the sigcont branch April 17, 2024 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LLDB TestConcurrentVFork.py is flaky on Linux
4 participants