[llvm-exegesis] Kill processes that receive a signal #86069
Before this patch, llvm-exegesis would leave behind processes that experienced signals such as segmentation faults. They would end up in a signal-delivery-stop state under ptrace and never exit. This does not cause many problems in llvm-exegesis itself, as these processes are cleaned up after the main process exits, which usually happens quickly. However, in downstream use, when many blocks are executed within a single process (many of which run into signals), these processes stay around and can easily exhaust the process limit on some systems. This patch cleans them up by sending SIGKILL after the information about the delivered signal has been gathered.
Fixes google/gematria#76.
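The cleanup described above can be illustrated with a small POSIX sketch. To stay runnable without ptrace permissions, it uses a job-control stop (`raise(SIGSTOP)` observed through `waitpid(..., WUNTRACED)`) as a stand-in for the signal-delivery-stop a traced child enters; the function name `killAndReapStoppedChild` is hypothetical. The behavior it shows is the same in spirit: a stopped child lingers until the parent sends SIGKILL, which terminates even a stopped process, and then reaps it with `waitpid` so no zombie remains.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that immediately stops itself, then
// kills and reaps it. Returns true if the child was observed stopped,
// terminated by SIGKILL, and reaped (so it leaves no zombie behind).
bool killAndReapStoppedChild() {
  pid_t Pid = fork();
  if (Pid == -1)
    return false;
  if (Pid == 0) {
    // Child: stop ourselves. Nothing in the child ever resumes it, so it
    // would keep occupying a process slot until the parent intervenes.
    raise(SIGSTOP);
    _exit(0); // Only reached if the child is continued with SIGCONT.
  }

  // Parent: wait until the child reports the stop. A traced child in
  // signal-delivery-stop is reported similarly (without needing WUNTRACED).
  int Status = 0;
  if (waitpid(Pid, &Status, WUNTRACED) == -1 || !WIFSTOPPED(Status)) {
    kill(Pid, SIGKILL);
    waitpid(Pid, nullptr, 0);
    return false;
  }

  // SIGKILL cannot be caught or ignored and terminates the child even
  // though it is currently stopped.
  if (kill(Pid, SIGKILL) == -1)
    return false;

  // Reap the child so its process-table entry is released.
  if (waitpid(Pid, &Status, 0) == -1)
    return false;
  return WIFSIGNALED(Status) && WTERMSIG(Status) == SIGKILL;
}
```

Under ptrace, the observation step would instead be a plain `waitpid` reporting `WIFSTOPPED` with the offending signal, after which the signal details (`ChildSignalInfo` in the patch) are collected before the kill.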
@llvm/pr-subscribers-tools-llvm-exegesis

Author: Aiden Grossman (boomanaiden154)

Full diff: https://github.com/llvm/llvm-project/pull/86069.diff

1 file affected:
```diff
diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
index 5c9848f3c68885..f0452605eb24bf 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
@@ -342,7 +342,7 @@ class SubProcessFunctionExecutorImpl
return make_error<Failure>("Failed to attach to the child process: " +
Twine(strerror(errno)));
- if (wait(NULL) == -1) {
+ if (waitpid(ParentOrChildPID, NULL, 0) == -1) {
return make_error<Failure>(
"Failed to wait for child process to stop after attaching: " +
Twine(strerror(errno)));
@@ -361,7 +361,7 @@ class SubProcessFunctionExecutorImpl
return SendError;
int ChildStatus;
- if (wait(&ChildStatus) == -1) {
+ if (waitpid(ParentOrChildPID, &ChildStatus, 0) == -1) {
return make_error<Failure>(
"Waiting for the child process to complete failed: " +
Twine(strerror(errno)));
@@ -401,6 +401,20 @@ class SubProcessFunctionExecutorImpl
Twine(strerror(errno)));
}
+ // Send SIGKILL rather than SIGTERM as the child process has no SIGTERM
+ // handlers to run, and sending SIGTERM would mean that ptrace will force
+ // it to block in the signal-delivery-stop for the SIGSEGV/other signals,
+ // and upon exit.
+ if (kill(ParentOrChildPID, SIGKILL) == -1)
+ return make_error<Failure>("Failed to kill child benchmarking process: " +
+ Twine(strerror(errno)));
+
+ // Wait for the process to exit so that there are no zombie processes left
+ // around.
+ if (waitpid(ParentOrChildPID, NULL, 0) == -1)
+ return make_error<Failure>("Failed to wait for process to die: " +
+ Twine(strerror(errno)));
+
if (ChildSignalInfo.si_signo == SIGSEGV)
return make_error<SnippetSegmentationFault>(
reinterpret_cast<intptr_t>(ChildSignalInfo.si_addr));
```
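The diff also replaces `wait()` with `waitpid(ParentOrChildPID, ...)`. The PR does not spell out the motivation; the behavioral difference is that `wait()` reaps whichever child changes state first, while `waitpid(pid, ...)` only returns for that specific child. The sketch below, with hypothetical helpers `spawnExiting`, `waitForSpecificChild`, and `waitpidTargetsOneChild`, shows a parent with two exited children collecting exactly the one it asked for, then reaping the other so no zombie remains.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that exits immediately with ExitCode.
// Returns the child's PID in the parent, or -1 if fork() failed.
pid_t spawnExiting(int ExitCode) {
  pid_t Pid = fork();
  if (Pid == 0)
    _exit(ExitCode);
  return Pid;
}

// Waits for exactly the child identified by Target, leaving any other
// children untouched. Returns its exit code, or -1 on error.
int waitForSpecificChild(pid_t Target) {
  int Status = 0;
  if (waitpid(Target, &Status, 0) != Target || !WIFEXITED(Status))
    return -1;
  return WEXITSTATUS(Status);
}

// An unrelated child is spawned first and may well exit first; wait() could
// return it, but waitpid(Target, ...) only ever reports Target.
bool waitpidTargetsOneChild() {
  pid_t Other = spawnExiting(7);
  if (Other == -1)
    return false;
  pid_t Target = spawnExiting(3);
  bool Ok = Target != -1 && waitForSpecificChild(Target) == 3;
  // Reap the unrelated child explicitly so it does not linger as a zombie.
  Ok = waitForSpecificChild(Other) == 7 && Ok;
  return Ok;
}
```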
```diff
@@ -342,7 +342,7 @@ class SubProcessFunctionExecutorImpl
return make_error<Failure>("Failed to attach to the child process: " +
Twine(strerror(errno)));

- if (wait(NULL) == -1) {
+ if (waitpid(ParentOrChildPID, NULL, 0) == -1) {
```
At that point we are always in the parent, so ParentOrChildPID holds the child's PID, but that is not immediately obvious (I had to scroll up). What about restructuring this function along these lines:
```cpp
[[noreturn]] void runChildSubprocess(int ReadFD, int WriteFD) {
  // We are in the child process, close the write end of the pipe.
  close(WriteFD);
  // Unregister handlers, signal handling is now handled through ptrace in
  // the host process.
  sys::unregisterHandlers();
  prepareAndRunBenchmark(ReadFD, Key);
  llvm_unreachable("Child process didn't exit when expected.");
}

Error runParentSubprocess(pid_t PID, int ReadFD, int WriteFD) {
  const ExegesisTarget &ET = State.getExegesisTarget();
  ...
}

Error createSubProcessAndRunBenchmark() {
  ...
  if (ParentOrChildPID == -1) {
    ...
  }
  if (ParentOrChildPID == 0) {
    runChildSubprocess(PipeFiles[0], PipeFiles[1]);
    llvm_unreachable("Child process didn't exit when expected.");
  }
  return runParentSubprocess(ParentOrChildPID, PipeFiles[0], PipeFiles[1]);
}
```
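The reviewer's proposed split can be tried out in isolation. The following is a minimal sketch, not the follow-up patch: the names `runChild`, `runParent`, and `createSubProcessAndRun` are hypothetical, a one-byte pipe message stands in for whatever the real pipe carries, and an `int` result replaces `llvm::Error`. It shows the property the reviewer is after: once each `fork()` branch immediately calls its own helper, the parent helper receives the child's PID as an explicitly named parameter, so there is no doubt which process `waitpid` refers to.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical child half: never returns to the caller.
[[noreturn]] void runChild(int ReadFD, int WriteFD) {
  close(WriteFD); // The child only reads from the pipe.
  char Byte = 0;
  ssize_t N = read(ReadFD, &Byte, 1);
  close(ReadFD);
  // Exit with the byte received so the parent can check it arrived.
  _exit(N == 1 ? static_cast<unsigned char>(Byte) : 255);
}

// Hypothetical parent half: PID is unambiguously the child's PID here.
int runParent(pid_t PID, int ReadFD, int WriteFD) {
  close(ReadFD); // The parent only writes to the pipe.
  char Byte = 42;
  ssize_t N = write(WriteFD, &Byte, 1);
  close(WriteFD);
  int Status = 0;
  if (waitpid(PID, &Status, 0) == -1 || !WIFEXITED(Status) || N != 1)
    return -1;
  return WEXITSTATUS(Status);
}

// Dispatcher: after fork() each branch immediately calls a dedicated helper.
int createSubProcessAndRun() {
  int PipeFiles[2];
  if (pipe(PipeFiles) == -1)
    return -1;
  pid_t ParentOrChildPID = fork();
  if (ParentOrChildPID == -1) {
    close(PipeFiles[0]);
    close(PipeFiles[1]);
    return -1;
  }
  if (ParentOrChildPID == 0)
    runChild(PipeFiles[0], PipeFiles[1]);
  return runParent(ParentOrChildPID, PipeFiles[0], PipeFiles[1]);
}
```

Here `createSubProcessAndRun()` returns the child's exit status, which is 42 when the byte arrives intact, and -1 if creating the pipe, forking, or waiting fails.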
That's a good point. The suggested refactoring would definitely make the code cleaner. I'm going to land this PR and then open up another PR with the suggested refactoring to try and keep the history cleanish. Thanks for the suggestion!
See #86232.