-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JDK-8288556: VM crashes if it gets sent SIGUSR2 from outside #9181
Conversation
👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into |
Webrevs
|
os::print_siginfo(&ss, siginfo); | ||
ss.print_raw(")."); | ||
assert(thread != NULL, "%s.", ss.base()); | ||
tty->print_cr("%s", ss.base()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely this should be a regular VM warning not a raw write to tty - neither of which are signal-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I changed this to log_warning(os).
As per the discussion this is a much broader problem as it could apply to a range of signals and can be wrong even if the thread is not attached. Even if you want to restrict this to SR_SIGNUM the current proposal only handles one case. |
Hi David,
I disagree about this being a large issue. Quoting https://mail.openjdk.org/pipermail/core-libs-dev/2022-June/091577.html I'd say we limit this to The (1) set is rather small and contains:
All of these signals are already handled correctly. Note that with several of them (PIPE, XFSZ) we already established the pattern of ignoring signals instead of vanishing the VM. The set (2) is atm unknown to me. Are there any more? SIGCHILD is ignored by default. SIGUSR1 exists and exits the VM; this may be another case, but atm we don't handle it and I would not add a handler to it since user apps may use this signal. Any others? The way I see it, my patch would introduce the same handling for SIGUSR2 we already have established for SIGPIPE, SIGXFSZ, and arguably for SIGINT. Cheers, Thomas |
Hi Thomas, Okay I see SIGUSR2 (or more generally SR_SIGNUM) is special in that regard: it is an internal-use-only non-terminating signal. But any external sending of SIGUSR2 is invalid regardless of whether an attached or not-attached thread handles it. |
So, should I fail if si_pid!=getpid? As I wrote, I was a bit worried that some OSes may not deliver the correct pid in si_pid - either deliver the kernel thread id or leave it empty. I have dim recollections of such errors on AIX or HPUX. So far, we use si_pid only for displaying purposes and don't really rely on it being correct. Another issue, I tried the patch with redefining SR_SIGNUM and found that I could not use SIGUSR1 on Linux because its numerical value (10) is below SIGSEGV(11) on my box and we have this code: jdk/src/hotspot/os/posix/signals_posix.cpp Lines 1684 to 1689 in 0a5f577
Do you think this is fix-worthy? SIGUSR1 seems an obvious choice for an alternate SR signal, but OTOH nobody complained in 20+ years. |
I read up on this in the very good analysis at: https://bugs.openjdk.org/browse/JDK-4355769?focusedCommentId=12425929&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-12425929 . This was a wild read :) I continue to be impressed by the quality of bug reports in JBS. Seems more complicated than I thought. I am not sure how much of this is till valid today. This was 21 years ago. I'm surprised by the described behaviour though, that SIGUSR2 can interrupt delivery of error signals; I would have naively thought that signal delivery is on a strict first come first deliver base. |
As the saying goes "it's complicated". Whether Linux signal delivery has the same properties today I have no idea. I'm somewhat bemused that SIGUSR1 and SIGUSR2 are not adjacent signals - weird design to say the least. I also don't understand how you can possibly get two signals pending like that at the same time when one is synchronous and the other asynchronous. 4355769 is an interesting read but I'm not sure I can really agree with the analysis (and note that one comment seems to contradict an earlier one, so exactly what happened is unclear). So yeah setting an alternative SR_SIGNUM is problematic. On the si_pid part ... I have no prior knowledge of this (didn't even know it existed), so have no idea whether it is reliable or not. Seems to me we have far greater risk of breaking something unexpectedly with changing this code than we potentially benefit from making the change. So I'd vote for doing nothing. |
"Nothing" as in I should withdraw this patch? Surely not? The behavioral difference my patch brings would be: As I wrote, I can compromise the second part to: |
Given you have done the work I can review the patch - we just need to resolve the tty vs. VM warning issue. But I don't see a need to make any actual changes here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Thomas!
@tstuefe This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 53 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me.
PRODUCT code should not abort but continue running if safely possible. In the case considered here, nothing is wrong in the VM. It just receives a signal it doesn't know what to do about. Therefore, "ignore and continue" is the right strategy.
Thanks a lot, @dholmes-ora and @RealLucy ! /integrate |
Going to push as commit 701ea3b.
Your commit was automatically rebased without conflicts. |
The VM uses SIGUSR2 (can be overridden via _JAVA_SR_SIGNUM) to implement suspend/resume on java threads. It sends, via pthread_kill, SIGUSR2 to targeted threads to interrupt them. It knows the target thread, and the target thread is always a VM-attached thread.
However, if SIGUSR2 gets sent from outside, any thread may receive the signal, and if the target thread is not attached to the VM (e.g. primordial), it is unable to handle it. The result is an assert (debug VM) or a crash (release VM). On my box, this can be reliably reproduced by sending SIGUSR2 to any VM.
This has been discussed here: https://mail.openjdk.org/pipermail/core-libs-dev/2022-June/091450.html
The proposed solutions range from "works as designed" (on the ground that sending arbitrary signals to the JVM is an error in itself, and we should rather crash hard and fast) to "lets catch and ignore the signal".
In this patch I opt for:
Notes:
Thanks, Thomas
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9181/head:pull/9181
$ git checkout pull/9181
Update a local copy of the PR:
$ git checkout pull/9181
$ git pull https://git.openjdk.org/jdk pull/9181/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9181
View PR using the GUI difftool:
$ git pr show -t 9181
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9181.diff