8279124: VM does not handle SIGQUIT during initialization #7003
… before sending SIGQUIT
dholmes-ora
left a comment
Hi Xin,
A couple of comments below. I'm still thinking about this one ... seems okay but I'm not certain ...
Thanks,
David
 * Return 1 if the SIGQUIT is set in SigCgt; 0 if it is not.
 * Return -1 when it runs into any error.
 */
static int check_sigquit_caught(jint pid) {
Can't this just be a bool function? Even if it stays an int, we don't need a ternary return value, as only zero is of interest.
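For readers unfamiliar with the check under review, here is a minimal sketch of the idea, not the code from the PR. It assumes the Linux `/proc/<pid>/status` layout, where `SigCgt:` is followed by a hexadecimal mask in which bit (n - 1) stands for signal n. The helper name `sigquit_caught_in_status` is hypothetical, and it takes the file contents as a string so it can be exercised without a live process. It keeps the three-way return value described in the doc comment above.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <signal.h>

// Hypothetical helper (not from the PR): scans the text of /proc/<pid>/status
// for the "SigCgt:" line and reports whether the SIGQUIT bit is set.
// Returns 1 if SIGQUIT is caught, 0 if it is not, -1 if the line is missing
// or does not contain a hex mask.
int sigquit_caught_in_status(const std::string& status_text) {
  const std::string key = "SigCgt:";
  std::istringstream in(status_text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, key.size(), key) != 0) {
      continue;  // not the line we are looking for
    }
    const char* start = line.c_str() + key.size();
    char* end = nullptr;
    // strtoull skips the tab that separates the key from the mask.
    unsigned long long mask = std::strtoull(start, &end, 16);
    if (end == start) {
      return -1;  // no hex digits after the key
    }
    // Bit (n - 1) of the mask corresponds to signal number n.
    return ((mask >> (SIGQUIT - 1)) & 1ULL) ? 1 : 0;
  }
  return -1;  // no SigCgt line found
}
```

If only "caught or not" matters to the caller, as the review comment suggests, the same logic could return a bool and treat the error case as "not caught".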
|
Hi @navyxliu, nice catch. I can see how this can be annoying. I propose a simpler and more robust way to fix it though.
We (1) set up general hotspot signal handling very early, then (2) proceed to initialize the heap, which you have shown can take some time, then (3) set up SIGQUIT handling. We core if we get a quit signal before (3).
I would add SIGQUIT handling to the general signal handler too, just to cover the time frame between (1) and (3). At (3), it would be overwritten, but that would be fine. Before (3), we could just ignore SIGQUIT, or print out some generic information (I assume thread dumps are not yet possible at this stage). Since the documented behavior for the JVM is to thread dump on SIGQUIT unless we run with -Xrs, I think this is acceptable behavior. Not printing a thread dump, or only printing partial information, is preferable to quitting with a core.
Then, this would work for any client that sends SIGQUIT to a JVM, not just those using the attach framework. And it would work on all Posix platforms, not just Linux. And we would not have to rely on parsing the proc fs.
Also note that a solution implemented in the client, as you did, has the disadvantage that I need a modern jcmd for it to work. However, I often just use whatever jcmd is in the path. Better to handle this in the receiving hotspot.
I sketched out a simple patch to test if what I propose can work: It still misses a number of things (I did not check signal mask setup, and ReduceSignalUsage must be handled too), but it shows the general direction and worked as expected.
Cheers, Thomas
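The (1) to (3) sequence can be illustrated with a small stand-alone sketch. This is not HotSpot code: `early_init`, `late_init`, the two handlers and the counters are hypothetical names, and the "thread dump" is reduced to incrementing a counter. It only shows that a placeholder handler installed early keeps SIGQUIT from terminating the process, and that a later `sigaction` call simply replaces it, unless a flag modeled on ReduceSignalUsage says to leave SIGQUIT alone.

```cpp
#include <signal.h>

// Hypothetical counters, used only to observe which handler ran.
static volatile sig_atomic_t ignored_early = 0;
static volatile sig_atomic_t dump_requests = 0;

// Placeholder for the window between (1) and (3): swallow the signal.
void early_handler(int) { ignored_early = ignored_early + 1; }

// Stand-in for the real SIGQUIT handling set up at (3).
void dump_handler(int) { dump_requests = dump_requests + 1; }

static bool set_sigquit_handler(void (*handler)(int)) {
  struct sigaction sa = {};
  sigemptyset(&sa.sa_mask);
  sa.sa_handler = handler;
  sa.sa_flags = SA_RESTART;
  return sigaction(SIGQUIT, &sa, nullptr) == 0;
}

// Step (1): install the placeholder very early.
bool early_init() { return set_sigquit_handler(early_handler); }

// Step (3): overwrite the placeholder, unless signal usage is reduced
// (assumption: modeled on -Xrs, where SIGQUIT is left untouched here).
bool late_init(bool reduce_signal_usage) {
  if (reduce_signal_usage) {
    return true;
  }
  return set_sigquit_handler(dump_handler);
}
```

A caller would run `early_init()`, do its slow initialization, then call `late_init(false)`. A `raise(SIGQUIT)` before the second call only bumps `ignored_early`; afterwards it bumps `dump_requests`.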
Great, this is the kind of thing I was heading towards with the conversation in the bug text. I am still not sure why I could not reproduce the problem with various different JDK versions.
Ah, I missed your conversation. I reproduced this by adding a delay during initialization and sending sigquit manually. The bug is not restricted to jcmd, sigquit handling is broken during initialization. Folks tend to send sigquit to unresponsive VMs to get thread dumps, so coring is unfortunate (another reason not to fix it in jcmd itself). Cheers, Thomas |
|
Apologies @kevinjwalls as I also missed the discussion in the JBS issue.
Ideally we would know if the target VM is ready before we send the SIGQUIT to attach, e.g. by writing a well-known file that the attach mechanism looks for before trying to attach. That seems feasible but perhaps costly and not always possible(?) ... What has been presented here is a side-channel way of knowing. In that respect I like this. It is a pity it is Linux only.
The alternative suggestion of just making the window during which SIGQUIT terminates the VM process small enough to be un-hittable also has merit. I don't think we have to try and accommodate extreme code that just looks up a process in the process table and throws a signal at it.
Ignoring SIGQUIT during the early VM startup seems a reasonable solution (we can't produce a thread dump at that time anyway). It seems to me that we can simply install UserHandler for SIGQUIT very early in the VM initialization process and it will be a no-op until the JDK signal handling is properly initialized (just need to fix one assert).
Cheers,
|
Hi @tstuefe @dholmes-ora, one behavior change in 2f25753 is that we can't core the JVM process with -Xrs (ReduceSignalUsage) anymore, because SIGQUIT will be intercepted by JVM_HANDLE_XXX_SIGNAL. Thomas said this is fine.
tstuefe
left a comment
Hi Xin,
thanks for taking my suggestion. Remarks inline.
Cheers, Thomas
|
p.s. may make sense to reformulate the JBS issue. "VM does not handle SIGQUIT during initialization", and extending the error description to make clear this affects anyone sending SIGQUIT, not only jcmd. |
Yes, I learned this trick recently. If the JVM looks suspiciously frozen, just input |
false(default value). This patch also adds a log message with 'os+init=info'.
|
Hi -- thanks for updating the bug title and text. Yes, it's much better to start with a concise problem description.
I'm in favour of the signal handler change. I'm not personally concerned about printing; silently handling SIGQUIT seems fine for a VM at this stage, and perhaps printing just adds risk.
I'm still curious that I don't reproduce the problem by making heap initialization slow with options like -Xms100g -XX:+AlwaysPreTouch, as you could. Startup can be so slow that I can attach gdb and see it's in: Threads::create_vm / init_globals / universe_init / G1CollectedHeap::initialize / ...etc... ...but jcmd or kill -QUIT don't hurt my JVM. 8-)
That process' /proc/PID/status contains: SigIgn: 0000000000000006 ...so that, I think, has signals 2 and 3 ignored? (Ubuntu)
Elsewhere I used Oracle Linux under Windows Subsystem for Linux, and the SigXXX fields in /proc/PID/status are all zeroes; not sure if they are meaningful there. Possibly another reason to handle this with the signal handler change.
On a real Oracle Linux install I do see: SigIgn: 0000000000000006 at startup become: SigIgn: 0000000000000006 ..after some seconds. But I still can't trigger the issue; there are some signals ignored there also.
So I like the change, but would like to be clearer about where the problem exists: on what platforms can we see no signals ignored or caught at startup, and trigger the problem of crashing the VM with SIGQUIT?
I reproduced it with my artificial delay on Ubuntu 20.04. Cannot reproduce it with AlwaysPreTouch since my machine is too fast. |
|
Hi @kevinjwalls, according to signal(7), the default disposition of SIGQUIT (3) is "Core" and that of SIGINT (2) is "Term". "SigIgn: 0000000000000006" indicates that both were already ignored ("Ign") in the first place. Besides the launcher and hotspot, is it possible that systemd, bash, or the kernel changed them? The doc also describes the following. Is it possible that you just fork but do not exec, so the process inherits the signal dispositions from your bash?
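To make the mask reading above concrete, here is a small sketch that expands such a hex mask into signal numbers. The helper name `signals_in_mask` is hypothetical. It assumes the Linux `/proc/<pid>/status` convention that bit (n - 1) represents signal n, and it lets `std::stoull` throw if the input is not hexadecimal.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: expands the hex mask printed after "SigIgn:" or
// "SigCgt:" in /proc/<pid>/status into the list of signal numbers it covers.
// Throws std::invalid_argument (from std::stoull) on non-hex input.
std::vector<int> signals_in_mask(const std::string& hex_mask) {
  const std::uint64_t mask = std::stoull(hex_mask, nullptr, 16);
  std::vector<int> signals;
  for (int sig = 1; sig <= 64; ++sig) {
    // Bit (sig - 1) set means signal number `sig` is in the mask.
    if ((mask >> (sig - 1)) & 1ULL) {
      signals.push_back(sig);
    }
  }
  return signals;
}
```

For the value quoted in this thread, `signals_in_mask("0000000000000006")` yields 2 and 3, which on Linux are SIGINT and SIGQUIT.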
--lx |
tstuefe
left a comment
Looks good now. Thanks!
Cheers, Thomas
|
@navyxliu This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 67 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
if (!signal_was_handled && sig == BREAK_SIGNAL) {
  assert(!ReduceSignalUsage, "Should not happen with -Xrs/-XX:+ReduceSignalUsage");
  signal_was_handled = true;
}
Can't we just do this as the very first thing at line 564:
// If we get BREAK_SIGNAL it means we are very early in VM initialization and
// only temporarily "handling" it to prevent the VM process getting terminated.
if (sig == BREAK_SIGNAL) {
  assert(!ReduceSignalUsage, "Should not happen with -Xrs/-XX:+ReduceSignalUsage");
  return true;
}
hi, David and Thomas,
I am nervous about this change. It's not complex but I don't want to break the existing java applications. Bear with me.
From my reading, HotSpot can chain a user-custom signal handler because of libjsig.so. It's not like a linked-list chain. If the user installs a handler for a signal, libjsig just saves it. JVM_HANDLE_XXX_SIGNAL is invoked first, and then the user-custom handler is called here if the signal isn't handled.
The reason we can hoist this logic to line 564 is that we assume no user application would define a handler for BREAK_SIGNAL. It doesn't work right now because os::initialize_jdk_signal_support() will overwrite the signal handler of BREAK_SIGNAL later.
Hi Xin,
Signal chaining doesn't work for BREAK_SIGNAL - from the signal chaining docs:
Note:
The SIGQUIT, SIGTERM, SIGINT, and SIGHUP signals cannot be chained. If the application must
handle these signals, then consider using the -Xrs option.
Exactly. If an embedding application wants to handle SIGQUIT, it needs to set -Xrs. In that case, it does not matter whether it installed the signal handler before or after the VM was initialized. We just won't touch SIGQUIT at all in that case.
Wrt the position of handling the break signal: most of the code between lines 564 and 588 does not hurt; PosixSignals::unblock_error_signals(); may actually be beneficial, though it is highly unlikely that we get problems with secondary crashes when handling SIGQUIT. So, line 588 would be an okay position for SIGQUIT handling too.
I just didn't see the point in doing that extra stuff or the signal_is_handled when we just need to bail out immediately.
It's because SIGQUIT will be overwritten later. This patch fixed the regression in runtime/jni/checked/TestCheckedJniExceptionCheck.java. The test uses -Xcheck:jni, and the warning message from JniPeriodicCheckerTask may mess up the expected outputs.
|
Good catch about Xcheck:jni. This looks good to me. Ship it. |
}

- void set_signal_handler(int sig) {
+ void set_signal_handler(int sig, bool do_check = true) {
Yes good catch on this part!
|
/integrate |
|
Going to push as commit 9bf6ffa.
Your commit was automatically rebased without conflicts. |
In the early stage of initialization, HotSpot doesn't handle SIGQUIT. The default signal disposition on Linux is to terminate the process and generate a core dump.
There are 2 applications for this signal.
It is possible that HotSpot is still initializing in Threads::create_vm() when SIGQUIT arrives. We should change JVM_HANDLE_XXX_SIGNAL to catch SIGQUIT and ignore it. It is installed in os::init_2() and should cover the early stage of initialization. Later on, os::initialize_jdk_signal_support() still overwrites the signal handler of SIGQUIT if ReduceSignalUsage is false (the default).
Testing
Before this patch, if initialization takes a long time, jcmd may kill the java process.
With this patch, neither jcmd nor kill -3 will disrupt java process 45850.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7003/head:pull/7003
$ git checkout pull/7003
Update a local copy of the PR:
$ git checkout pull/7003
$ git pull https://git.openjdk.java.net/jdk pull/7003/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7003
View PR using the GUI difftool:
$ git pr show -t 7003
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7003.diff