-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8292695: SIGQUIT and jcmd attaching mechanism does not work with signal chaining library #9955
Conversation
…al chaining library
👋 Welcome back manc! A progress list of the required criteria for merging this PR into |
test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java
Outdated
Show resolved
Hide resolved
Webrevs
|
@caoman I'm interested in what your launcher does, or is, since I have almost never encountered signal chaining in the field. Would you mind sharing what product the launcher belongs to, and why you enable signal chaining? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I am not a reviewer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still trying to gauge the cause and effects here ...
In the meantime, should the test not be put under runtime/jsig ?
@tstuefe Our launcher is equipped with almost all our Java applications. It basically does what bin/java does, but with several additional features. One main motivation to have our own launcher, is to static link all user-defined JNI dependency libraries with the launcher, so the applications do not need to use Regarding to the signal chaining support from our launcher, we found some existing applications that |
The test tests SIGQUIT behavior with and without libjsig.so. So it does not necessarily depend on libjsig.so. I could move it if you prefer runtime/jsig. |
That is interesting. Do you link statically to improve startup speed, or for some other reason? |
It is not about startup speed, but mainly to ensure correct version and simplify deployment. It is very similar to the advantages of static library as stated on Wikipedia: https://en.wikipedia.org/wiki/Static_library. |
@dholmes-ora Let me know if there's any other issue, or if I could help explain the cause and effects better. |
@tstuefe are you able to look at this one? Thanks. |
@tstuefe Any feedback? I'd like to get this submitted soon to unblock our internal use cases relying on SIGQUIT and jcmd. |
Sorry, missed this one. I'll take a look later today. |
@caoman While looking at this, I think I found an unrelated bug in libjsig. https://bugs.openjdk.org/browse/JDK-8293466 Could you eyeball it and tell me if my assumption is correct? |
Thanks for filing the other bug. Responded in that bug. |
Looking at this, this gets more and more interesting. One thing, libjsig does nothing to prevent user code from changing thread/process sigmask. @caoman has that never been a problem at Google? |
Found another minor one: https://bugs.openjdk.org/browse/JDK-8293494. I'm looking at the change now, though. |
Hi @caoman, @dholmes-ora, @navyxliu, I wonder if it has to be that complicated. The problems (both https://bugs.openjdk.org/browse/JDK-8279124 and https://bugs.openjdk.org/browse/JDK-8292695) are caused by SIGQUIT being in a weird space - it is a VM signal, really, but gets handled by the signal dispatcher thread. The signal dispatcher thread is started long after the "real" hotspot handlers are installed. And after the dispatcher thread starts, only then we install the SIGQUIT handler. At that point the libjsig is already in "surveillance" mode where it intercepts stuff. But there is nothing that prevents us from moving the SIGQUIT handler installation up in time to the hotspot signal handler initialization. Nobody says the SIGQUIT handler has to be installed after the signal dispatcher thread started. Signal dispatcher thread and handler are only loosely coupled via To test my theory, I built: master...tstuefe:jdk:test-move-sigquit-init-up-in-time . This moves signal handler installation up in time. It also optionally adds a delay between hotspot signal handler installation and dispatcher thread startup. And this just works as expected: if we send SIGQUIT to the VM, we get thread dumps. Before dispatcher thread start, these SIGQUITs are "stored", and executed all once dispatcher thread starts (whether or not one wants that is a different question, but easily to modify). And SIGQUITs after dispatcher startup work fine too. As expected this works with or without libjsig, so it solves JDK-8292695. It also closes the initial time window and therefore does not regress JDK-8279124. And it is a lot simpler to argue about, especially with libjsig present. Now SIGQUIT is handled like any other ordinary hotspot signal. (and I wonder whether we could handle the shutdown signals the same way, but have not tested). What do you think? Am I overlooking something? |
We don't actually use libjsig directly. Our launcher reimplemented signal chaining support using the same JVM_begin_signal_setting/JVM_end_signal_setting/JVM_get_signal_action interface. Our launcher's implementation has calls to pthread_sigmask, but the goal seems to make the overridden sigaction() async-signal-safe. It doesn't prevent user code from changing thread/process sigmask either. However, I haven't heard of issues with user code changing sigmask. |
Thank you for the great idea and experiment! Let me try this approach and rerun the tests. |
Switched to the approach that installs the handler only once. The SIGQUIT handler has to be installed after jdk_misc_signal_init() has initialized |
You can just move Then it works as planned, with SIGQUIT now being part of the normal hotspot VM signal installation. |
I try it out. yeah, it works! One installation but it will remember early SIGBREAK. Why do you insist to move jdk_misc_signal_init() before install_signal_handlers()? Is there any difference before and after it? |
Moved
Because SIGBREAK's UserHandler() uses the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a test that checks that:
if VM is started with -Xrs, and we send a SIGQUIT, the VM shall crash with a core. The crash should look like a system crash, not an hs-err file crash, like this:
Quit (core dumped)
which assures us that no hotspot signal handler is installed and we take therefore the default action on SIGQUIT.
If this string is OS dependent and gives us headaches, the test should check at least that the process exit code is not 0.
And of course the VM should be started with tiny heap etc such to make sure the core file generated will be tiny.
test/hotspot/jtreg/runtime/Thread/TestBreakSignalThreadDump.java
Outdated
Show resolved
Hide resolved
I struggled to get this to work. I found out it is due to https://bugs.openjdk.org/browse/JDK-8234262, and ProcessBuilder changes how SIGQUIT is handled in the subprocess. This is further complicated by how jtreg runs the parent process. The workaround (passing -Xrs to the parent process) does not work through jtreg using "main/othervm -Xrs". I uploaded the not-working test in here: caoman@73e3a5f |
Thank you for trying, and finding this out! I'm fine with postponing this test then to another RFE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thank you for taking my suggestion!
..Thomas
@caoman This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 18 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the new revision still LGTM.
/integrate |
Going to push as commit 45ff10c.
Your commit was automatically rebased without conflicts. |
SG. Filed https://bugs.openjdk.org/browse/JDK-8293611. |
Messing around with initialization order always makes me very nervous. While this may have solved the problem at hand it is far from clear to me that it has not introduced unexpected behaviour in other situations - situations that our testing is unlikely to expose. |
Hmm, we put a lot of work and time into this solution. While I acknowledge this is a difficult area, so are many others in hotspot which we are not shy of changing. And we did not change the coding on a whim - the proposed solution IMHO is better than the assortment of piled-on hotfixes we had before and dissolves quite a bit of complexity. That makes the code easier to understand and maintain in the long run. @caoman also provided a regression test, so that's covered too. But if you have concrete misgivings about the proposed solution, let's discuss it. |
@tstuefe IIUC the earlier proposal from @caoman simply moved the setting of the BREAK_HANDLER to inside |
See my comment at #9955 (comment). The original solution by @caoman , while it would have worked, was a bandaid atop of a bandaid (see also https://bugs.openjdk.org/browse/JDK-8279124). It was perpetuating a crooked solution, and the crookedness stems from the fact that SIGQUIT is semantically a VM signal, owned by the VM, but initialized outside the libjsig-VM-signal-initialization window. The solution proposed by me makes SIGQUIT initialize like any other VM signal, therefore we can remove the SIGQUIT-related special handling and hopefully don't need further patches in this area. |
I don't really agree with the rationale, but the fix is correct, and the reordering seems harmless enough on further inspection. To me this is a JDK serviceability signal, both for dynamic attach and thread dumps, as it is the JDK serving "signal thread" that handles it. Most of the problems in this area have stemmed from initialization issues due to "too early" sending of SIGQUIT to initiate an attach attempt. |
Yes, this is main reason we installed BREAK_SIGNAL handler inside There was a concern that with libjsig present and without |
Hi all,
Could anyone review this bug fix? See https://bugs.openjdk.org/browse/JDK-8292695 for details.
I changed the temporary handler for SIGQUIT to use a dummy function, and use
os::signal()
to set it up, just asos::initialize_jdk_signal_support()
does.It is possible that just moving the
set_signal_handler(BREAK_SIGNAL, false);
ininstall_signal_handlers()
outside of the window bounded byJVM_{begin|end}_signal_setting()
could also fix this bug. However,set_signal_handler()
andJVM_HANDLE_XXX_SIGNAL()
are currently used for signals that support chaining and periodically check, which do not apply to SIGQUIT. I think it is cleaner to use different functions for SIGQUIT.I also added a test to check that sending SIGQUIT should produce a thread dump on stdout, with and without using libjsig.so.
-Man
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9955/head:pull/9955
$ git checkout pull/9955
Update a local copy of the PR:
$ git checkout pull/9955
$ git pull https://git.openjdk.org/jdk pull/9955/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9955
View PR using the GUI difftool:
$ git pr show -t 9955
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9955.diff