Skip to content

8283337: Posix signal handler modification warning triggering incorrectly #7858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

kevinjwalls
Copy link
Contributor

@kevinjwalls kevinjwalls commented Mar 17, 2022

PosixSignals::print_signal_handler() is incorrectly detecting a changed signal handler. The signal handler for SIGBREAK/SIGQUIT is now set in src/hotspot/os/posix/signals_posix.cpp with the bool parameter do_check set to false.

set_signal_handler should only store a handler in vm_handlers when do_check is true.

However I don't see a simple way of getting a valid warning for the SIGQUIT handler, if it is added later when os::initialize_jdk_signal_support() calls os::signal().

If only signals added directly by src/hotspot/os/posix/signals_posix.cpp have the warning for a handler changing, then we never had a warning for SIGQUIT, so just this simple change is needed to remove the bogus warning.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8283337: Posix signal handler modification warning triggering incorrectly

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7858/head:pull/7858
$ git checkout pull/7858

Update a local copy of the PR:
$ git checkout pull/7858
$ git pull https://git.openjdk.java.net/jdk pull/7858/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7858

View PR using the GUI difftool:
$ git pr show -t 7858

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7858.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 17, 2022

👋 Welcome back kevinw! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 17, 2022
@openjdk
Copy link

openjdk bot commented Mar 17, 2022

@kevinjwalls The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Mar 17, 2022
@mlbridge
Copy link

mlbridge bot commented Mar 17, 2022

Webrevs

@dholmes-ora
Copy link
Member

Hi Kevin,

Given:

static void check_signal_handler(int sig) {
  char buf[O_BUFLEN];
  bool mismatch = false;

  if (!do_check_signal_periodically[sig]) {
    return;
  }

I don't see how we can get the bogus warning ??

@dholmes-ora
Copy link
Member

Ah sorry - wrong code. The warning comes from print_signal_handlers.

@dholmes-ora
Copy link
Member

Okay ... so the basic problem is two similar pieces of code use different guards for checking for a changed handler. The real checking code does check do_check_signal_periodically[sig], while the printing code just checks whether vm_handlers.get(sig) == NULL. So one way to fix is to ensure the handler is NULL when not checking - which is the current fix. Alternatively, change print_signal_handlers so that it instead checks do_check_signal_periodically[sig] (and just asserts the handler is not NULL).

Also I note that if check_signal_handler finds a mismatch it will call print_signal_handlers which will issue the mismatch warning a second time. That can be fixed if we swap these two lines:

838 os::print_signal_handlers(tty, buf, O_BUFLEN);
839 do_check_signal_periodically[sig] = false;

and also check do_check_signal_periodically[sig] in print_signal_handlers.

@tstuefe
Copy link
Member

tstuefe commented Mar 18, 2022

Trying to recall the details of Xin's JDK-8279124 to wrap my head around this one...

The original problem was that we install the SIGQUIT handler when initializing the JDK as part of os::initialize_jdk_signal_support(). In contrast to other signals, that happens at the end of VM initialization. VM initialization can take time (eg large heap and +AlwaysPreTouch). So it leaves a window where a SIGQUIT would run into the default handler, which tears down the VM process. Unfortunately tools like jcmd use SIGQUIT as signaling mechanism, so using jcmd on a seemingly unresponsive VM - that just takes its time initializing - kills the VM.

The solution was to install a first stop-gap signal handler for SIGQUIT at the very start of VM initialization, together with all other signals. That one just ignored SIGQUIT. Then we replace later that handler with the real one.

The problem then was that signal change recognition was established for the first handler, and triggers warnings after the first handler is replaced by the second one. But both handlers are legit, so the warning was bogus. Xin solved this by disabling handler change recognition for SIGQUIT. He did not include print_signal_handler() though, so we see that warning in hs-err files and in jcmd VM.info always.

Have I gotten this right, or have I forgotten something?

--

For a workaround the proposed change is fine. It gets rid of the always present warning in hs-err files, at the cost of hiding it also when it would be useful. But that is okay, a warning that is always present gets quickly ignored.

One code improvement would be to remove do_check_signal_periodically altogether, since it is now redundant with vm_handlers.get() == NULL. So, leaving NULL or resetting to NULL in SavedSignalHandlers could mean "dont test".

A better solution may be to save the second SIGQUIT handler too and reestablish the checks for SIGQUIT. Since we want to know if a third-party app changes SIGQUIT, and Xcheck:jni to tell us if that happens. A possible scenario, Customer complains about "vanishing VMs", but some third party lib just reset SIGQUIT and VM dies as soon as a Thread dump is triggered. Running with -Xcheck:jni would be the first thing I try, but right now it would not help.

Cheers, Thomas

@kevinjwalls
Copy link
Contributor Author

Thanks both --

Yes Thomas your memory is fine I think the summary is good. 8-)
The VMs that could be accidentally killed by attaching were those that were discovered by scanning for PIDs, and that were slow to startup. I think we'll all agree it was good to make things more robust.

Not sure about removing do_check_signal_periodically ? It still checks the others handlers, even if vm_handlers.get(SIGBREAK) returns null.

Ideally yes we would save the second SIGQUIT handler - I think the problem there is that it is set later by os::signal() which doesn't have a way to distinguish itself from any other native code setting a handler. It is the kind of signal action setting code which this saved handler mechanism is trying to notice (and it does notice it!)

I think that os::signal() call to set the BREAK/QUIT handler was never being saved in vm_handlers, so we haven't actually been checking for changed handlers there.

David's comment about reversing these lines
(inside check_signal_handler(int sig) at the point where we have detected a change):

838 os::print_signal_handlers(tty, buf, O_BUFLEN);
839 do_check_signal_periodically[sig] = false;

..and checking do_check_signal_periodically[sig] in print_signal_handlers:

Yes, I almost added checking do_check_signal_periodically[sig] in print_signal_handlers.

There is some duplication of warnings, but they are different: if periodic check finds a mismatch, it warns (shows which handler has changed):

Warning: SIGQUIT handler modified!

...then calls print_handlers which warns more specifically (what the handler is, and what we expected):

*** Handler was modified!
*** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO

Actually, the periodic check can get really verbose: it prints all handlers, for each individual handler it sees changed.

os::run_periodic_checks()
calls
889 check_signal_handler(SIGSEGV);
890 check_signal_handler(SIGILL);
..etc...

...each of which might make this call:
838 os::print_signal_handlers(tty, buf, O_BUFLEN);

...so thankfully we don't see this very often. I get it now that line 838 should probably be printing ONE handler:
os::print_signal_handler(tty, sig, buf, O_BUFLEN);

This is a removal of a regression so don't want to feature-creep too much, but I'll try that change.

@tstuefe
Copy link
Member

tstuefe commented Mar 18, 2022

Not sure about removing do_check_signal_periodically ? It still checks the others handlers, even if vm_handlers.get(SIGBREAK) returns null.

I was under assumption that do_check_signal_periodically[] = true and vm_handlers.get() != NULL mean the same. But I see now there are subtle differences: when Xcheck:jni encounters a mismatch, it prints out a warning, and then disables the check by setting do_check_signal_periodically[] = false. But it leaves vm_handlers intentionally unchanged, so in the hs-err file we still get noticed about the changed handler. That way we have a warning on stdout, and one in the hs-err file.

So please ignore my proposal about removing do_check_signal_periodically.

Ideally yes we would save the second SIGQUIT handler - I think the problem there is that it is set later by os::signal() which doesn't have a way to distinguish itself from any other native code setting a handler. It is the kind of signal action setting code which this saved handler mechanism is trying to notice (and it does notice it!)

I wondered about that too. The question is whether its valid to repurpose hotspot signals when done via the official java API. And while I think about this, probably not. You are right, a blanket "ok" for os::signal() would be wrong.

We could just try and handle SIGQUIT differently. But everything I can come up quickly is hacky. E.g. treat the first modification of SIGQUIT via java special. But oh god thats ugly.

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is reasonable.

@openjdk
Copy link

openjdk bot commented Mar 18, 2022

@kevinjwalls This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8283337: Posix signal handler modification warning triggering incorrectly

Reviewed-by: stuefe, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 41 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 18, 2022
@kevinjwalls
Copy link
Contributor Author

Yes, it looks OK if check_signal_handler calls:

PosixSignals::print_signal_handler(tty, sig, buf, O_BUFLEN);

It would give output like:

$ build/linux-x86_64-server-release/images/jdk/bin/java -Xcheck:jni ...testapp...
...
Warning: SIGILL handler modified!
SIGILL: SIG_DFL, mask=11100100010111111101111111111110, flags=SA_RESTART|SA_SIGINFO
*** Handler was modified!
*** Expected: javaSignalHandler in libjvm.so, mask=11100100110111111111111111111110, flags=SA_RESTART|SA_SIGINFO
Consider using jsig library.
...

This shows the duplication, and it's not bad, even desirable as it means the message starts with "Warning:" which tries to separate it from other output.

I'd like to integrate this PR as is, to fix the regression in the warning quickly, unless any other thoughts. Can separately do a change to print only the handler that has changed as that looks like a mistake.

@tstuefe
Copy link
Member

tstuefe commented Mar 19, 2022

I'm still fine with this change. David is on vacation I believe. Lets wait till next week.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think that we are checking the wrong condition in print_signal_handler. This is okay as a quick fix but there needs to be a follow up RFE to clean this up.

Thanks,
David

@kevinjwalls
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Mar 31, 2022

Going to push as commit 45d4d7d.
Since your change was applied there have been 181 commits pushed to the master branch:

  • 3d4be14: 8181571: printing to CUPS fails on mac sandbox app
  • ef51dfd: 8283791: Parallel: Remove unnecessary condition in PSKeepAliveClosure
  • 3e643f4: 8283799: Collapse identical catch branches in jdk.hotspot.agent
  • 1ca0ede: 8283725: Launching java with "-Xlog:gc*=trace,safepoint*=trace,class*=trace" crashes the JVM
  • c9a469a: 8283784: java_lang_String::as_platform_dependent_str stores to oop in native state
  • fbb8ca5: 8281717: Cover logout method for several LoginModule
  • e0a8669: 8281223: Improve the API documentation of HttpRequest.Builder::build to state that the default implementation provided by the JDK returns immutable objects.
  • eeca3a3: 8253569: javax.xml.catalog.Catalog.matchURI() implementation should reset state variables
  • ec0897a: 8281705: SourceLauncherTest.testSystemProperty isn't being run
  • 1ddab6f: 8283060: RawNativeLibraries should allow multiple clients to load/unload the same library
  • ... and 171 more: https://git.openjdk.java.net/jdk/compare/69e4e338b19c0ffd2f0881be1bbb19a5642bc4d4...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 31, 2022
@openjdk openjdk bot closed this Mar 31, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 31, 2022
@openjdk
Copy link

openjdk bot commented Mar 31, 2022

@kevinjwalls Pushed as commit 45d4d7d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants