Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8259937: guarantee(loc != NULL) failed: missing saved register with native invoker #2528

Closed
wants to merge 10 commits into from

Conversation

rwestrel
Copy link
Contributor

@rwestrel rwestrel commented Feb 11, 2021

We spotted this issue with Shenandoah and I managed to write a simple
test case that reproduces it reliably with Shenandoah but the issue is
independent of the GC.

The loop in the test case calls a native invoker with an oop live in
rbp. rbp is saved in the native invoker stub's frame. A safepoint is
triggered from the safepoint check in the native invoker. The stack
walking code sees that rbp contains an oop but can't find where that
oop is stored. That's because stack walking updates the caller's frame
with the location of rbp in the callee on calls to
frame::sender(). But the current code sets the last java frame to be
the compiled frame where rbp is live. So there's no call to
frame::sender() to update the location rbp. The fix I propose is that
the frame of the native invoker be visible by stack walking. On a
safepoint, stack walking starts from the native invoker thread, then
calls frame::sender() to move to the compiled frame. That causes rbp
to be properly recorded with its location in the native invoker frame.

Same problem affects both x86 and aarch64. I've tested this patch with:

make run-test TEST="java/foreign" TEST_VM_OPTS="-Xcomp" JTREG="TIMEOUT_FACTOR=10"

on both platforms.

/cc hotspot-compiler


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8259937: guarantee(loc != NULL) failed: missing saved register with native invoker

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2528/head:pull/2528
$ git checkout pull/2528

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Feb 11, 2021

👋 Welcome back roland! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the hotspot-compiler label Feb 11, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Feb 11, 2021

@rwestrel
The hotspot-compiler label was successfully added.

@openjdk openjdk bot added the rfr label Feb 11, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 11, 2021

Webrevs

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Seems reasonable.

@openjdk
Copy link

@openjdk openjdk bot commented Feb 11, 2021

@rwestrel This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8259937: guarantee(loc != NULL) failed: missing saved register with native invoker

Reviewed-by: kvn, jvernee, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 80 new commits pushed to the master branch:

  • fe8e370: 8262188: Add test to verify trace page sizes logging on Linux
  • 0a7fff4: 8261636: The test mapping in hugetlbfs_sanity_check should consider LargePageSizeInBytes
  • 702ca62: 8262185: G1: Prune collection set candidates early
  • 8bc8542: 8262195: Harden tests that use the HostsFileNameService (jdk.net.hosts.file property)
  • 20c93b3: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload
  • ddd550a: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
  • 03d888f: 8261804: Remove field _processing_is_mt, calculate it instead
  • 6800ba4: 8257500: Drawing MultiResolutionImage with ImageObserver "leaks" memory
  • 65a245e: 8262329: Fix JFR parser exception messages
  • a4c2496: 8259535: ECDSA SignatureValue do not always have the specified length
  • ... and 70 more: https://git.openjdk.java.net/jdk/compare/a6a7e4398aae9568e39650341aa3255ab23f9961...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Feb 11, 2021
@adinn
Copy link
Contributor

@adinn adinn commented Feb 12, 2021

The code here looks ok. I'm slightly concerned about the consequences of adding a new stack frame visible to stack walking code. Does this have the potential to break serviceability code that reports and/or analyzes stack frames (whether that's code in OpenJDK or 3rd party code)?

@rwestrel
Copy link
Contributor Author

@rwestrel rwestrel commented Feb 12, 2021

The code here looks ok. I'm slightly concerned about the consequences of adding a new stack frame visible to stack walking code. Does this have the potential to break serviceability code that reports and/or analyzes stack frames (whether that's code in OpenJDK or 3rd party code)?

Thanks for the review. The native invoker code is new in jdk 16 so it's unlikely tools already rely of some specific layout.

@iwanowww are you the author of the native invoker code? What do you think?

@JornVernee
Copy link
Member

@JornVernee JornVernee commented Feb 23, 2021

@adinn I'm not aware of any such use-cases (whether in the JDK or elsewhere). They would only be affected if they were using Panama native calls, which were introduce pretty recently, and are also still in incubator state.

Inside the JDK the only place where this code is currently being used is in the jdk/java/foreign test stuite, as well as in the internal implementation of the Panama linker. If that test suite still passes I'm happy to call it safe.

Copy link
Member

@JornVernee JornVernee left a comment

LGTM, but I've left some suggestions in line.

Thanks for cleaning up the frame layout code. Having a fixed frame is much better than repeatedly modifying the stack pointer.

I've also done some downstream stress testing with jextract, and everything works as expected.


__ movptr(Address(r15_thread, JavaThread::saved_rbp_address_offset()), rsp); // rsp points at saved RBP
Copy link
Member

@JornVernee JornVernee Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The saved_rbp_address_offset field was added to JavaFrameAnchor just to serve this particular use case. It should be cleaned up now that it is no longer used.

If you don't want to take care of that, we can file a followup.

There is code in frame_*.cpp, thread_*.hpp, and javaFrameAnchor_*.hpp that deals with this field (not on all platforms/archs though). See https://github.com/openjdk/jdk/pull/634/files & https://github.com/openjdk/jdk/pull/1711/files for the original diffs.


public class TestLinkToNativeRBP {
final static CLinker abi = CLinker.getInstance();
static final LibraryLookup lookup = LibraryLookup.ofDefault();
Copy link
Member

@JornVernee JornVernee Feb 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default library can be unreliable (but we've kept it in lieu of something better, which is still in the pipeline). We've had problems with it in the past since it acts differently in different environments, so we try to avoid it in tests.

It would be more robust to add a small test library that defines a dummy function and then depend on that instead.

If you've not done this before; you can just add a lib<Name>.c file to the same directory as the main test file and the build system will compile it for you. You can then load it with LibraryLookup.ofLibrary("<Name>") in the test. See the test/jdk/java/foreign/stackwalk directory for an example (one noteworthy thing is that the function has to be explicitly exported in order to work on Windows. See the example).

@openjdk
Copy link

@openjdk openjdk bot commented Feb 24, 2021

⚠️ @rwestrel This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

This reverts commit cb9dd24.
@rwestrel
Copy link
Contributor Author

@rwestrel rwestrel commented Feb 24, 2021

@JornVernee thanks for the review and the comments. The new commits should address them (cb9dd24 was pushed by accident and reverted).

Copy link
Member

@JornVernee JornVernee left a comment

Thanks for addressing the comments! Looks good.

@adinn
Copy link
Contributor

@adinn adinn commented Feb 25, 2021

@JornVernee I'm not clear that your response addresses my point. I'm concerned that a thread stack dump reported by serviceability code may contain an extra frame for the stub call. This could occur while the Java thread is still in native and it could also include the case wher the native call re-enters into Java i.e. the extra frame could appear at the top of the stack dump or interleaved between Java method frames.

I don't see how that problem is mitigated by your suggestion that this only relates to Panama API use. Code which consumes any such stack dump (incluing 3rd party code) that might be affected by the presence of this extra frame will not care (or even be aware) that the native callout is a Panama call.

Anyway, since no one from the serviceability team has noted this as a potential problem I'm ok to see the patch proceed.

@iwanowww
Copy link

@iwanowww iwanowww commented Feb 26, 2021

Overall, the fix looks good.

At some point, there was no frame for native invoker set up and native state transitions were put inline in generated code, but that was rewritten.

Regarding the refactorings: I find newly introduced spill_register()/fill_register() methods very confusing.
I'd prefer to see spill_output_registers()/fill_output_registers() instead and an assert in NativeInvokerGenerator constructor (akin to the one in NativeInvokerGenerator::generate() on x86_64):

  assert(_output_registers.length() <= 1
    || (_output_registers.length() == 2 && !_output_registers.at(1)->is_valid()), "no multi-reg returns");

@openjdk openjdk bot removed ready rfr labels Mar 1, 2021
@rwestrel
Copy link
Contributor Author

@rwestrel rwestrel commented Mar 1, 2021

Overall, the fix looks good.

At some point, there was no frame for native invoker set up and native state transitions were put inline in generated code, but that was rewritten.

Regarding the refactorings: I find newly introduced spill_register()/fill_register() methods very confusing.
I'd prefer to see spill_output_registers()/fill_output_registers() instead and an assert in NativeInvokerGenerator constructor (akin to the one in NativeInvokerGenerator::generate() on x86_64):

  assert(_output_registers.length() <= 1
    || (_output_registers.length() == 2 && !_output_registers.at(1)->is_valid()), "no multi-reg returns");

Thanks for the review. I pushed a new change. Does that look ok to you now?

Copy link

@iwanowww iwanowww left a comment

It looks much better now. Thanks!

@openjdk openjdk bot added ready rfr labels Mar 1, 2021
@rwestrel
Copy link
Contributor Author

@rwestrel rwestrel commented Mar 1, 2021

/integrate

@openjdk openjdk bot closed this Mar 1, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Mar 1, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Mar 1, 2021

@rwestrel Since your change was applied there have been 82 commits pushed to the master branch:

  • c569f1d: 8262085: Hovering Metal HTML Tooltips in different windows cause IllegalArgExc on Linux
  • 75bf106: 8262028: Make InstanceKlass::implementor return InstanceKlass
  • fe8e370: 8262188: Add test to verify trace page sizes logging on Linux
  • 0a7fff4: 8261636: The test mapping in hugetlbfs_sanity_check should consider LargePageSizeInBytes
  • 702ca62: 8262185: G1: Prune collection set candidates early
  • 8bc8542: 8262195: Harden tests that use the HostsFileNameService (jdk.net.hosts.file property)
  • 20c93b3: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload
  • ddd550a: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
  • 03d888f: 8261804: Remove field _processing_is_mt, calculate it instead
  • 6800ba4: 8257500: Drawing MultiResolutionImage with ImageObserver "leaks" memory
  • ... and 72 more: https://git.openjdk.java.net/jdk/compare/a6a7e4398aae9568e39650341aa3255ab23f9961...master

Your commit was automatically rebased without conflicts.

Pushed as commit 6baecf3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler integrated
5 participants