-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8331735: UpcallLinker::on_exit races with GC when copying frame anchor #21742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back jvernee! A progress list of the required criteria for merging this PR into |
|
@JornVernee This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 265 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
@JornVernee The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
|
/label remove hotspot |
|
/solves 8343144 |
|
@JornVernee |
|
@JornVernee |
|
@JornVernee |
|
@JornVernee |
|
@JornVernee |
|
Happy to see this addressed and as Jorn noted, thanks to Stefan and Erik for finding the root cause of this issue which was hard to reproduce and debug. |
dholmes-ora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems quite reasonable. Ensuring the correct state for things like updating the frame_anchor is critical, so I wonder if we can assert we are in a safepoint-safe state when doing so?
I had to think long about the async exception deferral ... probably okay.
Thanks
I think we can do this. I've prototyped this here: pr/21742...JornVernee:jdk:SafeFrameAnchor+assert This catches the issue fixed by this patch, and it passes at least tier 1. We'd need something similar in assembly where we touch the frame anchor, is |
Thinking some more about this: there might be other instances of |
fisk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good to me.
xmas92
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
Would be nice if if we could assert that we are not in native or blocked when touching the oops as well. Similarly to modifications of the frame anchor. But I agree that it should be done separately.
|
/integrate |
|
Going to push as commit 461ffaf.
Your commit was automatically rebased without conflicts. |
|
@JornVernee Pushed as commit 461ffaf. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
There is a subtle race in
UpcallLinker::on_exitbetween copying of the old frame anchor back into place, and the GC. Since this copy is not atomic, it may briefly appear as if a thread has no last Java frame, while still in the_thread_in_nativestate, which leads to the GC skipping processing of any active Java frames.This code was originally adapted from
JavaCallWrapper::!JavaCallWrapper- the JNI mechanism for upcalls - but in that case the frame anchor copy happens in the_thread_in_vmstate, which means the GC will wait for the thread to get to a safepoint.The solution proposed here is to do the frame anchor copy in the java thread state, before transitioning back to the native state. The java thread state, like the vm thread state, is also 'safe' i.e. the GC will wait for the thread to get to a safepoint, so we can safely do our non-atomic copy of the frame anchor.
Additionally, this PR resolves a similar issue in
on_entry, by moving the clearing of the pending exception (in case native code use a JNI API and didn't handle the exception afterwards). We now also skip checking for async exceptions when transitioning from native to java, so we don't immediately clear them. Any async exceptions will be picked up at the next safepoint instead.Special thanks to @stefank and @fisk for finding the root cause, and @jaikiran for testing and debugging.
Testing: tier 1-4, 20k runs of the failing test on linux-aarch64.
Progress
Issues
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21742/head:pull/21742$ git checkout pull/21742Update a local copy of the PR:
$ git checkout pull/21742$ git pull https://git.openjdk.org/jdk.git pull/21742/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 21742View PR using the GUI difftool:
$ git pr show -t 21742Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21742.diff
Using Webrev
Link to Webrev Comment