-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8274379: Allow process of unsafe access errors in check_special_condition_for_native_trans #5741
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into |
|
/label add hotspot-runtime |
|
@pchilano |
|
/label remove serviceability |
|
/label remove hotspot |
|
@pchilano |
|
@pchilano |
Webrevs
|
|
Hi @pchilano, Yes this seem like an improvement, so process_if_requested_with_exit_check is the only place were we handle async exceptions. (except JNI) Thanks, Robbin |
|
@pchilano This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 107 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
dholmes-ora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Patricio,
Like Robbin I find it very difficult to actually track the potential changes in behaviour here to determine that they are in fact correct. Especially as it is not clear what the actual rules are supposed to be in the first place.
More broadly I'm concerned about lumping these together as the unsafe_access_error is not an async exception in the same way that thread.stop produces. There is a degree of asynchrony to them but the rules for checking for a pending ThreadDeath from Thread.stop are quite distinct from those related to unsafe_access_error.
I'm also a bit confused about the original check here as I thought that unsafe_access_error can only arise due to execution of JIT'd code, during which the thread is _thread_in_java - so I'm not sure how the native-trans check came to be needed in the first place. ??
Thanks,
David
| // We might have blocked in a ThreadBlockInVM wrapper in the call above so make sure we process pending | ||
| // suspend requests and object reallocation operations if any since we might be going to Java after this. | ||
| SafepointMechanism::process_if_requested_with_exit_check(this, true /* check asyncs */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to wonder why this isn't handled in the TBIVM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 8270085 we had to disable by default processing of suspend requests due to possible deadlocks. So although suspend requests are implemented with handshakes, calls to process_if_requested() from ~TBIVM will not process them. As for object reallocation I'm not sure but I think it's just because there is no need to process them until we are about to go back to Java.
|
Hi David, Thanks for looking at this.
Most of the changes are just cleanup of the API. The only change in behavior is that now in the transition from native back to Java we will throw the InternalError exception if there is a pending unsafe access error. Today we will return to the Java method and throw the exception at some undefined point later on, i.e. whenever we happen to hit the next transition from vm to Java where we will check the async exception condition. So for those unsafe access errors that happened within the context of a native method we'll be throwing an async exception closer to where the unsafe access actually happened. Maybe I can separate the commits into the functional changes and the cleanup? Or otherwise I can just leave the cleanup for another RFE.
Right, but today they are already being handled together. They both use the _async_exception_condition field and except for this native to Java special case that I'm removing they are checked and handled at the sample places. If at some point later on we find a place where we should check and handle one but not the other we can always assign them different bits from the suspend_flag. Although I doubt that will happen since this code has been like this forever. Also if we were to completely decouple unsafe access errors from async exceptions and somehow process them synchronously then this change just reduces the amount of code needed to be changed. We just wouldn't set the suspend _flag bit and remove the else condition from check_and_handle_async_exceptions().
An unsafe access error can also happen when the JT has a state of _thread_in_vm or _thread_in_native while executing methods like Unsafe_CopyMemory0 or Unsafe_CopySwapMemory0. In fact that is actually where we set the unsafe access error condition in test InternalErrorTest.java. Thanks, |
|
Hi Robbin,
Thanks for the review! Please check my answer to David regarding maybe separating the commits or just leaving the cleanup to another RFE. Thanks, |
|
Hi Patricio, Separate commits would be good if you can easily manage that, but if it is too much effort then don't bother. I'm surprised to see that Unsafe_CopySwapMemory0 is UNSAFE_LEAF rather than UNSAFE_ENTRY. Thanks, |
|
Hi David,
I ended up separating the commits since it makes the change easier to review. I separated them into a commit for the actual functional changes, a commit for some general cleanups that could have been done independent of this bug, and the cleanup of the API possible after the functional changes.
The comment says it's to avoid blocking a GC in case both src and dst are in native memory and the copy is large. We still have a JVM_ENTRY_FROM_LEAF block on the else branch for the other cases. I guess we expect those to be shorter so there is no need to be in native. Thanks, |
dholmes-ora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Patricio,
Thanks for the separate commits. I hope github makes it clear where comments are being directed.
This is a comment on the initial functional change, which generally seems okay. One detauiled comment below.
Thanks,
David
| assert(thread->thread_state() == _thread_in_native_trans, "wrong state"); | ||
| assert(!thread->has_last_Java_frame() || thread->frame_anchor()->walkable(), "Unwalkable stack in native->Java transition"); | ||
|
|
||
| thread->set_thread_state(_thread_in_vm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems odd to change the thread-state explicitly here when this logic is being processed in the middle of a transition from native->java. So what we end up doing is native->native_trans->in_vm->in_java (and possibly some additional changes from in_vm to blocked and back). I understand why we need to be "in vm" before calling process_if_requested... but it makes the thread state transition story rather messy (just imagine trying to draw a state machine model for this :) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we call JavaThread::check_special_condition_for_native_trans() we are calling into the vm so to me it actually makes more sense to set the state to _thread_in_vm there. Specially given that this is not some leaf method, but we actually process safepoint/handshakes/stackwatermarks and all other operations we check for in handle_special_runtime_exit_condition().
Staying in native_trans also doesn't simplify the state machine in my opinion and actually makes it more complex because inside process_if_requested_with_exit_check() we have to keep track of all these states different than _thread_in_vm: switch case with valid states for safepoint code, transition wrapper for state bookkeeping in handshake code (so we still have the native_trans->vm change), and having to manually switch to _thread_blocked and back because we cannot use ThreadBlockInVM in wait_for_object_deoptimization().
Note: As a future change I think we could just avoid the native_trans state altogether and just poll in a state of _thread_in_vm, since the only purpose of it is to poll in an unsafe state. Or we could even set the state to _thread_in_Java already, so that when there is nothing to process there are no extra changes of state needed, and when there are pending operations we can use a ThreadInVMfromJava wrapper here.
dholmes-ora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup commit looks good.
src/hotspot/share/runtime/thread.cpp
Outdated
| JavaThread* THREAD = this; | ||
| Exceptions::throw_unsafe_access_internal_error(THREAD, __FILE__, __LINE__, "a fault occurred in an unsafe memory access operation"); | ||
| return; | ||
| break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than now including the final assertion to be executed, I don't see why we need to change from "return" to "break". That final assertion seems unnecessary as it basically checks that Exceptions::throw... actually threw. I'd be tempted to just delete the final assertion and keep the return statements. But not a major concern either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
dholmes-ora
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final commit for API changes was a bit harder to follow, but overall seems okay.
One comment/suggestion below but otherwise I'll use this to approve the overal change.
Thanks,
David
src/hotspot/share/runtime/thread.cpp
Outdated
| _pending_async_exception = java_throwable; | ||
| set_suspend_flag(_has_async_exception); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear why we didn't just simplify set_pending_async_exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, maybe it's better to do that. I restored set_pending_async_exception() and simplified it instead.
|
Mailing list message from David Holmes on hotspot-runtime-dev: On 7/10/2021 5:27 am, Patricio Chilano Mateo wrote:
I like the sound of that future change. :) Thanks, |
|
Thanks for the review David! |
|
/integrate |
|
Going to push as commit 0d80f6c.
Your commit was automatically rebased without conflicts. |
Hi,
Please review the following patch to allow processing of unsafe access errors in check_special_condition_for_native_trans().
Today we special case unsafe access errors from other async exceptions in that we don't process them when transitioning from native back to Java. Code comments in check_special_condition_for_native_trans() mention that unsafe access errors should not be handled because that may block while creating the exception. But that should not be an issue since we can always make sure we call process_if_requested_with_exit_check() after the call to throw_unsafe_access_internal_error() to process any pending operations not already handled in a ThreadBlockInVM wrapper (today that only means suspend requests and object reallocation operations).
By removing this special treatment for unsafe access errors we can also simplify the async exception support API. For instance we can remove _async_exception_condition and simplify some of the supporting methods. I also removed the _thread_in_native case from the switch statement in check_and_handle_async_exceptions() since we never call that method in that state.
Testing by running tiers1-6 in mach5.
Thanks,
Patricio
Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5741/head:pull/5741$ git checkout pull/5741Update a local copy of the PR:
$ git checkout pull/5741$ git pull https://git.openjdk.java.net/jdk pull/5741/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 5741View PR using the GUI difftool:
$ git pr show -t 5741Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5741.diff