Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8274379: Allow process of unsafe access errors in check_special_condition_for_native_trans #5741

Closed
wants to merge 6 commits into from

Conversation

pchilano
Copy link
Contributor

@pchilano pchilano commented Sep 28, 2021

Hi,


Please review the following patch to allow processing of unsafe access errors in check_special_condition_for_native_trans().

Today we special case unsafe access errors from other async exceptions in that we don't process them when transitioning from native back to Java. Code comments in check_special_condition_for_native_trans() mention that unsafe access errors should not be handled because that may block while creating the exception. But that should not be an issue since we can always make sure we call process_if_requested_with_exit_check() after the call to throw_unsafe_access_internal_error() to process any pending operations not already handled in a ThreadBlockInVM wrapper (today that only means suspend requests and object reallocation operations).

By removing this special treatment for unsafe access errors we can also simplify the async exception support API. For instance we can remove _async_exception_condition and simplify some of the supporting methods. I also removed the _thread_in_native case from the switch statement in check_and_handle_async_exceptions() since we never call that method in that state.

Testing by running tiers1-6 in mach5.

Thanks,
Patricio


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8274379: Allow process of unsafe access errors in check_special_condition_for_native_trans

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5741/head:pull/5741
$ git checkout pull/5741

Update a local copy of the PR:
$ git checkout pull/5741
$ git pull https://git.openjdk.java.net/jdk pull/5741/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5741

View PR using the GUI difftool:
$ git pr show -t 5741

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5741.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Sep 28, 2021

👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Sep 28, 2021

/label add hotspot-runtime

@openjdk
Copy link

@openjdk openjdk bot commented Sep 28, 2021

@pchilano The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability hotspot hotspot-runtime labels Sep 28, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 28, 2021

@pchilano
The hotspot-runtime label was successfully added.

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Sep 28, 2021

/label remove serviceability

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Sep 28, 2021

/label remove hotspot

@openjdk openjdk bot removed the serviceability label Sep 28, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 28, 2021

@pchilano
The serviceability label was successfully removed.

@openjdk openjdk bot removed the hotspot label Sep 28, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 28, 2021

@pchilano
The hotspot label was successfully removed.

@pchilano pchilano marked this pull request as ready for review Sep 28, 2021
@openjdk openjdk bot added the rfr label Sep 28, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Sep 28, 2021

Webrevs

@robehn
Copy link
Contributor

@robehn robehn commented Sep 30, 2021

Hi @pchilano,

Yes this seem like an improvement, so process_if_requested_with_exit_check is the only place were we handle async exceptions. (except JNI)
It looks good, but I have a really hard time looking at the diff/code to determine if this behaves differently in any bad ways.
I trust you have done the testing needed to be sure.

Thanks, Robbin

robehn
robehn approved these changes Sep 30, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Sep 30, 2021

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8274379: Allow process of unsafe access errors in check_special_condition_for_native_trans

Reviewed-by: rehn, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 107 new commits pushed to the master branch:

  • 9945f7a: 8274318: Replace 'for' cycles with iterator with enhanced-for in java.management
  • 754bc82: 8274525: Replace uses of StringBuffer with StringBuilder in java.xml
  • 4e7d7ca: 8273711: Remove redundant stream() call before forEach in jdk.jlink
  • f3cedbe: 8274464: Remove redundant stream() call before forEach in java.* modules
  • c10de35: 8262944: Improve exception message when automatic module lists provider class not in JAR file
  • b8af6a9: 8273917: Remove 'leaf' ranking for Mutex
  • c80a612: 8273381: Assert in PtrQueueBufferAllocatorTest.stress_free_list_allocator_vm
  • 9759fcb: 8274496: Use String.contains() instead of String.indexOf() in java.desktop
  • cdf8930: 8274625: Search field placeholder behavior
  • df125f6: 8273410: IR verification framework fails with "Should find method name in validIrRulesMap"
  • ... and 97 more: https://git.openjdk.java.net/jdk/compare/79865cd797737f22cd4efe7e9c03ddbb86095e64...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Sep 30, 2021
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Patricio,

Like Robbin I find it very difficult to actually track the potential changes in behaviour here to determine that they are in fact correct. Especially as it is not clear what the actual rules are supposed to be in the first place.

More broadly I'm concerned about lumping these together as the unsafe_access_error is not an async exception in the same way that thread.stop produces. There is a degree of asynchrony to them but the rules for checking for a pending ThreadDeath from Thread.stop are quite distinct from those related to unsafe_access_error.

I'm also a bit confused about the original check here as I thought that unsafe_access_error can only arise due to execution of JIT'd code, during which the thread is _thread_in_java - so I'm not sure how the native-trans check came to be needed in the first place. ??

Thanks,
David

// We might have blocked in a ThreadBlockInVM wrapper in the call above so make sure we process pending
// suspend requests and object reallocation operations if any since we might be going to Java after this.
SafepointMechanism::process_if_requested_with_exit_check(this, true /* check asyncs */);
Copy link
Member

@dholmes-ora dholmes-ora Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to wonder why this isn't handled in the TBIVM?

Copy link
Contributor Author

@pchilano pchilano Oct 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 8270085 we had to disable by default processing of suspend requests due to possible deadlocks. So although suspend requests are implemented with handshakes, calls to process_if_requested() from ~TBIVM will not process them. As for object reallocation I'm not sure but I think it's just because there is no need to process them until we are about to go back to Java.

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Oct 1, 2021

Hi David,

Thanks for looking at this.

Hi Patricio,

Like Robbin I find it very difficult to actually track the potential changes in behaviour here to determine that they are in fact correct. Especially as it is not clear what the actual rules are supposed to be in the first place.

Most of the changes are just cleanup of the API. The only change in behavior is that now in the transition from native back to Java we will throw the InternalError exception if there is a pending unsafe access error. Today we will return to the Java method and throw the exception at some undefined point later on, i.e. whenever we happen to hit the next transition from vm to Java where we will check the async exception condition. So for those unsafe access errors that happened within the context of a native method we'll be throwing an async exception closer to where the unsafe access actually happened. Maybe I can separate the commits into the functional changes and the cleanup? Or otherwise I can just leave the cleanup for another RFE.

More broadly I'm concerned about lumping these together as the unsafe_access_error is not an async exception in the same way that thread.stop produces. There is a degree of asynchrony to them but the rules for checking for a pending ThreadDeath from Thread.stop are quite distinct from those related to unsafe_access_error.

Right, but today they are already being handled together. They both use the _async_exception_condition field and except for this native to Java special case that I'm removing they are checked and handled at the sample places. If at some point later on we find a place where we should check and handle one but not the other we can always assign them different bits from the suspend_flag. Although I doubt that will happen since this code has been like this forever. Also if we were to completely decouple unsafe access errors from async exceptions and somehow process them synchronously then this change just reduces the amount of code needed to be changed. We just wouldn't set the suspend _flag bit and remove the else condition from check_and_handle_async_exceptions().

I'm also a bit confused about the original check here as I thought that unsafe_access_error can only arise due to execution of JIT'd code, during which the thread is _thread_in_java - so I'm not sure how the native-trans check came to be needed in the first place. ??

An unsafe access error can also happen when the JT has a state of _thread_in_vm or _thread_in_native while executing methods like Unsafe_CopyMemory0 or Unsafe_CopySwapMemory0. In fact that is actually where we set the unsafe access error condition in test InternalErrorTest.java.

Thanks,
Patricio

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Oct 1, 2021

Hi Robbin,

Hi @pchilano,

Yes this seem like an improvement, so process_if_requested_with_exit_check is the only place were we handle async exceptions. (except JNI) It looks good, but I have a really hard time looking at the diff/code to determine if this behaves differently in any bad ways. I trust you have done the testing needed to be sure.

Thanks for the review! Please check my answer to David regarding maybe separating the commits or just leaving the cleanup to another RFE.

Thanks,
Patricio

@dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Oct 2, 2021

Hi Patricio,

Separate commits would be good if you can easily manage that, but if it is too much effort then don't bother.

I'm surprised to see that Unsafe_CopySwapMemory0 is UNSAFE_LEAF rather than UNSAFE_ENTRY.

Thanks,
David

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Oct 5, 2021

Hi David,

Hi Patricio,

Separate commits would be good if you can easily manage that, but if it is too much effort then don't bother.

I ended up separating the commits since it makes the change easier to review. I separated them into a commit for the actual functional changes, a commit for some general cleanups that could have been done independent of this bug, and the cleanup of the API possible after the functional changes.

I'm surprised to see that Unsafe_CopySwapMemory0 is UNSAFE_LEAF rather than UNSAFE_ENTRY.

The comment says it's to avoid blocking a GC in case both src and dst are in native memory and the copy is large. We still have a JVM_ENTRY_FROM_LEAF block on the else branch for the other cases. I guess we expect those to be shorter so there is no need to be in native.

Thanks,
Patricio

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Hi Patricio,

Thanks for the separate commits. I hope github makes it clear where comments are being directed.

This is a comment on the initial functional change, which generally seems okay. One detauiled comment below.

Thanks,
David

@@ -1847,21 +1847,17 @@ void JavaThread::check_special_condition_for_native_trans(JavaThread *thread) {
assert(thread->thread_state() == _thread_in_native_trans, "wrong state");
assert(!thread->has_last_Java_frame() || thread->frame_anchor()->walkable(), "Unwalkable stack in native->Java transition");

thread->set_thread_state(_thread_in_vm);
Copy link
Member

@dholmes-ora dholmes-ora Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems odd to change the thread-state explicitly here when this logic is being processed in the middle of a transition from native->java. So what we end up doing is native->native_trans->in_vm->in_java (and possibly some additional changes from in_vm to blocked and back). I understand why we need to be "in vm" before calling process_if_requested... but it makes the thread state transition story rather messy (just imagine trying to draw a state machine model for this :) ).

Copy link
Contributor Author

@pchilano pchilano Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we call JavaThread::check_special_condition_for_native_trans() we are calling into the vm so to me it actually makes more sense to set the state to _thread_in_vm there. Specially given that this is not some leaf method, but we actually process safepoint/handshakes/stackwatermarks and all other operations we check for in handle_special_runtime_exit_condition().
Staying in native_trans also doesn't simplify the state machine in my opinion and actually makes it more complex because inside process_if_requested_with_exit_check() we have to keep track of all these states different than _thread_in_vm: switch case with valid states for safepoint code, transition wrapper for state bookkeeping in handshake code (so we still have the native_trans->vm change), and having to manually switch to _thread_blocked and back because we cannot use ThreadBlockInVM in wait_for_object_deoptimization().

Note: As a future change I think we could just avoid the native_trans state altogether and just poll in a state of _thread_in_vm, since the only purpose of it is to poll in an unsafe state. Or we could even set the state to _thread_in_Java already, so that when there is nothing to process there are no extra changes of state needed, and when there are pending operations we can use a ThreadInVMfromJava wrapper here.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Cleanup commit looks good.

JavaThread* THREAD = this;
Exceptions::throw_unsafe_access_internal_error(THREAD, __FILE__, __LINE__, "a fault occurred in an unsafe memory access operation");
return;
break;
Copy link
Member

@dholmes-ora dholmes-ora Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than now including the final assertion to be executed, I don't see why we need to change from "return" to "break". That final assertion seems unnecessary as it basically checks that Exceptions::throw... actually threw. I'd be tempted to just delete the final assertion and keep the return statements. But not a major concern either way.

Copy link
Contributor Author

@pchilano pchilano Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

The final commit for API changes was a bit harder to follow, but overall seems okay.

One comment/suggestion below but otherwise I'll use this to approve the overal change.

Thanks,
David

_pending_async_exception = java_throwable;
set_suspend_flag(_has_async_exception);
Copy link
Member

@dholmes-ora dholmes-ora Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear why we didn't just simplify set_pending_async_exception.

Copy link
Contributor Author

@pchilano pchilano Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, maybe it's better to do that. I restored set_pending_async_exception() and simplified it instead.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 7, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 7/10/2021 5:27 am, Patricio Chilano Mateo wrote:

On Wed, 6 Oct 2021 00:51:18 GMT, David Holmes <dholmes at openjdk.org> wrote:

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

address David comments

src/hotspot/share/runtime/thread.cpp line 1642:

1640: // suspend requests and object reallocation operations if any since we might be going to Java after this.
1641: SafepointMechanism::process_if_requested_with_exit_check(this, true /* check asyncs */);
1642: break;

Other than now including the final assertion to be executed, I don't see why we need to change from "return" to "break". That final assertion seems unnecessary as it basically checks that Exceptions::throw... actually threw. I'd be tempted to just delete the final assertion and keep the return statements. But not a major concern either way.

Fixed.

src/hotspot/share/runtime/thread.cpp line 1850:

1848: assert(!thread->has_last_Java_frame() || thread->frame_anchor()->walkable(), "Unwalkable stack in native->Java transition");
1849:
1850: thread->set_thread_state(_thread_in_vm);

It seems odd to change the thread-state explicitly here when this logic is being processed in the middle of a transition from native->java. So what we end up doing is native->native_trans->in_vm->in_java (and possibly some additional changes from in_vm to blocked and back). I understand why we need to be "in vm" before calling process_if_requested... but it makes the thread state transition story rather messy (just imagine trying to draw a state machine model for this :) ).

Once we call JavaThread::check_special_condition_for_native_trans() we are calling into the vm so to me it actually makes more sense to set the state to _thread_in_vm there. Specially given that this is not some leaf method, but we actually process safepoint/handshakes/stackwatermarks and all other operations we check for in handle_special_runtime_exit_condition().
Staying in native_trans also doesn't simplify the state machine in my opinion and actually makes it more complex because inside process_if_requested_with_exit_check() we have to keep track of all these states different than _thread_in_vm: switch case with valid states for safepoint code, transition wrapper for state bookkeeping in handshake code (so we still have the native_trans->vm change), and having to manually switch to _thread_blocked and back because we cannot use ThreadBlockInVM in wait_for_object_deoptimization().

Note: As a future change I think we could just avoid the native_trans state altogether and just poll in a state of _thread_in_vm, since the only purpose of it is to poll in an unsafe state. Or we could even set the state to _thread_in_Java already, so that when there is nothing to process there are no extra changes of state needed, and when there are pending operations we can use a ThreadInVMfromJava wrapper here.

I like the sound of that future change. :)

Thanks,
David

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Oct 7, 2021

Thanks for the review David!

@pchilano
Copy link
Contributor Author

@pchilano pchilano commented Oct 11, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Oct 11, 2021

Going to push as commit 0d80f6c.
Since your change was applied there have been 150 commits pushed to the master branch:

  • b870468: 8274347: Passing a nested switch expression as a parameter causes an NPE during compile
  • 110e38d: 8274753: ZGC: SEGV in MetaspaceShared::link_shared_classes
  • b7af890: 8274430: Remove some debug error printing code added in JDK-8017163
  • aaf2401: 8274927: Remove unnecessary G1ArchiveAllocator code
  • c55dd36: 8275008: gtest build failure due to stringop-overflow warning with gcc11
  • 3edee1e: 8272723: Don't use Access API to access primitive fields
  • 49f8ce6: 8274773: [TESTBUG] UnsafeIntrinsicsTest intermittently fails on weak memory model platform
  • c032186: 8272968: AArch64: Remove redundant matching rules for commutative ops
  • a05873a: 8274952: jdk/jfr/api/consumer/TestRecordedFrameType.java failed when c1 disabled
  • 5ecc99b: 8274620: resourcehogs/serviceability/sa/TestHeapDumpForLargeArray.java is timing out
  • ... and 140 more: https://git.openjdk.java.net/jdk/compare/79865cd797737f22cd4efe7e9c03ddbb86095e64...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Oct 11, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 11, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Oct 11, 2021

@pchilano Pushed as commit 0d80f6c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime integrated
3 participants