Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8272526: Cleanup ThreadStateTransition class #5128

Closed
wants to merge 2 commits into from

Conversation

pchilano
Copy link
Contributor

@pchilano pchilano commented Aug 16, 2021

Hi all,

Please review the following cleanup of class ThreadStateTransition. I added method transition_from_vm and removed the generic transition method which was covering those transitions. This not only makes the public API more consistent given that we already have transition_from_java and transition_from_native but it also allows for some common code refactoring and it also makes more clear which transitions we exactly need to deal with and what actions need to be executed.

I adjusted the expected error message in test InternalErrorTest.java to match the _thread_in_vm case in check_and_handle_async_exceptions() because with this patch we transition to Java only after having checked the exit conditions, rather than today where we process safepoint/handshakes, set state to _thread_in_Java and then process the exit conditions. Ideally I would remove the _thread_in_Java case from the switch statement but we still have the handle_polling_page_exception() case which does everything in Java and never transitions (I will address that in another RFE), so I only adjusted the error message there in case we hit that path in this test.

Tested in mach5 tiers 1-7.

Thanks,
Patricio


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5128/head:pull/5128
$ git checkout pull/5128

Update a local copy of the PR:
$ git checkout pull/5128
$ git pull https://git.openjdk.java.net/jdk pull/5128/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5128

View PR using the GUI difftool:
$ git pr show -t 5128

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5128.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 16, 2021

👋 Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 16, 2021

@pchilano The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Aug 16, 2021
@pchilano
Copy link
Contributor Author

/label add hotspot-runtime

@pchilano
Copy link
Contributor Author

/label remove hotspot

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Aug 16, 2021
@openjdk
Copy link

openjdk bot commented Aug 16, 2021

@pchilano
The hotspot-runtime label was successfully added.

@openjdk openjdk bot removed the hotspot hotspot-dev@openjdk.org label Aug 16, 2021
@openjdk
Copy link

openjdk bot commented Aug 16, 2021

@pchilano
The hotspot label was successfully removed.

@pchilano
Copy link
Contributor Author

/label remove serviceability

@openjdk openjdk bot removed the serviceability serviceability-dev@openjdk.org label Aug 16, 2021
@openjdk
Copy link

openjdk bot commented Aug 16, 2021

@pchilano
The serviceability label was successfully removed.

@pchilano pchilano marked this pull request as ready for review August 16, 2021 21:24
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 16, 2021
@mlbridge
Copy link

mlbridge bot commented Aug 16, 2021

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Patricio,

Generally this looks good but I have a few comments/queries.

Thanks,
David

clear_pending_exception = false;
}
}
ThreadStateTransition::transition_from_native(thread, _thread_in_Java, true /* check_asyncs */);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is much cleaner!

assert(thread->thread_state() == _thread_in_native, "coming from wrong thread state");
assert(!thread->has_last_Java_frame() || thread->frame_anchor()->walkable(), "Unwalkable stack in native->vm transition");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to have lost this assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unintended, restored.

}
~ThreadInVMfromNative() {
assert(_thread->thread_state() == _thread_in_vm, "coming from wrong thread state");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you drop this assert? It guards against unexpected/unbalanced intermediate state changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert made sense before but now we are just using transition_from_vm() which will already assert the JT is in vm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I missed that.

// We are leaving the VM at this point and going directly to native code.
// Block, if we are in the middle of a safepoint synchronization.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment no longer correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the sentence "We are leaving the VM at this point and going directly to native code." is correct but I'm not sure what's the point of adding it. Otherwise we would have to add it for all transitions. As for "Block, if we are in the middle of a safepoint synchronization." it was true before when we used transition(), but now we just go directly to native since the safepoint will be able to progress without the need to block (equivalent to what we do in ~ThreadInVMfromNative).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't realized we had such asymmetry with the two different VM->native transitions. Though the circumstances of the two are somewhat different.

But I can't conjure up any bad scenarios here given any safepoint/handshake is racing with this transition anyway.

@@ -1644,7 +1646,7 @@ void JavaThread::check_and_handle_async_exceptions() {
case _thread_in_Java: {
ThreadInVMfromJava tiv(this);
JavaThread* THREAD = this;
Exceptions::throw_unsafe_access_internal_error(THREAD, __FILE__, __LINE__, "a fault occurred in a recent unsafe memory access operation in compiled Java code");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording is somewhat deliberate as the unsafe access if not detected synchronously with the actual memory operation - hence it is a "recent unsafe memory access operation".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, maybe I should instead change the error message for the _thread_in_vm case to be "a fault occurred in a recent unsafe memory access operation"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it was only the _thread_in_Java case that was special because that was where JIT'd code could have a series of unsafe memory accesses, and the exception won't be thrown until we hit a specific async exception check point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So today we always reach the _async_unsafe_access_error case of check_and_handle_async_exceptions() with a state of _thread_in_Java, except when called from jni_check_async_exceptions() where we are in vm, so I don't know why the message refers exclusively to compiled code. I run test InternalErrorTest.java in baseline with just the interpreter("main/othervm -Xint -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI InternalErrorTest") and still passes. Here is the code that sets the pending unsafe access error when the thread receives SIGBUS while in Java: https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L249

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the unsafe access error happens on Unsafe_CopySwapMemory0 like in the test then we will be either in vm or native (depending on src & dst) which will also output the same error message. We can differentiate between this cases if we want by saving the state in SharedRuntime::handle_unsafe_access().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting back to this. There are a lot of subtleties around the throwing of an async exception related to "unsafe access errors" and I had to dig around to refresh my memory on a few things. End result: changing the error messages to say the same thing is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review David!

assert(!thread->has_last_Java_frame() || thread->frame_anchor()->walkable(), "Unwalkable stack in native->vm transition");

// Change to transition state and ensure it is seen by the VM thread.
thread->set_thread_state_fence(_thread_in_native_trans);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we no longer need _thread_in_native_trans? If it is truly not needed here then it seems to no longer be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to transition to an unsafe state before polling to avoid the issue of a safepoint/handshake starting right after we checked but before we transition to the unsafe state. Since we are in the vm we can use _thread_in_vm instead which is already unsafe. We still use the _thread_in_native_trans state on the native wrappers when transitioning out of native back to Java, but I will try to address that in another RFE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of liked the trans states as indicators of unsafe states as it made the logic consistent and simple:

  • advance to current state + 1 (trans state)
  • check for safepoint
  • advance to final state

and you didn't have to think too much about what the initial state was.

But I can see that logically any unsafe state works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but whenever we are in safepoint/handshake code we are already in vm, that's why I actually think directly setting _thread_in_vm here makes more sense than having additional states just to make the poll. In fact using transitional states(along with the code in handle_polling_page_exception) forces us to need yet another transition wrapper for handshakes(ThreadInVMForHandshake) just to switch to vm and back.
Reading now the description of transition states in globalDefinitions.hpp I see they were more useful back then with the old safepoint logic:

"These extra states makes it possible for the safepoint code to handle certain thread_states without having to suspend the thread - making the safepoint code faster."

In SafepointSynchronize::block() there was a faster path for _thread_in_native_trans and _thread_blocked_trans (https://github.com/openjdk/jdk/blame/66aa45649ad36ed41c0241ded767d465248eda3c/src/hotspot/share/runtime/safepoint.cpp#L816).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes things have changed a lot :)

@pchilano
Copy link
Contributor Author

Hi David,

Thanks for looking at this.

Patricio

Copy link
Contributor

@robehn robehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit worried about the clear pending exception which we had some problems with before.
So I ran my stress tests, it passed, looks good thanks!

@openjdk
Copy link

openjdk bot commented Aug 17, 2021

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8272526: Cleanup ThreadStateTransition class

Reviewed-by: dholmes, rehn, coleenp

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 130 new commits pushed to the master branch:

  • 22ef4f0: 5015261: NPE may be thrown if JDesktopIcon is set to null on a JInternalFrame
  • 9bc0232: 8269223: -Xcheck:jni WARNINGs working with fonts on Linux
  • 2ff4c01: 8271600: C2: CheckCastPP which should closely follow Allocate is sunk of a loop
  • ad92033: 8272736: [JVMCI] Add API for reading and writing JVMCI thread locals
  • 709b591: 8272553: several hotspot runtime/CommandLine tests don't check exit code
  • 1884072: 8265253: javac -Xdoclint:all gives "no comment" warning for code that can't be commented
  • 594e516: 8272778: Consolidate is_instance and is_instance_inlined in java_lang_String
  • d542745: 8267894: Skip work for empty regions in G1 Full GC
  • 741f58c: 8272417: ZGC: fastdebug build crashes when printing ClassLoaderData
  • b7f75c0: 8271142: package help is not displayed for missing X11/extensions/Xrandr.h
  • ... and 120 more: https://git.openjdk.java.net/jdk/compare/7fc99cf9b69f99fc78709e57b92cd88e09577d0f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 17, 2021
@pchilano
Copy link
Contributor Author

I was a bit worried about the clear pending exception which we had some problems with before.
So I ran my stress tests, it passed, looks good thanks!

Great, thanks for the review and extra testing Robbin!

Patricio

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this simplifies things a lot and consolidates the code when you transition from the _thread_in_vm. Essentially the real states one has to think about are _thread_in_native, _thread_in_vm and _thread_in_Java, so there's a function for each. Really nice cleanup!


SafepointMechanism::process_if_requested(thread);
thread->set_thread_state_fence(_thread_in_vm);
SafepointMechanism::process_if_requested_with_exit_check(thread, check_asyncs);
thread->set_thread_state(to);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always wonder why all the set_thread_states don't have a fence. Is it really that bad for performance to not have this unconditional, to avoid any transitions potentially needing but missing the fence?

protected:
void trans(JavaThreadState from, JavaThreadState to) { transition(_thread, from, to); }
void trans_from_java(JavaThreadState to) { transition_from_java(_thread, to); }
void trans_from_native(JavaThreadState to) { transition_from_native(_thread, to); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so happy to see these go away.

@pchilano
Copy link
Contributor Author

So this simplifies things a lot and consolidates the code when you transition from the _thread_in_vm. Essentially the real states one has to think about are _thread_in_native, _thread_in_vm and _thread_in_Java, so there's a function for each. Really nice cleanup!

Thanks for the review Coleen!

Patricio

@pchilano
Copy link
Contributor Author

Are you okay with v2 David?

Thanks,
Patricio

@@ -1644,7 +1646,7 @@ void JavaThread::check_and_handle_async_exceptions() {
case _thread_in_Java: {
ThreadInVMfromJava tiv(this);
JavaThread* THREAD = this;
Exceptions::throw_unsafe_access_internal_error(THREAD, __FILE__, __LINE__, "a fault occurred in a recent unsafe memory access operation in compiled Java code");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting back to this. There are a lot of subtleties around the throwing of an async exception related to "unsafe access errors" and I had to dig around to refresh my memory on a few things. End result: changing the error messages to say the same thing is fine.

@pchilano
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Aug 24, 2021

Going to push as commit 7454306.
Since your change was applied there have been 133 commits pushed to the master branch:

  • 0597cde: 8221360: Eliminate Shared_DirtyCardQ_lock
  • 928b972: 8271930: Simplify end_card calculation in G1BlockOffsetTablePart::verify
  • 7f80683: 8272783: Epsilon: Refactor tests to improve performance
  • 22ef4f0: 5015261: NPE may be thrown if JDesktopIcon is set to null on a JInternalFrame
  • 9bc0232: 8269223: -Xcheck:jni WARNINGs working with fonts on Linux
  • 2ff4c01: 8271600: C2: CheckCastPP which should closely follow Allocate is sunk of a loop
  • ad92033: 8272736: [JVMCI] Add API for reading and writing JVMCI thread locals
  • 709b591: 8272553: several hotspot runtime/CommandLine tests don't check exit code
  • 1884072: 8265253: javac -Xdoclint:all gives "no comment" warning for code that can't be commented
  • 594e516: 8272778: Consolidate is_instance and is_instance_inlined in java_lang_String
  • ... and 123 more: https://git.openjdk.java.net/jdk/compare/7fc99cf9b69f99fc78709e57b92cd88e09577d0f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Aug 24, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 24, 2021
@openjdk
Copy link

openjdk bot commented Aug 24, 2021

@pchilano Pushed as commit 7454306.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mlbridge
Copy link

mlbridge bot commented Sep 1, 2021

Mailing list message from David Holmes on hotspot-runtime-dev:

On 21/08/2021 8:05 am, Coleen Phillimore wrote:

On Tue, 17 Aug 2021 16:50:50 GMT, Patricio Chilano Mateo <pchilanomate at openjdk.org> wrote:

Hi all,

Please review the following cleanup of class ThreadStateTransition. I added method transition_from_vm and removed the generic transition method which was covering those transitions. This not only makes the public API more consistent given that we already have transition_from_java and transition_from_native but it also allows for some common code refactoring and it also makes more clear which transitions we exactly need to deal with and what actions need to be executed.

I adjusted the expected error message in test InternalErrorTest.java to match the _thread_in_vm case in check_and_handle_async_exceptions() because with this patch we transition to Java only after having checked the exit conditions, rather than today where we process safepoint/handshakes, set state to _thread_in_Java and then process the exit conditions. Ideally I would remove the _thread_in_Java case from the switch statement but we still have the handle_polling_page_exception() case which does everything in Java and never transitions (I will address that in another RFE), so I only adjusted the error message there in case we hit that path in this test.

Tested in mach5 tiers 1-7.

Thanks,
Patricio

Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:

restore assert in transition_from_native()

So this simplifies things a lot and consolidates the code when you transition from the _thread_in_vm. Essentially the real states one has to think about are _thread_in_native, _thread_in_vm and _thread_in_Java, so there's a function for each. Really nice cleanup!

src/hotspot/share/runtime/interfaceSupport.inline.hpp line 87:

85: thread->set_thread_state_fence(_thread_in_vm);
86: SafepointMechanism::process_if_requested_with_exit_check(thread, check_asyncs);
87: thread->set_thread_state(to);

I always wonder why all the set_thread_states don't have a fence. Is it really that bad for performance to not have this unconditional, to avoid any transitions potentially needing but missing the fence?

We do not even have unconditional release semantics on set_thread_state
because it was deemed too much of a performance risk on x86. I'm not so
sure that is really true, but a fence is a lot more heavyweight than
release semantics. See related discussion in

https://bugs.openjdk.java.net/browse/JDK-8267585

That aside I really object to adding ordering constraints "just in case
we needed it but we missed it". Understanding races and ordering
constraints is hard enough when you can see the participating code, and
nigh impossible when there are unnecessary constraints applied. Anywhere
there is a full fence we should have a comment describing exactly why a
full fence is needed at that point IMO.

Cheers,
David
-----

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
4 participants