Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8308469: [PPC64] Implement alternative fast-locking scheme #14069

Closed

Conversation

TheRealMDoerr
Copy link
Contributor

@TheRealMDoerr TheRealMDoerr commented May 20, 2023

New alternative fast-locking scheme for PPC64. Mostly implemented like on other platforms.
Differences (also explained by comments in code):

  • Not using C2HandleAnonOMOwnerStub because the C2 code is reused for native wrappers.
  • Implemented a helper function MacroAssembler::atomically_flip_locked_state which makes it much easier to implement fast_lock/unlock for PPC64 (mainly because of register constraints in C1).
  • Using acquire/release barriers only for locking/unlocking.

I have changed the C2 code to use ConditionRegister CR0 which fits better to the new locking code. Therefore, I have adapted the other modes to work with that, too.
Note that we don't support RTM with new locking modes. That feature will probably get removed in a future JDK version. (Already unsupported with Power10.)


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8308469: [PPC64] Implement alternative fast-locking scheme

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14069/head:pull/14069
$ git checkout pull/14069

Update a local copy of the PR:
$ git checkout pull/14069
$ git pull https://git.openjdk.org/jdk.git pull/14069/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14069

View PR using the GUI difftool:
$ git pr show -t 14069

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14069.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 20, 2023

👋 Welcome back mdoerr! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented May 20, 2023

@TheRealMDoerr The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label May 20, 2023
@TheRealMDoerr TheRealMDoerr marked this pull request as ready for review May 22, 2023 11:03
@openjdk openjdk bot added the rfr Pull request is ready for review label May 22, 2023
@mlbridge
Copy link

mlbridge bot commented May 22, 2023

Webrevs

Copy link
Member

@reinrich reinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Martin,

thanks for doing the port. Just a few remarks and questions.

Cheers, Richard.

Comment on lines +2703 to +2707
if (flag != CCR0) {
mcrf(flag, CCR0);
}
beq(CCR0, success);
b(failure);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvements!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! And thanks for reviewing it!

@@ -12139,7 +12139,7 @@ instruct partialSubtypeCheck(iRegPdst result, iRegP_N2P subklass, iRegP_N2P supe

// inlined locking and unlocking

instruct cmpFastLock(flagsReg crx, iRegPdst oop, iRegPdst box, iRegPdst tmp1, iRegPdst tmp2) %{
instruct cmpFastLock(flagsRegCR0 crx, iRegPdst oop, iRegPdst box, iRegPdst tmp1, iRegPdst tmp2) %{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about the usage of CR0. First it's not always clear if CR0 is altered when using the MacroAssembler, especially if not hacking ppc assembler on a daily basis. And also there might be a few instruction forms in the AD files that haven't got an effect defined for CR0 even though they alter it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are already some nodes which use CR0 as result which is tested for quite some time. Nodes which kill it should ideally have a "kill cr0" effect, but we don't need to rely on it. C2 doesn't place other nodes between the one which sets the ConditionRegister and the conditional branch. So, using CR0 should be reliable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is indeed not very clear which macro assembler part kills CR0. In this case, I have made changes to support providing CR0 EQ on success and CR0 NE on failure. That is close to what other platforms do which use flags as results of lock/unlock. It is fragile, but at least, it's similar on all platforms, now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

// Check for non-symmetric locking. This is allowed by the spec and the interpreter
// must handle it.
Register tmp = current_header;
// First check for lock-stack underflow.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this is not done on x86? It's better to have that check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange. The other platforms have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rkennke: Do you know why this is different on x86?

cmpd(CCR0, tmp, object);
bne(CCR0, slow_case);

ld(displaced_header, oopDesc::mark_offset_in_bytes(), object);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rename displaced_header to just header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

Comment on lines 2845 to 2854
if (LockingMode == LM_LIGHTWEIGHT) {
// Some other platforms use a C2HandleAnonOMOwnerStub stub.
// This doesn't work well here, because we reach here for native wrappers, too.
// In addition, the stub may be out of range for conditional branches.
// The fixup code is not long, so we do it among other special cases below.

// If the owner is anonymous, we need to fix it.
andi_(R0, temp, ObjectMonitor::ANONYMOUS_OWNER);
bne(CCR0, anonymous);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is surly very rare. Can't we take the slow path, i.e. branch to failure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought that, too. Yes, it is possible. But then I thought they probably added the path for C2 for a reason. Otherwise they would probably not have spent the extra effort for implementing the stub. So, I prefer to have it, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if @rkennke remembers. Roman I wonder why you've implemented the C2HandleAnonOMOwnerStub instead of just taking the slow path into the runtime? @TheRealMDoerr wants to implement the logic inline and I'm actually worried if that will pay off the cost.
Maybe we should add an illegal instruction to see if it's reached. Do you know a benchmark that would be good for such a test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if @rkennke remembers. Roman I wonder why you've implemented the C2HandleAnonOMOwnerStub instead of just taking the slow path into the runtime? @TheRealMDoerr wants to implement the logic inline and I'm actually worried if that will pay off the cost. Maybe we should add an illegal instruction to see if it's reached. Do you know a benchmark that would be good for such a test?

Handling the anonymous owner there is a rather rare situation. Handling this inline is possible, but it goes against static branch prediction: branch predictors in modern CPUs (at least in x86 and aarch64, not sure about ppc) assume that forward-branches are commonly not taken. However, we would very often branch forward in this situation and tend to cause a pipeline stall. It is better to leave the common path clean and short, and branch out to the stub in the uncommon case of handling the anonymous owner. Some of the renaissance benchmarks have been rather sensitive to the performance of monitor-locking (but I don't remember which ones ;-) ).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was unclear about what I meant with 'inline': I meant part of the instruction sequence that FastUnlock is expanded to which isn't just a basic block. In that sequence it is out-of-line because it is at the end. So the forward branch that needs to be taken if it has an anonymous owner will be well predicted: not taken; continue with the next instruction.
Reading again the inflation coding I see that there was a misunderstanding on my side. I though the actual owner will be set right when the monitor was successfully installed, but this isn't done of course, because finding the actual owner is expensive with LM_LIGHTWEIGHT locking. Sorry my bad.
So it is not as unlikely as I thought, I'm still not totally convinced though that it is beneficial to handle the anonymous owner as part of FastUnlock case but I won't object against it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct that's what I meant. I was just not sure if there is a reason not to do that but have a dedicated C2HandleAnonOMOwnerStub to handle that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't even need to check the ANONYMOUS_OWNER bit because that case is also handled by

  cmpd(flag, temp, R16_thread);
  bne(flag, failure);

If temp has the ANONYMOUS_OWNER set it's unequal to the current thread and we jump to failure with CR0 NE.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed it and will rerun tests. Please take a look. And thanks for the discussion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had just a quick look. Looks good.

Copy link
Member

@reinrich reinrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the handling of the anonymous case. Let's see what others think.
Otherwise good!

Thanks, Richard.

@openjdk
Copy link

openjdk bot commented May 26, 2023

@TheRealMDoerr This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8308469: [PPC64] Implement alternative fast-locking scheme

Reviewed-by: rrich, lucy

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 189 new commits pushed to the master branch:

  • e827164: 8309146: extend JDI StackFrame.setValue() and JDWP StackFrame.setValues minimal support for virtual threads
  • be36096: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe
  • c6f20db: 8308232: nsk/jdb tests don't pass -verbose flag to the debuggee
  • d987176: 8307794: Test for HSS/LMS Signature Verification
  • 050425b: 8298127: HSS/LMS Signature Verification
  • a6109bf: 8308856: jdk.internal.classfile.impl.EntryMap::nextPowerOfTwo math problem
  • 6adc242: 8308943: jdk.internal.le build fails on AIX
  • 39f6d80: 8307990: jspawnhelper must close its writing side of a pipe before reading from it
  • 4460429: 8308803: Improve java/util/UUID/UUIDTest.java
  • dfd3da3: 8307683: Loop Predication should not hoist range checks with trap on success projection by negating their condition
  • ... and 179 more: https://git.openjdk.org/jdk/compare/939344b8433b32166f42ad73ae3d96e84b033478...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 26, 2023
Copy link
Contributor

@RealLucy RealLucy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
I tried hard but could not find anything to complain about.

@TheRealMDoerr
Copy link
Contributor Author

Thanks for the reviews!
/integrate

@openjdk
Copy link

openjdk bot commented Jun 1, 2023

Going to push as commit 0ab0963.
Since your change was applied there have been 190 commits pushed to the master branch:

  • ec55539: 8309138: Fix container tests for jdks with symlinked conf dir
  • e827164: 8309146: extend JDI StackFrame.setValue() and JDWP StackFrame.setValues minimal support for virtual threads
  • be36096: 8302822: Method/Field/Constructor/RecordComponent::getGenericInfo() is not thread safe
  • c6f20db: 8308232: nsk/jdb tests don't pass -verbose flag to the debuggee
  • d987176: 8307794: Test for HSS/LMS Signature Verification
  • 050425b: 8298127: HSS/LMS Signature Verification
  • a6109bf: 8308856: jdk.internal.classfile.impl.EntryMap::nextPowerOfTwo math problem
  • 6adc242: 8308943: jdk.internal.le build fails on AIX
  • 39f6d80: 8307990: jspawnhelper must close its writing side of a pipe before reading from it
  • 4460429: 8308803: Improve java/util/UUID/UUIDTest.java
  • ... and 180 more: https://git.openjdk.org/jdk/compare/939344b8433b32166f42ad73ae3d96e84b033478...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 1, 2023
@openjdk openjdk bot closed this Jun 1, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 1, 2023
@openjdk
Copy link

openjdk bot commented Jun 1, 2023

@TheRealMDoerr Pushed as commit 0ab0963.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@TheRealMDoerr TheRealMDoerr deleted the 8308469_PPC64_fast_locking branch June 1, 2023 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
4 participants