8277180: Intrinsify recursive ObjectMonitor locking for C2 x64 and A64 #6406
src/hotspot/cpu/aarch64/aarch64.ad
__ br(Assembler::NE, cont);

__ cmp(disp_hdr, (u1)0);
__ br(Assembler::EQ, notRecursive);
You can replace these two with a single __ cbz(disp_hdr, notRecursive) and avoid clobbering the flags.
That is a good idea. BTW note that in the unlocking path for AArch64 there is an ownership check, while in the x86_64 code there is only a comment saying we definitely need one of those, but it doesn't actually check the owner. @dholmes-ora did some digging and it seems like this was previously controlled by some ancient sync flag that isn't around anymore. It would only exist to check for unbalanced JNI locking, and the JNI spec kind of says you shouldn't do that - that's a programmer error. So it seems like just not doing the ownership check is totally fine, and seems to yield 10% better performance in some workloads where there is contended locking. But I don't want to remove that check as part of this change - just something to keep in mind for a future RFE.
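To make the trade-off in the comment above concrete, here is a minimal single-threaded sketch of an unlock fast path with and without the ownership check. `ToyLock`, `exit_checked` and `exit_unchecked` are hypothetical names invented for this illustration; this is not HotSpot's ObjectMonitor code, and it models neither waiters nor memory ordering.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an inflated monitor (assumption, not HotSpot code).
struct ToyLock {
  const void* owner = nullptr;   // thread that holds the lock, or nullptr
  std::intptr_t recursions = 0;  // extra acquisitions by the owner
};

enum class ExitResult { Done, SlowPath };

// Variant with the ownership check: a non-owner (for example unbalanced
// JNI unlocking) is diverted to the slow path and the lock stays intact.
ExitResult exit_checked(ToyLock& m, const void* self) {
  if (m.owner != self) {
    return ExitResult::SlowPath;
  }
  if (m.recursions != 0) {
    --m.recursions;  // recursive exit: owner keeps the lock
    return ExitResult::Done;
  }
  m.owner = nullptr;  // final exit: release the lock
  return ExitResult::Done;
}

// Variant without the check: one compare fewer, but it trusts that the
// caller really is the owner, as balanced locking guarantees.
void exit_unchecked(ToyLock& m) {
  if (m.recursions != 0) {
    --m.recursions;
    return;
  }
  m.owner = nullptr;
}
```

In this model the unchecked variant is only correct when locking is balanced, which is the property the comment argues can be assumed.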
src/hotspot/cpu/aarch64/aarch64.ad
__ ldr(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value));
__ add(tmp, tmp, 1u);
__ str(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value));
Suggested change:
- __ ldr(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value));
- __ add(tmp, tmp, 1u);
- __ str(tmp, Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value));
+ __ increment(Address(disp_hdr, ObjectMonitor::recursions_offset_in_bytes() - markWord::monitor_value));
The increment macro doesn't seem to utilize the fact that 1u can be encoded as an immediate to the add instruction. So it seems to generate worse code here. I'm okay with changing to increment anyway if you prefer that.
> The increment macro doesn't seem to utilize the fact that 1u can be encoded as an immediate to the add instruction.

Sure it does. Try it. If it doesn't, we'll change increment()! 😁
Oh yeah look at that. I disassembled it and it did the right thing. Thanks for the suggestion.
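As a side note on the address expression in the snippets above: the register holds a monitor pointer that still carries the monitor tag bits, so the tag is subtracted from the field offset instead of being masked off first. A rough, self-contained sketch of that addressing trick follows. `ToyMonitor`, `kMonitorTag`, `tag` and `recursions_addr` are hypothetical names, and the tag value 2 is an assumption chosen to mirror markWord::monitor_value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical monitor layout (assumption, not HotSpot's ObjectMonitor).
struct alignas(8) ToyMonitor {
  void* owner;
  std::intptr_t recursions;
};

// Assumed tag value in the low bits of the pointer; alignment keeps them free.
constexpr std::uintptr_t kMonitorTag = 2;

// Build the tagged pointer, as it would sit in a mark word.
std::uintptr_t tag(ToyMonitor* m) {
  return reinterpret_cast<std::uintptr_t>(m) | kMonitorTag;
}

// Address the recursions field straight from the tagged pointer by folding
// the tag into the displacement: tagged + (offset - tag).
std::intptr_t* recursions_addr(std::uintptr_t tagged) {
  return reinterpret_cast<std::intptr_t*>(
      tagged + offsetof(ToyMonitor, recursions) - kMonitorTag);
}
```

Incrementing through `recursions_addr(tag(&m))` then updates `m.recursions` directly, without a separate untagging step.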
Normally I would hate any code added to our hand-carved assembler sequences, but even I have to admit that this surprisingly simple addition is worthwhile.
@fisk This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 152 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
Thanks for the review @theRealAph and @nick-arm
AArch64 changes LGTM.
Any takers for the x86_64 code?
Sure, and as far as I know no-one took away my x86 programmer's badge yet. LGTM.
Thanks Andrew. I think we can trust your x86 skills as well. :-)
/integrate
Going to push as commit d93b238.
Your commit was automatically rebased without conflicts.
The C2 fast_lock and fast_unlock intrinsics don't support recursive ObjectMonitor locking, and some workloads can benefit significantly from it. Recent ObjectMonitor work has changed the heuristics so that ObjectMonitors are deflated less aggressively. We can therefore expect to see more inflated monitors in workloads where we would usually see more stack locks. That in itself is fine, except that C2 doesn't intrinsify the recursive locking paths for ObjectMonitors. Enabling those cases in the C2 code removes a regression of about 17% that we have seen with DaCapo h2 -t 1, and makes a few more benchmarks happy as well.
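As a rough illustration of what handling the recursive case in the fast path means, here is a minimal C++ model. `ToyMonitor`, `fast_enter` and `fast_exit` are hypothetical names invented for this sketch. The real change emits assembly from the C2 fast_lock and fast_unlock code, and this model leaves out waiters, deflation and the slow path itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an inflated monitor (assumption).
struct ToyMonitor {
  std::atomic<const void*> owner{nullptr};
  std::intptr_t recursions = 0;  // only touched by the owning thread
};

// Returns true if the fast path took the lock; false means the caller
// would have to fall back to a slow path (not modelled here).
bool fast_enter(ToyMonitor& m, const void* self) {
  const void* expected = nullptr;
  if (m.owner.compare_exchange_strong(expected, self,
                                      std::memory_order_acquire)) {
    return true;  // uncontended acquire
  }
  if (expected == self) {
    ++m.recursions;  // recursive acquire handled without leaving the fast path
    return true;
  }
  return false;  // owned by another thread
}

// Exit for the owning thread: undo one recursion, or release the lock.
void fast_exit(ToyMonitor& m, const void* self) {
  assert(m.owner.load(std::memory_order_relaxed) == self);
  if (m.recursions != 0) {
    --m.recursions;
    return;
  }
  m.owner.store(nullptr, std::memory_order_release);
}
```

Without the `expected == self` branch, every nested acquisition by the owner would return false and take the slow path, which is the cost the description attributes to recursive locking on inflated monitors.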
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6406/head:pull/6406
$ git checkout pull/6406
Update a local copy of the PR:
$ git checkout pull/6406
$ git pull https://git.openjdk.java.net/jdk pull/6406/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6406
View PR using the GUI difftool:
$ git pr show -t 6406
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6406.diff