New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8290688: Optimize x86_64 nmethod entry barriers #9569
Conversation
|
/issue JDK-8290688 |
@fisk The primary solved issue for a PR is set through the PR title. Since the current title does not contain an issue reference, it will now be updated. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
@fisk This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 43 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
|
Thanks for the review, @vnkozlov! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Erik,
Looks good to me!
I can imagine that hot taken branches, especially many of them, can become a
performance issue when the hardware fails to predict the target address.
Richard.
Thanks for the review @reinrich! |
/integrate |
Going to push as commit b28f9da.
Your commit was automatically rebased without conflicts. |
The current x86_64 nmethod entry barrier is good, but it could be a bit better. In particular, this enhancement targets the following ideas.
The alignment of the cmp instruction is 8 bytes. However, we only patch 4 bytes and the instruction length is always 8 bytes. So if we align the start of the instruction to 4 bytes only, that is enough to ensure that the immediate part of the instruction is 4 byte aligned, which is all we need (cf. http://cr.openjdk.java.net/~jrose/jvm/hotspot-cmc.html).
Today the fast path (conditionally) jumps over a call to a stub. It is not uncommon for the branch not taken path being better optimized, making it favourable to move the call to a stub out-of-line. This has the additional benefit of not polluting the instruction caches at the nmethod entry with instructions not used in the fast path. A bit messy but we can do it for at least C2 code.
For C1 and native wrappers, I don't think they are hot enough to warrant the stub machinery. But at least the jump that jumps over the cold stuff, can be shortened. I can get behind that.
Before addressing this, turning nmethod entry barriers on with G1 (e.g. by enabling loom) leads to a regression in DaCapo tradesoap-large. With this enhancement, the regression goes away, so that the cost of nmethod entry barriers is not visible.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9569/head:pull/9569
$ git checkout pull/9569
Update a local copy of the PR:
$ git checkout pull/9569
$ git pull https://git.openjdk.org/jdk pull/9569/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9569
View PR using the GUI difftool:
$ git pr show -t 9569
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9569.diff