-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8072070: Improve interpreter stack banging #7247
8072070: Improve interpreter stack banging #7247
Conversation
👋 Welcome back shade! A progress list of the required criteria for merging this PR into |
e369b6e
to
b1ed28f
Compare
Webrevs
|
Hi Aleksey, thanks for working on the stack banging code. I wanted to do so for a long time, but couldn't make it, yet. Results are impressive! A quick question. Why can't we just use something like the following on linux?
Is banging the shadow area strictly required on linux? |
(There is a large comment in AFAIU, the only OS that needs to bang page by page to commit stacks is Windows; got some funky GHA failures without it. My early patches for Linux were something like what you proposed. But the deeper I got into this, the more I realized it is safer to keep banging in order to cooperate with the rest of stack overflow machinery. For example, I am not at all sure that throwing the SOE when below Notice that watermark code effectively bangs (most useful parts of) the stack once. What I think it achieves is making OS-specific or call-flavor-specific optimization questions moot. We can do everything to handle the worst case, have the single path taken in all configurations (simplifies development/testing), and pay peanuts in performance penalties for it. |
I think it would be interesting to figure out if we can let the linux kernel do all the stack management work for us and avoid stack banging, protected pages etc. inside of hotspot completely. But that may be beyond the scope of your PR. (Windows is a different story.) I hope that I can find time to figure it out at some point of time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you this PR touches stackoverflow.hpp, Could you also take a look at this?
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stackOverflow.cpp#L66
we actually get the page size from os. why do we need alignment = 4k?
Look here: jdk/src/hotspot/share/runtime/stackOverflow.cpp Lines 42 to 45 in 48523b0
StackYellowPages , StackRedPages , StackShadowPages are defined in as 4K pages. It should probably be called unit , not alignment . I'd like to avoid scope creep for this PR, so that's for another day.
|
Done in #7362. |
Hi @shipilev , Did you test the perf improvement base on the latest jdk?
Am I missed something? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a nice optimization to me. I initially thought that calling the new limit of where we can elide stack banging "watermark" had something to do with the GC stack watermark code but "watermark" is sort of the best word for this. If we had another descriptive word that might be better, but I can't think of anything.
Thank you for fixing this! We didn't have tests showing the motivation ourselves so sorry that we ignored it for so long.
@shipilev This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 75 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
I think some benchmarks in currently public SPECjvm2008 do not work with JDK 19 due to missing dependencies. I have a hacky version that is able to work with modern JDK. I used that to estimate performance on JDK mainline. |
Fiddly code, documentation update for the core subsystem, so: /reviewers 3 |
@shipilev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I am not a reviewer. we still need reviewers to approve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. And a step into the right direction IMHO. We should check the code on other platforms, too (separately is ok).
All right, thanks for reviews! Last call for comments. I am planning to integrate it later today. |
x86-32 has some weird stack handling, particularly when using the invocation interface. I guess we assume our regression tests will catch breakage there. |
As you can see in "Additional testing", I ran |
I am sure nothing bad is going to happen if I integrate this on Friday! /integrate |
Going to push as commit 3a13425.
Your commit was automatically rebased without conflicts. |
ship it and be damned :-) |
Seems like Power is not affected by this TLB / Cache bottleneck. We use 64k pages and typically 2 store instructions for banging. On the other side, I think it's a good thing to avoid touching any storage which we don't need. So, we could overwork the PPC64 implementation, too (optionally). Or wait until more experiments have been made. |
Yes, larger VM pages mean fewer addresses to touch. OTOH, in my related experiments with removing the stack banging on compiled entry whatsoever, we seem to redeem single-digit percent improvements, even though we only touch one location far away. Anyhow, I think a good plan is to wait and see if this x86 pilot change runs into any interesting problems, before translating it to other architectures. |
This is an old issue, I submitted the first RFE about this back in 2015. This shows up every time I benchmark the interpreter-only code. Most recently, it showed up in my work to get
java.lang.invoke
infra work reasonably fast when cold, which includes lots of interpreter paths.The underlying problem is that template interpreters rebang the entire shadow zone on every method entry. This takes tens of instructions, blows out TLB caches with accessing tens of pages (on some implementations, I reckon, almost the entire L1 TLB cache!), etc. I think we can make it universally better for all template interpreters by introducing the safe limit / growth watermarks for thread stacks, so that we bang only when needed. It also drops the need for special-casing the
native_call
, because we might as well bang the entire shadow zone in native case as well.This patch makes a pilot change for x86, without touching other architectures. Other architectures can follow this example later. This is why
native_call
argument persists, even though it is not used in x86 case anymore. There is also a new test group that I found useful when debugging on Windows, that group is going to go away before integration.I tried to capture the current mechanics of stack banging in
stackOverflow.hpp
, hoping the change becomes more obvious, and so that arch-specific template interpreter codes could just reference it without copy-pasting it around.I think it is fairly complete, and so would like to solicit more feedback and testing here.
Point runs on SPECjvm2008 with
-Xint
shows huge improvements on half of the tests, without any regressions:My new
java.lang.invoke
benchmarks improve a lot as well:It also palpably improves startup even on small HelloWorld, even when compilers are present:
Additional testing:
tier1
tier2
tier3
tier4
tier1
tier2
tier3
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7247/head:pull/7247
$ git checkout pull/7247
Update a local copy of the PR:
$ git checkout pull/7247
$ git pull https://git.openjdk.java.net/jdk pull/7247/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7247
View PR using the GUI difftool:
$ git pr show -t 7247
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7247.diff