Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8303154: Investigate and improve instruction cache flushing during compilation #12877

Closed
wants to merge 6 commits into from

Conversation

dafedafe
Copy link
Contributor

@dafedafe dafedafe commented Mar 6, 2023

It was noticed that we flush the instruction cache too much for a single C1 compilation. The same is true for the C2 compilation.
There are several places in the code where the instruction cache is called and many of them are very intertwined (see bug report).

This PR is meant to be a "minimum" set of changes that improve the situation without introducing excessive extra information to keep track of the origin of the call through call stacks. This is done by avoiding calls to flush the ICache:

  • at Compilation::emit_code_epilog
  • when calling CodeCache::commit as flushing is done anyway when copying from the temporary buffer into the code cache in CodeBuffer::copy_code_to. This results in flushing the ICache only once instead of 3 times for a C1 compilation and twice for a C2 compilation. Additionally this halves the number of flushes during adapters generation (lots of calls).
  • at SharedRuntime::generate_i2c2i_adapters as this is called with a temporary buffer and an ICache flush is not needed

This change decreases the number of calls to flush the ICache for a simple Hello world program on Mac OSX aarch64 from 3569 to 2028 on C1 (43.2% improvement) and from 3572 to 1952 on C2 (45.4% improvement).

This fix includes changes for x86_32/64 and aarch64, which I could test thoroughly but also for arm and riscv, for which I would need some help with testing.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8303154: Investigate and improve instruction cache flushing during compilation

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12877/head:pull/12877
$ git checkout pull/12877

Update a local copy of the PR:
$ git checkout pull/12877
$ git pull https://git.openjdk.org/jdk pull/12877/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12877

View PR using the GUI difftool:
$ git pr show -t 12877

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12877.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 6, 2023

👋 Welcome back dafedafe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 6, 2023

@dafedafe The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Mar 6, 2023
Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. As we discussed, please file a follow-up RFE for the remaining investigations around excessive icache flushing.

@dafedafe dafedafe marked this pull request as ready for review March 10, 2023 10:06
@openjdk
Copy link

openjdk bot commented Mar 10, 2023

@dafedafe This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8303154: Investigate and improve instruction cache flushing during compilation

Reviewed-by: thartmann, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 264 new commits pushed to the master branch:

  • eefbaa2: 8283400: [macos] a11y : Screen magnifier does not reflect JRadioButton value change
  • 42dd907: 8302906: AArch64: Add SVE backend support for vector unsigned comparison
  • 2b81fae: 8303022: "assert(allocates2(pc)) failed: not in CodeBuffer memory" When linking downcall handle
  • be08a25: 8304264: Debug messages always show up for NativeGSS
  • 1ae69e3: 8304287: Problemlist java/net/SocketOption/OptionsTest.java
  • 116627d: 8304267: JDK-8303415 missed change in Zero Interpreter
  • 824a5e4: 8284047: Harmonize/Standardize the SSLSocket/SSLEngine/SSLSocketSSLEngine test templates
  • 7ad48ea: 8300317: vmTestbase/nsk/stress/strace/strace* tests fail with "ERROR: wrong lengths of stack traces"
  • 35a2969: 8302659: Modernize Windows native code for NetworkInterface
  • 01e6920: 8298935: fix independence bug in create_pack logic in SuperWord::find_adjacent_refs
  • ... and 254 more: https://git.openjdk.org/jdk/compare/8f7c4969c28c58ae4b9adeed822707b28be16dd0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@TobiHartmann, @vnkozlov) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 10, 2023
@mlbridge
Copy link

mlbridge bot commented Mar 10, 2023

Webrevs

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you look on how many times we flush ICache during adapters generation?
It has most numerous cases when I looked on it:
"CodeCache::commit() is also used for adapters. But adapters uses RuntimeBlob which calls CodeBuffer::copy_code_to()."

I thought we would remove flush from CodeCache::commit() and not from copy_code_to().

@dafedafe
Copy link
Contributor Author

dafedafe commented Mar 13, 2023

Did you look on how many times we flush ICache during adapters generation? It has most numerous cases when I looked on it: "CodeCache::commit() is also used for adapters. But adapters uses RuntimeBlob which calls CodeBuffer::copy_code_to()."

@vnkozlov the ICache flushing was called 1596 times during adapters generation. You're right, these are by far the most calls and the flush calls are also performed twice in these cases, once in CodeBuffer::copy_code_to() and once in CodeCache::commit() (I've missed it).

I thought we would remove flush from CodeCache::commit() and not from copy_code_to().

I thought it would make more sense to keep the flush in CodeCache::commit() since it was generally the last call made but, in light of what you're pointing out, it would definitely make more sense to remove it from CodeCache::commit() and only leave it in copy_code_to. This also halves the number of flushing coming from adapters.

@dafedafe
Copy link
Contributor Author

@vnkozlov @TobiHartmann I pushed the changes if you want to have a look at them again. Thanks a lot!

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. What are new numbers of of calls to flush the ICache?

@vnkozlov
Copy link
Contributor

@dafedafe I look on stack traces you collected. Please look on this:

 732  Stack: 
    ICache::invalidate_range(unsigned char *, int) icache_bsd_aarch64.hpp:41
    AbstractAssembler::flush() assembler.cpp:110
    SharedRuntime::generate_i2c2i_adapters(MacroAssembler *, int, int, const BasicType *, const VMRegPair *, AdapterFingerPrint *) sharedRuntime_aarch64.cpp:794
    AdapterHandlerLibrary::create_adapter(AdapterBlob *&, int, BasicType *, bool) sharedRuntime.cpp:2970 

generate_i2c2i_adapters is called for temporary buffer so we don't need to flush it:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/sharedRuntime.cpp#L2870

@dafedafe
Copy link
Contributor Author

Good. What are new numbers of of calls to flush the ICache?

The total number of flushes for the HelloWorld on Mac OSX aarch64 go from 3569 to 2756 on C1 (22.8% improvement) and from 3572 to 2685 on C2 (24.1% improvement).

@dafedafe
Copy link
Contributor Author

@dafedafe I look on stack traces you collected. Please look on this:

 732  Stack: 
    ICache::invalidate_range(unsigned char *, int) icache_bsd_aarch64.hpp:41
    AbstractAssembler::flush() assembler.cpp:110
    SharedRuntime::generate_i2c2i_adapters(MacroAssembler *, int, int, const BasicType *, const VMRegPair *, AdapterFingerPrint *) sharedRuntime_aarch64.cpp:794
    AdapterHandlerLibrary::create_adapter(AdapterBlob *&, int, BasicType *, bool) sharedRuntime.cpp:2970 

generate_i2c2i_adapters is called for temporary buffer so we don't need to flush it:

Yes, thanks a lot @vnkozlov! I've removed the flushing there too.
Now the number of flushes for HelloWorld is down to 2028 with C1 (43% improvement).

@vnkozlov
Copy link
Contributor

Nice! Good work. Now please thoroughly test it.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, Vladimir. New version looks good to me too.

@dafedafe
Copy link
Contributor Author

dafedafe commented Mar 14, 2023

This fix includes some changes to sharedRuntime for multiple platforms. I could thoroughly test on x86_32/64 and aarch64 but I'd need some help testing on ARM and RISCV.
@RealFYang could you or someone you know please run some testing for RISCV to make sure my change doesn't break anything? @bulasevich @shqking could I ask you the same for ARM?
Thank you very much in advance for your help!

@vnkozlov
Copy link
Contributor

@dafedafe as followup RFE (I don't want to add more changes to this PR) look on all uses of masm::flush() for temporary buffers. I see such uses in some other SharedRuntime methods: SharedRuntime::generate_resolve_blob
There are may be only few but since we start fixing such cases we should finish.

@TobiHartmann
Copy link
Member

FTR, the follow-up RFE is JDK-8303971.

@dafedafe
Copy link
Contributor Author

@dafedafe as followup RFE (I don't want to add more changes to this PR) look on all uses of masm::flush() for temporary buffers. I see such uses in some other SharedRuntime methods: SharedRuntime::generate_resolve_blob
There are may be only few but since we start fixing such cases we should finish.

Sure! Thanks for the hint @vnkozlov.

@RealFYang
Copy link
Member

Hi, I performed tier1-3 tests on linux-riscv64 boards, result looks good.

@dafedafe
Copy link
Contributor Author

@RealFYang thank you very much for running the tests!

@bulasevich
Copy link
Contributor

HI, tier1-2 tests on ARM32 are OK!

@dafedafe
Copy link
Contributor Author

@bulasevich thanks a lot for testing!

@dafedafe
Copy link
Contributor Author

@TobiHartmann @vnkozlov thanks a lot for your reviews!

@dafedafe
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 16, 2023
@openjdk
Copy link

openjdk bot commented Mar 16, 2023

@dafedafe
Your change (at version 0a490b2) is now ready to be sponsored by a Committer.

@TobiHartmann
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Mar 16, 2023

Going to push as commit b7945bc.
Since your change was applied there have been 264 commits pushed to the master branch:

  • eefbaa2: 8283400: [macos] a11y : Screen magnifier does not reflect JRadioButton value change
  • 42dd907: 8302906: AArch64: Add SVE backend support for vector unsigned comparison
  • 2b81fae: 8303022: "assert(allocates2(pc)) failed: not in CodeBuffer memory" When linking downcall handle
  • be08a25: 8304264: Debug messages always show up for NativeGSS
  • 1ae69e3: 8304287: Problemlist java/net/SocketOption/OptionsTest.java
  • 116627d: 8304267: JDK-8303415 missed change in Zero Interpreter
  • 824a5e4: 8284047: Harmonize/Standardize the SSLSocket/SSLEngine/SSLSocketSSLEngine test templates
  • 7ad48ea: 8300317: vmTestbase/nsk/stress/strace/strace* tests fail with "ERROR: wrong lengths of stack traces"
  • 35a2969: 8302659: Modernize Windows native code for NetworkInterface
  • 01e6920: 8298935: fix independence bug in create_pack logic in SuperWord::find_adjacent_refs
  • ... and 254 more: https://git.openjdk.org/jdk/compare/8f7c4969c28c58ae4b9adeed822707b28be16dd0...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 16, 2023
@openjdk openjdk bot closed this Mar 16, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 16, 2023
@openjdk
Copy link

openjdk bot commented Mar 16, 2023

@TobiHartmann @dafedafe Pushed as commit b7945bc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
5 participants