8231349: Move intrinsic stubs generation to compiler runtime initialization code #13096

Closed
wants to merge 6 commits into from


vnkozlov
Contributor

@vnkozlov vnkozlov commented Mar 20, 2023

Based on performance data (see the graph in the RFE), I propose implementing @cl4es's suggestion to move intrinsic stubs generation to the C2 (and JVMCI) runtime initialization code.

It differs by less than 1% from not generating these stubs at all. We will not win on single-core VMs, but I think it is a simpler and safer solution. It also automatically (with no new code) skips generating these stubs if C2 is not used (-Xint or a low TieredStopAtLevel).

On-demand stubs generation requires synchronization between threads while the application runs, which may introduce instability and possibly other issues. But it could benefit the Interpreter and C1 if we want them to use more intrinsic stubs (they use only CRC32 now). I filed a separate RFE, 8304422, for that.
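To illustrate the synchronization cost mentioned above, here is a minimal standalone sketch in standard C++. It is not HotSpot code: `OnDemandStub`, `entry`, `generate`, and `count_generations` are hypothetical names, and the mutex with double-checked flag is only one assumed way to guard first use.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-in for one intrinsic stub that is generated on first use.
class OnDemandStub {
 public:
  // Returns the stub "entry point", generating it on the first request.
  // Callers may be any thread while the application is already running.
  int entry() {
    if (!_ready.load(std::memory_order_acquire)) {      // fast path, no lock
      std::lock_guard<std::mutex> guard(_lock);         // slow path, serialized
      if (!_ready.load(std::memory_order_relaxed)) {    // re-check under lock
        generate();
        _ready.store(true, std::memory_order_release);
      }
    }
    return _entry;
  }
  int times_generated() const { return _generated; }

 private:
  void generate() {
    _generated++;
    _entry = 42;  // placeholder for a real code address
  }
  std::mutex _lock;
  std::atomic<bool> _ready{false};
  int _generated = 0;
  int _entry = 0;
};

// Requests the stub `requests` times and returns how often it was generated.
int count_generations(int requests) {
  OnDemandStub stub;
  for (int i = 0; i < requests; i++) {
    int e = stub.entry();
    assert(e == 42);
    (void)e;
  }
  return stub.times_generated();
}
```

Every `entry()` call pays at least an atomic load, and the first callers contend on the lock. Generating all compiler stubs once during compiler runtime initialization, as this PR proposes, avoids that per-use check.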

Changes:

  • Added a new platform-specific diagnostic flag, -XX:+MoveIntrinsicStubsGen. It is ON by default if the VM is built with the C2 or JVMCI compilers, except on Zero and 32-bit Arm VMs, which have no or few intrinsics.
  • Split the StubGenerator::generate_all() method into two: generate_final_stubs() and generate_compiler_stubs(). Only the C2 (and JVMCI) intrinsic stubs generation moved to the new method.
  • Renamed methods and stubs buffer sizes to match the new code. There are now 4 separately named stubs buffers and corresponding methods: Initial, Continuation, Compiler, Final.
  • Added new UL printing to find the new buffer sizes, and adjusted them on aarch64 and x86. On other platforms I kept the previous value for both compiler_stubs and final_stubs:
> java -Xlog:stubs -XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:+VerifyStackAtCalls -version
[0.006s][info][stubs] StubRoutines (initial stubs)	 [0x00007f94900fcc00, 0x00007f9490101b60] used: 16152, free: 4168
[0.026s][info][stubs] StubRoutines (continuation stubs)	 [0x00007f9490102580, 0x00007f9490102e90] used: 741, free: 1579
[0.051s][info][stubs] StubRoutines (final stubs)	 [0x00007f9490155600, 0x00007f949015cc70] used: 26484, free: 3836
[0.090s][info][stubs] StubRoutines (compiler stubs)	 [0x00007f94904ccc00, 0x00007f94904d9bd0] used: 46988, free: 6212
java version "21-internal" 2023-09-19 LTS
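The summary lines above can be checked mechanically: for each buffer, `used + free` should equal the length of the printed address range. For the compiler stubs line that is 46988 + 6212 = 53200 bytes. The following is a rough sketch of such a check, assuming the line layout shown in this log; `StubBufferInfo`, `parse_stub_buffer_line`, and `buffer_capacity` are hypothetical helper names, not part of the JDK.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// One parsed "StubRoutines (...)" summary line from -Xlog:stubs output.
struct StubBufferInfo {
  unsigned long long start = 0;
  unsigned long long end = 0;
  unsigned long long used = 0;
  unsigned long long free_bytes = 0;
  unsigned long long capacity() const { return used + free_bytes; }
  // True when the address range length matches used + free.
  bool consistent() const { return end - start == capacity(); }
};

// Parses the "[start, end] used: N, free: M" tail of a log line.
// Returns false if the line does not have that shape.
bool parse_stub_buffer_line(const char* line, StubBufferInfo* out) {
  const char* range = std::strstr(line, "[0x");  // skips the "[0.090s]" prefix
  if (range == nullptr) return false;
  int fields = std::sscanf(range, "[%llx, %llx] used: %llu, free: %llu",
                           &out->start, &out->end, &out->used, &out->free_bytes);
  return fields == 4;
}

// Convenience wrapper: capacity in bytes of the buffer described by `line`.
unsigned long long buffer_capacity(const char* line) {
  StubBufferInfo info;
  bool ok = parse_stub_buffer_line(line, &info);
  assert(ok && info.consistent());
  (void)ok;
  return info.capacity();
}
```

Running a check like this with flags that enlarge the generated code gives a measured worst case to compare against the configured buffer size.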

-Xlog:stubs=debug will print size information for each stub:
[0.005s][debug][stubs] ICache::flush_icache_stub [0x00007fb2d3828080, 0x00007fb2d382809d] (29 bytes)
[0.005s][debug][stubs] VM_Version::get_cpu_info_stub [0x00007fb2d3828380, 0x00007fb2d3828714] (916 bytes)
[0.005s][debug][stubs] VM_Version::detect_virt_stub [0x00007fb2d3828714, 0x00007fb2d382872e] (26 bytes)
[0.005s][debug][stubs] StubRoutines::forward exception [0x00007fb2d3828c00, 0x00007fb2d3828c92] (146 bytes)

Testing: tier1-7, Xcomp, stress on x64 and aarch64.

I have changes for all platforms. Please test them on the platforms you support.
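The split described in the Changes list can be pictured with a toy model. This is only a sketch under assumptions: `ToyStubGenerator` and `stubs_after_startup` are hypothetical, the stub names are placeholders, and the enum simply mirrors the four buffer names from the description. It records names instead of emitting machine code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors the four named stub groups from the change list (names assumed).
enum class StubsKind { Initial, Continuation, Compiler, Final };

// Toy generator: one instance generates exactly one group of stubs.
class ToyStubGenerator {
 public:
  explicit ToyStubGenerator(StubsKind kind) {
    switch (kind) {
      case StubsKind::Initial:      generate_initial_stubs();      break;
      case StubsKind::Continuation: generate_continuation_stubs(); break;
      case StubsKind::Compiler:     generate_compiler_stubs();     break;
      case StubsKind::Final:        generate_final_stubs();        break;
    }
  }
  const std::vector<std::string>& generated() const { return _generated; }

 private:
  // Placeholder stub names, for illustration only.
  void generate_initial_stubs()      { _generated.push_back("initial_stub_placeholder"); }
  void generate_continuation_stubs() { _generated.push_back("continuation_stub_placeholder"); }
  void generate_compiler_stubs()     { _generated.push_back("compiler_stub_placeholder"); }
  void generate_final_stubs()        { _generated.push_back("final_stub_placeholder"); }
  std::vector<std::string> _generated;
};

// Models startup: the compiler group is generated only when a compiler
// runtime is initialized, and it comes last, as in the log order above.
std::vector<std::string> stubs_after_startup(bool compiler_runtime_used) {
  std::vector<StubsKind> stages = {StubsKind::Initial, StubsKind::Continuation,
                                   StubsKind::Final};
  if (compiler_runtime_used) stages.push_back(StubsKind::Compiler);

  std::vector<std::string> all;
  for (StubsKind kind : stages) {
    ToyStubGenerator gen(kind);
    assert(!gen.generated().empty());
    all.insert(all.end(), gen.generated().begin(), gen.generated().end());
  }
  return all;
}
```

In this model, a run without a compiler runtime (for example with -Xint) never reaches the `Compiler` stage, which corresponds to the "no need for new code" property claimed in the description.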


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8231349: Move intrinsic stubs generation to compiler runtime initialization code

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13096/head:pull/13096
$ git checkout pull/13096

Update a local copy of the PR:
$ git checkout pull/13096
$ git pull https://git.openjdk.org/jdk.git pull/13096/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13096

View PR using the GUI difftool:
$ git pr show -t 13096

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13096.diff

@bridgekeeper

bridgekeeper bot commented Mar 20, 2023

👋 Welcome back kvn! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 20, 2023

@vnkozlov The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Mar 20, 2023
Member

@cl4es cl4es left a comment

FWIW this looks good to me!

Perhaps there are some improvements that can be made (see the inline comments regarding the count_positives stub), but it might be prudent not to spend more time than necessary on this if anyone will be looking at https://bugs.openjdk.org/browse/JDK-8304422 soon enough.

if (UseSVE == 0) {
StubRoutines::aarch64::_vector_iota_indices = generate_iota_indices("iota_indices");
}

// arraycopy stubs used by compilers
generate_arraycopy_stubs();

// countPositives stub for large arrays.
StubRoutines::aarch64::_count_positives = generate_count_positives(StubRoutines::aarch64::_count_positives_long);
Member

A small detail, but I am pretty certain this stub is only used by C2 and could be moved to generate_compiler_stubs. It does raise the question of whether there are more stubs that look shared but are really only used by C2.

For historical reasons this intrinsic was implemented with a macro+stub on aarch64, unlike on x64 et al. When doing so, the macro was defined in MacroAssembler and not C2_MacroAssembler, but it is effectively only used from aarch64.ad.

It might be interesting to make C1 (and possibly interpreter) use this stub when available, but if/when that happens moving it back to generate_final_stubs is relatively straightforward.

_initial_stubs_code_size = 10000,
_continuation_stubs_code_size = 2000,
_compiler_stubs_code_size = 30000,
_final_stubs_code_size = 20000
Member

The tricky part when updating these is knowing which set of CPU features and VM flags will generate the largest possible stubs, but it looks like you've added ample free space with these estimates.

Contributor Author

@vnkozlov vnkozlov Mar 20, 2023

Yes, that is why I ran with the -XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:+VerifyStackAtCalls flags, which increase the generated code size.


void compiler_stubs_init(bool in_compiler_thread) {
if (in_compiler_thread && MoveIntrinsicStubsGen) {
// Temporare revert state of stubs generation because
Member

"Temporarily"

@vnkozlov vnkozlov marked this pull request as ready for review March 20, 2023 15:40
@openjdk

openjdk bot commented Mar 20, 2023

@vnkozlov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8231349: Move intrinsic stubs generation to compiler runtime initialization code

Reviewed-by: redestad, vlivanov

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 8 new commits pushed to the master branch:

  • 6fa25cc: 8184444: The compiler error "variable not initialized in the default constructor" is not apt in case of static final variables
  • 4b8f7db: 8027682: javac wrongly accepts semicolons in package and import decls
  • c00d088: 8043179: Lambda expression can mutate final field
  • 147f347: 8219083: java/net/MulticastSocket/SetGetNetworkInterfaceTest.java failed in same binary run on windows x64
  • bf917ba: 8304687: Move add_to_hierarchy
  • 63d4afb: 8304671: javac regression: Compilation with --release 8 fails on underscore in enum identifiers
  • e2cfcfb: 6817009: Action.SELECTED_KEY not toggled when using key binding
  • af4d560: 8303951: Add asserts before record_method_not_compilable where possible

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 20, 2023
@mlbridge

mlbridge bot commented Mar 20, 2023

Webrevs

Contributor

@iwanowww iwanowww left a comment

Overall, looks good.

Just minor comments about code style.

@@ -366,6 +366,9 @@ const int ObjectAlignmentInBytes = 8;
product(bool, UseSignumIntrinsic, false, DIAGNOSTIC, \
"Enables intrinsification of Math.signum") \
\
product_pd(bool, MoveIntrinsicStubsGen, DIAGNOSTIC, \
Contributor

The flag name looks confusing. What about LazyCompilerStubGeneration or even LazyStubGeneration?

Contributor Author

I know, I am struggling with this name too, but I can't find a better one.
LazyStubGeneration is reserved for another RFE: JDK-8304422.
And it is not lazy. Compiler stubs generation is delayed until compiler runtime initialization, but the stubs are all still generated during the initialization phase.
How about DelayCompilerStubsGeneration?

Member

I think DelayCompilerStubsGeneration sounds OK

Contributor

DelayCompilerStubsGeneration sounds good.

Contributor Author

Okay.

} else {
generate_all();
}
StubGenerator(CodeBuffer* code, StubsKind kind) : StubCodeGenerator(code) {
Contributor

The code is repeated on multiple platforms. It makes sense to lift it to StubCodeGenerator.

Contributor Author

That is because StubGenerator is declared and defined separately on each platform. Doing that here would add more complexity to changes that are already not small. Filed a new RFE: JDK-8304750

define_pd_global(bool, UncommonNullCast, true); // Uncommon-trap NULLs past to check cast

#if COMPILER2_OR_JVMCI
Contributor

I find it easier to read when only the default value is guarded. In that respect, COMPILER2_OR_JVMCI code diverges from the rest of the code base.

Contributor Author

So you want something similar to COMPILER1_AND_COMPILER2_PRESENT(), which is used for CodeCacheSegmentSize in the following line?
Yes, we have a similar macro for JVMCI and C2. How about this:

define_pd_global(bool, MoveIntrinsicStubsGen,   false COMPILER2_OR_JVMCI_PRESENT( || true));

Contributor

@iwanowww iwanowww Mar 22, 2023

What about the following?

define_pd_global(bool, MoveIntrinsicStubsGen,  NOT_COMPILER2_OR_JVMCI(false) COMPILER2_OR_JVMCI_PRESENT(true));

Are you concerned it is too long?

Or even COMPILER2_OR_JVMCI maybe?

define_pd_global(bool, MoveIntrinsicStubsGen, COMPILER2_OR_JVMCI);
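For readers comparing the three spellings discussed in this thread, here is a small self-contained sketch. The `DEMO_`-prefixed macros are hypothetical stand-ins written for this example, and they assume the real macros behave the way their names and the thread suggest: a 0/1 feature value plus "present" and "not present" wrappers that either emit or drop their argument.

```cpp
// Hypothetical build setting: 1 when C2 or JVMCI is built in, 0 otherwise.
#ifndef DEMO_COMPILER2_OR_JVMCI
#define DEMO_COMPILER2_OR_JVMCI 1
#endif

// Stand-ins for the "present" / "not present" wrapper macros.
#if DEMO_COMPILER2_OR_JVMCI
#define DEMO_COMPILER2_OR_JVMCI_PRESENT(code) code
#define DEMO_NOT_COMPILER2_OR_JVMCI(code)
#else
#define DEMO_COMPILER2_OR_JVMCI_PRESENT(code)
#define DEMO_NOT_COMPILER2_OR_JVMCI(code) code
#endif

// Form A: start from false, append "|| true" only when a compiler is present.
constexpr bool kDefaultA = false DEMO_COMPILER2_OR_JVMCI_PRESENT(|| true);

// Form B: exactly one of the two wrappers emits its argument.
constexpr bool kDefaultB =
    DEMO_NOT_COMPILER2_OR_JVMCI(false) DEMO_COMPILER2_OR_JVMCI_PRESENT(true);

// Form C: use the 0/1 feature value directly.
constexpr bool kDefaultC = DEMO_COMPILER2_OR_JVMCI;

// All three forms must agree for either build setting.
static_assert(kDefaultA == kDefaultB && kDefaultB == kDefaultC,
              "the three spellings yield the same default");
```

Compiling with `-DDEMO_COMPILER2_OR_JVMCI=0` flips all three constants to false, so under these assumptions the choice between the forms is about line length and readability rather than behavior.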

Contributor Author

Yes, I don't want a long line. I will try the last suggestion. I was able to build locally with it on Linux.

@openjdk

openjdk bot commented Mar 22, 2023

@vnkozlov this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout 8231349
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Mar 22, 2023
@vnkozlov
Contributor Author

I renamed the flag and used COMPILER2_OR_JVMCI as the default value (based on Vladimir's last suggestion).

@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Mar 23, 2023
@vnkozlov
Contributor Author

/integrate

@openjdk

openjdk bot commented Mar 23, 2023

Going to push as commit 3859faf.
Since your change was applied there have been 12 commits pushed to the master branch:

  • f37674a: 8304711: Combine G1 root region abort and wait into a single method
  • 7f9e691: 8304712: Only pass total number of regions into G1Policy::calc_min_old_cset_length
  • 51035a7: 8294137: Review running times of java.math tests
  • 46cca1a: 4842457: (bf spec) Clarify meaning of "(optional operation)"
  • 6fa25cc: 8184444: The compiler error "variable not initialized in the default constructor" is not apt in case of static final variables
  • 4b8f7db: 8027682: javac wrongly accepts semicolons in package and import decls
  • c00d088: 8043179: Lambda expression can mutate final field
  • 147f347: 8219083: java/net/MulticastSocket/SetGetNetworkInterfaceTest.java failed in same binary run on windows x64
  • bf917ba: 8304687: Move add_to_hierarchy
  • 63d4afb: 8304671: javac regression: Compilation with --release 8 fails on underscore in enum identifiers
  • ... and 2 more: https://git.openjdk.org/jdk/compare/c4338620b7651f4da03ce4cfddb9e5b053fddb6a...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 23, 2023
@openjdk openjdk bot closed this Mar 23, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 23, 2023
@openjdk

openjdk bot commented Mar 23, 2023

@vnkozlov Pushed as commit 3859faf.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@vnkozlov vnkozlov deleted the 8231349 branch March 23, 2023 19:24
@offamitkumar
Member

Hi @vnkozlov,

these changes broke the build for the s390x platform. I've opened JDK-8305227 and attached the hs_err* log file there as well.

It would be helpful if you could take a look and provide some suggestions.

Thanks

@vnkozlov
Contributor Author

@offamitkumar I added a comment to the new bug. In short, you can try disabling this optimization for s390.
