-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8231349: Move intrinsic stubs generation to compiler runtime initialization code #13096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back kvn! A progress list of the required criteria for merging this PR into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this looks good to me!
Perhaps there's some improvements that can be made (see inline comments regarding the count_positives
stub), but it might be prudent not to spend more time than necessary on this too much if anyone will be looking at https://bugs.openjdk.org/browse/JDK-8304422 soon enough.
if (UseSVE == 0) { | ||
StubRoutines::aarch64::_vector_iota_indices = generate_iota_indices("iota_indices"); | ||
} | ||
|
||
// arraycopy stubs used by compilers | ||
generate_arraycopy_stubs(); | ||
|
||
// countPositives stub for large arrays. | ||
StubRoutines::aarch64::_count_positives = generate_count_positives(StubRoutines::aarch64::_count_positives_long); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small detail but I am pretty certain this stub is only used by C2 and could be moved to generate_compiler_stubs
. But it opens a question if there are more stubs that look like they are shared but are really only used by C2.
For historical reasons this intrinsic was implemented with a macro+stub on aarch64 but x64 et al. When doing so the macro was defined in MacroAssembler and not C2_MacroAssembler, but it is effectively only used from aarch64.ad.
It might be interesting to make C1 (and possibly interpreter) use this stub when available, but if/when that happens moving it back to generate_final_stubs
is relatively straightforward.
_initial_stubs_code_size = 10000, | ||
_continuation_stubs_code_size = 2000, | ||
_compiler_stubs_code_size = 30000, | ||
_final_stubs_code_size = 20000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tricky part when updating these is knowing which set of CPU features and VM flags will generate the largest possible stubs, but it looks like you've added ample of free space with these estimates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is why I ran with -XX:+UseCompressedOops -XX:+CheckCompressedOops -XX:+VerifyOops -XX:+VerifyStackAtCalls
flags which increase generated code size.
|
||
void compiler_stubs_init(bool in_compiler_thread) { | ||
if (in_compiler_thread && MoveIntrinsicStubsGen) { | ||
// Temporare revert state of stubs generation because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Temporarily"
@vnkozlov This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 8 new commits pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the ➡️ To integrate this PR with the above commit message to the |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks good.
Just minor comments about code style.
@@ -366,6 +366,9 @@ const int ObjectAlignmentInBytes = 8; | |||
product(bool, UseSignumIntrinsic, false, DIAGNOSTIC, \ | |||
"Enables intrinsification of Math.signum") \ | |||
\ | |||
product_pd(bool, MoveIntrinsicStubsGen, DIAGNOSTIC, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag name looks confusing. What about LazyCompilerStubGeneration
or even LazyStubGeneration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know, I am struggling with this name too. But I can't find better one.
LazyStubGeneration
is reserved for an other RFE: JDK-8304422.
And it is not lazy. Compiler stubs generation is delayed until Compiler runtime initialization but they all still generated during initialization phase.
How about DelayCompilerStubsGeneration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think DelayCompilerStubsGeneration
sounds OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DelayCompilerStubsGeneration
sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay.
} else { | ||
generate_all(); | ||
} | ||
StubGenerator(CodeBuffer* code, StubsKind kind) : StubCodeGenerator(code) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is repeated on multiple platforms. It makes sense to lift it to StubCodeGenerator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because StubGenerator
is declared and defined separately on each platform. Doing that here will add more complexity for already not small changes. Filed new RFE: JDK-8304750
define_pd_global(bool, UncommonNullCast, true); // Uncommon-trap NULLs past to check cast | ||
|
||
#if COMPILER2_OR_JVMCI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it easier to read when only the default value is guarded. In that respect, COMPILER2_OR_JVMCI
code diverges from the rest of the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you want something similar to COMPILER1_AND_COMPILER2_PRESENT()
used for CodeCacheSegmentSize
in following line?
Yes, we have similar macro for JVMCI and C2. How about this:
define_pd_global(bool, MoveIntrinsicStubsGen, false COMPILER2_OR_JVMCI_PRESENT( || true));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the following?
define_pd_global(bool, MoveIntrinsicStubsGen, NOT_COMPILER2_OR_JVMCI(false) COMPILER2_OR_JVMCI_PRESENT(true));
Are you concerned it is too long?
Or even COMPILER2_OR_JVMCI
maybe?
define_pd_global(bool, MoveIntrinsicStubsGen, COMPILER2_OR_JVMCI);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't want long line. I will try the last suggestion. I was able to build locally with it on linux.
@vnkozlov this pull request can not be integrated into git checkout 8231349
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
I renamed flag and used |
/integrate |
Going to push as commit 3859faf.
Your commit was automatically rebased without conflicts. |
Hi @vnkozlov, these changes broke the build for s390x platform. I've opened JDK-8305227 and put hs_err* log file there as well. It would be helpful if you could take a look and provide some suggestion. Thanks |
@offamitkumar I added comment to new bug. In short, you can try to disable this optimization for s390. |
Based on performance data (see graph in RFE) I propose to implement @cl4es suggestion to move intrinsics stubs generation to C2 (and JVMCI) runtime initialization code.
It has <1% difference from not generated these stubs at all and we will not win on 1 core VMs but it is simpler and safer solution, I think. It also automatically (no need for new code) do not generate these stubs if C2 is not used (-Xint or low TieredStopAt Level.
On demand stubs generation requires synchronization between threads during application run which may introduce some instability and may be other issues. But it could be beneficial for Interpreter and C1 if we want more intrinsics stubs to be used by C1 and Interpreter (they use CRC32 only now). I filed separate RFE 8304422.
Changes:
-XX:+MoveIntrinsicStubsGen
. It is ON by default if VM is built with C2 or JVMCI compilers except Zero and 32-bit Arm VMs which have no or few intrinsics.StubGenerator::generate_all()
method into two:generate_final_stubs()
andgenerate_compiler_stubs()
. Moved only C2 (and JVMCI) intrinsic stubs generation to new method.aarch64
andx86
. On other platforms I used the same as before value forcompiler_stubs
andfinal_stubs
:Testing: tier1-7, Xcomp, stress on x64 and aarch64.
I have changes for all platforms. Please test it on platforms you support.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13096/head:pull/13096
$ git checkout pull/13096
Update a local copy of the PR:
$ git checkout pull/13096
$ git pull https://git.openjdk.org/jdk.git pull/13096/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13096
View PR using the GUI difftool:
$ git pr show -t 13096
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13096.diff