-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8287373: remove unnecessary paddings in generated code #8453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back bulasevich! A progress list of the required criteria for merging this PR into |
@bulasevich The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
c802b1e
to
a631841
Compare
@@ -37,7 +37,7 @@ define_pd_global(bool, TrapBasedNullChecks, false); | |||
define_pd_global(bool, UncommonNullCast, true); // Uncommon-trap NULLs past to check cast | |||
|
|||
define_pd_global(uintx, CodeCacheSegmentSize, 64 COMPILER1_AND_COMPILER2_PRESENT(+64)); // Tiered compilation has large code-entry alignment. | |||
define_pd_global(intx, CodeEntryAlignment, 64); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks reasonable to me. I found the following in the N1 Opt Guide:
Consider aligning subroutine entry points and branch targets to 32B boundaries, within the bounds of the code-density requirements of the program. This will ensure that the subsequent fetch can maximize bandwidth following the taken branch by bringing in all useful instructions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rolled back CodeEntryAlignment 64->32 change as it shows different results on ARM platforms. I think it should be done in a separate PR. It can be tuned in vm_version after CPU architecture check.
ea4483c
to
7237201
Compare
7237201
to
0f052d1
Compare
…nerate_handler_blob asks a 1KB buffer, actually CodeBuffer::initialize reserves the given size with some extra bytes; with the new formula (code_size + align + slop * SECT_LIMIT) the extra bytes number is reduced, causing the handler_blob generator buffer to overflow. Solution: double the code_size estimate for handler_blob generator buffer (the temporary buffer size is not really important)
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you changed tested alignment to 8 for JVMCI tests?
@@ -2632,7 +2632,7 @@ SafepointBlob* SharedRuntime::generate_handler_blob(address call_ptr, int poll_t | |||
|
|||
// allocate space for the code | |||
// setup code generation tools | |||
CodeBuffer buffer("handler_blob", 1024, 512); | |||
CodeBuffer buffer("handler_blob", 2048, 1024); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you need to double the size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it to fix the CodeBuffer overflow assert on x86 build. SharedRuntime::generate_handler_blob asks a 1KB buffer -
actually CodeBuffer::initialize reserves the given size with some extra bytes; with the new formula (code_size + align + slop * SECT_LIMIT) the extra bytes number is reduced, causing the handler_blob generator buffer to overflow. My solution is to double the code_size estimate for handler_blob generator buffer. The same 2048/1024 numbers we can see on other platforms - aarch, pcc, amd, riscv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
int total_size = align_up(_constants_size, CodeSection::alignment(CodeBuffer::SECT_INSTS)) + | ||
align_up(_code_size, CodeSection::alignment(CodeBuffer::SECT_STUBS)) + | ||
stubs_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first look it is messed up. I understand that you are trying to take into account space between sections. It assumed the order of sections.
Add comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Thank you
HotSpotCompiledCode.dataSectionAlignment value was changed to 8 to correspond the new CodeBuffer.constants code section alignment. The jtreg tests issue was repoduced as "invalid data section alignment" JVMCI error in CodeInstaller::initialize_fields |
@@ -85,9 +85,11 @@ public boolean equals(Object obj) { | |||
} | |||
} | |||
|
|||
static final int validDataSectionAlignment = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment to explain where 8 comes from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. thanks!
@bulasevich Please do not rebase or force-push to an active PR as it invalidates existing review comments. All changes will be squashed into a single commit automatically when integrating. See OpenJDK Developers’ Guide for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -85,9 +85,12 @@ public boolean equals(Object obj) { | |||
} | |||
} | |||
|
|||
// DataSectionAlignment value matches the alignment of the CodeBuffer::SECT_CONSTS code section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the alignment sizeof(jdouble) of the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I will test it before approval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My tier1-4 testing passed.
@bulasevich This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 166 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Thank you! |
/integrate |
Going to push as commit 68b2057.
Your commit was automatically rebased without conflicts. |
@bulasevich Pushed as commit 68b2057. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
@@ -702,4 +706,19 @@ inline bool CodeSection::maybe_expand_to_ensure_remaining(csize_t amount) { | |||
return false; | |||
} | |||
|
|||
inline int CodeSection::alignment(int section) { | |||
if (section == CodeBuffer::SECT_CONSTS) { | |||
return (int) sizeof(jdouble); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks Graal which puts data items larger than a jdouble
(e.g. 32-byte vector masks) into the constants section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened https://bugs.openjdk.org/browse/JDK-8293999 to track this.
The goal is to remove unnecessary paddings in generated code. The alignment of the [Stub Code] section is determined by the same value as the alignment of the [Entry Point] section: the CodeEntryAlignment parameter with default values 64B on AARCH, and 32B on AMD.
Large entry alignment values are questionable for entry section. For example, Arm Neoverse N1 Software Optimization Guide recommends to align subroutines to 32B, while static compilers uses an even smaller value of 16B. However, with this change, I suggest to apply different (and smaller) values for [Constants] and [Stub Code] section alignments. This makes overall code 2% smaller on AARCH.
The correctness of the changes is checked by jtreg. Performance tested by Renaissance and SpecJBB benchmarkds on AARCH and AMD.
Example. Dummy method disassembly on AARCH, before vs after:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/8453/head:pull/8453
$ git checkout pull/8453
Update a local copy of the PR:
$ git checkout pull/8453
$ git pull https://git.openjdk.org/jdk pull/8453/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 8453
View PR using the GUI difftool:
$ git pr show -t 8453
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/8453.diff