
8273459: Update code segment alignment to 64 bytes #5547

Closed
wants to merge 9 commits

Conversation


@asgibbons asgibbons commented Sep 16, 2021

Change the default code entry alignment to 64 bytes from 32 bytes. This allows for maintaining proper 64-byte alignment of data within a code segment, which is required by several AVX-512 instructions.

I ran into this while implementing Base64 encoding and decoding. Code segments allocated with address mod 32 == 0 but address mod 64 != 0 would cause the align() macro to misalign. This is because align() aligns relative to the start of the code segment (the offset within it), not to the absolute address of the PC. So align(64) would align the PC to a multiple of 64 bytes from the start of the segment, not to a true 64-byte boundary as requested. Changing the alignment of the segment to 64 bytes fixes the issue.
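To make the failure mode concrete, here is a minimal standalone C++ sketch (hypothetical addresses, not JDK code) of why offset-relative alignment fails when the segment base is 32-byte but not 64-byte aligned:

#include <cstdint>
#include <cstdio>

int main() {
  uint64_t base   = 0x7f0000001020;  // base % 32 == 0, but base % 64 == 32
  uint64_t offset = 64;              // align(64) pads the offset to a multiple of 64
  uint64_t pc     = base + offset;   // absolute address of the "aligned" byte
  // Prints 32: the byte is 64-byte aligned relative to the segment,
  // but not at a true 64-byte address boundary.
  printf("pc %% 64 = %llu\n", (unsigned long long)(pc % 64));
  return 0;
}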

I have not seen any measurable difference in either performance or memory usage with the tests I have run.

See this article for the discussion thread.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8273459: Update code segment alignment to 64 bytes

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5547/head:pull/5547
$ git checkout pull/5547

Update a local copy of the PR:
$ git checkout pull/5547
$ git pull https://git.openjdk.java.net/jdk pull/5547/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5547

View PR using the GUI difftool:
$ git pr show -t 5547

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5547.diff

asgibbons and others added 5 commits Jul 14, 2021

@bridgekeeper bridgekeeper bot commented Sep 16, 2021

👋 Welcome back asgibbons! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Sep 16, 2021

@openjdk openjdk bot commented Sep 16, 2021

@asgibbons The following labels will be automatically applied to this pull request:

  • build
  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added build hotspot-compiler labels Sep 16, 2021

@mlbridge mlbridge bot commented Sep 16, 2021

Webrevs

@@ -44,7 +44,7 @@ define_pd_global(uintx, CodeCacheSegmentSize, 64 COMPILER1_AND_COMPILER2_PRES
 // the uep and the vep doesn't get real alignment but just slops on by
 // only assured that the entry instruction meets the 5 byte size requirement.
 #if COMPILER2_OR_JVMCI
-define_pd_global(intx, CodeEntryAlignment, 32);
+define_pd_global(intx, CodeEntryAlignment, 64);

@jatin-bhateja jatin-bhateja Sep 16, 2021


How about setting dynamic alignment based on MaxVectorSize, i.e., match the alignment to MaxVectorSize? This way we can avoid internal fragmentation on AVX2 targets.
It could be done during VM initialization, similar to the following code.
https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L1439
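A minimal sketch of that idea (assuming the usual HotSpot flag macros; this is not the actual vm_version_x86.cpp code):

// During VM initialization: raise the default only when wide vectors
// actually require 64-byte data alignment.
if (FLAG_IS_DEFAULT(CodeEntryAlignment) && MaxVectorSize >= 64) {
  FLAG_SET_DEFAULT(CodeEntryAlignment, 64);  // AVX-512: align to a cache line
}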

.gitignore Outdated
@@ -16,5 +16,5 @@ NashornProfile.txt
 **/JTreport/**
 **/JTwork/**
 /src/utils/LogCompilation/target/
-/.project/
+**.project**

@erikj79 erikj79 Sep 16, 2021


Changes to .gitignore should probably be made separately.


@vnkozlov vnkozlov Sep 28, 2021


Yes. This change should be removed from the patch.


@vnkozlov vnkozlov left a comment

I share Dean's concern from the discussion before. CodeEntryAlignment is used in a lot of places and we have to be careful about changes to it.
I found only 7 cases with align(64):

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:    __ align(64);
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:    __ align(64);
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:    __ align(64);
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp:    __ align(64);
src/hotspot/cpu/x86/stubGenerator_x86_32.cpp:    __ align(64);
src/hotspot/cpu/x86/stubGenerator_x86_32.cpp:    __ align(64);
src/hotspot/cpu/x86/stubGenerator_x86_32.cpp:    __ align(64);

That does not justify such a general change.
My suggestion would be to add a new align64() method that calls pc(), as you suggested in the original proposal.
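A sketch of the proposed helper, assuming the existing two-argument MacroAssembler::align(int modulus, int target) overload that pads until target is a multiple of modulus:

void MacroAssembler::align64() {
  // Pad based on the absolute pc() rather than the offset within the
  // code buffer, so the next byte lands on a true 64-byte boundary.
  align(64, (unsigned long long) pc());
}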

Contributor

@vnkozlov vnkozlov commented Sep 16, 2021

The new function may also use MaxVectorSize, as Jatin proposed, if that helps.


@asgibbons asgibbons commented Sep 17, 2021

I think I have not made the point clearly enough. The align function is used to manipulate the address bits for the byte following the align(). This means that wherever the code is copied, the address of that byte should have the appropriate address bit configuration in the copy (as well as the original, of course). Since the current implementation is using the base address of the allocated segment to determine alignment, the only way to ensure the proper bit configuration of the address is to ensure the base address of the newly-allocated segment is aligned identically to the original.

I believe this is entirely independent of MaxVectorSize, so I don't believe it's appropriate to use this value for address alignment. Using pc() fixes the case in the source segment, but will break 50% of the time when the segment is copied with a CodeEntryAlignment of 32.
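A worked example of that 50% figure (hypothetical addresses):

Source buffer base B1 = 0x...1000 (B1 % 64 == 0):  pc()-based padding makes (B1 + off) % 64 == 0.
Copied blob base   B2 = 0x...1020 (B2 % 64 == 32): the copy preserves only the offset off, so (B2 + off) % 64 == 32.

With 32-byte segment alignment, half of all possible bases satisfy B % 64 == 32, so the pc()-aligned byte misses a true 64-byte boundary in half of the copies.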

I think the bottom line is that align() is broken for any value greater than CodeEntryAlignment. I can foresee cases where it may be beneficial (from an algorithm perspective) to have large alignment values, such as align(256), to simplify pointer arithmetic. None of the proposed changes will ensure this alignment when a segment is copied.

Perhaps the appropriate thing to do is to put an assert() in align() to fail if the requested alignment cannot be ensured?

IMHO, the "right" thing to do is to mark the bytes requiring address alignment and handle the cases on copy. This would add significant complexity, however.


@jatin-bhateja jatin-bhateja commented Sep 17, 2021

> I think I have not made the point clearly enough. The align function is used to manipulate the address bits for the byte following the align(). This means that wherever the code is copied, the address of that byte should have the appropriate address bit configuration in the copy (as well as the original, of course). Since the current implementation is using the base address of the allocated segment to determine alignment, the only way to ensure the proper bit configuration of the address is to ensure the base address of the newly-allocated segment is aligned identically to the original.

> I believe this is entirely independent of MaxVectorSize, so I don't believe it's appropriate to use this value for address alignment. Using pc() fixes the case in the source segment, but will break 50% of the time when the segment is copied with a CodeEntryAlignment of 32.

> I think the bottom line is that align() is broken for any value greater than CodeEntryAlignment. I can foresee cases where it may be beneficial (from an algorithm perspective) to have large alignment values, such as align(256), to simplify pointer arithmetic. None of the proposed changes will ensure this alignment when a segment is copied.

> Perhaps the appropriate thing to do is to put an assert() in align() to fail if the requested alignment cannot be ensured?

> IMHO, the "right" thing to do is to mark the bytes requiring address alignment and handle the cases on copy. This would add significant complexity, however.

The following code suggests that instruction start addresses always honor CodeEntryAlignment:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/asm/codeBuffer.cpp#L114

The current value of CodeEntryAlignment is 32. A 32-byte-aligned address is inherently 8- and 16-byte aligned, but not vice versa.

Setting this value to 64 will cover the 8-, 16-, 32-, and 64-byte alignment constraints in stubGenerator_x86_64.cpp.

Also, there are several locations in the stub generators using __ align(CodeEntryAlignment), and suddenly those would ensure 64-byte alignment too, creating internal fragmentation. As Vladimir pointed out, only a handful of locations need 64-byte alignment. My suggestion was that, if you are attempting this, its scope should be limited to AVX-512, hence the MaxVectorSize suggestion: the alignment problem will mainly surface for vector instructions, and since MaxVectorSize can be set to 32 even on AVX-512 targets, it is a robust indicator of the vector size and the associated alignment constraints.


@dean-long dean-long commented Sep 17, 2021

It sounds like this new alignment requirement will only be needed for stubs (initially?), but as proposed it will affect all other types of CodeBlobs. Just looking at the effect during startup, I saw padding for BufferBlobs go from 24 to 56, RuntimeBlobs go from 0 to 32 and from 16 to 48, and nmethods go from 24 to 56.

I would like to suggest again, to use the actual alignment requirements of the CodeBuffer to determine the alignment of the CodeBlob.

> Perhaps the appropriate thing to do is to put an assert() in align() to fail if the requested alignment cannot be ensured?

I agree.

> IMHO, the "right" thing to do is to mark the bytes requiring address alignment and handle the cases on copy. This would add significant complexity, however.

I disagree. Let's not mark individual bytes. The call to align() is enough to allow us to record the maximum alignment required by the CodeBuffer, and the added complexity is not at the individual instruction copy, but just choosing the correct alignment value when creating the CodeBlob. For example, use MAX2(codebuffer->required_alignment(), CodeEntryAlignment) in place of CodeEntryAlignment.
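A rough sketch of that idea (required_alignment() is a proposed accessor, not an existing CodeBuffer API):

// Hypothetical: the CodeBuffer records the largest value ever passed to
// align() while code is emitted, and blob creation honors it:
int alignment = MAX2(cb->required_alignment(), (int)CodeEntryAlignment);
// ... pass `alignment` instead of CodeEntryAlignment when allocating the CodeBlob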

And for my own curiosity, I would like to hear from Intel what the expected effect on icache performance is of increasing the alignment of code.


@sviswa7 sviswa7 commented Sep 27, 2021

@asgibbons To me, Vladimir Kozlov's suggestion of adding an align64() method that calls pc(), as you originally proposed, looks the best. It meets our purpose and is limited in scope.


@vnkozlov vnkozlov commented Sep 27, 2021

I am back from vacation!

I want to point out that when we generate code for these stubs we don't move them in CodeCache (in contrast to compiled methods): https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stubRoutines.cpp#L268

BufferBlob::create() allocates specified space (of size code_size2, for example) directly in CodeCache in NonNMethod section:
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeBlob.cpp#L232
https://github.com/openjdk/jdk/blob/master/src/hotspot/share/code/codeBlob.cpp#L272

Based on that, using pc() and new align64() should be fine for these few cases.


@asgibbons asgibbons commented Sep 28, 2021

Reverted CodeEntryAlignment to 32 bytes. Added align64() function and updated references to align(64).
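Illustratively, the stub call sites change from segment-relative to pc-based alignment (a sketch, not the exact diff):

-    __ align(64);    // pads to a 64-byte offset from the segment start
+    __ align64();    // pads to a true 64-byte address boundary via pc()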


@jatin-bhateja jatin-bhateja commented Sep 28, 2021

Hi @asgibbons,
Alignment of JIT-compiled loop sequences works under the influence of the -XX:OptoLoopAlignment flag. This is currently set to 16; maybe we can change it to 64 and study the effect on the ICache as a separate patch, as pointed out by @dean-long.
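(Such an experiment could be run with the existing product flag; benchmarks.jar here is a placeholder workload:)

$ java -XX:OptoLoopAlignment=64 -jar benchmarks.jar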

For stubs, a new runtime constant StubCodeEntryAlignment could be added, which determines the start-address alignment of stub BufferBlobs. This would limit the impact of the changes and also prevent any unwanted fragmentation issues.

Your current patch with the new align64 macro looks good; it will also restrict the scope of the changes.

@@ -1170,6 +1170,11 @@ void MacroAssembler::addpd(XMMRegister dst, AddressLiteral src) {
   }
 }
 
+// See 8273459. Function for ensuring 64-byte alignment, intended for stubs only.

@vnkozlov vnkozlov Sep 28, 2021


I suggest extending the comment to explain why it will not work for nmethods: because they are copied when published.


@vnkozlov vnkozlov left a comment

The current changes look good. I have only two comments.

Contributor

@vnkozlov vnkozlov commented Sep 28, 2021

> Hi @asgibbons, Alignment of JIT-compiled loop sequences works under the influence of the -XX:OptoLoopAlignment flag. This is currently set to 16; maybe we can change it to 64 and study the effect on the ICache as a separate patch, as pointed out by @dean-long.

I would be very careful changing loop code alignment. You will introduce a lot of NOP instructions into the code to align it, and they will be executed. I am fine with experimenting (running SPEC benchmarks) with different OptoLoopAlignment values; maybe on modern CPUs 16 bytes is not optimal. But there is a lot of Java code with small loops that will regress if the pre-loop code has a lot of NOPs.

> For stubs, a new runtime constant StubCodeEntryAlignment could be added, which determines the start-address alignment of stub BufferBlobs. This would limit the impact of the changes and also prevent any unwanted fragmentation issues.

Maybe, but what benefit can you get from a different alignment for stub code?

> Your current patch with the new align64 macro looks good; it will also restrict the scope of the changes.


@openjdk openjdk bot commented Sep 28, 2021

⚠️ @asgibbons This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch>, where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).


@asgibbons asgibbons commented Sep 28, 2021

Thanks for the comments. I reverted the .gitignore change and added comments as requested. I also found a couple of unmodified align(64) calls, which I changed as well.

As suggested, I added an assert in align() to flag spots where alignment cannot be ensured.
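A minimal sketch of such a guard (assuming the offset-relative align(modulus) wrapper; not necessarily the exact assert that was added):

void MacroAssembler::align(int modulus) {
  // An offset-relative alignment stronger than CodeEntryAlignment cannot be
  // guaranteed once the code is copied, so fail fast in debug builds.
  assert(modulus <= (int)CodeEntryAlignment,
         "alignment greater than CodeEntryAlignment cannot be guaranteed");
  align(modulus, offset());
}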


@vnkozlov vnkozlov commented Sep 28, 2021

Good. Let me test it before you push.


@jatin-bhateja jatin-bhateja commented Sep 28, 2021


> > Hi @asgibbons, Alignment of JIT-compiled loop sequences works under the influence of the -XX:OptoLoopAlignment flag. This is currently set to 16; maybe we can change it to 64 and study the effect on the ICache as a separate patch, as pointed out by @dean-long.

> I would be very careful changing loop code alignment. You will introduce a lot of NOP instructions into the code to align it, and they will be executed. I am fine with experimenting (running SPEC benchmarks) with different OptoLoopAlignment values; maybe on modern CPUs 16 bytes is not optimal. But there is a lot of Java code with small loops that will regress if the pre-loop code has a lot of NOPs.

> > For stubs, a new runtime constant StubCodeEntryAlignment could be added, which determines the start-address alignment of stub BufferBlobs. This would limit the impact of the changes and also prevent any unwanted fragmentation issues.

> Maybe, but what benefit can you get from a different alignment for stub code?

It was an alternative to creating a separate align64 macro, in anticipation that we may introduce more stubs or code-section constants that would perform better if aligned to a 64-byte boundary on wide-vector targets like AVX-512. The new flag would be specific to stubs, and thus restrictive in scope, but at the same time parameterizable, with different values possible for different targets.

As you pointed out, there are currently only a handful of places where guaranteed 64-byte alignment is desired, so the current approach looks good.

> > Your current patch with the new align64 macro looks good; it will also restrict the scope of the changes.


@sviswa7 sviswa7 commented Sep 28, 2021

Looks good to me as well.


@asgibbons asgibbons commented Sep 28, 2021

Tested tier 1-3 Linux. All pass.


@vnkozlov vnkozlov left a comment

My testing passed.


@openjdk openjdk bot commented Sep 28, 2021

@asgibbons This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8273459: Update code segment alignment to 64 bytes

Reviewed-by: kvn, sviswanathan

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 10 new commits pushed to the master branch:

  • d8a278f: 8274396: Suppress more warnings on non-serializable non-transient instance fields in client libs
  • e49e5b5: 8273972: Multi-core choke point in CMM engine (LCMSTransform.doTransform)
  • 2072bc7: 8274391: Suppress more warnings on non-serializable non-transient instance fields in java.util.concurrent
  • 6a477bd: 8274415: Suppress warnings on non-serializable non-transient instance fields in java.xml
  • 67e52a3: 8273634: [TEST_BUG] Improve javax/swing/text/ParagraphView/6364882/bug6364882.java
  • b7425b6: 8239502: [TEST_BUG] Test javax/swing/text/FlowView/6318524/bug6318524.java never fails
  • c57a6c6: 8274265: Suspicious string concatenation in logTestUtils.inline.hpp
  • 6f4cefb: 8274394: Use Optional.isEmpty instead of !Optional.isPresent in jdk.jlink
  • 94f5e80: 8274276: Cache normalizedBase URL in URLClassPath.FileLoader
  • b36881f: 8274383: JNI call of getAccessibleSelection on a wrong thread

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@vnkozlov, @sviswa7) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Sep 28, 2021

@sviswa7 sviswa7 left a comment

The updated patch looks good to me as well.


@asgibbons asgibbons commented Sep 28, 2021

/integrate

@openjdk openjdk bot added the sponsor label Sep 28, 2021

@openjdk openjdk bot commented Sep 28, 2021

@asgibbons Your change (at version f71c778) is now ready to be sponsored by a Committer.


@asgibbons asgibbons commented Sep 28, 2021

@vnkozlov @jatin-bhateja @dean-long - Thank you all for the comments and testing. Much appreciated.


@sviswa7 sviswa7 commented Sep 28, 2021

/sponsor


@openjdk openjdk bot commented Sep 28, 2021

Going to push as commit 53b25bc.
Since your change was applied there have been 11 commits pushed to the master branch:

  • 1a29b1e: 8274016: Replace 'for' cycles with iterator with enhanced-for in java.desktop
  • d8a278f: 8274396: Suppress more warnings on non-serializable non-transient instance fields in client libs
  • e49e5b5: 8273972: Multi-core choke point in CMM engine (LCMSTransform.doTransform)
  • 2072bc7: 8274391: Suppress more warnings on non-serializable non-transient instance fields in java.util.concurrent
  • 6a477bd: 8274415: Suppress warnings on non-serializable non-transient instance fields in java.xml
  • 67e52a3: 8273634: [TEST_BUG] Improve javax/swing/text/ParagraphView/6364882/bug6364882.java
  • b7425b6: 8239502: [TEST_BUG] Test javax/swing/text/FlowView/6318524/bug6318524.java never fails
  • c57a6c6: 8274265: Suspicious string concatenation in logTestUtils.inline.hpp
  • 6f4cefb: 8274394: Use Optional.isEmpty instead of !Optional.isPresent in jdk.jlink
  • 94f5e80: 8274276: Cache normalizedBase URL in URLClassPath.FileLoader
  • ... and 1 more: https://git.openjdk.java.net/jdk/compare/be4037374520917d5a0ed54eebb3d5d6d100d429...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 28, 2021
@openjdk openjdk bot added integrated and removed ready rfr sponsor labels Sep 28, 2021

@openjdk openjdk bot commented Sep 28, 2021

@sviswa7 @asgibbons Pushed as commit 53b25bc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.


@DamonFool DamonFool commented Sep 29, 2021

Minimal build fails after this PR.
Please see #5764 .
Thanks.

@asgibbons asgibbons deleted the asgibbons-align-fix branch Feb 16, 2022
Labels
build hotspot-compiler integrated
7 participants