Skip to content

8338677: Improve startup of memory access var handles by simplifying combinator chains #20647

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

mcimadamore
Copy link
Contributor

@mcimadamore mcimadamore commented Aug 20, 2024

This PR reduces the amount of lambda forms (LFs) which are created when generating var handles for simple struct field accessors. This contributes to the startup regression seen in JDK-8337505.

There are essentially three sources of excessive var handle adaptation:

  1. LayoutPath::dereferenceHandle has to do some very complex adaptation (including a permute) in order to inject alignment and size checks (against the enclosing layout) on the generated var handle.
  2. Even in simple cases (e.g. when there's no dynamic coordinate), the offset of the accessed field is added to the var handle via an expensive collect adapter.
  3. When we adapt a long var handle to work on MemorySegment using an AddressLayout, we make no distinction on whether the address layout has a target layout or not. In the latter case (common!) we can adapt more simply.

The meat of this PR is to address (1) by changing the shape of the generated helpers in the X-VarHandleSegmentView.java.template class. That is, the method for doing a plain get will now have the following shape:

T get(MemorySegment segment, MemoryLayout enclosing, long base, long offset)

Where:

  • segment is the segment being accessed
  • enclosing is the enclosing layout (the root of the selected layout path) against which to check size and alignment
  • base is the public-facing offset passed by the user when calling get on the var handle
  • offset is the offset at which the selected layout element can be found from the root (this can be replaced with an expression that takes several dynamic indices and turn them into a single offset)

With this organization, it is easy to see how, in order to create a memory access var handle for a struct field S.f we only need to:

  • inject the enclosing layout S into the var handle (into the enclosing coordinate)
  • inject the offset of S.f into the var handle (into the offset coordinate)

This way, we get our plain old memory access var handle featuring only two coordinates: a segment and an offset. Note how, to get there, we only needed very simple adaptations (e.g. MethodHandles::insertCoordinates).

Evaluation

I did some tests using the benchmark in JDK-8337505 to assess the impact of this change on startup. To evaluate startup, I ran the benchmark 50 times and then took some stats. Here's what the numbers look before this change (AVG = average, MED = median):

AVG        0.196ms
MED        0.198ms
MAX        0.201ms
MIN        0.186ms

And here's after this change:

AVG        0.179ms
MED        0.180ms
MAX        0.183ms
MIN        0.174ms

This is a good 10% speedup. The number of generated LFs for this test went from 99 to 67 (we're at the point where most LFs are from static initializers in the LayoutPath and Utils classes).

I also run all the memory benhmarks starting with LoopOver before and after the change, and verified no unwanted change in peak performance.

Future work

There's more work to do here. One possible option is to tweak the template further to also generate variants for MemorySegment and boolean, so that no adaptation is required in those cases. Some preliminary examples seem to show another 10ms gain with this approach.

Another option would be to add some FFM code to the HelloClasslist class, so that some of the generated classes can be optimized at link-time. This also seems to yield another 10ms gain (I have not tried to see if this adds up with the gain in the previously described approach, but I would say probably not - at least not fully).

Many thanks to @cl4es for the invaluable help and moral support :-)


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8338677: Improve startup of memory access var handles by simplifying combinator chains (Enhancement - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20647/head:pull/20647
$ git checkout pull/20647

Update a local copy of the PR:
$ git checkout pull/20647
$ git pull https://git.openjdk.org/jdk.git pull/20647/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20647

View PR using the GUI difftool:
$ git pr show -t 20647

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20647.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 20, 2024

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 20, 2024

@mcimadamore This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8338677: Improve startup of memory access var handles by simplifying combinator chains

Reviewed-by: redestad

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 68 new commits pushed to the master branch:

  • c4cf1e9: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation
  • 715fa8f: 8336498: [macos] [build]: install-file macro may run into permission denied error
  • e88a3b0: 8338661: StackMapTable is invalid if frames appear in dead code
  • 5981697: 8337828: CDS: Trim down minimum GC region alignment
  • cafb3dc: 6318027: BasicScrollBarUI does not disable timer when enclosing frame is disabled.
  • 88ccbb6: 8336934: Clean up JavaLangReflectAccess
  • d728107: 8338482: com/sun/jdi/ThreadMemoryLeakTest.java requires that compressed oops are enabled
  • 1ebf2cf: 8336756: Improve ClassFile Annotation writing
  • 0267284: 8338611: java.lang.module specification wording not aligned with JEP 261
  • c646efc: 8205957: setfldw001/TestDescription.java fails with bad field value
  • ... and 58 more: https://git.openjdk.org/jdk/compare/38bd8a36704a962f0ad1052fd2ec150a61663256...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link

openjdk bot commented Aug 20, 2024

@mcimadamore The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Aug 20, 2024
@mcimadamore mcimadamore marked this pull request as ready for review August 20, 2024 18:12
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 20, 2024
@mlbridge
Copy link

mlbridge bot commented Aug 20, 2024

Webrevs

@@ -229,6 +241,16 @@ public static void checkNonNegativeIndex(long value, String name) {
}
}

@ForceInline
public static void checkEnclosingLayout(MemorySegment segment, long offset, MemoryLayout enclosing, boolean readOnly) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't the first argument be AbstractMemorySegmentImpl? The new call site already has an AbstractMemorySegmentImpl and the private static method site can do the cast instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it could yes - any reason as to why moving the cast around is better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just stylistic - you cast segment to AbstractMemorySegmentImpl twice here, and if you count here https://github.com/openjdk/jdk/pull/20647/files#diff-e483572155b915ded5f6290c0e91fcf3feeaadf117865ea744920b9b9bbbec45R103 you have already casted 3 times in var handles. You can change the type here and add one new cast here: https://github.com/openjdk/jdk/pull/20647/files#diff-8b4feba9593ad63edaad23970fff28004f916bfa2bf45970f63fad83fb46cd92R289

Clarify code comment
Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Some of the new and pre-existing static MHs in Utils and other places look like they are used conditionally and are likely candidates to be StableValues once available.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 21, 2024
@mcimadamore
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Aug 21, 2024

Going to push as commit 0e8fe35.
Since your change was applied there have been 72 commits pushed to the master branch:

  • 3aeb673: 8338532: Speed up the ClassFile API MethodTypeDesc#ofDescriptor
  • 918cf11: 8338490: Serial: Move Generation::print_on to subclasses
  • 80adea8: 8338545: Functional interface implementations for common pre-boot ClassFile operations
  • 7458952: 8338595: Add more linesize for MIME decoder in macro bench test Base64Decode
  • c4cf1e9: 8338539: New Object to ObjectMonitor mapping: riscv64 implementation
  • 715fa8f: 8336498: [macos] [build]: install-file macro may run into permission denied error
  • e88a3b0: 8338661: StackMapTable is invalid if frames appear in dead code
  • 5981697: 8337828: CDS: Trim down minimum GC region alignment
  • cafb3dc: 6318027: BasicScrollBarUI does not disable timer when enclosing frame is disabled.
  • 88ccbb6: 8336934: Clean up JavaLangReflectAccess
  • ... and 62 more: https://git.openjdk.org/jdk/compare/38bd8a36704a962f0ad1052fd2ec150a61663256...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 21, 2024
@openjdk openjdk bot closed this Aug 21, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 21, 2024
@openjdk
Copy link

openjdk bot commented Aug 21, 2024

@mcimadamore Pushed as commit 0e8fe35.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants