8338677: Improve startup of memory access var handles by simplifying combinator chains #20647
@mcimadamore This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
@@ -229,6 +241,16 @@ public static void checkNonNegativeIndex(long value, String name) {
        }
    }

    @ForceInline
    public static void checkEnclosingLayout(MemorySegment segment, long offset, MemoryLayout enclosing, boolean readOnly) {
Can't the first argument be AbstractMemorySegmentImpl? The new call site already has an AbstractMemorySegmentImpl, and the private static method site can do the cast instead.
I suppose it could, yes. Any reason why moving the cast around is better?
Just stylistic: you cast segment to AbstractMemorySegmentImpl twice here, and if you count this one https://github.com/openjdk/jdk/pull/20647/files#diff-e483572155b915ded5f6290c0e91fcf3feeaadf117865ea744920b9b9bbbec45R103 you have already cast it 3 times in var handles. You can change the type here and add one new cast here: https://github.com/openjdk/jdk/pull/20647/files#diff-8b4feba9593ad63edaad23970fff28004f916bfa2bf45970f63fad83fb46cd92R289
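A minimal sketch of the reviewer's suggestion, using hypothetical stand-in types: Segment plays the role of MemorySegment and SegmentImpl the role of AbstractMemorySegmentImpl, and the bounds and read-only checks are illustrative rather than the JDK's actual logic. The idea is that the cast happens exactly once, at the entry point, and the helper is typed on the implementation class so it needs no casts of its own.

```java
import java.util.Objects;

// Hypothetical stand-in for MemorySegment.
interface Segment {
    long byteSize();
}

// Hypothetical stand-in for AbstractMemorySegmentImpl.
final class SegmentImpl implements Segment {
    private final long size;
    private final boolean readOnly;

    SegmentImpl(long size, boolean readOnly) {
        this.size = size;
        this.readOnly = readOnly;
    }

    @Override
    public long byteSize() { return size; }

    boolean isReadOnly() { return readOnly; }
}

class CastOnce {
    // Entry point (think: the var handle call site). The only cast happens here.
    static void access(Segment segment, long offset, long enclosingSize, boolean readOnlyAccess) {
        SegmentImpl impl = (SegmentImpl) Objects.requireNonNull(segment);
        checkEnclosingLayout(impl, offset, enclosingSize, readOnlyAccess);
    }

    // Helper typed on the implementation class: no further casts inside.
    // readOnlyAccess == false means the caller intends to write.
    static void checkEnclosingLayout(SegmentImpl segment, long offset, long enclosingSize, boolean readOnlyAccess) {
        if (!readOnlyAccess && segment.isReadOnly()) {
            throw new UnsupportedOperationException("write to a read-only segment");
        }
        // The whole enclosing layout must fit in the segment starting at 'offset'.
        if (offset < 0 || enclosingSize > segment.byteSize() - offset) {
            throw new IndexOutOfBoundsException("offset " + offset + " out of bounds");
        }
    }
}
```

Both shapes behave the same; the difference is only where the cast lives, which is why the reviewer describes the suggestion as stylistic.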
Clarify code comment
LGTM.
Some of the new and pre-existing static MHs in Utils and other places look like they are used conditionally and are likely candidates to be StableValues once available.
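For illustration only: without StableValue, a common way to defer creating a conditionally used static method handle is the initialization-on-demand holder idiom, where the lookup runs the first time the handle is actually needed. The class and method names below (LazyHandles, addOffset, Holder) are hypothetical; this is not how Utils is written.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class LazyHandles {
    // Demo-only counter recording how many times the holder was initialized.
    static int initCount = 0;

    static long addOffset(long base, long offset) {
        return base + offset;
    }

    // The holder's static initializer (and the lookup inside it) runs only
    // when Holder.ADD_OFFSET is first read, not when LazyHandles is loaded.
    private static final class Holder {
        static final MethodHandle ADD_OFFSET;
        static {
            initCount++;
            try {
                ADD_OFFSET = MethodHandles.lookup().findStatic(
                        LazyHandles.class, "addOffset",
                        MethodType.methodType(long.class, long.class, long.class));
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }
    }

    // Uses the lazily created handle; unchecked exceptions propagate unchanged.
    static long add(long base, long offset) {
        try {
            return (long) Holder.ADD_OFFSET.invokeExact(base, offset);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Code paths that never call add never pay for the lookup, which is the startup property the reviewer is after.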
/integrate
Going to push as commit 0e8fe35.
Your commit was automatically rebased without conflicts.
@mcimadamore Pushed as commit 0e8fe35. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
This PR reduces the number of lambda forms (LFs) which are created when generating var handles for simple struct field accessors. This contributes to the startup regression seen in JDK-8337505.
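As a rough model of what a simple struct field accessor reduces to under the approach described in this PR, the sketch below uses plain method handles and a ByteBuffer instead of var handles and MemorySegment (an assumption made so the example stays self-contained). The helper getInt takes (segment, enclosing, base, offset); binding enclosing and offset with MethodHandles.insertArguments, the method handle counterpart of MethodHandles::insertCoordinates, leaves a handle with only (segment, base). Layout, FieldAccessModel and the exact checks are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;

// Hypothetical stand-in for a MemoryLayout: only a size and an alignment.
record Layout(long byteSize, long byteAlignment) {}

class FieldAccessModel {
    // Helper shape: (segment, enclosing, base, offset).
    static int getInt(ByteBuffer segment, Layout enclosing, long base, long offset) {
        // Size check: the whole enclosing layout must fit at 'base'.
        if (base < 0 || enclosing.byteSize() > segment.capacity() - base) {
            throw new IndexOutOfBoundsException("enclosing layout out of bounds at " + base);
        }
        // Alignment check (in this model, against the base offset).
        if (base % enclosing.byteAlignment() != 0) {
            throw new IllegalArgumentException("misaligned base " + base);
        }
        return segment.getInt(Math.toIntExact(base + offset));
    }

    static final MethodHandle GET_INT;
    static {
        try {
            GET_INT = MethodHandles.lookup().findStatic(FieldAccessModel.class, "getInt",
                    MethodType.methodType(int.class, ByteBuffer.class, Layout.class, long.class, long.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Accessor for a field at 'fieldOffset' inside 'struct': two simple insertions.
    static MethodHandle fieldAccessor(Layout struct, long fieldOffset) {
        MethodHandle h = MethodHandles.insertArguments(GET_INT, 3, fieldOffset); // binds 'offset'
        return MethodHandles.insertArguments(h, 1, struct);                     // binds 'enclosing'
    }

    // Invokes an accessor of type (ByteBuffer, long)int; unchecked exceptions propagate.
    static int read(MethodHandle accessor, ByteBuffer segment, long base) {
        try {
            return (int) accessor.invokeExact(segment, base);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

For a struct of 8 bytes with an int field at offset 4, fieldAccessor(new Layout(8, 4), 4) yields a handle of type (ByteBuffer, long)int: no permutation is involved, only argument insertion.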
There are essentially three sources of excessive var handle adaptation:
1. LayoutPath::dereferenceHandle has to do some very complex adaptation (including a permute) in order to inject alignment and size checks (against the enclosing layout) on the generated var handle.
… long var handle to work on MemorySegment using an AddressLayout, we make no distinction on whether the address layout has a target layout or not. In the latter case (common!) we can adapt more simply.

The meat of this PR is to address (1) by changing the shape of the generated helpers in the X-VarHandleSegmentView.java.template class. That is, the method for doing a plain get will now have the following shape:

Where:
- segment is the segment being accessed
- enclosing is the enclosing layout (the root of the selected layout path) against which to check size and alignment
- base is the public-facing offset passed by the user when calling get on the var handle
- offset is the offset at which the selected layout element can be found from the root (this can be replaced with an expression that takes several dynamic indices and turns them into a single offset)

With this organization, it is easy to see how, in order to create a memory access var handle for a struct field S.f, we only need to:
1. insert S into the var handle (into the enclosing coordinate)
2. insert the offset of S.f into the var handle (into the offset coordinate)

This way, we get our plain old memory access var handle featuring only two coordinates: a segment and an offset. Note how, to get there, we only needed very simple adaptations (e.g. MethodHandles::insertCoordinates).

Evaluation
I did some tests using the benchmark in JDK-8337505 to assess the impact of this change on startup. To evaluate startup, I ran the benchmark 50 times and then took some stats. Here's what the numbers look like before this change (AVG = average, MED = median):
And here's after this change:
This is a good 10% speedup. The number of generated LFs for this test went from 99 to 67 (we're at the point where most LFs are from static initializers in the LayoutPath and Utils classes).

I also ran all the memory benchmarks starting with LoopOver before and after the change, and verified no unwanted change in peak performance.

Future work
There's more work to do here. One possible option is to tweak the template further to also generate variants for MemorySegment and boolean, so that no adaptation is required in those cases. Some preliminary examples seem to show another 10ms gain with this approach.

Another option would be to add some FFM code to the HelloClasslist class, so that some of the generated classes can be optimized at link time. This also seems to yield another 10ms gain (I have not tried to see if this adds up with the gain in the previously described approach, but I would say probably not - at least not fully).

Many thanks to @cl4es for the invaluable help and moral support :-)
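The first future-work option (dedicated variants so that no adaptation is required) can be pictured with a hypothetical byte-backed accessor. A boolean view obtained by filtering a generic byte getter through a conversion adds one combinator to the chain (and, in the JDK, combinators like this can add lambda forms), whereas a dedicated boolean getter already has the target type. The names and the byte encoding below are assumptions for illustration, not the template's actual output.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class VariantModel {
    // Generic accessor: reads a raw byte.
    static byte getByte(byte[] mem, int index) { return mem[index]; }

    // Conversion used to adapt the generic accessor into a boolean view.
    static boolean byteToBoolean(byte b) { return b != 0; }

    // Dedicated variant: reads a boolean directly, no adaptation needed.
    static boolean getBoolean(byte[] mem, int index) { return mem[index] != 0; }

    static final MethodHandle ADAPTED; // getByte filtered through byteToBoolean
    static final MethodHandle DIRECT;  // getBoolean as-is

    static {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        try {
            MethodHandle raw = lookup.findStatic(VariantModel.class, "getByte",
                    MethodType.methodType(byte.class, byte[].class, int.class));
            MethodHandle conv = lookup.findStatic(VariantModel.class, "byteToBoolean",
                    MethodType.methodType(boolean.class, byte.class));
            ADAPTED = MethodHandles.filterReturnValue(raw, conv); // one extra adaptation step
            DIRECT = lookup.findStatic(VariantModel.class, "getBoolean",
                    MethodType.methodType(boolean.class, byte[].class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Invokes either handle (both have type (byte[], int)boolean).
    static boolean call(MethodHandle h, byte[] mem, int index) {
        try {
            return (boolean) h.invokeExact(mem, index);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```

Both handles have the same type and return the same results; the dedicated variant simply skips the filtering step.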
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20647/head:pull/20647
$ git checkout pull/20647
Update a local copy of the PR:
$ git checkout pull/20647
$ git pull https://git.openjdk.org/jdk.git pull/20647/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 20647
View PR using the GUI difftool:
$ git pr show -t 20647
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20647.diff