8326541: [AArch64] ZGC C2 load barrier stub should consider the length of live registers when spilling registers #17977
👋 Welcome back jzhu! A progress list of the required criteria for merging this PR into
@JoshuaZhuwj The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
That is a welcome change - in light of possibly very large registers, we don't want to save more than is necessary.
@stooart-mon Thanks for your review!
Yes. I wrote several cases to test against this commit to ensure the quality: TestZGCSpillingAtLoadBarrierStub.java
Assembly and OptoAssembly outputs before the change: https://github.com/JoshuaZhuwj/openjdk_cases/blob/master/8326541/output_before_change.log
Outputs after the change:
Thanks, that helps - I can see you're saving/restoring the correct register lengths. Would it be possible to generate a testcase to test that registers are being saved/restored correctly? The following is a testcase that is an example of where this testing is done, although in this PR's case it isn't subroutines, but load/store barriers: 4cd3187#diff-949a4a2f889be36be47e9b02b6d6cd1247768953b95a024f649878bac721fa04
@stooart-mon I had previously thought about how to write a good test case, but I did not think of a good way at that time. Let me rethink how to handle this gracefully. Thanks :-)
@JoshuaZhuwj This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 828 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@fisk, @robcasloz) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type
A jtreg test case is created for this commit.
/label add hotspot-compiler
@JoshuaZhuwj
        int expected_number_of_push_pop_at_load_barrier_fregs) throws Exception {
    String keyString = keyword + expected_number_of_push_pop_at_load_barrier_fregs + " " + expected_freg_type + " registers";
    if (!containOnlyOneOccuranceOfKeyword(stdout, keyString)) {
        throw new RuntimeException("Stdout is expected to contain only one occurance of keyString: " + "'" + keyString + "'");
In the event of failure, would it be possible to print the erroneous output? The output from the subprocesses, being directly piped in, doesn't lend itself to easy debugging. At first I thought there might be an option that could alter OutputAnalyzers output, but sadly not.
Done. Thanks for your comments.
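The request above (include the captured output when the check fails) can be sketched as follows. This is a minimal illustration and not the code that was committed: the class `KeywordCheck` and the methods `countOccurrences` and `expectExactlyOnce` are hypothetical names introduced here, and the sketch uses only `java.lang.String` rather than the jtreg `OutputAnalyzer`.

```java
// Hypothetical helper, not part of the PR: checks that a key string occurs
// exactly once and, on failure, includes the captured stdout in the message.
final class KeywordCheck {

    // Counts non-overlapping occurrences of keyword in text.
    static int countOccurrences(String text, String keyword) {
        if (keyword.isEmpty()) {
            throw new IllegalArgumentException("keyword must not be empty");
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(keyword, from)) != -1) {
            count++;
            from += keyword.length();
        }
        return count;
    }

    // Throws with the full captured output so a failing run can be debugged
    // without re-running the subprocess.
    static void expectExactlyOnce(String stdout, String keyString) {
        int found = countOccurrences(stdout, keyString);
        if (found != 1) {
            throw new RuntimeException(
                "Stdout is expected to contain exactly one occurrence of '"
                + keyString + "' but contained " + found
                + ". Captured stdout:\n" + stdout);
        }
    }
}
```

Putting the captured text into the exception message keeps the diagnostic next to the failure, which matters when the subprocess output is piped and not otherwise visible in the test log.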
@stooart-mon Thanks for your review. Please let me know if you have any other comments.
This looks good to me and seems to follow a similar design to what I did on x86_64 vectors. Thanks for doing this!
Thanks a lot for the review! @fisk
/label add hotspot-gc
@JoshuaZhuwj
Waiting for another review.
Looks good. I also tested the changeset on Oracle's internal CI (ZGC tests within tiers 1-7, on Neon machines) with an additional patch (963def0) that forces ZGC read barriers to always take the slow path and clears all vector registers upon the slow path's runtime call. Testing succeeded.
Hello - I have no other comments - looks good.
Thank you a lot for the reviews! @stooart-mon @fisk @robcasloz
/integrate
@JoshuaZhuwj
/sponsor
Going to push as commit 5c38386.
Your commit was automatically rebased without conflicts.
@TobiHartmann @JoshuaZhuwj Pushed as commit 5c38386. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Currently, on AArch64 the ZGC C2 load barrier stub saves each live register in full, regardless of how much of the register is actually live.
Since the size of an SVE register is an implementation-defined multiple of 128 bits, up to 2048 bits,
even a single live floating-point value may occupy the maximum 2048 bits of stack.
Hence I would like to introduce this change on AArch64: take the length of live registers into consideration in the ZGC C2 load barrier stub.
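To make the size argument concrete, here is a small arithmetic sketch. It is not HotSpot code (the stub itself is C++), and the class `SpillSizeSketch` and its methods are hypothetical names introduced for illustration. It also ignores stack alignment and any pairing of store instructions, which a real stub has to account for.

```java
// Illustrative arithmetic only; all names here are assumptions, not HotSpot APIs.
final class SpillSizeSketch {

    // Bytes needed to save one whole SVE register of the given vector length.
    static int wholeRegisterBytes(int sveVectorBits) {
        if (sveVectorBits < 128 || sveVectorBits > 2048 || sveVectorBits % 128 != 0) {
            throw new IllegalArgumentException("SVE length must be a multiple of 128 in [128, 2048]");
        }
        return sveVectorBits / 8;
    }

    // Bytes needed when only the live part of the register is saved.
    static int liveBytes(int liveBits, int sveVectorBits) {
        if (liveBits <= 0 || liveBits % 8 != 0) {
            throw new IllegalArgumentException("liveBits must be a positive multiple of 8");
        }
        return Math.min(liveBits / 8, wholeRegisterBytes(sveVectorBits));
    }

    public static void main(String[] args) {
        int sveBits = 2048; // largest SVE length mentioned in the description
        System.out.println("whole register: " + wholeRegisterBytes(sveBits) + " bytes");
        System.out.println("live float:     " + liveBytes(32, sveBits) + " bytes");
        System.out.println("live double:    " + liveBytes(64, sveBits) + " bytes");
    }
}
```

Under these assumptions, a single live float on a 2048-bit machine costs 256 bytes when the whole register is saved and 4 bytes when only the live part is saved.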
In a floating-point case on a 2048-bit SVE machine, the following ZLoadBarrierStubC2
could be optimized into:
Besides the above benefit, once we know what size of register is live,
we can remove the unnecessary caller save in the ZGC C2 load barrier stub when we meet C-ABI SOE (save-on-entry) fp registers.
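The reasoning can be sketched as a predicate. It relies on the AArch64 procedure call standard, under which the callee preserves only the low 64 bits of v8 to v15. The class `CalleeSavedSketch` and the method `needsCallerSave` are hypothetical names; this is an illustration of the rule, not the logic as written in the patch.

```java
// Illustrative predicate; names are assumptions, not HotSpot APIs.
final class CalleeSavedSketch {

    // Returns true if the stub must save the register itself before the
    // runtime call. Under AAPCS64 the callee preserves the low 64 bits of
    // v8..v15, so a value of at most 64 bits living there survives the call.
    static boolean needsCallerSave(int vRegIndex, int liveBits) {
        if (vRegIndex < 0 || vRegIndex > 31) {
            throw new IllegalArgumentException("vector register index must be in [0, 31]");
        }
        boolean low64PreservedByCallee = vRegIndex >= 8 && vRegIndex <= 15;
        return !(low64PreservedByCallee && liveBits <= 64);
    }
}
```

For example, a live double in v8 would not need a caller save under this rule, while a 128-bit value in v8 or a float in v0 still would.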
Passed jtreg with option "-XX:+UseZGC -XX:+ZGenerational" with no failures introduced.
Progress
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17977/head:pull/17977
$ git checkout pull/17977
Update a local copy of the PR:
$ git checkout pull/17977
$ git pull https://git.openjdk.org/jdk.git pull/17977/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17977
View PR using the GUI difftool:
$ git pr show -t 17977
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17977.diff