8357443: ZGC: Optimize old page iteration in remap remembered phase #25345
lgtm. Very nice to use information we are already tracking rather than walking everything.
This change removes the last usage of ZIndexDistributor. I don't know if we want to remove it, or leave it in case we need it for any of our upcoming features.
It is probably nice to at least keep our page table iterators in the code base, so you do not have to go dig them up or do something ad hoc if you ever want to check something. Whether they need ZIndexDistributor or not is another question.
Co-authored-by: Axel Boldt-Christmas <xmas1915@gmail.com>
Thanks for reviewing!
Looks good.
We decided to leave the ZIndexDistributor infrastructure in place, including the small additional extension to the page table. We might make a change in JDK 26, or later, to convert the distributor to work over the extended range internally but only hand out indices in the requested range. That way we could get rid of the current overhead and still keep the current implementation. I've merged locally and run that through a large set of our testing, so I'm going to integrate this now. Thanks for all the reviews! /integrate
Going to push as commit c909216.
Your commit was automatically rebased without conflicts.
Before starting the relocation phase of a major collection we remap all pointers into the young generation so that we can disambiguate when an oop has bad bits for both the young generation and the old generation. See comment in remap_young_roots.
One part of this requires us to visit all old pages. To parallelize that part we have a class that distributes page table indices to the GC worker threads (see ZIndexDistributor).
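As a rough illustration of what distributing page table indices to workers means, here is a minimal Java sketch. It is not ZIndexDistributor (which is C++ code inside HotSpot and uses its own claiming strategy); the class name IndexClaimSketch, the method visitAll, and the single shared counter are all hypothetical and chosen only to show the general idea that every index is claimed by exactly one worker.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch, NOT ZGC's ZIndexDistributor: it only shows the general
// idea of handing out page table indices to parallel workers.
final class IndexClaimSketch {
    // Returns how many times each index was visited. Every entry should be 1.
    static int[] visitAll(int tableSize, int workers) {
        AtomicInteger next = new AtomicInteger(0);
        AtomicIntegerArray visits = new AtomicIntegerArray(tableSize);
        Thread[] threads = new Thread[workers];

        for (int w = 0; w < workers; w++) {
            threads[w] = new Thread(() -> {
                int index;
                // Claim the next unclaimed index until the table is exhausted.
                while ((index = next.getAndIncrement()) < tableSize) {
                    visits.incrementAndGet(index); // stand-in for "visit the page"
                }
            });
            threads[w].start();
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }

        int[] result = new int[tableSize];
        for (int i = 0; i < tableSize; i++) {
            result[i] = visits.get(i);
        }
        return result;
    }
}
```

Note that in this scheme the total work is proportional to the size of the table, regardless of how many slots actually hold an old page.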
While looking into a potential, minor performance regression on Windows I noticed that the usage of constexpr in ZIndexDistributorClaimTree wasn't giving us the inlining we hoped for, which caused noticeably worse performance on Windows compared to the other platforms. I created a patch for this that gave us the expected inlining. See master...stefank:jdk:8357443_zgc_optimize_remap_remembered
While thinking about this a bit more I realized that we could use the "found old" optimization that we already use for the remset scanning. This finds the old pages without scanning the entire page table. This gives a significant enough boost that I propose that we do that instead.
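The difference between the two approaches can be sketched in a few lines of Java. This is only a toy model under stated assumptions: the page table is an array whose slots are null, YOUNG or OLD, and the "found old" information is a plain list of indices recorded earlier. The names OldPageIterationSketch, slotsTouchedByFullScan and slotsTouchedByTrackedList are hypothetical and do not correspond to ZGC's actual data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not ZGC code: contrasts scanning a whole page table
// with visiting only the old pages that were already recorded elsewhere.
final class OldPageIterationSketch {
    enum Age { YOUNG, OLD }

    // Full scan: touches every slot, including empty (null) and young ones.
    static int slotsTouchedByFullScan(Age[] pageTable, List<Integer> oldOut) {
        int touched = 0;
        for (int i = 0; i < pageTable.length; i++) {
            touched++;
            if (pageTable[i] == Age.OLD) {
                oldOut.add(i);
            }
        }
        return touched;
    }

    // Tracked iteration: visits only the indices recorded as old beforehand.
    static int slotsTouchedByTrackedList(List<Integer> foundOld, List<Integer> oldOut) {
        int touched = 0;
        for (int index : foundOld) {
            touched++;
            oldOut.add(index);
        }
        return touched;
    }
}
```

With a table of 1024 slots and two old pages, the full scan touches 1024 slots while the tracked iteration touches two, and both report the same old pages. That is why the gain is largest when the old generation is nearly empty or the page table is large.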
This mainly lowers the Major Collection times when you run a GC without any significant amount of objects in the old generation. So it is most likely important mainly for microbenchmarks and small workloads.
Below is the average time (ms) of the Concurrent Remap Roots phase from only running System.gc() 50 times, before and after this PR. The second set of data is from using the extreme end of the supported heap size. This mimics how we previously used to have a large page table even for smaller heap sizes (we don't do that anymore for JDK 25). It shows a quite significant difference, but it will also likely be in the noise when running larger workloads.
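A test program of the kind described could look roughly like the sketch below. The class name GcLoopSketch and the helper requestCollections are hypothetical, and this is not the program used for the measurements in this PR. It assumes the JVM is started with ZGC (-XX:+UseZGC) and with GC logging enabled (for example -Xlog:gc*), so that the phase times can be read from the log and averaged; the exact log configuration used for the numbers above is not stated here.

```java
// Hypothetical sketch of a "run System.gc() N times" micro test.
// Phase times are NOT measured here; they would be read from the GC log.
final class GcLoopSketch {
    // Requests the given number of collections and returns how many were requested.
    static int requestCollections(int cycles) {
        int requested = 0;
        for (int i = 0; i < cycles; i++) {
            System.gc(); // a request to the JVM to run a collection
            requested++;
        }
        return requested;
    }

    public static void main(String[] args) {
        int requested = requestCollections(50);
        System.out.println("Requested " + requested + " collections");
    }
}
```

Because the program allocates almost nothing, the old generation stays nearly empty, which matches the scenario where this change is expected to matter most.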
This change removes the last usage of ZIndexDistributor. I don't know if we want to remove it, or leave it in case we need it for any of our upcoming features.
I've run this through tier1-7.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25345/head:pull/25345
$ git checkout pull/25345
Update a local copy of the PR:
$ git checkout pull/25345
$ git pull https://git.openjdk.org/jdk.git pull/25345/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25345
View PR using the GUI difftool:
$ git pr show -t 25345
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25345.diff
Using Webrev
Link to Webrev Comment