8359683: ZGC: NUMA-Aware Relocation #26898
Conversation
@jsikstro This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements; please see the file CONTRIBUTING.md for details. You can use pull request commands such as /summary, /contributor and /issue to adjust the final commit message as needed. At the time this comment was updated, 55 new commits had been pushed to the target branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command. ➡️ To integrate this PR with the above commit message, use the /integrate command.
Webrevs
ZRelocationTargets _shared_medium_targets;

ZWorkers* workers() const;
void work(ZRelocationSetParallelIterator* iter);
This is dead code since JDK-8256390, commit 372595c.
I addressed some offline feedback from @kstefanj in new commits.

... and some more offline feedback from @xmas92!
kstefanj left a comment:
Looks good. Did you look at changing the iterators to make it possible to have something like _iters.reset() in the constructor for ZRelocateTask? This could also be done as a follow-up.
Co-authored-by: Stefan Johansson <54407259+kstefanj@users.noreply.github.com>
I tested out and discussed some potential alternatives with Axel, and we both agreed that a nice solution to this would be better implemented in a follow-up. We envisioned something where we can reset the state of an iterator without destructing and re-constructing it, as we do in this patch. This would require control over changing the iterator's internal state. Another option is to use malloc to allocate/free the array before/after every GC cycle, but I prefer the current solution. I expanded the comment next to where the iterators are destructed, which I hope is a good compromise until a nicer solution is in place.
I addressed additional offline review feedback from @xmas92 in new commits. |
xmas92 left a comment:
Reviewed this mostly offline. Good work.
I also think we can clean up the iterator, but let's do that in a follow-up.
Going to push as commit b39c736.
Your commit was automatically rebased without conflicts.
After this PR, on my linux-x64 machine (224 CPU threads), a fastdebug build sometimes cannot start the JVM with -Xmx16M. Is this by design, or is it a regression?

$ ./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:+UseZGC -Xmx16M -version
Error occurred during initialization of VM
java.lang.OutOfMemoryError: Java heap space
	at java.lang.System.initPhase2(java.base@26-internal/System.java:1934)
@sendaoYan Yes, this is a difference in behavior after this PR. Let's continue this discussion in the issue you filed at JDK-8366476. |
Hello,
With JDK-8350441, ZGC got infrastructure to prefer allocations to end up on a specific NUMA node. When a new object is allocated, it is preferably placed on the NUMA node that is local to the allocating thread. This strategy improves access speeds for mutators working on that object, if it continues to be used by threads on the same NUMA node. However, when relocating objects, ZGC will potentially move (migrate) objects away from the NUMA node they were originally allocated on. This means that if a page is selected as part of the Relocation Set, the objects on that page could potentially be moved to another NUMA node, breaking the NUMA locality we strived for when allocating.
We should consider adding NUMA-awareness to ZGC's relocation phase to keep NUMA-locality benefits for mutators.
Proposal
NUMA-Awareness consists of two main features:
First: GC threads should strive to keep objects local to their original NUMA node, meaning that objects should ideally be relocated to a page on the same NUMA node they were allocated on.
Mutator threads should take a different approach: we know that a mutator that is (helping out with) relocating an object is also going to access it, so we migrate the object to the NUMA node associated with the relocating thread. This strategy is already in effect and requires no code changes (specifically, ZObjectAllocator already tracks per-CPU Small pages). However, Medium pages are shared between CPUs and thus carry no guarantee about which NUMA node they are on. Since mutator relocation and Medium page relocation are both uncommon, there is little gain from introducing NUMA-awareness for that specific combination. Instead, this can be addressed in a follow-up if we feel it is necessary.
Second: When the GC chooses a page from the Relocation Set to relocate objects from, it should choose page(s) that are local to its own NUMA node, to speed up relocation by working on NUMA-local memory. There are multiple ways to achieve this, but the main goal should be to (1) start working on pages that are local to the GC thread's NUMA node, and (2) when finished with pages on its own NUMA node, start working on (helping out with) pages associated with other NUMA nodes.
Some key observations to consider with the above approach:
The NUMA node associated with the GC thread should be "polled"/"checked" in regular intervals, to account for the fact that the GC thread might have migrated to another CPU, and thus perhaps to another NUMA node. It is probably enough to check the associated NUMA node before claiming a new page and starting to relocate objects.
By choosing pages based on NUMA-node association rather than live bytes, we might not start with the most sparse page first. This is really only a problem if the machine is fully saturated and there are allocation stalls. Additionally, it is worth considering that in a common NUMA configuration, it takes twice as long to access remote memory compared to local memory. This means that a local page could (theoretically) be relocated twice as fast as a remote page, which could release memory faster than starting with the most sparse page, if that page is on a remote node.
The new strategy is more of an optimization for mutators and might make the GC take a bit longer to complete the relocation phase. The current strategy is to move objects to the NUMA node associated with the GC thread, regardless of where the object was originally from. This makes the GC fast, at the potential downside of mutators not accessing local memory any more. However, since ZGC is a concurrent garbage collector, it isn't really a huge issue if the relocation phase becomes a bit longer, if the mutators receive a speedup.
Depending on which NUMA nodes GC threads end up on, we might see a negative impact on performance. This is unchanged by this proposal, but with NUMA-awareness implemented it would be possible to explore future enhancements where threads are placed on a core associated with a particular NUMA node, so that threads can work on NUMA-local memory more often.
To simplify the logic of this patch, multi-partition allocations for Medium pages have been disabled. After JDK-8357449, which enables variable sizes for Medium pages all the way down to 4MB, there is little gain in enabling multi-partition allocation for Medium pages. If a Medium page allocation would only succeed with multi-partition enabled, we do not have 4MB of contiguous memory available, which only happens if memory is extremely low or the heap is degeneratively swiss-cheesed into 2MB chunks.
Testing
Performance testing shows no regression when NUMA is disabled or not available.
Testing before/after the patch shows that the share of GC threads relocating objects from/to NUMA-local pages has gone from about 50% up to 95%. This depends heavily on the distribution of threads across NUMA nodes, but it shows the new strategy works as intended.
There is no apparent speedup or slowdown in the time it takes to complete the Relocation Phase with NUMA enabled. This could very well be due to the change in strategy for choosing the destination NUMA node. Before, we would always relocate an object to the NUMA node local to the GC thread. Now, we maintain NUMA-locality when relocating, which means that if we relocate an object on a remote NUMA node, we get worse performance because both the source and the target are on a remote NUMA node.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26898/head:pull/26898
$ git checkout pull/26898
Update a local copy of the PR:
$ git checkout pull/26898
$ git pull https://git.openjdk.org/jdk.git pull/26898/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26898
View PR using the GUI difftool:
$ git pr show -t 26898
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26898.diff
Using Webrev
Link to Webrev Comment