
Conversation

@jsikstro (Member) commented Aug 22, 2025

Hello,

With JDK-8350441, ZGC gained infrastructure for preferring that allocations end up on a specific NUMA node. When a new object is allocated, it is preferably placed on the NUMA node that is local to the allocating thread. This strategy improves access speeds for mutators working on that object, provided it continues to be used by threads on the same NUMA node. However, when relocating objects, ZGC may move (migrate) objects away from the NUMA node they were originally allocated on. This means that if a page is selected as part of the Relocation Set, the objects on that page could be moved to another NUMA node, breaking the NUMA locality we strove for when allocating.

We should consider adding NUMA-awareness to ZGC's relocation phase to keep NUMA-locality benefits for mutators.

Proposal

NUMA-Awareness consists of two main features:

First: GC threads should strive to keep objects on their original NUMA node, meaning that objects should ideally be relocated to a page that is on the same NUMA node.

Mutator threads should take a different approach: since the mutator that is (helping out with) relocating an object is also going to access it, we migrate the object to the NUMA node associated with the relocating thread. This strategy is already in effect and does not require any code changes (specifically, ZObjectAllocator already tracks per-CPU Small pages). Medium pages, however, are shared between CPUs and thus come with no guarantee about which NUMA node they are on. Both mutator relocation and Medium page relocation are uncommon, so there is little gain from introducing NUMA-awareness to that specific scenario. Instead, this can be addressed in a follow-up if we feel that's necessary.

Second: When the GC chooses a page from the Relocation Set to relocate objects from, it should choose pages that are local to its own NUMA node, to speed up the work by operating on NUMA-local memory. There are multiple ways to achieve this, but the main goal should be to (1) start working on pages that are local to the GC thread's NUMA node, and (2) when finished with pages on its own NUMA node, start working on (helping out with) pages associated with other NUMA nodes.

Some key observations to consider with the above approach:

  • The NUMA node associated with the GC thread should be "polled"/"checked" at regular intervals, to account for the fact that the GC thread might have migrated to another CPU, and thus perhaps to another NUMA node. It is probably enough to check the associated NUMA node before claiming a new page and starting to relocate objects.

  • By choosing pages based on NUMA-node association rather than live bytes, we might not start with the sparsest page first. This is really only a problem if the machine is fully saturated and there are allocation stalls. Additionally, in a common NUMA configuration it takes roughly twice as long to access remote memory as local memory. This means that a local page could (theoretically) be relocated twice as fast as a remote page, so starting with a local page could release memory faster than starting with the sparsest page, if that page is on a remote node.

  • The new strategy is primarily an optimization for mutators and might make the GC take a bit longer to complete the relocation phase. The current strategy is to move objects to the NUMA node associated with the GC thread, regardless of where the object originally was. This makes the GC fast, at the potential cost of mutators no longer accessing local memory. However, since ZGC is a concurrent garbage collector, a somewhat longer relocation phase is not a huge issue if the mutators receive a speedup.

  • Depending on how GC threads are distributed across NUMA nodes, we might see a negative impact on performance. This behavior is unchanged by this proposal, but with NUMA-awareness implemented it would be possible to explore future enhancements where threads are placed on a core associated with a particular NUMA node, so that threads can work on NUMA-local memory more often.

  • To simplify the logic of this patch, multi-partition allocations for Medium pages have been disabled. After JDK-8357449, which enables variable sizes for Medium pages all the way down to 4MB, there is little gain in enabling multi-partition allocation for Medium pages. If a Medium page allocation would only succeed with multi-partition enabled, we do not have 4MB of contiguous memory available, which would only happen if memory is extremely low or if the heap is degeneratively swiss-cheesed into 2MB chunks.

Testing

  • tier1-8 ZGC tasks only, on all Oracle supported platforms, without NUMA
  • tier1-8 with -XX:ZFakeNUMA=16 on linux-x64-debug
  • tier1-3 with NUMA on linux-x64-debug

Performance testing shows no regression when NUMA is disabled or not available.

Testing before/after the patch shows that the share of GC threads relocating objects from/to NUMA-local pages has gone from about 50% up to 95%. This depends very much on the distribution of threads across NUMA nodes, but it shows that the new strategy works as intended.

There is no apparent speedup or slowdown in the time it takes to complete the Relocation Phase with NUMA enabled. This could very well be due to the change in the strategy for choosing the destination NUMA node. Before, we would always relocate an object to the NUMA node local to the GC thread. Now, we maintain NUMA locality when relocating, which means that if we relocate an object on a remote NUMA node, we get worse performance, as both the source and the destination are on a remote NUMA node.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8359683: ZGC: NUMA-Aware Relocation (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26898/head:pull/26898
$ git checkout pull/26898

Update a local copy of the PR:
$ git checkout pull/26898
$ git pull https://git.openjdk.org/jdk.git pull/26898/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26898

View PR using the GUI difftool:
$ git pr show -t 26898

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26898.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper bot commented Aug 22, 2025

👋 Welcome back jsikstro! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk bot commented Aug 22, 2025

@jsikstro This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8359683: ZGC: NUMA-Aware Relocation

Reviewed-by: aboldtch, sjohanss

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 55 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk bot commented Aug 22, 2025

@jsikstro The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org rfr Pull request is ready for review labels Aug 22, 2025
@mlbridge bot commented Aug 22, 2025

Webrevs

ZRelocationTargets _shared_medium_targets;

ZWorkers* workers() const;
void work(ZRelocationSetParallelIterator* iter);
@jsikstro (Member Author) commented:

This is dead code since JDK-8256390, commit 372595c.

@jsikstro (Member Author) commented:

I addressed some offline feedback from @kstefanj in new commits.

@jsikstro (Member Author) commented:

... and some more offline feedback from @xmas92!

@kstefanj (Contributor) left a comment:

Looks good. Did you look at all at changing the iterators to make it possible to have something like _iters.reset() in the constructor for ZRelocateTask? This could also be done as a follow-up.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 25, 2025
Co-authored-by: Stefan Johansson <54407259+kstefanj@users.noreply.github.com>
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 25, 2025
@jsikstro (Member Author) commented:

Looks good. Did you look at all at changing the iterators to make it possible to have something like _iters.reset() in the constructor for ZRelocateTask? This could also be done as a follow-up.

I tested out and discussed some potential alternatives with Axel, and we both agreed that a nice solution to this would be better implemented in a follow-up. We envisioned something where we can reset the state of an iterator without destructing and constructing it like we do in this patch. This would require control over the iterator's internal state. Another solution is to use malloc to allocate/free the array before/after every GC cycle, but I prefer the current solution.

I expanded the comment next to the destruction of the iterators, which I hope is a good compromise until a nicer solution is in place.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 26, 2025
@jsikstro (Member Author) commented:

I addressed additional offline review feedback from @xmas92 in new commits.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 26, 2025
@xmas92 (Member) left a comment:

Reviewed this mostly offline. Good work.

I also think we can clean up the iterator, but let us do it in a follow-up.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 26, 2025
@jsikstro (Member Author) commented:

Thank you for the reviews! @kstefanj @xmas92
I reran some sanity testing (tier1-3) with and without -XX:ZFakeNUMA=16, which looks good.
/integrate

@openjdk bot commented Aug 27, 2025

Going to push as commit b39c736.
Since your change was applied there have been 55 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 27, 2025
@openjdk openjdk bot closed this Aug 27, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 27, 2025
@openjdk bot commented Aug 27, 2025

@jsikstro Pushed as commit b39c736.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@sendaoYan (Member) commented:

After this PR, on my linux-x64 machine (224 CPU threads), the fastdebug build sometimes cannot start the JVM with -Xmx16M.

Is this by design, or is it a regression?

./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -XX:+UseZGC -Xmx16M -version
Error occurred during initialization of VM
java.lang.OutOfMemoryError: Java heap space
        at java.lang.System.initPhase2(java.base@26-internal/System.java:1934)

@jsikstro (Member Author) commented Sep 1, 2025

@sendaoYan Yes, this is a difference in behavior after this PR. Let's continue this discussion in the issue you filed at JDK-8366476.


Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
