Contributor

@kstefanj kstefanj commented Apr 23, 2025

Please review this change to improve TLAB handling in ZGC.

Summary
In ZGC the maximum TLAB size is 256k, and in many cases we want the TLABs to be this big. But for threads that allocate only a fraction of this, TLABs of this size cause significant waste. This is normally handled by the shared TLAB sizing heuristic, but a few things in ZGC have prevented this mechanism from working as expected.

The heuristic bases the resizing on several things, and the GC is responsible for providing the amount of memory used for TLABs (tlab_used()) and the capacity available for TLABs (tlab_capacity()). For the other GCs, capacity is more or less the size of Eden, but since ZGC does not have fixed generation sizes, there is no given size for Eden. Before this change we returned the heap capacity as the TLAB capacity, since in theory we can use whatever is left for TLABs. Returning this more or less disables the sizing heuristic, since we only sample the usage when this holds:

bool update_allocation_history = used > 0.5 * capacity;

So we need a better value to return as capacity. We could use the amount of free memory, but this also overestimates what will actually be used. The proposed approach is to use the average of the last 10 values of what was actually used for TLABs as the capacity. This provides a good estimate of the expected TLAB capacity, and the sizing heuristic then works as expected.
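To make the effect on the sampling condition concrete, here is a minimal standalone sketch of the idea, not the code in this PR. The class name TlabCapacityEstimate and the function should_update_allocation_history are hypothetical, and the ring buffer is only an assumption about how an average over the last 10 values could be kept.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the capacity estimate: the mean of the most
// recent N TLAB usage samples. This is not the HotSpot implementation.
class TlabCapacityEstimate {
  static constexpr std::size_t N = 10;
  std::array<std::size_t, N> _samples{};
  std::size_t _count = 0;  // number of valid samples, at most N
  std::size_t _next = 0;   // slot that the next sample overwrites

public:
  // Record how much memory was actually used for TLABs in one cycle.
  void add_sample(std::size_t used_bytes) {
    _samples[_next] = used_bytes;
    _next = (_next + 1) % N;
    if (_count < N) {
      _count++;
    }
  }

  // Average of the recorded samples, or 0 before the first sample.
  std::size_t capacity() const {
    if (_count == 0) {
      return 0;
    }
    std::size_t sum = 0;
    for (std::size_t i = 0; i < _count; i++) {
      sum += _samples[i];
    }
    return sum / _count;
  }
};

// The sampling condition quoted in the summary, written as a function.
inline bool should_update_allocation_history(std::size_t used, std::size_t capacity) {
  return used > 0.5 * capacity;
}
```

With the heap capacity as input, for example 1G used against a 16G heap, the condition is false and no sample is taken. With an average of recent usage as the capacity, a cycle with typical usage passes the check.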

Another problem in this area is that, since ZGC retires TLABs concurrently, the used value was previously reset before it was used in the sizing heuristic. To be able to use consistent values, we need to snapshot the usage in the mark start pause for the young generation and use that value for any TLAB retired after this pause.
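The snapshot idea can be sketched roughly as follows. This is a simplified standalone model using std::atomic, not the ZTLABUsage code in the PR. The class name TlabUsageSketch and its members are assumptions chosen to resemble the fragments quoted in the review.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical model: a live counter that allocation updates, plus a
// snapshot that is frozen in the pause and read by concurrent retiring.
class TlabUsageSketch {
  std::atomic<std::size_t> _used{0};      // live value, changes concurrently
  std::atomic<std::size_t> _snapshot{0};  // value captured at the last reset

public:
  void increase_used(std::size_t size) {
    _used.fetch_add(size, std::memory_order_relaxed);
  }

  void decrease_used(std::size_t size) {
    assert(size <= _used.load(std::memory_order_relaxed));
    _used.fetch_sub(size, std::memory_order_relaxed);
  }

  // Assumed to run in the mark start pause: capture the usage and clear
  // the live counter in one atomic step.
  void reset() {
    const std::size_t current_used = _used.exchange(0);
    _snapshot.store(current_used, std::memory_order_relaxed);
  }

  // TLABs retired after the pause see the snapshot, not the live counter.
  std::size_t tlab_used() const {
    return _snapshot.load(std::memory_order_relaxed);
  }
};
```

Allocation that happens after reset() changes only the live counter, so every TLAB retired in the same cycle sees the same used value.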

How we track the TLAB used value is also changed. Before this change, TLAB used was tracked per CPU, and the way it was implemented led to some unwanted overhead. We added two additional fields that were tracked for all ages but only used for Eden. These fields were cleared in the mark start pause, and with many CPUs this actually affects the pause time. The new code tracks the Eden usage in the page allocator instead.
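As a rough illustration of why clearing per-CPU fields costs pause time, compare the two hypothetical layouts below. Neither struct is taken from ZGC; both names and the exact shape of the counters are assumptions made for this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-CPU layout: one slot per CPU. Reading or clearing the
// total has to visit every slot, so the cost grows with the CPU count.
struct PerCpuUsedSketch {
  std::vector<std::atomic<std::size_t>> slots;

  explicit PerCpuUsedSketch(std::size_t ncpus) : slots(ncpus) {}

  void add(std::size_t cpu, std::size_t size) {
    slots[cpu].fetch_add(size, std::memory_order_relaxed);
  }

  // O(number of CPUs) work, done inside the pause.
  std::size_t reset() {
    std::size_t total = 0;
    for (auto& slot : slots) {
      total += slot.exchange(0, std::memory_order_relaxed);
    }
    return total;
  }
};

// Hypothetical single-counter layout: clearing is one atomic exchange.
struct SingleUsedSketch {
  std::atomic<std::size_t> used{0};

  void add(std::size_t size) {
    used.fetch_add(size, std::memory_order_relaxed);
  }

  // O(1) work, independent of the CPU count.
  std::size_t reset() {
    return used.exchange(0, std::memory_order_relaxed);
  }
};
```

A single counter trades the per-CPU loop for possible contention on updates, which is presumably acceptable when updates happen at page granularity.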

This change also fixes the maximum TLAB size returned from ZGC so that it is in words, not bytes. This will mostly help logging, since the actual sizing is still enforced correctly.
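For reference, the unit difference looks like this in a small standalone sketch. The constant and function names are hypothetical. kHeapWordSize is assumed to mirror HotSpot's HeapWordSize, which equals the pointer size, so on a 64-bit JVM the 256k byte limit corresponds to 32768 words.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: one heap word is pointer sized, as with HotSpot's HeapWordSize.
constexpr std::size_t kHeapWordSize = sizeof(void*);

// The maximum TLAB size from the summary, expressed in bytes.
constexpr std::size_t kMaxTlabSizeBytes = 256 * 1024;

// Convert a byte count to a word count.
constexpr std::size_t bytes_to_words(std::size_t bytes) {
  return bytes / kHeapWordSize;
}

// The same limit expressed in words.
constexpr std::size_t kMaxTlabSizeWords = bytes_to_words(kMaxTlabSizeBytes);
```

A caller that expects words but receives the byte value sees a limit that is too large by a factor of the word size, which would show up in logged values.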

Testing

  • Functional testing tier1-tier7
  • Performance testing in Aurora is neutral
  • Manual testing looking at TLAB waste shows a clear reduction; in some scenarios the waste could previously be above 2% and now it is below 1%
  • Manual verification that the worst-case pauses are shorter due to the reduced work in the mark start pause

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8353184: ZGC: Simplify and correct tlab_used() tracking (Enhancement - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24814/head:pull/24814
$ git checkout pull/24814

Update a local copy of the PR:
$ git checkout pull/24814
$ git pull https://git.openjdk.org/jdk.git pull/24814/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24814

View PR using the GUI difftool:
$ git pr show -t 24814

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24814.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Apr 23, 2025

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Apr 23, 2025

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8353184: ZGC: Simplify and correct tlab_used() tracking

Reviewed-by: stefank, aboldtch

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 326 new commits pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 23, 2025

openjdk bot commented Apr 23, 2025

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Apr 23, 2025

mlbridge bot commented Apr 23, 2025

Webrevs

@openjdk openjdk bot mentioned this pull request May 3, 2025
3 tasks

Member

@stefank stefank left a comment

I like this change. I've added a few comments below.

Comment on lines 31 to 32

Member

Suggested change

Comment on lines 38 to 39
precond(size <= _used);
Atomic::sub(&_used, size, memory_order_relaxed);

Member

Suggested change
precond(size <= _used);
Atomic::sub(&_used, size, memory_order_relaxed);
precond(size <= _used);
Atomic::sub(&_used, size, memory_order_relaxed);

}

void ZTLABUsage::reset() {
const size_t current_used = Atomic::xchg(&_used, (size_t) 0);

Member

Does this work instead?

Suggested change
const size_t current_used = Atomic::xchg(&_used, (size_t) 0);
const size_t current_used = Atomic::xchg(&_used, 0u);

Contributor Author

No, 0ul works on Linux, but Windows fails with that.
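The platform difference comes from the type of the literal. On 64-bit Linux size_t is unsigned long, so 0ul has exactly the type of _used. On 64-bit Windows unsigned long is 32 bits wide while size_t is 64 bits, so the types no longer match. The sketch below models an exchange that insists on matching types. strict_xchg is a hypothetical stand-in and is not HotSpot's Atomic::xchg.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an exchange that rejects a new value whose
// type differs from the destination type, instead of converting it.
template <typename D, typename T>
D strict_xchg(std::atomic<D>& dest, T value) {
  static_assert(std::is_same<D, T>::value,
                "exchange value must have the destination type");
  return dest.exchange(value);
}

// Portable form: spell out the destination type for the literal.
inline std::size_t reset_counter(std::atomic<std::size_t>& used) {
  return strict_xchg(used, static_cast<std::size_t>(0));
}
```

In this model, passing 0u fails to compile wherever size_t is not unsigned int, and passing 0ul compiles only where size_t is unsigned long. The explicit cast works on every platform.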

}

// Save the old values for logging
const size_t old_used = used();

Member

@stefank stefank May 8, 2025

It's not immediately obvious how _used relates to used(). Could one of these be renamed so that readers don't mistakenly assume that used() returns _used?

Contributor Author

We talked a bit about this offline. I will add some comments and rename used() and capacity() to tlab_used() and tlab_capacity() to make it clearer that they are not directly connected, and to better match the ZHeap interface.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 9, 2025

Member

@xmas92 xmas92 left a comment

lgtm.

@kstefanj
Contributor Author

Thanks for the reviews @stefank and @xmas92

/integrate

openjdk bot commented May 13, 2025

Going to push as commit 526f543.
Since your change was applied there have been 375 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 13, 2025
@openjdk openjdk bot closed this May 13, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label May 13, 2025
@openjdk openjdk bot removed the rfr Pull request is ready for review label May 13, 2025

openjdk bot commented May 13, 2025

@kstefanj Pushed as commit 526f543.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated

3 participants