8342444: Shenandoah: Uncommit regions from a separate, STS aware thread #22019

Closed

Conversation

earthling-amzn
Contributor

@earthling-amzn earthling-amzn commented Nov 11, 2024

Currently, Shenandoah uncommits regions from its control thread. The control thread is responsible for starting GC cycles in a timely fashion. Uncommitting memory from this thread may introduce unwanted delays in the control thread's response to GC pressure.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8342444: Shenandoah: Uncommit regions from a separate, STS aware thread (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/22019/head:pull/22019
$ git checkout pull/22019

Update a local copy of the PR:
$ git checkout pull/22019
$ git pull https://git.openjdk.org/jdk.git pull/22019/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 22019

View PR using the GUI difftool:
$ git pr show -t 22019

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/22019.diff

Using Webrev

Link to Webrev Comment

@earthling-amzn earthling-amzn marked this pull request as draft November 11, 2024 17:32
@bridgekeeper

bridgekeeper bot commented Nov 11, 2024

👋 Welcome back wkemper! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Nov 11, 2024

@earthling-amzn This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8342444: Shenandoah: Uncommit regions from a separate, STS aware thread

Reviewed-by: shade, kdnilsen, ysr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 37 new commits pushed to the master branch:

  • 5cc150c: 8342979: Start of release updates for JDK 25
  • 85fedbf: 8344607: Link Time Optimization - basic support for clang
  • 5a0899f: 8345302: Building microbenchmarks require larger Java heap
  • 1ece4f9: 8345514: Should use internal class name when calling ClassLoader.getResourceAsByteArray
  • ef8da28: 8345591: [aarch64] macroAssembler_aarch64.cpp compile fails ceil_log2 not declared
  • 7513b13: 8328944: NMT reports "unknown" memory
  • 691e692: 8345565: Remove remaining SecurityManager motivated APIs from sun.reflect.util
  • 97b8a09: 8345339: JFR: Missing javadoc for RecordingStream::onMetadata
  • 456c71d: 8343699: [aarch64] Bug in MacroAssembler::klass_decode_mode()
  • 308357c: 8345578: New test in JDK-8343622 fails with a promoted build
  • ... and 27 more: https://git.openjdk.org/jdk/compare/1a73c76d83d34d10519c9d10fb0e51d098907ab0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Nov 11, 2024

@earthling-amzn The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Nov 11, 2024
if (heap->is_bitmap_slice_committed(region)) {
ctx->clear_bitmap(region);
{
ShenandoahHeapLocker locker(heap->lock());
Contributor

Was it a bug that the previous version of this code did not acquire the heap lock?

Is the lock required for the entire time that we are clearing the bitmap, or only to get a trustworthy check on is_bitmap_slice_committed()?

Contributor

After reading more of this PR, I believe we need the heap lock to get a reliable result from is_bitmap_slice_committed(). I do not believe we need the heap lock for ctx->clear_bitmap(region), though, so I would prefer to move that call outside the lock, unless I am misunderstanding.

Contributor Author

Hmm, I'm not sure we can do that. Prior to this change, the control thread performed both clearing the bitmap and uncommitting the region's bitmap, so they could never happen concurrently. With this change, a separate thread could perform the uncommit. Consider:

  1. Control thread takes heap lock, observes that bitmap slice for region A is committed
  2. Control thread releases heap lock, begins clearing bitmap (writing zeros to bitmap slice)
  3. Uncommit thread takes heap lock, believes it must uncommit region A
  4. Uncommit thread uncommits bitmap slice for region A
  5. Segfault in Control Thread

I do believe if we had a per region lock, it would be useful here. Holding a lock over the entire heap for this feels like overkill. Or, we could schedule the uncommit so that it does not occur during a GC cycle.
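
The interleaving above can be illustrated with a minimal sketch that keeps the committed check and the clearing writes in one critical section. Everything here (ToyHeap, BitmapSlice, the member names) is a hypothetical stand-in rather than HotSpot code, and a std::mutex plays the role of the global heap lock:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one region's bitmap slice; not a HotSpot type.
struct BitmapSlice {
  bool committed = true;
  std::vector<unsigned char> bits = std::vector<unsigned char>(64, 0xFF);
};

// Hypothetical toy heap; the mutex stands in for the global heap lock.
struct ToyHeap {
  std::mutex lock;
  std::vector<BitmapSlice> slices;

  explicit ToyHeap(std::size_t regions) : slices(regions) {}

  // Control-thread side: the check and the clearing writes share one
  // critical section, so the slice cannot be uncommitted in between.
  bool clear_if_committed(std::size_t region) {
    assert(region < slices.size());
    std::lock_guard<std::mutex> guard(lock);
    BitmapSlice& s = slices[region];
    if (!s.committed) {
      return false;  // already uncommitted: skip, there is nothing to write to
    }
    std::fill(s.bits.begin(), s.bits.end(), 0);
    return true;
  }

  // Uncommit-thread side: releases the backing storage under the same lock.
  void uncommit(std::size_t region) {
    assert(region < slices.size());
    std::lock_guard<std::mutex> guard(lock);
    BitmapSlice& s = slices[region];
    s.committed = false;
    s.bits.clear();
    s.bits.shrink_to_fit();
  }
};
```

Releasing the lock between the check and the std::fill call would reintroduce steps 2 to 5 of the scenario. The cost of this shape is the one raised later in the thread: every parallel worker that clears a bitmap serializes on the same lock.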

Member

So, wait a sec. This code is in ShenandoahResetBitmapTask, so it can run in parallel. Putting a lock here inhibits parallelism. I understand the failure mode, but I think we should really be optimizing for the case when ShenandoahUncommit is not enabled (e.g. -Xmx == -Xms).

It sounds like allowing concurrent uncommit to overlap with the GC cycle is a hassle. In addition to this particular problem, we might steal cycles from the GC threads and incur additional time-to-safepoint (TTSP) lag to park the uncommitter for the in-cycle GC pauses. I have no clear solution for this yet, but I think we need to explore whether we can suspend the uncommit before going into a GC cycle...

Contributor Author

We could have the control and uncommit threads coordinate their efforts. In the worst case, it could mean delaying concurrent reset while the control thread waits for the uncommit thread to yield.

We could also try a more targeted lock covering only the region's bitmap slice, but it doesn't seem right for one thread to be clearing a bitmap while another is trying to uncommit it. A lock would preserve technical correctness, but contention here would just mean that one thread wasted its time: either it cleared a bitmap that was then uncommitted, or it attempted to clear a bitmap that had already been uncommitted. In the latter case, the control thread would need to detect this and skip the region.
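
One way to picture the coordination idea is a small gate shared by the two threads. This is only a sketch under assumptions: the class UncommitGate and all of its method names are hypothetical and are not taken from the PR. The control thread closes the gate before a cycle and waits for any in-flight uncommit to finish, which is the worst-case delay mentioned above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical coordination gate; names are illustrative, not from the PR.
class UncommitGate {
  std::mutex m_;
  std::condition_variable cv_;
  bool allowed_ = true;  // false while a GC cycle is running
  bool active_ = false;  // true while the uncommit thread is working

public:
  // Uncommit thread: returns false if a GC cycle is in progress.
  bool try_begin() {
    std::lock_guard<std::mutex> guard(m_);
    if (!allowed_) {
      return false;
    }
    active_ = true;
    return true;
  }

  // Uncommit thread: polled between regions; true means "stop early".
  bool should_yield() {
    std::lock_guard<std::mutex> guard(m_);
    return !allowed_;
  }

  // Uncommit thread: marks the pass as finished and wakes the control thread.
  void end() {
    {
      std::lock_guard<std::mutex> guard(m_);
      assert(active_ && "end() called without a successful try_begin()");
      active_ = false;
    }
    cv_.notify_all();
  }

  // Control thread: forbid new uncommits, then wait for an in-flight one.
  void forbid_and_wait() {
    std::unique_lock<std::mutex> lk(m_);
    allowed_ = false;
    cv_.wait(lk, [this] { return !active_; });
  }

  // Control thread: reopen the gate after the cycle completes.
  void allow() {
    std::lock_guard<std::mutex> guard(m_);
    allowed_ = true;
  }
};
```

With a gate like this, the bitmap reset workers would not need a lock at all during the cycle, because no uncommit can be running while the gate is closed. The trade-off is that the start of a cycle may be delayed until the uncommit thread notices should_yield() and calls end().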

Contributor

Can we open a ticket to consider future improved concurrency by moving clear_bitmap(region) outside the global heap lock?

@earthling-amzn earthling-amzn marked this pull request as ready for review November 12, 2024 17:27
@earthling-amzn
Contributor Author

I modified the testing pipelines to set -Xms4g -Xmx10g -XX:+ShenandoahUncommit. All performance and stress tests completed successfully on x86 and aarch64. Marking this as ready for review.

@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 12, 2024
@mlbridge

mlbridge bot commented Nov 12, 2024

Member

@shipilev shipilev left a comment

Cursory review:

if (count > 0) {
_heap->notify_heap_changed();
double elapsed = os::elapsedTime() - start;
log_info(gc)("Uncommitted " SIZE_FORMAT " regions, in %.3fs", count, elapsed);
Member

If we can, can we match the current log format? E.g. print Concurrent uncommit, with appropriately formatted timestamp? I think we also want log_info(gc,start) at the beginning of the method. I think ShenandoahConcurrentPhase helper did all that, can we still use it?

Contributor Author

We can restore the log messages, but I don't think ShenandoahConcurrentPhase and friends will like being used outside of a cycle. I'll look into it.

Member

Yeah, at least restore the log format and add gc+start log as well.

Member

This one is still not addressed, unfortunately ^

Contributor Author

Yes, I spent some time trying to resurrect ShenandoahConcurrentPhase for uncommit here, but it really doesn't want to be used outside of a gc cycle. Also, previously it was logging heap usage, which isn't quite what we want here (this may actually increase during this phase, which makes it seem as though nothing is being uncommitted).

I've restored the original logging format, but instead of logging heap usage it is now logging heap committed before and after. Here is an excerpt from specjbb2015 with -Xms5g -Xmx10g:

[2024-11-20T20:02:25.056+0000][97.396s][22293][info][gc,start       ] Concurrent uncommit
[2024-11-20T20:02:25.072+0000][97.412s][22293][info][gc             ] Concurrent uncommit 5424M->5120M(5120M) 15.988ms
[2024-11-20T20:05:17.916+0000][270.255s][22293][info][gc,start       ] Concurrent uncommit
[2024-11-20T20:05:18.169+0000][270.508s][22293][info][gc             ] Concurrent uncommit 10240M->5120M(5120M) 253.048ms
[2024-11-20T20:06:45.329+0000][357.668s][22293][info][gc,start       ] Concurrent uncommit
[2024-11-20T20:06:45.596+0000][357.935s][22293][info][gc             ] Concurrent uncommit 10240M->5120M(5120M) 267.144ms
[2024-11-20T20:06:57.147+0000][369.486s][22293][info][gc,start       ] Concurrent uncommit
[2024-11-20T20:06:57.148+0000][369.487s][22293][info][gc             ] Concurrent uncommit 5456M->5440M(5440M) 1.189ms

Member

If we are emitting a log line that looks like a properly formatted GC log line, but the numbers there mean something else for Concurrent uncommit, we are bound to confuse users and automatic tools. Uncommit should affect capacity, this is how we know how deep we have uncommitted. So, I suggest we emit:

Concurrent uncommit XXXXM->XXXXM (YYYYM) z.zzzms

...where XXXX is the heap used at the end of uncommit (note before and after are the same) and YYYY is capacity. This will not expose users to thinking uncommit grows the heap usage, and would give us instantaneous view on heap usage and capacity.

Contributor Author

Hmm, the numbers are preceded by Concurrent uncommit, with that context it's not much of a stretch to think these numbers represent the change in committed memory. The original log message (in which heap usage may increase during uncommit) was not helpful. A message with the same format in which heap usage also appears to not change at all during an uncommit is also perplexing. Are we trying too hard to preserve the original, not useful message? Maybe we just want a new message that plainly says:

Concurrently uncommitted XXXXM in z.zzzms

or

Concurrent uncommit: time z.zzzms, committed before XXXXM, committed after YYYYM, capacity ZZZZM

Member

@shipilev shipilev Nov 25, 2024

Yes, I don't want to emit something that looks like a heap usage GC log line, if it is not. Unfortunately, X->Y (Z) T.TTTTms is a common format for X and Y as heap use. I agree posting X == Y would be only marginally better. So, maybe this goes as middle ground:

Concurrent uncommit XXXXM (YYYYM) z.zzzms

...where XXXX is the amount uncommitted, YYYY is the final heap capacity?
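
As a rough illustration of this format, the sketch below builds such a line with snprintf. The helper format_uncommit_line is hypothetical; the actual change would go through HotSpot's logging macros rather than a standalone function like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper that renders "Concurrent uncommit XXXXM (YYYYM) z.zzzms",
// where XXXX is the amount uncommitted and YYYY is the final heap capacity.
std::string format_uncommit_line(std::size_t uncommitted_bytes,
                                 std::size_t capacity_bytes,
                                 double elapsed_ms) {
  assert(elapsed_ms >= 0.0);
  const std::size_t M = static_cast<std::size_t>(1024) * 1024;
  char buf[128];
  std::snprintf(buf, sizeof(buf), "Concurrent uncommit %zuM (%zuM) %.3fms",
                uncommitted_bytes / M, capacity_bytes / M, elapsed_ms);
  return std::string(buf);
}
```

Applied to the first excerpt earlier in this thread (5424M committed before, 5120M after, 15.988ms), the amount uncommitted would be 304M and the final capacity 5120M.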

Contributor Author

Okay, this looks good.

Contributor

@kdnilsen kdnilsen left a comment

I'm ok with this, but best to wait for @shipilev approval before integrating.

Contributor

@kdnilsen kdnilsen left a comment

Have read through the latest version of the code. Thanks.

Member

@ysramakrishna ysramakrishna left a comment

Looks good to me. A few documentation comment requests.

Also please share performance data in this PR or in the ticket, especially from the perf/benchmark that may have precipitated this change.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 19, 2024
Member

@shipilev shipilev left a comment

I like it, thanks!

Member

@shipilev shipilev left a comment

I'll approve again, with the following nits:

if (count > 0) {
_heap->notify_heap_changed();
double elapsed = os::elapsedTime() - start;
log_info(gc)("Uncommitted " SIZE_FORMAT " regions, in %.3fs", count, elapsed);
Member

Yeah, at least restore the log format and add gc+start log as well.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 19, 2024
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 19, 2024
Member

@shipilev shipilev left a comment

Almost there, modulo restoring the logging.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 20, 2024
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Nov 20, 2024
Member

@shipilev shipilev left a comment

I think the log message is still a bit confusing...

Member

@shipilev shipilev left a comment

Looks good!

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 26, 2024
@earthling-amzn
Contributor Author

@ysramakrishna - I ran several iterations of specjbb2015 with different polling intervals. The results show that 1/10th of ShenandoahUncommitDelay is reasonable, and that it avoids unintentional commit delays when the polling interval is equal to or greater than ShenandoahUncommitDelay.

Configuration    Category      | Count |     Total |   GeoMean |   Average |  Trim 0.1 |  StdDev |   Minimum |   Maximum
openjdk:master   critical_jops |     5 | 50862.000 | 10162.095 | 10172.400 | 10172.400 | 513.605 |  9630.000 | 10882.000
30ms polling     critical_jops |     5 | 48035.000 |  9582.113 |  9607.000 |  9607.000 | 778.036 |  8808.000 | 10692.000
30s polling      critical_jops |     5 | 56398.000 | 11272.026 | 11279.600 | 11279.600 | 460.355 | 10627.000 | 11842.000
no polling       critical_jops |     5 | 55917.000 | 11176.046 | 11183.400 | 11183.400 | 460.960 | 10899.000 | 11995.000
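
The 1/10th rule can be sketched as follows. Both helpers are hypothetical and exist only to show the arithmetic; they assume the delay is expressed in milliseconds, and the floor of 1 ms is an added assumption so that a tiny delay never produces a zero interval.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: derive the polling interval from the uncommit delay.
// Both values are in milliseconds.
std::uint64_t poll_interval_ms(std::uint64_t uncommit_delay_ms) {
  // One tenth of the delay, but never zero so the thread cannot busy-spin.
  std::uint64_t interval = std::max<std::uint64_t>(1, uncommit_delay_ms / 10);
  assert(interval >= 1);
  return interval;
}

// Hypothetical helper: a region that becomes eligible just after a poll is
// only noticed one interval later, so this bounds how long a region can stay
// empty before the polling thread considers it for uncommit.
std::uint64_t worst_case_empty_time_ms(std::uint64_t uncommit_delay_ms) {
  return uncommit_delay_ms + poll_interval_ms(uncommit_delay_ms);
}
```

Taking a delay of 300000 ms purely as an example input, the interval comes out at 30000 ms, which corresponds to the "30s polling" row in the table. If the interval were instead equal to or greater than the delay, the worst case would be at least twice the configured delay.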

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 3, 2024
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 4, 2024
@earthling-amzn
Contributor Author

/integrate

@openjdk

openjdk bot commented Dec 5, 2024

Going to push as commit bedb68a.
Since your change was applied there have been 38 commits pushed to the master branch:

  • dbf48a5: 8344665: Refactor PartialArrayState allocation for reuse
  • 5cc150c: 8342979: Start of release updates for JDK 25
  • 85fedbf: 8344607: Link Time Optimization - basic support for clang
  • 5a0899f: 8345302: Building microbenchmarks require larger Java heap
  • 1ece4f9: 8345514: Should use internal class name when calling ClassLoader.getResourceAsByteArray
  • ef8da28: 8345591: [aarch64] macroAssembler_aarch64.cpp compile fails ceil_log2 not declared
  • 7513b13: 8328944: NMT reports "unknown" memory
  • 691e692: 8345565: Remove remaining SecurityManager motivated APIs from sun.reflect.util
  • 97b8a09: 8345339: JFR: Missing javadoc for RecordingStream::onMetadata
  • 456c71d: 8343699: [aarch64] Bug in MacroAssembler::klass_decode_mode()
  • ... and 28 more: https://git.openjdk.org/jdk/compare/1a73c76d83d34d10519c9d10fb0e51d098907ab0...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 5, 2024
@openjdk openjdk bot closed this Dec 5, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 5, 2024
@openjdk

openjdk bot commented Dec 5, 2024

@earthling-amzn Pushed as commit bedb68a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org
4 participants