8353115: GenShen: mixed evacuation candidate regions need accurate live_data #24319

kdnilsen · 2025-03-31T03:17:51Z

The existing implementation of get_live_data_bytes() and get_live_data_words() does not always behave as might be expected. In particular, the value returned ignores any allocations that occur subsequent to the most recent mark effort that identified live data within the region. This is typically ok for young regions, where the amount of live data determines whether a region should be added to the collection set during the final-mark safepoint.

However, old-gen regions that are placed into the set of candidates for mixed evacuation are more complicated. In particular, by the time the old-gen region is added to a mixed evacuation, its live data may be much larger than at the time concurrent old marking ended.

This PR provides comments to clarify the shortcomings of the existing functions, and adds new functions that provide more accurate accountings of live data for mixed-evacuation candidate regions.
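The distinction described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `ToyRegion` and both functions are simplified stand-ins, not the HotSpot classes or the functions this PR adds, and it assumes that words allocated above top-at-mark-start (TAMS) are counted as live because the most recent mark never examined them.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a heap region (not ShenandoahHeapRegion).
struct ToyRegion {
  std::size_t bottom;        // word offset where the region starts
  std::size_t tams;          // top-at-mark-start captured when marking began
  std::size_t top;           // current allocation pointer
  std::size_t marked_words;  // live words found below tams by the most recent mark
};

// Live data as of the most recent mark: ignores anything allocated afterwards.
inline std::size_t marked_live_words(const ToyRegion& r) {
  return r.marked_words;
}

// Live data that also counts words allocated above TAMS. Assumption: those
// words are treated as live because the last mark did not look at them.
inline std::size_t live_words_including_above_tams(const ToyRegion& r) {
  assert(r.bottom <= r.tams && r.tams <= r.top);
  return r.marked_words + (r.top - r.tams);
}
```

For a region whose mark found 40 live words below a TAMS of 100 and whose top has since advanced to 160, the first function reports 40 words while the second reports 100. The second figure is the kind of value a mixed-evacuation candidate needs under the stated assumption.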

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8353115: GenShen: mixed evacuation candidate regions need accurate live_data (Task - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24319/head:pull/24319
$ git checkout pull/24319

Update a local copy of the PR:
$ git checkout pull/24319
$ git pull https://git.openjdk.org/jdk.git pull/24319/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24319

View PR using the GUI difftool:
$ git pr show -t 24319

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24319.diff

Using Webrev

Link to Webrev Comment


kdnilsen · 2025-10-17T17:17:52Z

I have placed instrumentation into the code to confirm that the live_data reported by ShenandoahHeapRegion is the same before and after this PR for traditional Shenandoah mode. So the regressions are either "signal noise", or perhaps inefficiencies introduced regarding how we compute the live_data.

In performance-critical loops that implement freeset rebuild and choose collection set, it is common to call ShenandoahHeapRegion::get_live_data_words() and functions that depend on its implementation, including get_live_data_bytes(), garbage(), has_live(), and ShenandoahCollectionSet::add_region(). The refactored implementation fetches the ShenandoahMarkingContext in the loop prologue, and remembers the region index as an induction variable within the loop. This saves multiple indirections in the implementations of these methods. With these changes, performance of traditional Shenandoah on specjbb2015 is restored to on par with, or slightly better than, what it was before the change to fix-live-data-for-mixed-evacuation-candidates. (The implementation of fix-live-data-for-mixed-evacuation-candidates makes it slightly more expensive to calculate live-data-words. This resulted in a performance regression on specjbb2015 before this commit.)

openjdk · 2025-10-24T14:18:21Z

@kdnilsen this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout fix-live-data-for-mixed-evac-candidates
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

kdnilsen · 2025-10-27T19:53:08Z

After refactoring the code to perform better in the "tight" rebuild-free-set and build-collection-set loops, the Shenandoah results show a very slight improvement (rather than a regression) on specjbb2015. Here is the performance regression that we saw before commit ecdec63:

Here are comparisons (in a slightly different environment) after that same commit:


earthling-amzn · 2025-11-10T21:54:14Z

src/hotspot/share/gc/shenandoah/shenandoahHeapRegion.inline.hpp

-inline size_t ShenandoahHeapRegion::get_live_data_words() const {
-  ShenandoahMarkingContext *ctx = ShenandoahHeap::heap()->marking_context();
-  HeapWord* tams = ctx->top_at_mark_start(this);
+inline size_t ShenandoahHeapRegion::get_live_data_words(ShenandoahMarkingContext* ctx, size_t index) const {


Why do we want to change this signature? If index is always this->_index why go through the trouble to pass a member field to a member function on the same instance? I have a similar sentiment about passing ShenandoahMarkingContext through the function. Should we have a member ShenandoahMarkingContext* _marking_context? Changing this signature creates a lot of noise on the PR and it's not clear to me why we would do this.

kdnilsen · 2025-11-11

Good call out. I am willing to back this change out. The motivation for this change is that we were seeing some performance regression in this PR (especially noticeable on traditional Shenandoah). At first, I thought this was due to miscomputation of get_live_data_words(), but I confirmed through further testing that the results from get_live_data_words() were the same before and after this PR.

So I concluded that the "explanation" for performance regression is that it now takes longer for us to compute get_live_data_words(). The original implementation was:

return AtomicAccess::load(&_live_data);

The new implementation added:

Find the marking context by fetching this from ShenandoahHeap::heap()
Find tams by consulting the marking context with region, which has to indirect through region to find index

I found that passing this information into the function rather than having the function recompute it brought us back to par with performance of master.

Functions are declared in-line, and it is conceivable that the compiler would figure this crude optimization out for itself, but it didn't.
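As a rough illustration of the trade-off discussed in this reply, the sketch below contrasts a per-call lookup with a variant where the caller passes the context and index in. All names (`ToyHeap`, `ToyContext`, `ToyRegion`, `total_live_hoisted`) are hypothetical stand-ins rather than the HotSpot types, and the live-data formula is an assumption made only to give the functions something to compute.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a marking context: TAMS per region index.
struct ToyContext {
  std::vector<std::size_t> tams;
};

// Hypothetical stand-in for the heap singleton that owns the context.
struct ToyHeap {
  ToyContext ctx;
  static ToyHeap& heap() { static ToyHeap h; return h; }
};

struct ToyRegion {
  std::size_t index;         // position of this region in the region table
  std::size_t top;           // current allocation pointer
  std::size_t marked_words;  // live words found by the most recent mark

  // Variant A: re-derives the context and the index on every call.
  std::size_t live_words() const {
    const ToyContext& ctx = ToyHeap::heap().ctx;
    return marked_words + (top - ctx.tams[index]);
  }

  // Variant B: the caller supplies what it already has at hand.
  std::size_t live_words(const ToyContext& ctx, std::size_t idx) const {
    assert(idx == index);
    return marked_words + (top - ctx.tams[idx]);
  }
};

// Tight loop in the style described above: the context is fetched once in
// the prologue and the loop counter doubles as the region index.
inline std::size_t total_live_hoisted(const std::vector<ToyRegion>& regions) {
  const ToyContext& ctx = ToyHeap::heap().ctx;
  std::size_t total = 0;
  for (std::size_t i = 0; i < regions.size(); i++) {
    total += regions[i].live_words(ctx, i);
  }
  return total;
}
```

Both variants return the same value; the only difference is who performs the lookups. Whether the second form is actually faster depends on the compiler and the surrounding code, so the sketch shows the shape of the change rather than its measured effect.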

ysramakrishna · 2025-11-13T17:06:12Z

What if one used "garbage" as the sorting metric for efficiency (under assumption that I stated earlier of considering only retired, fully allocated regions -- the alternative makes the metric a bit more nuanced), and compute garbage as [regionSize(or used assuming all of region allocated) - markedLive]. This makes the metric invariant after final marking for any region considered in the target evacuation set, and you don't deal with trying to determine the amount allocated above TAMS, keeping the calculations simple and the selection and sorting criteria clean and easy to reason about.
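A minimal sketch of the suggestion above, under the stated assumption that every candidate is a retired, fully allocated region. The `Candidate` type, the `kRegionSizeBytes` constant, and both function names are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region size; real region sizes are chosen by the JVM at startup.
constexpr std::size_t kRegionSizeBytes = 1024;

// Hypothetical candidate record: only the marked-live figure is needed.
struct Candidate {
  std::size_t index;
  std::size_t marked_live_bytes;  // fixed once final marking completes
};

// garbage = regionSize - markedLive, assuming the region is fully allocated.
inline std::size_t garbage_bytes(const Candidate& c) {
  assert(c.marked_live_bytes <= kRegionSizeBytes);
  return kRegionSizeBytes - c.marked_live_bytes;
}

// Orders candidates from most garbage to least; ties keep their input order.
inline void sort_by_garbage_desc(std::vector<Candidate>& cs) {
  std::stable_sort(cs.begin(), cs.end(),
                   [](const Candidate& a, const Candidate& b) {
                     return garbage_bytes(a) > garbage_bytes(b);
                   });
}
```

Because neither input to `garbage_bytes` changes after final marking in this model, the resulting order stays the same no matter when the sort runs, which is the invariance the comment is after.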

I also noticed that choosing selection set etc. takes the heap lock. Why?

I'll leave more specific comments in the code later today.

bridgekeeper · 2025-12-12T05:43:52Z

@kdnilsen This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!


kdnilsen added 30 commits January 12, 2024 01:06

Improve documentation of how Evac-OOM Protocol works

702710e

Merge branch 'openjdk:master' into master

61b575f

Revert "Improve documentation of how Evac-OOM Protocol works"

51d056f

This reverts commit 702710e.

Merge branch 'openjdk:master' into master

ba98e42

Merge branch 'openjdk:master' into master

441487c

Merge branch 'openjdk:master' into master

dafc363

Merge branch 'openjdk:master' into master

c4c252e

Merge branch 'openjdk:master' into master

41ba86a

Merge branch 'openjdk:master' into master

f215a70

Merge branch 'openjdk:master' into master

4d6b5cd

Merge branch 'openjdk:master' into master

7fe605f

Merge branch 'openjdk:master' into master

2e224f6

Merge branch 'openjdk:master' into master

46ad5c6

Merge branch 'openjdk:master' into master

9a1989d

Merge branch 'openjdk:master' into master

4126c22

Merge branch 'openjdk:master' into master

981692e

Make GC logging less verbose

3a67b1f

Revert "Make GC logging less verbose"

3692312

This reverts commit 3a67b1f.

Merge branch 'openjdk:master' into master

045590b

Merge branch 'openjdk:master' into master

fbbd88c

Merge branch 'openjdk:master' into master

7e0edf0

Merge branch 'openjdk:master' into master

3525369

Merge branch 'openjdk:master' into master

fe0da51

Merge branch 'openjdk:master' into master

db12fe5

Merge branch 'openjdk:master' into master

0440bae

Merge branch 'openjdk:master' into master

3bdc022

Merge branch 'openjdk:master' into master

1ee2ff1

Merge branch 'openjdk:master' into master

e6e772f

Merge branch 'openjdk:master' into master

c5a159e

Merge branch 'openjdk:master' into master

e7ca4f8

kdnilsen added 7 commits October 19, 2025 21:28

Remove debug scaffolding

d62f1fa

add an assert to detect suspected bug

02137b8

fix two indexing bugs

7552690

Add debug instrumentation to CompressedClassSpaceSizeInJmapHeap.java

8bed4d4

fix errors in CompressedClassSpaceSizeInJmapHeap.java

c908783

rework CompressedClassSpaceSizeinJmapHeap.java

a19bb87

openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Oct 24, 2025

kdnilsen added 2 commits October 27, 2025 20:58

Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixe…

1df268c

…d-evac-candidates

fix error in merge conflict resolution

ccba941

openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Oct 27, 2025

kdnilsen added 2 commits November 6, 2025 19:41

Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixe…

af9bbe1

…d-evac-candidates The resulting fastdebug build has 64 failures. I need to debug these. Probably introduced by improper resolution of merge conflicts

Fix mistaken merge resolution

16cd6f8

kdnilsen marked this pull request as ready for review November 10, 2025 14:35

openjdk bot added the rfr Pull request is ready for review label Nov 10, 2025

earthling-amzn suggested changes Nov 10, 2025


openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Dec 4, 2025

Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixe…

2e4c463

…d-evac-candidates

openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Dec 25, 2025

Finish merge

7b9c4d6

kdnilsen mentioned this pull request Jan 5, 2026

DRAFT: Adaptive evac with surge #28955

Draft

3 tasks

touch file to force retest

6480fef

Merge remote-tracking branch 'jdk/master' into fix-live-data-for-mixe…

6d10ae5

…d-evac-candidates
