Skip to content

Conversation

@kdnilsen
Copy link
Contributor

@kdnilsen kdnilsen commented Jan 24, 2025

At the end of a degenerated GC, we check whether sufficient progress has been made in replenishing the memory available to the mutator. The test for good progress is implemented as a ratio of free memory against the total heap size.

For generational Shenandoah, the ratio should be computed against the size of the young generation. Note that the size of the generational collection set is based on young generation size rather than total heap size.

This issue first identified in GenShen GC logs, where a large number of degenerated cycles were upgrading to full GC because the free-set progress was short of desired by 10-25%.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8348595: GenShen: Fix generational free-memory no-progress check (Task - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23306/head:pull/23306
$ git checkout pull/23306

Update a local copy of the PR:
$ git checkout pull/23306
$ git pull https://git.openjdk.org/jdk.git pull/23306/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23306

View PR using the GUI difftool:
$ git pr show -t 23306

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23306.diff

Using Webrev

Link to Webrev Comment

Previously, we were using size of heap to asses progress of generational
degenerated cycle.  But that is not appropriate, because the collection
set is chosen based on the size of young generation.
As previously implemented, we used the heap size to measure goodness of
progress.  However, heap size is only appropriate for non-generational
Shenandoah.  Freeset abstraction works for both.
@bridgekeeper
Copy link

bridgekeeper bot commented Jan 24, 2025

👋 Welcome back kdnilsen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 24, 2025

@kdnilsen This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8348595: GenShen: Fix generational free-memory no-progress check

Reviewed-by: phh, xpeng

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 21 new commits pushed to the master branch:

  • 742e735: 8349858: Print compilation task before blocking compiler thread for shutdown
  • db42a48: 8350011: Convert jpackage test lib tests in JUnit format
  • 19c0ce4: 8349751: AIX build failure after upgrade pipewire to 1.3.81
  • fa1bd23: 8343802: Prevent NULL usage backsliding
  • 57f4c30: 8347917: AArch64: Enable upper GPR registers in C1
  • ff52859: 8285624: jpackage fails to create exe, msi when Windows OS is in FIPS mode
  • 3741c98: 8349883: Locale.LanguageRange.parse("-") throws ArrayIndexOutOfBoundsException
  • 3e7acfa: 8349873: StackOverflowError after JDK-8342550 if -Duser.timezone= is set to a deprecated zone id
  • d8fcd43: 8349927: Waiting for compiler termination delays shutdown for 10+ ms
  • a88e2a5: 8349977: JVMCIRuntime::_shared_library_javavm_id should be jlong
  • ... and 11 more: https://git.openjdk.org/jdk/compare/a637ccf2fead25ea6a06ad6bd65e92b8694ee11c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@phohensee) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 24, 2025
@openjdk
Copy link

openjdk bot commented Jan 24, 2025

@kdnilsen The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Jan 24, 2025
@kdnilsen
Copy link
Contributor Author

netflix-calibrate-better.xlsx

The attached spreadsheet provides a summary of performance benefits of this patch. In the spreadsheet:
Control represents tip without changes
Better Penalty represents #23305
Better Progress represents #23306 (this PR)
Better Both represents the combined benefits of Better Penalty and Better Progress.

Compared to Control, "Better Both" results are better across all measures:

image

@mlbridge
Copy link

mlbridge bot commented Jan 24, 2025

Webrevs

Comment on lines 49 to 52
ShenandoahFreeSet* free_set = _heap->free_set();
size_t free_actual = free_set->available();
// The sum of free_set->capacity() and ->reserved represents capacity of young in generational, heap in non-generational.
size_t free_expected = ((free_set->capacity() + free_set->reserved()) / 100) * ShenandoahCriticalFreeThreshold;
Copy link

@pengxiaolong pengxiaolong Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may pass ShenandoahGeneration as parameter to is_good_progress to simplify the calculation of free_expected, it should be like:
generation->max_capacity() / 100 * ShenandoahCriticalFreeThreshold
Good part is, free_expected might be more accurate in Full GC/Degen for global cycle, e.g. Full GC collects memory for global, free_expected should be calculated using the metrics from global generation. But either way, free_expected is not clearly defined in generational mode now, current code also works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this suggestion. I've made change. It turns out there was actually a bug in the original implementation, so I am retesting the performance results.

Copy link

@pengxiaolong pengxiaolong Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, honest I didn't understand that why (free_set->capacity() + free_set->reserved() represents capacity of young in generational, is it the bug you found? free_set->capacity() is the capacity of all mutator regions which also excludes the regions doesn't have capacity for new object alloc(it is calculated when rebuild free set)

I thought a bit more, it makes more sense to calculate free_expected in snap_before, max_capacity of generations may change after collection, the free_expected should be calculated before the cycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting thoughts. So young-generation size will change under these circumstances:

  1. There's a lot of young-gen memory to be promoted, or we choose to promote some young-gen regions in place (by relabeling the regions as OLD without evacuating their data). In both of these cases, we may shrink young in order to expand old.
  2. The GC cycle is mixed, so it has the side effect of reclaiming some old regions. These reclaimed old regions will typically be granted back to young, until such time as we need to expand old in order to hold results of promotion.

While it makes sense for expected to be computed based on "original size" of young generation, the question of how much free remaining in young represents "good progress" should probably be based on the current size of young. Ultimately, we are trying to figure out if there's enough memory in young to make it worthwhile to attempt another concurrent GC cycle.

I realize this thinking is a bit "fuzzy". The heuristic was originally designed for non-generational use.

I'm inclined to keep as is currently implemented, but should probably add a comment to explain why. What do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, I agree with it is bit "fuzzy".
I'm not sure we should consider following case:

Degen cycle doesn't reclaim any memory, but promoted some young regions resulting in young capacity to shrink, in this case we may treat it as "good progress" but actually it is not.

A "good progress" could be free_actual_after > free_actual_before && free_actual_after > free_expected, what do you think? I am not sure all cases triggering degen cycle, this might be a false case that never happens.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we manage to pass the test "free_actual_after > free_expected" following the degen, even if young has shrunk, I think it is reasonable to pursue concurrent GC. Passing this exact test at the end of the next GC (assuming no further adjustments to generation sizes) would qualify us to continue with concurrent GC on the next cycle.

In general, it is very rare that "full gc" is the right thing to do. we're in the process of deprecating it entirely.

I will add a comment to clarify the thinking here.

ShenandoahFreeSet* free_set = _heap->free_set();
size_t free_actual = free_set->available();
// The sum of free_set->capacity() and ->reserved represents capacity of young in generational, heap in non-generational.
size_t free_expected = ((free_set->capacity() + free_set->reserved()) / 100) * ShenandoahCriticalFreeThreshold;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an outsider, the units involved and what exactly is being calculated is pretty opaque. Why would we divide by 100 to compute free_expected and not do the same for free_actual? Do we care about integer division truncation? The default value of ShenandoahCriticalFreeThreshold is 1, so multiplying by it is a nop by default, which seems strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShenandoahCriticalFreeThreshold represents a percentage of the "total size". To calculate N% of the young generation size, we divide the generation size by 100 and then multiply by ShenandoahCriticalFreeThreshold. This code is a bit different in the most recent revision. Do you think it needs a comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment here. Thanks.

In testing suggested refinements, I discovered a bug in original
implementation.  ShenandoahFreeSet::capacity() does not represent the
size of young generation.  It represents the total size of the young
regions that had available memory at the time we most recently rebuilt
the ShenandoahFreeSet.

I am rerunning the performance tests following this suggested change.
@kdnilsen
Copy link
Contributor Author

kdnilsen commented Feb 10, 2025

These are updated performance results after making the change that uses generation size to determine expected. This change computes a larger expected size, increasing the likelihood that a particular degenerated cycle will be considered "bad progress":

Screenshot 2025-02-10 at 3 38 18 PM

This represents overall improvement compared to previously reported number. It would appear that the difference in performance might be the result of "random noise".

@kdnilsen
Copy link
Contributor Author

These are the results of combining both proposed PRs into a single execution test:

Screenshot 2025-02-10 at 3 41 31 PM

This result is not as good as what was reported above. In my judgment, it still represents improvement over tip. The difference between the two runs may also be signal noise as there is no clear correlation between number of Full GCs and percentile latencies. The two full GCs reported in the "better both (redo)" run both result from alloc failure during evacuation.

Copy link

@pengxiaolong pengxiaolong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank for the comprehensive tests and explanations, my approve doesn't count though:)

Copy link
Member

@phohensee phohensee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 11, 2025
kdnilsen and others added 3 commits February 11, 2025 18:10
… <= _highest_valid_narrow_klass_id) failed: narrowKlass ID out of range (3131947710)

Reviewed-by: shade
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Feb 14, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 14, 2025
@kdnilsen
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Feb 14, 2025
@openjdk
Copy link

openjdk bot commented Feb 14, 2025

@kdnilsen
Your change (at version 0e86c5b) is now ready to be sponsored by a Committer.

@phohensee
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Feb 14, 2025

Going to push as commit ba6c965.
Since your change was applied there have been 25 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 14, 2025
@openjdk openjdk bot closed this Feb 14, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Feb 14, 2025
@openjdk
Copy link

openjdk bot commented Feb 14, 2025

@phohensee @kdnilsen Pushed as commit ba6c965.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org

Development

Successfully merging this pull request may close these issues.

3 participants