8228609: G1 copy cost prediction uses used vs. actual copied bytes #927

linade · 2022-03-21T07:45:31Z

I would like to backport 8228609 which fixes a prediction error regarding cost per bytes copied.

The patch does not apply cleanly, so this backport is basically changing the calculation of copied bytes.

https://bugs.openjdk.java.net/browse/JDK-8227442 is not in 11u. So this patch also uses G1ParScanThreadState::surviving_young_words() in place of _surviving_young_words to adjust for the correct cset index.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8228609: G1 copy cost prediction uses used vs. actual copied bytes

Reviewers

Martin Doerr (@TheRealMDoerr - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk11u-dev pull/927/head:pull/927
$ git checkout pull/927

Update a local copy of the PR:
$ git checkout pull/927
$ git pull https://git.openjdk.java.net/jdk11u-dev pull/927/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 927

View PR using the GUI difftool:
$ git pr show -t 927

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk11u-dev/pull/927.diff

bridgekeeper · 2022-03-21T07:47:05Z

👋 Welcome back linade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

mlbridge · 2022-03-21T07:51:43Z

Webrevs

TheRealMDoerr · 2022-03-21T14:54:49Z

This PR is not recognized as backport. Please fix that (e.g. by changing the title to "Backport hash of original commit", https://wiki.openjdk.java.net/display/SKARA/Backports#Backports-BackportPullRequests).

openjdk · 2022-03-22T03:51:28Z

This backport pull request has now been updated with issue and summary from the original commit.

GoeLin · 2022-03-24T16:30:47Z

@linade
Two more things, please:

Enable Pre-submit tests
Only flag the change with jdk11u-fix-request once it is ready. I'll remove the tag for now, pleas add it again if the review is available.

linade · 2022-03-28T05:51:51Z

@GoeLin I enabled the tests. Some of them seem to be blocked by a gcc dependency issue. Would you take a look?

GoeLin · 2022-04-07T05:54:26Z

Hi, you should only label a change jdk11u-fix-request once it is ready for integration. I'll remove the label, please add it again if you have a review.

linade · 2022-04-07T09:41:01Z

Gentle ping. Can anyone review this?

The reason I believe it's worth a backport is, apart from what's mentioned in https://bugs.openjdk.java.net/browse/JDK-8228609, it correctly accounts for early reclaimed humongous objects. I think originally the oversight of the size of early reclaimed humongous objects cause the calculation to be erroneous, resulting in a very high copy time prediction and unstable gc frequency.

TheRealMDoerr

This change is very different from the original one, but fixing it in 11u sounds feasible to me. We'll test it.
Do you have a specific test case which allows observing unstable gc frequency?

TheRealMDoerr · 2022-04-07T14:53:01Z

src/hotspot/share/gc/g1/g1Policy.cpp

@@ -667,10 +668,9 @@ void G1Policy::record_collection_pause_end(double pause_time_ms, size_t cards_sc

    size_t freed_bytes = heap_used_bytes_before_gc - cur_used_bytes;


freed_bytes is now unused.

Removed. Thanks.

linade · 2022-04-08T02:26:50Z

This change is very different from the original one, but fixing it in 11u sounds feasible to me. We'll test it. Do you have a specific test case which allows observing unstable gc frequency?

We observed this in an application. With additional logging, we can observe the miscalculation from, e.g., test/hotspot/jtreg/gc/g1/TestEagerReclaimHumongousRegions.java

The current way to get copied_bytes will result in negative value. I had

..
[0.708s][error][gc] GC(186) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.714s][error][gc] GC(188) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.720s][error][gc] GC(190) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.726s][error][gc] GC(192) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.732s][error][gc] GC(194) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.738s][error][gc] GC(196) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.744s][error][gc] GC(198) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0

copied_bytes is negative because it is deduced from size_t copied_bytes = _collection_set->bytes_used_before() - freed_bytes;, and _collection_set->bytes_used_before() didn't consider eager-reclaimed humongous, so it appears to be much smaller than freed_bytes.

There currently is an if condition if (_collection_set->bytes_used_before() > freed_bytes) to prevent we from getting a negative value. So the current code simply doesn't report the data to the analytics.

Not reporting is fine. At worst we lose some data points. But it can also happen that we get a positive, near-zero copied_bytes. It will make the value double cost_per_byte_ms = average_copy_time / (double) copied_bytes; very big. This causes young generation to shrink.

TheRealMDoerr · 2022-04-08T16:12:28Z

/reviewers 2

openjdk · 2022-04-08T16:13:27Z

@TheRealMDoerr
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with 1 of role reviewers, 1 of role authors).

TheRealMDoerr

Looks technically good and the fix makes sense to me. Our nightly tests haven't found any issues related to it. It extracts the essential fix from the original change which differs a lot: openjdk/jdk@9a4c257
An additional person with GC experience should review it.
Approvers will have to judge how relevant it is. A comment from the original author would be helpful.

bridgekeeper · 2022-05-06T22:50:25Z

@linade This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

rkennke · 2022-06-01T23:11:57Z

I hink it makes sense. It is not exactly a backport, but rather a re-write, and I find it difficult to follow and convince myself that it's doing the correct thing. Therefore it seems somewhat risky.

rkennke · 2022-06-01T23:13:21Z

src/hotspot/share/gc/g1/g1ParScanThreadState.cpp

  uint length = _g1h->collection_set()->young_region_length();
  for (uint region_index = 0; region_index < length; region_index++) {
-    surviving_young_words[region_index] += _surviving_young_words[region_index];
+    surviving_young_words[region_index] += G1ParScanThreadState::surviving_young_words()[region_index];


Was that code using the wrong index before? IOW, it used the surviving_young_words of the wrong regions?

Yes, that's what I think.

So this pr is achieving two things (sorry if it were not clear):

fix the index issue: this causes calculation of the total surviving young words to be incorrect. The correct stats are stored in _surviving_young_words[1, .., n] instead of [0, .., n-1]

use a precise "copied bytes" instead of inferring from (used() - freed())

bridgekeeper · 2022-06-30T01:29:00Z

@linade This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper · 2022-08-05T06:31:13Z

@linade This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

openjdk bot added the rfr Pull request is ready for review label Mar 21, 2022

linade force-pushed the backport-8228609 branch from 4574137 to 89870b8 Compare March 22, 2022 03:49

linade changed the title ~~8228609: G1 copy cost prediction uses used vs. actual copied bytes~~ Backport 9a4c25731eef42d18ba0c20ff3eb42f32ae90c00 Mar 22, 2022

openjdk bot changed the title ~~Backport 9a4c25731eef42d18ba0c20ff3eb42f32ae90c00~~ 8228609: G1 copy cost prediction uses used vs. actual copied bytes Mar 22, 2022

openjdk bot added the backport label Mar 22, 2022

linade force-pushed the backport-8228609 branch from 89870b8 to be56589 Compare March 22, 2022 03:54

linade force-pushed the backport-8228609 branch from be56589 to c30b074 Compare March 28, 2022 03:03

8228609: G1 copy cost prediction uses used vs. actual copied bytes

3b097e8

linade force-pushed the backport-8228609 branch from c30b074 to 3b097e8 Compare April 1, 2022 02:03

trigger GitHub actions

b3d5f41

TheRealMDoerr reviewed Apr 7, 2022

View reviewed changes

remove unused variables and their dependency

e552fdb

TheRealMDoerr approved these changes Apr 8, 2022

View reviewed changes

linade mentioned this pull request May 5, 2022

[Bug] G1 copy cost prediction uses used vs. actual copied bytes dragonwell-project/dragonwell11#226

Closed

rkennke reviewed Jun 1, 2022

View reviewed changes

bridgekeeper bot closed this Aug 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

8228609: G1 copy cost prediction uses used vs. actual copied bytes #927

8228609: G1 copy cost prediction uses used vs. actual copied bytes #927

linade commented Mar 21, 2022 •

edited by openjdk bot

bridgekeeper bot commented Mar 21, 2022

mlbridge bot commented Mar 21, 2022 •

edited

TheRealMDoerr commented Mar 21, 2022 •

edited

openjdk bot commented Mar 22, 2022

GoeLin commented Mar 24, 2022

linade commented Mar 28, 2022

GoeLin commented Apr 7, 2022

linade commented Apr 7, 2022

TheRealMDoerr left a comment

TheRealMDoerr Apr 7, 2022

linade Apr 8, 2022

linade commented Apr 8, 2022

TheRealMDoerr commented Apr 8, 2022

openjdk bot commented Apr 8, 2022

TheRealMDoerr left a comment •

edited

bridgekeeper bot commented May 6, 2022

rkennke commented Jun 1, 2022

rkennke Jun 1, 2022

linade Jul 8, 2022

bridgekeeper bot commented Jun 30, 2022

bridgekeeper bot commented Aug 5, 2022

		@@ -667,10 +668,9 @@ void G1Policy::record_collection_pause_end(double pause_time_ms, size_t cards_sc

		size_t freed_bytes = heap_used_bytes_before_gc - cur_used_bytes;

8228609: G1 copy cost prediction uses used vs. actual copied bytes #927

8228609: G1 copy cost prediction uses used vs. actual copied bytes #927

Conversation

linade commented Mar 21, 2022 • edited by openjdk bot

Progress

Issue

Reviewers

Reviewing

bridgekeeper bot commented Mar 21, 2022

mlbridge bot commented Mar 21, 2022 • edited

Webrevs

TheRealMDoerr commented Mar 21, 2022 • edited

openjdk bot commented Mar 22, 2022

GoeLin commented Mar 24, 2022

linade commented Mar 28, 2022

GoeLin commented Apr 7, 2022

linade commented Apr 7, 2022

TheRealMDoerr left a comment

Choose a reason for hiding this comment

TheRealMDoerr Apr 7, 2022

Choose a reason for hiding this comment

linade Apr 8, 2022

Choose a reason for hiding this comment

linade commented Apr 8, 2022

TheRealMDoerr commented Apr 8, 2022

openjdk bot commented Apr 8, 2022

TheRealMDoerr left a comment • edited

Choose a reason for hiding this comment

bridgekeeper bot commented May 6, 2022

rkennke commented Jun 1, 2022

rkennke Jun 1, 2022

Choose a reason for hiding this comment

linade Jul 8, 2022

Choose a reason for hiding this comment

bridgekeeper bot commented Jun 30, 2022

bridgekeeper bot commented Aug 5, 2022

linade commented Mar 21, 2022 •

edited by openjdk bot

mlbridge bot commented Mar 21, 2022 •

edited

TheRealMDoerr commented Mar 21, 2022 •

edited

TheRealMDoerr left a comment •

edited