Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8228609: G1 copy cost prediction uses used vs. actual copied bytes #927

Closed
wants to merge 3 commits into from

Conversation

linade
Copy link

@linade linade commented Mar 21, 2022

I would like to backport 8228609 which fixes a prediction error regarding cost per bytes copied.

The patch does not apply cleanly, so this backport is basically changing the calculation of copied bytes.

https://bugs.openjdk.java.net/browse/JDK-8227442 is not in 11u. So this patch also uses G1ParScanThreadState::surviving_young_words() in place of _surviving_young_words to adjust for the correct cset index.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8228609: G1 copy cost prediction uses used vs. actual copied bytes

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk11u-dev pull/927/head:pull/927
$ git checkout pull/927

Update a local copy of the PR:
$ git checkout pull/927
$ git pull https://git.openjdk.java.net/jdk11u-dev pull/927/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 927

View PR using the GUI difftool:
$ git pr show -t 927

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk11u-dev/pull/927.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 21, 2022

👋 Welcome back linade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 21, 2022
@mlbridge
Copy link

mlbridge bot commented Mar 21, 2022

Webrevs

@TheRealMDoerr
Copy link
Contributor

TheRealMDoerr commented Mar 21, 2022

This PR is not recognized as backport. Please fix that (e.g. by changing the title to "Backport hash of original commit", https://wiki.openjdk.java.net/display/SKARA/Backports#Backports-BackportPullRequests).

@linade linade changed the title 8228609: G1 copy cost prediction uses used vs. actual copied bytes Backport 9a4c25731eef42d18ba0c20ff3eb42f32ae90c00 Mar 22, 2022
@openjdk openjdk bot changed the title Backport 9a4c25731eef42d18ba0c20ff3eb42f32ae90c00 8228609: G1 copy cost prediction uses used vs. actual copied bytes Mar 22, 2022
@openjdk
Copy link

openjdk bot commented Mar 22, 2022

This backport pull request has now been updated with issue and summary from the original commit.

@GoeLin
Copy link
Member

GoeLin commented Mar 24, 2022

@linade
Two more things, please:

  • Enable Pre-submit tests
  • Only flag the change with jdk11u-fix-request once it is ready. I'll remove the tag for now, pleas add it again if the review is available.

@linade
Copy link
Author

linade commented Mar 28, 2022

@GoeLin I enabled the tests. Some of them seem to be blocked by a gcc dependency issue. Would you take a look?

@GoeLin
Copy link
Member

GoeLin commented Apr 7, 2022

Hi, you should only label a change jdk11u-fix-request once it is ready for integration. I'll remove the label, please add it again if you have a review.

@linade
Copy link
Author

linade commented Apr 7, 2022

Gentle ping. Can anyone review this?

The reason I believe it's worth a backport is, apart from what's mentioned in https://bugs.openjdk.java.net/browse/JDK-8228609, it correctly accounts for early reclaimed humongous objects. I think originally the oversight of the size of early reclaimed humongous objects cause the calculation to be erroneous, resulting in a very high copy time prediction and unstable gc frequency.

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is very different from the original one, but fixing it in 11u sounds feasible to me. We'll test it.
Do you have a specific test case which allows observing unstable gc frequency?

@@ -667,10 +668,9 @@ void G1Policy::record_collection_pause_end(double pause_time_ms, size_t cards_sc

size_t freed_bytes = heap_used_bytes_before_gc - cur_used_bytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

freed_bytes is now unused.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Thanks.

@linade
Copy link
Author

linade commented Apr 8, 2022

This change is very different from the original one, but fixing it in 11u sounds feasible to me. We'll test it. Do you have a specific test case which allows observing unstable gc frequency?

We observed this in an application. With additional logging, we can observe the miscalculation from, e.g., test/hotspot/jtreg/gc/g1/TestEagerReclaimHumongousRegions.java

The current way to get copied_bytes will result in negative value. I had

..
[0.708s][error][gc] GC(186) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.714s][error][gc] GC(188) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.720s][error][gc] GC(190) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.726s][error][gc] GC(192) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.732s][error][gc] GC(194) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.738s][error][gc] GC(196) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.744s][error][gc] GC(198) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0

copied_bytes is negative because it is deduced from size_t copied_bytes = _collection_set->bytes_used_before() - freed_bytes;, and _collection_set->bytes_used_before() didn't consider eager-reclaimed humongous, so it appears to be much smaller than freed_bytes.

There currently is an if condition if (_collection_set->bytes_used_before() > freed_bytes) to prevent we from getting a negative value. So the current code simply doesn't report the data to the analytics.

Not reporting is fine. At worst we lose some data points. But it can also happen that we get a positive, near-zero copied_bytes. It will make the value double cost_per_byte_ms = average_copy_time / (double) copied_bytes; very big. This causes young generation to shrink.

@TheRealMDoerr
Copy link
Contributor

/reviewers 2

@openjdk
Copy link

openjdk bot commented Apr 8, 2022

@TheRealMDoerr
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with 1 of role reviewers, 1 of role authors).

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks technically good and the fix makes sense to me. Our nightly tests haven't found any issues related to it. It extracts the essential fix from the original change which differs a lot: openjdk/jdk@9a4c257
An additional person with GC experience should review it.
Approvers will have to judge how relevant it is. A comment from the original author would be helpful.

@bridgekeeper
Copy link

bridgekeeper bot commented May 6, 2022

@linade This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@rkennke
Copy link
Contributor

rkennke commented Jun 1, 2022

I hink it makes sense. It is not exactly a backport, but rather a re-write, and I find it difficult to follow and convince myself that it's doing the correct thing. Therefore it seems somewhat risky.

uint length = _g1h->collection_set()->young_region_length();
for (uint region_index = 0; region_index < length; region_index++) {
surviving_young_words[region_index] += _surviving_young_words[region_index];
surviving_young_words[region_index] += G1ParScanThreadState::surviving_young_words()[region_index];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was that code using the wrong index before? IOW, it used the surviving_young_words of the wrong regions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I think.

So this pr is achieving two things (sorry if it were not clear):

  1. fix the index issue: this causes calculation of the total surviving young words to be incorrect. The correct stats are stored in _surviving_young_words[1, .., n] instead of [0, .., n-1]
  2. use a precise "copied bytes" instead of inferring from (used() - freed())

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 30, 2022

@linade This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 5, 2022

@linade This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport rfr Pull request is ready for review
4 participants