Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8228609: G1 copy cost prediction uses used vs. actual copied bytes #927

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

linade
Copy link

@linade linade commented Mar 21, 2022

I would like to backport 8228609 which fixes a prediction error regarding cost per bytes copied.

The patch does not apply cleanly, so this backport is basically changing the calculation of copied bytes.

https://bugs.openjdk.java.net/browse/JDK-8227442 is not in 11u. So this patch also uses G1ParScanThreadState::surviving_young_words() in place of _surviving_young_words to adjust for the correct cset index.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8228609: G1 copy cost prediction uses used vs. actual copied bytes

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk11u-dev pull/927/head:pull/927
$ git checkout pull/927

Update a local copy of the PR:
$ git checkout pull/927
$ git pull https://git.openjdk.java.net/jdk11u-dev pull/927/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 927

View PR using the GUI difftool:
$ git pr show -t 927

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk11u-dev/pull/927.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Mar 21, 2022

👋 Welcome back linade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Mar 21, 2022
@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 21, 2022

Webrevs

@TheRealMDoerr
Copy link
Contributor

@TheRealMDoerr TheRealMDoerr commented Mar 21, 2022

This PR is not recognized as backport. Please fix that (e.g. by changing the title to "Backport hash of original commit", https://wiki.openjdk.java.net/display/SKARA/Backports#Backports-BackportPullRequests).

@linade linade changed the title 8228609: G1 copy cost prediction uses used vs. actual copied bytes Backport 9a4c25731eef42d18ba0c20ff3eb42f32ae90c00 Mar 22, 2022
@openjdk openjdk bot changed the title Backport 9a4c25731eef42d18ba0c20ff3eb42f32ae90c00 8228609: G1 copy cost prediction uses used vs. actual copied bytes Mar 22, 2022
@openjdk
Copy link

@openjdk openjdk bot commented Mar 22, 2022

This backport pull request has now been updated with issue and summary from the original commit.

@GoeLin
Copy link
Member

@GoeLin GoeLin commented Mar 24, 2022

@linade
Two more things, please:

  • Enable Pre-submit tests
  • Only flag the change with jdk11u-fix-request once it is ready. I'll remove the tag for now, pleas add it again if the review is available.

@linade
Copy link
Author

@linade linade commented Mar 28, 2022

@GoeLin I enabled the tests. Some of them seem to be blocked by a gcc dependency issue. Would you take a look?

@GoeLin
Copy link
Member

@GoeLin GoeLin commented Apr 7, 2022

Hi, you should only label a change jdk11u-fix-request once it is ready for integration. I'll remove the label, please add it again if you have a review.

@linade
Copy link
Author

@linade linade commented Apr 7, 2022

Gentle ping. Can anyone review this?

The reason I believe it's worth a backport is, apart from what's mentioned in https://bugs.openjdk.java.net/browse/JDK-8228609, it correctly accounts for early reclaimed humongous objects. I think originally the oversight of the size of early reclaimed humongous objects cause the calculation to be erroneous, resulting in a very high copy time prediction and unstable gc frequency.

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

This change is very different from the original one, but fixing it in 11u sounds feasible to me. We'll test it.
Do you have a specific test case which allows observing unstable gc frequency?

@@ -667,10 +668,9 @@ void G1Policy::record_collection_pause_end(double pause_time_ms, size_t cards_sc

size_t freed_bytes = heap_used_bytes_before_gc - cur_used_bytes;
Copy link
Contributor

@TheRealMDoerr TheRealMDoerr Apr 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

freed_bytes is now unused.

Copy link
Author

@linade linade Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed. Thanks.

@linade
Copy link
Author

@linade linade commented Apr 8, 2022

This change is very different from the original one, but fixing it in 11u sounds feasible to me. We'll test it. Do you have a specific test case which allows observing unstable gc frequency?

We observed this in an application. With additional logging, we can observe the miscalculation from, e.g., test/hotspot/jtreg/gc/g1/TestEagerReclaimHumongousRegions.java

The current way to get copied_bytes will result in negative value. I had

..
[0.708s][error][gc] GC(186) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.714s][error][gc] GC(188) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.720s][error][gc] GC(190) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.726s][error][gc] GC(192) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.732s][error][gc] GC(194) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.738s][error][gc] GC(196) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0
[0.744s][error][gc] GC(198) cset.used=14641416, freed_bytes=40855816, copied_bytes=-26214400, actual_copied_bytes=0

copied_bytes is negative because it is deduced from size_t copied_bytes = _collection_set->bytes_used_before() - freed_bytes;, and _collection_set->bytes_used_before() didn't consider eager-reclaimed humongous, so it appears to be much smaller than freed_bytes.

There currently is an if condition if (_collection_set->bytes_used_before() > freed_bytes) to prevent we from getting a negative value. So the current code simply doesn't report the data to the analytics.

Not reporting is fine. At worst we lose some data points. But it can also happen that we get a positive, near-zero copied_bytes. It will make the value double cost_per_byte_ms = average_copy_time / (double) copied_bytes; very big. This causes young generation to shrink.

@TheRealMDoerr
Copy link
Contributor

@TheRealMDoerr TheRealMDoerr commented Apr 8, 2022

/reviewers 2

@openjdk
Copy link

@openjdk openjdk bot commented Apr 8, 2022

@TheRealMDoerr
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with 1 of role reviewers, 1 of role authors).

Copy link
Contributor

@TheRealMDoerr TheRealMDoerr left a comment

Looks technically good and the fix makes sense to me. Our nightly tests haven't found any issues related to it. It extracts the essential fix from the original change which differs a lot: openjdk/jdk@9a4c257
An additional person with GC experience should review it.
Approvers will have to judge how relevant it is. A comment from the original author would be helpful.

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented May 6, 2022

@linade This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@rkennke
Copy link
Contributor

@rkennke rkennke commented Jun 1, 2022

I hink it makes sense. It is not exactly a backport, but rather a re-write, and I find it difficult to follow and convince myself that it's doing the correct thing. Therefore it seems somewhat risky.

uint length = _g1h->collection_set()->young_region_length();
for (uint region_index = 0; region_index < length; region_index++) {
surviving_young_words[region_index] += _surviving_young_words[region_index];
surviving_young_words[region_index] += G1ParScanThreadState::surviving_young_words()[region_index];
Copy link
Contributor

@rkennke rkennke Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was that code using the wrong index before? IOW, it used the surviving_young_words of the wrong regions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport rfr
4 participants