8254164: G1 only removes self forwarding pointers for last collection set increment #556

@tschatzl tschatzl commented Oct 8, 2020

Hi all,

can I have reviews for this change that fixes a previously hard to reproduce crash that started showing up much more frequently in conjunction with changes to young gen sizing (JDK-8244603)?

The issue is that the code to process regions where evacuation failed only processes the last collection set increment. This leaves forwarding pointers in the mark word of some objects. Obviously other code does not like that, e.g. the crashes in JDK-8248438, which I plan to close as a duplicate.
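To make the term concrete, here is a minimal toy model of a self-forwarding pointer in an object's mark word. It is only a sketch and is not HotSpot code: `ToyObject`, the tag value `kForwardedTag`, `kNeutralMark`, and all function names are assumptions made up for this illustration. The idea it shows is that an object that could not be copied is forwarded to itself, and that this state has to be undone before other code looks at the header again.

```cpp
#include <cassert>
#include <cstdint>

// Toy object header: one "mark word" that normally holds header bits but is
// overwritten with a tagged pointer once the object has been forwarded.
struct ToyObject {
    std::uintptr_t mark;
};

// Hypothetical encoding for this sketch only: low two bits set = "forwarded".
constexpr std::uintptr_t kTagMask      = 0x3;
constexpr std::uintptr_t kForwardedTag = 0x3;
constexpr std::uintptr_t kNeutralMark  = 0x1;  // stand-in for a normal header

inline bool is_forwarded(const ToyObject& o) {
    return (o.mark & kTagMask) == kForwardedTag;
}

// Install a forwarding pointer to dest in the mark word of o.
inline void forward_to(ToyObject& o, ToyObject* dest) {
    auto bits = reinterpret_cast<std::uintptr_t>(dest);
    assert((bits & kTagMask) == 0 && "destination must leave the tag bits free");
    o.mark = bits | kForwardedTag;
}

inline ToyObject* forwardee(const ToyObject& o) {
    return reinterpret_cast<ToyObject*>(o.mark & ~kTagMask);
}

// Evacuation failed: the object stays in place and is forwarded to itself.
inline void self_forward(ToyObject& o) { forward_to(o, &o); }

inline bool is_self_forwarded(const ToyObject& o) {
    return is_forwarded(o) && forwardee(o) == &o;
}

// Cleanup step: restore a normal header so later code sees a regular object.
inline void remove_self_forward(ToyObject& o) {
    if (is_self_forwarded(o)) {
        o.mark = kNeutralMark;
    }
}
```

If `remove_self_forward` is never called for an object, `is_forwarded` keeps returning true for it after the collection, which is the kind of leftover state described above.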

Only checking the last collection set increment for regions that failed evacuation is wrong if reference processing causes an evacuation failure in a region that was evacuated in an earlier increment. Reference processing (e.g. finalizers) can make it necessary to resurrect an otherwise unreachable object at the very end of the collection; that object may fail to be copied and may be located in a region evacuated in an earlier increment.
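The scenario can be sketched with a toy model of a collection set split into increments. This is an illustration only, not the code in g1EvacFailure.cpp: `ToyRegion` and every function name below are hypothetical, and the cleanup is reduced to clearing a flag per region.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy region: the collection set increment it was added in, and whether it
// contains objects left self-forwarded by a failed evacuation.
struct ToyRegion {
    int  increment;
    bool has_self_forwarded;
};

// Record an evacuation failure in region idx, for example a resurrected
// object that could not be copied during reference processing.
inline void fail_evacuation(std::vector<ToyRegion>& cset, std::size_t idx) {
    assert(idx < cset.size());
    cset[idx].has_self_forwarded = true;
}

// Behaviour described as wrong: only regions of the last increment are cleaned.
inline void remove_self_forwards_last_increment(std::vector<ToyRegion>& cset,
                                                int last_increment) {
    for (auto& r : cset) {
        if (r.increment == last_increment) {
            r.has_self_forwarded = false;
        }
    }
}

// Behaviour described as the fix: every region in the collection set is checked.
inline void remove_self_forwards_all(std::vector<ToyRegion>& cset) {
    for (auto& r : cset) {
        r.has_self_forwarded = false;
    }
}

// Number of regions that still contain self-forwarded objects after cleanup.
inline std::size_t count_stale(const std::vector<ToyRegion>& cset) {
    std::size_t n = 0;
    for (const auto& r : cset) {
        if (r.has_self_forwarded) {
            ++n;
        }
    }
    return n;
}
```

With two regions in increment 0 and one in increment 1, a failure recorded in an increment 0 region survives `remove_self_forwards_last_increment(cset, 1)` but not `remove_self_forwards_all(cset)`.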

The optimization to only look at the last increment when removing self forwarding pointers was introduced in JDK-8218668.

Until the changes to young gen sizing in JDK-8244603, this crash was a very rare occurrence, but with them it has become common in some tier8 tests (KitchenSink8/24h, DaCapo24h), particularly with some additional targeted verification (enable verification only at the end of mixed gcs with optional evacuation). Without this fix both tests fail within 10 minutes to 2 hours. With the patch everything completes fine.

Testing:

  • tier1-5
  • KitchenSink/DaCapo24h with verification code and JDK-8244603.

Note: since this change strictly does more work during evacuation failure handling, I consider this amount of testing, i.e. with JDK-8244603, sufficient. The crash is very hard to reproduce without JDK-8244603.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

  • Build / test, Linux x64: ✔️ (0/0 passed)
  • Build / test, Windows x64: ✔️ (0/0 passed)
  • Build / test, macOS x64: ✔️ (0/0 passed)

Issue

  • JDK-8254164: G1 only removes self forwarding pointers for last collection set increment

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/556/head:pull/556
$ git checkout pull/556


bridgekeeper bot commented Oct 8, 2020

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.


openjdk bot commented Oct 8, 2020

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Oct 8, 2020
@tschatzl tschatzl marked this pull request as ready for review October 8, 2020 13:19
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 8, 2020

mlbridge bot commented Oct 8, 2020

Webrevs

@kstefanj kstefanj left a comment

Nice work tracking this one down Thomas. The fix looks good, just a small thing about the comment.

Comment on lines 262 to 264
// We need to check all regions whether they need self forward removals, not only
// the last collection set increment. Reference processing may copy over and fail
// evacuation in any region in the collection set.

I think your explanation in the summary is very clear and it would be nice to capture it in the comment as well. Maybe something like:

Suggested change
- // We need to check all regions whether they need self forward removals, not only
- // the last collection set increment. Reference processing may copy over and fail
- // evacuation in any region in the collection set.
+ // We need to check all regions whether they need self forward removals, not only
+ // the last collection set increment. The reason is that reference processing (e.g.
+ // finalizers) can make it necessary to resurrect an otherwise unreachable object at
+ // the very end of the collection. The resurrected object might be located in a region
+ // evacuated in an earlier increment, but copying it at the end of the collection can
+ // trigger an evacuation failure.


openjdk bot commented Oct 8, 2020

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8254164: G1 only removes self forwarding pointers for last collection set increment

Reviewed-by: sjohanss, kbarrett

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 32 new commits pushed to the master branch:

  • be26972: 8253379: [windows] Several jpackage tests failed with error code 1638
  • 52e45a3: 8229186: Improve error messages for TestStringIntrinsics failures
  • 6d2c1a6: 8254292: Update JMH devkit to 1.26
  • 2bbf8a2: 8245543: Cgroups: Incorrect detection logic on some systems (still reproducible)
  • aaa0a2a: 8254297: Zero and Minimal VMs are broken with undeclared identifier 'DerivedPointerTable' after JDK-8253180
  • 7e80c98: 8254261: fix javadocs in jdk.test.lib.Utils
  • d4b5dfd: 8253857: Shenandoah: Bugs in ShenandoahEvacOOMHandler related code
  • e9c1905: 8253740: [PPC64] Minor interpreter cleanup
  • b1448da: 8253900: SA: wrong size computation when JVM was built without AOT
  • 2bc8bc5: 8254265: s390 and linux 32 bit builds broken
  • ... and 22 more: https://git.openjdk.java.net/jdk/compare/894ec76c11dd627383c0c345ef22e3f7745e3292...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 8, 2020

mlbridge bot commented Oct 9, 2020

Mailing list message from Thomas Schatzl on hotspot-gc-dev:

Hi StefanJ,

On 08.10.20 21:15, Stefan Johansson wrote:

On Thu, 8 Oct 2020 08:05:57 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

Hi all,

can I have reviews for this change that fixes a previously hard to reproduce crash that started showing up much more
frequently in conjunction with changes to young gen sizing (JDK-8244603)?

[...]

Nice work tracking this one down Thomas. The fix looks good, just a small thing about the comment.

src/hotspot/share/gc/g1/g1EvacFailure.cpp line 264:

262: // We need to check all regions whether they need self forward removals, not only
263: // the last collection set increment. Reference processing may copy over and fail
264: // evacuation in any region in the collection set.

I think your explanation in the summary is very clear and it would be nice to capture it in the comment as well. Maybe
something like: Suggestion:

// We need to check all regions whether they need self forward removals, not only
// the last collection set increment. The reason is that reference processing (e.g.
// finalizers) can make it necessary to resurrect an otherwise unreachable object at
// the very end of the collection. The resurrected object might be located in a region
// evacuated in an earlier increment, but copying it at the end of the collection can
// trigger an evacuation failure.

thanks for your review. I updated the comment with a slightly
different version of what you wrote above.

Thanks,
Thomas

kstefanj commented Oct 9, 2020

This is great, thanks Thomas.

@kimbarrett kimbarrett left a comment

Looks good. Well, as good as it can, given finalizers are still with us.

@tschatzl

:) Thanks @kimbarrett @kstefanj for your reviews.

/integrate

@openjdk openjdk bot closed this Oct 12, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 12, 2020

openjdk bot commented Oct 12, 2020

@tschatzl Since your change was applied there have been 46 commits pushed to the master branch:

  • bf46acf: 8254028: G1 incorrectly updates scan_top for collection set regions during preparation of evacuation
  • a2bb4c6: 8254314: Shenandoah: null checks in c2 should not skip over native load barrier
  • c73a0ff: 8252105: Parallel heap inspection for ZCollectedHeap
  • 45b09a3: 8253833: mutexLocker assert_locked_or_safepoint should not access VMThread state from non-VM-thread
  • 77c7762: 8254353: Remove unused non-product flags
  • d3069ac: 8254362: x86_32 builds fail after JDK-8253180
  • 25001c5: 8254352: 3 compiler tests failed with "assert(allocates2(pc)) failed: not in CodeBuffer memory"
  • d43f141: 8254351: Minimal VM build fails with undeclared identifier 'MaxVectorSize' after JDK-8252847
  • cc52358: 8254335: logging/logStream.hpp includes memory/resourceArea.hpp but doesn't need it
  • 4b5ac3a: 8252847: Optimize primitive arrayCopy stubs using AVX-512 masked instructions
  • ... and 36 more: https://git.openjdk.java.net/jdk/compare/894ec76c11dd627383c0c345ef22e3f7745e3292...master

Your commit was automatically rebased without conflicts.

Pushed as commit 59378a1.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the 8254164-incomplete-self-forward-removal branch October 12, 2020 08:04
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
3 participants