This repository has been archived by the owner on Sep 2, 2022. It is now read-only.
/ jdk16 Public archive

8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region #35

Closed
wants to merge 3 commits into from

@kimbarrett kimbarrett commented Dec 16, 2020

Please review this change to ParallelGC oldgen allocation, adding a missing
memory barrier.

The problem arises in the interaction between concurrent oldgen allocations,
where each would, if done serially (in either order), require expansion of
the generation.

An allocation of size N compares the mutable space's (end - top) with N to
determine if space is available. If available, use top as the start of the
object of size N (adjusting top atomically) and assert the resulting memory
region is in the covered area. If not, then expand.

Expansion updates the covered region, then updates the space (i.e. end).
There is currently no memory barrier between those operations.

As a result, thread1 can perform an expansion, updating the covered region
and then the space end, yet because there's no memory barrier between those
two writes, some other thread may observe the space end update before the
covered region update.

Meanwhile thread2's allocation reads the new end and goes ahead with the
allocation (which would not have fit with the old end value), then fails the
covered region check because it used the old covered range. Although the
reads of end and the covered range are ordered here by the intervening CAS
of top, that doesn't help if the writes by thread1 are not also properly
ordered.

There is even a comment about this in PSOldGen::post_resize(), saying the
space update must be last (including after the covered region update). But
without a memory barrier, there's nothing other than source order to ensure
that ordering. So add a memory barrier.
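
To make the ordering requirement concrete, here is a minimal standalone sketch of the pattern in standard C++. It is not the HotSpot code: `ToyOldGen`, `expand`, and `try_allocate` are hypothetical names, plain `size_t` offsets stand in for heap addresses, and `std::atomic_thread_fence(std::memory_order_release)` is used as a portable (and somewhat stronger) stand-in for `OrderAccess::storestore()`. The acquire load of the end on the reader side is an assumption of this sketch; the description above attributes that ordering to the CAS of top.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the oldgen state; offsets replace real addresses.
struct ToyOldGen {
  std::atomic<std::size_t> covered_end{0};  // upper bound of the covered region
  std::atomic<std::size_t> space_end{0};    // upper bound of the mutable space
  std::atomic<std::size_t> top{0};          // next free offset
};

// Expansion: publish the covered region first, then the space end.
void expand(ToyOldGen& gen, std::size_t new_end) {
  gen.covered_end.store(new_end, std::memory_order_relaxed);
  // Assumed analog of OrderAccess::storestore(): keeps the store above from
  // becoming visible after the store below.
  std::atomic_thread_fence(std::memory_order_release);
  gen.space_end.store(new_end, std::memory_order_relaxed);
}

// Allocation: returns false if n does not fit below the current end.
// On success, *covered_ok reports whether the new object lies inside the
// covered region (the condition the failing assert checks).
bool try_allocate(ToyOldGen& gen, std::size_t n, bool* covered_ok) {
  std::size_t old_top = gen.top.load(std::memory_order_relaxed);
  for (;;) {
    // Acquire pairs with the release fence in expand().
    std::size_t end = gen.space_end.load(std::memory_order_acquire);
    if (old_top + n > end) {
      return false;  // caller would expand and retry
    }
    if (gen.top.compare_exchange_weak(old_top, old_top + n,
                                      std::memory_order_acq_rel,
                                      std::memory_order_relaxed)) {
      *covered_ok =
          old_top + n <= gen.covered_end.load(std::memory_order_relaxed);
      return true;
    }
    // CAS failure refreshed old_top; loop and re-read the end.
  }
}
```

In a two-thread run, one thread would call `expand` while another loops on `try_allocate`. With the fence in place, a reader that sees the new end also sees the new covered bound, so `covered_ok` should not come back false; without the fence, nothing but source order would keep the two stores apart.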

I'm not sure whether this out-of-order update of the space end could lead to
problems in a product build (where the assert doesn't apply). Without
looking carefully, there appear to be opportunities for problems, such as
accessing uncovered parts of the card table.

There's another issue that I'm not addressing with this change. Various
values are being read while subject to concurrent writes, without being in
any way tagged as atomic. (The writes are under the ExpandHeap_lock, the
reads are not.) This includes at least the covering region bounds and space
end.
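
As an illustration of what tagging such a value as atomic could look like, here is a small sketch in standard C++. It is only an assumption about the general shape of a fix, not a proposal for this change: `ToySpace`, `set_end`, and `end` are hypothetical, and `std::mutex` stands in for the ExpandHeap_lock.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical space whose end is written under a lock but read without it.
class ToySpace {
 public:
  // Writers serialize on the lock, but still store atomically because
  // readers do not take the lock.
  void set_end(std::uintptr_t new_end) {
    std::lock_guard<std::mutex> guard(_expand_lock);
    _end.store(new_end, std::memory_order_relaxed);
  }

  // Lock-free reader; the atomic load makes the concurrent read well-defined.
  std::uintptr_t end() const {
    return _end.load(std::memory_order_relaxed);
  }

 private:
  std::mutex _expand_lock;               // stand-in for ExpandHeap_lock
  std::atomic<std::uintptr_t> _end{0};   // the value read concurrently
};
```

Relaxed accesses only make the individual reads and writes atomic; they add no ordering between different fields, so a barrier like the one added in this change would still be needed.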

Testing:
mach5 tier1
I was unable to reproduce the failure, so can't show any before / after
improvement.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk16 pull/35/head:pull/35
$ git checkout pull/35

bridgekeeper bot commented Dec 16, 2020

👋 Welcome back kbarrett! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Dec 16, 2020

@kimbarrett The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.java.net label Dec 16, 2020
@kimbarrett kimbarrett marked this pull request as ready for review December 16, 2020 14:08
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 16, 2020

mlbridge bot commented Dec 16, 2020

Webrevs

@kstefanj kstefanj left a comment

Looks good, just a comment about a comment that you can address if you agree.

@@ -381,6 +382,7 @@ void PSOldGen::post_resize() {
&ParallelScavengeHeap::heap()->workers() : NULL;

// ALWAYS do this last!!
OrderAccess::storestore();
Contributor

Maybe update the comment to use fewer caps and '!'. Instead, tie it back to the function comment, explaining that the barrier is needed to guarantee the order in which the data structures become visible to other threads.

Author

Good idea. Here's the revised comment:

-  // ALWAYS do this last!!
+  // Ensure the space bounds are updated are made visible to other
+  // threads after the other data structures have been resized.
   OrderAccess::storestore();

Contributor

The second "are" should be an "and", right? Otherwise looks great!

+1

Author

Drat. Will fix.

openjdk bot commented Dec 17, 2020

@kimbarrett This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region

Reviewed-by: sjohanss, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 17, 2020

@tschatzl tschatzl left a comment

Lgtm. Please adjust the comment a little as Stefan suggested :)

@kimbarrett
Author

Thanks @kstefanj and @tschatzl for reviewing.

@kimbarrett
Author

/integrate

@openjdk openjdk bot closed this Dec 17, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 17, 2020

openjdk bot commented Dec 17, 2020

@kimbarrett Pushed as commit 61390d8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@kimbarrett kimbarrett deleted the shrink_heap_crash branch January 1, 2021 10:00