8257999: Parallel GC crash in gc/parallel/TestDynShrinkHeap.java: new region is not in covered_region #35
👋 Welcome back kbarrett! A progress list of the required criteria for merging this PR into
@kimbarrett The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
Looks good, just a comment about a comment that you can address if you agree.
@@ -381,6 +382,7 @@ void PSOldGen::post_resize() {
&ParallelScavengeHeap::heap()->workers() : NULL;

// ALWAYS do this last!!
OrderAccess::storestore();
Maybe update the comment to use fewer caps and '!'. Instead, tie it back to the function comment, explaining that the barrier is needed to guarantee the order in which the data structures become visible to other threads.
Good idea. Here's the revised comment:
- // ALWAYS do this last!!
+ // Ensure the space bounds are updated are made visible to other
+ // threads after the other data structures have been resized.
OrderAccess::storestore();
The second "are" should be an "and", right? Otherwise looks great!
+1
Drat. Will fix.
@kimbarrett This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the
Lgtm. Please adjust the comment a little as Stefan suggested :)
/integrate
@kimbarrett Pushed as commit 61390d8. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Please review this change to ParallelGC oldgen allocation, adding a missing
memory barrier.
The problem arises in the interaction between concurrent oldgen allocations,
where each would, if done serially (in either order), require expansion of
the generation.
An allocation of size N compares the mutable space's (end - top) with N to
determine if space is available. If available, use top as the start of the
object of size N (adjusting top atomically) and assert the resulting memory
region is in the covered area. If not, then expand.
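A minimal sketch of the allocation path just described, under a simplified model: the hypothetical Space and CoveredRegion types stand in for the mutable space and the card table's covered region, and addresses are plain std::uintptr_t values. It uses standard C++ atomics rather than HotSpot's own Atomic API, so it is an illustration, not the actual MutableSpace code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified, hypothetical model; not the HotSpot MutableSpace/card table code.
struct CoveredRegion {
  std::atomic<std::uintptr_t> start{0};
  std::atomic<std::uintptr_t> end{0};  // exclusive

  bool contains(std::uintptr_t from, std::uintptr_t to) const {
    return start.load(std::memory_order_relaxed) <= from &&
           to <= end.load(std::memory_order_relaxed);
  }
};

struct Space {
  std::atomic<std::uintptr_t> top{0};
  std::atomic<std::uintptr_t> end{0};  // exclusive
};

// Returns the start of an n-byte block, or 0 if the space is too small and the
// caller must expand the generation.
std::uintptr_t try_allocate(Space& space, const CoveredRegion& covered, std::size_t n) {
  std::uintptr_t old_top = space.top.load(std::memory_order_relaxed);
  for (;;) {
    std::uintptr_t end = space.end.load(std::memory_order_relaxed);
    if (end - old_top < n) {
      return 0;  // does not fit: take the expansion path instead
    }
    // Claim [old_top, old_top + n) by bumping top atomically. On failure,
    // compare_exchange_weak reloads old_top and we retry with a fresh end.
    if (space.top.compare_exchange_weak(old_top, old_top + n)) {
      // The covered-region check described above.
      assert(covered.contains(old_top, old_top + n));
      return old_top;
    }
  }
}
```

In the failing scenario, `end` has already advanced but `covered.end` has not, so the assertion fires even though the CAS succeeded.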
Expansion updates the covered region, then updates the space (i.e. end).
There is currently no memory barrier between those operations.
As a result, thread1 can complete an expansion, updating both the covered
region and the space end, yet because there is no memory barrier, another
thread may observe the space end being updated before the covered region.
Meanwhile, thread2's allocation reads the new end and proceeds with the
allocation (which would not have fit with the old end value), then fails the
covered region check because it saw the old covered range. Although the
reads of end and of the covered range are ordered here by the intervening CAS
of top, that doesn't help if thread1's writes are not also properly
ordered.
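This is an instance of the message-passing pattern: one thread writes two locations in a fixed order, and another reads them in the opposite order. The sketch below, again with standard C++ atomics and hypothetical names (Bounds, publish, observe_consistent), shows that the bad outcome (new space end, old covered bound) is excluded only when both sides are ordered. A release fence stands in for OrderAccess::storestore() (it is somewhat stronger, since it also orders earlier loads), and an acquire fence stands in for the read-side ordering that the CAS of top provides. The model assumes a single writer at a time and bounds that only grow.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical two-field model of the race; not HotSpot code.
struct Bounds {
  std::atomic<std::uintptr_t> covered_end{0};  // card table covered region bound
  std::atomic<std::uintptr_t> space_end{0};    // mutable space end read by allocators
};

// Writer (thread1, expanding): covered region first, then the space end.
// Assumes writers are serialized, as ExpandHeap_lock does in the real code.
void publish(Bounds& b, std::uintptr_t new_end) {
  assert(new_end >= b.covered_end.load(std::memory_order_relaxed));  // bounds only grow
  b.covered_end.store(new_end, std::memory_order_relaxed);
  // Analogue of OrderAccess::storestore(): the covered_end store may not become
  // visible after the space_end store. Without it, only source order remains.
  std::atomic_thread_fence(std::memory_order_release);
  b.space_end.store(new_end, std::memory_order_relaxed);
}

// Reader (thread2, allocating): space end first, then the covered bound.
// Returns false for the inconsistent outcome (new end, old covered bound).
bool observe_consistent(const Bounds& b) {
  std::uintptr_t end = b.space_end.load(std::memory_order_relaxed);
  // Stands in for the read-side ordering provided by the CAS of top.
  std::atomic_thread_fence(std::memory_order_acquire);
  std::uintptr_t covered = b.covered_end.load(std::memory_order_relaxed);
  return end <= covered;
}
```

Single-threaded, both checks trivially hold; the interesting case is calling publish on one thread while another calls observe_consistent in a loop. With both fences present, the C++ memory model guarantees observe_consistent never returns false. Removing only the writer's fence, which corresponds to the code before this change, permits a false result, whether or not a particular machine ever exhibits it.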
There is even a comment about this in PSOldGen::post_resize(), saying the
space update must come last (in particular, after the covered region update).
But without a memory barrier, nothing other than source order enforces that
ordering, and neither the compiler nor the hardware is required to preserve
source order as observed by other threads. So add a memory barrier.
I'm not sure whether this out-of-order update of the space end could lead to
problems in a product build (where the assert doesn't apply). From a quick
look, there appear to be opportunities for problems, such as accessing
uncovered parts of the card table.
There's another issue that I'm not addressing with this change. Various
values are read while subject to concurrent writes, without being tagged as
atomic in any way. (The writes are under the ExpandHeap_lock; the reads are
not.) This includes at least the covered region bounds and the space end.
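One way the second issue could be addressed, sketched in standard C++ rather than with HotSpot's Atomic::load/Atomic::store: keep the writes under a lock, but declare the fields that lock-free readers access as atomics, so the unlocked reads are not data races. ExpandableBound and expand_lock (a stand-in for ExpandHeap_lock) are hypothetical, and this is not part of the change under review.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Illustrative only: a bound that is written under a lock but read without one.
struct ExpandableBound {
  std::mutex expand_lock;              // hypothetical stand-in for ExpandHeap_lock
  std::atomic<std::uintptr_t> end{0};  // atomic because readers do not take the lock

  // Writers serialize on the lock; the atomic store lets unlocked readers
  // observe the new value without a data race.
  void grow_to(std::uintptr_t new_end) {
    std::lock_guard<std::mutex> guard(expand_lock);
    std::uintptr_t cur = end.load(std::memory_order_relaxed);  // only lock holders write
    if (new_end > cur) {
      end.store(new_end, std::memory_order_relaxed);
    }
    assert(end.load(std::memory_order_relaxed) >= new_end);  // bounds only grow
  }

  // Lock-free read, e.g. from an allocation fast path.
  std::uintptr_t read_unlocked() const {
    return end.load(std::memory_order_relaxed);
  }
};
```

Relaxed ordering is enough to make the unlocked reads race-free, but it says nothing about ordering relative to other fields; that is what the barrier added by this change provides.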
Testing:
mach5 tier1
I was unable to reproduce the failure, so I can't show any before/after
improvement.
Download
$ git fetch https://git.openjdk.java.net/jdk16 pull/35/head:pull/35
$ git checkout pull/35