This repository has been archived by the owner. It is now read-only.

8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region" #127

Closed


@kimbarrett kimbarrett commented Jan 21, 2021

Please review this fix for an intermittent crash when using ParallelGC on
aarch64. The problem is a mis-ordered pair of reads that permits an
algorithmic invariant to be violated. The mis-ordering is due to the lack of
any explicit ordering request (a missing barrier).

In MutableSpace::cas_allocate, we had

HeapWord* obj = top();
if (pointer_delta(end(), obj) >= size) {
  ... space available, attempt the CAS to claim it ...
}

If end is read before top, other threads may advance top and end between
those reads. If, when top is read, current top > old end and current top +
size > current end, the range check will unexpectedly pass because of
underflow in pointer_delta. This will allow top to be advanced to a value
which is > current end, violating the algorithmic invariant, and likely
leading to crashes or memory corruption.
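
To make the underflow concrete, here is a minimal sketch with addresses modeled as plain word indices. The helpers `unsigned_delta` and `check_passes` are hypothetical stand-ins written for this illustration, not HotSpot's `pointer_delta` or `cas_allocate`, and the numeric values are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an unsigned pointer difference. Addresses are
// modeled as plain integers counted in words (an assumption of this sketch).
inline std::size_t unsigned_delta(std::uintptr_t left, std::uintptr_t right) {
  return left - right;  // wraps to a huge value when right > left
}

// The range check from the description, applied to whatever value of `end`
// the thread happened to read.
inline bool check_passes(std::uintptr_t end, std::uintptr_t top, std::size_t size) {
  return unsigned_delta(end, top) >= size;
}

// Illustrative scenario (all values invented):
//   stale end = 100 (read first), current top = 120, current end = 128, size = 16
//   check_passes(100, 120, 16) -> true   (underflow makes the check pass)
//   check_passes(128, 120, 16) -> false  (only 8 words are really available)
```

With the stale `end`, the subtraction wraps and the check passes even though `top + size` (136) exceeds the current `end` (128).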

gcc for x86 doesn't reorder the reads, but for aarch64 it does, and is
permitted to do so. Even if it didn't, there is nothing to prevent the
hardware from doing so. The solution is to use a load_acquire for top, to
ensure the needed order.
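
The reader-side ordering can be sketched as follows, using `std::atomic` as a stand-in for HotSpot's own atomic primitives. `ToySpace` and its members are hypothetical names for this illustration and are not the actual `MutableSpace` code. The sketch assumes that writers publish `end` and `top` with at least release ordering (the default sequentially consistent CAS used here satisfies that for `top`).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical miniature of a bump-pointer space; addresses are plain integers.
struct ToySpace {
  std::atomic<std::uintptr_t> top;
  std::atomic<std::uintptr_t> end;

  ToySpace(std::uintptr_t t, std::uintptr_t e) : top(t), end(e) {}

  // Returns the start of the claimed range, or 0 if there is no room.
  std::uintptr_t cas_allocate(std::size_t size) {
    for (;;) {
      // Acquire load of top: the load of end below cannot be reordered
      // ahead of it, by either the compiler or the hardware.
      std::uintptr_t obj = top.load(std::memory_order_acquire);
      std::uintptr_t e = end.load(std::memory_order_relaxed);
      if (e - obj < size) {
        return 0;  // not enough space
      }
      // Default (sequentially consistent) CAS to claim [obj, obj + size).
      if (top.compare_exchange_weak(obj, obj + size)) {
        return obj;
      }
      // Another thread moved top (or the CAS failed spuriously); retry.
    }
  }
};
```

Because `top` is read first and with acquire semantics, the value of `end` used in the check is at least as recent as the `top` it is compared against, which is what keeps the subtraction from underflowing in this sketch.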

This may have been the actual root cause of JDK-8257999. However, the
change made there was and still is needed for the reasons described.

Testing:
mach5 tier1-3

Even with knowledge of the failure mode it's very hard to reproduce. I was
unable to catch the underflow case in over 1K attempts using machines in our
test farm, though StefanK caught it a few times on a personal machine.

/summary Use load_acquire to order reads of top and end.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region"

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk16 pull/127/head:pull/127
$ git checkout pull/127

@bridgekeeper

bridgekeeper bot commented Jan 21, 2021

👋 Welcome back kbarrett! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 21, 2021
@openjdk

openjdk bot commented Jan 21, 2021

@kimbarrett Setting summary to Use load_acquire to order reads of top and end.

@openjdk

openjdk bot commented Jan 21, 2021

@kimbarrett The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.java.net label Jan 21, 2021
@mlbridge

mlbridge bot commented Jan 21, 2021

Webrevs


@tschatzl tschatzl left a comment

Lgtm. Thanks.

@openjdk

openjdk bot commented Jan 21, 2021

@kimbarrett This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8259271: gc/parallel/TestDynShrinkHeap.java still fails "assert(covered_region.contains(new_memregion)) failed: new region is not in covered_region"

Use load_acquire to order reads of top and end.

Reviewed-by: tschatzl, iwalulya, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 21, 2021
Member

@walulyai walulyai left a comment

looks good

fisk
fisk approved these changes Jan 21, 2021
Contributor

@fisk fisk left a comment

Looks good.

@kimbarrett
Author

kimbarrett commented Jan 21, 2021

Thanks @tschatzl, @walulyai, and @fisk for reviews.

@kimbarrett
Author

kimbarrett commented Jan 22, 2021

/integrate

@openjdk openjdk bot closed this Jan 22, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 22, 2021
@openjdk

openjdk bot commented Jan 22, 2021

@kimbarrett Since your change was applied there have been 4 commits pushed to the master branch:

  • d90e06a: 8259775: [Vector API] Incorrect code-gen for VectorReinterpret operation
  • ede1bea: 8227695: assert(pss->trim_ticks().seconds() == 0.0) failed: Unexpected partial trimming during evacuation
  • 62eab50: 8255199: Catching a few NumberFormatExceptions in xmldsig
  • a5367cb: 8247619: Improve Direct Buffering of Characters

Your commit was automatically rebased without conflicts.

Pushed as commit 685c03d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-gc hotspot-gc-dev@openjdk.java.net integrated Pull request has been integrated
4 participants