New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8277434: tests fail with "assert(is_forwarded()) failed: only decode when actually forwarded" #6582
Conversation
|
@kimbarrett The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
Thank you for the detailed analysis; this makes perfect sense.
Ooof, so a relaxed forwardee installation comes with an odd invariant like this. I linked JDK-8204524 to this issue, to track where the forwardee installation was relaxed.
The fix for this particular issue looks fine to me, thanks.
A few follow-up questions about the general issue here. How is it not affecting the other places where G1 accesses obj->forwardee()
, like in G1ScanClosureBase::prefetch_and_push
or G1ParScanThreadState::do_partial_array
? Also, it is not theoretically clear how would "All uses of a forwardee must take care to avoid reading from it without performing additional synchronization", given that the trouble had already happened on writer side, and there is little a reader can do to recover?
@kimbarrett This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the
|
Both cases examine the original object in the collection set only; for the first example there is an explicit check on
could be rephrased as |
Mailing list message from Kim Barrett on hotspot-gc-dev:
The key point in the G1ScanEvacuatedObjClosure::do_oop_work case is that the For do_partial_array, while the from_obj is in the collection set, its |
All right then, thanks for explanations. |
Thanks for reviews @albertnetymk , @shipilev , and @tschatzl . |
/integrate |
Going to push as commit 15a6806. |
@kimbarrett Pushed as commit 15a6806. |
Please review this change which fixes a rare assert failure in
G1ParScanThreadState::write_ref_field_post. The problem is that the assert is
referencing values from the forwardee, and that isn't safe. When an object is
copied and forwarded to the copy, the forwarding is performed with a relaxed
cmpxchg. This means the data being copied into the forwardee might not be
visible to a different thread that has obtained the forwardee. All uses of a
forwardee must take care to avoid reading from it without performing
additional synchronization. The assert in question violates that restriction.
The incorrect assertion is
assert(dest_attr.is_in_cset() == (obj->is_forwarded() && obj->forwardee() == obj), "...");
obj is the forwardee of a different object.
It's okay here to examine the contents of obj when it's in the cset. In
that case it's not a copy; it's the original object, self-forwarded due to
evacuation failure.
But it's not okay to examine the contents of obj when it's not in the cset,
e.g. when it's a forwardee copy of the original object. Doing so violates
the above restriction.
So that tells us this code is wrong, but it's still not obvious this is the
cause of the failure in question. The failure we're seeing appears to be
that obj->is_forwarded() returns true, but the assert of the same thing in
forwardee returns false. But is_forwarded() should never be true for a
copied forwardee. But it turns out a copied forwardee might temporarily
appear to be forwarded, due to the unordered copy + forward sequence.
Consider the following scenario:
A thread is evacuating an object and allocates a candidate forwardee, copies
into it, and finally discovers the forwarding race was lost even before the
copy. The candidate forwardee appears to be marked as forwarded. That
candidate is deallocated and the real forwardee is used by the thread.
That same thread evacuates a second object, reallocating some of the same
memory as from the previous candidate. It copies the second object into the
new forwardee, successfully performs the forwarding, and uses that newly
created forwardee.
A second thread attempts to evacuate that same second object, and finds it
already forwarded. It goes into write_ref_field_post, reaches the assert,
and attempts to use values from the forwardee. Because the copy and the
forwarding by the first thread were unordered, this second thread might see
the old copy that appears to be forwarded, rather than the up to date copy.
So the first call to is_forwarded() returns true. The forwardee() assert of
the same thing might instead get the up to date contents, resulting in the
reported failure. Alternatively, it might still get old data, in which case
nothing gets noticed because the self-forward check will fail and the
write_ref_field_post assert will (accidentally) succeed.
So there's a very small window in which the reported failure can occur. But
it's all caused by the write_ref_field_post assert performing invalid
accesses. Stop doing that and the problem should go away.
I think this failure couldn't have happened before JDK-8271579, but that's
kind of accidental and doesn't change that the write_ref_field_post assert
was performing invalid accesses. (Those accesses were introduced later.)
Testing:
mach5 tier1-5.
Ran applications/jcstress/acqrel.java 100 times without failure.
Without the change I had a 25-30% failure rate for that test.
I don't know why that test was such a good canary, when this failure seems to
otherwise be pretty rare, but glad that it was.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6582/head:pull/6582
$ git checkout pull/6582
Update a local copy of the PR:
$ git checkout pull/6582
$ git pull https://git.openjdk.java.net/jdk pull/6582/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6582
View PR using the GUI difftool:
$ git pr show -t 6582
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6582.diff