8271880: Tighten condition for excluding regions from collecting cards with cross-references #5037

@tschatzl tschatzl commented Aug 6, 2021

Hi,

Can I have reviews for this change? By tightening the condition for excluding regions from collecting cards with cross-references, it lets us avoid rescanning objects that failed evacuation in the fix-up self-forwards phase after an evacuation failure.

I.e. during GC, G1 never collects cards/references from the young gen (including eden) for later refinement, which means that we need to rescan all live objects remaining in eden regions for cross-references.

The reason we did that was that we did not want to add cards to refine from survivor regions (i.e. next gc's young gen): we always collect young gen, so references from there need not be recorded in the remembered sets (and if we did, we would erroneously mark cards in young gen, which later card table verification will not like). However, we did not have that information quickly accessible anywhere.

This change solves that problem by actually recording this information in the region attribute table as "NewSurvivor" type region. "NewSurvivor" because I did want to make explicit that these are the survivor regions from the new (next) young generation (i.e. just survivor) and not the survivor regions of the previous gc (that were turned eden at the start of this gc) but something like "NewYoung" or so would be fine with me as well (or certainly just "Survivor", but that might be confusing).
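As a rough illustration of the idea, the sketch below models a per-region attribute table that records a "NewSurvivor" type and answers the "skip card enqueue?" question with a single table lookup. All names here (`RegionType`, `RegionAttrTable`, `skip_card_enqueue`) are hypothetical stand-ins chosen for this example, not the actual HotSpot classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified region types; the names only mirror the discussion.
enum class RegionType : std::uint8_t { Free, Old, Eden, NewSurvivor };

// Hypothetical per-region attribute table, indexed by region number.
class RegionAttrTable {
  std::vector<RegionType> _types;
  std::size_t _region_size;

public:
  RegionAttrTable(std::size_t num_regions, std::size_t region_size)
    : _types(num_regions, RegionType::Free), _region_size(region_size) {}

  void set_type(std::size_t region, RegionType t) { _types.at(region) = t; }

  // Look up the attribute of the region containing the given heap offset.
  RegionType at(std::size_t heap_offset) const {
    return _types.at(heap_offset / _region_size);
  }

  // Objects copied into the next GC's young gen do not need their cards collected.
  bool skip_card_enqueue(std::size_t heap_offset) const {
    return at(heap_offset) == RegionType::NewSurvivor;
  }
};
```

The point of the design is that the decision becomes a cheap array lookup keyed by the destination address, instead of a query on the region object itself.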

Another interesting addition is probably the new assert in G1ParScanThreadState::enqueue_card_if_tracked

     assert(!_g1h->heap_region_containing(o)->in_collection_set(), "Should not try to enqueue reference into collection set region");

This, at this time, verifies the assumption that g1 is not trying to collect references to the collection set, i.e. to other objects that failed evacuation. After all, we later relabel their regions as old without a remembered set. Without the filter we would collect these references unnecessarily, because (currently) cset tracking for these regions is enabled (at least during gc - we only later relabel and drop the remembered sets).

This might change later if we want to move evacuation failed regions into survivor (or keep their remembered sets for some reason), but for now we filter attempts to add cards in the dcqs for those this way.
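The filtering described above could be sketched roughly as follows. `RegionAttr`, `CardQueue` and `record_reference` are hypothetical simplifications invented for this example; only the assert message is taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a region attribute.
struct RegionAttr {
  bool in_cset;         // region is part of the current collection set
  bool remset_tracked;  // region has a remembered set that is being maintained
};

// Hypothetical GC-local queue of card indices awaiting refinement.
struct CardQueue {
  std::vector<std::size_t> cards;

  // Mirrors the quoted assert: callers must already have filtered cset targets.
  void enqueue_card_if_tracked(const RegionAttr& dest, std::size_t card) {
    assert(!dest.in_cset && "Should not try to enqueue reference into collection set region");
    if (dest.remset_tracked) {
      cards.push_back(card);
    }
  }
};

// Caller-side filter: a reference into the collection set points to an object
// that failed evacuation, so no card is recorded for it.
inline void record_reference(CardQueue& q, const RegionAttr& dest, std::size_t card) {
  if (!dest.in_cset) {
    q.enqueue_card_if_tracked(dest, card);
  }
}
```

In this sketch the assert documents the invariant, while the caller-side check is what actually keeps cards for collection set regions out of the queue.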

Testing: tier1-5, gc/g1 with JAVA_OPTIONS_=-XX:+G1EvacuationFailureALot -XX:+VerifyAfterGC.

Thanks,
Thomas


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8271880: Tighten condition for excluding regions from collecting cards with cross-references

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5037/head:pull/5037
$ git checkout pull/5037

Update a local copy of the PR:
$ git checkout pull/5037
$ git pull https://git.openjdk.java.net/jdk pull/5037/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5037

View PR using the GUI difftool:
$ git pr show -t 5037

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5037.diff

@bridgekeeper

bridgekeeper bot commented Aug 6, 2021

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 6, 2021

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Aug 6, 2021
@tschatzl tschatzl marked this pull request as ready for review August 9, 2021 10:22
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 9, 2021
@mlbridge

mlbridge bot commented Aug 9, 2021

Comment on lines 119 to 121
G1SkipCardEnqueueSetter(G1ScanEvacuatedObjClosure* closure, bool skip_enqueue_cards) : _closure(closure) {
  assert(_closure->_skip_card_enqueue == G1ScanEvacuatedObjClosure::Uninitialized, "Must not be set");
  _closure->_skip_card_enqueue = skip_enqueue_cards ? G1ScanEvacuatedObjClosure::True : G1ScanEvacuatedObjClosure::False;
Member

There's no reason for class G1SkipCardEnqueueSetter, field _skip_card_enqueue and method arg skip_enqueue_cards to not follow the same pattern; both alternatives are fine, "skip enqueue cards" or "skip card enqueue".

@@ -112,17 +112,17 @@ class G1ScanEvacuatedObjClosure : public G1ScanClosureBase {
};

// RAII object to properly set the _scanning_in_young field in G1ScanEvacuatedObjClosure.
Member

Obsolete comments; can be dropped completely, IMO.


enum ScanningInYoungValues {
  False = 0,
  True,
  Uninitialized
};

ScanningInYoungValues _scanning_in_young;
ScanningInYoungValues _skip_card_enqueue;
Member

The enum name can be tristate, right?
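For readers following the review, the RAII setter and tri-state flag under discussion can be illustrated with the simplified sketch below. `ScanClosure`, `TriState` and `SkipCardEnqueueSetter` are hypothetical names for this example, and resetting the flag in the destructor is an assumption about the intended RAII behaviour, not something shown in the quoted diff.

```cpp
#include <cassert>

// Hypothetical closure carrying a tri-state flag, loosely modeled on the quoted snippet.
struct ScanClosure {
  enum TriState { False = 0, True, Uninitialized };
  TriState skip_card_enqueue = Uninitialized;
};

// RAII helper: sets the flag on construction and (by assumption) resets it on
// destruction, so a stale value cannot leak into the next scan.
class SkipCardEnqueueSetter {
  ScanClosure* _closure;

public:
  SkipCardEnqueueSetter(ScanClosure* closure, bool skip_card_enqueue) : _closure(closure) {
    assert(_closure->skip_card_enqueue == ScanClosure::Uninitialized && "Must not be set");
    _closure->skip_card_enqueue = skip_card_enqueue ? ScanClosure::True : ScanClosure::False;
  }

  ~SkipCardEnqueueSetter() { _closure->skip_card_enqueue = ScanClosure::Uninitialized; }

  // Not copyable: exactly one setter owns the flag for a given scope.
  SkipCardEnqueueSetter(const SkipCardEnqueueSetter&) = delete;
  SkipCardEnqueueSetter& operator=(const SkipCardEnqueueSetter&) = delete;
};
```

The third state lets an assert catch both nested setters and scans that run without any setter in scope.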

Comment on lines 247 to 248
assert(!dest_attr.is_in_cset(), "must not scan object from cset here");
G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_new_survivor() || dest_attr.is_in_cset());
Member

We just assert !dest_attr.is_in_cset(), why still || dest_attr.is_in_cset()?

Contributor Author

assert(!dest_attr.is_in_cset(), "must not scan object from cset here"); is because this should not happen (yet) as we do not split large objarrays that failed evacuation at this time. I'll remove that one, because something like "not implemented" or "should not happen" isn't great either.

dest_attr.is_new_survivor() || dest_attr.is_in_cset() is incorrect, just dest_attr.is_new_survivor() is correct (then again, it could not have happened yet).

Comment on lines 280 to 283
// FIXME: check if we could just use dest_attr
dest_attr = _g1h->region_attr(to_array);
assert(!_g1h->region_attr(to_array).is_in_cset(), "must not scan object from cset here");
G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_new_survivor() || dest_attr.is_in_cset());
Member

Is this still WIP?

Contributor Author

No, but has the same bug as the other similar place.

@tschatzl
Contributor Author

Fyi, there is a bug (or at least a crash in another tier1-5 run) in the current code that I'm currently tracking down, so please hold off further reviews for now.

Thanks,
Thomas

@tschatzl
Contributor Author

Here is a description of the problem:

G1 crashes because of a missing remembered set entry in the young generation for j.l.ref.References' discovered field.

The reason is that

  • reference processing uses regular write barriers for the discovered fields, enqueuing into the global barrier set DCQS
  • since young gen evacuation-failed regions still have the "young" card entry set all over, the card is not added to the DCQS
  • even if the card were enqueued, we would lose the mark because G1 later clears the card table and redirties using the gc local dcqs (which does not contain that card because it has been added to the "wrong" dcqs) -> missing remembered set entry -> crash (or verification error)
  • this can also happen in old regions with the change introduced in JDK-8270842/PR#4853, but is likely much rarer.

Previously there has been no issue because the fixup self-forwards phase re-enqueued these cards into the "correct" DCQS.
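The failure mode described above can be reproduced in a toy model. The sketch below is a deliberately simplified simulation with hypothetical names (`Model`, `regular_barrier`, `gc_enqueue`, `clear_and_redirty`); it only assumes what the description states: the regular barrier filters "young" cards and feeds a global queue, while the end of GC clears the card table and redirties from the GC-local queue only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum class Card { Clean, Dirty, Young };

// Toy model of a card table with two dirty card queue sets.
struct Model {
  std::vector<Card> card_table;
  std::set<std::size_t> global_dcqs;    // filled by the regular write barrier
  std::set<std::size_t> gc_local_dcqs;  // filled by GC worker threads

  explicit Model(std::size_t num_cards) : card_table(num_cards, Card::Clean) {}

  // Regular post-write barrier: filters "young" and already dirty cards.
  void regular_barrier(std::size_t card) {
    if (card_table[card] == Card::Young || card_table[card] == Card::Dirty) {
      return;
    }
    card_table[card] = Card::Dirty;
    global_dcqs.insert(card);
  }

  // GC-side enqueue: records the card in the GC-local queue only.
  void gc_enqueue(std::size_t card) { gc_local_dcqs.insert(card); }

  // End of GC: clear the card table, then redirty from the GC-local queue.
  void clear_and_redirty() {
    for (Card& c : card_table) {
      c = Card::Clean;
    }
    for (std::size_t card : gc_local_dcqs) {
      card_table[card] = Card::Dirty;
    }
  }
};
```

In this model, a write through `regular_barrier` to a card still marked `Young` is never queued, and a write to a clean card ends up only in `global_dcqs`; in both cases the card is clean after `clear_and_redirty`, which corresponds to the missing remembered set entry.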

@openjdk

openjdk bot commented Aug 25, 2021

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8271880: Tighten condition for excluding regions from collecting cards with cross-references

Reviewed-by: ayang, sjohanss

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 35 new commits pushed to the master branch:

  • d91e227: 8238274: (sctp) JDK-7118373 is not fixed for SctpChannel
  • 971aa35: 8274083: Update testing docs to mention tiered testing
  • 1d44014: 8273034: Make javadoc navigation collapsible on small displays
  • bb74ae8: 8274171: java/nio/file/Files/probeContentType/Basic.java failed on "Content type" mismatches
  • 56b8b35: 8273261: Replace 'while' cycles with iterator with enhanced-for in java.base
  • 0aa63fe: 8274216: ProblemList 2 serviceability/dcmd/gc tests with ZGC on linux-all and windows-all
  • 5ffbe75: 8274195: Doc cleanup in java.nio.file
  • 1fdc656: 8274175: (fc) java/nio/channels/FileChannel/Transfer2GPlus.java still failed in timeout
  • 3b1b8fc: 8269850: Most JDK releases report macOS version 12 as 10.16 instead of 12.0
  • 1b7f4b7: 8274169: HotSpot Style Guide has stale link to chromium style guide
  • ... and 25 more: https://git.openjdk.java.net/jdk/compare/aefd4ac4a336f00c067bcb91b95472ccc9a6bf83...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 25, 2021
@tschatzl
Contributor Author

... and using the normal write barrier is good at that time for objects evacuated somewhere else: either this had been into survivor, where we do not care about card marks (and the "young" card filter works as expected), or they are clean (previously unallocated areas in the old gen region).

Member

@albertnetymk albertnetymk left a comment

The patch looks fine. Some confusion around assertions and comments.

Re

  // ...
  assert(!dest_attr.is_young() || _g1h->heap_region_containing(to_array)->is_survivor(), "must be");
  G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_young());

Even after reading the preceding comments, I can't follow the assertion logic or see why it's related to the constructor. I wonder if the following revised assertion/comment is correct and clearer.

  // Skip the card enqueue iff the objective (to_array) is in survivor region. However, HeapRegion::is_survivor() is too expensive here.
  // Instead, we use dest_attr.is_young() because the two values are always equal: successfully allocated young regions must be survivor regions.
  assert(dest_attr.is_young() == _g1h->heap_region_containing(to_array)->is_survivor(), "must be");
  G1SkipCardEnqueueSetter x(&_scanner, dest_attr.is_young());

@openjdk

openjdk bot commented Aug 30, 2021

@tschatzl this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout submit/evac-failure-no-scan-during-remove-self-forwards
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Aug 30, 2021
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Aug 31, 2021
@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Sep 7, 2021
@tschatzl
Contributor Author

tschatzl commented Sep 7, 2021

The recent change fixes the remaining issue mentioned earlier by introducing a GC specific closure that makes the ReferenceProcessor::enqueue method configurable by GC (type).

This fixes the issue (note that the change introduced in JDK-8270842 introduced the same issue, just much less common I believe).
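One way to picture a GC-configurable enqueue is a small strategy interface that the reference processor calls whenever it writes a discovered field. The sketch below is an illustration under that assumption; `DiscoveredFieldEnqueuer`, `BarrierEnqueuer`, `GcLocalEnqueuer` and `ReferenceProcessorSketch` are hypothetical names, and the write barrier is modeled by a plain counter.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for a java.lang.ref.Reference with its discovered field.
struct Ref {
  Ref* discovered = nullptr;
};

// Hypothetical hook: each GC decides how a store to the discovered field is recorded.
class DiscoveredFieldEnqueuer {
public:
  virtual ~DiscoveredFieldEnqueuer() = default;
  virtual void enqueue(Ref* ref, Ref* value) = 0;
};

// Default behaviour: store, then run the regular write barrier (modeled by a counter).
class BarrierEnqueuer : public DiscoveredFieldEnqueuer {
public:
  int barrier_calls = 0;
  void enqueue(Ref* ref, Ref* value) override {
    ref->discovered = value;
    ++barrier_calls;
  }
};

// G1-like behaviour: store, then remember the updated reference in a GC-local queue.
class GcLocalEnqueuer : public DiscoveredFieldEnqueuer {
  std::vector<Ref*>* _local_queue;

public:
  explicit GcLocalEnqueuer(std::vector<Ref*>* queue) : _local_queue(queue) {}
  void enqueue(Ref* ref, Ref* value) override {
    ref->discovered = value;
    _local_queue->push_back(ref);
  }
};

// The processor only knows the interface, not which GC it is running under.
class ReferenceProcessorSketch {
  DiscoveredFieldEnqueuer* _enqueuer;

public:
  explicit ReferenceProcessorSketch(DiscoveredFieldEnqueuer* enqueuer) : _enqueuer(enqueuer) {}
  void link_discovered(Ref* ref, Ref* next) { _enqueuer->enqueue(ref, next); }
};
```

With such a hook, a collector that must route these updates into its own queue can do so without the shared reference processing code knowing about it.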

Passes tier1-5; a closed test that previously failed around 3% of the time now always passes with these changes.

From a performance POV, I tested and analyzed reference processing performance on G1. So far there are no particular regressions, either in our test suite or in a more detailed look at discovery (i.e. the time spent per discovery/enqueue seems to be the same as before). Currently looking into Parallel GC differences.

Contributor

@kstefanj kstefanj left a comment

Sorry for not getting to this one until now, but took a first quick look at this.

Apart from the small comment below, I think we might want to split this into two separate PRs. One to add the new reference processing abstraction with the general use and then do the specific G1 changes as another PR. What do you think about that?

Comment on lines 111 to 116
// References to the current collection set are references to objects that failed
// evacuation. Currently these regions are always relabelled as old without
// remembered sets, so skip them.
if (!dest_attr.is_in_cset()) {
  enqueue_card_if_tracked(dest_attr, p, obj);
}
Contributor

If I understand this correctly, the only time dest_attr.is_in_cset() == true is when obj is in a region that failed evacuation, right? To make this even more obvious could we add an else-statement with an assert that this is the case?

Contributor Author

This is unfortunately not possible due to races (I tried :)) - one thread may set the flag while another one has in the meantime already failed evacuating another object in the region, with the evacuation-failed flag not yet visible to that thread.

@tschatzl
Contributor Author

> Sorry for not getting to this one until now, but took a first quick look at this.
>
> Apart from the small comment below, I think we might want to split this into two separate PRs. One to add the new reference processing abstraction with the general use and then do the specific G1 changes as another PR. What do you think about that?

I will first revert the faulty JDK-8270842, then add the API, then add the G1 specific code if you prefer.

@kstefanj
Contributor

> I will first revert the faulty JDK-8270842, then add the API, then add the G1 specific code if you prefer.

Sounds good to me.

@tschatzl
Contributor Author

Already pushed PR for backing out the old change (PR#5600) - thanks!

PRs for the preparatory changes are out: PR#5607 - some refactoring of G1ParScanThreadState, PR#5605 - some renaming of a class done here, PR#5603 - factoring out the enqueue call.

Thanks.

@tschatzl tschatzl force-pushed the submit/evac-failure-no-scan-during-remove-self-forwards branch from c708d5c to cefe7fd Compare September 22, 2021 12:40
@tschatzl
Contributor Author

Please hold off reviewing, it looks like I messed something up in the merge.

Contributor

@kstefanj kstefanj left a comment

Looks good.

Still, I think we should maybe just call it Survivor instead of NewSurvivor. I see how it could lead to some confusion, but the comments could be expanded a bit to explain that these refer to the newly allocated survivors.

But I'm good with the change as is, so if you prefer this, let's go with this.

tschatzl and others added 2 commits September 23, 2021 12:34
@tschatzl
Contributor Author

Not sure about NewSurvivor and Survivor as well. Unless somebody has a strong opinion about either I will push this as is later - you are right that I have been worried about confusion. The enum value can be renamed later too.

@tschatzl
Contributor Author

tschatzl commented Sep 24, 2021

@albertnetymk made me aware that we can check that is_in_cset() is only true for evac-failed regions by using the fact that objects in those regions are forwarded to themselves (their forwardee is equal to the object). Tier1-5 almost passed with this change now. Will integrate later unless something comes up.
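A minimal sketch of such a check, assuming a simplified object model in which every object carries an explicit forwardee pointer: `Obj`, `is_self_forwarded` and `cset_target_is_evac_failed` are hypothetical names for this example and do not reflect how HotSpot encodes forwarding in the mark word.

```cpp
#include <cassert>

// Hypothetical object with an explicit forwarding pointer.
struct Obj {
  Obj* forwardee = nullptr;  // nullptr: not forwarded
};

// An object that failed evacuation is forwarded to itself.
inline bool is_self_forwarded(const Obj* o) { return o->forwardee == o; }

// Condition usable in an assert: a reference into the collection set is only
// acceptable if its target failed evacuation, i.e. is self-forwarded.
inline bool cset_target_is_evac_failed(const Obj* o, bool in_cset) {
  return !in_cset || is_self_forwarded(o);
}
```

Because the check reads only the object itself, it does not depend on when a per-region evacuation-failed flag becomes visible to the checking thread.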

@tschatzl
Contributor Author

Thanks @albertnetymk @kstefanj for your reviews
/integrate

@openjdk

openjdk bot commented Sep 24, 2021

Going to push as commit 5a12af7.
Since your change was applied there have been 36 commits pushed to the master branch:

  • db23ecd: 8274191: Improve g1 evacuation failure injector performance
  • d91e227: 8238274: (sctp) JDK-7118373 is not fixed for SctpChannel
  • 971aa35: 8274083: Update testing docs to mention tiered testing
  • 1d44014: 8273034: Make javadoc navigation collapsible on small displays
  • bb74ae8: 8274171: java/nio/file/Files/probeContentType/Basic.java failed on "Content type" mismatches
  • 56b8b35: 8273261: Replace 'while' cycles with iterator with enhanced-for in java.base
  • 0aa63fe: 8274216: ProblemList 2 serviceability/dcmd/gc tests with ZGC on linux-all and windows-all
  • 5ffbe75: 8274195: Doc cleanup in java.nio.file
  • 1fdc656: 8274175: (fc) java/nio/channels/FileChannel/Transfer2GPlus.java still failed in timeout
  • 3b1b8fc: 8269850: Most JDK releases report macOS version 12 as 10.16 instead of 12.0
  • ... and 26 more: https://git.openjdk.java.net/jdk/compare/aefd4ac4a336f00c067bcb91b95472ccc9a6bf83...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 24, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 24, 2021
@openjdk

openjdk bot commented Sep 24, 2021

@tschatzl Pushed as commit 5a12af7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the submit/evac-failure-no-scan-during-remove-self-forwards branch September 24, 2021 12:09
@tschatzl tschatzl restored the submit/evac-failure-no-scan-during-remove-self-forwards branch September 28, 2021 11:27
@tschatzl tschatzl deleted the submit/evac-failure-no-scan-during-remove-self-forwards branch October 7, 2021 09:22
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
3 participants