Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8278917: Use Prev Bitmap for recording evac failed objects #6867

Closed
wants to merge 7 commits into from

Conversation

kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Dec 16, 2021

Please review this enhancement to improve evacuation failure handling in G1.

Summary
Prior to this change the evacuation failure handling recorded the failing objects during evacuation in a set backed by G1SegmentedArray. Later when processing the objects the set needed to be sorted to allow for efficient zapping of the areas between the live objects. During this phase we also mark the live objects in the prev mark bitmap to allow for correct verification.

This change optimizes away sorting part by directly marking failing objects in the mark bitmap. The marks are then used to determine which objects need to be processed (and what areas should be zapped) when cleaning up after the evacuation phase. For this to work, there can't be any stale marks in the bitmap. This require us to do some additional clearing that was not needed before. Both at the end of a Full GC and during preparation for mixed collections.

This additional clearing does cost a bit, but are done in parallel and the benefit outweighs the cost. For the Full GC the clearing of the bitmap is moved into the compaction closure. This is something that was thought of before and the closure already have the bitmap as a member. This way the Full GC regression becomes minimal.

Testing

  • Mach5 with and without additional evacuation failures injected
  • Local testing to verify the evacuation failure processing is improved
  • Performance testing not generating evacuation failures show no regression
  • Full GC tests show very limited regressions due to additional clearing in compaction phase

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8278917: Use Prev Bitmap for recording evac failed objects

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6867/head:pull/6867
$ git checkout pull/6867

Update a local copy of the PR:
$ git checkout pull/6867
$ git pull https://git.openjdk.java.net/jdk pull/6867/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6867

View PR using the GUI difftool:
$ git pr show -t 6867

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6867.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 16, 2021

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Dec 16, 2021

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Dec 16, 2021
@kstefanj kstefanj changed the title 8278917: Improve evacuation failure handling 8278917: Use Prev Bitmap for recording evac failed objects Dec 16, 2021
@kstefanj kstefanj marked this pull request as ready for review December 17, 2021 13:00
@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 17, 2021
@mlbridge
Copy link

mlbridge bot commented Dec 17, 2021

Webrevs

Copy link
Member

@albertnetymk albertnetymk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments/suggestions.


// All objects that failed evacuation has been marked in the prev bitmap.
// Use the bitmap to apply the above closure to all failing objects.
hr->apply_to_marked_objects(const_cast<G1CMBitMap*>(_g1h->concurrent_mark()->prev_mark_bitmap()), &rspc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe placing the bitmap on its own line, sth like bitmap = const_cast<G1CMBitMap*>(_g1h->concurrent_mark()->prev_mark_bitmap())?

@@ -191,6 +190,7 @@ class RemoveSelfForwardPtrHRClosure: public HeapRegionClosure {
hr->rem_set()->clear_locked(true);

hr->note_self_forwarding_removal_end(live_bytes);
_g1h->verifier()->check_bitmaps("Self-Forwarding Ptr Removal", hr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you are editing this method, why if (_evac_failure_regions->contains(hr->hrm_index()))? I thought that's the precondition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not check the call-chain, but I agree this should be an assert. Probably a left-over from the past when all heap regions were iterated.

Will fix.

// The compaction closure not only copies the object to the new
// location, but also clears the bitmap for it. This is needed
// for bitmap verification as well as for being able to use the
// prev_bitmap for evacuation failures.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of clearing the bit for each obj individually, I wonder if clearing the bitmap for the whole region will be faster. IOW, just call collector()->mark_bitmap()->clear_region(hr); unconditionally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this first, just removing the G1VerifyBitmaps condition. But this will cause a notable regression for regions with very few marked objects. I'm not certain that regression is to bad, but I did not see anything suggesting that clearing the whole bitmap afterwards would be quicker in any other scenario either. So went with the bit-by-bit approach.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume mem write is more expensive (or at least not more efficient) than mem read, so the performance changes make sense to me.

And a comment about the observation of the regression will be helpful in the future for others who think this might be a optimization point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few lines about this.

// Young regions should always have cleared bitmaps, so only clear old.
if (hr->is_old()) {
_g1h->clear_region_bitmap(hr);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to assert "Young regions should always have cleared bitmaps"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly! Good idea, will fix.

@openjdk
Copy link

openjdk bot commented Dec 17, 2021

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8278917: Use Prev Bitmap for recording evac failed objects

Reviewed-by: ayang, mli, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 48 new commits pushed to the master branch:

  • ff5d417: 8278893: Parallel: Remove GCWorkerDelayMillis
  • 5179672: 8278953: Clarify Class.getDeclaredConstructor specification
  • 8549d8b: 8277100: Dynamic dump can inadvertently overwrite default CDS archive
  • 4c78c9a: 8270929: Obsolete the FilterSpuriousWakeups flag in JDK 19
  • 06206c7: 8278949: Cleanups for 8277850
  • 31fbb87: 6462028: MaskFormatter API documentation refers to getDisplayValue
  • 63e4303: 8278519: serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java failed "assert(handle != __null) failed: JNI handle should not be null"
  • 6f0e8da: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob
  • 3c10b5d: 8278104: C1 should support the compiler directive 'BreakAtExecute'
  • cc44e13: 8278623: compiler/vectorapi/reshape/TestVectorCastAVX512.java after JDK-8259610
  • ... and 38 more: https://git.openjdk.java.net/jdk/compare/4ba980ba439f94a6b5015e64382a6c308476d63f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 17, 2021
Copy link
Contributor Author

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Albert.

@@ -191,6 +190,7 @@ class RemoveSelfForwardPtrHRClosure: public HeapRegionClosure {
hr->rem_set()->clear_locked(true);

hr->note_self_forwarding_removal_end(live_bytes);
_g1h->verifier()->check_bitmaps("Self-Forwarding Ptr Removal", hr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not check the call-chain, but I agree this should be an assert. Probably a left-over from the past when all heap regions were iterated.

Will fix.

// The compaction closure not only copies the object to the new
// location, but also clears the bitmap for it. This is needed
// for bitmap verification as well as for being able to use the
// prev_bitmap for evacuation failures.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this first, just removing the G1VerifyBitmaps condition. But this will cause a notable regression for regions with very few marked objects. I'm not certain that regression is to bad, but I did not see anything suggesting that clearing the whole bitmap afterwards would be quicker in any other scenario either. So went with the bit-by-bit approach.

// Young regions should always have cleared bitmaps, so only clear old.
if (hr->is_old()) {
_g1h->clear_region_bitmap(hr);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly! Good idea, will fix.

Copy link

@Hamlin-Li Hamlin-Li left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Just some minor comments.

@@ -2957,14 +2957,18 @@ void G1CollectedHeap::record_obj_copy_mem_stats() {
create_g1_evac_summary(&_old_evac_stats));
}

void G1CollectedHeap::clear_region_bitmap(HeapRegion* hr) {
Copy link

@Hamlin-Li Hamlin-Li Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe currently a name like clear_region_in_prev_bitmap is more clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -1252,6 +1254,9 @@ class G1CollectedHeap : public CollectedHeap {
inline bool is_obj_dead_full(const oop obj, const HeapRegion* hr) const;
inline bool is_obj_dead_full(const oop obj) const;

// Mark the live object that failed evacuation in the correct bitmap(s).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are explicitly using prev bitmap, so maybe comment should say so too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, in my first version I also marked in the next bitmap here if needed, but decided to leave it in the failure handling. Will update.

_cm->mark_in_prev_bitmap(obj);
}

// Zapping clears the bitmap, make sure it didn't clear to much.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/to/too

// The compaction closure not only copies the object to the new
// location, but also clears the bitmap for it. This is needed
// for bitmap verification as well as for being able to use the
// prev_bitmap for evacuation failures.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume mem write is more expensive (or at least not more efficient) than mem read, so the performance changes make sense to me.

And a comment about the observation of the regression will be helpful in the future for others who think this might be a optimization point.

@@ -74,6 +80,7 @@ size_t G1FullGCCompactTask::G1CompactRegionClosure::apply(oop obj) {
cast_to_oop(destination)->init_mark();
assert(cast_to_oop(destination)->klass() != NULL, "should have a class");

clear_bitmap(obj);
Copy link

@Hamlin-Li Hamlin-Li Dec 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a simple comment here will be helpful, although we already have comments where G1CompactRegionClosure is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Hamlin.

@@ -2957,14 +2957,18 @@ void G1CollectedHeap::record_obj_copy_mem_stats() {
create_g1_evac_summary(&_old_evac_stats));
}

void G1CollectedHeap::clear_region_bitmap(HeapRegion* hr) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -1252,6 +1254,9 @@ class G1CollectedHeap : public CollectedHeap {
inline bool is_obj_dead_full(const oop obj, const HeapRegion* hr) const;
inline bool is_obj_dead_full(const oop obj) const;

// Mark the live object that failed evacuation in the correct bitmap(s).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, in my first version I also marked in the next bitmap here if needed, but decided to leave it in the failure handling. Will update.

@@ -74,6 +80,7 @@ size_t G1FullGCCompactTask::G1CompactRegionClosure::apply(oop obj) {
cast_to_oop(destination)->init_mark();
assert(cast_to_oop(destination)->klass() != NULL, "should have a class");

clear_bitmap(obj);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

// The compaction closure not only copies the object to the new
// location, but also clears the bitmap for it. This is needed
// for bitmap verification as well as for being able to use the
// prev_bitmap for evacuation failures.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few lines about this.

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, some bikeshedding :p

// The compaction closure not only copies the object to the new
// location, but also clears the bitmap for it. This is needed
// for bitmap verification as well as for being able to use the
// prev_bitmap for evacuation failures. Testing showed that it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... for evacuation failures in the next young collections.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -50,7 +50,7 @@ class G1FullGCCompactTask : public G1FullGCTask {

class G1CompactRegionClosure : public StackObj {
G1CMBitMap* _bitmap;

void clear_bitmap(oop object);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a name like par_clear_in_prev_bitmap like other single bit operations for the bitmap (drop the par if you do not like it) - we do not clear the entire bitmap as the current name implies to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll update the name, but will leave the "par"-part out as this is not done with a par-operation. Since the whole region is handled by a single worker the clearing is done with a _bm.clear_bit() and I can't come up with a case where this would be a problem. The regions and bitmap should be aligned so that there are no words in the bitmap spanning two regions, right?

// of these objs later in *remove self forward* phase of post evacuation.
r->record_evac_failure_obj(old);

// Objects failing evacuation will turn in to old objects since the regions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/in to/into

r->record_evac_failure_obj(old);

// Objects failing evacuation will turn in to old objects since the regions
// is relabeled as such. We mark the failing objects in the prev bitmap and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is/are

@@ -1259,6 +1259,46 @@ class G1MergeHeapRootsTask : public WorkerTask {
G1MergeCardSetStats stats() const { return _stats; }
};

// Closure to clear the prev bitmap for any old region in the CSet. This is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if terms like CSet are not abbreviated in comments, i.e. s/CSet/collection set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

}
};

// Helper to allow multiple closure to be applied when
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/multiple/two :)

@@ -622,6 +622,8 @@ class G1CollectedHeap : public CollectedHeap {
// for all regions.
void verify_region_attr_remset_is_tracked() PRODUCT_RETURN;

void clear_region_in_prev_bitmap(HeapRegion* hr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A name like clear_prev_bitmap_for_region would be preferable to me, putting what we clear right next to the verb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this as well.

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the bikeshedding 😄

@@ -622,6 +622,8 @@ class G1CollectedHeap : public CollectedHeap {
// for all regions.
void verify_region_attr_remset_is_tracked() PRODUCT_RETURN;

void clear_region_in_prev_bitmap(HeapRegion* hr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this as well.

// The compaction closure not only copies the object to the new
// location, but also clears the bitmap for it. This is needed
// for bitmap verification as well as for being able to use the
// prev_bitmap for evacuation failures. Testing showed that it
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

@@ -50,7 +50,7 @@ class G1FullGCCompactTask : public G1FullGCTask {

class G1CompactRegionClosure : public StackObj {
G1CMBitMap* _bitmap;

void clear_bitmap(oop object);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll update the name, but will leave the "par"-part out as this is not done with a par-operation. Since the whole region is handled by a single worker the clearing is done with a _bm.clear_bit() and I can't come up with a case where this would be a problem. The regions and bitmap should be aligned so that there are no words in the bitmap spanning two regions, right?

@@ -1259,6 +1259,46 @@ class G1MergeHeapRootsTask : public WorkerTask {
G1MergeCardSetStats stats() const { return _stats; }
};

// Closure to clear the prev bitmap for any old region in the CSet. This is
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still good :)

@kstefanj
Copy link
Contributor Author

Thanks Albert, Hamlin and Thomas for your reviews. I will integrate this a little later when some final sanity testing is finished.

@kstefanj
Copy link
Contributor Author

/integrate

@openjdk
Copy link

openjdk bot commented Dec 21, 2021

Going to push as commit f4f2f32.
Since your change was applied there have been 49 commits pushed to the master branch:

  • 29bd736: 8277893: Arraycopy stress tests
  • ff5d417: 8278893: Parallel: Remove GCWorkerDelayMillis
  • 5179672: 8278953: Clarify Class.getDeclaredConstructor specification
  • 8549d8b: 8277100: Dynamic dump can inadvertently overwrite default CDS archive
  • 4c78c9a: 8270929: Obsolete the FilterSpuriousWakeups flag in JDK 19
  • 06206c7: 8278949: Cleanups for 8277850
  • 31fbb87: 6462028: MaskFormatter API documentation refers to getDisplayValue
  • 63e4303: 8278519: serviceability/jvmti/FieldAccessWatch/FieldAccessWatch.java failed "assert(handle != __null) failed: JNI handle should not be null"
  • 6f0e8da: 8278871: [JVMCI] assert((uint)reason < 2* _trap_hist_limit) failed: oob
  • 3c10b5d: 8278104: C1 should support the compiler directive 'BreakAtExecute'
  • ... and 39 more: https://git.openjdk.java.net/jdk/compare/4ba980ba439f94a6b5015e64382a6c308476d63f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 21, 2021
@openjdk openjdk bot closed this Dec 21, 2021
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 21, 2021
@openjdk openjdk bot removed the rfr Pull request is ready for review label Dec 21, 2021
@openjdk
Copy link

openjdk bot commented Dec 21, 2021

@kstefanj Pushed as commit f4f2f32.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
4 participants