
8256265: G1: Improve parallelism in regions that failed evacuation #9980

Closed

Conversation

tschatzl
Contributor

@tschatzl tschatzl commented Aug 23, 2022

Hi all,

can I have reviews for this change that makes the evacuation failure remove-self-forwards phase split its work across threads within regions?

This work is a continuation of PR #7047 from @Hamlin-Li and the latest improvements from @albertnetymk in that thread. This is reflected in the first commit; on top of that I added a fair amount of changes to fix bugs and streamline the code.

Testing: tier1-8 with G1EvacuationFailureALot enabled, tier1-5 as is


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8256265: G1: Improve parallelism in regions that failed evacuation

Reviewers

Contributors

  • Hamlin Li <mli@openjdk.org>
  • Albert Mingkun Yang <ayang@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9980/head:pull/9980
$ git checkout pull/9980

Update a local copy of the PR:
$ git checkout pull/9980
$ git pull https://git.openjdk.org/jdk pull/9980/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9980

View PR using the GUI difftool:
$ git pr show -t 9980

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9980.diff

@tschatzl
Contributor Author

/contributor add mli
/contributor add @albertnetymk

@bridgekeeper

bridgekeeper bot commented Aug 23, 2022

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8256265 8256265: G1: Improve parallelism in regions that failed evacuation Aug 23, 2022
@openjdk

openjdk bot commented Aug 23, 2022

@tschatzl
Contributor Hamlin Li <mli@openjdk.org> successfully added.

@openjdk

openjdk bot commented Aug 23, 2022

@tschatzl
Contributor Albert Mingkun Yang <ayang@openjdk.org> successfully added.

@openjdk

openjdk bot commented Aug 23, 2022

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Aug 23, 2022
@tschatzl tschatzl marked this pull request as ready for review August 25, 2022 08:44
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 25, 2022
@mlbridge

mlbridge bot commented Aug 25, 2022

Webrevs

src/hotspot/share/gc/g1/g1EvacFailure.cpp (outdated, resolved)
src/hotspot/share/gc/g1/g1RemSet.cpp (resolved)
Comment on lines 2947 to 2948
void G1CollectedHeap::clear_bitmap_for_region(HeapRegion* hr, bool update_tams) {
concurrent_mark()->clear_bitmap_for_region(hr, update_tams);
Member

Instead of adding one more arg, I wonder if it's clearer to move the operation on tams to higher-level caller(s).

Contributor

I agree with Albert; I think moving hr->note_end_of_clearing(); out of G1ConcurrentMark::clear_bitmap_for_region() and handling it where needed is a cleaner approach.

Contributor Author

@tschatzl tschatzl Sep 13, 2022

This is a workaround for now as I intend to make G1ConcurrentMark owner of TAMSes (like TARSes) and then remove that clumsy workaround (i.e. HeapRegion::note_end_of_clearing will be empty/obsolete and that call removed).

Beginning with that change the TAMS of a region should be part of the bitmap of that region, so the additional parameter for clear_bitmap_for_region seems obvious to me. The reason why this parameter is actually needed becomes clear when listing the usage locations:

  • G1ClearBitmapHRClosure, which is a helper of `G1ConcurrentMark`; obviously that one may access TAMSes directly as well as the bitmap.
  • G1RemSet::G1ClearBitmapClosure, which clears bitmaps for to-be-evacuated old regions. Their TAMS obviously needs to be reset, as they will be evacuated and freed (in the regular case). This is actually also only required because of the bitmap reuse for evacuation failure.
  • ClearRetainedRegionBitmaps::ClearRetainedRegionBitmapsClosure is the odd one out here; as we reuse the bitmap for evacuation failure, we need to ask the owner of the bitmap/TAMS to please not do the usual thing.

The name update_tams for the parameter is probably bad, I'll rename it to clear_tams, and HeapRegion::note_end_of_marking to something similar while the discussion is ongoing.

Actually, thinking about Albert's comment below about ClearRetainedRegionBitmapsClosure::do_heap_region, I think even in that case we can unconditionally reset TAMS, allowing us to drop the second parameter of clear_bitmap_for_region.

if (_evac_failure_regions->record(r->hrm_index())) {
_g1h->hr_printer()->evac_failure(r);
}

// Mark the failing object in the marking bitmap and later use the bitmap to handle
// evacuation failure recovery.
_g1h->mark_evac_failure_object(_worker_id, old, word_sz);
Member

The same live-size is calculated again in process_chunk, isn't it?

Contributor Author

You mean the one (ultimately) recorded in the G1ConcurrentMark::_live_bytes array and the one for the statistics in G1RemoveSelfForwardsInChunksTask::process_chunk?
You are probably suggesting to merge them somehow so this work is done only once?

I do not think this is worth it, particularly because process_chunk only collects live bytes per thread without any synchronization (it does collect dead bytes with synchronization).
So you could use that value (the total dead bytes) or vice versa to update G1ConcurrentMark::_live_bytes at the end of collection, but

  • G1ConcurrentMark::_live_bytes (the sum) isn't actually gathered here, but again these are basically per-thread values (using the mark-stats cache), so on the one hand the update is inexpensive (nothing compared to the par_mark in mark_evac_failure_object), and on the other hand it can't be used for updating HeapRegion::_garbage_bytes directly. That would require flushing buffers somewhere before doing so.
  • the collected HeapRegion::_garbage_bytes could be used at the end to update G1ConcurrentMark::_live_bytes, but it does not seem to be worth the complexity adding another phase somewhere else. Also it would require another G1ConcurrentMark API entry just for that which I do not see worth doing.

I.e. I do not think it is worth optimizing something that trivial and adding the required code complexity. I would prefer to decrease code complexity.

Member

I was thinking sth like:

void HeapRegion::note_self_forwarding_removal_end_par(size_t garbage_bytes) {
  Atomic::store(&_live_bytes, region_size - _garbage_bytes, ...);
  Atomic::add(&_garbage_bytes, garbage_bytes, memory_order_relaxed);
}

Contributor Author

@tschatzl tschatzl Sep 13, 2022

Live-bytes are required to be updated for the marking, i.e. within the concurrent start pause. HeapRegion never had _live_bytes, only _marked_bytes which you probably meant.

_marked_bytes has been removed in JDK-8290357 because it's not required to actually store that value. It has only ever been used for some unnecessary reasons (see my comments in #9511 for the removal reasons).

At the end of marking the marked bytes (or actually HeapRegion::_garbage_bytes) are set by the Remark pause based on its live/marked_bytes. So we need to update the marking's _live_bytes here so that the values add up in the end (without using extra storage somewhere).

Actually, I now think that we do not strictly need to update HeapRegion::_garbage_bytes in evacuation failure handling; updating the value for the marking is sufficient, as we only use it (apart from some logging) for determining the region's gc efficiency. That one strictly uses the information from the latest marking (just completed), so that update isn't really necessary. I can remove this if wanted, but if so I'd prefer to do that separately.

Member

I see; I was assuming _live_bytes and _garbage_bytes live in the same class/context/entity. In the current implementation, _live_bytes lives in CM while _garbage_bytes lives in HeapRegion, which is quite confusing IMO.

Contributor Author

@tschatzl tschatzl Sep 13, 2022

There has never been a _live_bytes as implied by the name anywhere (there is a getter on HeapRegion that (still) calculates an upper bound on the number of live bytes in a region; it uses _garbage_bytes to determine that). The use of _live_bytes is maybe a misnomer in the marking code (but for marking only something below TAMS is relevant).

I am open to renaming it (separately).

In HeapRegion the name of this counter has always been _marked_bytes which meant "marked bytes below TAMS". TAMS is something that is only useful (and valid) during marking, a _marked_bytes on HeapRegion is imho very questionable (because tams == bottom most of the time). I.e. having nonzero marked bytes in [bottom, tams[ where tams == bottom is even more inconsistent.

Marked bytes for a region only has meaning with an associated TAMS. That's why it's gone, and it is unnecessary to store in the first place as the removal change showed.

Garbage bytes is the lower bound of dead objects between [bottom, top[, which has meaning regardless of other temporary marking data like TAMS.

This is one reason I am suggesting to (also) move TAMS out of HeapRegion; I only wanted to wait with that until after this change. Another reason is that TAMS is intrinsically linked to the bitmap (and marking), so it should be managed (like TARS) by G1ConcurrentMark, and hence cleared with the bitmap in one go.

That G1 messes with the bitmap for evacuation failure handling is from an abstraction POV not good, but overall acceptable (and adds these warts).

Maybe the clear_bitmap_for_region() method should (already) be called something like clear_marking_data_in_region()? Let me show you how this would look like.

Contributor

@kstefanj kstefanj left a comment

Looks good, this code has really improved over the last 6 months (since I looked at it before my leave), and I just have a few small comments/suggestions.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp (outdated, resolved)
src/hotspot/share/gc/g1/g1EvacFailure.cpp (outdated, resolved)
src/hotspot/share/gc/g1/g1EvacFailure.hpp (outdated, resolved)
src/hotspot/share/gc/g1/g1GCPhaseTimes.cpp (outdated, resolved)
G1AbstractSubTask(G1GCPhaseTimes::RestoreRetainedRegions),
_task(evac_failure_regions),
_evac_failure_regions(evac_failure_regions) { }

double worker_cost() const override {
assert(_evac_failure_regions->evacuation_failed(), "Should not call this if not executed");
return _evac_failure_regions->num_regions_failed_evacuation();
return _evac_failure_regions->num_regions_failed_evacuation() * G1PerRetainedRegionThreads;
Contributor

It feels a bit wrong to have a flag for this, or at least I'm not sure the number of threads should be tied only to the number of regions. Instead maybe we should look at the total size of those regions. Now with the chunking there will be a lot more work items than regions, so we should be able to look at size, which to me looks like a better metric.

Contributor Author

Something like G1CollectedHeap::get_chunks_per_region() / MAGIC_FACTOR I guess?

Contributor

Yes, that would be better, multiplied by the number of regions.

Contributor Author

Done in the latest change. However I kept a diagnostic flag to allow tuning of that. I'll do some more testing to see whether it makes a difference.

src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp (outdated, resolved)
src/hotspot/share/gc/g1/g1EvacFailureRegions.inline.hpp (outdated, resolved)
Member

@albertnetymk albertnetymk left a comment

Instead of calling note_end_of_clearing in G1ConcurrentMark::clear_bitmap_for_region, I wonder if moving it (note_end_of_clearing) to a higher-level caller (G1ClearBitmapClosure) is a bit nicer. This way, G1ConcurrentMark::clear_bitmap_for_region does exactly what its name suggests, nothing more and nothing less.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp (resolved)
src/hotspot/share/gc/g1/g1EvacFailure.hpp (outdated, resolved)
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp (outdated, resolved)
src/hotspot/share/gc/g1/g1YoungGCPostEvacuateTasks.cpp (outdated, resolved)
@tschatzl
Contributor Author

The last commit prospectively renames clear_bitmap_for_region().

@tschatzl
Contributor Author

Instead of calling note_end_of_clearing in G1ConcurrentMark::clear_bitmap_for_region, I wonder if moving it (note_end_of_clearing) to a higher-level caller (G1ClearBitmapClosure) is a bit nicer. This way, G1ConcurrentMark::clear_bitmap_for_region does exactly what its name suggests, nothing more and nothing less.

Here's a branch that explicitly sets TAMS: https://github.com/openjdk/jdk/compare/master...tschatzl:jdk:submit/8256265-parallel-evac-failure-alt?expand=1

The last commit (b77bb2f) implements the suggested change.
This omits that call in the second invocation of G1CollectedHeap::clear_bitmap_for_region() because TAMS must be bottom() at that point.

I do not think this difference is at the heart of this review, and it should not prevent progress :) Let's postpone the discussion of whether TAMS should be part of G1ConcurrentMark or not to another day. So I would be happy to use that going forward for this review.

@albertnetymk
Member

albertnetymk commented Sep 14, 2022

another that TAMS is intrinsically linked to the bitmap (and marking) so it should be managed (like TARS) by G1ConcurrentMark

I don't think tams and the bitmap are intrinsically linked; they are two independent entities. Only in the marking context, are they linked. tams is intrinsically linked to marking but not the bitmap.

That G1 messes with the bitmap for evacuation failure handling is from an abstraction POV not good, but overall acceptable (and adds these warts).

I don't view it as a leaky abstraction, since the bitmap is not really tied to marking in my mind. Recording evac-fail objs in the bitmap is completely isolated from the use of the bitmap during concurrent-marking.

This omits that call in the second invocation of G1CollectedHeap::clear_bitmap_for_region() because TAMS must be bottom() at that point.

That's one of the reasons why I suggested moving tams assignment out.

I do not think this difference is the heart of this review and should not be preventing progress :)

Agree; this is not a blocker.

@tschatzl
Contributor Author

tschatzl commented Sep 14, 2022

another that TAMS is intrinsically linked to the bitmap (and marking) so it should be managed (like TARS) by G1ConcurrentMark

I don't think tams and the bitmap are intrinsically linked; they are two independent entities. Only in the marking context, are they linked. tams is intrinsically linked to marking but not the bitmap.

I see the use of the (marking) bitmap during evacuation failure as an optimization of memory usage. Evacuation failure and marking are completely different concerns and in an ideal world should not share data structures.

That G1 messes with the bitmap for evacuation failure handling is from an abstraction POV not good, but overall acceptable (and adds these warts).

I don't view it as a leaky abstraction, since the bitmap is not really tied to marking in my mind. Recording evac-fail objs in the bitmap is completely isolated from the use of the bitmap during concurrent-marking.

We can probably only use the marking bitmap for evac-fail objs because we made the (prior, well-reasoned) design decision to completely separate mixed collection and marking. Marking during the mixed phase could be implemented (and iirc it actually was possible and working in JDK 7/8; at least I remember some bug which fixed an accidental enabling of that for a time...). In that case this reuse might have been a problem.

In the currently only case where concurrent marking overlaps the mixed phase (Concurrent Clear for Next Mark) it just happens to be okay to clear the bitmap twice, once at gc start and then in the concurrent phase.

The marking bitmap contents (per region) are also only valid between bottom and tams of a region; that the bitmap always covers the whole region is an implementation detail (and something I would actually like to experiment with getting rid of at some point: why keep around so much memory if it isn't used? There are often regions that do not contain live objects at all, so why have backing storage committed for their bitmaps in advance? Or humongous objects: is it efficient to commit the whole area they span for a single mark at the header?). There are also some marking algorithm improvements (with desirable properties) that add a few more per-region helper data structures that might need to be reset at that point.

Potentially having to enumerate all of these for clearing at all of the places where we now use clear_bitmap_for_region seems cumbersome and error-prone. Clearing additional data might be some extra work, but from whatever I've seen so far it isn't relevant.

This is why in my view the (per-region) bitmap and tams are both by definition marking related data structures, and should be treated as intrinsically linked. Imho evac-failure simply barges in and reuses them out of convenience (in some way) ;-)

This omits that call in the second invocation of G1CollectedHeap::clear_bitmap_for_region() because TAMS must be bottom() at that point.

That's one of the reasons why I suggested moving tams assignment out.

I have found that doing not just the absolutely necessary thing very often makes the code clearer, allowing the abstraction level of the code to be raised, i.e. "clear all marking data for region" vs. "clear bitmap for region". In this case the overhead is an assignment to a single variable in addition to clearing KBs of bitmap data...

Member

@albertnetymk albertnetymk left a comment

Just some minor comments.

Comment on lines 116 to 117
double chunks_per_thread = (double)G1CollectedHeap::get_chunks_per_region() / G1RestoreRetainedRegionChunksPerWorker;
return chunks_per_thread * _evac_failure_regions->num_regions_failed_evacuation();
Member

@albertnetymk albertnetymk Sep 14, 2022

The calculation is correct, but the var name is wrong.

chunks/region divided by chunks/worker gives workers/region; workers/region times the number of regions gives workers. This method returns the optimal number of workers.

(PS: I still think doing multiplication first is easier to reason, but that can be subjective.)


void G1ParRemoveSelfForwardPtrsTask::work(uint worker_id) {
RemoveSelfForwardPtrHRClosure rsfp_cl(worker_id, _evac_failure_regions);
void G1RemoveSelfForwardsTask::initialize() {
Member

Any particular reason why this is not part of the constructor?

@openjdk

openjdk bot commented Sep 14, 2022

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8256265: G1: Improve parallelism in regions that failed evacuation

Co-authored-by: Hamlin Li <mli@openjdk.org>
Co-authored-by: Albert Mingkun Yang <ayang@openjdk.org>
Reviewed-by: sjohanss, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 32 new commits pushed to the master branch:

  • 7f3250d: 8293787: Linux aarch64 build fails after 8292591
  • 2a38791: 8292755: Non-default method in interface leads to a stack overflow in JShell
  • 8351b30: 8293771: runtime/handshake/SystemMembarHandshakeTransitionTest.java fails if MEMBARRIER_CMD_QUERY is unsupported
  • 91f9c0d: 8293774: Improve TraceOptoParse to dump the bytecode name
  • 1169a15: 8291657: Javac assertion when compiling a method call with switch expression as argument
  • 2baf251: 8293654: Improve SharedRuntime handling of continuation helper out-arguments
  • 60f59a4: 8293660: Fix frame::sender_for_compiled_frame frame size assert
  • b3461c1: 8293680: PPC64BE build failure after JDK-8293344
  • 7e02039: 8293647: Avoid unnecessary boxing in jdk.hotspot.agent
  • 9039022: 8287394: AArch64: Remove cbuf parameter from far_call/far_jump/trampoline_call
  • ... and 22 more: https://git.openjdk.org/jdk/compare/37df5f56259429482cfdbe38e8b6256f1efaf9e8...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 14, 2022
Contributor

@kstefanj kstefanj left a comment

Looks good.

@tschatzl
Contributor Author

Thanks for your reviews. Did a few tier1-4 runs that showed no issues.

/integrate

@openjdk

openjdk bot commented Sep 15, 2022

Going to push as commit 15cb1fb.
Since your change was applied there have been 42 commits pushed to the master branch:

  • b31a03c: 8293695: Implement isInfinite intrinsic for RISC-V
  • 8f3bbe9: 8293472: Incorrect container resource limit detection if manual cgroup fs mounts present
  • 1caba0f: 8292948: JEditorPane ignores font-size styles in external linked css-file
  • eeb625e: 8290169: adlc: Improve child constraints for vector unary operations
  • 2057070: 8293815: P11PSSSignature.engineUpdate should not print debug messages during normal operation
  • 7376c55: 8293769: RISC-V: Add a second temporary register for BarrierSetAssembler::load_at
  • d191e47: 8293768: Add links to JLS 19 and 20 from SourceVersion enum constants
  • a75ddb8: 8293122: (fs) Use file cloning in macOS version of Files::copy method
  • 95c7c55: 8293402: hs-err file printer should reattempt stack trace printing if it fails
  • 211fab8: 8291669: [REDO] Fix array range check hoisting for some scaled loop iv
  • ... and 32 more: https://git.openjdk.org/jdk/compare/37df5f56259429482cfdbe38e8b6256f1efaf9e8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 15, 2022
@openjdk openjdk bot closed this Sep 15, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 15, 2022
@openjdk

openjdk bot commented Sep 15, 2022

@tschatzl Pushed as commit 15cb1fb.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the submit/8256265-parallel-evac-failure branch September 15, 2022 09:57
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
3 participants