8256265: G1: Improve parallelism in regions that failed evacuation #7047
Conversation
👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request.
@Hamlin-Li The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
Just a recap of what the change adds:
Fwiw, I did some hacking, adding lots of statistics output to it, because I was a bit surprised by some of the numbers I saw (available at https://github.com/tschatzl/jdk/tree/pull/7047-evac-failure-chunking).
Thanks Thomas for the summary and logging code.
Regarding the log messages: we might want to fix them up a bit; I did not look at our recent email discussion on what we came up with, and their level. Some other initial thoughts worth considering:

*) What I already noticed yesterday in some tests, and which can also be seen in your log snippet, is that "Remove self-forwards in chunks" takes a lot of time, unexpectedly much to me actually. I want to look further into this to understand the reason(s).

*) The other concern I have is whether we really need (or can avoid) the "Wait for Ready In Retained Regions" phase. It looks a bit unfortunate to actually have a busy-loop in there; this should definitely use proper synchronization, or something to wait on, if it is really needed. What of the retained-region preparation do we really need? On a first look, maybe just the BOT reset, which we might be able to put somewhere else (I may be totally wrong). Also, if so, the "Prepare Retained Regions" work should probably be split out to be started before all other tasks in this "Post Evacuate Cleanup 1" phase. I can see that from a timing perspective "Wait For Ready" is not a problem in any of my tests so far.

*) The "Prepare Retained Regions" phase stores the amount of live data into the …

*) Not too happy that the …

*) I was wondering whether it would be somewhat more efficient for the …
In fact, in the baseline version most of the time of "Post Evacuate Cleanup 1" is normally spent on "Restore Retained Regions". In the parallel version, the proportion of "Restore Retained Regions" within "Post Evacuate Cleanup 1" is reduced. E.g., the "Post Evacuate Cleanup 1" / "Restore Retained Regions" time comparison between baseline and parallel:

(timing table elided)

The difference between "Post Evacuate Cleanup 1" and "Restore Retained Regions" is the same in the baseline and parallel versions; it is spent on the other subphases of "Post Evacuate Cleanup 1".
Yes, currently it seems "Wait For Ready" does not cost much time, as "Prepared Retained Regions" is quick; I'm not sure if proper synchronization will help much more.
I will do this refactoring soon.
I will put it on the backlog to see if it can be simplified. [TODO]
I agree. This was only a general remark about its performance, not meant to belittle the usefulness of this change and, in general, of all the changes in this series, which are quite substantial. 👍 I compared the throughput (bytes/ms) between …

Another (future) optimization that may be worthwhile here would be to gather some occupancy statistics of the chunks and switch between walking the bitmap and walking the objects; another one that might be "simpler" to implement (but probably fairly messy) is to simply check whether the object after the current one is also forwarded and, if so, not switch back to bitmap walking but immediately process that one as well. These are only ideas though.
The point of "proper synchronization" isn't that it's faster, but that it does not burn CPU cycles unnecessarily, leaving the CPU to the one thread that the others are waiting on to do its work. Even better would be if we could remove the dependencies between "Prepare Retained Regions" and the remaining phases, which only seem to be the BOT. One idea is that maybe all that prepare work could be placed where G1 adds the region to the list of retained regions. This does not work for the liveness count, obviously, but that can be recreated by the actual self-forwarding removal as suggested earlier 😸. Then none of that is required, which is even better.
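For illustration, a minimal self-contained sketch of the "proper synchronization" point (plain C++, not HotSpot's actual primitives; all names are made up): workers block on a condition variable instead of spinning, so the CPU stays available to the preparing thread.

```cpp
#include <condition_variable>
#include <mutex>

// Illustrative model: one thread prepares a retained region, the
// others wait for it without burning CPU cycles in a busy-loop.
struct RegionReady {
  std::mutex mu;
  std::condition_variable cv;
  bool ready = false;

  void mark_ready() {            // called by the preparing thread
    { std::lock_guard<std::mutex> l(mu); ready = true; }
    cv.notify_all();
  }
  void wait_until_ready() {      // called by workers; blocks, no spin
    std::unique_lock<std::mutex> l(mu);
    cv.wait(l, [this] { return ready; });
  }
};
```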
Thanks!
Not necessarily simplified: one option is to make that work explicit (we tend to try not to do too much work in constructors, but maybe it just fits here); another is to pre-calculate some of these values during evacuation failure somehow. Given that this phase currently takes almost no time, we can also postpone that optimization if it seems to be too much work. Thanks for your hard work,
Yes, we have a similar task on our backlog, to fall back to walking the objects if the statistics tell us so.
I'm not sure how much this will help. Currently the code looks like the sketch below: if the next object is also marked, the closure will be applied to it in the next loop iteration, so it should have the same cache-hit behavior as the way you suggested above; the difference is the invocation overhead of apply(current). But I will put it on the backlog too. [TODO]
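A minimal self-contained model of the loop being described (illustrative names and types, not the actual HotSpot code): walk the mark bitmap over a chunk and apply the closure to each marked object; a marked neighbor is picked up by the very next iteration anyway.

```cpp
#include <cstddef>
#include <vector>

// Illustrative model of the bitmap-walking loop under discussion.
// 'marked' stands in for the mark bitmap; 'apply' for the closure that
// removes one self-forward pointer and returns the object's word size.
size_t remove_self_forwards(const std::vector<bool>& marked,
                            size_t chunk_start, size_t chunk_end,
                            size_t (*apply)(size_t /*obj_index*/)) {
  size_t live_words = 0;
  size_t cur = chunk_start;
  while (cur < chunk_end) {
    if (marked[cur]) {
      // If the next object is also marked, the next iteration finds it
      // immediately; the extra cost is only the apply() call overhead.
      size_t size = apply(cur);
      live_words += size;
      cur += size;
    } else {
      ++cur;  // scan forward to the next marked bit
    }
  }
  return live_words;
}
```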
I have just deleted the code related to "Prepared Retained Regions" and "Wait For Ready", and put the logic in G1EvacFailureRegions::record(...), SampleCollectionSetCandidatesTask, and VerifyAfterSelfForwardingPtrRemovalTask.
This one is also done.
OK, let's get back to this if it starts to occupy much time in the phase.
Thanks a lot for the detailed discussion and valuable suggestions, it helps a lot :)
…et verification crash
@Hamlin-Li this pull request can not be integrated into master due to merge conflicts. To resolve these conflicts and update this pull request, run the following commands in the local repository for your personal fork:
git checkout parallelize-evac-failure-in-bm
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
Hi Thomas, my test (with the latest implementation) shows that when the number of evacuation-failure regions is less than the number of parallel GC threads, the change brings a stable benefit in the "Post Evacuate Cleanup 1" phase; but when the number of evacuation-failure regions is larger than the number of parallel GC threads, the benefit is not stable and can even bring some regression in that phase. A simple heuristic is to switch to the original implementation, i.e. parallelize only at the region level, when the number of evacuation-failure regions is detected to be larger than the number of parallel GC threads. The advantage is that this avoids consuming extra CPU on unnecessary parallelism at the chunk level. The drawback of this solution is that it leaves two pieces of code: parallelism over regions, and parallelism over chunks. What do you think about it? Thanks
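A sketch of the proposed heuristic (hypothetical names; the actual decision point in the G1 code would differ): pick chunk-level parallelism only when there are fewer failed regions than GC workers.

```cpp
// Hypothetical sketch of the proposed switch between the two modes.
enum class EvacFailureParallelism { PerRegion, PerChunk };

EvacFailureParallelism choose_parallelism(unsigned num_failed_regions,
                                          unsigned num_gc_workers) {
  // With at least as many failed regions as workers, region-level
  // distribution already keeps every worker busy; chunking would only
  // add claiming overhead.
  return (num_failed_regions < num_gc_workers)
             ? EvacFailureParallelism::PerChunk
             : EvacFailureParallelism::PerRegion;
}
```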
Thanks for the clarification, I see the point.
Some minor comments/suggestions.
Thanks for the detailed reviews. :) I'm not sure if it's feasible to move …
I believe this is an unnecessary dependency.
So instead of calling mark_in_next_bitmap … The alternative would be to add a new "raw" mark method (the mark_in_next_bitmap_unconditionally seen in the diff below). Of course, the use of this "raw" mark method needs to be documented. Fwiw, in the prototype we have for JDK-8210708, which looks fairly good at this point, a similar change would be needed anyway. Thanks,
Seems there is another dependency: in …
After a quick look through the code, I think we could just call …
Thanks, I've moved the …
I will push the change through our testing again, since so much time has passed and so many changes have happened since last time.
@@ -97,7 +97,7 @@ class RemoveSelfForwardPtrObjClosure {
   // explicitly and all objects in the CSet are considered
   // (implicitly) live. So, we won't mark them explicitly and
   // we'll leave them over NTAMS.
-  _cm->mark_in_next_bitmap(_worker_id, obj);
+  _cm->mark_in_next_bitmap_unconditionally(_worker_id, obj);
I think this change, the introduction of this method, is unnecessary after moving the update of the nTAMS into the G1EvacFailureRegions::record method.
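A sketch of what moving that logic into record() might look like (a self-contained model with made-up names, not the actual G1EvacFailureRegions class): the first thread to record a failure in a region performs the one-time per-region preparation, removing the need for a separate prepare phase and the "wait for ready" loop.

```cpp
#include <atomic>
#include <vector>

// Illustrative model: the first recorder of an evacuation failure in a
// region does the one-time preparation inline.
class EvacFailureRegions {
  std::vector<std::atomic<bool>> _recorded;  // one flag per region
 public:
  explicit EvacFailureRegions(size_t num_regions)
      : _recorded(num_regions) {}

  void record(size_t region_idx) {
    bool expected = false;
    if (_recorded[region_idx].compare_exchange_strong(expected, true)) {
      // One-time preparation, e.g. resetting the BOT and updating
      // nTAMS, moved here from the removed phases.
      prepare_region(region_idx);
    }
  }
 private:
  void prepare_region(size_t /*region_idx*/) {
    // Stand-in for the real per-region preparation work.
  }
};
```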
Testing seems good.
Thanks Thomas for reviewing and testing. :)
This looks good to me, with some final cleanup comments. Apologies for taking a bit.
Thanks, it's fine :). I've just updated the patch as suggested.
Thanks for the detailed review, nice catch! I will update the patch as suggested.
As a follow-up to the explicit-loop topic raised before, here's a patch exploring that alternative: the last commit of https://github.com/openjdk/jdk/compare/master...albertnetymk:explicit-loop?expand=1. It contains mainly two changes: the explicit loop and the encapsulation of the chunking logic.
It can probably be polished further, but hopefully it illustrates the gist for now. What do you think?
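To illustrate the encapsulation idea, a minimal self-contained model of chunk claiming (made-up names, not the code in the linked branch): chunks are handed out from a shared atomic counter, and each worker processes its claimed chunks with an explicit loop rather than through a closure object.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative model: chunks of failed regions are claimed from a
// shared atomic counter.
class ChunkClaimer {
  std::atomic<uint32_t> _next{0};
  const uint32_t _num_chunks;
 public:
  explicit ChunkClaimer(uint32_t num_chunks) : _num_chunks(num_chunks) {}

  // Returns true and sets *idx while unclaimed chunks remain.
  bool claim(uint32_t* idx) {
    uint32_t i = _next.fetch_add(1, std::memory_order_relaxed);
    if (i >= _num_chunks) return false;
    *idx = i;
    return true;
  }
};

// Worker loop: claim and process chunks until none remain.
// process_chunk stands in for the explicit per-chunk loop.
void worker(ChunkClaimer& claimer, void (*process_chunk)(uint32_t)) {
  uint32_t idx;
  while (claimer.claim(&idx)) {
    process_chunk(idx);
  }
}
```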
Not sure; do you mind if I do this refactoring in another PR?
Since the chunking files/logic are added in this PR, I lean towards addressing them in the same PR, if you agree that the explicit-loop approach is cleaner/better.
@Hamlin-Li This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@Hamlin-Li This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.
Currently G1 assigns a thread per failed evacuated region. This can in effect serialize the whole process, as often (particularly with region pinning) there is only one region to fix up.
This patch tries to improve parallelism by walking over the failed regions in chunks.
The latest implementation scans regions in chunks to gain parallelism; it is based on JDK-8278917, which changes G1 to use the prev bitmap to mark evacuation-failure objects.
Here's a summary of the performance data based on the latest implementation: basically, it shows better and more stable performance than the baseline in the "Post Evacuate Cleanup 1" / "remove self forwardee" phase. (Although some regression is spotted when calculating the results as a geomean, because one pause time from the baseline is far smaller than the others.)
The performance benefit trend is:
(Other common evacuation-failure configurations are:
-XX:+G1EvacuationFailureALot -XX:G1EvacuationFailureALotInterval=0 -XX:G1EvacuationFailureALotCount=0)
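For context, a hedged example of how such a run might be launched; as far as I know, G1EvacuationFailureALot and its companions are develop flags, so this assumes a debug/fastdebug JDK build, and the benchmark class name is a placeholder:

```
$ java -XX:+UseG1GC -Xlog:gc+phases=debug \
       -XX:+G1EvacuationFailureALot \
       -XX:G1EvacuationFailureALotInterval=0 \
       -XX:G1EvacuationFailureALotCount=0 \
       MyBenchmark
```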
For more detailed performance data, please check the related bug.
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7047/head:pull/7047
$ git checkout pull/7047
Update a local copy of the PR:
$ git checkout pull/7047
$ git pull https://git.openjdk.java.net/jdk pull/7047/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7047
View PR using the GUI difftool:
$ git pr show -t 7047
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7047.diff