JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio #2760

Hamlin-Li · 2021-02-27T06:30:55Z

Summary

Improve G1 Full GC by skipping compaction for regions with a high survival ratio.

Background

There are 4 steps in a G1 full GC:

mark live objects
prepare forwardee
adjust pointers
compact

When a full GC occurs, some regions may contain a very high percentage of live bytes. Compacting such regions is not efficient, and it is better to skip them: there is little space to reclaim but many objects to copy.

Description

We enhance the full GC implementation for the above situation with the following steps:

accumulate the live bytes of every heap region (hr) in the mark phase (already done by JDK-8263495);
in the prepare phase, skip adding regions with a high survival ratio to the compaction set, and mark them as pinned in _region_attr_table;
nothing special is done in the adjust phase: regions with a high survival ratio are skipped because of the pinned setting from the previous step;
nothing special is done in the compact phase: regions with a high survival ratio are skipped because they were not added to the compaction set in the prepare phase.
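The prepare-phase partitioning described in these steps can be sketched in C++ as follows. This is a simplified model, not the patch's code: RegionInfo, RegionAttr, PreparePlan and prepare() are hypothetical stand-ins for HeapRegion, the G1 full GC region attribute table and the compaction point, and region indices are assumed to be dense (0..n-1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types; not HotSpot's HeapRegion or region attr table.
enum class RegionAttr { Normal, Pinned };

struct RegionInfo {
  std::size_t index;       // assumed dense: 0 .. number of regions - 1
  std::size_t live_words;  // accumulated during the mark phase
};

struct PreparePlan {
  std::vector<std::size_t> compaction_set;  // regions whose objects may move
  std::vector<RegionAttr> attr_table;       // per-region attribute
};

// Prepare phase: low-liveness regions join the compaction set; the rest are
// tagged Pinned so that the later phases leave their objects in place.
PreparePlan prepare(const std::vector<RegionInfo>& regions,
                    std::size_t live_words_threshold) {
  PreparePlan plan;
  plan.attr_table.assign(regions.size(), RegionAttr::Normal);
  for (const RegionInfo& r : regions) {
    assert(r.index < plan.attr_table.size());
    if (r.live_words <= live_words_threshold) {
      plan.compaction_set.push_back(r.index);
    } else {
      plan.attr_table[r.index] = RegionAttr::Pinned;
    }
  }
  return plan;
}
```

Regions tagged Pinned never enter compaction_set, so a compaction loop over that set leaves them alone, while the adjust phase can consult attr_table and treat their objects as not moving.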

VM options related

MarkSweepDeadRatio: we reuse this existing VM option to define the high survival ratio threshold, (100 - MarkSweepDeadRatio)%, in G1.
- default value of MarkSweepDeadRatio: 5
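As a worked example of the threshold arithmetic, here is a minimal C++ sketch. It assumes the threshold is computed in words with integer arithmetic; the function names are hypothetical, not the patch's. With a 1 MB region and 8-byte heap words (131072 words) and the default MarkSweepDeadRatio of 5, a region is compacted only if it holds at most 124518 live words.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the largest number of live words a region may hold and
// still be compacted, i.e. (100 - MarkSweepDeadRatio)% of the region size.
std::size_t live_words_threshold(std::size_t region_words,
                                 unsigned mark_sweep_dead_ratio) {
  assert(mark_sweep_dead_ratio <= 100);
  return region_words * (100 - mark_sweep_dead_ratio) / 100;
}

// A region is compacted if its live words do not exceed the threshold;
// otherwise compaction of that region is skipped.
bool should_compact(std::size_t live_words, std::size_t threshold) {
  return live_words <= threshold;
}
```

With MarkSweepDeadRatio=0 the threshold equals the region size, so every region passes the check and nothing is skipped.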

Test

specjbb2015: no regression
dacapo: (Attachment is the dacapo h2 full gc pause.)
- 95% of full gc pauses: 10%-19% improvement.
- 5% of full gc pauses: 1.2% improvement.
- 0.1% of full gc pauses: -6.16% improvement (i.e. a 6.16% regression).

$ java -Xmx1g -Xms1g -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio

Reviewers

Stefan Johansson (@kstefanj - Reviewer) ⚠️ Review applies to bea4567
Albert Mingkun Yang (@albertnetymk - Committer)

Contributors

Shoubing Ma <mashoubing1@huawei.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/2760/head:pull/2760
$ git checkout pull/2760

Update a local copy of the PR:
$ git checkout pull/2760
$ git pull https://git.openjdk.java.net/jdk pull/2760/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 2760

View PR using the GUI difftool:
$ git pr show -t 2760

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/2760.diff


bridgekeeper · 2021-02-27T06:31:17Z

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2021-02-27T06:32:29Z

@Hamlin-Li The following label will be automatically applied to this pull request:

hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2021-02-27T06:36:01Z

Webrevs

13: Full - Incremental (5cca26b5)
12: Full - Incremental (63ab0f9c)
11: Full - Incremental (1f4ead8)
10: Full (bea4567)
09: Full - Incremental (a170e2f)
08: Full (7743249)
07: Full - Incremental (a28a93f)
06: Full - Incremental (e267bc3)
05: Full - Incremental (e798c78)
04: Full - Incremental (d7cf5fc859e5a8e53b0b50b1fddcc54255e30a17)
03: Full - Incremental (9956ab844755bb28ea9e93bd1f456e7d73779066)
02: Full - Incremental (c63a61406b6d9accfd58b40b247439abefe640ca)
01: Full - Incremental (74dfb000e07a918d7f3a05a1780dee09297d3207)
00: Full (4be6c18)

Hamlin-Li · 2021-02-27T07:08:43Z

/contributor add Shoubing Ma mashoubing1@huawei.com

openjdk · 2021-02-27T07:08:50Z

@Hamlin-Li
Contributor Shoubing Ma <mashoubing1@huawei.com> successfully added.

kstefanj · 2021-03-02T15:57:43Z

Hi Hamlin,

First of all, thanks for contributing.

Before doing a more in depth review of this change I have a few questions/suggestions:

This feature is quite similar to the "dead wood" feature already present for Serial/Parallel, and I agree with what's written in the JBS issue about trying to reuse the already present options. This way the feature would be enabled by default as well (since MarkSweepDeadRatio has a default value of 5), and I think that makes sense to make sure it gets proper testing. We can of course do something similar to what Parallel does and tweak the value for G1, but I think it should be on by default.
G1 does liveness accounting during concurrent mark as well and there is already code doing more or less the same thing as the new G1FullGCMarkRegionCache class. Have you looked at G1RegionMarkStatsCache and is there any reason we can't reuse this?
Benchmarking the performance of the Full GC is often a bit problematic, because G1 actively tries to avoid Full GCs. When I implemented the parallel Full GC I wrote some small benchmarks: simple use cases that I wanted to make sure the Full GC handled well. I dug those up and took them for a spin with your changes. The results are more or less as expected. For the use cases with "full" regions we see a clear improvement, and for almost all the other use cases the results are in line with the baseline. But one test showed a quite significant regression in the marking times. The test looks like this:

public class SystemGCLargeArray {
  public static Object[] holder;
  public static void main(String args[]) {
    holder = new Object[128 * 1024 *1024];
    System.gc();
  }
}

So we have a large array that the workers will scan for references to other objects. The results on my workstation look like this:

Baseline: GC(0) Phase 1: Mark live objects 59,039ms
8262068:  GC(0) Phase 1: Mark live objects 136,231ms (G1SkipCompactionLiveBytesLowerThreshold=100)
8262068:  GC(0) Phase 1: Mark live objects 137,956ms (G1SkipCompactionLiveBytesLowerThreshold=95)

The results are quite surprising, because in the other benchmarks there is not a big difference in marking times. But looking a bit at the code and at how the inlining decisions are made for G1FullGCMarker::mark_and_push(T* p), it looks like the function grows a bit too much, which prevents some important inlining. To avoid this I moved the accounting to the end of mark_object() instead, and this removed more or less the whole regression. Getting the compiler to inline exactly the right things is hard, but this is something to keep in mind when getting strange regressions in the GC.
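The direction suggested here (count live words per region only once an object has been marked, batching the counts in a per-worker cache in the spirit of G1RegionMarkStatsCache) can be illustrated with a small C++ model. This is an assumption-laden sketch, not HotSpot code: the cache is direct-mapped with a power-of-two slot count, evicts on conflict into shared atomic counters, and mark_object() uses a plain single-threaded vector<bool> instead of a concurrent mark bitmap. All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Shared per-region live-word counters (one per region, zero-initialized).
struct RegionLiveStats {
  std::vector<std::atomic<std::size_t>> live_words;
  explicit RegionLiveStats(std::size_t num_regions) : live_words(num_regions) {}
};

// Per-worker direct-mapped cache: each slot holds one region index and the
// live words counted for it since the slot was last evicted.
class RegionMarkStatsCache {
 public:
  RegionMarkStatsCache(RegionLiveStats* target, std::size_t num_slots)
      : _target(target), _slots(num_slots) {
    assert(num_slots > 0 && (num_slots & (num_slots - 1)) == 0);  // power of two
  }

  void add_live_words(std::size_t region_idx, std::size_t words) {
    Slot& s = _slots[region_idx & (_slots.size() - 1)];
    if (s.region_idx != region_idx) {
      evict(s);  // conflict: publish the previous region's count first
      s.region_idx = region_idx;
    }
    s.live_words += words;
  }

  // Publishes all remaining counts; call when the worker finishes marking.
  void flush() {
    for (Slot& s : _slots) evict(s);
  }

 private:
  static constexpr std::size_t kEmpty = static_cast<std::size_t>(-1);
  struct Slot {
    std::size_t region_idx = kEmpty;
    std::size_t live_words = 0;
  };

  void evict(Slot& s) {
    if (s.region_idx != kEmpty && s.live_words > 0) {
      _target->live_words[s.region_idx].fetch_add(s.live_words,
                                                  std::memory_order_relaxed);
    }
    s.region_idx = kEmpty;
    s.live_words = 0;
  }

  RegionLiveStats* _target;
  std::vector<Slot> _slots;
};

// Accounting happens at the end of mark_object(), and only when this call
// actually marked the object, keeping the caller's hot path small.
bool mark_object(std::vector<bool>& bitmap, std::size_t obj_word_idx,
                 std::size_t obj_words, std::size_t region_words,
                 RegionMarkStatsCache& cache) {
  if (bitmap[obj_word_idx]) return false;  // already marked
  bitmap[obj_word_idx] = true;
  cache.add_live_words(obj_word_idx / region_words, obj_words);
  return true;
}
```

The cache trades a little accuracy in timing (counts become visible only on eviction or flush) for far fewer atomic updates on the shared counters.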

So before looking closer at the code I would like to see if we can reuse G1RegionMarkStatsCache and change to use the existing flag. What do you think about that?

Thanks,
Stefan

Hamlin-Li · 2021-03-03T08:06:37Z

Hi Stefan, thanks a lot for the detailed review and benchmark!
Sure, I will update the patch as you suggested later.

…s; fix regression in Mark phase by inlining live words collection into mark_object()

Hamlin-Li · 2021-03-05T02:57:11Z

Hi Stefan,

There is a jdk/tier1 test failure on macOS; it passed on the other platforms: https://github.com/openjdk/jdk/pull/2760/checks?check_run_id=2033359984.
Rerun passed: https://github.com/Hamlin-Li/jdk/actions/runs/622899539

kstefanj · 2021-03-05T08:41:58Z

Even if the re-run passes this seems to be a real problem, probably just a bit intermittent. Looking at the test output you can see there is a segmentation fault in:
G1FullGCPrepareTask::G1CalculatePointersClosure::prepare_for_skipping_compaction(...)

Running the test with debug builds might make the issue reproduce on more platforms and give hints on what the problem is.

Hope this helps,
Stefan

Hamlin-Li · 2021-03-05T09:27:30Z

Thanks Stefan, we're investigating (currently we don't have a Mac environment).
BTW, until it can be reproduced locally, is there any way to get more information from the online test result? I can only find "View raw logs".

kstefanj · 2021-03-05T09:35:56Z

I don't know for sure. I tried downloading the "log archive", but it did not include the hs_err file, which would have been very useful. I can kick off an internal run to see if it reproduces here.

kstefanj · 2021-03-05T14:29:55Z

I managed to get a crash locally using a fastdebug build on Linux x64. I haven't had time to look at the error in detail, but it is the same crash. It happened in a different test, so just running a lot of testing should trigger it in your environment as well.

Hamlin-Li · 2021-03-06T10:02:35Z

Hi Stefan, Thanks a lot for the information, It's very helpful!

…ion when klass of dead objects is unloaded; other misc improvements.

Hamlin-Li · 2021-03-11T01:05:31Z

Hi Stefan,

It seems we have fixed the previous crashes.
Would you mind reviewing the fixes and the whole change set when you have time? Thanks.
Here is a summary of the previous 2 commits:

e267bc3:

fix the crash when getting the object size, which was caused by class unloading, by using "_bitmap->get_next_marked_addr(next_addr, limit);" instead of calling "obj->size();" directly.
some misc improvements when initializing G1RegionMarkStatsCache.
a28a93f:
fix the BOT crash by using cross_threshold.
fix an initialization issue of G1RegionMarkStatsCache, and an assert issue in the previous commit e267bc3.
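The idea behind the e267bc3 fix can be shown with a toy region walk in C++: an object's size is read only when the bitmap says the object is live, and dead ranges are skipped by searching the bitmap for the next marked address, so the header (and klass) of a dead object is never touched. MarkBitmap, SizeFn and collect_dead_ranges() are hypothetical; addresses are word offsets within the region.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy mark bitmap: one bit per heap word, set at the start of each live object.
struct MarkBitmap {
  std::vector<bool> bits;
  bool is_marked(std::size_t addr) const { return bits[addr]; }
  // First marked address in [addr, limit), or limit if there is none.
  std::size_t get_next_marked_addr(std::size_t addr, std::size_t limit) const {
    while (addr < limit && !bits[addr]) ++addr;
    return addr;
  }
};

// Hypothetical size oracle standing in for obj->size(); only ever called for
// marked objects, because a dead object's klass may already be unloaded.
using SizeFn = std::size_t (*)(std::size_t addr);

// Walks [bottom, top) and returns the dead gaps [start, end) that would be
// overwritten with filler objects in a region that is not compacted.
std::vector<std::pair<std::size_t, std::size_t>>
collect_dead_ranges(const MarkBitmap& bm, std::size_t bottom, std::size_t top,
                    SizeFn live_size) {
  std::vector<std::pair<std::size_t, std::size_t>> dead;
  std::size_t addr = bottom;
  while (addr < top) {
    if (bm.is_marked(addr)) {
      addr += live_size(addr);  // safe: the object is live
    } else {
      // Skip the dead range using the bitmap alone.
      std::size_t next = bm.get_next_marked_addr(addr, top);
      dead.emplace_back(addr, next);
      addr = next;
    }
  }
  assert(addr == top);
  return dead;
}
```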

tschatzl

The change should support MarkSweepAlwaysCompactCount (just adapt the threshold).

The "last ditch" collection should also fully compact, as well as probably a System.gc() call. I do not know the exact rules for the other collectors right now.
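One way to fold these two requirements into the compaction threshold is sketched below in C++. The policy is an assumption based on the comment above: every MarkSweepAlwaysCompactCount-th full GC, a last-ditch collection, and an explicit System.gc() all compact every region. The modulo check resembles the one Serial uses for its dead-space allowance, but FullGCRequest and compaction_threshold() are hypothetical names, not the patch's code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of why a full GC was requested.
struct FullGCRequest {
  bool last_ditch;      // final attempt before an OutOfMemoryError
  bool explicit_gc;     // e.g. triggered by System.gc()
  unsigned invocation;  // 1-based count of full GCs, including this one
};

// Returns the live-words threshold for this collection. Returning
// region_words means every region passes the check, i.e. nothing is skipped.
std::size_t compaction_threshold(const FullGCRequest& req,
                                 std::size_t region_words,
                                 unsigned dead_ratio,              // MarkSweepDeadRatio
                                 unsigned always_compact_count) {  // MarkSweepAlwaysCompactCount
  const bool periodic_full_compaction =
      always_compact_count > 0 && req.invocation % always_compact_count == 0;
  const bool maximum_compaction =
      req.last_ditch || req.explicit_gc || periodic_full_compaction;
  if (maximum_compaction) {
    return region_words;
  }
  assert(dead_ratio <= 100);
  return region_words * (100 - dead_ratio) / 100;
}
```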

Please update the summary with these (and the earlier mentioned) requirements and approach.

src/hotspot/share/gc/g1/g1FullGCMarker.cpp

src/hotspot/share/gc/g1/g1FullGCMarker.hpp

src/hotspot/share/gc/g1/g1FullCollector.hpp

tschatzl · 2021-03-11T15:15:06Z

src/hotspot/share/gc/g1/g1FullCollector.hpp

@@ -87,6 +90,10 @@ class G1FullCollector : StackObj {
  uint                     workers() { return _num_workers; }
  G1FullGCMarker*          marker(uint id) { return _markers[id]; }
  G1FullGCCompactionPoint* compaction_point(uint id) { return _compaction_points[id]; }
+  GrowableArray<HeapRegion*>* skipping_compaction_set(uint id) { return _skipping_compaction_sets[id]; }
+  size_t live_bytes_after_full_gc_mark(uint region_idx) {
+    return MarkSweepDeadRatio > 0 ? _live_stats[region_idx]._live_words * HeapWordSize : 0;


Not sure if the additional MarkSweepDeadRatio > 0 check is necessary or useful at all. In case MarkSweepDeadRatio is zero, _hr_live_bytes_threshold is larger than whatever is stored in here anyway.

Also, please drop the * HeapWordSize and compare to a _hr_live_word_threshold. This multiplication is completely unnecessary from what I can tell. (It does not really hurt either, but why would I want to multiply both sides of your comparison by the same value before the comparison?)

Also just liveness() (or name this and the equivalent in G1ConcurrentMark something like live_words) is sufficient.

tschatzl · 2021-03-11T15:16:59Z

src/hotspot/share/gc/g1/g1FullGCCompactTask.cpp

@@ -91,6 +99,15 @@ void G1FullGCCompactTask::work(uint worker_id) {
    compact_region(*it);
  }

+  if (MarkSweepDeadRatio > 0) {


This check is useless. If MarkSweepDeadRatio == 0, the skipping_compaction_queue is empty anyway (and the check whether the GrowableArray is empty is as fast). You could certainly assert that if MarkSweepDeadRatio == 0, then skipping_compaction_queue must be empty too.

src/hotspot/share/gc/g1/g1FullGCMarker.cpp

src/hotspot/share/gc/g1/g1FullGCMarker.hpp

src/hotspot/share/gc/g1/g1FullGCMarker.inline.hpp

tschatzl · 2021-03-11T15:21:11Z

src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp

-    prepare_for_compaction(hr);
+    assert(!hr->is_humongous(), "humongous objects not supported.");
+    size_t live_bytes = _collector->live_bytes_after_full_gc_mark(hr->hrm_index());
+    if(live_bytes <= _hr_live_bytes_threshold) {


Missing space after the if.

tschatzl · 2021-03-11T15:27:39Z

src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp

+void G1FullGCPrepareTask::G1CalculatePointersClosure::prepare_for_skipping_compaction(HeapRegion* hr) {
+  HeapRegion* current = hr;
+  HeapWord* limit = current->top();
+  HeapWord* next_addr = current->bottom();
+  HeapWord* live_end = current->bottom();
+  _skipping_compaction_set->append(current);
+  HeapWord* threshold = current->initialize_threshold();
+  HeapWord* pre_addr;
+
+  while (next_addr < limit) {
+    Prefetch::write(next_addr, PrefetchScanIntervalInBytes);
+    pre_addr = next_addr;
+
+    if (_bitmap->is_marked(next_addr)) {
+      oop obj = oop(next_addr);
+      size_t obj_size = obj->size();
+      // Object should not move but mark-word is used so it looks like the
+      // object is forwarded. Need to clear the mark and it's no problem
+      // since it will be restored by preserved marks. There is an exception
+      // with BiasedLocking, in this case forwardee() will return NULL
+      // even if the mark-word is used. This is no problem since
+      // forwardee() will return NULL in the compaction phase as well.
+      if (obj->forwardee() != NULL) {
+        obj->init_mark();
+      }
+
+      next_addr += obj_size;
+      // update live byte range end
+      live_end = next_addr;
+    } else {
+      next_addr = _bitmap->get_next_marked_addr(next_addr, limit);
+      assert(next_addr > live_end, "next_addr must be bigger than live_end");
+      assert(next_addr == limit || _bitmap->is_marked(next_addr), "next_addr is the limit or is marked");
+      // fill dummy object to replace dead range
+      Universe::heap()->fill_with_dummy_object(live_end, next_addr, true);
+    }
+
+    if (next_addr > threshold) {
+      threshold = current->cross_threshold(pre_addr, next_addr);
+    }
+  }
+  assert(next_addr == limit, "Should stop the scan at the limit.");
+}
+


I think all this code is not required, as it duplicates the existing code that handles pinned regions (added in https://bugs.openjdk.java.net/browse/JDK-8253600). Please have a look and see if that code can be reused/repurposed.

I mean, these "skipped regions" should be equivalent to "pinned" regions with respect to the required object handling. I admit I haven't actually looked at how to do this, but this seems awfully similar, the only difference being that skipped regions are determined only after marking, while pinned regions are sometimes determined before the (full) gc.

The other handling should be the same. So at worst temporarily marking these regions as "pinned" should be sufficient for this functionality to be able to reuse the mechanism.

After further investigation, I think so too.

mlbridge · 2021-03-12T08:15:07Z

Mailing list message from Thomas Schatzl on hotspot-gc-dev:

Hi,

On 12.03.21 06:13, Hamlin Li wrote:

On Thu, 11 Mar 2021 15:19:36 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

fix bot crash.

src/hotspot/share/gc/g1/g1FullGCMarker.hpp line 104:

102:
103: void flush_mark_region_cache() {
104: if (MarkSweepDeadRatio > 0) {

Drop this check to make the code more straightforward. Other code like in PR #2579 might find this information useful too. Maybe this could even be factored out in a separate CR.

Hi Thomas,
Do you mean to collect liveness info in the mark phase of the G1 full gc, even if MarkSweepDeadRatio == 0 (which means no regions are skipped during compaction)? If this is the request, I can file a new bug for this liveness collection in the G1 full gc.

Yes, please split out the liveness info gathering in the mark phase into
a separate CR. This is what PR #2579 needs.
StefanJ and me were discussing adding this already when implementing the
G1 parallel full gc, but due to lack of users we refrained from that..

For the jfr PR #2579, I think it can be addressed in a separate bug that depends on the one for liveness collection.
I will file/work these 2 new bugs (liveness collection when g1 full gc, jfr liveness event after full gc) if this is what you suggested.

PR #2579 is about the JFR liveness event. Jbachorik (the author of that
PR) could just reuse the information gathered in that new CR then.
Please coordinate with him if needed, but I do not think you
specifically need to worry about the JFR event, I assume that jbachorik
will likely be happy to be able to add the plumbing for the JFR event then.

I cc'ed him (using some email address from the jfr-dev mailing list, I
hope I got it right).

Thanks,
Thomas

mlbridge · 2021-03-12T08:45:06Z

Mailing list message from Hamlin on hotspot-gc-dev:

Thanks for the confirmation; I have created the issue:
https://bugs.openjdk.java.net/browse/JDK-8263495.

Got it. Just let me know if any assistance is needed from me.

Thanks,

- Hamlin


tschatzl · 2021-03-18T12:39:54Z

After a short look I think the only changes that are needed to get a region to be considered pinned in the g1 full gc is in G1FullCollector::update_attribute_table() to set these regions that are too full as "pinned" for that collection.

There are two places in G1FullCollector where that attribute table isn't used yet, but the code uses HeapRegion::is_pinned() directly (search for r->is_pinned() in the full gc files, I think both are located in g1FullGCCompactTask.cpp). These should be rerouted to that table I think.

After that change everything should basically be still working.

No guarantees though.

Hamlin-Li · 2021-03-18T14:22:49Z

Thanks for the suggestion, will try it.
For the "last-ditch" issue, we plan to reuse some of the flags/logic from soft-reference handling; we would like to hear your suggestions in advance, if that is convenient for you.

openjdk · 2021-03-26T20:18:59Z

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio

Co-authored-by: Shoubing Ma <mashoubing1@huawei.com>
Reviewed-by: sjohanss, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 215 new commits pushed to the master branch:

cb2806d: 8265018: [AIX] FileDispatcherImpl.c:31:10: fatal error: 'sys/mount.h' file not found
ecef1fc: 8264972: Unused TypeFunc declared in OptoRuntime
440c34a: 8264644: Add PrintClassLoaderDataGraphAtExit to print the detailed CLD graph
b1ebf82: 8264358: Don't create invalid oop in method handle tracing
627ad9f: 8262328: Templatize JVMFlag boilerplate access methods
c15680e: 8264868: Reduce inclusion of registerMap.hpp and register.hpp
5784f6b: 8264948: Check for TLS extensions total length
42f4d70: 8264649: runtime/InternalApi/ThreadCpuTimesDeadlock.java crash in fastdebug C2 with -XX:-UseTLAB
76bd313: 8264872: Dependencies: Migrate to PerfData counters
07c8ff4: 8264871: Dependencies: Miscellaneous cleanups in dependencies.cpp
... and 205 more: https://git.openjdk.java.net/jdk/compare/f69afba52735008613f0ede7d650372e95e9a6e0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Hamlin-Li · 2021-03-27T06:07:18Z

Thanks Stefan, have a great vacation!

tschatzl

I will run this through internal testing before approving.

tschatzl · 2021-03-29T14:50:25Z

src/hotspot/share/gc/g1/g1FullGCPrepareTask.cpp

+  if (live_words <= live_words_threshold) {
+    return true;
+  }
+  // High live ratio region will not be compacted.
+  return false;


return live_words <= live_words_threshold; should be sufficient here.

Hamlin-Li · 2021-03-30T02:50:25Z

Thanks Thomas, sure, I will hold off until your tests finish.
Meanwhile, we are also running performance tests to confirm that this final version still performs well.

tschatzl · 2021-03-30T08:42:48Z


gc/g1/TestEagerReclaimHumongousRegionsClearMarkBits.java fails every few runs with this change, with a fairly unusual error:

[26.103s][info][gc,start] GC(112) Pause Full (G1 Evacuation Pause)
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/g1BlockOffsetTable.cpp:358
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (.../src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp:358), pid=74345, tid=74370
#  guarantee(backskip <= max_backskip) failed: Going backwards beyond the start_card. start_card: 225280 current_card: 225281 backskip: 256

There is a strong likelihood that this is a pre-existing issue and not directly caused by this change. I will need to investigate this.

tschatzl · 2021-03-30T09:58:29Z

The reason for this crash is that if there is a young region that is not compacted (because it's mostly full), its BOT (block offset table) is not updated to the extent the verification expects it.

I.e. that verification expects the BOT for old gen regions is completely valid, from start to end of the region. For young regions this is not the case, their BOT is not updated at all (which is normal), just containing a marker indicating that there is no BOT; compaction would take care of this.

The verification does not respect this "after this point the BOT is invalid" marker. There is actually some code in heapRegion.cpp:724 that intentionally skips verification of young regions for this reason. Now that such regions might become old, this no longer works. I think the correct fix is to not try to verify beyond this marker, i.e. change the assignment to end_card in G1BlockOffsetTablePart::verify accordingly.

Another solution would be to make the region appear like filled with a single large allocation (TLAB).

This is purely to keep the verification happy. All other code correctly handles a partially valid BOT (i.e. valid up to and including that mentioned marker).
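A toy C++ model of the first option (do not verify beyond the marker) is sketched below. It assumes a simplified BOT that stores, per card, the offset of the object covering the card's first word; the real G1 block offset table uses a more compact encoding. ToyBOT and verify() are hypothetical names, and the first object is assumed to start at offset 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy BOT: for each card, the word offset of the object covering the card's
// first word. Only cards below next_offset_index hold valid entries.
struct ToyBOT {
  std::size_t card_words;
  std::vector<std::size_t> block_start;  // one entry per card
  std::size_t next_offset_index;         // first card whose entry is not valid
};

// obj_starts: sorted start offsets of all objects in the region.
// Checks only the valid prefix of the table; later entries are ignored.
bool verify(const ToyBOT& bot, const std::vector<std::size_t>& obj_starts) {
  const std::size_t end_card = bot.next_offset_index < bot.block_start.size()
                                   ? bot.next_offset_index
                                   : bot.block_start.size();
  for (std::size_t card = 0; card < end_card; ++card) {
    const std::size_t card_start = card * bot.card_words;
    // Expected entry: the last object start at or before the card's first word.
    std::size_t expected = 0;
    for (std::size_t s : obj_starts) {
      if (s <= card_start) expected = s; else break;
    }
    if (bot.block_start[card] != expected) return false;
  }
  return true;
}
```

A table whose tail has never been populated passes as long as next_offset_index stops before that tail, which is exactly the situation of a not-compacted young region.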

I will try out these options.

Hamlin-Li · 2021-03-30T12:43:31Z

Hi Thomas, thank you so much for helping investigate the issue; I will check it tomorrow too.

tschatzl · 2021-03-30T14:07:15Z

Here is a potential fix for this. This needs some discussion on whether an alternative would be better or not; other options could be:

just initialize a dummy BOT for not-compacted young region
exclude young regions from this dead-wood scheme
In any case this fix should probably go in separately.

Hamlin-Li · 2021-03-31T09:56:52Z

I think the first solution is better, for performance reasons:

either way, the BOT will be rebuilt for this young region (if the young region is compacted, the BOT is rebuilt through cross_threshold);
the first solution avoids copying this young region.

Hamlin-Li · 2021-03-31T09:59:51Z

Here is a potential fix for this.

Thanks for the fix.

In any case this fix should probably go in separately.

Do you mean we push this PR first, then discuss the fix in another thread? If so, I will file a bug to track the issue and start the discussion there.

tschatzl · 2021-04-06T14:01:15Z

This change should go in after PR#3356 which fixes this issue. I think it is a change that is worth pointing out and discussing separately - as you might see from the long description.

Hamlin-Li · 2021-04-07T01:14:35Z

I see, Thanks Thomas!

albertnetymk · 2021-04-07T09:31:23Z

src/hotspot/share/gc/g1/g1FullCollector.hpp

@@ -95,11 +98,13 @@ class G1FullCollector : StackObj {
  G1FullGCCompactionPoint* serial_compaction_point() { return &_serial_compaction_point; }
  G1CMBitMap*              mark_bitmap();
  ReferenceProcessor*      reference_processor();
+  size_t                   live_words(uint region_index) { return _live_stats[region_index]._live_words; }


Could you add a range-check assertion for region_index? The extra spaces can be removed; there is no need to align with the previous methods.

albertnetymk · 2021-04-07T09:33:48Z

src/hotspot/share/gc/g1/heapRegion.hpp

@@ -171,7 +171,7 @@ class HeapRegion : public CHeapObj<mtGC> {
  // Update heap region that has been compacted to be consistent after Full GC.
  void reset_compacted_after_full_gc();
  // Update pinned heap region (not compacted) to be consistent after Full GC.
-  void reset_pinned_after_full_gc();
+  void reset_not_compacted_after_full_gc();


Now that this version uses the existing "pinned" mechanism to skip high-live-ratio regions, this method can retain its original name/implementation, right?

That is true, full gc reuses the mechanism originally implemented for pinned regions. However "pinned" is something very specific in G1 context so I think it is better to use a different, more generic name. The regions that are not compacted (and not yet pinned) are not really temporarily pinned.

There has been an earlier discussion (not sure if here in this PR) to actually rename the use of "pinned" in G1 full gc to use "not compacted" too, resulting in a CR to rename this (in JDK-8264423).

However I was going to wait for this change to go in before sending out a PR.

Also this "not_compacted" name better matches the "compacted" method name above.

So I would prefer to keep this as is.

I see; agree.

albertnetymk · 2021-04-07T09:35:19Z

src/hotspot/share/gc/g1/g1FullCollector.cpp

-  if (hr->is_free()) {
+void G1FullCollector::update_attribute_table(HeapRegion* hr, bool force_pinned) {
+  if (force_pinned) {
+    _region_attr_table.set_pinned(hr->hrm_index());
    return;
  }
  if (hr->is_closed_archive()) {
    _region_attr_table.set_closed_archive(hr->hrm_index());
  } else if (hr->is_pinned()) {


Changing the condition to hr->is_pinned() || force_pinned is enough for this method, right?

Not exactly; please check the previous discussion at #2760 (comment).
I will merge the 2 conditions as you suggested.

tschatzl · 2021-04-07T10:33:32Z

src/hotspot/share/gc/g1/g1FullCollector.cpp

@@ -225,15 +228,17 @@ void G1FullCollector::complete_collection() {
  _heap->print_heap_after_full_collection(scope()->heap_transition());
 }

-void G1FullCollector::update_attribute_table(HeapRegion* hr) {
-  if (hr->is_free()) {


Another item that has been noted in a recent discussion with @albertnetymk is that with this change "Free" regions are also marked as normal in the table. It would be better to keep them as "Invalid".

I.e. something like (incorporating @albertnetymk other suggestion):

if (hr->is_free()) { return; } else if (hr->is_closed_archive()) { [...] } else if (hr->is_pinned() || force_pinned) { [...] } else { [...] }

There is no real difference as "Free" regions should never be referenced anywhere and the code should assert elsewhere. It's still nice to also have "Free" regions as Invalid in that table though.
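The structure proposed above for update_attribute_table() can be modelled in C++ like this. Attr, ToyHeapRegion and is_compacting() are simplified, hypothetical stand-ins for the G1 full GC region attribute table and HeapRegion; the helper at the end reflects the earlier suggestion to ask the table, rather than the region itself, whether a region is compacted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical attribute values; every entry starts out as Invalid.
enum class Attr { Invalid, ClosedArchive, Pinned, Normal };

struct ToyHeapRegion {
  std::size_t index;
  bool is_free;
  bool is_closed_archive;
  bool is_pinned;
};

void update_attribute_table(std::vector<Attr>& table, const ToyHeapRegion& hr,
                            bool force_pinned) {
  if (hr.is_free) {
    return;  // free regions stay Invalid
  } else if (hr.is_closed_archive) {
    table[hr.index] = Attr::ClosedArchive;
  } else if (hr.is_pinned || force_pinned) {
    table[hr.index] = Attr::Pinned;  // not compacted in this full GC
  } else {
    table[hr.index] = Attr::Normal;
  }
}

// Compaction code consults the table instead of the region's own pinned flag,
// so regions skipped for high liveness are treated like pinned ones.
bool is_compacting(const std::vector<Attr>& table, std::size_t idx) {
  return table[idx] == Attr::Normal;
}
```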

Not exactly; please check the previous discussion at #2760 (comment).

Some more investigation and discussion about the BOT handling showed that we need to update the BOT for these Survivor-turned-to-Old regions after all.

The reason is that contrary to what I thought, while BOT can handle queries for object start addresses above the "last known valid entry" (materialized in _next_offset_threshold and _next_offset_index ) mentioned in PR#3356, it only does so slowly, and while updating the BOT itself, not updating that "last known valid entry".
So every time it queries for an object start in such regions, it starts walking from the bottom of that region.

See the call chain G1BlockOffsetTablePart::block_start -> forward_to_block_containing_addr -> forward_to_block_containing_addr_slow where the call to alloc_block_work in g1BlockOffsetTable.cpp:236 only updates local boundary and index (not _next_offset_threshold and _next_offset_index).

This is a problem for young gcs, as this behavior will make them slower than expected. Since it is impossible to update both _next_offset_threshold and _next_offset_index atomically (which is another issue anyway; feel free to file an issue), I would prefer to penalize full gc (that is, keep the old behavior) for that.

The alternative would be to just alloc_block the whole region (so that next_offset_index and friends are at the top), but I think, given the rarity of this case and of full gc in general, it is better to do the extra work in the full gc.

Could you add code that walks such full young regions and does the cross_threshold thing? This additional code certainly does not need to actually compact these regions.
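Why a one-time walk helps can be shown with a toy C++ model of a region whose BOT is valid only below some card. Lookups beyond that card start from the bottom of the region and walk object by object, while fill_bot() walks the objects once, records the covering object for every card (roughly the work cross_threshold() does incrementally) and lets later lookups start near the target. BotRegion and its members are hypothetical; the model ignores the partial local updates the real slow path performs, and objects are represented by sorted start offsets beginning at 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BotRegion {
  std::size_t card_words;
  std::vector<std::size_t> obj_starts;  // object i spans [obj_starts[i], obj_starts[i+1])
  std::vector<std::size_t> bot;         // per card: index of the covering object
  std::size_t valid_cards = 0;          // cards [0, valid_cards) have valid entries
  std::size_t steps = 0;                // objects stepped over by lookups (cost counter)

  // Returns the start of the object containing addr.
  std::size_t block_start(std::size_t addr) {
    const std::size_t card = addr / card_words;
    // Valid entry: start from the recorded object; otherwise from the bottom.
    std::size_t i = card < valid_cards ? bot[card] : 0;
    while (i + 1 < obj_starts.size() && obj_starts[i + 1] <= addr) {
      ++i;
      ++steps;
    }
    return obj_starts[i];
  }

  // One pass over the objects, filling the entry for every card.
  void fill_bot() {
    std::size_t i = 0;
    for (std::size_t card = 0; card < bot.size(); ++card) {
      const std::size_t card_start = card * card_words;
      while (i + 1 < obj_starts.size() && obj_starts[i + 1] <= card_start) ++i;
      bot[card] = i;
    }
    valid_cards = bot.size();
  }
};
```

In the model, a lookup near the end of an unfilled region costs as many steps as there are objects before it, every time; after fill_bot() the cost is bounded by the number of objects within one card.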

Thanks,
Thomas

Thanks for the discussion! I had the same concern (the BOTs of these Survivor-turned-to-Old regions contain no useful info, which might make finding a block start in these regions very slow), but I was not sure whether there is some "lazy" mechanism that fills in valid BOT info for these regions when they are accessed later. I just did not have time to investigate further; now I have the answer from you.

Sure, I will add code to fill in valid BOT info for these young regions at the end of the full gc.
Would it be OK for me to do this BOT filling in a separate issue? As we have already had such a long discussion, I think it might be better to start another discussion for this specific follow-up issue. Please let me know your thoughts.

It is fine with me to fix this separately.

There is some lazy mechanism to fill this BOT, but it does not work in this case (which is the gist of what I said above).

Note that if that were fixed, I think PR #3356 would not be needed - because then young region BOTs have the expected contents :) Your call.

Thanks Thomas!
I'm not sure whether PR #3356 is still necessary, but sure, I will add code to fill the BOTs for "young" regions as a follow-up fix/enhancement to this issue. I have just created a bug to track it: https://bugs.openjdk.java.net/browse/JDK-8264987.
After this PR is approved and integrated, I will open the PR for JDK-8264987.

Hamlin-Li · 2021-04-12T11:01:21Z

@tschatzl Hi Thomas, I'm not sure whether this patch is ready to integrate. Would you mind confirming, or giving some further comments? Thanks

tschatzl · 2021-04-12T12:27:50Z

Ready to integrate.

Hamlin-Li · 2021-04-12T12:39:10Z

Thanks @tschatzl @kstefanj @albertnetymk for reviewing!

Hamlin-Li · 2021-04-12T12:40:06Z

/integrate

openjdk · 2021-04-12T12:41:41Z

@Hamlin-Li Since your change was applied there have been 218 commits pushed to the master branch:

f71be8b: 8264954: unified handling for VectorMask object re-materialization during de-optimization
3c9858d: 8264827: Large mapped buffer/segment crash the VM when calling isLoaded
e604320: 8264783: G1 BOT verification should not verify beyond allocation threshold
cb2806d: 8265018: [AIX] FileDispatcherImpl.c:31:10: fatal error: 'sys/mount.h' file not found
ecef1fc: 8264972: Unused TypeFunc declared in OptoRuntime
440c34a: 8264644: Add PrintClassLoaderDataGraphAtExit to print the detailed CLD graph
b1ebf82: 8264358: Don't create invalid oop in method handle tracing
627ad9f: 8262328: Templatize JVMFlag boilerplate access methods
c15680e: 8264868: Reduce inclusion of registerMap.hpp and register.hpp
5784f6b: 8264948: Check for TLS extensions total length
... and 208 more: https://git.openjdk.java.net/jdk/compare/f69afba52735008613f0ede7d650372e95e9a6e0...master

Your commit was automatically rebased without conflicts.

Pushed as commit be0d46c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio

4be6c18

openjdk bot added the rfr Pull request is ready for review label Feb 27, 2021

openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Feb 27, 2021

Hamlin-Li force-pushed the g1-full-gc-optimization-00 branch 3 times, most recently from 9956ab8 to d7cf5fc March 4, 2021 12:43

reuse vm option MarkSweepDeadRatio; reuse G1RegionMarkStatsCache class; fix regression in Mark phase by inlining live words collection into mark_object()

e798c78

Hamlin-Li force-pushed the g1-full-gc-optimization-00 branch from d7cf5fc to e798c78 March 4, 2021 14:12

Hamlin-Li added 2 commits March 8, 2021 22:18

fix crash in G1CalculatePointersClosure::prepare_for_skipping_compaction when klass of dead objects is unloaded; other misc improvements.

e267bc3

fix bot crash.

a28a93f

tschatzl suggested changes Mar 11, 2021

tschatzl mentioned this pull request Mar 11, 2021

8258431: Provide a JFR event with live set size estimate #2579 (closed)

openjdk bot added the ready Pull request is ready to be integrated label Mar 26, 2021

refine the code.

1f4ead8

tschatzl reviewed Mar 29, 2021

minor code improvement.

63ab0f9

tschatzl mentioned this pull request Apr 6, 2021

8264783: G1 BOT verification should not verify beyond allocation threshold #3356 (closed)

albertnetymk suggested changes Apr 7, 2021

tschatzl reviewed Apr 7, 2021

add sanity check; refine code in update_attribute_table.

5cca26b

albertnetymk approved these changes Apr 12, 2021

openjdk bot closed this Apr 12, 2021

openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 12, 2021

Hamlin-Li deleted the g1-full-gc-optimization-00 branch April 13, 2021 09:50
