Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio #2760

Closed
wants to merge 10 commits into from

Conversation

Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Feb 27, 2021

Summary

Improve G1 Full GC by skip compaction for regions with high survival ratio.

Backgroud

There are 4 steps in full gc of G1 GC.

  • mark live objects
  • prepare forwardee
  • adjust pointers
  • compact

When full gc occurs, there may be very high percentage of live bytes in some regions. For these regions, it's not efficient to compact them and better to skip them, as there are little space to save but many objects to copy.

Description

We enhance the full gc implementation for the above situation through following steps:

  • accumulate live bytes of every hr in mark phase; (already done by JDK-8263495)
  • skip adding regions with high survial ratio, and set the region with high survival ratio as pinned in _region_attr_table during prepare phase;
  • nothing special is done in adjust phase, regions with high survial ratio are skipped because of pin setting in the above step;
  • nothing special is done in compact phase, regions with high survival ratio are skipped because these regions are skipped when adding regions to compaction set in the prepare phase;

VM options related

  • MarkSweepDeadRatio: we reuse this exising vm option to indicate the high survial ratio threhold (100-MarkSweepDeadRatio) in G1.
    • default value of MarkSweepDeadRatio: 5

Test

  • specjbb2015: no regression
  • dacapo: (Attachment is the dacapo h2 full gc pause.)
    • 95% of full gc pauses: 10%-19% improvement.
    • 5% of full gc pauses: 1.2% improvement.
    • 0.1% of full gc pauses: -6.16% improvement.

$ java -Xmx1g -Xms1g -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio

Reviewers

Contributors

  • Shoubing Ma <mashoubing1@huawei.com>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/2760/head:pull/2760
$ git checkout pull/2760

Update a local copy of the PR:
$ git checkout pull/2760
$ git pull https://git.openjdk.java.net/jdk pull/2760/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 2760

View PR using the GUI difftool:
$ git pr show -t 2760

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/2760.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 27, 2021

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 27, 2021
@openjdk
Copy link

openjdk bot commented Feb 27, 2021

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Feb 27, 2021
@mlbridge
Copy link

mlbridge bot commented Feb 27, 2021

Webrevs

@Hamlin-Li
Copy link
Author

/contributor add Shoubing Ma mashoubing1@huawei.com

@openjdk
Copy link

openjdk bot commented Feb 27, 2021

@Hamlin-Li
Contributor Shoubing Ma <mashoubing1@huawei.com> successfully added.

@kstefanj
Copy link
Contributor

kstefanj commented Mar 2, 2021

Hi Hamlin,

First of all, thanks for contributing.

Before doing a more in depth review of this change I have a few questions/suggestions:

  1. This feature if quite similar to the "dead wood" feature already present for Serial/Parallel, and I agree with what's written in the JBS issue about trying to reuse the already present options. This way the feature would be enabled by default as well (since MarkSweepDeadRatio has a default value of 5) and I think that makes sense to make sure it gets proper testing. We can of course do something similar to what Parallel does, and tweak the value for G1 but I think it should be on by default.
  2. G1 does liveness accounting during concurrent mark as well and there is already code doing more or less the same thing as the new G1FullGCMarkRegionCache class. Have you looked at G1RegionMarkStatsCache and is there any reason we can't reuse this?
  3. Benchmarking the performance of the Full GC is often a bit problematic, because G1 actively tries to avoid Full GCs. When I implemented the parallel FullGC I wrote some small benchmarks. Simple uses-cases that I wanted to make sure the FullGC handled well. I dug those up and took them for a spin with your changes. The results are more or less as expected. For the use-cases with "full" regions we see a clear improvement and for almost all the other use-cases the results are in line with the baseline. But this one test showed a quite significant regression in the marking times. The test looks like this:
public class SystemGCLargeArray {
  public static Object[] holder;
  public static void main(String args[]) {
    holder = new Object[128 * 1024 *1024];
    System.gc();
  }
}

So we have a large array that the workers will scan for reference to other objects, the results on my workstation look like this:

Baseline: GC(0) Phase 1: Mark live objects 59,039ms
8262068:  GC(0) Phase 1: Mark live objects 136,231ms (G1SkipCompactionLiveBytesLowerThreshold=100)
8262068:  GC(0) Phase 1: Mark live objects 137,956ms (G1SkipCompactionLiveBytesLowerThreshold=95)

The results are quite surprising, because in the other benchmarks there is not a big diff in marking times. But looking a bit at the code and how the inlining decisions are made for G1FullGCMarker::mark_and_push(T* p) it looks like the function grows a bit to much which prevents some important inlining. To avoid this I moved the accounting to the end of mark_object() instead and this removed more or less the whole regression. Getting the compiler to inline exactly the right things is hard, but this is something to keep in mind when getting strange regressions in the GC.

So before looking closer at the code I would like to see if we can reuse G1RegionMarkStatsCache and change to use the existing flag. What do you think about that?

Thanks,
Stefan

@Hamlin-Li
Copy link
Author

Hi Stefan , Thanks a lot for detailed review and benchmark!
Sure, I will update the patch as you suggested later.

@Hamlin-Li Hamlin-Li force-pushed the g1-full-gc-optimization-00 branch 3 times, most recently from 9956ab8 to d7cf5fc Compare March 4, 2021 12:43
…s; fix regression in Mark phase by inlining live words collection into mark_object()
@Hamlin-Li
Copy link
Author

Hi Stefan,

There is jdk/tier1 test failure on MaxOS, it passed on other platforms: https://github.com/openjdk/jdk/pull/2760/checks?check_run_id=2033359984.
Rerun passed: https://github.com/Hamlin-Li/jdk/actions/runs/622899539

@kstefanj
Copy link
Contributor

kstefanj commented Mar 5, 2021

Even if the re-run passes this seems to be a real problem, probably just a bit intermittent. Looking at the test output you can see there is a segmentation fault in:
G1FullGCPrepareTask::G1CalculatePointersClosure::prepare_for_skipping_compaction(...)

Running the test with debug builds might make the issue reproduce on more platforms and give hints on what the problem is.

Hope this helps,
Stefan

@Hamlin-Li
Copy link
Author

Thanks Stefan, we're investigating. (currently, we don't have mac env)
BTW, before it's reproduced locally, is there any way to get more information from the online test result? I can only find the "View raw logs".

@kstefanj
Copy link
Contributor

kstefanj commented Mar 5, 2021

I don't know for sure, I tried downloading the "log archive" but it did not include the hs_err-file which would have been very useful. I can kick of an internal run to see if it reproduces here.

@kstefanj
Copy link
Contributor

kstefanj commented Mar 5, 2021

I manage to get a crash locally using a fastdebug build on Linux x64. I haven't had time to look at the error in detail, but it is the same crash. It happened in different test so just running a lot of testing should trigger it in your environment as well.

@Hamlin-Li
Copy link
Author

Hi Stefan, Thanks a lot for the information, It's very helpful!

…ion when klass of dead objects is unloaded;

other misc improvements.
@Hamlin-Li
Copy link
Author

HI Stefan,

Seems we have fixed the previous crashes.
Would you mind to help to review the fixes and the whole change set when available? Thanks.
Summarize the previous 2 commits as below.

e267bc3:

  1. fix the crash when get object size which is caused by class unloading, by using "_bitmap->get_next_marked_addr(next_addr, limit);" rather than call "obj->size();" directly.
  2. some misc improvement when initializing G1RegionMarkStatsCache.
    a28a93f:
  3. fix the bot crash by cross_threshold.
  4. fix initialization issue of G1RegionMarkStatsCache, and assert issue in previous commit e267bc3.

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change should support MarkSweepAlwaysCompactCount (just adapt the threshold).

The "last ditch" collection should also fully compact, as well as probably a System.gc() call. I do not know the exact rules for the other collectors right now.

Please update the summary with these (and the earlier mentioned) requirements and approach.

src/hotspot/share/gc/g1/g1FullGCMarker.cpp Outdated Show resolved Hide resolved
src/hotspot/share/gc/g1/g1FullGCMarker.hpp Outdated Show resolved Hide resolved
src/hotspot/share/gc/g1/g1FullCollector.hpp Outdated Show resolved Hide resolved
@@ -87,6 +90,10 @@ class G1FullCollector : StackObj {
uint workers() { return _num_workers; }
G1FullGCMarker* marker(uint id) { return _markers[id]; }
G1FullGCCompactionPoint* compaction_point(uint id) { return _compaction_points[id]; }
GrowableArray<HeapRegion*>* skipping_compaction_set(uint id) { return _skipping_compaction_sets[id]; }
size_t live_bytes_after_full_gc_mark(uint region_idx) {
return MarkSweepDeadRatio > 0 ? _live_stats[region_idx]._live_words * HeapWordSize : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the additional MarkSweepDeadRatio > 0 check is necessary or useful at all. In case MarkSweepDeadRatio is zero, _hr_live_bytes_threshold is larger than whatever is stored in here anyway.

Also, please drop the * HeapWordSize and compare to a _hr_live_word_threshold. This multiplication is completely unnecessary from what I can tell. (It does not really hurt either, but why would I want to multiply both sides of your comparison by the same value before the comparison?)

Also just liveness() (or name this and the equivalent in G1ConcurrentMark something like live_words) is sufficient.

@@ -91,6 +99,15 @@ void G1FullGCCompactTask::work(uint worker_id) {
compact_region(*it);
}

if (MarkSweepDeadRatio > 0) {
Copy link
Contributor

@tschatzl tschatzl Mar 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is useless. If MarkSweepDeadRatio == 0, the skipping_compaction_queue is empty anyway (and the check whether the GrowableArray is empty is as fast). You could certainly assert that if MarkSweepDeadRatio == 0, then skipping_compaction_queue must be empty too.

src/hotspot/share/gc/g1/g1FullGCMarker.cpp Outdated Show resolved Hide resolved
src/hotspot/share/gc/g1/g1FullGCMarker.hpp Outdated Show resolved Hide resolved
src/hotspot/share/gc/g1/g1FullGCMarker.inline.hpp Outdated Show resolved Hide resolved
prepare_for_compaction(hr);
assert(!hr->is_humongous(), "humongous objects not supported.");
size_t live_bytes = _collector->live_bytes_after_full_gc_mark(hr->hrm_index());
if(live_bytes <= _hr_live_bytes_threshold) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after the if.

Comment on lines 208 to 251
void G1FullGCPrepareTask::G1CalculatePointersClosure::prepare_for_skipping_compaction(HeapRegion* hr) {
HeapRegion* current = hr;
HeapWord* limit = current->top();
HeapWord* next_addr = current->bottom();
HeapWord* live_end = current->bottom();
_skipping_compaction_set->append(current);
HeapWord* threshold = current->initialize_threshold();
HeapWord* pre_addr;

while (next_addr < limit) {
Prefetch::write(next_addr, PrefetchScanIntervalInBytes);
pre_addr = next_addr;

if (_bitmap->is_marked(next_addr)) {
oop obj = oop(next_addr);
size_t obj_size = obj->size();
// Object should not move but mark-word is used so it looks like the
// object is forwarded. Need to clear the mark and it's no problem
// since it will be restored by preserved marks. There is an exception
// with BiasedLocking, in this case forwardee() will return NULL
// even if the mark-word is used. This is no problem since
// forwardee() will return NULL in the compaction phase as well.
if (obj->forwardee() != NULL) {
obj->init_mark();
}

next_addr += obj_size;
// update live byte range end
live_end = next_addr;
} else {
next_addr = _bitmap->get_next_marked_addr(next_addr, limit);
assert(next_addr > live_end, "next_addr must be bigger than live_end");
assert(next_addr == limit || _bitmap->is_marked(next_addr), "next_addr is the limit or is marked");
// fill dummy object to replace dead range
Universe::heap()->fill_with_dummy_object(live_end, next_addr, true);
}

if (next_addr > threshold) {
threshold = current->cross_threshold(pre_addr, next_addr);
}
}
assert(next_addr == limit, "Should stop the scan at the limit.");
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all this code is not required/duplicate of the existing code that handles pinned regions (added in https://bugs.openjdk.java.net/browse/JDK-8253600). Please have a look and see if that code can be reused/repurposed.

I mean, these "skipped regions" should be equivalent to "pinned" regions wrt to required object handling. I admit I haven't actually looked how to do this, but this seems awfully similar with the only difference that skipped regions are determined only after marking and pinned regions sometimes before (full) gc.

The other handling should be the same. So at worst temporarily marking these regions as "pinned" should be sufficient for this functionality to be able to reuse the mechanism.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further investigation, I think so too.

@mlbridge
Copy link

mlbridge bot commented Mar 12, 2021

Mailing list message from Thomas Schatzl on hotspot-gc-dev:

Hi,

On 12.03.21 06:13, Hamlin Li wrote:

On Thu, 11 Mar 2021 15:19:36 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:

fix bot crash.

src/hotspot/share/gc/g1/g1FullGCMarker.hpp line 104:

102:
103: void flush_mark_region_cache() {
104: if (MarkSweepDeadRatio > 0) {

Drop this check to make the code more straightforward. Other code like in PR #2579 might find this information useful too. Maybe this could even be factored out in a separate CR.

Hi Thomas,
Do you mean to collect liveness info in mark phase when G1 full gc, even if MarkSweepDeadRatio is == 0 (which means do not skip any regions when compaction)? If this is the request, I can file a new bug for this liveness collection in G1 full gc.

Yes, please split out the liveness info gathering in the mark phase into
a separate CR. This is what PR #2579 needs.
StefanJ and me were discussing adding this already when implementing the
G1 parallel full gc, but due to lack of users we refrained from that..

For the jfr PR #2579 , I think it's can be addressed in a separate bug which depends on the one for liveness collection.
I will file/work these 2 new bugs (liveness collection when g1 full gc, jfr liveness event after full gc) if this is what you suggested.

PR #2579 is about the JFR liveness event. Jbachorik (the author of that
PR) could just reuse the information gathered in that new CR then.
Please coordinate with him if needed, but I do not think you
specifically needs to worry about the JFR event, I assume that jbachorik
will likely be happy to be able to add the plumbing for the JFR event then.

I cc'ed him (using some email address from the jfr-dev mailing list, I
hope I got it right).

Thanks,
Thomas

@mlbridge
Copy link

mlbridge bot commented Mar 12, 2021

Mailing list message from Hamlin on hotspot-gc-dev:

? 2021/3/12 16:12, Thomas Schatzl ??:

Hi,

On 12.03.21 06:13, Hamlin Li wrote:

On Thu, 11 Mar 2021 15:19:36 GMT, Thomas Schatzl
<tschatzl at openjdk.org> wrote:

Hamlin Li has updated the pull request incrementally with one
additional commit since the last revision:

?? fix bot crash.

src/hotspot/share/gc/g1/g1FullGCMarker.hpp line 104:

102:
103:?? void flush_mark_region_cache() {
104:???? if (MarkSweepDeadRatio > 0) {

Drop this check to make the code more straightforward. Other code
like in PR #2579 might find this information useful too. Maybe this
could even be factored out in a separate CR.

Hi Thomas,
Do you mean to collect liveness info in mark phase when G1 full gc,
even if MarkSweepDeadRatio is == 0 (which means do not skip any
regions when compaction)? If this is the request, I can file a new
bug for this liveness collection in G1 full gc.

Yes, please split out the liveness info gathering in the mark phase
into a separate CR. This is what PR #2579 needs.
StefanJ and me were discussing adding this already when implementing
the G1 parallel full gc, but due to lack of users we refrained from
that..

Thanks for confirmation, I have created the issue:
https://bugs.openjdk.java.net/browse/JDK-8263495.

For the jfr PR #2579 , I think it's can be addressed in a separate
bug which depends on the one for liveness collection.
I will file/work these 2 new bugs (liveness collection when g1 full
gc, jfr liveness event after full gc) if this is what you suggested.

PR #2579 is about the JFR liveness event. Jbachorik (the author of
that PR) could just reuse the information gathered in that new CR
then. Please coordinate with him if needed, but I do not think you
specifically needs to worry about the JFR event, I assume that
jbachorik will likely be happy to be able to add the plumbing for the
JFR event then.

I cc'ed him (using some email address from the jfr-dev mailing list, I
hope I got it right).

Got it. Just let me know if any assitant is needed from me.

Thanks,

- Hamlin

Thanks,
? Thomas

@tschatzl
Copy link
Contributor

After a short look I think the only changes that are needed to get a region to be considered pinned in the g1 full gc is in G1FullCollector::update_attribute_table() to set these regions that are too full as "pinned" for that collection.

There are two places in G1FullCollector where that attribute table isn't used yet, but the code uses HeapRegion::is_pinned() directly (search for r->is_pinned() in the full gc files, I think both are located in g1FullGCCompactTask.cpp). These should be rerouted to that table I think.

After that change everything should basically be still working.

No guarantees though.

@Hamlin-Li
Copy link
Author

Thanks for the suggestion, will try it.
For "last-ditch" issue, we plan to re-use some flag/logic in soft-ref, would like to hear your suggestion in advance if it's convenient for you.

@openjdk
Copy link

openjdk bot commented Mar 26, 2021

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio

Co-authored-by: Shoubing Ma <mashoubing1@huawei.com>
Reviewed-by: sjohanss, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 215 new commits pushed to the master branch:

  • cb2806d: 8265018: [AIX] FileDispatcherImpl.c:31:10: fatal error: 'sys/mount.h' file not found
  • ecef1fc: 8264972: Unused TypeFunc declared in OptoRuntime
  • 440c34a: 8264644: Add PrintClassLoaderDataGraphAtExit to print the detailed CLD graph
  • b1ebf82: 8264358: Don't create invalid oop in method handle tracing
  • 627ad9f: 8262328: Templatize JVMFlag boilerplate access methods
  • c15680e: 8264868: Reduce inclusion of registerMap.hpp and register.hpp
  • 5784f6b: 8264948: Check for TLS extensions total length
  • 42f4d70: 8264649: runtime/InternalApi/ThreadCpuTimesDeadlock.java crash in fastdebug C2 with -XX:-UseTLAB
  • 76bd313: 8264872: Dependencies: Migrate to PerfData counters
  • 07c8ff4: 8264871: Dependencies: Miscellaneous cleanups in dependencies.cpp
  • ... and 205 more: https://git.openjdk.java.net/jdk/compare/f69afba52735008613f0ede7d650372e95e9a6e0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 26, 2021
@Hamlin-Li
Copy link
Author

Thanks Stefan, have a great vacation!

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will run this through internal testing before approving.

Comment on lines 156 to 160
if (live_words <= live_words_threshold) {
return true;
}
// High live ratio region will not be compacted.
return false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return live_words <= live_words_thresholds should be sufficient here.

@Hamlin-Li
Copy link
Author

Thanks Thomas, sure, will hold until your test finished.
At the same time, we are also run perf tests to make sure the good performance of this final version.

@tschatzl
Copy link
Contributor

Thanks Thomas, sure, will hold until your test finished.
At the same time, we are also run perf tests to make sure the good performance of this final version.

gc/g1/TestEagerReclaimHumongousRegionsClearMarkBits.java fails with a fairly unknown error every few runs with this change:

[26.103s][info][gc,start] GC(112) Pause Full (G1 Evacuation Pause)
# To suppress the following error report, specify this argument
# after -XX: or in .hotspotrc:  SuppressErrorAt=/g1BlockOffsetTable.cpp:358
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  Internal Error (.../src/hotspot/share/gc/g1/g1BlockOffsetTable.cpp:358), pid=74345, tid=74370
#  guarantee(backskip <= max_backskip) failed: Going backwards beyond the start_card. start_card: 225280 current_card: 225281 backskip: 256

There is a strong likelihood that this is a pre-existing issue and not directly caused by this change. I will need to investigate this.

@tschatzl
Copy link
Contributor

The reason for this crash is that if there is a young region that is not compacted (because it's mostly full), its BOT (block offset table) is not updated to the extent the verification expects it.

I.e. that verification expects the BOT for old gen regions is completely valid, from start to end of the region. For young regions this is not the case, their BOT is not updated at all (which is normal), just containing a marker indicating that there is no BOT; compaction would take care of this.

The verification does not respect this "after this point the BOT is invalid" marker. There is actually some code in heapRegion.cpp:724 that intentionally skips young region verification for this purpose. Now that they might be old, this does not work. I think the correct fix is to not try to verify beyond this marker, i.e. change the assignment to end_card in G1BlockOffsetTablePart::verify accordingly.

Another solution would be to make the region appear like filled with a single large allocation (TLAB).

This is purely to keep the verification happy. All other code correctly handles a partially valid BOT (i.e. valid up to and including that mentioned marker).

I will try out these options.

@Hamlin-Li
Copy link
Author

Hi Thomas, Thank you so much for helping investigate the issue, I will check it tomorrow too.

@tschatzl
Copy link
Contributor

Here is a potential fix for this. This needs some discussion on whether an alternatives would be better or not; more of them could be:

  • just initialize a dummy BOT for not-compacted young region
  • exclude young regions from this dead-wood scheme
    In any case this fix should probably go in separately.

@Hamlin-Li
Copy link
Author

I think the first solution is better. Reasons are related to performance:

  1. any way, the bot will be rebuild for this young region ( if this young region is compacted, bot will be rebuild when cross_threshold; )
  2. 1st solution avoid the copying of this young region.

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Mar 31, 2021

Here is a potential fix for this.

Thanks for the fix.

In any case this fix should probably go in separately.

Do you mean we push this pr, then discuss the fix in another thread? If yes, I will file a bug to track the issue and initialize discussion on this bug.

@tschatzl
Copy link
Contributor

tschatzl commented Apr 6, 2021

This change should go in after PR#3356 which fixes this issue. I think it is a change that is worth pointing out and discussing separately - as you might see from the long description.

@Hamlin-Li
Copy link
Author

I see, Thanks Thomas!

@@ -95,11 +98,13 @@ class G1FullCollector : StackObj {
G1FullGCCompactionPoint* serial_compaction_point() { return &_serial_compaction_point; }
G1CMBitMap* mark_bitmap();
ReferenceProcessor* reference_processor();
size_t live_words(uint region_index) { return _live_stats[region_index]._live_words; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add range check assertion for region_index? The extra spaces can be removed; no need to align with previous methods.

@@ -171,7 +171,7 @@ class HeapRegion : public CHeapObj<mtGC> {
// Update heap region that has been compacted to be consistent after Full GC.
void reset_compacted_after_full_gc();
// Update pinned heap region (not compacted) to be consistent after Full GC.
void reset_pinned_after_full_gc();
void reset_not_compacted_after_full_gc();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this version uses the existing "pinned" mechanism to skip high-live-ratio regions, this method can retain its original name/implementation, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true, full gc reuses the mechanism originally implemented for pinned regions. However "pinned" is something very specific in G1 context so I think it is better to use a different, more generic name. The regions that are not compacted (and not yet pinned) are not really temporarily pinned.

There has been an earlier discussion (not sure if here in this PR) to actually rename the use of "pinned" in G1 full gc to use "not compacted" too, resulting in a CR to rename this (in JDK-8264423).

However I was going to wait for this change to go in before sending out a PR.

Also this "not_compacted" name better matches the "compacted" method name above.

So I would prefer to keep this as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see; agree.

if (hr->is_free()) {
void G1FullCollector::update_attribute_table(HeapRegion* hr, bool force_pinned) {
if (force_pinned) {
_region_attr_table.set_pinned(hr->hrm_index());
return;
}
if (hr->is_closed_archive()) {
_region_attr_table.set_closed_archive(hr->hrm_index());
} else if (hr->is_pinned()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the condition to hr->is_pinned() || force_pinned is enough for this method, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exactly. please check previous discussion at #2760 (comment).
I will merge the 2 conditions as you suggested.

@@ -225,15 +228,17 @@ void G1FullCollector::complete_collection() {
_heap->print_heap_after_full_collection(scope()->heap_transition());
}

void G1FullCollector::update_attribute_table(HeapRegion* hr) {
if (hr->is_free()) {
Copy link
Contributor

@tschatzl tschatzl Apr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another item that has been noted in a recent discussion with @albertnetymk is that with this change "Free" regions are also marked as normal in the table. It would be better to keep them as "Invalid".

I.e. something like (incorporating @albertnetymk other suggestion):

if (hr->is_free()) {
  return;
} else if (hr->is_closed_archive(...) {
  [...]
} else if (hr->is_pinned() || force_pinned) {
  [...]
} else {
  [...]
}

There is no real difference as "Free" regions should never be referenced anywhere and the code should assert elsewhere. It's still nice to also have "Free" regions as Invalid in that table though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not exactly. please check previous discussion at #2760 (comment).

Copy link
Contributor

@tschatzl tschatzl Apr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more investigation and discussion about the BOT handling showed that we need to update the BOT for these Survivor-turned-to-Old regions after all.

The reason is that contrary to what I thought, while BOT can handle queries for object start addresses above the "last known valid entry" (materialized in _next_offset_threshold and _next_offset_index ) mentioned in PR#3356, it only does so slowly, and while updating the BOT itself, not updating that "last known valid entry".
So every time it queries for an object start in such regions, it starts walking from the bottom of that region.

See the call chain G1BlockOffsetTablePart::block_start -> forward_to_block_containing_addr -> forward_to_block_containing_addr_slow where the call to alloc_block_work in g1BlockOffsetTable.cpp:236 only updates local boundary and index (not _next_offset_threshold and _next_offset_index).

This is a problem for young gcs as this behavior will make them slower than expected. Since it is impossible to make the updates to both _next_offset_threshold and _next_offset_index atomic, and another issue anyway; feel free to file an issue) I would prefer to penalize full gc (that is, keep old behavior) for that.

The alternative would be to just alloc_block the whole region (so that next_offset_index and friend are at the top), but I think given the rarity of this case and full gc in general it is better to do the extra work in the full gc.

Could you add code that walks such full young regions and does the cross_threshold thing? This additional code certainly does not need to actually compact these regions.

Thanks,
Thomas

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for discussion! I had the same concern (BOTs of these Survivor-turned-to-Old regions contains no useful info, so might make it very slow when finding block start in these regions), but I was not sure if there is some "lazy" mechanism to fill valid BOTs info for these regions when they are accessed subsequently. I just not have time to do further investigation, now I got the answer from you.

Sure I will add code to fill valid BOTs info for these young regions by the end of full gc.
I'm not sure if it's OK for me to do this BOTs filling action in a separate issue? As we already had such a long discussion, I think it's might be better for us to initialize another discussion for this specific follow-up issue. Please kindly let me know your thoughts.

Copy link
Contributor

@tschatzl tschatzl Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine with me to fix this separately.

There is some lazy mechanism to fill this BOT, but it does not work in this case (which is the gist of what I said above).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if that were fixed, I think PR #3356 would not be needed - because then young region BOTs have the expected contents :) Your call.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Thomas!
Although I'm not sure if pr #3356 is still necessary. But sure, I will add code to fill BOTs for "young" regions as a follow-up fix/enhanacement of this issue, I have just created an bug to track it https://bugs.openjdk.java.net/browse/JDK-8264987.
So after this pr is approved and integrated I will initialize the pr for JDK-8264987.

@Hamlin-Li
Copy link
Author

@tschatzl Hi Thomas, I'm not sure if I'm ready to integrate this patch, would you mind to help confirm, or give some further comments? Thanks

@tschatzl
Copy link
Contributor

Ready to integrate.

@Hamlin-Li
Copy link
Author

Thanks @tschatzl @kstefanj @albertnetymk for reviewing!

@Hamlin-Li
Copy link
Author

/integrate

@openjdk openjdk bot closed this Apr 12, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 12, 2021
@openjdk
Copy link

openjdk bot commented Apr 12, 2021

@Hamlin-Li Since your change was applied there have been 218 commits pushed to the master branch:

  • f71be8b: 8264954: unified handling for VectorMask object re-materialization during de-optimization
  • 3c9858d: 8264827: Large mapped buffer/segment crash the VM when calling isLoaded
  • e604320: 8264783: G1 BOT verification should not verify beyond allocation threshold
  • cb2806d: 8265018: [AIX] FileDispatcherImpl.c:31:10: fatal error: 'sys/mount.h' file not found
  • ecef1fc: 8264972: Unused TypeFunc declared in OptoRuntime
  • 440c34a: 8264644: Add PrintClassLoaderDataGraphAtExit to print the detailed CLD graph
  • b1ebf82: 8264358: Don't create invalid oop in method handle tracing
  • 627ad9f: 8262328: Templatize JVMFlag boilerplate access methods
  • c15680e: 8264868: Reduce inclusion of registerMap.hpp and register.hpp
  • 5784f6b: 8264948: Check for TLS extensions total length
  • ... and 208 more: https://git.openjdk.java.net/jdk/compare/f69afba52735008613f0ede7d650372e95e9a6e0...master

Your commit was automatically rebased without conflicts.

Pushed as commit be0d46c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@Hamlin-Li Hamlin-Li deleted the g1-full-gc-optimization-00 branch April 13, 2021 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants