JDK-8262068: Improve G1 Full GC by skipping compaction for regions with high survival ratio #2760
Conversation
👋 Welcome back mli! A progress list of the required criteria for merging this PR into
@Hamlin-Li The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
/contributor add Shoubing Ma mashoubing1@huawei.com
@Hamlin-Li
Hi Hamlin, First of all, thanks for contributing. Before doing a more in-depth review of this change I have a few questions/suggestions:
So we have a large array that the workers will scan for references to other objects; the results on my workstation look like this:
The results are quite surprising, because in the other benchmarks there is not a big diff in marking times. But looking a bit at the code and how the inlining decisions are made for So before looking closer at the code I would like to see if we can reuse Thanks,
Hi Stefan, thanks a lot for the detailed review and benchmark!
Force-pushed from 9956ab8 to d7cf5fc
…s; fix regression in Mark phase by inlining live words collection into mark_object()
Force-pushed from d7cf5fc to e798c78
Hi Stefan, there is a jdk/tier1 test failure on macOS; it passed on other platforms: https://github.com/openjdk/jdk/pull/2760/checks?check_run_id=2033359984.
Even if the re-run passes, this seems to be a real problem, probably just a bit intermittent. Looking at the test output you can see there is a segmentation fault in: Running the test with debug builds might make the issue reproduce on more platforms and give hints on what the problem is. Hope this helps,
Thanks Stefan, we're investigating. (Currently we don't have a macOS environment.)
I don't know for sure, I tried downloading the "log archive" but it did not include the hs_err-file, which would have been very useful. I can kick off an internal run to see if it reproduces here.
I managed to get a crash locally using a fastdebug build on Linux x64. I haven't had time to look at the error in detail, but it is the same crash. It happened in a different test, so just running a lot of testing should trigger it in your environment as well.
Hi Stefan, thanks a lot for the information, it's very helpful!
…ion when klass of dead objects is unloaded; other misc improvements.
Hi Stefan, it seems we have fixed the previous crashes.
The change should support MarkSweepAlwaysCompactCount
(just adapt the threshold).
The "last ditch" collection should also fully compact, as well as probably a System.gc() call. I do not know the exact rules for the other collectors right now.
Please update the summary with these (and the earlier mentioned) requirements and approach.
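The flag interaction the reviewer asks for could be sketched roughly as follows. This is not HotSpot code: the SkipPolicy type and its wiring are hypothetical, and only the flag semantics (MarkSweepDeadRatio as the tolerated dead fraction, MarkSweepAlwaysCompactCount forcing a periodic full compaction) follow the existing serial-GC flags.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: a region is skipped only when its live fraction
// exceeds the threshold derived from MarkSweepDeadRatio, and every N-th
// full gc (MarkSweepAlwaysCompactCount) still compacts everything.
struct SkipPolicy {
  size_t dead_ratio_percent;      // stand-in for MarkSweepDeadRatio; 0 disables skipping
  unsigned always_compact_count;  // stand-in for MarkSweepAlwaysCompactCount; 0 disables forcing

  bool should_skip(size_t live_words, size_t region_words, unsigned full_gc_count) const {
    if (dead_ratio_percent == 0) return false;  // skipping disabled entirely
    if (always_compact_count != 0 && full_gc_count % always_compact_count == 0) {
      return false;  // every N-th full gc fully compacts the heap
    }
    size_t live_threshold = region_words / 100 * (100 - dead_ratio_percent);
    return live_words > live_threshold;  // mostly-live region: skip compaction
  }
};
```

A "last ditch" collection or System.gc() would simply bypass the policy and compact everything, as the reviewer notes.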
@@ -87,6 +90,10 @@ class G1FullCollector : StackObj {
  uint workers() { return _num_workers; }
  G1FullGCMarker* marker(uint id) { return _markers[id]; }
  G1FullGCCompactionPoint* compaction_point(uint id) { return _compaction_points[id]; }
  GrowableArray<HeapRegion*>* skipping_compaction_set(uint id) { return _skipping_compaction_sets[id]; }
  size_t live_bytes_after_full_gc_mark(uint region_idx) {
    return MarkSweepDeadRatio > 0 ? _live_stats[region_idx]._live_words * HeapWordSize : 0;
Not sure if the additional MarkSweepDeadRatio > 0 check is necessary or useful at all. In case MarkSweepDeadRatio is zero, _hr_live_bytes_threshold is larger than whatever is stored in here anyway.
Also, please drop the * HeapWordSize and compare to a _hr_live_word_threshold. This multiplication is completely unnecessary from what I can tell. (It does not really hurt either, but why would I want to multiply both sides of your comparison by the same value before the comparison?)
Also just liveness() (or name this and the equivalent in G1ConcurrentMark something like live_words) is sufficient.
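The reviewer's point about dropping the multiplication can be seen with a tiny standalone check. The HeapWordSize value and the function names below are illustrative, not the HotSpot definitions:

```cpp
#include <cassert>
#include <cstddef>

const size_t HeapWordSize = 8;  // assumption: 8-byte heap words on a 64-bit VM

// The byte-based comparison multiplies both sides by the same positive
// constant, so comparing directly in words is equivalent and cheaper.
bool exceeds_threshold_bytes(size_t live_words, size_t threshold_words) {
  return live_words * HeapWordSize > threshold_words * HeapWordSize;
}
bool exceeds_threshold_words(size_t live_words, size_t threshold_words) {
  return live_words > threshold_words;
}
```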
@@ -91,6 +99,15 @@ void G1FullGCCompactTask::work(uint worker_id) {
    compact_region(*it);
  }

  if (MarkSweepDeadRatio > 0) {
This check is useless. If MarkSweepDeadRatio == 0, the skipping_compaction_queue is empty anyway (and the check whether the GrowableArray is empty is as fast). You could certainly assert that if MarkSweepDeadRatio == 0, then skipping_compaction_queue must be empty too.
  prepare_for_compaction(hr);
  assert(!hr->is_humongous(), "humongous objects not supported.");
  size_t live_bytes = _collector->live_bytes_after_full_gc_mark(hr->hrm_index());
  if(live_bytes <= _hr_live_bytes_threshold) {
Missing space after the if.
void G1FullGCPrepareTask::G1CalculatePointersClosure::prepare_for_skipping_compaction(HeapRegion* hr) {
  HeapRegion* current = hr;
  HeapWord* limit = current->top();
  HeapWord* next_addr = current->bottom();
  HeapWord* live_end = current->bottom();
  _skipping_compaction_set->append(current);
  HeapWord* threshold = current->initialize_threshold();
  HeapWord* pre_addr;

  while (next_addr < limit) {
    Prefetch::write(next_addr, PrefetchScanIntervalInBytes);
    pre_addr = next_addr;

    if (_bitmap->is_marked(next_addr)) {
      oop obj = oop(next_addr);
      size_t obj_size = obj->size();
      // Object should not move but mark-word is used so it looks like the
      // object is forwarded. Need to clear the mark and it's no problem
      // since it will be restored by preserved marks. There is an exception
      // with BiasedLocking, in this case forwardee() will return NULL
      // even if the mark-word is used. This is no problem since
      // forwardee() will return NULL in the compaction phase as well.
      if (obj->forwardee() != NULL) {
        obj->init_mark();
      }

      next_addr += obj_size;
      // update live byte range end
      live_end = next_addr;
    } else {
      next_addr = _bitmap->get_next_marked_addr(next_addr, limit);
      assert(next_addr > live_end, "next_addr must be bigger than live_end");
      assert(next_addr == limit || _bitmap->is_marked(next_addr), "next_addr is the limit or is marked");
      // fill dummy object to replace dead range
      Universe::heap()->fill_with_dummy_object(live_end, next_addr, true);
    }

    if (next_addr > threshold) {
      threshold = current->cross_threshold(pre_addr, next_addr);
    }
  }
  assert(next_addr == limit, "Should stop the scan at the limit.");
}
I think all this code is not required / a duplicate of the existing code that handles pinned regions (added in https://bugs.openjdk.java.net/browse/JDK-8253600). Please have a look and see if that code can be reused/repurposed.
I mean, these "skipped regions" should be equivalent to "pinned" regions with respect to required object handling. I admit I haven't actually looked at how to do this, but this seems awfully similar, with the only difference that skipped regions are determined only after marking and pinned regions sometimes before (full) gc.
The other handling should be the same. So at worst, temporarily marking these regions as "pinned" should be sufficient for this functionality to be able to reuse the mechanism.
After further investigation, I think so too.
Mailing list message from Thomas Schatzl on hotspot-gc-dev: Hi, On 12.03.21 06:13, Hamlin Li wrote:
Yes, please split out the liveness info gathering in the mark phase into
PR #2579 is about the JFR liveness event. Jbachorik (the author of that I cc'ed him (using some email address from the jfr-dev mailing list, I Thanks,
Mailing list message from Hamlin on hotspot-gc-dev: On 2021/3/12 16:12, Thomas Schatzl wrote:
Thanks for confirmation, I have created the issue:
Got it. Just let me know if any assistance is needed from me. Thanks, - Hamlin
After a short look I think the only changes that are needed to get a region to be considered pinned in the g1 full gc is in There are two places in After that change everything should basically be still working. No guarantees though.
Thanks for the suggestion, will try it.
@Hamlin-Li This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 215 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
Thanks Stefan, have a great vacation!
I will run this through internal testing before approving.
if (live_words <= live_words_threshold) {
  return true;
}
// High live ratio region will not be compacted.
return false;
return live_words <= live_words_threshold should be sufficient here.
Thanks Thomas, sure, will hold off until your testing is finished.
gc/g1/TestEagerReclaimHumongousRegionsClearMarkBits.java fails with a fairly unknown error every few runs with this change:
There is a strong likelihood that this is a pre-existing issue and not directly caused by this change. I will need to investigate this.
The reason for this crash is that if there is a young region that is not compacted (because it's mostly full), its BOT (block offset table) is not updated to the extent the verification expects. I.e. that verification expects the BOT for old gen regions to be completely valid, from start to end of the region. For young regions this is not the case: their BOT is not updated at all (which is normal), just containing a marker indicating that there is no BOT; compaction would take care of this. The verification does not respect this "after this point the BOT is invalid" marker.

There is actually some code in Another solution would be to make the region appear as if filled with a single large allocation (TLAB). This is purely to keep the verification happy. All other code correctly handles a partially valid BOT (i.e. valid up to and including that mentioned marker).

I will try out these options.
Hi Thomas, thank you so much for helping investigate the issue. I will check it tomorrow too.
Here is a potential fix for this. This needs some discussion on whether an alternative would be better or not; more of them could be:
I think the first solution is better. Reasons are related to performance:
Thanks for the fix.
Do you mean we push this PR, then discuss the fix in another thread? If yes, I will file a bug to track the issue and start a discussion on that bug.
This change should go in after PR #3356, which fixes this issue. I think it is a change that is worth pointing out and discussing separately - as you might see from the long description.
I see, thanks Thomas!
@@ -95,11 +98,13 @@ class G1FullCollector : StackObj {
  G1FullGCCompactionPoint* serial_compaction_point() { return &_serial_compaction_point; }
  G1CMBitMap* mark_bitmap();
  ReferenceProcessor* reference_processor();
  size_t live_words(uint region_index) { return _live_stats[region_index]._live_words; }
Could you add a range check assertion for region_index? The extra spaces can be removed; no need to align with previous methods.
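The requested bounds check might look like the sketch below. The surrounding class and the _num_regions field are simplified stand-ins for the real collector, not the actual G1FullCollector members:

```cpp
#include <cassert>
#include <cstddef>

// Simplified model: _live_stats is a per-region array, so the accessor
// asserts the index is within the number of regions before dereferencing.
struct LiveStats { size_t _live_words; };

struct Collector {
  const LiveStats* _live_stats;
  unsigned _num_regions;  // assumed bound for the range check

  size_t live_words(unsigned region_index) const {
    assert(region_index < _num_regions && "region_index out of bounds");
    return _live_stats[region_index]._live_words;
  }
};
```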
@@ -171,7 +171,7 @@ class HeapRegion : public CHeapObj<mtGC> {
  // Update heap region that has been compacted to be consistent after Full GC.
  void reset_compacted_after_full_gc();
  // Update pinned heap region (not compacted) to be consistent after Full GC.
- void reset_pinned_after_full_gc();
+ void reset_not_compacted_after_full_gc();
Now that this version uses the existing "pinned" mechanism to skip high-live-ratio regions, this method can retain its original name/implementation, right?
That is true, full gc reuses the mechanism originally implemented for pinned regions. However "pinned" is something very specific in G1 context so I think it is better to use a different, more generic name. The regions that are not compacted (and not yet pinned) are not really temporarily pinned.
There has been an earlier discussion (not sure if here in this PR) to actually rename the use of "pinned" in G1 full gc to use "not compacted" too, resulting in a CR to rename this (in JDK-8264423).
However I was going to wait for this change to go in before sending out a PR.
Also this "not_compacted" name better matches the "compacted" method name above.
So I would prefer to keep this as is.
I see; agree.
- if (hr->is_free()) {
void G1FullCollector::update_attribute_table(HeapRegion* hr, bool force_pinned) {
  if (force_pinned) {
    _region_attr_table.set_pinned(hr->hrm_index());
    return;
  }
  if (hr->is_closed_archive()) {
    _region_attr_table.set_closed_archive(hr->hrm_index());
  } else if (hr->is_pinned()) {
Changing the condition to hr->is_pinned() || force_pinned is enough for this method, right?
Not exactly; please check the previous discussion at #2760 (comment).
I will merge the 2 conditions as you suggested.
@@ -225,15 +228,17 @@ void G1FullCollector::complete_collection() {
  _heap->print_heap_after_full_collection(scope()->heap_transition());
}

void G1FullCollector::update_attribute_table(HeapRegion* hr) {
  if (hr->is_free()) {
Another item that has been noted in a recent discussion with @albertnetymk is that with this change "Free" regions are also marked as normal in the table. It would be better to keep them as "Invalid". I.e. something like (incorporating @albertnetymk's other suggestion):
if (hr->is_free()) {
return;
} else if (hr->is_closed_archive(...) {
[...]
} else if (hr->is_pinned() || force_pinned) {
[...]
} else {
[...]
}
There is no real difference as "Free" regions should never be referenced anywhere and the code should assert elsewhere. It's still nice to also have "Free" regions as Invalid in that table though.
Not exactly; please check the previous discussion at #2760 (comment).
Some more investigation and discussion about the BOT handling showed that we need to update the BOT for these Survivor-turned-to-Old regions after all.
The reason is that, contrary to what I thought, while the BOT can handle queries for object start addresses above the "last known valid entry" (materialized in _next_offset_threshold and _next_offset_index) mentioned in PR #3356, it only does so slowly, and while it updates the BOT itself, it does not update that "last known valid entry".
So every time it queries for an object start in such regions, it starts walking from the bottom of that region.
See the call chain G1BlockOffsetTablePart::block_start -> forward_to_block_containing_addr -> forward_to_block_containing_addr_slow, where the call to alloc_block_work in g1BlockOffsetTable.cpp:236 only updates the local boundary and index (not _next_offset_threshold and _next_offset_index).
This is a problem for young gcs as this behavior will make them slower than expected. Since it is impossible to make the updates to both _next_offset_threshold and _next_offset_index atomic (and that is another issue anyway; feel free to file an issue), I would prefer to penalize full gc (that is, keep the old behavior) for that.
The alternative would be to just alloc_block the whole region (so that next_offset_index and friend are at the top), but I think given the rarity of this case and of full gc in general it is better to do the extra work in the full gc.
Could you add code that walks such full young regions and does the cross_threshold thing? This additional code certainly does not need to actually compact these regions.
Thanks,
Thomas
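The suggested BOT fix-up can be illustrated with a toy model. Everything below is a simplification for exposition (card size, table layout, names); it is not G1's actual G1BlockOffsetTable, only the idea: walk the live objects of a non-compacted region once and record each card crossing, so later block-start queries become direct lookups instead of walks from the region bottom.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy block offset table: the region is split into cards of CARD_WORDS
// words, and bot[c] records how many words before card c's first word the
// object spanning that boundary starts.
const size_t CARD_WORDS = 64;

// Walk the (start, size) pairs of a region's objects and record every card
// crossing, analogous to calling cross_threshold while scanning the region.
void update_bot_for_region(const std::vector<size_t>& object_starts,
                           const std::vector<size_t>& object_sizes,
                           std::vector<size_t>& bot) {
  for (size_t i = 0; i < object_starts.size(); ++i) {
    size_t start = object_starts[i];
    size_t end = start + object_sizes[i];
    for (size_t card = start / CARD_WORDS + 1; card * CARD_WORDS < end; ++card) {
      bot[card] = card * CARD_WORDS - start;  // back-offset to the object start
    }
  }
}

// With the table filled, the start of the object covering the first word of
// addr's card is a direct lookup (the real code then walks forward from
// there to the exact object) instead of a scan from the region bottom.
size_t block_start(const std::vector<size_t>& bot, size_t addr) {
  size_t card = addr / CARD_WORDS;
  return card * CARD_WORDS - bot[card];
}
```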
Thanks for the discussion! I had the same concern (the BOTs of these Survivor-turned-to-Old regions contain no useful info, so finding a block start in these regions might be very slow), but I was not sure if there is some "lazy" mechanism that fills in valid BOT info for these regions when they are accessed subsequently. I just did not have time to do further investigation; now I got the answer from you.
Sure, I will add code to fill in valid BOT info for these young regions by the end of full gc.
I'm not sure if it's OK for me to do this BOT filling in a separate issue? As we already had such a long discussion, I think it might be better for us to start another discussion for this specific follow-up issue. Please kindly let me know your thoughts.
It is fine with me to fix this separately.
There is some lazy mechanism to fill this BOT, but it does not work in this case (which is the gist of what I said above).
Note that if that were fixed, I think PR #3356 would not be needed - because then young region BOTs have the expected contents :) Your call.
Thanks Thomas!
I'm not sure if PR #3356 is still necessary. But sure, I will add code to fill the BOTs for "young" regions as a follow-up fix/enhancement of this issue; I have just created a bug to track it: https://bugs.openjdk.java.net/browse/JDK-8264987.
So after this PR is approved and integrated I will open the PR for JDK-8264987.
@tschatzl Hi Thomas, I'm not sure if I'm ready to integrate this patch; would you mind helping to confirm, or giving some further comments? Thanks
Ready to integrate.
Thanks @tschatzl @kstefanj @albertnetymk for reviewing!
/integrate
@Hamlin-Li Since your change was applied there have been 218 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit be0d46c. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Summary
Improve G1 Full GC by skipping compaction for regions with a high survival ratio.
Background
There are 4 steps in the full gc of G1 GC.
When full gc occurs, there may be a very high percentage of live bytes in some regions. For these regions it's not efficient to compact them, and better to skip them, as there is little space to save but many objects to copy.
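The trade-off behind skipping can be made concrete with a back-of-the-envelope sketch. The region size and threshold below are illustrative numbers, not G1's actual defaults:

```cpp
#include <cassert>
#include <cstddef>

// With a 32 MB region that is 97% live, compacting copies ~31 MB of
// objects to reclaim ~1 MB: lots of copying work for little space.
const size_t REGION_BYTES = 32 * 1024 * 1024;  // assumed region size

size_t bytes_copied_if_compacted(size_t live_bytes) { return live_bytes; }
size_t bytes_reclaimed_if_compacted(size_t live_bytes) { return REGION_BYTES - live_bytes; }

// Skip regions whose live ratio exceeds the threshold percentage.
bool skip_compaction(size_t live_bytes, size_t live_percent_threshold) {
  return live_bytes * 100 > REGION_BYTES * live_percent_threshold;
}
```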
Description
We enhance the full gc implementation for the above situation through the following steps:
VM options related
Test
$ java -Xmx1g -Xms1g -XX:ParallelGCThreads=4 -Xlog:gc*=info:file=gc.log -jar dacapo-9.12-bach.jar --iterations 5 --size huge --no-pre-iteration-gc h2
Progress
Issue
Reviewers
Contributors
<mashoubing1@huawei.com>
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/2760/head:pull/2760
$ git checkout pull/2760
Update a local copy of the PR:
$ git checkout pull/2760
$ git pull https://git.openjdk.java.net/jdk pull/2760/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 2760
View PR using the GUI difftool:
$ git pr show -t 2760
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/2760.diff