8276098: Do precise BOT updates in G1 evacuation phase #6166
  assert(_bot_updates, "must only be called for regions doing BOT updates");
  _alloc_region->update_bot_at(addr, size);
}
May I ask why we need to update the BOT for the filler objects? We are not scanning them.
Good question. You are correct in that we never need to scan these parts because of dirty cards, but during remembered set rebuilding (see: G1RebuildRemSetHeapRegionClosure::rebuild_rem_set_in_region(...)) we include the whole region when looking for references to other heap regions.
There might be some good way to avoid scanning those parts during rebuild, but such investigation is out of scope for this PR.
Thanks for reviewing 😄
Another thought is maybe |
Another good point. I think such change should also be done as a separate PR and we should probably also look at hardening the code to not allow updates because the BOT is not precise (because it should be). I will file issues for these two things.
Took a closer look at |
You are right, it's not conservatively ordered. My bad :)
This is a first cut of ideas to improve the change a bit - I really like it from a technical point of view. However it feels kind of patched in instead of some part of a whole.
Particularly one issue I would like to discuss in detail is that we now do BOT updates at different levels (G1PLABAllocator, G1AllocRegion). The suggestion to move the BOT update for the waste into G1PLAB::retire_internal could fix that though. Could you look into this refactoring suggestion in more detail please?
Other comments may be outdated if/after this change.
  HeapRegion* _bot_plab_region;
  // Current BOT threshold, a PLAB allocation crossing this threshold will cause a BOT
  // update.
  HeapWord* _bot_plab_threshold;
This is also only interesting for the old generation, isn't it? Same issue as above.
@@ -133,4 +133,60 @@ inline HeapWord* G1PLABAllocator::allocate(G1HeapRegionAttr dest,
  return allocate_direct_or_new_plab(dest, word_sz, refill_failed, node_index);
}

inline void G1PLABAllocator::update_bot_for_plab_waste(G1HeapRegionAttr attr, G1PLAB* plab) {
  if (!attr.is_old()) {
I would prefer to make an extra predicate like needs_bot_update() for this instead of explicitly replicating the code in a few places (and always adding the comment).
  G1PLAB(size_t word_sz);
  bool is_allocated();
  HeapWord* get_filler();
  size_t get_filler_size();
Maybe instead of two methods that return the remaining space, one that returns a MemRegion would be nicer? Also call it something like remainder or so.
What about making PLAB::retire_internal virtual and overriding it here, so that the explicit call in G1PLABAllocator::allocate_direct_or_new_plab goes away (and these two helpers, and probably also is_allocated())?
Also the explicit call in G1PLABAllocator::flush_and_retire_stats could maybe be hidden too. You could store the G1PLABAllocator in G1PLAB so that it can call the update_bot.... method in retire_internal.
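A minimal sketch of the single-accessor idea from the comment above. RegionSketch is a stand-in for HotSpot's MemRegion, and PlabSketch, its fields, and remainder() are hypothetical names used only for illustration; word indices replace real heap addresses.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a half-open word range [start, end); not HotSpot's MemRegion.
struct RegionSketch {
  std::size_t start;
  std::size_t end;
  std::size_t word_size() const { return end - start; }
  bool is_empty() const { return start == end; }
};

// Hypothetical PLAB: [bottom, top) is allocated, [top, end) is unused.
struct PlabSketch {
  std::size_t bottom;
  std::size_t top;
  std::size_t end;

  // One accessor replaces the get_filler() / get_filler_size() pair.
  RegionSketch remainder() const { return RegionSketch{top, end}; }
};
```

With one return value, a caller that fills the unused tail and updates the BOT for it only has to check is_empty() and pass the range on.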
@@ -498,6 +498,9 @@ oop G1ParScanThreadState::do_copy_to_survivor_space(G1HeapRegionAttr const regio
        obj->incr_age();
      }
      _age_table.add(age, word_sz);
    } else {
      assert(dest_attr.is_old(), "Only update bot for allocations in old");
      _plab_allocator->update_bot_for_object(obj_ptr, word_sz);
Maybe it would be good if the PLABAllocator returned a struct instead of just a pointer, containing:
- the HeapWord
- word_sz
- whether it was a direct allocation

Then this struct could be passed in here again instead of the code for update_bot_for_object trying to reconstruct whether it has been an out-of-PLAB allocation or not.
Or just skip the word_sz in that struct. Alternatively, add a return value indicating whether this allocation has been out-of-PLAB. But this method already has lots of locals (is it too long?), so starting to group them might be a good idea.
I think explicitly carrying this information around would be much cleaner to understand than trying to reconstruct that information later in update_bot_for_object via
  if (!alloc_buffer(G1HeapRegionAttr::Old, 0)->contains(obj_start)) {
    // Out of PLAB allocation, BOT already updated.
    return;
  }
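The struct-based approach suggested above could look roughly like the following. This is a sketch only: AllocationResultSketch, PlabAllocatorSketch, the Word alias, and the fixed buffer sizes are assumptions made for illustration and are not code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

using Word = std::uintptr_t;  // stand-in for HeapWord

// Hypothetical result type carrying the facts the caller needs later.
struct AllocationResultSketch {
  Word* addr;           // start of the allocation, nullptr on failure
  std::size_t word_sz;  // size of the allocation in words
  bool direct;          // true if allocated outside the PLAB
};

class PlabAllocatorSketch {
 public:
  AllocationResultSketch allocate(std::size_t word_sz) {
    if (word_sz <= kPlabWords - _plab_used) {
      AllocationResultSketch result{_plab + _plab_used, word_sz, false};
      _plab_used += word_sz;
      return result;
    }
    if (word_sz <= kDirectWords - _direct_used) {
      // Direct allocation: the BOT is assumed to be updated on this path.
      AllocationResultSketch result{_direct + _direct_used, word_sz, true};
      _direct_used += word_sz;
      return result;
    }
    return AllocationResultSketch{nullptr, word_sz, false};
  }

  // The caller hands the result back, so nothing has to be reconstructed.
  void update_bot_for_object(const AllocationResultSketch& result) {
    if (result.addr == nullptr || result.direct) {
      return;  // failed, or a direct allocation that is already covered
    }
    _bot_updates++;  // placeholder for the real per-object BOT update
  }

  int bot_updates() const { return _bot_updates; }

 private:
  static constexpr std::size_t kPlabWords = 8;     // assumed PLAB size
  static constexpr std::size_t kDirectWords = 64;  // assumed direct area size
  Word _plab[kPlabWords] = {};
  Word _direct[kDirectWords] = {};
  std::size_t _plab_used = 0;
  std::size_t _direct_used = 0;
  int _bot_updates = 0;
};
```

Compared with a contains() check against the current PLAB, the flag stays correct even if the PLAB has been refilled between the allocation and the BOT update.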
@@ -156,14 +166,27 @@ class G1PLABAllocator : public CHeapObj<mtGC> {
  G1CollectedHeap* _g1h;
  G1Allocator* _allocator;

  PLAB** _alloc_buffers[G1HeapRegionAttr::Num];
  // Region where the current old generation PLAB is allocated. Used to do BOT updates.
  HeapRegion* _bot_plab_region;
I think it breaks the abstraction if we only store this information for old gen; after all, we allocate the G1PLAB for all G1HeapRegionAttr::Num destination areas.
What about putting all that stuff into G1PLAB and initializing it appropriately?
Lgtm. Some minor naming/method location suggestions.
@@ -198,6 +217,9 @@ class G1PLABAllocator : public CHeapObj<mtGC> {
                                            bool* refill_failed,
                                            uint node_index);

  // Update the BOT for the last PLAB allocation.
  inline void update_bot_for_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);
Suggested change:
-  inline void update_bot_for_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);
+  inline void update_bot_for_plab_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);
I would explicitly call this out as to be used for PLAB allocation. If changed, obviously needs updates to the callers as well.
@@ -103,6 +103,7 @@ struct G1HeapRegionAttr {
  bool is_young() const { return type() == Young; }
  bool is_old() const { return type() == Old; }
  bool is_optional() const { return type() == Optional; }
  bool needs_bot_update() const { return is_old(); }
Not sure if that predicate needs to be here, I'd probably just add a method to G1PLABAllocator. But it is fine to me.
@kstefanj This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 204 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@tschatzl and I have discussed this offline and the above changes are the outcome of that. I will redo testing and if anything doesn't look good I'll update the PR. To summarize the changes a bit:
Even better :)
Just some minor comments/suggestions.
      if (state == G1HeapRegionAttr::Old) {
        // Specialized PLABs for old that handle BOT updates for object allocations.
        _alloc_buffers[state][node_index] = new G1BotUpdatingPLAB(_g1h->desired_plab_sz(state));
      } else {
        _alloc_buffers[state][node_index] = new PLAB(_g1h->desired_plab_sz(state));
      }
I think the ternary operator can be used here:
  const size_t word_sz = _g1h->desired_plab_sz(state);
  // ...
  _alloc_buffers[state][node_index] = (state == G1HeapRegionAttr::Old)
      ? new G1BotUpdatingPLAB(word_sz)
      : new PLAB(word_sz);
  assert(_bot_part.threshold_for_addr(addr) >= addr,
         "threshold must be at or after given address. " PTR_FORMAT " >= " PTR_FORMAT,
         p2i(_bot_part.threshold_for_addr(addr)), p2i(addr));
  assert(is_old(),
         "Should only calculate BOT threshold for old regions. addr: " PTR_FORMAT " region:" HR_FORMAT,
         p2i(addr), HR_FORMAT_PARAMS(this));
  return _bot_part.threshold_for_addr(addr);
Instead of calling _bot_part.threshold_for_addr multiple times, storing the result in a local var could be cleaner.
@@ -119,7 +119,7 @@ class PLAB: public CHeapObj<mtGC> {
   }

   // Sets the space of the buffer to be [buf, space+word_sz()).
-  void set_buf(HeapWord* buf, size_t new_word_sz) {
+  virtual void set_buf(HeapWord* buf, size_t new_word_sz) {
I wonder if override is better.
This is the base class, so this is not an override; it declares the function as virtual.
public:
  G1BotUpdatingPLAB(size_t word_sz) : PLAB(word_sz) { }
  // Sets the new PLAB buffer as well as updates the threshold and region.
  virtual void set_buf(HeapWord* buf, size_t word_sz);
Is override better here?
I can change this to override if you prefer. I see that we have some uses of it in G1 already.
Yes please, since it makes the intention explicit and enforces it at compile time.
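To illustrate the virtual versus override distinction discussed in this thread, here is a small standalone sketch. BufferSketch and BotUpdatingBufferSketch are hypothetical stand-ins for PLAB and G1BotUpdatingPLAB, and the counter is only a placeholder for the real threshold and region update.

```cpp
#include <cassert>
#include <cstddef>

struct BufferSketch {
  virtual ~BufferSketch() = default;

  // Base class: declares the function virtual; there is nothing to override.
  virtual void set_buf(char* buf, std::size_t word_sz) {
    _buf = buf;
    _word_sz = word_sz;
  }

  char* _buf = nullptr;
  std::size_t _word_sz = 0;
};

struct BotUpdatingBufferSketch : BufferSketch {
  // override: compilation fails if no base virtual has this exact signature,
  // for example if the second parameter were accidentally declared as int.
  void set_buf(char* buf, std::size_t word_sz) override {
    BufferSketch::set_buf(buf, word_sz);
    _threshold_updates++;  // placeholder for updating threshold and region
  }

  int _threshold_updates = 0;
};

// Calls through the base type still reach the derived implementation.
inline void refill(BufferSketch& plab, char* buf, std::size_t word_sz) {
  plab.set_buf(buf, word_sz);
}
```

Without override, a signature mismatch in the derived class would silently declare a new, unrelated function and calls through the base type would skip the BOT bookkeeping.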
First of all, pause time looks good from my test~ I was thinking that |
Okay, it's actually not relevant to this patch. |
In this case Or However this is a different issue I think. |
I've created JDK-8276229 and I plan to open a PR for this once this issue has been resolved. I also noticed that we sometimes still enter |
Okay, thanks for clarifying :) |
As mentioned, in this particular case even the |
@linade, also many thanks for verifying that the pause times look good on your side as well. If you want to get credited as a reviewer for this change you need to change your review response to be approved.
It took me a while to understand that what G1BlockOffsetTablePart::threshold_for_addr does is essentially align_up. Maybe another PR could make the intention more explicit.
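As a rough illustration of that observation, the threshold for an address can be written as rounding up to the next card boundary. The function names below are hypothetical, and the 512-byte card size is an assumption for the sketch rather than a value taken from this PR.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kCardBytes = 512;  // assumed card size; must be a power of two

// Round value up to the next multiple of alignment (a power of two).
constexpr std::uintptr_t align_up_sketch(std::uintptr_t value, std::uintptr_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// The threshold for an address is the first card boundary at or after it.
constexpr std::uintptr_t threshold_for_addr_sketch(std::uintptr_t addr) {
  return align_up_sketch(addr, kCardBytes);
}
```

An address that already sits on a card boundary is its own threshold, which matches the "at or after given address" assertion quoted earlier in the review.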
/approve |
@linade Unknown command |
Thanks @linade, @albertnetymk and @tschatzl for the reviews. My additional testing all looks good so I'll integrate straight away. /integrate |
Going to push as commit 945f408.
Your commit was automatically rebased without conflicts. |
Please review this change to do precise BOT updates in the G1 evacuation phase.
Summary
In G1 young collections the BOT is updated for objects copied to old generation regions. Prior to this fix the BOT updates are very crude and only done for each new PLAB and for direct allocations (large allocations outside the PLABs).
The BOT is then updated to be more precise during concurrent refinement and when scanning the heap in later GCs. This leads to both more time spent doing concurrent refinement as well as prolonged "scan heap" phases in the following GCs.
With this change we instead update the BOT to be complete and precise while doing the copy. This way we can reduce the time in the following phases quite significantly. This comes with a slight regression in object copy times, but from my measurements the overall gain is worth the complexity and extra time spent in object copy.
Doing this more precise BOT updating requires us to not rely on a global threshold for updating the BOT, but instead to calculate where the updates are done. This allows us to remove a lock in the old generation allocation path that is only present to guard this threshold. So with this change we can remove the different allocation paths used for young and old regions.
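The idea in this summary can be sketched with a heavily simplified table. SimpleBot and update_if_crossing are hypothetical names; addresses are word indices within one region, the card size of 64 words is an assumption, and entries are stored as plain offsets rather than the compact encoding the real BOT uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified block offset table: for each card, how many words back from the
// card start the block covering the card's first word begins.
class SimpleBot {
 public:
  static constexpr std::size_t kCardWords = 64;  // assumed card size in words

  explicit SimpleBot(std::size_t region_words)
      : _back_offset((region_words + kCardWords - 1) / kCardWords, 0) {}

  // First card boundary at or after addr, computed locally per allocation.
  static std::size_t threshold_for(std::size_t addr) {
    return (addr + kCardWords - 1) / kCardWords * kCardWords;
  }

  // Record the block [start, start + size) for every card boundary it covers.
  void update_for_block(std::size_t start, std::size_t size) {
    const std::size_t end = start + size;
    for (std::size_t b = threshold_for(start); b < end; b += kCardWords) {
      _back_offset[b / kCardWords] = b - start;
    }
  }

  // Start of the block that covers the first word of the card containing addr.
  std::size_t block_start_for_card_of(std::size_t addr) const {
    const std::size_t card_start = addr / kCardWords * kCardWords;
    return card_start - _back_offset[addr / kCardWords];
  }

 private:
  std::vector<std::size_t> _back_offset;
};

// Copy-time update: the table is touched only when the copied object reaches
// past the next card boundary, so no shared threshold has to be guarded.
inline bool update_if_crossing(SimpleBot& bot, std::size_t start, std::size_t size) {
  if (start + size <= SimpleBot::threshold_for(start)) {
    return false;  // the object covers no card boundary: nothing to record
  }
  bot.update_for_block(start, size);
  return true;
}
```

Because the boundary is derived from the allocation address itself, each copying thread can decide on its own whether an update is needed, which is the property that makes a lock around a shared threshold unnecessary in this sketch.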
Testing
All testing looks good:
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6166/head:pull/6166
$ git checkout pull/6166
Update a local copy of the PR:
$ git checkout pull/6166
$ git pull https://git.openjdk.java.net/jdk pull/6166/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6166
View PR using the GUI difftool:
$ git pr show -t 6166
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6166.diff