Skip to content

8276098: Do precise BOT updates in G1 evacuation phase #6166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Oct 29, 2021

Please review this change to do precise BOT updates in the G1 evacuation phase.

Summary
In G1 young collections the BOT is updated for objects copied to old generation regions. Prior to this fix the BOT updates are very crude and only done for each new PLAB and for direct allocations (large allocation outside the PLABs).

The BOT is then updated to be more precise during concurrent refinement and when scanning the heap in later GCs. This leads to both more time spent doing concurrent refinement as well as prolonged "scan heap" phases in the following GCs.

With this change we instead update the BOT to be complete and precise while doing the copy. This way we can reduce the time in the following phases quite significantly. This comes with a slight regression in object copy times, but from my measurements the overall gain is worth the complexity and extra time spent in object copy.

Doing this more precise BOT updating requires us to not rely on a global threshold for updating the BOT but instead calculate where the updates are done, this allows us to remove a lock in the old generation allocation path which is only present to guard this threshold. So with this change we can remove the different allocation paths used for young and old regions.

Testing
All testing look good:

  • Mach5 tier1-5
  • Local stress testing
  • Performance testing and pause time comparisons

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8276098: Do precise BOT updates in G1 evacuation phase

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6166/head:pull/6166
$ git checkout pull/6166

Update a local copy of the PR:
$ git checkout pull/6166
$ git pull https://git.openjdk.java.net/jdk pull/6166/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6166

View PR using the GUI difftool:
$ git pr show -t 6166

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6166.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 29, 2021

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 29, 2021
@openjdk
Copy link

openjdk bot commented Oct 29, 2021

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Oct 29, 2021
@mlbridge
Copy link

mlbridge bot commented Oct 29, 2021

Webrevs

assert(_bot_updates, "must only be called for regions doing BOT updates");
_alloc_region->update_bot_at(addr, size);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I ask why do we need to update for the filler objects? Because we are not scanning them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. You are correct in that we never need to scan these parts because of dirty cards, but during remembered set rebuilding (see: G1RebuildRemSetHeapRegionClosure::rebuild_rem_set_in_region(...)) we include the whole region when looking for references to other heap regions.

There might be some good way to avoid scanning those parts during rebuild, but such investigation is out of scope for this PR.

Thanks for reviewing 😄

@linade
Copy link
Contributor

linade commented Nov 1, 2021

Another thought is maybe G1BlockOffsetTable::set_offset_array/_raw don't need to be conservatively ordered, that is, they can be memory_order_relaxed. Since now all updates are in the pause and there is no concurrent readers.

@kstefanj
Copy link
Contributor Author

kstefanj commented Nov 1, 2021

Another thought is maybe G1BlockOffsetTable::set_offset_array/_raw don't need to be conservatively ordered, that is, they can be memory_order_relaxed. Since now all updates are in the pause and there is no concurrent readers.

Another good point. I think such change should also be done as a separate PR and we should probably also look at hardening the code to not allow updates because the BOT is not precise (because it should be). I will file issues for these two things.

@kstefanj
Copy link
Contributor Author

kstefanj commented Nov 1, 2021

Another thought is maybe G1BlockOffsetTable::set_offset_array/_raw don't need to be conservatively ordered, that is, they can be memory_order_relaxed. Since now all updates are in the pause and there is no concurrent readers.

Took a closer look at G1BlockOffsetTable::set_offset_array/_raw, it uses Atomic::store(...) and if I'm not mistaken that will not be "conservatively ordered".

@linade
Copy link
Contributor

linade commented Nov 2, 2021

Another thought is maybe G1BlockOffsetTable::set_offset_array/_raw don't need to be conservatively ordered, that is, they can be memory_order_relaxed. Since now all updates are in the pause and there is no concurrent readers.

Took a closer look at G1BlockOffsetTable::set_offset_array/_raw, it uses Atomic::store(...) and if I'm not mistaken that will not be "conservatively ordered".

You are right, it's not conservatively ordered. My bad : )

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a first cut of ideas to improve the change a bit - I really like it from a technical point of view. However it feels kind of patched in instead of some part of a whole.

Particularly one issue I would like to discuss in detail is that we now do bot updates at different levels (G1PLABAllocator, G1AllocRegion). The suggestion to move the BOT update for the waste into G1PLAB::retire_internal could fix that though. Could you look into this refactoring suggestion in more detail please?

Other comments may be outdated if/after this change.

HeapRegion* _bot_plab_region;
// Current BOT threshold, a PLAB allocation crossing this threshold will cause a BOT
// update.
HeapWord* _bot_plab_threshold;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also only interesting for the old generation, isn't it? Same issue as above.

@@ -133,4 +133,60 @@ inline HeapWord* G1PLABAllocator::allocate(G1HeapRegionAttr dest,
return allocate_direct_or_new_plab(dest, word_sz, refill_failed, node_index);
}

inline void G1PLABAllocator::update_bot_for_plab_waste(G1HeapRegionAttr attr, G1PLAB* plab) {
if (!attr.is_old()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to make an extra predicate like needs_bot_update() for this instead of explictly replicating the code in a few places (and always adding the comment).

G1PLAB(size_t word_sz);
bool is_allocated();
HeapWord* get_filler();
size_t get_filler_size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead of two methods that return the remaining space, one that returns a MemRegion would be nicer? Also call it something like remainder or so.

What about making PLAB::retire_internal virtual and override here, so that the explicit call in G1PLABAllocator::allocate_direct_or_new_plab goes away? (and these two helpers, and probably also is_allocated()).

Also the explict call in G1PLABAllocator::flush_and_retire_stats could maybe be hidden too. You could store the G1PLABAllocator in G1PLAB so that it can call the update_bot.... method in retire_internal.

@@ -498,6 +498,9 @@ oop G1ParScanThreadState::do_copy_to_survivor_space(G1HeapRegionAttr const regio
obj->incr_age();
}
_age_table.add(age, word_sz);
} else {
assert(dest_attr.is_old(), "Only update bot for allocations in old");
_plab_allocator->update_bot_for_object(obj_ptr, word_sz);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be good if the PLABAllocator returned a struct instead of just a pointer that contains

  • the HeapWord
  • word_sz
  • whether it was a direct allocation
    ?

Then this struct could be passed in here again instead of the code for update_bot_for_object trying to reconstruct whether it has been an out-of-plab allocation or not.

Or just skip the word_sz in that struct. Alternatively add a return value whether this allocation has been out-of-plab. But this method already has lots of locals (is too long?), so starting to group them might be a good idea.

I think explicitly carrying this information around would be much much cleaner to understand than trying to reconstruct that information later in `update_bot_for_object´ via

  if (!alloc_buffer(G1HeapRegionAttr::Old, 0)->contains(obj_start)) {
    // Out of PLAB allocation, BOT already updated.
    return;
  }

@@ -156,14 +166,27 @@ class G1PLABAllocator : public CHeapObj<mtGC> {
G1CollectedHeap* _g1h;
G1Allocator* _allocator;

PLAB** _alloc_buffers[G1HeapRegionAttr::Num];
// Region where the current old generation PLAB is allocated. Used to do BOT updates.
HeapRegion* _bot_plab_region;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a breakage of abstraction if we only store this information for old gen - we after all allocate the G1PLAB for all G1HeapRegionAttr::num destination areas.
What about putting all that stuff into G1PLAB and initializing it appropriately?

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Some minor naming/method location suggestions.

@@ -198,6 +217,9 @@ class G1PLABAllocator : public CHeapObj<mtGC> {
bool* refill_failed,
uint node_index);

// Update the BOT for the last PLAB allocation.
inline void update_bot_for_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
inline void update_bot_for_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);
inline void update_bot_for_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);
Suggested change
inline void update_bot_for_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);
inline void update_bot_for_plab_allocation(G1HeapRegionAttr dest, size_t word_sz, uint node_index);

I would explicitly call this out as to be used for PLAB allocation. If changed, obviously needs updates to the callers as well.

@@ -103,6 +103,7 @@ struct G1HeapRegionAttr {
bool is_young() const { return type() == Young; }
bool is_old() const { return type() == Old; }
bool is_optional() const { return type() == Optional; }
bool needs_bot_update() const { return is_old(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that predicate needs to be here, I'd probably just add a method to G1PLABAllocator. But it is fine to me.

@openjdk
Copy link

openjdk bot commented Nov 8, 2021

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8276098: Do precise BOT updates in G1 evacuation phase

Reviewed-by: tschatzl, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 204 new commits pushed to the master branch:

  • 8747882: 8276790: Rename GenericCDSFileMapHeader::_base_archive_path_offset
  • 38e6d5d: 8276677: Malformed Javadoc inline tags in JDK source in javax/net/ssl
  • a7dedb5: 8276772: Refine javax.lang.model docs
  • 14d66bd: 8276654: element-list order is non deterministic
  • 905e3e8: 8231490: Ugly racy writes to ZipUtils.defaultBuf
  • e383d26: 8275199: Bogus warning generated for serializable records
  • 7e73bca: 8276408: Deprecate Runtime.exec methods with a single string command line argument
  • 75adf54: 8276306: jdk/jshell/CustomInputToolBuilder.java fails intermittently on storage acquisition
  • 7320b77: 8276548: Use range based visitor for Howl-Full cards
  • ea23e73: 8249004: Reduce ThreadsListHandle overhead in relation to direct handshakes
  • ... and 194 more: https://git.openjdk.java.net/jdk/compare/c94dc2ab60f0548afa868e41a0b87a68030e0cac...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 8, 2021
@kstefanj
Copy link
Contributor Author

kstefanj commented Nov 8, 2021

Me and @tschatzl have discussed this offline and the above changes are the outcome of that. I will redo testing and if anything doesn't look good I'll update the PR.

To summarize the changes a bit:

  • The new G1 PLAB is changed to hold the region and threshold information and is only used for old generation PLABs.
  • BOT updates for PLAB and region waste is handled by making sure any filler objects get updates if in old regions.
  • No need to filter direct allocations when updating BOT for PLAB allocation since a direct allocation will never alter the PLAB _top.

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even better :)

Copy link
Member

@albertnetymk albertnetymk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments/suggestions.

Comment on lines 303 to 308
if (state == G1HeapRegionAttr::Old) {
// Specialized PLABs for old that handle BOT updates for object allocations.
_alloc_buffers[state][node_index] = new G1BotUpdatingPLAB(_g1h->desired_plab_sz(state));
} else {
_alloc_buffers[state][node_index] = new PLAB(_g1h->desired_plab_sz(state));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ternary operator can be used here:

word_sz = _g1h->desired_plab_sz(state);
// ...
_alloc_buffers[state][node_index] = (state == G1HeapRegionAttr::Old)
                                  ? new G1BotUpdatingPLAB(word_sz)
                                  : new PLAB(word_sz);

Comment on lines 242 to 248
assert(_bot_part.threshold_for_addr(addr) >= addr,
"threshold must be at or after given address. " PTR_FORMAT " >= " PTR_FORMAT,
p2i(_bot_part.threshold_for_addr(addr)), p2i(addr));
assert(is_old(),
"Should only calculate BOT threshold for old regions. addr: " PTR_FORMAT " region:" HR_FORMAT,
p2i(addr), HR_FORMAT_PARAMS(this));
return _bot_part.threshold_for_addr(addr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling _bot_part.threshold_for_addr multiple times, storing the result in a local var could be cleaner.

@@ -119,7 +119,7 @@ class PLAB: public CHeapObj<mtGC> {
}

// Sets the space of the buffer to be [buf, space+word_sz()).
void set_buf(HeapWord* buf, size_t new_word_sz) {
virtual void set_buf(HeapWord* buf, size_t new_word_sz) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if override is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the base class so this is not an override but declaring the function as virtual.

public:
G1BotUpdatingPLAB(size_t word_sz) : PLAB(word_sz) { }
// Sets the new PLAB buffer as well as updates the threshold and region.
virtual void set_buf(HeapWord* buf, size_t word_sz);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is override better here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change this to override if you prefer. I see that we have some uses of it in G1 already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please, since it makes the intention explicit and enforces it in compile-time.

@linade
Copy link
Contributor

linade commented Nov 9, 2021

First of all, pause time looks good from my test~

I was thinking that forward_to_block_containing_addr_slow should never be reached now, since all entries are now precise. However, when I add ShouldNotReachHere before forward_to_block_containing_addr_slow in G1BlockOffsetTablePart::forward_to_block_containing_addr I noticed that there are some exceptional cases and in these cases n always equals to addr. Not sure if this is expected? I'll continue looking, just want to report this very quickly.

@linade
Copy link
Contributor

linade commented Nov 9, 2021

I was thinking that forward_to_block_containing_addr_slow should never be reached now, since all entries are now precise. However, when I add ShouldNotReachHere before forward_to_block_containing_addr_slow in G1BlockOffsetTablePart::forward_to_block_containing_addr I noticed that there are some exceptional cases and in these cases n always equals to addr. Not sure if this is expected? I'll continue looking, just want to report this very quickly.

Okay, it's actually not relevant to this patch. G1ScanHRForRegionClosure::do_claimed_block has made an unaligned request to BOT because of _scanned_to. That is, _scanned_to is not aligned on card boundary. And this causes a slow path. I think there might be an easy way to prevent slow path, e.g., make a special case for unaligned mr.start() in HeapRegion::oops_on_memregion_seq_iterate_careful. But that might be another topic.

@tschatzl
Copy link
Contributor

tschatzl commented Nov 9, 2021

Okay, it's actually not relevant to this patch. G1ScanHRForRegionClosure::do_claimed_block has made an unaligned request to BOT because of _scanned_to. That is, _scanned_to is not aligned on card boundary. And this causes a slow path. I think there might be an easy way to prevent slow path, e.g., make a special case for unaligned mr.start() in HeapRegion::oops_on_memregion_seq_iterate_careful. But that might be another topic.

In this case _scanned_to points to a valid object anyway; from just reading the code, possibly the block_start() call in HeapRegion::oops_on_memregion_seq_iterate_careful shouldn't be done in this case. Maybe if the caller of HeapRegion::oops_on_memregion_seq_iterate_careful already finds the necessary start.

Or block_start could be changed to block_start_const in this case.

However this is a different issue I think.

@kstefanj
Copy link
Contributor Author

kstefanj commented Nov 9, 2021

Okay, it's actually not relevant to this patch. G1ScanHRForRegionClosure::do_claimed_block has made an unaligned request to BOT because of _scanned_to. That is, _scanned_to is not aligned on card boundary. And this causes a slow path. I think there might be an easy way to prevent slow path, e.g., make a special case for unaligned mr.start() in HeapRegion::oops_on_memregion_seq_iterate_careful. But that might be another topic.

In this case _scanned_to points to a valid object anyway; from just reading the code, possibly the block_start() call in HeapRegion::oops_on_memregion_seq_iterate_careful shouldn't be done in this case. Maybe if the caller of HeapRegion::oops_on_memregion_seq_iterate_careful already finds the necessary start.

Or block_start could be changed to block_start_const in this case.

However this is a different issue I think.

I've created JDK-8276229 and I plan to open a PR for this once this issue has been resolved.

I also noticed that we sometimes still enter forward_to_block_containing_addr_slow and my solution is (as Thomas suggest) to use the "const" versions of the functions, but renamed without "const".

@linade
Copy link
Contributor

linade commented Nov 9, 2021

Okay, thanks for clarifying :)

@tschatzl
Copy link
Contributor

tschatzl commented Nov 9, 2021

As mentioned, in this particular case even the block_start_const call would be unnecessary (in oops_on_memregion_seq_iterate_careful), but that's just another optimization specially for walking through the cards during scanning.

@kstefanj
Copy link
Contributor Author

kstefanj commented Nov 9, 2021

@linade, also many thanks for verifying that the pause times look good on your side as well.

If you want to get credited as a reviewer for this change you need to change your review response to be approved.

Copy link
Member

@albertnetymk albertnetymk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to understand what G1BlockOffsetTablePart::threshold_for_addr does is essentially align_up. Maybe another PR to make the intention more explicit.

@linade
Copy link
Contributor

linade commented Nov 9, 2021

/approve

@openjdk
Copy link

openjdk bot commented Nov 9, 2021

@linade Unknown command approve - for a list of valid commands use /help.

@kstefanj
Copy link
Contributor Author

kstefanj commented Nov 9, 2021

Thanks @linade, @albertnetymk and @tschatzl for the reviews.

My additional testing all looks good so I'll integrate straight away.

/integrate

@openjdk
Copy link

openjdk bot commented Nov 9, 2021

Going to push as commit 945f408.
Since your change was applied there have been 204 commits pushed to the master branch:

  • 8747882: 8276790: Rename GenericCDSFileMapHeader::_base_archive_path_offset
  • 38e6d5d: 8276677: Malformed Javadoc inline tags in JDK source in javax/net/ssl
  • a7dedb5: 8276772: Refine javax.lang.model docs
  • 14d66bd: 8276654: element-list order is non deterministic
  • 905e3e8: 8231490: Ugly racy writes to ZipUtils.defaultBuf
  • e383d26: 8275199: Bogus warning generated for serializable records
  • 7e73bca: 8276408: Deprecate Runtime.exec methods with a single string command line argument
  • 75adf54: 8276306: jdk/jshell/CustomInputToolBuilder.java fails intermittently on storage acquisition
  • 7320b77: 8276548: Use range based visitor for Howl-Full cards
  • ea23e73: 8249004: Reduce ThreadsListHandle overhead in relation to direct handshakes
  • ... and 194 more: https://git.openjdk.java.net/jdk/compare/c94dc2ab60f0548afa868e41a0b87a68030e0cac...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Nov 9, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 9, 2021
@openjdk
Copy link

openjdk bot commented Nov 9, 2021

@kstefanj Pushed as commit 945f408.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants