8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc #2309

Closed
wants to merge 6 commits into from

@kimbarrett kimbarrett commented Jan 29, 2021

Please review this change to ParallelGC to avoid unnecessary full GCs when
concurrent threads attempt oldgen allocations during evacuation.

When a GC thread fails an oldgen allocation it expands the heap and retries
the allocation. If the second allocation attempt fails then allocation
failure is reported to the caller, which can lead to a full GC. But the
retried allocation could fail because, after expansion, some other thread
allocated enough of the available space that the retry fails. This can
happen even though there is plenty of space available, if only that retry
were to perform another expansion.

Rather than trying to combine the allocation retry with the expansion (it's
not clear there's a way to do so without breaking invariants), we instead
simply loop on the allocation attempt + expand, until either the allocation
succeeds or the expand fails. If some other thread "steals" space from the
expanding thread and causes its next allocation attempt to fail and do
another expansion, that's functionally no different from the expanding
thread succeeding and causing the other thread to fail allocation and do the
expand instead.

This change includes modifying PSOldGen::expand_to_reserved to return false
when there is no space available, where it previously returned true. It's
not clear why it returned true; that seems wrong, but was harmless. But it
must not do so with the new looping behavior for allocation, else it would
never terminate.
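
The loop and its termination condition can be sketched on a toy bump-pointer region. This is only an illustration under stated assumptions, not the HotSpot code: `ToyOldGen`, `FAILED`, and the word-index return values are hypothetical, and only the names `cas_allocate_noexpand` and `expand` echo the description above. The point it tries to show is that `expand` must report failure once nothing more can be committed, otherwise `allocate` would spin forever.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical toy model of an old generation, measured in words.
// It is not the PSOldGen implementation.
class ToyOldGen {
  std::atomic<size_t> _top{0};  // index of the next free word
  std::atomic<size_t> _end;     // committed limit
  const size_t _reserved;       // upper bound the region may grow to
  std::mutex _expand_lock;

public:
  static constexpr size_t FAILED = static_cast<size_t>(-1);

  ToyOldGen(size_t committed, size_t reserved)
    : _end(committed), _reserved(reserved) {}

  // Lock-free bump allocation inside the committed space only.
  size_t cas_allocate_noexpand(size_t word_size) {
    assert(word_size > 0);  // zero-sized requests are treated as a caller bug
    size_t old_top = _top.load();
    while (true) {
      if (_end.load() - old_top < word_size) return FAILED;
      // On CAS failure old_top is refreshed and the check is repeated.
      if (_top.compare_exchange_weak(old_top, old_top + word_size)) return old_top;
    }
  }

  // Commit more space. Returns false when nothing is left to commit,
  // which is what lets the allocation loop terminate.
  bool expand(size_t word_size) {
    assert(word_size > 0);
    std::lock_guard<std::mutex> guard(_expand_lock);
    size_t end = _end.load();
    size_t remaining = _reserved - end;
    if (remaining == 0) return false;
    _end.store(end + (word_size < remaining ? word_size : remaining));
    return true;
  }

  // Retry the allocation for as long as an expansion succeeds.
  size_t allocate(size_t word_size) {
    size_t res;
    do {
      res = cas_allocate_noexpand(word_size);
    } while (res == FAILED && expand(word_size));
    return res;
  }
};
```

In this model, a request that cannot be satisfied even after committing the last of the reserved space fails after at most one extra pass: the final `expand` call finds nothing left and returns false.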

Testing:
mach5 tier1-3, tier5 (tier2-3, 5 do a lot of ParallelGC testing)

/summary Loop to retry allocation if expand succeeds.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issues

  • JDK-8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc
  • JDK-8260045: Parallel GC: Waiting on ExpandHeap_lock may cause "expansion storm"

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2309/head:pull/2309
$ git checkout pull/2309

bridgekeeper bot commented Jan 29, 2021

👋 Welcome back kbarrett! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Jan 29, 2021

@kimbarrett Setting summary to Loop to retry allocation if expand succeeds.

openjdk bot commented Jan 29, 2021

@kimbarrett The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Jan 29, 2021
@kimbarrett kimbarrett marked this pull request as ready for review January 29, 2021 08:29
@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 29, 2021
mlbridge bot commented Jan 29, 2021

Webrevs

HeapWord* res;
do {
  res = cas_allocate_noexpand(word_size);
  // Retry failed allocation if expand succeeds.

Contributor

"... but allocation did not." would be nice to be added to this comment to be complete.

Author

That's a "failed allocation".

Contributor

Agreed :)

  if (bytes == 0) {
-   return;
+   return true;

Contributor

@tschatzl tschatzl Jan 29, 2021

I'd prefer that the code guarantee, or at least assert, that bytes > 0, because returning true here seems scary with respect to the loop.

All code paths seem to cover this situation already, i.e. with word_size == 0 this should not be called.

But if you think it's not a big issue, we can keep it. This is pre-existing of course.

Author

Good point. I will make sure a 0 size never gets here and assert/guarantee, or otherwise figure out what to do.

Author

I've changed various quick returns on zero size to instead be asserts, since none of them should ever be called with a zero size.

Contributor

@tschatzl tschatzl left a comment

Lgtm. Thanks.

openjdk bot commented Jan 31, 2021

@kimbarrett This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8260044: Parallel GC: Concurrent allocation after heap expansion may cause unnecessary full gc
8260045: Parallel GC: Waiting on ExpandHeap_lock may cause "expansion storm"

Loop to retry allocation if expand succeeds.  Treat space available after obtaining expand lock as expand success.

Reviewed-by: tschatzl, iwalulya, sjohanss

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the master branch:

  • 92ff891: 8261593: Do not use NULL pointer as write buffer parameter in jfrEmergencyDump.cpp write_repository_files

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 31, 2021
Member

@walulyai walulyai left a comment

Lgtm!

Contributor

@kstefanj kstefanj left a comment

Looks good!

@kimbarrett
Author

The problem being addressed here is closely related to the "expand storm"
problem from JDK-8260045. I thought this one could be addressed separately
first, but now think not. Consider if we do an expand with excess here that
uses the remainder of the permitted space. If another thread was blocked
waiting to expand, its expand attempt will fail. With the old code, there
would still be another allocation attempt, but now the failing expand won't
do that.

Reversing the order of fixes doesn't work very well either, as avoiding the
expand storm needs the same sort of infrastructure for retrying a failed
allocation after (optional) expansion.

New commit "avoid expand storms" adds that fix. It's a little bit kludgy
because of JDK-8261284, adding a function to MutableSpace for use only by
PSOldGen. It's not the only weird function or behavior in MutableSpace.
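
A minimal sketch of the "treat space available after obtaining the expand lock as expand success" idea, again on a toy region rather than the real classes. `ToySpace`, `needs_expand`, `expand_for_allocate`, and the `expansions` counter are hypothetical names chosen for illustration. A thread that waited on the lock first rechecks whether another thread has already made enough space available, and only expands if not.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical toy model; not MutableSpace or PSOldGen.
struct ToySpace {
  std::atomic<size_t> top{0};  // index of the next free word
  std::atomic<size_t> end;     // committed limit
  const size_t reserved;       // upper bound the space may grow to
  std::mutex expand_lock;
  int expansions = 0;          // number of real expansions, for illustration

  ToySpace(size_t committed, size_t reserved_words)
    : end(committed), reserved(reserved_words) {}

  // True if the committed space cannot currently hold word_size words.
  bool needs_expand(size_t word_size) const {
    return end.load() - top.load() < word_size;
  }

  // Returns true if an allocation retry is worthwhile: either space is
  // already available or this call managed to commit more.
  bool expand_for_allocate(size_t word_size) {
    assert(word_size > 0);
    std::lock_guard<std::mutex> guard(expand_lock);
    // Recheck after taking the lock: a thread that held the lock before us
    // may already have expanded enough, so expanding again would be waste.
    if (!needs_expand(word_size)) return true;
    size_t cur = end.load();
    size_t remaining = reserved - cur;
    if (remaining == 0) return false;  // nothing left; caller stops retrying
    end.store(cur + (word_size < remaining ? word_size : remaining));
    ++expansions;
    return true;
  }
};
```

With this shape, a blocked thread whose own expansion would have failed because the permitted space is used up still gets a true result when space is available, so its caller makes another allocation attempt.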

Testing:
mach5 tier1-3, tier5 (tiers with common tests run with ParallelGC)

/issue JDK-8260045

openjdk bot commented Feb 8, 2021

@kimbarrett
Adding additional issue to issue list: 8260045: Parallel GC: Waiting on ExpandHeap_lock may cause "expansion storm".

@kimbarrett
Author

/summary Loop to retry allocation if expand succeeds. Treat space available after obtaining expand lock as expand success.

openjdk bot commented Feb 8, 2021

@kimbarrett Updating existing summary to Loop to retry allocation if expand succeeds. Treat space available after obtaining expand lock as expand success.

@kimbarrett
Author

Thanks @tschatzl , @kstefanj , and @walulyai for reviews.

@kimbarrett
Author

/integrate

@openjdk openjdk bot closed this Feb 12, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated labels Feb 12, 2021
@openjdk openjdk bot removed the rfr Pull request is ready for review label Feb 12, 2021
openjdk bot commented Feb 12, 2021

@kimbarrett Since your change was applied there has been 1 commit pushed to the master branch:

  • 92ff891: 8261593: Do not use NULL pointer as write buffer parameter in jfrEmergencyDump.cpp write_repository_files

Your commit was automatically rebased without conflicts.

Pushed as commit 6a84ec6.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@kimbarrett kimbarrett deleted the retry_alloc branch February 12, 2021 08:20
@tschatzl
Contributor

Still looks good. Thanks. (I am aware this change has already been integrated).

Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
4 participants