Skip to content

Conversation

@tschatzl
Copy link
Contributor

@tschatzl tschatzl commented Apr 4, 2022

Hi all,

can I have reviews for this change that fixes a crash with Parallel GC, AlwaysPretouch and misconfigured large pages?

The AlwaysPreTouch code for Parallel GC does not use the actual used large page size for the heap, but the one passed in by the user. If that page size is not actually available, and the heap is not aligned to that used page size, there will be an out-of-bounds access by the pretouch code.

The change simply makes the code use the correct page size (which is passed in as alignment into MutableSpace - but that is another issue).

Testing: local testing of failing test case (see CR), tier1-3

Thanks,
Thomas


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8283935: Parallel: Crash during pretouch after large pages allocation failure

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8090/head:pull/8090
$ git checkout pull/8090

Update a local copy of the PR:
$ git checkout pull/8090
$ git pull https://git.openjdk.java.net/jdk pull/8090/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8090

View PR using the GUI difftool:
$ git pr show -t 8090

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8090.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 4, 2022

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8283935 8283935: Parallel: Crash during pretouch after large pages allocation failure Apr 4, 2022
@openjdk
Copy link

openjdk bot commented Apr 4, 2022

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Apr 4, 2022
@tschatzl tschatzl marked this pull request as ready for review April 4, 2022 11:10
@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 4, 2022
@mlbridge
Copy link

mlbridge bot commented Apr 4, 2022

Webrevs

}
assert(mr.contains(head) && mr.contains(tail), "Sanity");

size_t page_size = UseLargePages ? alignment() : os::vm_page_size();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just using alignment should be sufficient here, and there's no need to conditionalize on UseLargePages. But maybe alignment should be renamed to page_size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just using alignment should be sufficient here

I did not want to change too much, but yes, I'll try that.

But maybe alignment should be renamed to page_size?

I wanted to keep the patch small, but I'll look into this too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is actively wrong to be checking UseLargePages here. That is done when the space is reserved, and the reservation succeeds or fails, with the actual page size determined by that. Once we're here the decision of whether to use large pages is over with, and might have been in the negative, but without changing the flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. I agree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was too quick changing the setter name from alignment() to page_size() - the subclass MutableNUMASpace already contains such a method, and that has been another reason to not do this. Please let me undo and look at this again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a change now that just fixes the issues, without renaming alignment(). I would like to keep this out of this change because MutableNUMASpace has its own page_size() already, which it may modify (and reallocate everything apparently)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bleh! It seems like there's a bunch of confused / confusing code in MutableNUMASpace. It looks like page_size() always equals alignment() and alignment() is set at construction time and never changes. There are a number of places in MutableNUMASpace that are checking for changes in the page size that seemingly can never happen. And there are places in MutableNUMASpace that are checking UseLargePages inappropriately for the same reasons as discussed above. And even when the value is accurate, some of those places look (at first glance) like they are doing unnecessary extra work.

So I'm going to call your changes good, in so far as they are fixing the crashing bug. But there really should be some followup in this area.

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good. There needs to be some followup work.

@openjdk
Copy link

openjdk bot commented Apr 5, 2022

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8283935: Parallel: Crash during pretouch after large pages allocation failure

Reviewed-by: kbarrett, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 26 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 5, 2022
@tschatzl
Copy link
Contributor Author

tschatzl commented Apr 6, 2022

Thanks @albertnetymk @kimbarrett for your reviews.

/integrate

@openjdk
Copy link

openjdk bot commented Apr 6, 2022

Going to push as commit b56df28.
Since your change was applied there have been 26 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 6, 2022
@openjdk openjdk bot closed this Apr 6, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 6, 2022
@openjdk
Copy link

openjdk bot commented Apr 6, 2022

@tschatzl Pushed as commit b56df28.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@tschatzl tschatzl deleted the submit/8283935-parallel-gc-wrong-page-size-for-pretouch branch April 6, 2022 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants