8283935: Parallel: Crash during pretouch after large pages allocation failure #8090
👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into |
    }
    assert(mr.contains(head) && mr.contains(tail), "Sanity");

    size_t page_size = UseLargePages ? alignment() : os::vm_page_size();
I think just using alignment should be sufficient here, and there's no need to conditionalize on UseLargePages. But maybe alignment should be renamed to page_size?
> I think just using alignment should be sufficient here
I did not want to change too much, but yes, I'll try that.
> But maybe alignment should be renamed to page_size?
I wanted to keep the patch small, but I'll look into this too.
I think it is actively wrong to be checking UseLargePages here. That is done when the space is reserved, and the reservation succeeds or fails, with the actual page size determined by that. Once we're here the decision of whether to use large pages is over with, and might have been in the negative, but without changing the flag.
Changed. I agree.
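The point made in the review comment above, that the flag and the page size actually obtained can disagree, can be sketched in a few lines. This is a simplified illustration, not HotSpot code: `Reservation`, `reserve`, `page_size_from_flag`, and the two page-size constants are hypothetical names and values chosen for the example.

```cpp
#include <cstddef>

// Illustrative page sizes (assumptions, not queried from any OS).
constexpr std::size_t kSmallPage = 4 * 1024;
constexpr std::size_t kLargePage = 2 * 1024 * 1024;

// Hypothetical record of what a reservation actually obtained.
struct Reservation {
  std::size_t page_size;
};

// Hypothetical reservation step: large pages are used only if they were both
// requested and available. The request flag itself is never modified.
constexpr Reservation reserve(bool use_large_pages_flag, bool large_pages_available) {
  return Reservation{(use_large_pages_flag && large_pages_available) ? kLargePage
                                                                     : kSmallPage};
}

// The pattern criticized in the review: deriving the page size from the flag again.
constexpr std::size_t page_size_from_flag(bool use_large_pages_flag) {
  return use_large_pages_flag ? kLargePage : kSmallPage;
}

// Large pages requested but unavailable: the flag still says "large",
// while the reservation fell back to small pages.
static_assert(reserve(true, false).page_size == kSmallPage,
              "reservation falls back to small pages");
static_assert(page_size_from_flag(true) == kLargePage,
              "flag-derived size still reports large pages");
```

In this model, later code that wants the page size should read it from the reservation result rather than recompute it from the flag.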
I was too quick changing the getter name from alignment() to page_size() - the subclass MutableNUMASpace already contains such a method, which was another reason not to do this. Please let me undo this and look at it again.
I have now pushed a change that just fixes the issues, without renaming alignment(). I would like to keep the rename out of this change because MutableNUMASpace already has its own page_size(), which it may modify (and apparently reallocate everything).
Bleh! It seems like there's a bunch of confused / confusing code in MutableNUMASpace. It looks like page_size() always equals alignment() and alignment() is set at construction time and never changes. There are a number of places in MutableNUMASpace that are checking for changes in the page size that seemingly can never happen. And there are places in MutableNUMASpace that are checking UseLargePages inappropriately for the same reasons as discussed above. And even when the value is accurate, some of those places look (at first glance) like they are doing unnecessary extra work.
So I'm going to call your changes good, in so far as they are fixing the crashing bug. But there really should be some followup in this area.
kimbarrett left a comment
Change looks good. There needs to be some followup work.
@tschatzl This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 26 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
Thanks @albertnetymk @kimbarrett for your reviews. /integrate
Going to push as commit b56df28.
Your commit was automatically rebased without conflicts.
Hi all,

can I have reviews for this change that fixes a crash with Parallel GC, AlwaysPreTouch and misconfigured large pages?

The AlwaysPreTouch code for Parallel GC does not use the large page size actually used for the heap, but the one passed in by the user. If that page size is not actually available, and the heap is not aligned to the requested page size, there will be an out-of-bounds access by the pretouch code.

The change simply makes the code use the correct page size (which is passed in as alignment into MutableSpace - but that is another issue).

Testing: local testing of failing test case (see CR), tier1-3

Thanks,
Thomas
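
The out-of-bounds access described above can be illustrated with a small address-arithmetic sketch. It is not the HotSpot pretouch code: it assumes a simplified model in which pretouching works on whole pages, so the range is widened to page boundaries. `Range`, `pretouch_extent`, `within`, the alignment helpers, and the example addresses are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative page sizes (assumptions, not queried from any OS).
constexpr std::size_t kSmallPage = 4 * 1024;         // page size actually in use
constexpr std::size_t kLargePage = 2 * 1024 * 1024;  // page size the user asked for

// Alignment helpers; `a` must be a power of two.
constexpr std::uintptr_t align_down(std::uintptr_t v, std::size_t a) {
  return v & ~(static_cast<std::uintptr_t>(a) - 1);
}
constexpr std::uintptr_t align_up(std::uintptr_t v, std::size_t a) {
  return align_down(v + a - 1, a);
}

// Half-open address range [start, end).
struct Range {
  std::uintptr_t start;
  std::uintptr_t end;
};

// Assumed model: touching happens in whole pages, so the touched extent is
// the range rounded outward to page boundaries.
constexpr Range pretouch_extent(Range r, std::size_t page_size) {
  return Range{align_down(r.start, page_size), align_up(r.end, page_size)};
}

constexpr bool within(Range outer, Range inner) {
  return outer.start <= inner.start && inner.end <= outer.end;
}

// A 3 MiB heap backed by 4 KiB pages: aligned to 4 KiB, but its end is not
// aligned to 2 MiB.
constexpr Range heap{0x40000000u, 0x40300000u};

static_assert(within(heap, pretouch_extent(heap, kSmallPage)),
              "stepping by the actual page size stays inside the heap");
static_assert(!within(heap, pretouch_extent(heap, kLargePage)),
              "stepping by the requested page size runs past the heap end");
```

With the requested 2 MiB page size, the touched extent in this model ends at 0x40400000, which is 1 MiB past the end of the heap; with the actual 4 KiB page size it matches the heap exactly.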
Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/8090/head:pull/8090
$ git checkout pull/8090

Update a local copy of the PR:
$ git checkout pull/8090
$ git pull https://git.openjdk.java.net/jdk pull/8090/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 8090

View PR using the GUI difftool:
$ git pr show -t 8090

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/8090.diff