-
Notifications
You must be signed in to change notification settings - Fork 5.8k
8262291: Refactor reserve_memory_special_huge_tlbfs #3073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into |
This comment has been minimized.
This comment has been minimized.
@kstefanj |
@kstefanj |
/label remove hotspot-gc |
@kstefanj |
@kstefanj |
Webrevs
|
// be used and it must be aligned to both the large page size | ||
// and the given alignment. The larger of the two will be used. | ||
size_t required_alignment = MAX(os::large_page_size(), alignment); | ||
char* const aligned_start = anon_mmap_aligned(req_addr, bytes, required_alignment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add back the comment "// First reserve - but not commit"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope was that the new comment would be sufficient, but if you think it is needed I could add that the reserved range is not committed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that current comment encompasses // First reserve - but not commit
.
src/hotspot/os/linux/os_linux.cpp
Outdated
char* os::Linux::reserve_memory_special_huge_tlbfs_only(size_t bytes, | ||
char* req_addr, | ||
bool exec) { | ||
char* os::Linux::reserve_and_commit_special(size_t bytes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method name reserve_and_commit_
implicitly suggests that other methods with just reserve_memory_
do not commit (to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, I've struggled a bit with this function name. Since we actually do the reservation using the call to anon_mmap_aligned()
maybe this one should just be called: commit_memory_special()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, seems better to me
src/hotspot/os/linux/os_linux.cpp
Outdated
::munmap(lp_start, end - lp_start); | ||
return NULL; | ||
} | ||
if (bytes == large_bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we do the check if (large_mapping == NULL) {
before this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, and I thought about adding a comment here. I probably should have. The reason we don't do the explicit check it is that if large_mapping == NULL
then NULL will be returned as expected and there is no additional work to be done if bytes == large_bytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, now I get it. I had missed that when mmap fails, the original reservation is unmapped for this range so no need to do any additional clean up.
src/hotspot/os/linux/os_linux.cpp
Outdated
// The requested size requires some small pages as well. | ||
char* small_start = aligned_start + large_bytes; | ||
size_t small_size = bytes - large_bytes; | ||
if (large_mapping == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do the check here because I need the small_start
and small_size
values if large_mapping == NULL
and there was additional memory reserved for small pages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing Ivan, I will wait to update the PR until we come up with a function name that is better than the current.
// be used and it must be aligned to both the large page size | ||
// and the given alignment. The larger of the two will be used. | ||
size_t required_alignment = MAX(os::large_page_size(), alignment); | ||
char* const aligned_start = anon_mmap_aligned(req_addr, bytes, required_alignment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My hope was that the new comment would be sufficient, but if you think it is needed I could add that the reserved range is not committed here.
src/hotspot/os/linux/os_linux.cpp
Outdated
char* os::Linux::reserve_memory_special_huge_tlbfs_only(size_t bytes, | ||
char* req_addr, | ||
bool exec) { | ||
char* os::Linux::reserve_and_commit_special(size_t bytes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point, I've struggled a bit with this function name. Since we actually do the reservation using the call to anon_mmap_aligned()
maybe this one should just be called: commit_memory_special()
src/hotspot/os/linux/os_linux.cpp
Outdated
::munmap(lp_start, end - lp_start); | ||
return NULL; | ||
} | ||
if (bytes == large_bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, and I thought about adding a comment here. I probably should have. The reason we don't do the explicit check it is that if large_mapping == NULL
then NULL will be returned as expected and there is no additional work to be done if bytes == large_bytes
.
src/hotspot/os/linux/os_linux.cpp
Outdated
// The requested size requires some small pages as well. | ||
char* small_start = aligned_start + large_bytes; | ||
size_t small_size = bytes - large_bytes; | ||
if (large_mapping == NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do the check here because I need the small_start
and small_size
values if large_mapping == NULL
and there was additional memory reserved for small pages.
Renamed helper to commit_memory_special and updated the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
src/hotspot/os/linux/os_linux.cpp
Outdated
int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE; | ||
int flags = MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED; | ||
void* result; | ||
// Start of by committing large pages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small Typo // Start of
should be // Start off
or // Start by
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, change this to // First commit using large pages.
since I've already used "start off" above.
src/hotspot/os/linux/os_linux.cpp
Outdated
// We only end up here when at least 1 large page can be used. | ||
// If the size is not a multiple of the large page size, we | ||
// will mix the type of pages used, but in a decending order. | ||
// Start of by reserving a range of the given size that is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small typo, // Start of by reserving
should be // Start off by reserving
or // Start by reserving
// be used and it must be aligned to both the large page size | ||
// and the given alignment. The larger of the two will be used. | ||
size_t required_alignment = MAX(os::large_page_size(), alignment); | ||
char* const aligned_start = anon_mmap_aligned(req_addr, bytes, required_alignment); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that current comment encompasses // First reserve - but not commit
.
if (large_mapping == NULL) { | ||
// Failed to commit large pages, so we need to unmap the | ||
// reminder of the orinal reservation. | ||
::munmap(small_start, small_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming that if mmap fails for large pages, it un-maps the reservation area requested for large pages and thus here we only need to munmap for remaining reservation (small pages)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if mmap()
fails we lose the reservation for that range and need to start over only using small pages.
src/hotspot/os/linux/os_linux.cpp
Outdated
return reserve_memory_special_huge_tlbfs_only(bytes, req_addr, exec); | ||
} else { | ||
return reserve_memory_special_huge_tlbfs_mixed(bytes, alignment, req_addr, exec); | ||
// Commit the reminding bytes using small pages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Commit the reminding
should be // Commit the remaining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestions are mostly nits, if the assumption in my comment about un-mapping is correct (I suspect so based on your conversation with Ivan (@walulyai)). Looks good otherwise from my perspective.
Updated comments.
Thanks for reviewing Marcus, updated the comments per your suggestion and also changed one to better fit the flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Update helper name to better match commit_memory_special().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Stefan,
this is a welcome cleanup!
Remarks inline.
Cheers, Thomas
src/hotspot/os/linux/os_linux.cpp
Outdated
char* os::Linux::commit_memory_special(size_t bytes, | ||
size_t page_size, | ||
char* req_addr, | ||
bool exec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if this were file scope static and not exported (don't think this needs anything from the os::Linux namespace, or?).
Also, mid term we could probably merge this with commit_memory. AFAICS the only differences to the latter is that this can do huge pages, and the error handling is a bit different.
I am also not sure the return type makes a lot of sense. Either we handle commit errors inside this function, then we should return void. Or, we return a boolean and handle it in the caller.
Long term I would prefer to handle allocation errors not by aborting but by leaving it up to the caller what to do. Because it makes sense to have the option to "reserve if we get large pages, otherwise just use small pages". Eg this I would like to do in a future Metaspace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my first plan as well, but it uses os::Linux::hugetlbfs_page_size_flag()
which in turn depends on os::Linux::default_large_page_size()
. So I went with this.
I agree that we long term should look at using/merging commit_memoy()
but didn't want to change to much functionality at once here. What we get now is very much a like what we got before but all large pages at the beginning of the mapping.
I also agree that now when this is really just a commit function, returning the address is a bit strange. I don't see a good way to handle potential failures in here so I opted for returning a bool
.
In some sense it is left to the caller (ReservedSpace
) right now. If using large pages fail, we will try using small pages instead. But not sure if you would want this logic to move to another layer.
char* os::Linux::reserve_memory_special_huge_tlbfs(size_t bytes, | ||
size_t alignment, | ||
char* req_addr, | ||
bool exec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the contract is that this function will allocate huge paged memory with whatever page size is the default (controlled with UseLargePages and LargePageSizeInBytes).
And with Markus future changes we will mix-and-match page sizes best as we can. So, control of page size is out of the hands of the caller. Are there callers misusing alignment for page size?
Otherwise this seems fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, right now we will only use a single large page size at a time. But using more than one (after Marcus change) need to take the alignment into account. As you know I have plans to do more changes in this area so that we can pass down what page size we want for a given reservation. But if we want to apply that change before everything else is in place I will suggest to use the alignment as an upper limit for the page size.
A mixed mapping would still only use one large page size and then fill up with small pages. This is not optimal, but trying to mix and match even more will lead to a lot more error handling and if doing that it should be a separate change.
For cases where we only expect to use large pages (the old only case) the alignment is always set to large_page_size
or larger.
Changed commit_memory_special to return bool to signal if the request succeeded or not.
Thanks for reviewing @tstuefe, if you are ok with these changes I will push this tonight as I will be on parental leave for the coming two weeks. |
@tstuefe, I'm back from my leave now. Do you have any additional comments? |
Looks good to me. Thanks, Thomas |
Thanks @tstuefe, you have to actively approve the PR to make it ready for integration since you are the only Reviewer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, sorry.
Looks good Stefan!
..Thomas
@kstefanj This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 16 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@kstefanj Since your change was applied there have been 16 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit f2f7aa3. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Please review this refactoring of the hugetlbfs reservation code.
Summary
In recent adventures in this area of the code I noticed a strange condition in
reserve_memory_special_huge_tlbfs
where we take the "mixed-mapping" route even if the size doesn't require any small pages to be used:The second condition here is needed because if the alignment is larger than the large page size, we needed to enforce this and can't just trust
mmap
to give us a properly aligned address. Doing this by using the mixed-function feels a bit weird and looking a bit more at this I found a way to refactor this function to avoid having the two helpers.Instead of only having the mixed path honor the passed down alignment, make sure that is always done. This will also have the side-effect that all large pages in a "mixed"-mapping will be at the start and then we will have a tail of small pages. This actually also ensures that we will use large pages for a mixed mapping, in the past there was a corner case where we could end up with just a head and tail of small pages and no large page in between (if the mapping was smaller than 2 large pages and there was no alignment constraint).
Testing
Mach5 tier1-3 and a lot of local testing with different large page configurations.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3073/head:pull/3073
$ git checkout pull/3073
Update a local copy of the PR:
$ git checkout pull/3073
$ git pull https://git.openjdk.java.net/jdk pull/3073/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3073
View PR using the GUI difftool:
$ git pr show -t 3073
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3073.diff