
8262291: Refactor reserve_memory_special_huge_tlbfs #3073


Closed
wants to merge 6 commits

Conversation

kstefanj
Contributor

@kstefanj kstefanj commented Mar 18, 2021

Please review this refactoring of the hugetlbfs reservation code.

Summary
In recent adventures in this area of the code I noticed a strange condition in reserve_memory_special_huge_tlbfs where we take the "mixed-mapping" route even if the size doesn't require any small pages to be used:

  if (is_aligned(bytes, os::large_page_size()) && alignment <= os::large_page_size()) {
    return reserve_memory_special_huge_tlbfs_only(bytes, req_addr, exec);
  } else {
    return reserve_memory_special_huge_tlbfs_mixed(bytes, alignment, req_addr, exec);
  }

The second condition is needed because if the alignment is larger than the large page size, we have to enforce that alignment ourselves and can't just trust mmap to give us a properly aligned address. Handling this through the mixed helper feels a bit odd, and after looking at this some more I found a way to refactor the function so that the two helpers are no longer needed.

Instead of having only the mixed path honor the passed-down alignment, make sure it is always honored. This has the side effect that all large pages in a "mixed" mapping will be placed at the start, followed by a tail of small pages. For example, with a 2 MB large page size, a 5 MB mapping now becomes two large pages at the start followed by a 1 MB tail of small pages. It also guarantees that a mixed mapping actually uses large pages; in the past there was a corner case where we could end up with just a head and a tail of small pages and no large page in between (if the mapping was smaller than two large pages and there was no alignment constraint).
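
To make the new flow concrete, below is a minimal sketch of the refactored path, pieced together from the diff hunks quoted in the review threads further down (after the commit helper was renamed to commit_memory_special() and changed to return a bool). The helper names match the snippets under discussion, but the assertions, logging and exact error handling of the final patch may differ.

  char* reserve_memory_special_huge_tlbfs(size_t bytes, size_t alignment,
                                          char* req_addr, bool exec) {
    // Reserve (but do not commit) one range aligned to the larger of
    // the large page size and the requested alignment.
    size_t required_alignment = MAX(os::large_page_size(), alignment);
    char* const aligned_start = anon_mmap_aligned(req_addr, bytes, required_alignment);
    if (aligned_start == NULL) {
      return NULL;
    }

    // First commit using large pages for as much of the range as possible.
    size_t large_bytes = align_down(bytes, os::large_page_size());
    bool large_committed = commit_memory_special(large_bytes, os::large_page_size(),
                                                 aligned_start, exec);
    if (bytes == large_bytes) {
      // No small-page tail is needed. On failure the reservation for the
      // large-page range is already gone, so just report the result.
      return large_committed ? aligned_start : NULL;
    }

    // The requested size needs a tail of small pages as well.
    char* small_start = aligned_start + large_bytes;
    size_t small_size = bytes - large_bytes;
    if (!large_committed) {
      // The large-page commit failed; unmap the remainder of the
      // original reservation before giving up.
      ::munmap(small_start, small_size);
      return NULL;
    }

    // Commit the remaining bytes using small pages.
    if (!commit_memory_special(small_size, os::vm_page_size(), small_start, exec)) {
      // The small-page commit failed; unmap the large-page part.
      ::munmap(aligned_start, large_bytes);
      return NULL;
    }
    return aligned_start;
  }

Committing large pages first and the small-page tail last is what guarantees that a mixed mapping always contains at least one large page.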

Testing
Mach5 tier1-3 and a lot of local testing with different large page configurations.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8262291: Refactor reserve_memory_special_huge_tlbfs

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3073/head:pull/3073
$ git checkout pull/3073

Update a local copy of the PR:
$ git checkout pull/3073
$ git pull https://git.openjdk.java.net/jdk pull/3073/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3073

View PR using the GUI difftool:
$ git pr show -t 3073

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3073.diff

@bridgekeeper

bridgekeeper bot commented Mar 18, 2021

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 18, 2021

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Mar 18, 2021
@kstefanj

This comment has been minimized.

@openjdk openjdk bot removed the hotspot-runtime hotspot-runtime-dev@openjdk.org label Mar 18, 2021
@openjdk

openjdk bot commented Mar 18, 2021

@kstefanj
The hotspot-runtime label was successfully removed.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Mar 18, 2021
@openjdk

openjdk bot commented Mar 18, 2021

@kstefanj
The hotspot-gc label was successfully added.

@kstefanj
Contributor Author

/label remove hotspot-gc
/label add hotspot

@openjdk openjdk bot removed the hotspot-gc hotspot-gc-dev@openjdk.org label Mar 18, 2021
@openjdk

openjdk bot commented Mar 18, 2021

@kstefanj
The hotspot-gc label was successfully removed.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Mar 18, 2021
@openjdk

openjdk bot commented Mar 18, 2021

@kstefanj
The hotspot label was successfully added.

@kstefanj kstefanj marked this pull request as ready for review March 18, 2021 14:08
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 18, 2021
@mlbridge

mlbridge bot commented Mar 18, 2021

Webrevs

// be used and it must be aligned to both the large page size
// and the given alignment. The larger of the two will be used.
size_t required_alignment = MAX(os::large_page_size(), alignment);
char* const aligned_start = anon_mmap_aligned(req_addr, bytes, required_alignment);
Member

Do we need to add back the comment "// First reserve - but not commit"?

Contributor Author

My hope was that the new comment would be sufficient, but if you think it is needed I could add that the reserved range is not committed here.

Contributor

I agree that the current comment encompasses // First reserve - but not commit.

char* os::Linux::reserve_memory_special_huge_tlbfs_only(size_t bytes,
char* req_addr,
bool exec) {
char* os::Linux::reserve_and_commit_special(size_t bytes,
Member

The method name reserve_and_commit_ implicitly suggests (to me) that other methods with just reserve_memory_ do not commit.

Contributor Author

That is a good point; I've struggled a bit with this function name. Since we actually do the reservation through the call to anon_mmap_aligned(), maybe this one should just be called commit_memory_special().

Member

yes, seems better to me

::munmap(lp_start, end - lp_start);
return NULL;
}
if (bytes == large_bytes) {
Member

Shouldn't we do the check if (large_mapping == NULL) { before this?

Contributor Author

We could, and I thought about adding a comment here; I probably should have. The reason we don't do the explicit check is that if large_mapping == NULL, then NULL will be returned as expected, and there is no additional work to be done when bytes == large_bytes.

Member

OK, now I get it. I had missed that when mmap fails, the original reservation is unmapped for this range, so there is no need to do any additional cleanup.

// The requested size requires some small pages as well.
char* small_start = aligned_start + large_bytes;
size_t small_size = bytes - large_bytes;
if (large_mapping == NULL) {
Member

See comment above

Contributor Author

I do the check here because, if large_mapping == NULL and there was additional memory reserved for small pages, I need the small_start and small_size values to unmap it.

Contributor Author

@kstefanj kstefanj left a comment

Thanks for reviewing, Ivan. I will wait to update the PR until we come up with a function name that is better than the current one.

Renamed helper to commit_memory_special and updated the comments.
Member

@walulyai walulyai left a comment

lgtm!

int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE;
int flags = MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED;
void* result;
// Start of by committing large pages.
Contributor

Small typo: // Start of should be // Start off or // Start by

Contributor Author

Good catch, changed this to // First commit using large pages. since I've already used "start off" above.

// We only end up here when at least 1 large page can be used.
// If the size is not a multiple of the large page size, we
// will mix the type of pages used, but in a decending order.
// Start of by reserving a range of the given size that is
Contributor

@mgkwill mgkwill Mar 24, 2021

Small typo, // Start of by reserving should be // Start off by reserving or // Start by reserving


if (large_mapping == NULL) {
// Failed to commit large pages, so we need to unmap the
// reminder of the orinal reservation.
::munmap(small_start, small_size);
Contributor

I'm assuming that if mmap fails for large pages, it un-maps the part of the reservation requested for large pages, and thus here we only need to munmap the remaining reservation (small pages)?

Contributor Author

Yes, if mmap() fails we lose the reservation for that range and need to start over only using small pages.

return reserve_memory_special_huge_tlbfs_only(bytes, req_addr, exec);
} else {
return reserve_memory_special_huge_tlbfs_mixed(bytes, alignment, req_addr, exec);
// Commit the reminding bytes using small pages.
Contributor

// Commit the reminding should be // Commit the remaining

Contributor

@mgkwill mgkwill left a comment

Suggestions are mostly nits. Assuming the un-mapping behavior in my comment is correct (I suspect so based on your conversation with Ivan (@walulyai)), this looks good from my perspective.

Updated comments.
@kstefanj
Contributor Author

Suggestions are mostly nits. Assuming the un-mapping behavior in my comment is correct (I suspect so based on your conversation with Ivan (@walulyai)), this looks good from my perspective.

Thanks for reviewing, Marcus. I updated the comments per your suggestions and also changed one to better fit the flow.

Contributor

@mgkwill mgkwill left a comment

+1

Update helper name to better match commit_memory_special().
Member

@tstuefe tstuefe left a comment

Hi Stefan,

this is a welcome cleanup!

Remarks inline.

Cheers, Thomas

char* os::Linux::commit_memory_special(size_t bytes,
size_t page_size,
char* req_addr,
bool exec) {
Member

I'd prefer if this were file-scope static and not exported (I don't think this needs anything from the os::Linux namespace, or does it?).

Also, mid term we could probably merge this with commit_memory. AFAICS the only differences from the latter are that this can do huge pages and that the error handling is a bit different.

I am also not sure the return type makes a lot of sense. Either we handle commit errors inside this function, in which case we should return void, or we return a boolean and handle them in the caller.

Long term I would prefer to handle allocation errors not by aborting but by leaving it up to the caller what to do, because it makes sense to have the option to "reserve if we get large pages, otherwise just use small pages". E.g. this is something I would like to do for a future Metaspace.

Contributor Author

That was my first plan as well, but it uses os::Linux::hugetlbfs_page_size_flag() which in turn depends on os::Linux::default_large_page_size(). So I went with this.

I agree that we should look at using/merging commit_memory() in the long term, but I didn't want to change too much functionality at once here. What we get now is very much like what we got before, but with all large pages at the beginning of the mapping.

I also agree that now that this is really just a commit function, returning the address is a bit strange. I don't see a good way to handle potential failures in here, so I opted for returning a bool.

In some sense it is left to the caller (ReservedSpace) right now. If using large pages fails, we will try using small pages instead. But I'm not sure if you would want this logic to move to another layer.
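
For reference, a rough sketch of what commit_memory_special() amounts to after the bool-return change, based on the prot/flags lines quoted earlier in the review and the hugetlbfs_page_size_flag() dependency mentioned above; the assertions and warning output of the actual patch are left out, so treat this as illustrative rather than the final code.

  bool commit_memory_special(size_t bytes, size_t page_size,
                             char* req_addr, bool exec) {
    int prot = exec ? PROT_READ|PROT_WRITE|PROT_EXEC : PROT_READ|PROT_WRITE;
    int flags = MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED;

    // For large pages, additionally request hugetlbfs backing with the
    // flag that encodes the wanted page size. This dependency is the
    // reason the helper stays in os::Linux instead of being file-static.
    if (page_size > (size_t) os::vm_page_size()) {
      flags |= MAP_HUGETLB | hugetlbfs_page_size_flag(page_size);
    }

    // MAP_FIXED commits on top of the already reserved range; if this
    // fails, the reservation for the range is lost and the caller has
    // to clean up the rest of the mapping.
    void* result = ::mmap(req_addr, bytes, prot, flags, -1, 0);
    return result != MAP_FAILED;
  }

Used this way, the caller decides what a failed commit means: reserve_memory_special_huge_tlbfs cleans up and returns NULL, and ReservedSpace then falls back to small pages.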

char* os::Linux::reserve_memory_special_huge_tlbfs(size_t bytes,
size_t alignment,
char* req_addr,
bool exec) {
Member

So the contract is that this function will allocate huge-paged memory with whatever page size is the default (controlled with UseLargePages and LargePageSizeInBytes).

And with Marcus' future changes we will mix and match page sizes as best we can. So, control of the page size is out of the hands of the caller. Are there callers misusing alignment for page size?

Otherwise this seems fine to me.

Contributor Author

Yes, right now we will only use a single large page size at a time. But using more than one (after Marcus' change) will need to take the alignment into account. As you know, I have plans for more changes in this area so that we can pass down which page size we want for a given reservation. But if we want to apply that change before everything else is in place, I would suggest using the alignment as an upper limit for the page size.

A mixed mapping would still only use one large page size and then fill up with small pages. This is not optimal, but trying to mix and match even more would require a lot more error handling, and if we do that it should be a separate change.

For the cases where we only expect to use large pages (the old "only" case), the alignment is always set to large_page_size or larger.

Changed commit_memory_special to return bool to signal if the request succeeded or not.
@kstefanj
Contributor Author

Thanks for reviewing @tstuefe. If you are OK with these changes I will push this tonight, as I will be on parental leave for the coming two weeks.

@kstefanj
Contributor Author

@tstuefe, I'm back from my leave now. Do you have any additional comments?

@tstuefe
Member

tstuefe commented Apr 12, 2021

Looks good to me.

Thanks, Thomas

@kstefanj
Contributor Author

Thanks @tstuefe, but you have to actively approve the PR to make it ready for integration, since you are the only Reviewer.

Member

@tstuefe tstuefe left a comment

Oh right, sorry.

Looks good Stefan!

..Thomas

@openjdk

openjdk bot commented Apr 13, 2021

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8262291: Refactor reserve_memory_special_huge_tlbfs

Reviewed-by: iwalulya, stuefe

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 16 new commits pushed to the master branch:

  • 008fc75: 8264224: Add macosx-aarch64 to Oracle build configurations
  • f4e6395: 8264190: Harden TLS interop tests
  • 18bec9c: 8265084: [BACKOUT] 8264954: unified handling for VectorMask object re-materialization during de-optimization
  • 9dd9625: 8263763: Synthetic constructor parameters of enum are not considered for annotation indices
  • 1ee80e0: 8261355: No data buffering in SunPKCS11 Cipher encryption when the underlying mechanism has no padding
  • d84a7e5: 8264124: Update MXBean specification and implementation to extend mapping of CompositeType to records
  • 714ae54: 8258788: incorrect response to change in window insets [lanai]
  • f479437: 8265082: test/hotspot/jtreg/gc/g1/TestG1SkipCompaction.java fails validate-source
  • 27f4b27: 8264623: Change to Xcode 12.4 for building on Macos at Oracle
  • 7c20d97: 8265052: Break circular include dependency in objArrayOop.inline.hpp
  • ... and 6 more: https://git.openjdk.java.net/jdk/compare/cb2806db4ed188a53161d13a0d7f882b1afe00ec...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 13, 2021
@kstefanj
Contributor Author

Thanks for the reviews @tstuefe, @mgkwill and @walulyai

/integrate

@openjdk openjdk bot closed this Apr 13, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 13, 2021
@openjdk

openjdk bot commented Apr 13, 2021

@kstefanj Since your change was applied there have been 16 commits pushed to the master branch:

  • 008fc75: 8264224: Add macosx-aarch64 to Oracle build configurations
  • f4e6395: 8264190: Harden TLS interop tests
  • 18bec9c: 8265084: [BACKOUT] 8264954: unified handling for VectorMask object re-materialization during de-optimization
  • 9dd9625: 8263763: Synthetic constructor parameters of enum are not considered for annotation indices
  • 1ee80e0: 8261355: No data buffering in SunPKCS11 Cipher encryption when the underlying mechanism has no padding
  • d84a7e5: 8264124: Update MXBean specification and implementation to extend mapping of CompositeType to records
  • 714ae54: 8258788: incorrect response to change in window insets [lanai]
  • f479437: 8265082: test/hotspot/jtreg/gc/g1/TestG1SkipCompaction.java fails validate-source
  • 27f4b27: 8264623: Change to Xcode 12.4 for building on Macos at Oracle
  • 7c20d97: 8265052: Break circular include dependency in objArrayOop.inline.hpp
  • ... and 6 more: https://git.openjdk.java.net/jdk/compare/cb2806db4ed188a53161d13a0d7f882b1afe00ec...master

Your commit was automatically rebased without conflicts.

Pushed as commit f2f7aa3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
4 participants