Add support for memory mappings with large pages #7977
CT Test Results 4 files 146 suites 47m 3s ⏱️ Results for commit aebc8f7. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
3f7a1f0 to 24bd2fc
Thanks for the PR! Comments below :)
I've only skimmed this PR, but FWIW I tried using Linux hugepages back around R15/R16 or so. Transparent hugepages were unusable (caused kernel lockups in the RHEL kernels we had at the time). What I ended up doing was to use explicit hugepages for the super carrier, but nothing else. That worked but wasn't a win (and required non-default kernel options) so I didn't pursue it.
@jhogberg you left some other comments which all seem very reasonable. I'll follow up on them after the weekend. Also, sorry for all the one-off comments. Responding from an e-mail does not give you the "start a review" option, which led to my replies dribbling out incrementally. I'll be more tidy with the remainder of my comments.
Thanks!
I don't mind. :)
@mikpe we have had pretty good luck even back in the R15/R16 era with large pages. Back then, it was shown to be profitable to align allocations on a 2MiB boundary and use FreeBSD's superpage feature on 64-bit x86. As for HugeTLB, I have an option to use them for
ef64081 to 38126f1
Thank you everyone for all of your feedback. I think I've made all of the requested changes. One significant highlight is that I have pulled out the JIT-related changes. Because of its entanglement with third-party code, I will create a separate pull request for it.
f1429c9 to 1a24a03
Thanks, I've added it to our daily builds. :)
d051b04 to 5815264
Took a few tries but the CI is now green. The combination of an older version of clang and the -Werror and -Wunused-command-line-argument options ultimately meant some extra code was needed to find the subset of the flags clang-10 supports for segment alignment among those GCC actually uses.
        continue;
    if (from > &etext)
        break;
    if ((UWord)from % sys_large_page_size != 0)
This crashes if is_linux_thp_enabled() returns non-zero and /sys/kernel/mm/transparent_hugepage/hpage_pmd_size cannot be opened, resulting in sys_large_page_size being zero. We've got two test rigs where this is the case, one with kernel version 4.4.74 and the other with 3.12.60.
Perhaps is_linux_thp_enabled should also check get_large_page_size() != 0?
Sorry, I think I let that bug creep in after I rearranged the code.
Along the lines of what you have suggested, I have added an early exit in this function if sys_large_page_size == 0. That should prevent the later % sys_large_page_size from exploding. This variation appealed to my aesthetic sense of having the thp enabled check just be about reading and parsing the relevant sysfs(5) file. However, if you think it's better to put this check in is_linux_thp_enabled I am certainly okay with that too.
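For readers following the thread, the guard discussed above can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the helper names read_thp_size and is_large_page_aligned are hypothetical, and the sketch only assumes that the sysfs file holds a decimal byte count when it exists.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read the THP size from a sysfs-style file.
 * Returns 0 when the file cannot be opened or parsed, which is the
 * situation reported for the older kernels in this thread. */
static size_t read_thp_size(const char *path)
{
    unsigned long long value = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return 0;
    if (fscanf(f, "%llu", &value) != 1)
        value = 0;
    fclose(f);
    return (size_t)value;
}

/* Hypothetical check: the zero test comes first so that the modulo
 * below can never divide by zero. */
static int is_large_page_aligned(const void *addr, size_t large_page_size)
{
    if (large_page_size == 0)
        return 0; /* large pages unavailable: treat as "not aligned" */
    return ((uintptr_t)addr % large_page_size) == 0;
}
```

A caller would pass the result of read_thp_size("/sys/kernel/mm/transparent_hugepage/hpage_pmd_size") as large_page_size and skip the large page path entirely when it is zero.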
That's fine. :)
5815264 to 2c694e3
This is pretty much ready to merge now. I think the only thing left is to move the flag affecting Since we may want to introduce explicit huge pages later (optionally with size when arch supports that?), I'm thinking of calling the option |
We've settled on |
Sure. Would you like me to take care of that?
I think my choice of a flag name encoding a Linux-specific mechanism like THP or HugeTLB was unfortunate as the patch is trivially generalized to other operating systems which have semantics slightly different from Linux THP. If I had to do it again, I'd call it Also, prescribing the mechanism on Linux in some all-or-nothing way prevents using both mechanisms where appropriate. For example, 1GiB pages are not supported by the iTLB so we like to use HugeTLB to get a few 1GiB pages for data and THP for the rest. Anyway, what I have in mind is having the "on" value enable some sensible default support for large pages. THP is a good default for Linux and we can support FreeBSD, macOS, Solaris and Win32 in a similar way with a very small amount of new platform specific code (likely less than was required for Linux). More exotic things like Linux HugeTLB or the similar feature on AIX probably require a few extra flags to specify policies for things like selecting the page size, hoarding the reserved pages at startup, or what to do when the reserved pages are exhausted. That will be hard to configure in a single flag so I’d imagine it would be kindest to users to configure that separately. What do you think? As an aside, I’m happy to make the follow-up pull request to support FreeBSD and macOS, at least. |
This would be really nice, we have long wanted hugepages to work natively on Linux (seems they are only supported on FreeBSD via superpages).
Yes please. :)
We can always have |
Oh, just as a heads-up, the documentation will move over to |
#8026 has been merged now, the A brief description of the new format can be found here. |
Sure, I'll take care of it in the next few days.
What I was proposing might be a little different, which is to have the flag toggle the use of large pages and leave it up to the runtime to pick a mechanism, rather than specifying on the command line which mechanism to use (transparent, explicit, etc.) and where to apply it (static text, data, etc.). Consider the following cases...
At least for cases 1 and 2, it is enough to have a single flag that turns large pages on or off through whatever combination of mechanisms makes sense on the host operating system. This is easiest for users and this flag would be a good candidate for eventually being on by default since there is rarely a downside to having it enabled. For case 3, a hairy parameter, like the
I also implemented support for 1GiB pages using HugeTLB. It was a more invasive change since you need to decide which parts of the super carrier get the 1GiB huge pages. (I also had to replace a call to |
Then we're on the same page, you shouldn't need to touch the tricky flags unless you need to, but they should be there for "case 3" once we cross that bridge.
Then we'll fall back differently under |
To confirm, the direction is to rename the flag as you had suggested above but have an "on" and "off" setting for now, with off as the default, right?
Indeed, this will need a different design.
Yeah, |
Thanks for clarifying. I will go ahead and update the flag and add the documentation sometime next week.
Linux allows an application to remap its .text segment at runtime using transparent huge pages. To do this, an application needs to determine the start and length of the .text segment and pass this range as an argument to the madvise(2) system call. There are many techniques for doing this; the approach chosen in this change is to parse the /proc filesystem. An alternative would be to get this information from the ELF header. For this to work reliably, the start address of the text segment should be aligned to a multiple of the size of a transparent huge page, and the length of the text segment should be greater than the size of a transparent huge page and ideally a multiple of that size. Finally, page sizes and support for multiple page sizes vary by architecture. This change only supports 64-bit x86 but, in theory, it can be generalized to other architectures that support THP.
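The alignment requirement can be illustrated with a small sketch. It is not the code from this change: the helper names align_up, align_down and advise_huge_range are hypothetical, the page size is assumed to be a power of two, and only the range computation and the advisory call are shown (finding the segment bounds and any remapping step are left out).

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Round addr up or down to a multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
    return (addr + align - 1) & ~((uintptr_t)align - 1);
}

static uintptr_t align_down(uintptr_t addr, size_t align)
{
    return addr & ~((uintptr_t)align - 1);
}

/* Hypothetical helper: advise the kernel to use transparent huge pages
 * for the large-page-aligned interior of [start, end).
 * Returns 0 on success and -1 if there is no such interior, the page
 * size is unknown (0), or madvise(2) fails or is unavailable. */
static int advise_huge_range(uintptr_t start, uintptr_t end,
                             size_t large_page_size)
{
    uintptr_t from, to;

    if (large_page_size == 0 || end <= start)
        return -1;
    from = align_up(start, large_page_size);
    to = align_down(end, large_page_size);
    if (from >= to)
        return -1; /* range does not contain a whole large page */
#ifdef MADV_HUGEPAGE
    return madvise((void *)from, to - from, MADV_HUGEPAGE);
#else
    return -1;
#endif
}
```

With a 2MiB page size, a range starting at 0x201000 is trimmed to begin at 0x400000, which is why an unaligned or short text segment yields little or no benefit.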
In order for mseg pages to be reliably mapped with pages larger than the default page size, the mapping must start and end at a multiple of the larger page size. To do this, this change adds an abstraction for performing a memory-mapping with a specified alignment. On operating systems like SunOS 5.9 and later, this is done by passing some extra flags to mmap(2). On operating systems without such a capability, we must do this manually by over-allocating and freeing the excess. The logic in this change only affects super carrier allocations but it can be generalized to other mseg allocations.
2c694e3 to aebc8f7
@jhogberg I have updated the documentation and changed the flag name to be a |
Thanks, it looks great, I'll merge it once 27-rc1 is released (in a day or so?). :-)
Merged, thanks again for the PR! :)
This change extends the memory managers in Erlang/OTP to
optionally map virtual memory on Linux using large pages. Large
pages make more efficient use of the TLB by reducing the dynamic
frequency of TLB misses thereby increasing application
performance. The effect of reducing TLB misses can be dramatic.
On our server workloads we have observed >10% performance
improvements simply from enabling mappings with large pages.
The opportunity for such an improvement comes from the growing
disparity between the size of the TLB and main memory. The
typical 64-bit x86 has a 64-entry L1 data TLB and a 128-entry L1
instruction TLB, addressing 256KiB and 512KiB of data and text
respectively, at 4KiB per entry. This is 4-5 orders of magnitude
smaller than the amount of memory installed in the typical server
of today. On a TLB miss, the processor must perform an
address translation by traversing a multi-level table in main
memory, a costly operation. This overhead is largely dispensable
by using a larger page size. For mappings using a 2MiB or 1GiB
page in lieu of a 4KiB page the L1 TLB can address 32MiB or 4GiB,
respectively, a more meaningful fraction of typical working sets.
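
The figures above follow from a simple product: TLB reach is the number of entries times the page size. The sketch below reproduces them. The entry counts of 16 (for 2MiB pages) and 4 (for 1GiB pages) are assumptions chosen to match the 32MiB and 4GiB figures; the text does not state them, and actual counts vary by processor.

```c
#include <stdint.h>

/* TLB reach in bytes: how much memory the TLB can address without a
 * miss, given its entry count and the page size of each entry. */
static uint64_t tlb_reach(uint64_t entries, uint64_t page_size)
{
    return entries * page_size;
}

/* Page sizes on 64-bit x86. */
#define PAGE_4K ((uint64_t)4 << 10)
#define PAGE_2M ((uint64_t)2 << 20)
#define PAGE_1G ((uint64_t)1 << 30)
```

For example, tlb_reach(64, PAGE_4K) is 256KiB and tlb_reach(128, PAGE_4K) is 512KiB, while the assumed tlb_reach(16, PAGE_2M) is 32MiB and tlb_reach(4, PAGE_1G) is 4GiB.
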
To take advantage of large pages in Erlang, we identified three
classes of virtual memory allocations that would benefit from
large pages: the .text segment, heap allocations, and JIT
allocations. While there are many strategies for using large
pages on Linux, we chose to use Transparent Huge Pages which are
the most flexible. Briefly put, in order to use THP we arranged
for all of these classes of virtual memory allocations to be
aligned and sized to a multiple of a large page and performed a
system call that advised the kernel to use large pages for the
mapping.
While this change is Linux specific, for the most part, other
operating systems have similar, and sometimes better, mechanisms
for achieving the same effect. We believe it is possible to
use large pages for heap and JIT allocations on FreeBSD, Linux,
Solaris, and Windows. The changes to AsmJit in particular
already contain work to this effect.