[RFC] iobuf.c: Use mmap() huge pages when available #2940
Conversation
CLANG-FORMAT FAILURE: index aef060094..5f7851aec 100644
--- a/libglusterfs/src/iobuf.c
+++ b/libglusterfs/src/iobuf.c
@@ -26,7 +26,9 @@
/* Make sure this array is sorted based on pagesize */
static const struct iobuf_init_config gf_iobuf_init_config[] = {
/* { pagesize, num_pages }, */
- {128 * 1024, 32}, {256 * 1024, 8}, {1 * 1024 * 1024, 2},
+ {128 * 1024, 32},
+ {256 * 1024, 8},
+ {1 * 1024 * 1024, 2},
};
static int

Review comment on this hunk:

    #ifdef GF_LINUX_HOST_OS
        /* On Linux, try to map with 2MB huge pages first */
        iobuf_arena->mem_base = mmap(
By default, mmap() fails when called with the MAP_HUGETLB flag because nr_hugepages is 0; it will not work unless the user sets nr_hugepages. Do we need to log this in a message and also set nr_hugepages ourselves?
Do you have any idea for an improvement?
I leave it up to the user to configure, as we also can't estimate the amount needed.
As for performance, I'd be happy to see some performance numbers on CI.
I assume that to see the impact we'll also need to assess whether we want the 128K buffers to come from the small, regular tcmalloc allocation or from the mmap-based one.
Sure, I will do that.
To test, as root:
echo 20 > /proc/sys/vm/nr_hugepages
should do the trick.
And you can see the impact:
[ykaul@ykaul glusterfs]$ fgrep -i huge /proc/meminfo
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 20 <---
HugePages_Free: 18 <---
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 40960 kB
/run regression
LGTM +1
1 test(s) failed, 0 test(s) generated core, 3 test(s) needed retry
An interesting alternative worth considering is to use posix_memalign() instead of mmap() and rely on tcmalloc to use huge pages, if available and where it makes sense.
Interesting that it fails with /tests/bugs/gfapi/bug-1319374-THIS-crash.t failing, just like the other patch @mohit84 is looking at. Could it be related to the removal of some entries in the pool?
The patch looks good to me.
/run regression
1 test(s) failed, 0 test(s) generated core, 5 test(s) needed retry, 3 flaky test(s) marked as success even though they failed
There's a segmentation fault, but no core; I wonder what we are not picking up.
I believe the core pattern should be similar to the one below; it is not specific to iobuf. With the patch we are able to get the core dump. I tried to fix it, but I was not able to find an approach to load/unload the SSL library without generating a crash. On RHEL 8:

    0x00007f19db7c5016 in __strcmp_sse42 () from /lib64/libc.so.6
@mohit84 - why do we unconditionally init the SSL library?
I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.
I will test and confirm the same.
@mykaul I've been thinking about using huge pages, and I'm wondering if it wouldn't be better to use the transparent huge pages feature through madvise() instead of explicitly mmapping huge pages. This way the kernel will use huge pages if possible and will adjust them to the available page sizes automatically (x86 has 2 MiB and 1 GiB page sizes, but other architectures could have other sizes). Instead of making Gluster aware of these low-level details, it would be simpler to just tell the kernel that it can use huge pages for the given region and forget about it.
However, there's one issue: if the initial mmap doesn't return a mapping aligned to 2 MiB (or any other size matching the available huge page sizes), huge pages cannot be used, at least for the first region of the mapping.
I don't recall why I could not get it to work. Perhaps it doesn't work with shared pages? Worth looking into it for sure!
I've done some tests and I'm not sure if it's working fine. While it actually seems to be using huge pages transparently, when checking /proc/<pid>/smaps, I still see this: [...]
It seems as if the hardware page size is still 4 KiB. However, it also says: [...]
Which seems to indicate that huge pages are actually used. Looking at khugepaged stats, it also says that some pages have been collapsed into bigger ones, so I guess it works, even though "apparently" it's still using small pages. Trying to use explicit huge pages, I get this in the smaps file: [...]
There's also another problem with transparent huge pages: [...] Based on that, it will probably be better to use explicit huge pages when using [...]
Just FYI: after linking with tcmalloc_minimal, the SSL library is not crashing.
Can you rebase this PR?
If huge pages under Linux are available, use mmap() with 2MB pages. Would be nice to see if it improves performance. If huge pages are not available, or it's an OS other than Linux, the previous mmap() way will be used.

Signed-off-by: Yaniv Kaul <ykaul@redhat.com>
/run regression
1 test(s) failed, 0 test(s) generated core, 3 test(s) needed retry
I will check it again; last time, after loading tcmalloc_minimal.so, when I executed the test case in a loop it was not crashing.
I have tried to reproduce the issue; it is consistently reproducible. @mykaul @xhernandez
Thanks for looking into this!
The test case (./tests/bugs/gfapi/bug-1319374-THIS-crash.t) is continuously crashing on the pull request (gluster#2940). The test case crashes because OpenSSL does not allow calling SSL_library_init multiple times in multi-threaded programs, and the test program tries to call SSL_library_init more than once.
Solution: Call SSL_library_init from the gfapi library instead of from socket.so to avoid the crash.
Fixes: gluster#3026
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
Thank you for your contributions.
Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.