
[RFC] iobuf.c: Use mmap() huge pages when available #2940

Closed

Conversation

mykaul
Contributor

@mykaul mykaul commented Nov 9, 2021

If huge pages under Linux are available, use mmap() with 2MB pages.
It would be nice to see whether it improves performance.

If huge pages are not available, or the OS is not Linux, the
previous mmap() path is used.

Signed-off-by: Yaniv Kaul ykaul@redhat.com
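
For reference, a minimal sketch of the idea (the helper name and exact flag handling are illustrative, not the actual patch): try a 2MB huge-page mapping on Linux and fall back to the previous plain mmap() path when that fails, e.g. because nr_hugepages is 0.

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper, not the patch itself: try 2MB huge pages first,
 * otherwise fall back to the previous plain anonymous mmap(). */
static void *
arena_mmap(size_t len)
{
    void *base = MAP_FAILED;

#if defined(GF_LINUX_HOST_OS) && defined(MAP_HUGETLB)
#ifdef MAP_HUGE_2MB
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB, -1, 0);
#else
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
#endif

    if (base == MAP_FAILED) /* no huge pages configured, or non-Linux build */
        base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return (base == MAP_FAILED) ? NULL : base;
}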

@gluster-ant
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index aef060094..5f7851aec 100644
--- a/libglusterfs/src/iobuf.c
+++ b/libglusterfs/src/iobuf.c
@@ -26,7 +26,9 @@
 /* Make sure this array is sorted based on pagesize */
 static const struct iobuf_init_config gf_iobuf_init_config[] = {
     /* { pagesize, num_pages }, */
-    {128 * 1024, 32}, {256 * 1024, 8}, {1 * 1024 * 1024, 2},
+    {128 * 1024, 32},
+    {256 * 1024, 8},
+    {1 * 1024 * 1024, 2},
 };
 
 static int


#ifdef GF_LINUX_HOST_OS
/* On Linux, try to map with 2MB huge pages first */
iobuf_arena->mem_base = mmap(
Contributor

By default, mmap() fails when called with the MAP_HUGETLB flag because nr_hugepages is 0; it will not work unless the user sets nr_hugepages. Do we need to log a message about this and set nr_hugepages as well?
Do you have any idea about the expected improvement?

Contributor Author

I leave it up to the user to configure, as we also can't estimate the amount needed.
As for performance, I'd be happy to see some performance numbers from CI.

I assume that to see the impact we'll also need to decide whether we want the 128K buffers to come from the small, regular tcmalloc allocation or from the mmap-based one.

Contributor

Sure, I will do that.

Contributor Author

To test, as root:

echo 20 > /proc/sys/vm/nr_hugepages

should do the trick.
And you can see the impact:

[ykaul@ykaul glusterfs]$ fgrep  -i huge /proc/meminfo  
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      20    <---
HugePages_Free:       18   <---
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:           40960 kB

@mykaul
Contributor Author

mykaul commented Nov 10, 2021

/run regression

Contributor

@mohit84 mohit84 left a comment

LGTM +1

@gluster-ant
Collaborator

1 test(s) failed
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

0 test(s) generated core

3 test(s) needed retry
./tests/000-flaky/basic_ec_ec-quorum-count-partial-failure.t
./tests/bugs/bug-1371806_acl.t
./tests/bugs/gfapi/bug-1319374-THIS-crash.t
https://build.gluster.org/job/gh_centos7-regression/1804/

@mykaul
Contributor Author

mykaul commented Nov 10, 2021

An interesting alternative worth considering is to use posix_memalign() instead of mmap() and rely on tcmalloc to use huge pages, if available and where it makes sense.
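
A rough sketch of that alternative (the helper is hypothetical, assuming a 2MB huge page size): allocate the arena with 2MB alignment via posix_memalign() and leave the huge-page decision to tcmalloc and the kernel.

#include <stdlib.h>

/* Hypothetical helper, not part of the patch: a 2MB-aligned allocation so the
 * whole arena can be backed by huge pages if the allocator/kernel decide to. */
static void *
arena_alloc_aligned(size_t len)
{
    void *base = NULL;

    if (posix_memalign(&base, 2UL * 1024 * 1024, len) != 0)
        return NULL;
    return base;
}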

@mykaul
Contributor Author

mykaul commented Nov 20, 2021

Interesting that it fails with /tests/bugs/gfapi/bug-1319374-THIS-crash.t, just like the other patch @mohit84 is looking at. Could it be related to the removal of some entries from the pool?

@xhernandez
Contributor

The patch looks good to me.

@mykaul
Contributor Author

mykaul commented Nov 23, 2021

/run regression

@gluster-ant
Collaborator

1 test(s) failed
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

0 test(s) generated core

5 test(s) needed retry
./tests/000-flaky/basic_mount-nfs-auth.t
./tests/000-flaky/bugs_nfs_bug-1116503.t
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/basic/stats-dump.t
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

3 flaky test(s) marked as success even though they failed
./tests/000-flaky/basic_mount-nfs-auth.t
./tests/000-flaky/bugs_nfs_bug-1116503.t
./tests/000-flaky/glusterd-restart-shd-mux.t
https://build.gluster.org/job/gh_centos7-regression/1858/

@mykaul
Contributor Author

mykaul commented Nov 24, 2021

There's a segmentation fault, but no core; I wonder what we are not picking up:

/tests/bugs/gfapi/../../include.rc: line 345: 31509 Segmentation fault      ./tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log
not ok  10 [     16/  51048] <  23> './tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log' -> ''
Failed 1/10 subtests 

@mohit84
Contributor

mohit84 commented Nov 24, 2021

There's a segmentation fault, but no core; I wonder what we are not picking up:

/tests/bugs/gfapi/../../include.rc: line 345: 31509 Segmentation fault      ./tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log
not ok  10 [     16/  51048] <  23> './tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log' -> ''
Failed 1/10 subtests 

I believe the core pattern should be similar to the one below; it is not specific to iobuf. With the patch we are able to get the core dump. I tried to fix it, but I was not able to find an approach to load/unload the SSL library without generating a crash. On RHEL-8,
SSL has changed its behavior, so it should not crash there; the crash happens only on RHEL-7.

#0 0x00007f19db7c5016 in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00007f19d9895f09 in getrn () from /lib64/libcrypto.so.10
#2 0x00007f19d98961f0 in lh_insert () from /lib64/libcrypto.so.10
#3 0x00007f19d97e235f in OBJ_NAME_add () from /lib64/libcrypto.so.10
#4 0x00007f19cecca6e9 in SSL_library_init () from /lib64/libssl.so.10
#5 0x00007f19ceef7555 in init_openssl_mt () at socket.c:4125
#6 init (this=0x2f05058) at socket.c:4677
#7 0x00007f19db24cf7e in rpc_transport_load (ctx=ctx@entry=0x2ed84a0, options=options@entry=0x2eff278,
trans_name=trans_name@entry=0x2f24408 "gfapi") at rpc-transport.c:345
#8 0x00007f19db250921 in rpc_clnt_connection_init (name=0x2f24408 "gfapi", options=0x2eff278, ctx=0x2ed84a0, clnt=0x2f29258)
at rpc-clnt.c:1030
#9 rpc_clnt_new (options=options@entry=0x2eff278, owner=<optimized out>, name=name@entry=0x2f24408 "gfapi",
reqpool_size=<optimized out>, reqpool_size@entry=8) at rpc-clnt.c:1119
#10 0x00007f19dba61dd3 in glfs_mgmt_init (fs=fs@entry=0x2ed8310) at glfs-mgmt.c:1016
#11 0x00007f19dba5c885 in glfs_volumes_init (fs=fs@entry=0x2ed8310) at glfs.c:260
#12 0x00007f19dba5e249 in glfs_init_common (fs=0x2ed8310) at glfs.c:1093
#13 0x00007f19dba5e36f in pub_glfs_init (fs=0x2ed8310) at glfs.c:1135
#14 0x0000000000400c5e in main (argc=4, argv=0x7ffd1d8ffdb8) at ./tests/bugs/gfapi/bug-1319374.c:114

@mykaul
Contributor Author

mykaul commented Nov 24, 2021

@mohit84 - why do we unconditionally init the SSL library?

@mohit84
Contributor

mohit84 commented Nov 24, 2021

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

@mohit84
Contributor

mohit84 commented Nov 24, 2021

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

I will test and confirm the same.

@xhernandez
Copy link
Contributor

@mykaul I've been thinking about using huge pages, and I'm wondering if it wouldn't be better to use the transparent huge pages feature through madvise() instead of explicitly mmapping huge pages. This way the kernel will use huge pages if possible and will adjust them to the available page sizes automatically (x86 has 2 MiB and 1 GiB page sizes, but other architectures could have other sizes).

Instead of making Gluster aware of these low level details, it would be simpler to just tell the kernel that it can use huge pages for the given region and forget it.
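
A minimal sketch of that approach (the helper name is hypothetical): keep the normal anonymous mapping and only hint the kernel with madvise(MADV_HUGEPAGE); the hint is best-effort, so a failure is not fatal.

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: plain mapping plus a transparent-huge-page hint. */
static void *
arena_mmap_thp(size_t len)
{
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Best effort: khugepaged can collapse the pages later if it decides to. */
    (void)madvise(base, len, MADV_HUGEPAGE);
#endif
    return base;
}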

@xhernandez
Copy link
Contributor

However there's one issue: if the initial mmap doesn't return a mapping aligned to 2 MiB (or any other size matching the available huge page sizes), huge pages cannot be used, at least for the first region of the mapping.
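
One way around that (a sketch that assumes a 2 MiB huge page size and a page-multiple length; the helper is hypothetical, not existing Gluster code) is to over-map and trim, so the region handed to madvise() starts on a huge-page boundary:

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ASSUMED_HUGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: map len bytes (len must be a multiple of the page size)
 * aligned to 2 MiB by mapping extra space and unmapping the unaligned head and
 * the unused tail. */
static void *
mmap_huge_aligned(size_t len)
{
    size_t padded = len + ASSUMED_HUGE_SIZE;
    char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = (start + ASSUMED_HUGE_SIZE - 1) & ~(ASSUMED_HUGE_SIZE - 1);

    if (aligned > start)                    /* drop the unaligned head */
        munmap(raw, aligned - start);
    if ((start + padded) > (aligned + len)) /* drop the unused tail */
        munmap((char *)aligned + len, (start + padded) - (aligned + len));

    return (void *)aligned;
}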

@mykaul
Contributor Author

mykaul commented Nov 24, 2021

@mykaul I've been thinking about using huge pages, and I'm wondering if it wouldn't be better to use the transparent huge pages feature through madvise() instead of explicitly mmapping huge pages. This way the kernel will use huge pages if possible and will adjust them to the available page sizes automatically (x86 has 2 MiB and 1 GiB page sizes, but other architectures could have other sizes).

Instead of making Gluster aware of these low level details, it would be simpler to just tell the kernel that it can use huge pages for the given region and forget it.

I don't recall why I could not get it to work. Perhaps it doesn't work with shared pages? Worth looking into for sure!

@xhernandez
Contributor

I've done some tests and I'm not sure if it's working fine. While it actually seems to be using huge pages transparently, when checking /proc/<pid>/smaps, I still see this:

KernelPageSize:        4 kB
MMUPageSize:           4 kB

It seems as if the hardware page size is still 4KiB. However it also says:

AnonHugePages:      8192 kB

Which seems to indicate that huge pages are actually used. Looking at khugepaged stats it also says that some pages have been collapsed into bigger ones, so I guess it works even though "apparently" it's still using small pages.

Trying to use explicit huge pages, I get this in the smaps file:

KernelPageSize:     2048 kB
MMUPageSize:        2048 kB

Now AnonHugePages is 0, but Private_Hugetlb contains the allocated size:

AnonHugePages:         0 kB
Private_Hugetlb:    8192 kB

There's also another problem with transparent huge pages: mmap() doesn't align addresses to anything bigger than 4KiB by default (unless explicit huge pages are requested). This means that if we use madvise() after mmaping a new memory area, most probably a significant fragment of all allocated memory won't be translated into a huge page, reducing its benefits.

Based on that, probably it will be better to use explicit huge pages when using mmap(), but instead of specifying the desired page size, I would simply use MAP_HUGETLB alone, which will create a mapping using the default huge page size of the system. This way we can get all the benefits from huge pages but we don't need to worry about different architectures.
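
A sketch of that last suggestion (again illustrative, not the patch): request MAP_HUGETLB without a size flag, so the system's default huge page size is used on any architecture, and fall back to normal pages when the huge page pool is exhausted.

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: default-sized huge pages with a plain-page fallback.
 * For the MAP_HUGETLB attempt, len must be a multiple of the huge page size. */
static void *
arena_mmap_default_huge(size_t len)
{
    void *base = MAP_FAILED;

#ifdef MAP_HUGETLB
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (base == MAP_FAILED)
        base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return (base == MAP_FAILED) ? NULL : base;
}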

@mohit84
Contributor

mohit84 commented Dec 2, 2021

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

I will test and confirm the same.

Just FYI, after linking with tcmalloc_minimal the SSL library is not crashing.

@xhernandez
Contributor

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

I will test and confirm the same.

Just FYI, after linking with tcmalloc_minimal the SSL library is not crashing.

Can you rebase this PR?

@gluster-ant
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index aef060094..5f7851aec 100644
--- a/libglusterfs/src/iobuf.c
+++ b/libglusterfs/src/iobuf.c
@@ -26,7 +26,9 @@
 /* Make sure this array is sorted based on pagesize */
 static const struct iobuf_init_config gf_iobuf_init_config[] = {
     /* { pagesize, num_pages }, */
-    {128 * 1024, 32}, {256 * 1024, 8}, {1 * 1024 * 1024, 2},
+    {128 * 1024, 32},
+    {256 * 1024, 8},
+    {1 * 1024 * 1024, 2},
 };
 
 static int

@mykaul
Contributor Author

mykaul commented Dec 2, 2021

/run regression

@gluster-ant
Collaborator

1 test(s) failed
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

0 test(s) generated core

3 test(s) needed retry
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/basic/afr/split-brain-favorite-child-policy-client-side-healing.t
./tests/bugs/gfapi/bug-1319374-THIS-crash.t
https://build.gluster.org/job/gh_centos7-regression/1890/

@mohit84
Contributor

mohit84 commented Dec 6, 2021

/run regression

I will check it again; last time, after loading tcmalloc_minimal.so, when I executed the test case in a loop it was not crashing.

@mohit84
Contributor

mohit84 commented Dec 7, 2021

bug-1319374-THIS-crash.t

I have tried to reproduce the issue, and it is consistently reproducible.
I believe the root cause of only the gfapi program crashing is a bug in OpenSSL,
and the issue is reproducible only on CentOS-7, not on CentOS-8; it seems the bug
is already fixed in CentOS-8. As per the wiki page (https://wiki.openssl.org/index.php/Library_Initialization), a program
can crash when SSL_library_init is called multiple times from
the same program. The page describes a way to avoid this, but it does not work in the gfapi case because socket.so is loaded and unloaded again and again.
In the gfapi test case (bug-1319374-THIS-crash.t), the test calls glfs_init()/glfs_fini() repeatedly, and those functions eventually load/unload socket.so each time. Because the .so is loaded and unloaded, its static segment is also refreshed,
and the variable (initialized, which is used to avoid calling SSL_library_init more than once) is reset to _gf_false even though the main program has not exited. If we call SSL_library_init from gfapi instead of from socket.so, the program does
not crash. I have tested the fix and it is working fine. I think we can call SSL_library_init from the first starting point, instead of from socket.so, for the other processes as well (glusterfsd/cli.c).
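
A minimal sketch of that direction (names are hypothetical; this is not the actual fix): perform the global OpenSSL init exactly once from code that is never dlclose()d, such as gfapi itself or the main program, so reloading socket.so cannot reset the guard.

#include <pthread.h>
#include <openssl/err.h>
#include <openssl/ssl.h>

static pthread_once_t ssl_init_once = PTHREAD_ONCE_INIT;

static void
do_ssl_global_init(void)
{
    SSL_library_init();       /* runs exactly once per process */
    SSL_load_error_strings();
}

/* Hypothetical entry point, called early from gfapi / glusterfsd / cli
 * instead of from socket.so's init(). */
void
gf_ssl_global_init(void)
{
    pthread_once(&ssl_init_once, do_ssl_global_init);
}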

@mykaul @xhernandez
Please share your views on this.

@mykaul
Contributor Author

mykaul commented Dec 7, 2021

Thanks for looking into this!

  1. I believe we should load SSL only if we intend to actually use it.
  2. I'd be happy to see the patch.
  3. Do we know why we've regressed this way?

mohit84 added a commit to mohit84/glusterfs that referenced this pull request Dec 8, 2021
The test case (./tests/bugs/gfapi/bug-1319374-THIS-crash.t) is
continuously crashing on the pull request (gluster#2940).
The test case crashes because OpenSSL does not allow
SSL_library_init to be called multiple times in
multi-threaded programs, and the test program tries to
call SSL_library_init more than once, so it crashes.

Solution: Call SSL_library_init from the gfapi library instead of
          from socket.so to avoid the crash.

Fixes: gluster#3026
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
@stale

stale bot commented Jul 10, 2022

Thank you for your contributions.
Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Jul 10, 2022
@stale

stale bot commented Jul 30, 2022

Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.

@stale stale bot closed this Jul 30, 2022