
[RFC] iobuf.c: Use mmap() huge pages when available #2940

Closed

Conversation

mykaul
Contributor

@mykaul mykaul commented Nov 9, 2021

If huge pages under Linux are available, use mmap() with 2MB pages.
It would be nice to see whether it improves performance.

If huge pages are not available, or the OS is not Linux, the
previous mmap() path is used.

Signed-off-by: Yaniv Kaul ykaul@redhat.com
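
For reference, a minimal sketch of the idea (the helper name and exact flag handling are illustrative, not the actual patch): try a 2MB huge-page mapping on Linux and fall back to the previous plain mmap() path when that fails, e.g. because nr_hugepages is 0.

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper, not the patch itself: try 2MB huge pages first,
 * otherwise fall back to the previous plain anonymous mmap(). */
static void *
arena_mmap(size_t len)
{
    void *base = MAP_FAILED;

#if defined(GF_LINUX_HOST_OS) && defined(MAP_HUGETLB)
#ifdef MAP_HUGE_2MB
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_2MB, -1, 0);
#else
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
#endif

    if (base == MAP_FAILED) /* no huge pages configured, or non-Linux build */
        base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return (base == MAP_FAILED) ? NULL : base;
}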

@gluster-ant
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index aef060094..5f7851aec 100644
--- a/libglusterfs/src/iobuf.c
+++ b/libglusterfs/src/iobuf.c
@@ -26,7 +26,9 @@
 /* Make sure this array is sorted based on pagesize */
 static const struct iobuf_init_config gf_iobuf_init_config[] = {
     /* { pagesize, num_pages }, */
-    {128 * 1024, 32}, {256 * 1024, 8}, {1 * 1024 * 1024, 2},
+    {128 * 1024, 32},
+    {256 * 1024, 8},
+    {1 * 1024 * 1024, 2},
 };
 
 static int


#ifdef GF_LINUX_HOST_OS
/* On Linux, try to map with 2MB huge pages first */
iobuf_arena->mem_base = mmap(
Contributor

By default, mmap() fails when called with the MAP_HUGETLB flag because nr_hugepages is 0; it will not work unless the user sets nr_hugepages. Do we need to log a message about this and set nr_hugepages as well?
Do you have any idea about the expected improvement?

Contributor Author

I leave it up to the user to configure, as we also can't estimate the amount needed.
As for performance, I'd be happy to see some performance numbers from CI.

I assume that to see the impact we'll also need to decide whether we want the 128K buffers to come from the small, regular tcmalloc allocation or from the mmap-based one.

Contributor

Sure, I will do that.

Contributor Author

To test, as root:

echo 20 > /proc/sys/vm/nr_hugepages

should do the trick.
And you can see the impact:

[ykaul@ykaul glusterfs]$ fgrep  -i huge /proc/meminfo  
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      20    <---
HugePages_Free:       18   <---
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:           40960 kB

@mykaul
Contributor Author

mykaul commented Nov 10, 2021

/run regression

Contributor

@mohit84 mohit84 left a comment

LGTM +1

@gluster-ant
Collaborator

1 test(s) failed
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

0 test(s) generated core

3 test(s) needed retry
./tests/000-flaky/basic_ec_ec-quorum-count-partial-failure.t
./tests/bugs/bug-1371806_acl.t
./tests/bugs/gfapi/bug-1319374-THIS-crash.t
https://build.gluster.org/job/gh_centos7-regression/1804/

@mykaul
Contributor Author

mykaul commented Nov 10, 2021

An interesting alternative worth considering is to use posix_memalign() instead of mmap() and rely on tcmalloc to use huge pages, if available and where it makes sense.
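
A rough sketch of that alternative (the helper is hypothetical, assuming a 2MB huge page size): allocate the arena with 2MB alignment via posix_memalign() and leave the huge-page decision to tcmalloc and the kernel.

#include <stdlib.h>

/* Hypothetical helper, not part of the patch: a 2MB-aligned allocation so the
 * whole arena can be backed by huge pages if the allocator/kernel decide to. */
static void *
arena_alloc_aligned(size_t len)
{
    void *base = NULL;

    if (posix_memalign(&base, 2UL * 1024 * 1024, len) != 0)
        return NULL;
    return base;
}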

@mykaul
Contributor Author

mykaul commented Nov 20, 2021

Interesting that it fails with /tests/bugs/gfapi/bug-1319374-THIS-crash.t, just like the other patch @mohit84 is looking at. Could it be related to the removal of some entries from the pool?

@xhernandez
Contributor

The patch looks good to me.

@mykaul
Contributor Author

mykaul commented Nov 23, 2021

/run regression

@gluster-ant
Collaborator

1 test(s) failed
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

0 test(s) generated core

5 test(s) needed retry
./tests/000-flaky/basic_mount-nfs-auth.t
./tests/000-flaky/bugs_nfs_bug-1116503.t
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/basic/stats-dump.t
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

3 flaky test(s) marked as success even though they failed
./tests/000-flaky/basic_mount-nfs-auth.t
./tests/000-flaky/bugs_nfs_bug-1116503.t
./tests/000-flaky/glusterd-restart-shd-mux.t
https://build.gluster.org/job/gh_centos7-regression/1858/

@mykaul
Contributor Author

mykaul commented Nov 24, 2021

There's a segmentation fault, but no core; I wonder what we are not picking up:

/tests/bugs/gfapi/../../include.rc: line 345: 31509 Segmentation fault      ./tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log
not ok  10 [     16/  51048] <  23> './tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log' -> ''
Failed 1/10 subtests 

@mohit84
Contributor

mohit84 commented Nov 24, 2021

There's a segmentation fault, but no core; I wonder what we are not picking up:

/tests/bugs/gfapi/../../include.rc: line 345: 31509 Segmentation fault      ./tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log
not ok  10 [     16/  51048] <  23> './tests/bugs/gfapi/bug-1319374 builder-c7-4.int.aws.gluster.org patchy /var/log/glusterfs/bug-1319374.log' -> ''
Failed 1/10 subtests 

I believe the core pattern should be similar to the one below; it is not specific to iobuf. With the patch we are able to get the core dump. I tried to fix it, but I was not able to find an approach to load/unload the SSL library without generating a crash. On RHEL-8,
SSL has changed its behavior, so it should not crash there; the crash happens only on RHEL-7.

#0 0x00007f19db7c5016 in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00007f19d9895f09 in getrn () from /lib64/libcrypto.so.10
#2 0x00007f19d98961f0 in lh_insert () from /lib64/libcrypto.so.10
#3 0x00007f19d97e235f in OBJ_NAME_add () from /lib64/libcrypto.so.10
#4 0x00007f19cecca6e9 in SSL_library_init () from /lib64/libssl.so.10
#5 0x00007f19ceef7555 in init_openssl_mt () at socket.c:4125
#6 init (this=0x2f05058) at socket.c:4677
#7 0x00007f19db24cf7e in rpc_transport_load (ctx=ctx@entry=0x2ed84a0, options=options@entry=0x2eff278,
trans_name=trans_name@entry=0x2f24408 "gfapi") at rpc-transport.c:345
#8 0x00007f19db250921 in rpc_clnt_connection_init (name=0x2f24408 "gfapi", options=0x2eff278, ctx=0x2ed84a0, clnt=0x2f29258)
at rpc-clnt.c:1030
#9 rpc_clnt_new (options=options@entry=0x2eff278, owner=<optimized out>, name=name@entry=0x2f24408 "gfapi",
reqpool_size=<optimized out>, reqpool_size@entry=8) at rpc-clnt.c:1119
#10 0x00007f19dba61dd3 in glfs_mgmt_init (fs=fs@entry=0x2ed8310) at glfs-mgmt.c:1016
#11 0x00007f19dba5c885 in glfs_volumes_init (fs=fs@entry=0x2ed8310) at glfs.c:260
#12 0x00007f19dba5e249 in glfs_init_common (fs=0x2ed8310) at glfs.c:1093
#13 0x00007f19dba5e36f in pub_glfs_init (fs=0x2ed8310) at glfs.c:1135
#14 0x0000000000400c5e in main (argc=4, argv=0x7ffd1d8ffdb8) at ./tests/bugs/gfapi/bug-1319374.c:114

@mykaul
Contributor Author

mykaul commented Nov 24, 2021

@mohit84 - why do we unconditionally init the SSL library?

@mohit84
Contributor

mohit84 commented Nov 24, 2021

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

@mohit84
Contributor

mohit84 commented Nov 24, 2021

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

I will test and confirm the same.

@xhernandez
Copy link
Contributor

@mykaul I've been thinking about using huge pages, and I'm wondering if it wouldn't be better to use the transparent huge pages feature through madvise() instead of explicitly mmapping huge pages. This way the kernel will use huge pages if possible and will adjust them to the available page sizes automatically (x86 has 2 MiB and 1 GiB page sizes, but other architectures could have other sizes).

Instead of making Gluster aware of these low level details, it would be simpler to just tell the kernel that it can use huge pages for the given region and forget it.
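
A minimal sketch of that approach (the helper name is hypothetical): keep the normal anonymous mapping and only hint the kernel with madvise(MADV_HUGEPAGE); the hint is best-effort, so a failure is not fatal.

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: plain mapping plus a transparent-huge-page hint. */
static void *
arena_mmap_thp(size_t len)
{
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Best effort: khugepaged can collapse the pages later if it decides to. */
    (void)madvise(base, len, MADV_HUGEPAGE);
#endif
    return base;
}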

@xhernandez
Copy link
Contributor

However there's one issue: if the initial mmap doesn't return a mapping aligned to 2 MiB (or any other size matching the available huge page sizes), huge pages cannot be used, at least for the first region of the mapping.
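
One way around that (a sketch that assumes a 2 MiB huge page size and a page-multiple length; the helper is hypothetical, not existing Gluster code) is to over-map and trim, so the region handed to madvise() starts on a huge-page boundary:

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define ASSUMED_HUGE_SIZE (2UL * 1024 * 1024)

/* Hypothetical helper: map len bytes (len must be a multiple of the page size)
 * aligned to 2 MiB by mapping extra space and unmapping the unaligned head and
 * the unused tail. */
static void *
mmap_huge_aligned(size_t len)
{
    size_t padded = len + ASSUMED_HUGE_SIZE;
    char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = (start + ASSUMED_HUGE_SIZE - 1) & ~(ASSUMED_HUGE_SIZE - 1);

    if (aligned > start)                    /* drop the unaligned head */
        munmap(raw, aligned - start);
    if ((start + padded) > (aligned + len)) /* drop the unused tail */
        munmap((char *)aligned + len, (start + padded) - (aligned + len));

    return (void *)aligned;
}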

@mykaul
Contributor Author

mykaul commented Nov 24, 2021

@mykaul I've been thinking about using huge pages, and I'm wondering if it wouldn't be better to use the transparent huge pages feature through madvise() instead of explicitly mmapping huge pages. This way the kernel will use huge pages if possible and will adjust them to the available page sizes automatically (x86 has 2 MiB and 1 GiB page sizes, but other architectures could have other sizes).

Instead of making Gluster aware of these low level details, it would be simpler to just tell the kernel that it can use huge pages for the given region and forget it.

I don't recall why I could not get it to work. Perhaps it doesn't work with shared pages? Worth looking into for sure!

@xhernandez
Contributor

I've done some tests and I'm not sure if it's working fine. While it actually seems to be using huge pages transparently, when checking /proc/<pid>/smaps, I still see this:

KernelPageSize:        4 kB
MMUPageSize:           4 kB

It seems as if the hardware page size is still 4KiB. However it also says:

AnonHugePages:      8192 kB

Which seems to indicate that huge pages are actually used. Looking at khugepaged stats it also says that some pages have been collapsed into bigger ones, so I guess it works even though "apparently" it's still using small pages.

Trying to use explicit huge pages, I get this in the smaps file:

KernelPageSize:     2048 kB
MMUPageSize:        2048 kB

Now AnonHugePages is 0, but Private_Hugetlb contains the allocated size:

AnonHugePages:         0 kB
Private_Hugetlb:    8192 kB

There's also another problem with transparent huge pages: mmap() doesn't align addresses to anything bigger than 4KiB by default (unless explicit huge pages are requested). This means that if we use madvise() after mmaping a new memory area, most probably a significant fragment of all allocated memory won't be translated into a huge page, reducing its benefits.

Based on that, probably it will be better to use explicit huge pages when using mmap(), but instead of specifying the desired page size, I would simply use MAP_HUGETLB alone, which will create a mapping using the default huge page size of the system. This way we can get all the benefits from huge pages but we don't need to worry about different architectures.
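
A sketch of that last suggestion (again illustrative, not the patch): request MAP_HUGETLB without a size flag, so the system's default huge page size is used on any architecture, and fall back to normal pages when the huge page pool is exhausted.

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: default-sized huge pages with a plain-page fallback.
 * For the MAP_HUGETLB attempt, len must be a multiple of the huge page size. */
static void *
arena_mmap_default_huge(size_t len)
{
    void *base = MAP_FAILED;

#ifdef MAP_HUGETLB
    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    if (base == MAP_FAILED)
        base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return (base == MAP_FAILED) ? NULL : base;
}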

@mohit84
Contributor

mohit84 commented Dec 2, 2021

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

I will test and confirm the same.

Just FYI, after linking with tcmalloc_minimal the SSL library is not crashing.

@xhernandez
Contributor

@mohit84 - why do we unconditionally init the SSL library?

I did not try loading the library only while SSL is enabled; I think we can try it. Maybe we will be able to solve the crash.

I will test and confirm the same.

Just FYI, after linking with tcmalloc_minimal the SSL library is not crashing.

Can you rebase this PR?

@gluster-ant
Collaborator

CLANG-FORMAT FAILURE:
Before merging the patch, this diff needs to be considered for passing clang-format

index aef060094..5f7851aec 100644
--- a/libglusterfs/src/iobuf.c
+++ b/libglusterfs/src/iobuf.c
@@ -26,7 +26,9 @@
 /* Make sure this array is sorted based on pagesize */
 static const struct iobuf_init_config gf_iobuf_init_config[] = {
     /* { pagesize, num_pages }, */
-    {128 * 1024, 32}, {256 * 1024, 8}, {1 * 1024 * 1024, 2},
+    {128 * 1024, 32},
+    {256 * 1024, 8},
+    {1 * 1024 * 1024, 2},
 };
 
 static int

@mykaul
Contributor Author

mykaul commented Dec 2, 2021

/run regression

@gluster-ant
Collaborator

1 test(s) failed
./tests/bugs/gfapi/bug-1319374-THIS-crash.t

0 test(s) generated core

3 test(s) needed retry
./tests/000-flaky/glusterd-restart-shd-mux.t
./tests/basic/afr/split-brain-favorite-child-policy-client-side-healing.t
./tests/bugs/gfapi/bug-1319374-THIS-crash.t
https://build.gluster.org/job/gh_centos7-regression/1890/

@mohit84
Contributor

mohit84 commented Dec 6, 2021

/run regression

I will check it again; last time, after loading tcmalloc_minimal.so, when I executed the test case in a loop it was not crashing.

@mohit84
Contributor

mohit84 commented Dec 7, 2021

bug-1319374-THIS-crash.t

I have tried to reproduce the issue, and it is consistently reproducible.
I believe the root cause of only the gfapi program crashing is a bug in OpenSSL,
and the issue is reproducible only on CentOS-7, not on CentOS-8; it seems the bug
is already fixed in CentOS-8. As per the wiki page (https://wiki.openssl.org/index.php/Library_Initialization), a program
can crash when SSL_library_init is called multiple times from
the same program. The page describes a way to avoid this, but it does not work in the gfapi case because socket.so is loaded and unloaded again and again.
In the gfapi test case (bug-1319374-THIS-crash.t), the test calls glfs_init()/glfs_fini() repeatedly, and those functions eventually load/unload socket.so each time. Because the .so is loaded and unloaded, its static segment is also refreshed,
and the variable (initialized, which is used to avoid calling SSL_library_init more than once) is reset to _gf_false even though the main program has not exited. If we call SSL_library_init from gfapi instead of from socket.so, the program does
not crash. I have tested the fix and it is working fine. I think we can call SSL_library_init from the first starting point, instead of from socket.so, for the other processes as well (glusterfsd/cli.c).
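
A minimal sketch of that direction (names are hypothetical; this is not the actual fix): perform the global OpenSSL init exactly once from code that is never dlclose()d, such as gfapi itself or the main program, so reloading socket.so cannot reset the guard.

#include <pthread.h>
#include <openssl/err.h>
#include <openssl/ssl.h>

static pthread_once_t ssl_init_once = PTHREAD_ONCE_INIT;

static void
do_ssl_global_init(void)
{
    SSL_library_init();       /* runs exactly once per process */
    SSL_load_error_strings();
}

/* Hypothetical entry point, called early from gfapi / glusterfsd / cli
 * instead of from socket.so's init(). */
void
gf_ssl_global_init(void)
{
    pthread_once(&ssl_init_once, do_ssl_global_init);
}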

@mykaul @xhernandez
Please share your views on this.

@mykaul
Contributor Author

mykaul commented Dec 7, 2021

Thanks for looking into this!

  1. I believe we should load SSL only if we intend to actually use it.
  2. I'd be happy to see the patch.
  3. Do we know why we've regressed this way?

mohit84 added a commit to mohit84/glusterfs that referenced this pull request Dec 8, 2021
The test case (./tests/bugs/gfapi/bug-1319374-THIS-crash.t) is
continuously crashing on the pull request (gluster#2940).
The test case crashes because OpenSSL does not allow
SSL_library_init to be called multiple times in
multi-threaded programs, and the test program tries to
call SSL_library_init more than once, so it crashes.

Solution: Call SSL_library_init from the gfapi library instead of
          from socket.so to avoid the crash.

Fixes: gluster#3026
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
@stale

stale bot commented Jul 10, 2022

Thank you for your contributions.
Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity.
It will be closed in 2 weeks if no one responds with a comment here.

@stale stale bot added the wontfix Managed by stale[bot] label Jul 10, 2022
@stale

stale bot commented Jul 30, 2022

Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.

@stale stale bot closed this Jul 30, 2022