A "memory trick" in `ngx_http_lua_limit_data_segment` leads to redundant memory allocation #1005

Closed
ElvinEfendi opened this Issue Mar 7, 2017 · 13 comments

ElvinEfendi commented Mar 7, 2017

Nginx version: 1.11.9
Lua Nginx Module version: 0.10.6
Openssl version: 1.1.0e
OS: Linux 3.19.0-80-generic

Recently I added the lua_ssl_trusted_certificate directive to the main config section of one of the Nginx boxes we have at work. While loading the new configuration, Nginx allocated more and more memory and ended up being killed by the Out of Memory Killer. The configuration has over 4k location sections defined.

Here is the relevant strace output during reload:

[pid 31774] open("/etc/ssl/certs/ca-certificates.crt", O_RDONLY) = 5
[pid 31774] fstat(5, {st_mode=S_IFREG|0644, st_size=274340, ...}) = 0
[pid 31774] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6dc8266000
[pid 31774] read(5, "-----BEGIN CERTIFICATE-----\nMIIH"..., 4096) = 4096
[pid 31774] read(5, "WIm\nfQwng4/F9tqgaHtPkl7qpHMyEVNE"..., 4096) = 4096
[pid 31774] read(5, "Ktmyuy/uE5jF66CyCU3nuDuP/jVo23Ee"..., 4096) = 4096
...<stripped for clarity>...
[pid 31774] read(5, "MqAw\nhi5odHRwOi8vd3d3Mi5wdWJsaWM"..., 4096) = 4096
[pid 31774] read(5, "dc/BGZFjz+iokYi5Q1K7\ngLFViYsx+tC"..., 4096) = 4096
[pid 31774] brk(0x26d3000)              = 0x26b2000
[pid 31774] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6c927c3000
[pid 31774] read(5, "/lmci3Zt1/GiSw0r/wty2p5g0I6QNcZ4"..., 4096) = 4096
[pid 31774] read(5, "iv9kuXclVzDAGySj4dzp30d8tbQk\nCAU"..., 4096) = 4096
...<stripped for clarity>...
[pid 31774] read(5, "ye8\nFVdMpEbB4IMeDExNH08GGeL5qPQ6"..., 4096) = 4096
[pid 31774] read(5, "VVNUIEVs\nZWt0cm9uaWsgU2VydGlmaWt"..., 4096) = 4004
[pid 31774] read(5, "", 4096)           = 0
[pid 31774] close(5)                    = 0
[pid 31774] munmap(0x7f6dc8266000, 4096) = 0

This sequence was repeated many times (gdb shows it happens on every ngx_http_lua_merge_loc_conf call). You can see that in the middle of the reads, 1048576 (1024 x 1024) bytes are suddenly allocated. The gdb backtrace below shows that this happens while ngx_http_lua_merge_loc_conf is being called. The strace log above is basically OpenSSL decoding the trusted certificate file into its internal structures and saving it in the relevant cert_store to use later for certificate verification.

Breakpoint 3, mmap64 () at ../sysdeps/unix/syscall-template.S:81
81	../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  mmap64 () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff6a44ad2 in sysmalloc (av=0x7ffff6d82760 <main_arena>, nb=48) at malloc.c:2495
#2  _int_malloc (av=0x7ffff6d82760 <main_arena>, bytes=40) at malloc.c:3800
#3  0x00007ffff6a466c0 in __GI___libc_malloc (bytes=40) at malloc.c:2891
#4  0x000000000057d829 in default_malloc_ex (num=40, file=0x6f630f "a_object.c", line=350) at mem.c:79
#5  0x000000000057deb9 in CRYPTO_malloc (num=40, file=0x6f630f "a_object.c", line=350) at mem.c:346
<internal Openssl function calls stripped for clarity>
#30 0x000000000065e2f7 in PEM_X509_INFO_read_bio (bp=0x7ffff7f28c50, sk=0x0, cb=0x0, u=0x0) at pem_info.c:248
#31 0x00000000005e8b22 in X509_load_cert_crl_file (ctx=0x7ffff7f289e0, file=0x7ffff7f20ce6 "/etc/ssl/certs/ca-certificates.crt", type=1) at by_file.c:256
#32 0x00000000005e8626 in by_file_ctrl (ctx=0x7ffff7f289e0, cmd=1, argp=0x7ffff7f20ce6 "/etc/ssl/certs/ca-certificates.crt", argl=1, ret=0x0) at by_file.c:115
#33 0x00000000005e5747 in X509_LOOKUP_ctrl (ctx=0x7ffff7f289e0, cmd=1, argc=0x7ffff7f20ce6 "/etc/ssl/certs/ca-certificates.crt", argl=1, ret=0x0) at x509_lu.c:120
#34 0x00000000005dd5c1 in X509_STORE_load_locations (ctx=0x7ffff7f28750, file=0x7ffff7f20ce6 "/etc/ssl/certs/ca-certificates.crt", path=0x0) at x509_d2.c:94
#35 0x0000000000546e22 in SSL_CTX_load_verify_locations (ctx=0x7ffff7f27fd0, CAfile=0x7ffff7f20ce6 "/etc/ssl/certs/ca-certificates.crt", CApath=0x0) at ssl_lib.c:3231
#36 0x0000000000477d94 in ngx_ssl_trusted_certificate (cf=cf@entry=0x7fffffffe150, ssl=0x7ffff7f27a78, cert=cert@entry=0x7ffff7f22f20, depth=<optimized out>) at src/event/ngx_event_openssl.c:687
#37 0x00000000004f0a1b in ngx_http_lua_set_ssl (llcf=0x7ffff7f22ef8, cf=0x7fffffffe150) at ../ngx_lua-0.10.7/src/ngx_http_lua_module.c:1240
#38 ngx_http_lua_merge_loc_conf (cf=0x7fffffffe150, parent=0x7ffff7f15808, child=0x7ffff7f22ef8) at ../ngx_lua-0.10.7/src/ngx_http_lua_module.c:1158
#39 0x000000000047e2b1 in ngx_http_merge_servers (cmcf=<optimized out>, cmcf=<optimized out>, ctx_index=<optimized out>, module=<optimized out>, cf=<optimized out>) at src/http/ngx_http.c:599
<Nginx function calls stripped for clarity>

As you can see, even though malloc is asked to allocate 40 bytes, it ends up allocating 1024 x 1024 bytes using mmap because sbrk() fails to allocate the required memory (https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#406). As there is a different SSL context per location section, this ends up happening for every location section.
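
To make the arithmetic concrete, here is a minimal sketch of the size glibc would request from mmap once the brk path has failed. It is a simplified model, not glibc's code: the 1 MiB floor mirrors glibc's `MMAP_AS_MORECORE_SIZE`, while `align_up`, `fallback_mmap_size` and `total_fallback_bytes` are hypothetical helper names, and glibc's top padding and bookkeeping are ignored. The assumption that each location triggers exactly one fallback comes from the report above.

```c
#include <stddef.h>

/* Mirrors glibc's MMAP_AS_MORECORE_SIZE: the minimum size requested from
   mmap() when brk()/sbrk() cannot extend the heap. */
#define FALLBACK_MIN_SIZE ((size_t) 1024 * 1024)

/* Round n up to a multiple of page (page must be a power of two). */
static size_t align_up(size_t n, size_t page)
{
    return (n + page - 1) & ~(page - 1);
}

/* Simplified model (hypothetical helper): bytes requested from mmap() for a
   chunk of `need` bytes once the brk path has failed. */
static size_t fallback_mmap_size(size_t need, size_t page)
{
    size_t size = align_up(need, page);
    return size < FALLBACK_MIN_SIZE ? FALLBACK_MIN_SIZE : size;
}

/* Estimate of the memory mapped if every location triggers exactly one
   fallback (an assumption taken from the report, not measured here). */
static size_t total_fallback_bytes(size_t locations, size_t need, size_t page)
{
    return locations * fallback_mmap_size(need, page);
}
```

For the 48-byte chunk (`nb=48`) in the backtrace, `fallback_mmap_size(48, 4096)` gives 1048576, which matches the `mmap(NULL, 1048576, ...)` call in the strace output. Under the one-fallback-per-location assumption, 4000 locations would map roughly 3.9 GiB.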

I tested by commenting out the call to the ngx_http_lua_limit_data_segment function and confirmed that the mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6c927c3000 call no longer happens.

I also tried calling malloc_trim() at the end of every ngx_http_lua_merge_loc_conf call but did not see any improvement.

Here is a small C program that reproduces the exact same behaviour: https://gist.github.com/ElvinEfendi/071e99d24c2235ec892144d5991b56f6
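
For readers who cannot open the gist, the following is an independent sketch of the same idea, not the gist's code. It assumes the trick amounts to mapping one page at the current program break so that later brk() growth collides with it. `pin_program_break` and `small_allocs_after_pin` are hypothetical names, and the sketch deliberately avoids MAP_FIXED, so the kernel may place the page elsewhere.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose sbrk() and MAP_ANONYMOUS on glibc */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Keeps the compiler from optimizing the malloc() calls away. */
static void *volatile last_block;

/* Map one read-only page at the page-aligned program break so that later
   brk()/sbrk() growth collides with it. Returns 1 if the kernel placed the
   page exactly there, 0 if it chose another address, -1 on failure. */
static int pin_program_break(void)
{
    uintptr_t page = (uintptr_t) sysconf(_SC_PAGESIZE);
    uintptr_t brk_now = (uintptr_t) sbrk(0);
    void *hint = (void *) ((brk_now + page - 1) & ~(page - 1));

    /* No MAP_FIXED: the address is only a hint, so nothing gets clobbered. */
    void *p = mmap(hint, page, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    return p == hint;
}

/* Allocate `count` small blocks and return how many succeeded. The blocks
   are intentionally never freed, as in a long-lived configuration. */
static size_t small_allocs_after_pin(size_t count, size_t size)
{
    size_t ok = 0;
    for (size_t i = 0; i < count; i++) {
        last_block = malloc(size);
        if (last_block != NULL) {
            ok++;
        }
    }
    return ok;
}
```

Calling `pin_program_break()` and then `small_allocs_after_pin(1000, 40)` from a small `main()` and running it under strace should show whether malloc switches from brk to mmap on a given system; the sketch itself makes no claim about the result.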

I believe #872 is related.

agentzh commented Mar 7, 2017

@ElvinEfendi Hmm, this looks like a bug (a memory leak) in glibc's allocator when it fails to use sbrk(). What really matters here is your glibc version. Will you try the latest (or recent enough) glibc on your side?

@yangshuxin What do you think?

ElvinEfendi commented Mar 7, 2017

@agentzh isn't that expected glibc behaviour? It is documented at https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#406

agentzh commented Mar 7, 2017

@ElvinEfendi glibc should not withhold memory unboundedly when it calls mmap to allocate memory. What's even worse is that malloc_trim cannot even free anything.

ElvinEfendi commented Mar 7, 2017

Here are the glibc details I use:

$ /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Ubuntu EGLIBC 2.19-0ubuntu6.9) stable release version 2.19, by Roland McGrath et al.
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.4.
Compiled on a Linux 3.13.11 system on 2016-05-26.
Available extensions:
	crypt add-on version 2.1 by Michael Glad and others
	GNU Libidn by Simon Josefsson
	Native POSIX Threads Library by Ulrich Drepper et al
	BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/eglibc/+bugs>.

ElvinEfendi commented Mar 7, 2017

> Will you try the latest (or recent enough) glibc on your side?

@agentzh I'll try with version 2.25 (https://www.gnu.org/software/libc/ says it is the latest one) and let you know.

detailyang commented Mar 30, 2017

Hello @ElvinEfendi,

Did you try the latest one? I guess the problem will still appear.

I believe that glibc's malloc (ptmalloc2) does not cause a memory leak.

Your OpenResty instance holds a lot of memory because there are too many nginx locations: each one creates a new SSL_CTX object and calls SSL_CTX_load_verify_locations when the location is merged, which holds a large amount of memory depending on the size of your CAfile.

It looks like we need to reduce the SSL_CTX_load_verify_locations calls for tcpsock:sslhandshake (reuse the SSL_CTX object), since merging lua_ssl_trusted_certificate on every location is wasteful when the location does not actually call (need) tcpsock:sslhandshake.
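
The reuse idea can be sketched without any OpenSSL or nginx code. Everything below is hypothetical: `trust_store`, `location_conf`, `load_ca_bundle` and `location_use_store` are illustrative stand-ins, and the loader only counts invocations instead of parsing a CA file. The sketch shows the bundle being parsed at most once and then shared by reference; whether the real SSL_CTX can be shared this way is exactly the open question in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for an expensive-to-build object such as a
   certificate store; not an OpenSSL or nginx type. */
typedef struct {
    int loaded;          /* 1 once the CA bundle has been parsed */
    unsigned loads;      /* how many times the bundle was parsed */
    unsigned refs;       /* how many locations share the store */
} trust_store;

/* Hypothetical per-location config that points at the shared store. */
typedef struct {
    trust_store *store;  /* NULL until the location first needs TLS */
} location_conf;

/* Placeholder for the costly step (SSL_CTX_load_verify_locations in the
   report); here it only counts invocations. */
static void load_ca_bundle(trust_store *ts)
{
    ts->loads++;
    ts->loaded = 1;
}

/* Attach the shared store to a location, parsing the bundle at most once. */
static void location_use_store(location_conf *loc, trust_store *shared)
{
    if (loc->store != NULL) {
        return;              /* this location is already attached */
    }
    if (!shared->loaded) {
        load_ca_bundle(shared);
    }
    shared->refs++;
    loc->store = shared;
}
```

Calling `location_use_store` only from locations that actually perform a handshake would also cover the lazy part of the suggestion, since untouched locations keep `store == NULL`.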

Maybe #997 is related.

@agentzh What do you think of it?

ElvinEfendi commented Mar 30, 2017

Hi @detailyang,

> Did you try the latest one? I guess the problem will still appear.

I have not had a chance to try this with the latest glibc yet. To be honest, I don't see how that would make any difference here.

> I believe that glibc's malloc (ptmalloc2) does not cause a memory leak.

Yep, it's not a memory leak, but it allocates redundant memory because the (s)brk calls fail (https://code.woboq.org/userspace/glibc/malloc/malloc.c.html#406).

> which holds a large amount of memory depending on the size of your CAfile.

Here is the problem: the allocated memory is far more than what's needed to hold the decoded CA file in the cert store in memory, so most of it is redundant.

> It looks like we need to reduce the SSL_CTX_load_verify_locations calls for tcpsock:sslhandshake (reuse the SSL_CTX object), since merging lua_ssl_trusted_certificate on every location is wasteful when the location does not actually call (need) tcpsock:sslhandshake.

I looked into this for a bit when I came across this issue, but apparently it's not easy to share a cert store between different SSL sessions (OpenSSL is not thread safe?). I believe this is why the behaviour is the same in Nginx core (proxy_ssl_trusted_certificate) as well.

Please refer to http://www.elvinefendi.com/2017/03/07/my-experience-with-lua-nginx-openssl-strace-gdb-glibc-and-linux-vm.html for more details.

#997 will be a great step forward in my opinion. At least it will let us avoid using lua_ssl_trusted_certificate in the main config section when we only need sslhandshake, i.e. in the init_worker phase.

agentzh commented Mar 30, 2017

@ElvinEfendi The real problem is that glibc never returns a sufficient part of its withheld memory back to the OS when allocating via mmap, not even after malloc_trim() calls. This can be easily reproduced by a minimal C program.

So when glibc uses mmap to allocate the space for your SSL certificates in the nginx master process, it will keep leaking more and more memory upon HUP reloads, until it exhausts all the physical memory.
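
As a starting point for such a check, here is a small probe, assuming glibc (malloc_trim() is a glibc extension declared in `<malloc.h>`). `churn_then_trim` is a hypothetical name. The probe only reports what malloc_trim() returns after a batch of small blocks has been allocated and freed; it does not by itself demonstrate the behaviour described above, and it does not block brk() first.

```c
#include <malloc.h>   /* malloc_trim(): glibc extension */
#include <stdlib.h>

/* Allocate and free `count` blocks of `size` bytes (count > 0), then ask
   glibc to hand free memory back to the OS. Returns malloc_trim()'s result:
   1 if some memory was released, 0 if none could be, or -1 if an
   allocation failed along the way. */
static int churn_then_trim(size_t count, size_t size)
{
    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL) {
        return -1;
    }

    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        if (blocks[i] == NULL) {
            /* Roll back what was allocated so far. */
            while (i > 0) {
                free(blocks[--i]);
            }
            free(blocks);
            return -1;
        }
    }

    for (size_t i = 0; i < count; i++) {
        free(blocks[i]);
    }
    free(blocks);

    return malloc_trim(0);
}
```

Comparing the process's resident memory before and after the call, with and without the program break blocked, would be one way to test the claim on a particular glibc version.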

By definition, brk is not guaranteed to always succeed anyway, and in the real world glibc has to use mmap when brk fails, for example when a memory block allocated by mmap stands in the way of the data segment's growth. When glibc uses mmap() for allocations, it will of course allocate larger blocks to reduce the number of syscalls, since mmap is much more expensive than brk. By design, glibc never allocates precisely the amount of memory requested by malloc anyway, nor does it always return freed memory back to the OS (this is the withheld memory), due to optimization considerations.

Anyway, since glibc's behavior is really buggy or broken here, we've just removed this mmap(sbrk(0)) trick in git master. I'm already tired of explaining and working around this glibc issue.

Use of the GC64 mode in the recent LuaJIT v2.1 to enable the full 47-bit address space for GC-managed memory is the future instead.

agentzh closed this Mar 30, 2017

ElvinEfendi commented Mar 30, 2017

> Anyway, since glibc's behavior is really buggy or broken here, we've just removed this mmap(sbrk(0)) trick in git master. I'm already tired of explaining and working around this glibc issue.

👍

yangshuxin commented Mar 30, 2017

I don't think glibc allocates "redundant" memory. It just allocates a big chunk of memory and then carves a block out of it.

LuaJIT will call mmap() to allocate a block in the address space (.bss, 4G]. Assume the return value of that mmap() is L. If you run out of space in (.bss, L], malloc() has to rely on mmap() to allocate. So the problem will show up again.

So, IMHO, mmap(sbrk(0)) just makes the "problem" show up early; removing the trick does not hide this "problem".

Note that sbrk(0) does nothing tricky; it just returns the top of the heap; it is not a system call at all.
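
The point about running out of space below L can be expressed as simple address arithmetic. In this sketch, `brk_headroom` and `current_break` are hypothetical helpers, and `limit` stands for L, the lowest mapping above the heap; finding L on a live process (for example from /proc/self/maps) is left out.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose sbrk() on glibc */
#endif
#include <stdint.h>
#include <unistd.h>

/* Bytes the heap can still grow via brk() before reaching `limit`, where
   `limit` stands for L, the lowest mapping above the heap. Returns 0 when
   the break has already reached or passed the limit. */
static uintptr_t brk_headroom(uintptr_t brk_top, uintptr_t limit)
{
    return limit > brk_top ? limit - brk_top : 0;
}

/* Current top of the heap, as reported by sbrk(0). */
static uintptr_t current_break(void)
{
    return (uintptr_t) sbrk(0);
}
```

With the mmap(sbrk(0)) trick in place, L sits right at the break and the headroom is essentially zero from the start; without it, the headroom is whatever lies between the break and LuaJIT's mapping, and malloc falls back to mmap once that is used up.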

agentzh commented Mar 31, 2017

I agree with @yangshuxin. The real problem is still there, in glibc. It is just harder to trigger, or shows up later.

ElvinEfendi commented Mar 31, 2017

@yangshuxin, @agentzh thanks for looking into this!
