Skip to content
This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

bud leaks memory #97

Closed
phillipp opened this issue Dec 3, 2016 · 13 comments
Closed

bud leaks memory #97

phillipp opened this issue Dec 3, 2016 · 13 comments

Comments

@phillipp
Copy link
Contributor

phillipp commented Dec 3, 2016

Howdy,
I have just noticed that bud heavily leaks memory in our setup.

We had 4 workers running and each of them consumed ~5.7 GB of memory after running for around 7 days. I've now started to collectd memory usage data and will report my findings but wanted to give some advance notice.

Edit: This is what the memory usage looks like when bud is restarted: http://i.imgur.com/nWERbLm.png

@indutny
Copy link
Owner

indutny commented Dec 5, 2016

Wow! Were there any leaks before the latest patches?

@phillipp
Copy link
Contributor Author

phillipp commented Dec 5, 2016

I honestly have no idea, I saw this 2 days ago and started collecting some data. Here you can see the VM size graphed for the last 2 days. "ares" is just idling and doesn't handle any real traffic, zeus takes the most part of prod traffic and hera the rest. You can see that more traffic makes it grow faster:

http://i.imgur.com/9DPfijV.png

I have checked the derivative (change over time, basically allocated/freed per second), too, and it is mostly allocations:

http://i.imgur.com/AWgVHrt.png

It looks like there are large blocks that get allocated (~6-8 MiB) that look like reponse or request bodies, but such large amounts never get free'd, only small.

The core dump I got from the other segfauklt is 2 GB sin size. That could be a potential starting point for looking, what kind of memory is allocated. I will see if I can dig around a bit. Maybe you have an idea where this could occur? SNI again? ;-)

@phillipp
Copy link
Contributor Author

phillipp commented Dec 7, 2016

I have now run bud with valgrind and got some results. It looks like mostly it's the certificates that leak memory:

==17155== 132,384 bytes in 4 blocks are possibly lost in loss record 1,328 of 1,368
==17155==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17155==    by 0x4FEF69: default_malloc_ex (mem.c:79)
==17155==    by 0x4FF5F9: CRYPTO_malloc (mem.c:346)
==17155==    by 0x4135F6: freelist_extract (s3_both.c:626)
==17155==    by 0x4137C8: ssl3_setup_read_buffer (s3_both.c:682)
==17155==    by 0x425314: ssl3_read_bytes (s3_pkt.c:1168)
==17155==    by 0x4230C9: ssl3_read_internal (s3_lib.c:4459)
==17155==    by 0x423178: ssl3_read (s3_lib.c:4483)
==17155==    by 0x439DD8: SSL_read (ssl_lib.c:1027)
==17155==    by 0x5B1297: bud_client_backend_out (client.c:638)
==17155==    by 0x5B08F4: bud_client_cycle (client.c:452)
==17155==    by 0x5B066B: bud_client_read_cb (client.c:412)
==17155==
==17155== 165,480 bytes in 5 blocks are possibly lost in loss record 1,338 of 1,368
==17155==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17155==    by 0x4FEF69: default_malloc_ex (mem.c:79)
==17155==    by 0x4FF5F9: CRYPTO_malloc (mem.c:346)
==17155==    by 0x4135F6: freelist_extract (s3_both.c:626)
==17155==    by 0x4137C8: ssl3_setup_read_buffer (s3_both.c:682)
==17155==    by 0x425314: ssl3_read_bytes (s3_pkt.c:1168)
==17155==    by 0x4230C9: ssl3_read_internal (s3_lib.c:4459)
==17155==    by 0x423178: ssl3_read (s3_lib.c:4483)
==17155==    by 0x439DD8: SSL_read (ssl_lib.c:1027)
==17155==    by 0x5B1297: bud_client_backend_out (client.c:638)
==17155==    by 0x5B08F4: bud_client_cycle (client.c:452)
==17155==    by 0x5B0E91: bud_client_sni_cb (client.c:557)
==17155==
==17155== 319,488 bytes in 13 blocks are possibly lost in loss record 1,345 of 1,368
==17155==    at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17155==    by 0x5B3E8A: bud_hashmap_init (common.c:253)
==17155==    by 0x5B5686: bud_config_load_backend_list (config.c:445)
==17155==    by 0x5C5655: bud_sni_from_json (sni.c:83)
==17155==    by 0x5B0BC5: bud_client_sni_cb (client.c:505)
==17155==    by 0x5C082F: bud_http_request_done (http-pool.c:292)
==17155==    by 0x5C12FF: bud_http_request_read_cb (http-pool.c:553)
==17155==    by 0x5D6794: uv__read (stream.c:1197)
==17155==    by 0x5D695A: uv__stream_io (stream.c:1264)
==17155==    by 0x5DB508: uv__io_poll (linux-core.c:382)
==17155==    by 0x5C98D3: uv_run (core.c:352)
==17155==    by 0x5AF66F: main (bud.c:66)
==17155==
==17155== 326,394 bytes in 5,715 blocks are definitely lost in loss record 1,346 of 1,368
==17155==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17155==    by 0x5C152E: bud_http_request_escape_url (http-pool.c:631)
==17155==    by 0x5C02F6: bud_http_request (http-pool.c:162)
==17155==    by 0x5C05A7: bud_http_get (http-pool.c:227)
==17155==    by 0x5B3189: bud_client_on_hello (client.c:1250)
==17155==    by 0x5B30EC: bud_client_ssl_cert_cb (client.c:1234)
==17155==    by 0x428F99: ssl3_get_client_hello (s3_srvr.c:1402)
==17155==    by 0x4271C3: ssl3_accept (s3_srvr.c:363)
==17155==    by 0x439CF9: SSL_accept (ssl_lib.c:999)
==17155==    by 0x4125DE: ssl23_get_client_hello (s23_srvr.c:647)
==17155==    by 0x4117AE: ssl23_accept (s23_srvr.c:209)
==17155==    by 0x4111C5: ssl23_read (s23_lib.c:133)
==17155==
==17155== 1,987,440 bytes in 30 blocks are possibly lost in loss record 1,365 of 1,368
==17155==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17155==    by 0x5AF789: bud_client_create (client.c:67)
==17155==    by 0x5C60C2: bud_worker_ipc_client_cb (worker.c:110)
==17155==    by 0x5C1C03: bud_ipc_accept_pending (ipc.c:175)
==17155==    by 0x5C1B3A: bud_ipc_read_cb (ipc.c:159)
==17155==    by 0x5D6794: uv__read (stream.c:1197)
==17155==    by 0x5D695A: uv__stream_io (stream.c:1264)
==17155==    by 0x5DB508: uv__io_poll (linux-core.c:382)
==17155==    by 0x5C98D3: uv_run (core.c:352)
==17155==    by 0x5AF66F: main (bud.c:66)
==17155==
==17155== 29,410,423 (979,984 direct, 28,430,439 indirect) bytes in 5,326 blocks are definitely lost in loss record 1,368 of 1,368
==17155==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17155==    by 0x4FEF69: default_malloc_ex (mem.c:79)
==17155==    by 0x4FF5F9: CRYPTO_malloc (mem.c:346)
==17155==    by 0x46D927: asn1_item_ex_combine_new (tasn_new.c:179)
==17155==    by 0x46D68A: ASN1_item_ex_new (tasn_new.c:85)
==17155==    by 0x46A176: ASN1_item_ex_d2i (tasn_dec.c:390)
==17155==    by 0x469706: ASN1_item_d2i (tasn_dec.c:146)
==17155==    by 0x473DDB: d2i_X509 (x_x509.c:143)
==17155==    by 0x51336D: PEM_ASN1_read_bio (pem_oth.c:81)
==17155==    by 0x51512E: PEM_read_bio_X509 (pem_x509.c:68)
==17155==    by 0x5BA881: bud_context_use_certificate_chain (utils.c:95)
==17155==    by 0x5B7D35: bud_context_load_cert (context.c:178)
==17155==
==17155== LEAK SUMMARY:
==17155==    definitely lost: 1,310,677 bytes in 11,047 blocks
==17155==    indirectly lost: 28,430,439 bytes in 666,504 blocks
==17155==      possibly lost: 3,529,552 bytes in 7,332 blocks
==17155==    still reachable: 1,134,580 bytes in 5,747 blocks
==17155==         suppressed: 0 bytes in 0 blocks
==17155== Reachable blocks (those to which a pointer was found) are not shown.
==17155== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==17155==
==17155== For counts of detected and suppressed errors, rerun with: -v
==17155== ERROR SUMMARY: 241 errors from 241 contexts (suppressed: 0 from 0)

Another one points in the same direction:

==16081== 1,126,216 bytes in 17 blocks are possibly lost in loss record 1,245 of 1,261
==16081==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16081==    by 0x5AF789: bud_client_create (client.c:67)
==16081==    by 0x5C60C2: bud_worker_ipc_client_cb (worker.c:110)
==16081==    by 0x5C1C03: bud_ipc_accept_pending (ipc.c:175)
==16081==    by 0x5C1B3A: bud_ipc_read_cb (ipc.c:159)
==16081==    by 0x5D6794: uv__read (stream.c:1197)
==16081==    by 0x5D695A: uv__stream_io (stream.c:1264)
==16081==    by 0x5DB508: uv__io_poll (linux-core.c:382)
==16081==    by 0x5C98D3: uv_run (core.c:352)
==16081==    by 0x5AF66F: main (bud.c:66)
==16081==
==16081== 62,968,003 (2,115,448 direct, 60,852,555 indirect) bytes in 11,497 blocks are definitely lost in loss record 1,261 of 1,261
==16081==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16081==    by 0x4FEF69: default_malloc_ex (mem.c:79)
==16081==    by 0x4FF5F9: CRYPTO_malloc (mem.c:346)
==16081==    by 0x46D927: asn1_item_ex_combine_new (tasn_new.c:179)
==16081==    by 0x46D68A: ASN1_item_ex_new (tasn_new.c:85)
==16081==    by 0x46A176: ASN1_item_ex_d2i (tasn_dec.c:390)
==16081==    by 0x469706: ASN1_item_d2i (tasn_dec.c:146)
==16081==    by 0x473DDB: d2i_X509 (x_x509.c:143)
==16081==    by 0x51336D: PEM_ASN1_read_bio (pem_oth.c:81)
==16081==    by 0x51512E: PEM_read_bio_X509 (pem_x509.c:68)
==16081==    by 0x5BA881: bud_context_use_certificate_chain (utils.c:95)
==16081==    by 0x5B7D35: bud_context_load_cert (context.c:178)
==16081==

Is it possible that certificates that have been loaded via HTTP are not freed after the request has ended?

@indutny
Copy link
Owner

indutny commented Dec 8, 2016

Should be fixed in 4.0.25, please give it a try! Fixed by: b93b897

@indutny indutny closed this as completed Dec 8, 2016
@indutny
Copy link
Owner

indutny commented Dec 8, 2016

(Sorry for delay, I was on a conference and the internet was spotty)

@phillipp
Copy link
Contributor Author

phillipp commented Dec 8, 2016

Nice job, thanks! I had thought the problem would lie somewhere else entirely! I'll report the new results in a few days!

@indutny
Copy link
Owner

indutny commented Dec 8, 2016

Thank you!

@phillipp
Copy link
Contributor Author

phillipp commented Dec 9, 2016

It is still leaking ~70 megs of memory in 24 hours (x 30 days => 2.1 Gigs), but not nearly as fast as before.

I don't want to waste your time, but to me this seems like something that is still worth looking into...

I thought the client.c:67 might be worth a shot, but I couldn't find an obvious case where it should be free'd and is not, I'm just not familiar enough with the code :-(

@phillipp
Copy link
Contributor Author

phillipp commented Dec 9, 2016

Ah, screenshot, If you like: https://i.imgur.com/gMjSCKr.png

@indutny
Copy link
Owner

indutny commented Dec 9, 2016

Sorry, @phillipp ! At least we have more time for this now :) Will take a look at it soon.

@phillipp
Copy link
Contributor Author

phillipp commented Dec 9, 2016 via email

indutny added a commit that referenced this issue Dec 18, 2016
@indutny
Copy link
Owner

indutny commented Dec 18, 2016

@phillipp thanks for kind words! Just pushed the commit that should fix the leak, please give it a try 😉 (bud@4.0.26)

@phillipp
Copy link
Contributor Author

Nice, I'll give it a spin!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants