Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
Already on GitHub? Sign in to your account
pmlogger memory leak and crash after Ctrl-C #116
Comments
|
Thanks Marko - I can reproduce this (as listed below). I also saw a similar buffer overflow during a QA run with tests 666 and 669 but on re-running those tests it did not occur - perhaps timing related with the interrupt? pmlogger is in libpcp::__pmGetPDU() when the interrupt is received [mgoodwin@kilcunda ~]$ /usr/libexec/pcp/bin/pmlogger -h localhost -r -c /tmp/mlan-leak.conf -m marko_check -l /tmp/pmlogger.log 123 |
|
If I revert commit 993b5c8 the buffer overrun does not occur, but it looks like the memory leak still does. I think we need to fix this for 3.11.5 - the buffer overrun isn't good but it's tolerable since it only occurs on a SIGTERM interrupt. But the memory leak is a real problem (and may not even be related to the mentioned commit id). |
|
| [...] need to fix this for 3.11.5 pcp-3.11.6 it will have to be, as pcp-3.11.5 was tagged several hours ago. |
^C after ten-twenty seconds indicates a few leaks of concern: ==15718== 15,640 bytes in 893 blocks are possibly lost in loss record 143 of 163 ==15718== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299) ==15718== by 0x4E888DF: __dmpostfetch (derive_fetch.c:1215) ==15718== by 0x4E4E1F2: __pmFinishResult (fetch.c:62) ==15718== by 0x1100E1: myFetch (fetch.c:150) ==15718== by 0x110D1D: do_work (callback.c:513) ==15718== by 0x10D4DF: main (pmlogger.c:1028) ==15718== ==15718== 41,696 bytes in 904 blocks are possibly lost in loss record 150 of 163 ==15718== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299) ==15718== by 0x4E8853B: __dmpostfetch (derive_fetch.c:1194) ==15718== by 0x4E4E1F2: __pmFinishResult (fetch.c:62) ==15718== by 0x1100E1: myFetch (fetch.c:150) ==15718== by 0x110D1D: do_work (callback.c:513) ==15718== by 0x10D4DF: main (pmlogger.c:1028) ==15718== ==15718== 11,577,697 (1,294,560 direct, 10,283,137 indirect) bytes in 180 blocks are definitely lost in loss record 163 of 163 ==15718== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299) ==15718== by 0x4E88491: __dmpostfetch (derive_fetch.c:1116) ==15718== by 0x4E4E1F2: __pmFinishResult (fetch.c:62) ==15718== by 0x1100E1: myFetch (fetch.c:150) ==15718== by 0x110D1D: do_work (callback.c:513) ==15718== by 0x10D4DF: main (pmlogger.c:1028) ==15718== Looking at |
|
As for the crash while responding to the SIGINT, gdb -args pmlogger -c tmp.conf /tmp/foo
(gdb) break __fortify_fail
(gdb) run
^C
(gdb) signal SIGINT
Breakpoint 1, __GI___fortify_fail (msg=msg@entry=0x2aaaab2fdc58 "buffer overflow detected") at fortify_fail.c:27
27 {
(gdb) bt
#0 __GI___fortify_fail (msg=msg@entry=0x2aaaab2fdc58 "buffer overflow detected") at fortify_fail.c:27
#1 0x00002aaaab285d80 in __GI___chk_fail () at chk_fail.c:28
#2 0x00002aaaab287bca in __fdelt_chk (d=d@entry=-1) at fdelt_chk.c:25
#3 0x00002aaaaad2ffed in __pmSocketReady (fd=fd@entry=-1, timeout=timeout@entry=0x7fffffffa7a0) at secureconnect.c:1630
#4 0x00002aaaaaceeee2 in pduread (fd=fd@entry=-1, buf=buf@entry=0x555557cf5050 "", len=len@entry=12, part=part@entry=-1,
timeout=timeout@entry=-1) at pdu.c:170
#5 0x00002aaaaacef8ff in __pmGetPDU (fd=-1, mode=mode@entry=0, timeout=timeout@entry=-1,
result=result@entry=0x7fffffffa988) at pdu.c:395
#6 0x000055555555bf3b in myFetch (numpmid=, pmidlist=, pdup=pdup@entry=0x7fffffffaab8)
at fetch.c:121
(gdb) frame 6
(gdb) p * ctxp->c_pmcd
$6 = {pc_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0,
__list = {__prev = 0x0, __next = 0x0}}, __size = '\000' , __align = 0}, pc_refcnt = 1, pc_fd = -1,
pc_hosts = 0x555555771260, pc_nhosts = 1, pc_timeout = 0, pc_tout_sec = 10, pc_again = 0}
Note the |
|
since we're going to exit on SIGINT anyway, this patch avoids the buffer overflow (applies a very blunt hammer to the problem) diff --git a/src/pmlogger/src/ports.c b/src/pmlogger/src/ports.c
|
|
The blunt hammer avoids the log cleanup, meaning we may end up with the tail of the archive being damaged ... I think we should be aiming for a more surgical solution. |
|
FWIW the memory leak exists independent of the SIGINT. A |
natoscott
pushed a commit
that referenced
this issue
Oct 2, 2016
fche
referenced this issue
Oct 31, 2016
Closed
*** buffer overflow detected ***: /usr/libexec/pcp/bin/pmlogger terminated #265
|
OK, it has taken a while, but getting back to this.
The code fixes that resolve all the memory leaks are in commits 535cacc, 8bcbe54 and 999e407 that are currently sitting in my nextrelease branch and will flow up stream in a matter of days. |
mkevac commentedSep 26, 2016
Hi.
With this config
pmlogger executed as
$ /usr/libexec/pcp/bin/pmlogger -h localhost -r -c /tmp/cppbig23.mlan-leak.conf -m marko_check -l /tmp/pmlogger.log 123leaks memory.
You can see this with pidstat output
If you keep it that way, it will allocate all available memory on server.
Additionaly, if you stop such process with Ctrl-C, it will crash with buffer overflow: