Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

pmlogger memory leak and crash after Ctrl-C #116

Open
mkevac opened this Issue Sep 26, 2016 · 9 comments

Comments

Projects
None yet
5 participants

mkevac commented Sep 26, 2016

Hi.

marko@fedora:~/tmp $ cat /etc/fedora-release 
Fedora release 24 (Twenty Four)
marko@fedora:~/tmp $ sudo dnf info pcp
Last metadata expiration check: 1:54:49 ago on Mon Sep 26 16:02:17 2016.
Installed Packages
Name        : pcp
Arch        : x86_64
Epoch       : 0
Version     : 3.11.4
Release     : 1.fc24
Size        : 3.3 M
Repo        : @System
From repo   : updates
Summary     : System-level performance monitoring and performance management
URL         : http://www.pcp.io
License     : GPLv2+ and LGPLv2.1+ and CC-BY
Description : Performance Co-Pilot (PCP) provides a framework and services to support
            : system-level performance monitoring and performance management.
            : 
            : The PCP open source release provides a unifying abstraction for all of
            : the interesting performance data in a system, and allows client
            : applications to easily retrieve and process any subset of that data.

With this config

$ cat /tmp/cppbig23.mlan-leak.conf
log mandatory on 10 msec {
    cgroup
    containers
    disk
    event
    filesys
    hinv
    hotproc
    ipc
    jbd2
    kernel
    mem
    mmv
    network
    nfs
    nfs3
    nfs4
    pmcd
    pmda
    proc
    quota
    rpc
    swap
    swapdev
    sysfs
    tmpfs
    vfs
    xfs
}

pmlogger executed as
$ /usr/libexec/pcp/bin/pmlogger -h localhost -r -c /tmp/cppbig23.mlan-leak.conf -m marko_check -l /tmp/pmlogger.log 123
leaks memory.

You can see this with pidstat output

$ pidstat -r -p $(pgrep pmlogger) 1
Linux 4.7.3-200.fc24.x86_64 (fedora)    09/26/2016  _x86_64_    (8 CPU)

05:52:22 PM   UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
05:52:23 PM  1000     21394    270.00      0.00   76172    9656   0.03  pmlogger
05:52:24 PM  1000     21394    278.00      0.00   77284   10772   0.03  pmlogger
05:52:25 PM  1000     21394    282.00      0.00   78444   11780   0.04  pmlogger
05:52:26 PM  1000     21394    270.00      0.00   79492   12840   0.04  pmlogger
05:52:27 PM  1000     21394    280.00      0.00   80648   13992   0.04  pmlogger
05:52:28 PM  1000     21394    263.00      0.00   81664   15152   0.05  pmlogger
05:52:29 PM  1000     21394    265.00      0.00   82772   16276   0.05  pmlogger
05:52:30 PM  1000     21394    280.00      0.00   83920   17280   0.05  pmlogger
05:52:31 PM  1000     21394    276.00      0.00   85072   18440   0.06  pmlogger
05:52:32 PM  1000     21394    278.00      0.00   86076   19568   0.06  pmlogger
05:52:33 PM  1000     21394    276.00      0.00   87288   20536   0.06  pmlogger
05:52:34 PM  1000     21394    272.00      0.00   88284   21816   0.07  pmlogger
^C
Average:     1000     21394    274.17      0.00   82260   15676   0.05  pmlogger

If you keep it that way, it will allocate all available memory on server.

Additionaly, if you stop such process with Ctrl-C, it will crash with buffer overflow:

$ /usr/libexec/pcp/bin/pmlogger -h localhost -r -c /tmp/cppbig23.mlan-leak.conf -m marko_check -l /tmp/pmlogger.log 123
^C*** buffer overflow detected ***: /usr/libexec/pcp/bin/pmlogger terminated
======= Backtrace: =========
/lib64/libc.so.6(+0x7566b)[0x7f32c3a5766b]
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f32c3af6c57]
/lib64/libc.so.6(+0x112d80)[0x7f32c3af4d80]
/lib64/libc.so.6(+0x114bca)[0x7f32c3af6bca]
/lib64/libpcp.so.3(+0x5efed)[0x7f32c401efed]
/lib64/libpcp.so.3(+0x1dee2)[0x7f32c3fddee2]
/lib64/libpcp.so.3(__pmGetPDU+0x5f)[0x7f32c3fde8ff]
/usr/libexec/pcp/bin/pmlogger(+0x7f3b)[0x55b298668f3b]
/usr/libexec/pcp/bin/pmlogger(+0x8d1e)[0x55b298669d1e]
/usr/libexec/pcp/bin/pmlogger(+0x54e0)[0x55b2986664e0]
/lib64/libc.so.6(__libc_start_main+0xf1)[0x7f32c3a02731]
/usr/libexec/pcp/bin/pmlogger(+0x65e9)[0x55b2986675e9]
======= Memory map: ========
55b298661000-55b29867b000 r-xp 00000000 fd:00 2366568                    /usr/bin/pmlogger
55b29887b000-55b29887c000 r--p 0001a000 fd:00 2366568                    /usr/bin/pmlogger
55b29887c000-55b29887d000 rw-p 0001b000 fd:00 2366568                    /usr/bin/pmlogger
55b29887d000-55b29887e000 rw-p 00000000 00:00 0 
55b29a36c000-55b29b816000 rw-p 00000000 00:00 0                          [heap]
7f32bfdf1000-7f32bfdfc000 r-xp 00000000 fd:00 2370037                    /usr/lib64/libnss_files-2.23.so
7f32bfdfc000-7f32bfffb000 ---p 0000b000 fd:00 2370037                    /usr/lib64/libnss_files-2.23.so
7f32bfffb000-7f32bfffc000 r--p 0000a000 fd:00 2370037                    /usr/lib64/libnss_files-2.23.so
7f32bfffc000-7f32bfffd000 rw-p 0000b000 fd:00 2370037                    /usr/lib64/libnss_files-2.23.so
7f32bfffd000-7f32c0003000 rw-p 00000000 00:00 0 
7f32c0003000-7f32c0007000 r-xp 00000000 fd:00 2366396                    /usr/lib64/libattr.so.1.1.0
7f32c0007000-7f32c0206000 ---p 00004000 fd:00 2366396                    /usr/lib64/libattr.so.1.1.0
7f32c0206000-7f32c0207000 r--p 00003000 fd:00 2366396                    /usr/lib64/libattr.so.1.1.0
7f32c0207000-7f32c0208000 rw-p 00000000 00:00 0 
7f32c0208000-7f32c027a000 r-xp 00000000 fd:00 2370087                    /usr/lib64/libpcre.so.1.2.7
7f32c027a000-7f32c0479000 ---p 00072000 fd:00 2370087                    /usr/lib64/libpcre.so.1.2.7
7f32c0479000-7f32c047a000 r--p 00071000 fd:00 2370087                    /usr/lib64/libpcre.so.1.2.7
7f32c047a000-7f32c047b000 rw-p 00072000 fd:00 2370087                    /usr/lib64/libpcre.so.1.2.7
7f32c047b000-7f32c0491000 r-xp 00000000 fd:00 2360969                    /usr/lib64/libgcc_s-6.2.1-20160916.so.1
7f32c0491000-7f32c0690000 ---p 00016000 fd:00 2360969                    /usr/lib64/libgcc_s-6.2.1-20160916.so.1
7f32c0690000-7f32c0691000 r--p 00015000 fd:00 2360969                    /usr/lib64/libgcc_s-6.2.1-20160916.so.1
7f32c0691000-7f32c0692000 rw-p 00016000 fd:00 2360969                    /usr/lib64/libgcc_s-6.2.1-20160916.so.1
7f32c0692000-7f32c06a4000 r-xp 00000000 fd:00 2359774                    /usr/lib64/libgpg-error.so.0.19.1
7f32c06a4000-7f32c08a4000 ---p 00012000 fd:00 2359774                    /usr/lib64/libgpg-error.so.0.19.1
7f32c08a4000-7f32c08a5000 r--p 00012000 fd:00 2359774                    /usr/lib64/libgpg-error.so.0.19.1
7f32c08a5000-7f32c08a6000 rw-p 00013000 fd:00 2359774                    /usr/lib64/libgpg-error.so.0.19.1
7f32c08a6000-7f32c0982000 r-xp 00000000 fd:00 2370131                    /usr/lib64/libgcrypt.so.20.0.6
7f32c0982000-7f32c0b82000 ---p 000dc000 fd:00 2370131                    /usr/lib64/libgcrypt.so.20.0.6
7f32c0b82000-7f32c0b83000 r--p 000dc000 fd:00 2370131                    /usr/lib64/libgcrypt.so.20.0.6
7f32c0b83000-7f32c0b8b000 rw-p 000dd000 fd:00 2370131                    /usr/lib64/libgcrypt.so.20.0.6
7f32c0b8b000-7f32c0b8c000 rw-p 00000000 00:00 0 
7f32c0b8c000-7f32c0b9d000 r-xp 00000000 fd:00 2366821                    /usr/lib64/liblz4.so.1.7.1
7f32c0b9d000-7f32c0d9c000 ---p 00011000 fd:00 2366821                    /usr/lib64/liblz4.so.1.7.1
7f32c0d9c000-7f32c0d9d000 r--p 00010000 fd:00 2366821                    /usr/lib64/liblz4.so.1.7.1
7f32c0d9d000-7f32c0d9e000 rw-p 00000000 00:00 0 
7f32c0d9e000-7f32c0dc3000 r-xp 00000000 fd:00 2366822                    /usr/lib64/liblzma.so.5.2.2
7f32c0dc3000-7f32c0fc2000 ---p 00025000 fd:00 2366822                    /usr/lib64/liblzma.so.5.2.2
7f32c0fc2000-7f32c0fc3000 r--p 00024000 fd:00 2366822                    /usr/lib64/liblzma.so.5.2.2
7f32c0fc3000-7f32c0fc4000 rw-p 00000000 00:00 0 
7f32c0fc4000-7f32c0fc8000 r-xp 00000000 fd:00 2366447                    /usr/lib64/libcap.so.2.24
7f32c0fc8000-7f32c11c7000 ---p 00004000 fd:00 2366447                    /usr/lib64/libcap.so.2.24
7f32c11c7000-7f32c11c8000 r--p 00003000 fd:00 2366447                    /usr/lib64/libcap.so.2.24
7f32c11c8000-7f32c11c9000 rw-p 00004000 fd:00 2366447                    /usr/lib64/libcap.so.2.24
7f32c11c9000-7f32c11ec000 r-xp 00000000 fd:00 2370089                    /usr/lib64/libselinux.so.1
7f32c11ec000-7f32c13ec000 ---p 00023000 fd:00 2370089                    /usr/lib64/libselinux.so.1
7f32c13ec000-7f32c13ed000 r--p 00023000 fd:00 2370089                    /usr/lib64/libselinux.so.1
7f32c13ed000-7f32c13ee000 rw-p 00024000 fd:00 2370089                    /usr/lib64/libselinux.so.1
7f32c13ee000-7f32c13f0000 rw-p 00000000 00:00 0 
7f32c13f0000-7f32c13f2000 r-xp 00000000 fd:00 2360747                    /usr/lib64/libfreebl3.so
7f32c13f2000-7f32c15f1000 ---p 00002000 fd:00 2360747                    /usr/lib64/libfreebl3.so
7f32c15f1000-7f32c15f2000 r--p 00001000 fd:00 2360747                    /usr/lib64/libfreebl3.so
7f32c15f2000-7f32c15f3000 rw-p 00002000 fd:00 2360747                    /usr/lib64/libfreebl3.so
7f32c15f3000-7f32c1641000 r-xp 00000000 fd:00 2373020                    /usr/lib64/libdbus-1.so.3.15.2
7f32c1641000-7f32c1840000 ---p 0004e000 fd:00 2373020                    /usr/lib64/libdbus-1.so.3.15.2
7f32c1840000-7f32c1842000 r--p 0004d000 fd:00 2373020                    /usr/lib64/libdbus-1.so.3.15.2
7f32c1842000-7f32c1843000 rw-p 0004f000 fd:00 2373020                    /usr/lib64/libdbus-1.so.3.15.2
7f32c1843000-7f32c184a000 r-xp 00000000 fd:00 2370024                    /usr/lib64/libcrypt-2.23.so
7f32c184a000-7f32c1a49000 ---p 00007000 fd:00 2370024                    /usr/lib64/libcrypt-2.23.so
7f32c1a49000-7f32c1a4a000 r--p 00006000 fd:00 2370024                    /usr/lib64/libcrypt-2.23.so
7f32c1a4a000-7f32c1a4b000 rw-p 00007000 fd:00 2370024                    /usr/lib64/libcrypt-2.23.so
7f32c1a4b000-7f32c1a79000 rw-p 00000000 00:00 0 
7f32c1a79000-7f32c1a90000 r-xp 00000000 fd:00 2370046                    /usr/lib64/libresolv-2.23.so
7f32c1a90000-7f32c1c8f000 ---p 00017000 fd:00 2370046                    /usr/lib64/libresolv-2.23.so
7f32c1c8f000-7f32c1c90000 r--p 00016000 fd:00 2370046                    /usr/lib64/libresolv-2.23.so
7f32c1c90000-7f32c1c91000 rw-p 00017000 fd:00 2370046                    /usr/lib64/libresolv-2.23.so
7f32c1c91000-7f32c1c93000 rw-p 00000000 00:00 0 
7f32c1c93000-7f32c1c9a000 r-xp 00000000 fd:00 2370048                    /usr/lib64/librt-2.23.so
7f32c1c9a000-7f32c1e99000 ---p 00007000 fd:00 2370048                    /usr/lib64/librt-2.23.so
7f32c1e99000-7f32c1e9a000 r--p 00006000 fd:00 2370048                    /usr/lib64/librt-2.23.so
7f32c1e9a000-7f32c1e9b000 rw-p 00007000 fd:00 2370048                    /usr/lib64/librt-2.23.so
7f32c1e9b000-7f32c1eb0000 r-xp 00000000 fd:00 2367257                    /usr/lib64/libz.so.1.2.8
7f32c1eb0000-7f32c20af000 ---p 00015000 fd:00 2367257                    /usr/lib64/libz.so.1.2.8
7f32c20af000-7f32c20b0000 r--p 00014000 fd:00 2367257                    /usr/lib64/libz.so.1.2.8
7f32c20b0000-7f32c20b1000 rw-p 00015000 fd:00 2367257                    /usr/lib64/libz.so.1.2.8
7f32c20b1000-7f32c20b4000 r-xp 00000000 fd:00 2370099                    /usr/lib64/libplds4.so
7f32c20b4000-7f32c22b3000 ---p 00003000 fd:00 2370099                    /usr/lib64/libplds4.so
7f32c22b3000-7f32c22b4000 r--p 00002000 fd:00 2370099                    /usr/lib64/libplds4.so
7f32c22b4000-7f32c22b5000 rw-p 00003000 fd:00 2370099                    /usr/lib64/libplds4.so
7f32c22b5000-7f32c22b9000 r-xp 00000000 fd:00 2370098                    /usr/lib64/libplc4.so
7f32c22b9000-7f32c24b8000 ---p 00004000 fd:00 2370098                    /usr/lib64/libplc4.so
7f32c24b8000-7f32c24b9000 r--p 00003000 fd:00 2370098                    /usr/lib64/libplc4.so
7f32c24b9000-7f32c24ba000 rw-p 00004000 fd:00 2370098                    /usr/lib64/libplc4.so
7f32c24ba000-7f32c24e1000 r-xp 00000000 fd:00 2365346                    /usr/lib64/libnssutil3.so
7f32c24e1000-7f32c26e0000 ---p 00027000 fd:00 2365346                    /usr/lib64/libnssutil3.so
7f32c26e0000-7f32c26e7000 r--p 00026000 fd:00 2365346                    /usr/lib64/libnssutil3.so
7f32c26e7000-7f32c26e8000 rw-p 0002d000 fd:00 2365346                    /usr/lib64/libnssutil3.so
7f32c26e8000-7f32c26eb000 r-xp 00000000 fd:00 2370025                    /usr/lib64/libdl-2.23.so
7f32c26eb000-7f32c28ea000 ---p 00003000 fd:00 2370025                    /usr/lib64/libdl-2.23.so
7f32c28ea000-7f32c28eb000 r--p 00002000 fd:00 2370025                    /usr/lib64/libdl-2.23.so
7f32c28eb000-7f32c28ec000 rw-p 00003000 fd:00 2370025                    /usr/lib64/libdl-2.23.so
7f32c28ec000-7f32c29f4000 r-xp 00000000 fd:00 2370027                    /usr/lib64/libm-2.23.so
7f32c29f4000-7f32c2bf3000 ---p 00108000 fd:00 2370027                    /usr/lib64/libm-2.23.so
7f32c2bf3000-7f32c2bf4000 r--p 00107000 fd:00 2370027                    /usr/lib64/libm-2.23.so
7f32c2bf4000-7f32c2bf5000 rw-p 00108000 fd:00 2370027                    /usr/lib64/libm-2.23.so
7f32c2bf5000-7f32c2c05000 r-xp 00000000 fd:00 2373023                    /usr/lib64/libavahi-client.so.3.2.9
7f32c2c05000-7f32c2e04000 ---p 00010000 fd:00 2373023                    /usr/lib64/libavahi-client.so.3.2.9
7f32c2e04000-7f32c2e05000 r--p 0000f000 fd:00 2373023                    /usr/lib64/libavahi-client.so.3.2.9
7f32c2e05000-7f32c2e06000 rw-p 00010000 fd:00 2373023                    /usr/lib64/libavahi-client.so.3.2.9
7f32c2e06000-7f32c2e11000 r-xp 00000000 fd:00 2366078                    /usr/lib64/libavahi-common.so.3.5.3
7f32c2e11000-7f32c3011000 ---p 0000b000 fd:00 2366078                    /usr/lib64/libavahi-common.so.3.5.3
7f32c3011000-7f32c3012000 r--p 0000b000 fd:00 2366078                    /usr/lib64/libavahi-common.so.3.5.3
7f32c3012000-7f32c3013000 rw-p 0000c000 fd:00 2366078                    /usr/lib64/libavahi-common.so.3.5.3
7f32c3013000-7f32c302f000 r-xp 00000000 fd:00 2367032                    /usr/lib64/libsasl2.so.3.0.0
7f32c302f000-7f32c322e000 ---p 0001c000 fd:00 2367032                    /usr/lib64/libsasl2.so.3.0.0
7f32c322e000-7f32c322f000 r--p 0001b000 fd:00 2367032                    /usr/lib64/libsasl2.so.3.0.0
7f32c322f000-7f32c3230000 rw-p 0001c000 fd:00 2367032                    /usr/lib64/libsasl2.so.3.0.0
7f32c3230000-7f32c326b000 r-xp 00000000 fd:00 2370097                    /usr/lib64/libnspr4.so
7f32c326b000-7f32c346a000 ---p 0003b000 fd:00 2370097                    /usr/lib64/libnspr4.so
7f32c346a000-7f32c346b000 r--p 0003a000 fd:00 2370097                    /usr/lib64/libnspr4.so
7f32c346b000-7f32c346d000 rw-p 0003b000 fd:00 2370097                    /usr/lib64/libnspr4.so
7f32c346d000-7f32c346f000 rw-p 00000000 00:00 0 
7f32c346f000-7f32c3590000 r-xp 00000000 fd:00 2392926                    /usr/lib64/libnss3.so
7f32c3590000-7f32c378f000 ---p 00121000 fd:00 2392926                    /usr/lib64/libnss3.so
7f32c378f000-7f32c3794000 r--p 00120000 fd:00 2392926                    /usr/lib64/libnss3.so
7f32c3794000-7f32c3796000 rw-p 00125000 fd:00 2392926                    /usr/lib64/libnss3.so
7f32c3796000-7f32c3798000 rw-p 00000000 00:00 0 
7f32c3798000-7f32c37dd000 r-xp 00000000 fd:00 2392928                    /usr/lib64/libssl3.so
7f32c37dd000-7f32c39dc000 ---p 00045000 fd:00 2392928                    /usr/lib64/libssl3.so
7f32c39dc000-7f32c39e0000 r--p 00044000 fd:00 2392928                    /usr/lib64/libssl3.so
7f32c39e0000-7f32c39e1000 rw-p 00048000 fd:00 2392928                    /usr/lib64/libssl3.so
7f32c39e1000-7f32c39e2000 rw-p 00000000 00:00 0 
7f32c39e2000-7f32c3b9b000 r-xp 00000000 fd:00 2370021                    /usr/lib64/libc-2.23.so
7f32c3b9b000-7f32c3d9a000 ---p 001b9000 fd:00 2370021                    /usr/lib64/libc-2.23.so
7f32c3d9a000-7f32c3d9e000 r--p 001b8000 fd:00 2370021                    /usr/lib64/libc-2.23.so
7f32c3d9e000-7f32c3da0000 rw-p 001bc000 fd:00 2370021                    /usr/lib64/libc-2.23.so
7f32c3da0000-7f32c3da4000 rw-p 00000000 00:00 0 
7f32c3da4000-7f32c3dbb000 r-xp 00000000 fd:00 2370045                    /usr/lib64/libpthread-2.23.so
7f32c3dbb000-7f32c3fba000 ---p 00017000 fd:00 2370045                    /usr/lib64/libpthread-2.23.so
7f32c3fba000-7f32c3fbb000 r--p 00016000 fd:00 2370045                    /usr/lib64/libpthread-2.23.so
7f32c3fbb000-7f32c3fbc000 rw-p 00017000 fd:00 2370045                    /usr/lib64/libpthread-2.23.so
7f32c3fbc000-7f32c3fc0000 rw-p 00000000 00:00 0 
7f32c3fc0000-7f32c403d000 r-xp 00000000 fd:00 2361064                    /usr/lib64/libpcp.so.3
7f32c403d000-7f32c423c000 ---p 0007d000 fd:00 2361064                    /usr/lib64/libpcp.so.3
7f32c423c000-7f32c4240000 r--p 0007c000 fd:00 2361064                    /usr/lib64/libpcp.so.3
7f32c4240000-7f32c4241000 rw-p 00080000 fd:00 2361064                    /usr/lib64/libpcp.so.3
7f32c4241000-7f32c4246000 rw-p 00000000 00:00 0 
7f32c4246000-7f32c426a000 r-xp 00000000 fd:00 2392880                    /usr/lib64/ld-2.23.so
7f32c43ab000-7f32c43b4000 rw-p 00000000 00:00 0 
7f32c43b4000-7f32c4438000 r-xp 00000000 fd:00 2367449                    /usr/lib64/libsystemd.so.0.14.0
7f32c4438000-7f32c4439000 ---p 00084000 fd:00 2367449                    /usr/lib64/libsystemd.so.0.14.0
7f32c4439000-7f32c443c000 r--p 00084000 fd:00 2367449                    /usr/lib64/libsystemd.so.0.14.0
7f32c443c000-7f32c443d000 rw-p 00087000 fd:00 2367449                    /usr/lib64/libsystemd.so.0.14.0
7f32c443d000-7f32c4445000 rw-p 00000000 00:00 0 
7f32c4467000-7f32c4469000 rw-p 00000000 00:00 0 
7f32c4469000-7f32c446a000 r--p 00023000 fd:00 2392880                    /usr/lib64/ld-2.23.so
7f32c446a000-7f32c446b000 rw-p 00024000 fd:00 2392880                    /usr/lib64/ld-2.23.so
7f32c446b000-7f32c446c000 rw-p 00000000 00:00 0 
7ffc38c28000-7ffc38c4a000 rw-p 00000000 00:00 0                          [stack]
7ffc38dcc000-7ffc38dce000 r--p 00000000 00:00 0                          [vvar]
7ffc38dce000-7ffc38dd0000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
Contributor

goodwinos commented Sep 27, 2016

Thanks Marko - I can reproduce this (as listed below). I also saw a similar buffer overflow during a QA run with tests 666 and 669 but on re-running those tests it did not occur - perhaps timing related with the interrupt? pmlogger is in libpcp::__pmGetPDU() when the interrupt is received

[mgoodwin@kilcunda ~]$ /usr/libexec/pcp/bin/pmlogger -h localhost -r -c /tmp/mlan-leak.conf -m marko_check -l /tmp/pmlogger.log 123
^C*** buffer overflow detected ***: /usr/libexec/pcp/bin/pmlogger terminated
======= Backtrace: =========
/lib64/libc.so.6(+0x7566b)[0x7f37834bf66b]
/lib64/libc.so.6(__fortify_fail+0x37)[0x7f378355ec57]
/lib64/libc.so.6(+0x112d80)[0x7f378355cd80]
/lib64/libc.so.6(+0x114bca)[0x7f378355ebca]
/lib64/libpcp.so.3(+0x5f68d)[0x7f3783a8768d]
/lib64/libpcp.so.3(+0x1dfa2)[0x7f3783a45fa2]
/lib64/libpcp.so.3(__pmGetPDU+0x5f)[0x7f3783a469bf]
/usr/libexec/pcp/bin/pmlogger(+0x7f5b)[0x558213182f5b]
/usr/libexec/pcp/bin/pmlogger(+0x8d3e)[0x558213183d3e]
/usr/libexec/pcp/bin/pmlogger(+0x54e0)[0x5582131804e0]
/lib64/libc.so.6(__libc_start_main+0xf1)[0x7f378346a731]
/usr/libexec/pcp/bin/pmlogger(+0x65e9)[0x5582131815e9]

Contributor

goodwinos commented Sep 27, 2016

If I revert commit 993b5c8 the buffer overrun does not occur, but it looks like the memory leak still does. I think we need to fix this for 3.11.5 - the buffer overrun isn't good but it's tolerable since it only occurs on a SIGTERM interrupt. But the memory leak is a real problem (and may not even be related to the mentioned commit id).

Contributor

natoscott commented Sep 27, 2016

| [...] need to fix this for 3.11.5

pcp-3.11.6 it will have to be, as pcp-3.11.5 was tagged several hours ago.

Contributor

fche commented Sep 27, 2016 edited

valgrind --leak-check=full pmlogger -c tmp.conf /tmp/foo

^C after ten-twenty seconds indicates a few leaks of concern:

==15718== 15,640 bytes in 893 blocks are possibly lost in loss record 143 of 163
==15718==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==15718==    by 0x4E888DF: __dmpostfetch (derive_fetch.c:1215)
==15718==    by 0x4E4E1F2: __pmFinishResult (fetch.c:62)
==15718==    by 0x1100E1: myFetch (fetch.c:150)
==15718==    by 0x110D1D: do_work (callback.c:513)
==15718==    by 0x10D4DF: main (pmlogger.c:1028)
==15718== 
==15718== 41,696 bytes in 904 blocks are possibly lost in loss record 150 of 163
==15718==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==15718==    by 0x4E8853B: __dmpostfetch (derive_fetch.c:1194)
==15718==    by 0x4E4E1F2: __pmFinishResult (fetch.c:62)
==15718==    by 0x1100E1: myFetch (fetch.c:150)
==15718==    by 0x110D1D: do_work (callback.c:513)
==15718==    by 0x10D4DF: main (pmlogger.c:1028)
==15718== 
==15718== 11,577,697 (1,294,560 direct, 10,283,137 indirect) bytes in 180 blocks are definitely lost in loss record 163 of 163
==15718==    at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==15718==    by 0x4E88491: __dmpostfetch (derive_fetch.c:1116)
==15718==    by 0x4E4E1F2: __pmFinishResult (fetch.c:62)
==15718==    by 0x1100E1: myFetch (fetch.c:150)
==15718==    by 0x110D1D: do_work (callback.c:513)
==15718==    by 0x10D4DF: main (pmlogger.c:1028)
==15718== 

Looking at top while this test is running, even without valgrind, shows the various pmdas just utterly saturating the host (especially pmdaproc) and other errors in pmcd.log re. dead pmdas. It confirms suspicion of error handling paths in pmlogger.

Contributor

fche commented Sep 27, 2016

As for the crash while responding to the SIGINT,

gdb -args pmlogger -c tmp.conf /tmp/foo
(gdb) break __fortify_fail
(gdb) run
^C
(gdb) signal SIGINT
Breakpoint 1, __GI___fortify_fail (msg=msg@entry=0x2aaaab2fdc58 "buffer overflow detected") at fortify_fail.c:27
27  {
(gdb) bt
#0  __GI___fortify_fail (msg=msg@entry=0x2aaaab2fdc58 "buffer overflow detected") at fortify_fail.c:27
#1  0x00002aaaab285d80 in __GI___chk_fail () at chk_fail.c:28
#2  0x00002aaaab287bca in __fdelt_chk (d=d@entry=-1) at fdelt_chk.c:25
#3  0x00002aaaaad2ffed in __pmSocketReady (fd=fd@entry=-1, timeout=timeout@entry=0x7fffffffa7a0) at secureconnect.c:1630
#4  0x00002aaaaaceeee2 in pduread (fd=fd@entry=-1, buf=buf@entry=0x555557cf5050 "", len=len@entry=12, part=part@entry=-1, 
    timeout=timeout@entry=-1) at pdu.c:170
#5  0x00002aaaaacef8ff in __pmGetPDU (fd=-1, mode=mode@entry=0, timeout=timeout@entry=-1, 
    result=result@entry=0x7fffffffa988) at pdu.c:395
#6  0x000055555555bf3b in myFetch (numpmid=, pmidlist=, pdup=pdup@entry=0x7fffffffaab8)
    at fetch.c:121
(gdb) frame 6
(gdb) p * ctxp->c_pmcd
$6 = {pc_lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, 
      __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' , __align = 0}, pc_refcnt = 1, pc_fd = -1, 
  pc_hosts = 0x555555771260, pc_nhosts = 1, pc_timeout = 0, pc_tout_sec = 10, pc_again = 0}

Note the pc_refcnt=1 and fd=-1. That seems like an invariant breakage.

Contributor

goodwinos commented Sep 28, 2016

since we're going to exit on SIGINT anyway, this patch avoids the buffer overflow (applies a very blunt hammer to the problem)

diff --git a/src/pmlogger/src/ports.c b/src/pmlogger/src/ports.c
index 1417674..6e2d6d6 100644
--- a/src/pmlogger/src/ports.c
+++ b/src/pmlogger/src/ports.c
@@ -143,6 +143,7 @@ sigterm_handler(int sig)
{
/* exit as soon as possible, handler is deferred for log cleanup */
exit_code = sig;

  • exit(exit_code);
    }
Contributor

kmcdonell commented Sep 28, 2016

The blunt hammer avoids the log cleanup, meaning we may end up with the tail of the archive being damaged ... I think we should be aiming for a more surgical solution.

Contributor

fche commented Sep 28, 2016

FWIW the memory leak exists independent of the SIGINT. A pmlogger -t 10s ... here shuts itself down cleanly enough, but still leaks a lot of memory.

natoscott pushed a commit that referenced this issue Oct 2, 2016

src/pmlogger: fix SIGINT issue
Issue raised as #116.

This commit (and the earlier pduread() change in libpcp) fixes the
SIGINT part of the issue ... the root trigger is fetching lots of
metrics with a very short (10msec) update interval, which means
there is close to a 100% guarantee that the SIGINT will interrupt
a socket read() and the resulting error was not being propaged back
up the call stack correctly.

The mem leak part of the issue will be addressed separately.
Contributor

kmcdonell commented Oct 18, 2017

OK, it has taken a while, but getting back to this.

  1. I believe the SIGINT issue is long gone ... I've not seen this in QA for months, and a new qa/1159 that reproduces @mkevac 's recipe is passing without a problem.
  2. The memory leak issue fallls into 3 parts: 2 small ones in pmlogger and libpcp, and one big one in pmlogger, but only if the configuration file names (explicitly or implicitly via PMNS traversal) one or more derived metrics. The new qa/1109 demonstrates this.

The code fixes that resolve all the memory leaks are in commits 535cacc, 8bcbe54 and 999e407 that are currently sitting in my nextrelease branch and will flow up stream in a matter of days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment