pminfo -t aborts pmcd #30

Closed
aabc opened this Issue Jun 30, 2015 · 5 comments


aabc commented Jun 30, 2015

Excuse me, but when the first and simplest command from the manual crashes, that's a bit weird for 20-year-old software.

Installed with apt-get install pcp on Ubuntu 14.04.2 LTS; the pcp package version is 3.8.12ubuntu1.

root@n:~# ps axu|grep pmcd
pcp      15018  0.0  0.5  25580 12148 ?        Ssl  10:39   0:00 /usr/lib/pcp/bin/pmcd
root@n:~# pminfo -t
[... lots of output skipped ...]
root@n:~# pminfo -t
pminfo: Cannot connect to PMCD on host "local:": Connection refused
root@n:~# ps axu|grep pmcd
[nothing]
root@n:~# dmesg
[1897410.769797] potentially unexpected fatal signal 6.
[1897410.769803] code at b77afd50: 5d 5a 59 c3 ec f9 ff ff 14 00 00 00 71 00 03 03 
[1897410.769826] CPU: 1 PID: 15018 Comm: pmcd Not tainted 3.16.0-38-generic #52~14.04.1-Ubuntu
[1897410.769830] Hardware name: System manufacturer System Product Name/P5B-VM, BIOS 0405    09/21/2006
[1897410.769834] task: ced65070 ti: cea5c000 task.ti: cea5c000
[1897410.769838] EIP: 0073:[<b77afd50>] EFLAGS: 00200206 CPU: 1
[1897410.769868] EIP is at 0xb77afd50
[1897410.769871] EAX: 00000000 EBX: 00003aaa ECX: 00003aaa EDX: 00000006
[1897410.769874] ESI: b798ac4c EDI: b7725000 EBP: 00000044 ESP: bf818da4
[1897410.769877]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[1897410.769879] 

After service pcp restart, pminfo -t works again, but only once. Also, pminfo without -t doesn't abort pmcd.
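The reproduction above can be scripted so it is easy to re-check after an upgrade. This is a minimal sketch, assuming pminfo is on PATH and that a plain pminfo (which did not abort pmcd in this report) is a safe probe. The helper names run and pmcd_survives are hypothetical and not part of PCP.

```python
import subprocess


def run(cmd):
    """Run a command and return its combined stdout and stderr as text."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr


def pmcd_survives(runner=run):
    """Run `pminfo -t`, then probe pmcd with a plain `pminfo`.

    Returns False if the probe reports the connection error seen in the
    report, True otherwise. The runner is injectable so the logic can be
    exercised without a PCP installation.
    """
    runner(["pminfo", "-t"])
    probe_output = runner(["pminfo"])
    return "Cannot connect to PMCD" not in probe_output
```

Calling pmcd_survives() on an affected machine would be expected to return False, and True once pmcd stays up. The check only looks for the error string quoted in the report, so other failure modes are not detected.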

Contributor

kmcdonell commented Jun 30, 2015

Some of the code is not 20 years old ... but you're right, this should not happen, even for PCP 3.8.12 which is 22 releases in the past and was released 18 months ago. It is certainly not reproducible on my local Ubuntu system with the latest PCP version.

There is already a test in the QA suite for 'pminfo -t' followed by 'pminfo -T' and I cannot recall this test ever failing in the way you describe (and it is run many times on dozens of machines, including Ubuntu, during any release cycle ... and would have been run prior to the 3.8.12 PCP release). So this is probably not a generic issue, but something specific to your setup.

Are you able to upgrade to a newer version from the PCP project web site or bintray.com?

If not, or if the problem persists, the following additional information would help us diagnose the root cause:

  1. contents of /var/log/pcp/pmcd/pmcd.log when pmcd is first observed to be not working
  2. output from pcp(1) command when pmcd is working
  3. an indication of whether pminfo with other command line options (try none, -d, or -v) produces the same failure scenario.

aabc commented Jul 1, 2015

I didn't notice at first that the pminfo -t output switches to "IPC protocol failure" starting at ipc.shm.max_segproc.

ipc.msg.num_smsghdr [number of system message headers (from msgctl(..,IPC_INFO,..))]
ipc.msg.max_seg [maximum number of message segments (from msgctl(..,IPC_INFO,..))]
ipc.shm.max_segsz [maximum shared segment size in bytes (from shmctl(..,IPC_INFO,..))]
ipc.shm.min_segsz [minimum shared segment size in bytes (from shmctl(..,IPC_INFO,..))]
ipc.shm.max_seg [maximum number of shared segments in system (from shmctl(..,IPC_INFO,..))]
ipc.shm.max_segproc: pmLookupName: IPC protocol failure
ipc.shm.max_shmsys: pmLookupName: IPC protocol failure
vfs.files.count: pmLookupName: IPC protocol failure
vfs.files.free: pmLookupName: IPC protocol failure
vfs.files.max: pmLookupName: IPC protocol failure
vfs.inodes.count: pmLookupName: IPC protocol failure
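When the output is long, the point where pminfo stops getting answers can be located mechanically. The sketch below assumes error lines have the shape `metric: pmCall: message`, as in the listing above. The function name first_failure is hypothetical.

```python
import re

# Matches lines such as:
#   ipc.shm.max_segproc: pmLookupName: IPC protocol failure
# Normal lines ("metric [one-line help]") have no colon after the name
# and therefore do not match.
ERROR_LINE = re.compile(r"^(?P<metric>\S+): (?P<call>pm\w+): (?P<error>.+)$")


def first_failure(output):
    """Return (metric, error) for the first failing metric, or None."""
    for line in output.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            return match.group("metric"), match.group("error")
    return None
```

The first failing name only marks where the client lost its connection. It does not by itself say which agent crashed pmcd.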

pminfo -T behaves similarly to pminfo -t:

ipc.shm.min_segsz
Help:
minimum shared segment size in bytes (from shmctl(..,IPC_INFO,..))

ipc.shm.max_seg
Help:
maximum number of shared segments in system (from shmctl(..,IPC_INFO,..))
ipc.shm.max_segproc: pmLookupName: IPC protocol failure
ipc.shm.max_shmsys: pmLookupName: IPC protocol failure
vfs.files.count: pmLookupName: IPC protocol failure
vfs.files.free: pmLookupName: IPC protocol failure

But, pminfo -T ipc does not abort pmcd.

Unfortunately, there is no debuginfo/dbg package for pcp on Ubuntu. Here is the gdb stack trace from attaching to the pmcd process and then running pminfo -t:

Program received signal SIGSEGV, Segmentation fault.
0xb728d43c in pmdaTreePMID () from /usr/lib/libpcp_pmda.so.3
(gdb) bt
#0  0xb728d43c in pmdaTreePMID () from /usr/lib/libpcp_pmda.so.3
#1  0xb725da14 in ?? () from /var/lib/pcp/pmdas/mmv/pmda_mmv.so
#2  0xb77b2393 in DoPMNSNames ()
#3  0xb77a969c in HandleClientInput ()
#4  0xb77a99de in ?? ()
#5  0xb77a8348 in main ()

(gdb) x/11i 0xb728d43c - 11
   0xb728d431 <pmdaTreePMID+17>:    inc    %esp
   0xb728d432 <pmdaTreePMID+18>:    and    $0x20,%al
   0xb728d434 <pmdaTreePMID+20>:    mov    0x24(%esp),%edx
   0xb728d438 <pmdaTreePMID+24>:    mov    0x28(%esp),%esi
=> 0xb728d43c <pmdaTreePMID+28>:    mov    (%eax),%eax
   0xb728d43e <pmdaTreePMID+30>:    mov    0x8(%eax),%eax
   0xb728d441 <pmdaTreePMID+33>:    call   0xb728d060
   0xb728d446 <pmdaTreePMID+38>:    test   %eax,%eax
   0xb728d448 <pmdaTreePMID+40>:    je     0xb728d468 <pmdaTreePMID+72>
   0xb728d44a <pmdaTreePMID+42>:    mov    0x14(%eax),%eax
   0xb728d44d <pmdaTreePMID+45>:    cmp    $0xffffffff,%eax
(gdb) i reg
eax            0x0  0
ecx            0xb72ad700   -1221929216
edx            0xb8c83bb0   -1194837072
ebx            0xb7260000   -1222246400
esp            0xbfbb5280   0xbfbb5280
ebp            0x44 0x44
esi            0xb8c83c1c   -1194836964
edi            0xb8c83c1c   -1194836964
eip            0xb728d43c   0xb728d43c <pmdaTreePMID+28>
eflags         0x210246 [ PF ZF IF RF ID ]
cs             0x73 115
ss             0x7b 123
ds             0x7b 123
es             0x7b 123
fs             0x0  0
gs             0x33 51

Looks like a NULL pointer read: eax is 0 at the faulting instruction, mov (%eax),%eax.
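The same conclusion can be checked mechanically from the gdb text. This is a rough sketch, assuming AT&T-syntax disassembly with gdb's `=>` marker on the faulting line and `i reg` output in the `name hex value` layout shown above. The function names are hypothetical, and only the base register of the first memory operand is examined.

```python
import re


def parse_registers(text):
    """Map register names to integer values from gdb `i reg` output."""
    regs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].startswith("0x"):
            regs[parts[0]] = int(parts[1], 16)
    return regs


def faulting_base_register(disassembly):
    """Return the base register of the first memory operand on the `=>` line."""
    for line in disassembly.splitlines():
        if line.lstrip().startswith("=>"):
            match = re.search(r"\(%(\w+)", line)
            return match.group(1) if match else None
    return None


def is_null_dereference(disassembly, registers_text):
    """True if the faulting instruction reads through a register holding 0."""
    base = faulting_base_register(disassembly)
    regs = parse_registers(registers_text)
    return base is not None and regs.get(base) == 0
```

Displacements such as 0x24(%esp) are ignored, so this flags only the plain "base register is zero" case seen in this trace.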

Content of /var/log/pcp/pmcd/pmcd.log

Log for pmcd on n started Wed Jul  1 10:37:08 2015

[Wed Jul  1 10:37:08] pmcd(18026) Error: Permission clash for unix: with earlier statement for unix:

active agent dom   pid  in out ver protocol parameters
============ === ===== === === === ======== ==========
pmcd           2                 2 dso i:5  lib=/var/lib/pcp/pmdas/pmcd/pmda_pmcd.so entry=pmcd_init [0xb776b2e0]
linux         60                 2 dso i:4  lib=/var/lib/pcp/pmdas/linux/pmda_linux.so entry=linux_init [0xb726c780]
proc           3 18036  10  11   2 bin pipe cmd=/var/lib/pcp/pmdas/proc/pmdaproc -d 3
mmv           70                 2 dso i:4  lib=/var/lib/pcp/pmdas/mmv/pmda_mmv.so entry=mmv_init [0xb725dea0]
xfs           11 18042  12  13   2 bin pipe cmd=/var/lib/pcp/pmdas/xfs/pmdaxfs -d 11
jbd2         122                 2 dso i:4  lib=/var/lib/pcp/pmdas/jbd2/pmda_jbd2.so entry=jbd2_init [0xb7258860]

Host access list:
00 01 Cur/MaxCons host-spec                               host-mask                               lvl host-name
== == =========== ======================================= ======================================= === ==============
 y  y     0     0 127.0.1.1                               255.255.255.255                           0 localhost
 y  y     0     0 /                                       /                                         1 unix:
 n  n     0     0 0.0.0.0                                 0.0.0.0                                   4 .*
 n  n     0     0 ::                                      ::                                        8 :*
User access list empty: user-based access control turned off
Group access list empty: group-based access control turned off


pmcd: PID = 18026, PDU version = 2
pmcd request port(s):
  sts fd   port  family address
  === ==== ===== ====== =======
  ok  1026       unix   /var/run/pcp/pmcd.socket
  ok  1024 44321 inet   INADDR_ANY
  ok  1025 44321 ipv6   INADDR_ANY
[Wed Jul  1 10:43:28] pmcd(18026) Error: Unexpected signal 11 ...

Dumping to core ...
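For readers who do not have signal numbers memorised, the "Unexpected signal" line in the log can be decoded with a few lines of Python. This sketch assumes the log line format shown above and is meant to run on Linux, since signal numbering is platform-specific. The name fatal_signals is hypothetical.

```python
import re
import signal

# Matches lines such as:
#   [Wed Jul  1 10:43:28] pmcd(18026) Error: Unexpected signal 11 ...
SIGNAL_LINE = re.compile(
    r"pmcd\((?P<pid>\d+)\) Error: Unexpected signal (?P<signo>\d+)"
)


def fatal_signals(log_text):
    """Return a list of (pid, signal number, signal name) found in a pmcd log."""
    found = []
    for match in SIGNAL_LINE.finditer(log_text):
        signo = int(match.group("signo"))
        try:
            name = signal.Signals(signo).name
        except ValueError:
            name = "unknown"  # number not defined on this platform
        found.append((int(match.group("pid")), signo, name))
    return found
```

Applied to the log above, this reports signal 11 as SIGSEGV for PID 18026, which agrees with the gdb session.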

pcp output:

# pcp
Performance Co-Pilot configuration on rin:

 platform: Linux n 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:44:48 UTC 2015 i686
 hardware: 2 cpus, 2 disks, 1995MB RAM
 timezone: MSK-3
     pmcd: Version 3.8.12-1, 6 agents, 1 client
     pmda: pmcd proc xfs linux mmv jbd2
 pmlogger: primary logger: n/20150701.10.46
     pmie: n: /var/log/pcp/pmie/n/pmie.log

I will try a newer version of pcp later.

aabc commented Jul 1, 2015

I tried the packages from ftp://ftp.pcp.io/projects/pcp/download/deb/i386/. All installed fine except the perl-related ones, which failed with dpkg: dependency problems prevent configuration ... depends on perl (>= 5.20.1-5); however: Version of perl on system is 5.18.2-2ubuntu1. That is probably because of differences between Debian and Ubuntu.

With these packages installed, pminfo -t does not crash pmcd. That's good, but it's sad that Ubuntu users get a faulty package by default.

Contributor

natoscott commented Aug 6, 2015

As per the earlier comment, current PCP does not exhibit the problem.

natoscott closed this Aug 6, 2015
