PCP webapi not returning data/metrics ? #14
If the phenomenon is the same there as it is here, this is probably a bug in libpcp. If pmwebd and pmcd are both running and pmwebd successfully creates a context for the running pmcd, it remembers that connection for too long. If you then stop the pmcd process, the still-running pmwebd (via libpcp's pmNewContext) will keep issuing new PCP context numbers, perhaps because it decides to reuse prior not-quite-dead TCP connections. Those new context numbers are useless, however; pmFetch() might return errors (or not?).
I'm not sure if Frank's explanation is 100% correct ... if pmwebd calls pmDestroyContext() when it sees a fatal error, then I can't immediately see how the stale pmcd connection info can leak out to new contexts.
To get some more information, do the following:
Edit /etc/pcp/pmwebd/pmwebd.options and add this line at the end:
OPTIONS="$OPTIONS -Dcontext,pdu"
Then restart pmwebd (sudo /etc/init.d/pmweb restart, or equivalent).
Try your web queries again.
Then post the contents of /var/log/pcp/pmweb/pmwebd.log ... this will contain the detailed diagnostics for pmwebd communicating with pmcd.
hc000 commented Apr 18, 2015
I actually talked to him on IRC, and he suggested changing PMCD_REQUEST_TIMEOUT from 1 to 10, after which the metrics did start showing up. Unfortunately I am not able to grab the log right now, but I can provide it to you on Monday if you still want it.
@kmcdonell The issue is fully reproducible here with 3.10.4 code, as follows:
The context,pdu logs are at http://web.elastic.org/~fche/issue14.txt (Note that pmwebd doesn't call pmDestroyContext() at all in this case: it's deferred until a few minutes after the last use of a pmwebapi context, and there were no errors evident anyway.)
Once I un-transpositioned 41 -> 14, all good. Thanks. So I think the problem here is that ...
natoscott added the bug label Jun 3, 2015
This is now fixed in my tree ... commits will flow to the github tree in due course. From commit 3d4d2c0 ...
And qa/1090 reproduces Frank's failure recipe to verify (a) it used to fail, and (b) it now works as expected.
kmcdonell closed this Dec 22, 2015
Thanks, Ken! "Since we cannot rely on any socket-level service to let us know the remote end of a socket has been closed" ... this part might not actually be true. According to Stack Exchange, a __pmRecv(fd, buf, 1, MSG_PEEK) should signal failure if the socket was closed by the other side, without having to send anything.
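For readers unfamiliar with the MSG_PEEK idea, here is a minimal sketch of the probe. It uses Python's standard socket module instead of libpcp's __pmRecv, so it illustrates the technique only and is not the libpcp code; the helper name peer_has_closed is hypothetical. It assumes a connected stream socket and an orderly close by the peer.

```python
import socket


def peer_has_closed(sock):
    """Probe a connected stream socket without consuming any data.

    Hypothetical helper, not part of PCP. Returns True when the peer has
    closed its end, False when the connection still looks open.
    """
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)  # make the probe return at once on an idle socket
    try:
        # MSG_PEEK leaves any pending byte in the receive queue.
        data = sock.recv(1, socket.MSG_PEEK)
    except (BlockingIOError, InterruptedError):
        return False  # open, but nothing to read right now
    except (ConnectionResetError, ConnectionAbortedError):
        return True  # peer reset the connection
    finally:
        sock.settimeout(previous_timeout)  # restore the caller's mode
    # An empty read means end-of-stream: the peer closed its side.
    return data == b""


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(peer_has_closed(a))  # peer still open
    b.close()
    print(peer_has_closed(a))  # peer closed
    a.close()
```

One caveat with this approach: if unread data is still queued, the peek returns that data, so a close is only reported once the queue has been drained.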
You're right, Frank. The MSG_PEEK approach "might" work on some platforms (I saw this in my research also) ... but there is also lots of google-based intelligence to suggest that an application-level ping is safer, and I guarantee the ping approach will work across all platforms.
hc000 commented Apr 17, 2015
I followed these instructions and compiled pcp from source.
$ git clone git://git.pcp.io/pcp
$ apt-get build-dep pcp
$ cd pcp
$ ./configure --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-webapi
$ make
$ groupadd -r pcp
$ useradd -c "Performance Co-Pilot" -g pcp -d /var/lib/pcp -M -r -s /usr/sbin/nologin pcp
$ make install
But I am getting this error when calling the API:
PMWEBD error, code -12443: Insufficient elements in list
When I request http://host:44323/pmapi/context?hostspec=localhost&polltimeout=600
I get { "context": 367806010 }
When I request http://host:44323/pmapi/367806010/_metric?prefix=hinv
I get { "metrics":[]}
How can I make it populate data in the metrics array?
Thank you!
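The two requests above can be scripted for repeated testing. The following is a rough sketch using only the Python standard library; the URL shapes and JSON keys are taken from the requests and responses quoted in this comment, while the function names are hypothetical and the placeholder host name "host" is kept as written.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PMWEBD_PORT = 44323  # port used in the requests quoted above


def context_url(host, hostspec="localhost", polltimeout=600):
    """Build the context-creation URL (hypothetical helper)."""
    query = urlencode({"hostspec": hostspec, "polltimeout": polltimeout})
    return f"http://{host}:{PMWEBD_PORT}/pmapi/context?{query}"


def metric_url(host, context, prefix):
    """Build the _metric URL for an existing context number."""
    query = urlencode({"prefix": prefix})
    return f"http://{host}:{PMWEBD_PORT}/pmapi/{context}/_metric?{query}"


def parse_context(body):
    """Extract the context number from a reply like { "context": 367806010 }."""
    return json.loads(body)["context"]


def metrics_are_empty(body):
    """True when the reply carries an empty "metrics" array."""
    return len(json.loads(body)["metrics"]) == 0


def list_metrics(host, prefix):
    """Create a context, then query metrics under prefix (needs a live pmwebd)."""
    with urlopen(context_url(host)) as reply:
        context = parse_context(reply.read().decode("utf-8"))
    with urlopen(metric_url(host, context, prefix)) as reply:
        return json.loads(reply.read().decode("utf-8"))["metrics"]
```

Calling list_metrics("host", "hinv") repeats the two requests in order; an empty list reproduces the symptom reported here.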