Missing data in received metrics from multiple JMX instances. #329
Comments
Could you tell us which version of JmxTrans you are using? There are known issues in the way TCP connections are managed in older versions (I could check exactly which), but they should be corrected in recent versions.
It is the latest build, 251.
Any chance you could send a thread dump taken when you see the issue? Do you see memory or CPU starvation on the jmxtrans machine? Do you see errors on the Graphite side?
On a 4-CPU VM, load average: 1.98, 1.89, 1.82.
There are no errors on the Graphite side except "invalid line received from client 127.0.0.1:49933, ignoring", which comes from malformed metrics sent to Graphite, derived from objects with non-numeric attributes.
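For context, Graphite's plaintext protocol expects one `metric.path value timestamp` triple per line, and rejects lines whose value is not numeric with exactly that "invalid line" warning. A minimal sketch of the check (the helper name is hypothetical, not part of Graphite or jmxtrans):

```python
def is_valid_graphite_line(line: str) -> bool:
    """Check a Graphite plaintext-protocol line: 'metric.path value timestamp'."""
    parts = line.strip().split()
    if len(parts) != 3:
        return False
    path, value, timestamp = parts
    try:
        float(value)       # value must be numeric
        float(timestamp)   # timestamp is a unix epoch in seconds
    except ValueError:
        return False
    return bool(path)

print(is_valid_graphite_line("jvm.heap.used 123456 1700000000"))  # True
print(is_valid_graphite_line("jvm.os.Name Linux 1700000000"))     # False: non-numeric value
```

This matches the symptom above: a String-valued mbean attribute such as an OS name ends up as a non-numeric value field and is dropped by Graphite.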
Any chance to get a thread dump?
Seems that you are running with logs at debug level and that this causes contention in the appender. If you want to keep logs at debug, you could try to configure an async appender.
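For reference, jmxtrans logs through logback, where an async appender wraps a blocking file appender behind a queue so logging threads don't stall on I/O. A rough illustrative `logback.xml` sketch (file names, queue size, and levels are assumptions, not jmxtrans defaults):

```xml
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.FileAppender">
    <file>jmxtrans.log</file>
    <encoder>
      <pattern>%d %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Buffer log events so callers don't block on disk writes. -->
  <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
    <appender-ref ref="FILE"/>
    <queueSize>512</queueSize>
    <!-- 0 = never silently discard events when the queue fills up. -->
    <discardingThreshold>0</discardingThreshold>
  </appender>

  <root level="DEBUG">
    <appender-ref ref="ASYNC"/>
  </root>
</configuration>
```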
|
The thread dumps sent before were taken with log level INFO. I've sent you a new thread dump, taken with log level WARN.
Seems that logs are still at debug level. Logging configuration has been utterly broken for quite some time in JmxTrans ... :-( Do you still see debug level messages in the logs? PR #288 should fix the logging configuration issue, but it needs to be rebased before it can be merged. I did not have time to do it yet... If you have some time and energy, could you try to rebase it and see if it helps? I'd be happy to review and merge it, but I'm not sure I have the time right now to do it myself...
With log levels INFO or WARN, nothing is actually logged to the file. With log level NONE, there is still a large number of threads related to logging, but nothing is actually logged. If you think #288 can fix the issue, I will consider rebasing it.
That's strange ... Your thread dumps show that most threads are blocked in the logback OutputStreamAppender, and this seems to come from debug level messages. #288 will probably not fix the issue, but it might help in diagnosing the problem. Are you installing from RPM, DEB, or just running the jar directly?
I am installing release 251 from RPM.
Version 252 has been released, with minor improvements to logging. Any chance you could try again? Or has your issue been magically solved?
Sorry for not commenting on this for so long. I returned to using jmxtrans after some period of trying other tools. Occasional holes in metrics from random endpoints can still be observed, but it is unclear whether they have the same root cause. They correlate with other JMX host(s) being unavailable for connection for longer periods of time. I am closing this issue.
@spynode thanks for the house cleaning!
I am trying to gather metrics from 8 applications, each of which holds ~2000-4000 mbeans with a varying number of attributes each, for an approximate total of 200K metrics. I send them to a Graphite host. When configuring jmxtrans to gather metrics from just 2-4 hosts, the information is complete without any data missing, but after enabling all 8 JMX hosts in the jmxtrans configuration, there are holes in every metric received in Graphite. Initially I tried to configure jmxtrans to use a wildcard mbean definition. Later I generated a JSON of all mbeans, but with the same results. I don't use attribute definitions, letting jmxtrans gather them all.
Here is part of configuration:
Is there any reason why jmxtrans would skip some data under bigger load or with larger incoming data sets? When looking for a particular metric name in the jmxtrans logs, I see that it is not queried every minute; it appears in the logs at the same irregular intervals as the metric shows up in Graphite. I cannot identify anything in the jmxtrans logs which would suggest a problem, even at DEBUG log level.
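For readers unfamiliar with the setup being described: a jmxtrans configuration of this shape (one server block per JMX host, with queries writing to Graphite) typically looks roughly like the sketch below. All hostnames, ports, and object patterns are illustrative placeholders, not the poster's actual (elided) configuration:

```json
{
  "servers": [
    {
      "host": "app1.example.com",
      "port": 1099,
      "queries": [
        {
          "obj": "java.lang:type=*",
          "outputWriters": [
            {
              "@class": "com.googlecode.jmxtrans.model.output.GraphiteWriter",
              "settings": {
                "host": "graphite.example.com",
                "port": 2003
              }
            }
          ]
        }
      ]
    }
  ]
}
```

With 8 such server blocks and wildcard queries, every scheduled run must complete all JMX reads and Graphite writes within the polling interval, which is why slow or unavailable endpoints can show up as gaps across hosts.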