Statsd intermittently outputs two data points for the same metric on the same timestamp, the second of which is always zero. #551
Comments
Hi, this seems like a potentially bad bug with statsd. What kind of load (UDP packets per second) does this happen under, and is there some way for us to recreate this situation for debugging? Also, what does your environment look like?
Hi, answers to your questions follow. As far as our environment is concerned, we have 5 physical statsd servers, all running matching versions of OS, statsd, node, carbon and hardware. Each statsd instance delivers its data points to a carbon relay agent running locally on the same server. In turn, the carbon relay distributes the metrics, according to a consistent hashing algorithm, to one of the same 5 servers, all of which run carbon writers. The carbon writer then stores the metrics in whisper file format. In front of the cluster sits an F5 load balancer that distributes statsd traffic to each node. The load balancer is configured with a persistence profile, so statsd traffic is delivered consistently to the same server according to the data point's source address.
Total input rate is approx. 7000 data points per second, spread across thousands of metrics using the 'c' and 'ms' data types, measured across a 5-minute window (see working below).
I've written a test script to deliver data points to statsd and managed to reproduce the error. Test script
Debugging results
in particular, the duplicate data points
and
@stevencherry Are you still having these problems? I tried to reproduce this bug this week, but failed. I'm wondering if this bug stems from a particular configuration. I tested on CentOS 7 with node 0.10.16. Next chance I get, I'll test this under the same node runtime as yours.
Just noticed this issue in our environment as well. Let me know what information you need to help track this down.
I do believe this is related to this code: and this function:
This used to be done using setInterval, and has now been changed to setTimeout (I don't understand why). It seems to me that setTimeout is not that exact, and even though you're correcting its offset, that might be causing this problem. I tested this with 0.7.2 (which uses setInterval) and it worked like a charm.
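To make the suspicion above concrete, here is a minimal simulation of how a drift-corrected setTimeout loop could emit two flushes in the same second. It is a sketch under stated assumptions, not statsd's actual code: the names `nextDelay`, `toTimestamp` and `simulate`, the 10-second flush interval, the rounding of the timestamp to whole seconds, and the steady rate of one increment per 100 ms are all hypothetical. The simulation uses a fake clock instead of real timers so the result is deterministic.

```javascript
// Hypothetical sketch of a drift-corrected flush loop; NOT statsd's real code.
const FLUSH_INTERVAL_MS = 10000; // assumed flush interval

// Delay until the next interval boundary, measured from the start time.
function nextDelay(startMs, nowMs, intervalMs) {
  return intervalMs - ((nowMs - startMs) % intervalMs);
}

// Timestamp in whole seconds (the rounding here is an assumption).
function toTimestamp(nowMs) {
  return Math.round(nowMs / 1000);
}

// Simulate `flushes` flushes on a fake clock. The first timer fires
// `earlyMs` milliseconds before it was due; later timers fire on time.
function simulate(startMs, earlyMs, flushes) {
  const points = [];
  let now = startMs;
  let delay = FLUSH_INTERVAL_MS;
  for (let i = 0; i < flushes; i++) {
    const elapsed = delay - (i === 0 ? earlyMs : 0);
    now += elapsed;
    // Assume one counter increment arrives per 100 ms of elapsed time.
    const value = Math.floor(elapsed / 100);
    points.push({ timestamp: toTimestamp(now), value: value });
    // The counter is reset after each flush, so the next flush starts at zero.
    delay = nextDelay(startMs, now, FLUSH_INTERVAL_MS);
  }
  return points;
}

// First timer fires 1 ms early, so the corrected delay becomes 1 ms.
console.log(simulate(1400000000000, 1, 3));
```

With the first timer firing 1 ms early, the corrected delay for the second flush is only 1 ms, so the first two points share a timestamp and the second carries a value of 0. With `earlyMs` set to 0, every flush lands on its own timestamp. Whether node timers really fire early in the affected environments is not confirmed in this thread.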
This results in what look like metric dropouts in our Graphite graphs. See my debugging below
taken from above, the duplicate data points
The zero data point always follows the first valid data point. As a result, the valid point is replaced by zero, hence the apparent dropouts.
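The overwrite described above can be illustrated with a small sketch. It assumes a store that keeps one value per timestamp and lets the last write win; `storePoints` is a hypothetical helper, not whisper's or carbon's real implementation, and the metric values are made up for illustration.

```javascript
// Hypothetical one-value-per-timestamp store; NOT whisper's real implementation.
function storePoints(points) {
  const slots = new Map();
  for (const [timestamp, value] of points) {
    // A later write for the same timestamp replaces the earlier value.
    slots.set(timestamp, value);
  }
  return slots;
}

// Illustrative values only: a valid point followed by a zero for the same second.
const stored = storePoints([
  [1400000010, 42],
  [1400000010, 0],
  [1400000020, 40],
]);
console.log(stored.get(1400000010)); // the zero, not the valid 42
```

Under that assumption the graph shows a dip to zero at the duplicated timestamp even though a valid value was sent first.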