OK, this is based on a fresh copy of etsy's statsd master branch. The reporting code for graphite now lives in the separated-out graphite backend.
Adding two new types of collected data: Raw and Averaged (|r and |a r…
Updating readme for Raw and Average message types
Adding even BETTER error handling that catches ECONNREFUSED when carbo…
…n it totally offline
Merge remote-tracking branch 'rawpatch/master' into justraw
Add the requirement for the sys module
Fix sys -> util
Fix minor issues I thought I'd already fixed.
Merge commit '9c5bada34cee79' into raw_and_averages
Move raw type into the graphite backend.
Correct variables that were out of scope in backends/graphite.js
This pull request passes (merged a45ab89 into 4000ffc).
Fixed re-setting stats.
Added commands to view+delete raws and averages (untested)
Remove the re-set of the averages in the graphite backend. Handled in
stats.js in the prior commit, don't want to kill stats if another
backend will receive it or something else is supposed to be done with
This pull request passes (merged 0274bb1 into 4000ffc).
OK, so I've submitted a bunch of updates to this based on re-discovering where the code has moved around to. I don't know enough about the tests to write ones the test runner would find useful for the changes I've made, and I'd really like to be able to help in that respect, too.
Fix raw date. Was ignoring the field and now interprets correctly.
This pull request passes (merged 259578b into 4000ffc).
Ping on what would be required to get this merged?
Just me looking over it a bit closer at the moment :).
i think this raw type becomes moot once gauges are implemented better. see #95
Are you intending that the gauge type, like the raw type, will be able to keep a queue of stats and submit those? I'm not clear from the linked request that this is an exact overlap.
Oops, you're right. Upon further inspection, I think I've spotted 2 differences between your raw type and the gauge type as defined in #95 (or in fact, the default gauge type). Let me know if this is correct:
I look at this as three independent features:
Regarding the optional timestamp (and tagging the message if it doesn't have one), I think that was originally based on pull #13, and I'd be fine with removing it. I agree that it's unlikely to be a common case.
I would remove the "add timestamp in statsd when metric is received" behavior. Allowing clients to optionally pass their own timestamp seems like a good idea, though, and it would make sense for gauge as well, not just for raw. (It would not make sense for timing.)
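To make the optional client timestamp concrete, here is a minimal sketch of a parser. It assumes a hypothetical wire format, `<name>:<value>|r[|<unix-timestamp>]`; that layout and the `parseRawMessage` name are illustrative assumptions, not the format used in this pull request or in #105.

```javascript
// Hypothetical format: "<name>:<value>|r[|<unix-timestamp>]".
// The field layout is an assumption for illustration only.
function parseRawMessage(line) {
  const [key, rest] = line.trim().split(':');
  if (!key || rest === undefined) return null;

  const fields = rest.split('|');
  if (fields[1] !== 'r') return null;          // only handle the raw type here

  const value = Number(fields[0]);
  if (fields[0] === '' || Number.isNaN(value)) return null;

  // The timestamp is optional; null means "the client did not send one".
  let timestamp = null;
  if (fields[2] !== undefined) {
    timestamp = parseInt(fields[2], 10);
    if (Number.isNaN(timestamp)) return null;
  }
  return { key, value, timestamp };
}
```

With this sketch, the caller decides what a missing timestamp means (drop it, or fall back to the flush time), which keeps that policy out of the parser.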
I don't know how to collaborate on pull requests (I'd probably need push permissions for pcn's repo, I think) so I copied this all over, and rebased it on the latest etsy/master for a pull request #105.
To contribute to the continuing thread...
The timestamp pass is a good idea, and I kept it for #105. I did change the format a bit (see the readme) to be more consistent, but the concept stays the same.
The name "raw data" is completely ok. Conceptually, you are sending raw data to statsd, and you want it passed directly to the backend. Other metrics do some kind of transformation before leaving statsd or are kept in some kind of unique manner (Gauges) so "raw" is still a good name for it.
Adding the timestamp to Gauges makes no sense. Gauges keep their last reported value and always report the latest value to the backend (regardless of how "stale" it might be). What, conceptually, would passing in a gauge with a timestamp imply? You're essentially doing the raw request at that point -- not what that particular "bucket type" is for. Leave it be and separate I say. (Personally, I don't see the utility of Gauges, but they do offer unique functionality different in important ways from both counters and raw data.)
@jeffminard-ck gauges are useful in a similar way to how raw is useful (tracking values at certain points in time), but are more appropriate if you get many metrics per flush interval and wish to avoid the somewhat expensive (and sometimes completely unnecessary) average calculation graphite does on all metrics which arrive for the same precision-interval, or the network traffic caused by statsd sending every metric, or both of these.
in such a case, the gauge behavior is very useful because it only takes one metric per flush interval, and because it uses the last one received, the timestamp will be close to the timestamp of the actual flush. thinking about it more, you're right that higher-accuracy timestamps are not needed for a case like this; graphite will average out per precision-interval anyway (usually 60 seconds). Also consider this for raw, btw!
another way to put it: raw for low-volume metrics, and gauge for high-volume.
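The difference can be sketched as a simplified model. This is not statsd's actual code: the names `recordGauge`, `recordRaw`, and `flush` are hypothetical, and the output lines only assume Graphite's plaintext `path value timestamp` format.

```javascript
// Simplified model of the two behaviors discussed above (hypothetical names).
const gauges = {};  // key -> last value seen; survives across flushes
const raws = [];    // every [key, value, timestamp] received since the last flush

function recordGauge(key, value) {
  gauges[key] = value;  // overwrite: only the latest value is kept
}

function recordRaw(key, value, timestamp) {
  raws.push([key, value, timestamp]);  // queue: every value is kept
}

function flush(flushTime) {
  const lines = [];
  // One line per gauge, stamped with the flush time.
  for (const key of Object.keys(gauges)) {
    lines.push(`${key} ${gauges[key]} ${flushTime}`);
  }
  // One line per raw value, stamped with its own timestamp.
  for (const [key, value, timestamp] of raws) {
    lines.push(`${key} ${value} ${timestamp}`);
  }
  raws.length = 0;  // the raw queue is drained; gauges are left in place
  return lines;
}
```

Sending three values for a gauge and three for a raw metric within one interval yields four lines at flush: one for the gauge and three for the raw metric. A second flush with no new input still emits the gauge line, which is the "keep reporting the stale value" behavior questioned in this thread.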
No, I get that part (as I detailed it in the updated README.md file), I just don't see the utility of the metric. It seems weird, mostly, that you'd keep the value around after it has been flushed and keep reporting it. But there is already another bug request / pull to clear gauges after each flush, so I guess I'm not the only one :)
ah yes, i'm also in favor of #95, but that's a different issue.
Closing in favor of #105, though that would need to be updated in order to be merged.
Thanks for starting the discussion and the initial code!