Raw and averages - a more recent merge of pull request #13 #84

wants to merge 12 commits into


None yet

7 participants

pcn commented May 9, 2012

OK, this is based against a fresh new copy of etsy's statsd master branch. Reporting code for graphite now in the separated-out graphite backend.

This pull request passes (merged a45ab89 into 4000ffc).

Peter C. Norton added some commits May 9, 2012
Peter C. Norton Fixed re-setting stats.
Added commands to view+delete raws and averages (untested)
Peter C. Norton Remove the re-set of the averages in the graphite backend. Handled in
stats.js in the prior comit, don't want to kill stats if another
backend will receive it or something else is supposed to be done with

This pull request passes (merged 0274bb1 into 4000ffc).

pcn commented May 9, 2012

OK, so I've submitted a bunch of updates to this based on re-discovering where the code has moved around to. I don't know enough about the tests to understand how to test for the changes I've made in the way the test runner would find useful and would really like to be able to help in that respect, too.

This pull request passes (merged 259578b into 4000ffc).

pcn commented May 10, 2012

Ping on what would be required to get this merged?

mrtazz commented May 10, 2012

Just me looking over it a bit closer at the moment :).

Dieterbe commented Jun 9, 2012

i think this raw type becomes moot once gauges are implemented better. see #95

pcn commented Jun 9, 2012

Are you intending that the gauge type, like the raw type, will be able to keep a queue of stats and submit those? I'm not clear from the linked request that this is an exact overlap.

Dieterbe commented Jun 9, 2012

Oops, you're right. Upon further inspection, I think I've spotted 2 differences between your raw type and the gauge type as defined in #95 (or in fact, the default gauge type). Let me know if this is correct:

  • upon receiving a metric of type raw, it's always added as new entry to the array. user can submit timestamp along with the metric (btw this is not documented in your README), otherwise timestamp is added when metric is received.
  • upon receiving a metric of type gauge, the last value is overwritten, and the timestamp is always the one when the flush event triggers (as is the case with all other types)

I look at this as three independent features:

  • the "pass on the value without overwriting/averaging/..." is a behavior we AFAIK don't have yet in statsd, and having a new type for this is probably a good thing.
  • the optional explicit passing of a timestamp seems useful for optional higher accuracy, and more types can benefit from this feature. (at this point, those other types is just the gauge).
  • using the timestamp of when receiving the metric instead of on the flush event. I doubt this will ever be used. If you want high accuracy, submit the timestamp along with the metric, if you don't, the timestamp of the flush event seems like a good enough approximation. If not, I would rather consider flushing more often. Doing this every time a metric is reiceved probably comes with a performance impact and if a user would want this, why would (s)he only want it for raw metrics?
pcn commented Jun 14, 2012

Regarding the optional timestamp (and tagging the message if it doesn't have one), I think that was based on the pull#13 originally, and I'd be fine with removing it. I agree that it's unlikely to be a common case.


I would remove the "add timestamp in statsd when metric is received", but allowing clients to optionally pass their own timestamp seems like a good idea, but this would also make sense for gauge, not just for raw. (and it would not for timing)

I don't know how to collaborate on pull requests (I'd probably need push permissions for pcn's repo, I think) so I copied this all over, and rebased it on the latest etsy/master for a pull request #105.

To contribute to the continuing thread...

The timestamp pass is a good idea, and I kept it for #105. I did change the format a bit (see the readme) to be more consistent, but the concept stays the same.

The name "raw data" is completely ok. Conceptually, you are sending raw data to statsd, and you want it passed directly to the backend. Other metrics do some kind of transformation before leaving statsd or are kept in some kind of unique manner (Gauges) so "raw" is still a good name for it.

Adding the timestamp to Gauges makes no sense. Gauges keep their last reported value and always report the latest value to the backend (regardless of how "stale" it might be). What, conceptually, would passing in a gauge with a metric imply? You're essentially doing the raw request at that point -- not what that particular "bucket type" is for. Leave it be and separate I say. (Personally, I don't see the utility of Gauges, but they do offer unique functionality different in important ways from both counters and raw data.)


@jeffminard-ck gauges are useful in a similar way to how raw is useful (tracking values at certain points in time), but are more appropriate if you get many metrics per flush interval and wish to avoid the somewhat expensive (and sometimes completely unnecessary) average calculation graphite does on all metrics which arrive for the same precision-interval, or the network traffic caused by statsd sending every metric, or both of these.
in such case, the gauge behavior is very useful because it only takes one metric per flushinterval, and because it uses the last one received, the timestamp will be close to the timestamp of the actual flush. thinking about it more, you're right that higher accuracy timestamps are not needed for a case like this, graphite will average out per precision-interval anyway (usually 60 seconds). Also consider this for raw, btw !

another way to put it: raw for low-volume metrics, and gauge for high-volume.

No, I get that part (as I detailed it in the updated README.md file), I just don't see the utility of the metric. It seems weird, mostly, that you'd keep the value around after it has been flushed and keep reporting it. But there is already another bug request / pull to clear gauges after each flush, so I guess I'm not the only one :)


ah yes, i'm also in favor of #95, but that's a different issue.


Closing in favor of #105, though that would need to be updated inorder to be merged.

Thanks for starting the discuss and the initial code!

@draco2003 draco2003 closed this Mar 7, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment