Skip to content

sample_rate should only be supported for counters. #36

Closed
alexdean opened this Issue Feb 8, 2013 · 6 comments

2 participants

@alexdean
alexdean commented Feb 8, 2013

I think the sample_rate argument should be removed from gauge, set, and time/timing. Allowing users to set a sample rate which statsd will simple ignore invites misunderstandings & confusion about what's actually being recorded & flushed to graphite.

https://github.com/etsy/statsd/blob/53aae0ad2cd33879d79a786e067a5e3fd2548e1d/stats.js#L167 Note that the 3rd field (the sample rate) is only acted on if the type is not ms, g, or s.

I will submit a PR if this is something you'll accept.

@alexdean
alexdean commented Feb 8, 2013

OK, i see that this sample_rate is used by the library to determine if a metric should be sent to statsd. https://github.com/reinh/statsd/blob/master/lib/statsd.rb#L264

Maybe this could be a documentation improvement? It has caused me a fair bit of confusion.

  1. sample_rate will control how often the Ruby library will report the metric to statsd.
  2. for counters only, statsd will adjust the value flushed to graphite based on sample_rate. For all other types, statsd ignores the sample rate.
@reinh
Owner
reinh commented Feb 16, 2013

@alexdean We can add documentation if you think it would clear things up but the canonical source of this documentation is StatsD itself. ruby-statsd is just a dumb client.

@alexdean

Sorry I'm slow to reply. I think my confusion stemmed partially from comments like this:

The statsd server then uses the sample_rate to correctly track the average timing for the stat.

https://github.com/reinh/statsd/blob/master/lib/statsd.rb#L212

The statsd server does not do this.

For most of its history, statsd did nothing at all with the sample rate for any type except counters. Since etsy/statsd@cc12534, statsd now adjusts the count metric for a timing based on the sample rate, but no adjustments are attempted for the other timing-based metrics like sum, mean, median, etc.

It makes sense to me that the client supports a sampling rate, but it should be clearer that this is a client-side-only operation in the case of gauges and sets. The server does not do any interpolation/adjustment of the reported values based on sample rate, except in the case of counters and timings. Does that seem reasonable?

Other references:

  • etsy/statsd#138 "Make it clearer that sampling is only for counters."
  • etsy/statsd#139 "Not obvious that sampling for gauges and timers is client-side only"
  • etsy/statsd#168 "samplerate support all metrics, not just counters"

This may be more of an issue with statsd itself than with ruby-statsd, as the sampling rules seem non-obvious and easy to misunderstand. Interested what you think about that. If it's just me being thick-headed, oh well. Thanks for reading. :)

@reinh
Owner
reinh commented Mar 28, 2013

@alexdean Hey, thanks for looking into it! I agree that the current documentation needs to be fixed and could be improved as well. Would you be willing to take a stab at it?

@alexdean

@reinh Absolutely. I will give it a try and submit a PR when I've got something.

@alexdean alexdean closed this Mar 28, 2013
@reinh
Owner
reinh commented Mar 28, 2013

@alexdean ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.