Add counter and sum of squares to timer stats #376

merged 3 commits into from Apr 15, 2014



mheffner commented Dec 11, 2013

This adds the count of members in each percentile and the sum of squares to the precalculated timer statistics. These are both used to do rapid calculation of standard deviation/variance over an arbitrary number of consecutive intervals: http://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods.

This is a superset of the patch in #342

These pre-calculated values can be leveraged by the Librato backend to reduce duplicated timer processing.
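The rapid calculation referred to above can be sketched roughly as follows. This is only an illustration in Python, not the code in this pull request: the names `IntervalStats`, `summarize`, `merge`, and `population_stddev` are hypothetical, and it assumes the population (not sample) standard deviation is wanted. The point is that count, sum, and sum of squares are all additive, so statistics for any run of consecutive intervals can be derived without revisiting the raw timer values.

```python
import math
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Per-interval aggregates (hypothetical container, not a statsd type)."""
    count: int
    total: float
    sum_squares: float


def summarize(values):
    """Reduce one interval's raw timer values to its three aggregates."""
    return IntervalStats(
        count=len(values),
        total=float(sum(values)),
        sum_squares=float(sum(v * v for v in values)),
    )


def merge(a, b):
    """Combine two intervals; every field is a plain sum."""
    return IntervalStats(
        count=a.count + b.count,
        total=a.total + b.total,
        sum_squares=a.sum_squares + b.sum_squares,
    )


def population_stddev(stats):
    """Standard deviation from aggregates: sqrt(E[x^2] - (E[x])^2)."""
    if stats.count == 0:
        return 0.0
    mean = stats.total / stats.count
    variance = stats.sum_squares / stats.count - mean * mean
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Two consecutive flush intervals, merged without keeping the raw values.
first = summarize([2, 4, 4, 4])
second = summarize([5, 5, 7, 9])
combined = merge(first, second)
print(population_stddev(combined))
```

One caveat with this formula: when the mean is large relative to the spread, subtracting two nearly equal numbers can lose precision, which is why the sketch clamps the variance at zero.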

mheffner added some commits Sep 12, 2013

Add "sum of squares" to the pre-calculated timer metrics.
Sum of squares (along with count and sum) can be used to do rapid
calculation of standard deviation/variance over an arbitrary number
of consecutive intervals [1].

[1]: http://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods

mheffner commented Dec 11, 2013

/cc @mrtazz


mheffner commented Jan 25, 2014

Any update on this? Can this be merged? /cc @mrtazz @draco2003

I've removed ETSY from our consideration list due to the failure to get this pull merged in a timely manner. Sum of squares calculation is extremely memory intensive, except as implemented here. Its utility in parametric and non-parametric work is important. For your edification, the Chief Scientist of Twitter implemented the same at AOL and it was widely used, especially as its memory footprint and compute cost were minimal. My guess is that the Etsy team doesn't understand this issue.


jgoulah commented Mar 6, 2014

It's not that we don't understand this issue, and apologies for not meeting your self-imposed timelines on free open source software that you don't pay for. It is more that we have a lot going on here on our end, and adding additional code can slow statsd down, so it has to be carefully evaluated. It's great that this is a lean implementation, but since we use this same code in house and are currently fighting scale issues, we've had to defer. Be assured we will get to this.


mrtazz commented Apr 14, 2014

hey @mheffner sorry it took so long. Have you done any benchmarking of how much this increases stats calculation time when there are a lot of timers? Timers already have the biggest impact on CPU and I wonder if this warrants a feature flag that is turned off by default. Any thoughts on this?


mheffner commented Apr 14, 2014

@mrtazz No, unfortunately I do not have performance numbers to compare with and without this change. In theory it's adding one additional step per loop of the timer calculation.

In the case that a backend (e.g. Librato) requires either of these values, the net change is a huge performance improvement as it removes the need to reprocess all timers in the backend. I feel that there are two solutions: either a) make the general processing here calculate all appropriate statistical functions of the timers, or b) don't do any processing here and make it part of the backend's job to process the metrics as they require. Obviously option b) sucks if you have multiple backends.
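To illustrate the "one additional step per loop" point, here is a minimal Python sketch of a single pass over one timer's values. It is not the statsd implementation; the function name `timer_stats` and the result keys are hypothetical. It only shows that accumulating the sum of squares adds one multiply-and-add to a loop that already walks every value.

```python
def timer_stats(values):
    """Accumulate count, sum, and sum of squares in one pass.

    Hypothetical helper for illustration; names and keys are assumptions.
    """
    count = 0
    total = 0.0
    sum_squares = 0.0
    for v in values:
        count += 1
        total += v
        sum_squares += v * v  # the one extra step per iteration
    return {"count": count, "sum": total, "sum_squares": sum_squares}


stats = timer_stats([1, 2, 3])
print(stats)
```

A backend that receives these three values can derive variance or standard deviation itself, so it does not need to iterate over the raw timers a second time.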


mrtazz commented Apr 15, 2014

Oh you're right, I misread the diff. I think that's actually ok then. Shouldn't make much of an impact. Thanks for the contribution!

mrtazz added a commit that referenced this pull request Apr 15, 2014

Merge pull request #376 from mheffner/feature/more_counter_stats
Add counter and sum of squares to timer stats

@mrtazz mrtazz merged commit 61c9607 into etsy:master Apr 15, 2014

1 check passed

default The Travis CI build passed

@mheffner mheffner referenced this pull request in librato/statsd-librato-backend Sep 3, 2014


Switch to cumulative sum calculation #3
