Add counter and sum of squares to timer stats #376

Merged
merged 3 commits into etsy:master

4 participants

@mheffner

This adds the count of members in each percentile and the sum of squares to the precalculated timer statistics. These are both used to do rapid calculation of standard deviation/variance over an arbitrary number of consecutive intervals: http://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods.

This is a superset of the patch in #342

These pre-calculated values can be leveraged by the Librato backend to reduce duplicated timer processing.
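For illustration, here is a minimal sketch (not code from this PR) of the rapid-calculation method the description links to: given only the count, sum, and sum of squares per flush interval, a backend can recover the standard deviation over any run of consecutive intervals without ever seeing the raw timer values. The field names below assume the { count, sum, sum_squares } fields this patch adds to timer_data.

// Sketch of the rapid-calculation method; assumes each interval carries
// the { count, sum, sum_squares } fields this patch adds to timer_data.
function stddevOverIntervals(intervals) {
  var n = 0, sum = 0, sumSquares = 0;
  intervals.forEach(function (iv) {
    n += iv.count;
    sum += iv.sum;
    sumSquares += iv.sum_squares;
  });
  var mean = sum / n;
  // Population variance: E[x^2] - (E[x])^2
  return Math.sqrt(sumSquares / n - mean * mean);
}

// Two flush intervals, with values [100, 200] and [300]:
stddevOverIntervals([
  { count: 2, sum: 300, sum_squares: 100 * 100 + 200 * 200 },
  { count: 1, sum: 300, sum_squares: 300 * 300 }
]); // ~81.65, same as the stddev of [100, 200, 300] computed directly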

mheffner added some commits
@mheffner

Any update on this? Can this be merged? /cc @mrtazz @draco2003

@wAtNeustar

I've removed Etsy from our consideration list due to the failure to get this pull request merged in a timely manner. Sum-of-squares calculation is normally extremely memory intensive, except as implemented here. Its utility in parametric and non-parametric work is important. For your edification, the Chief Scientist of Twitter implemented the same thing at AOL, and it was widely used, especially as its memory footprint and compute cost were minimal. My guess is that the Etsy team doesn't understand this issue.

@jgoulah
Owner

It's not that we don't understand this issue, and apologies for not meeting your self-imposed timelines on free open source software that you don't pay for. It's more that we have a lot going on here on our end, and adding additional code can slow statsd down, so it has to be carefully evaluated. It's great that this is a lean implementation, but since we use this same code in house and are currently fighting scale issues, we've had to defer. Be assured we will get to this.

@mrtazz
Owner

Hey @mheffner, sorry it took so long. Have you done any benchmarking of how much this increases stats calculation time when there are a lot of timers? Timers already have the biggest impact on CPU, and I wonder if this warrants a feature flag so it's turned off by default. Any thoughts on this?

@mheffner

@mrtazz No, unfortunately I don't have performance numbers comparing runs with and without this change. In theory it adds just one additional step per loop of the timer calculation.

In the case that a backend (e.g., Librato) requires either of these values, the net change is a huge performance improvement, since it removes the need to reprocess all timers in the backend. I see two options: either (a) make the general processing here calculate all appropriate statistical functions of the timers, or (b) do no processing here and make it the backend's job to process the metrics as it requires. Obviously option (b) sucks if you have multiple backends.

@mrtazz
Owner

Oh you're right, I misread the diff. I think that's actually ok then. Shouldn't make much of an impact. Thanks for the contribution!

@mrtazz mrtazz merged commit 61c9607 into etsy:master
@mheffner mheffner referenced this pull request in librato/statsd-librato-backend
Open

Switch to cumulative sum calculation #3

Commits on Dec 10, 2013
  1. @mheffner
  2. @mheffner
Commits on Dec 11, 2013
  1. @mheffner

    Add "sum of squares" to the pre-calculated timer metrics.

    mheffner authored
    Sum of squares (along with count and sum) can be used to do rapid
    calculation of standard deviation/variance over an arbitrary number
    of consecutive intervals [1].
    
    [1]: http://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods
Showing with 38 additions and 7 deletions.
  1. +14 −1 lib/process_metrics.js
  2. +24 −6 test/process_metrics_tests.js
15 lib/process_metrics.js
@@ -30,11 +30,15 @@ var process_metrics = function (metrics, flushInterval, ts, flushCallback) {
var max = values[count - 1];
var cumulativeValues = [min];
+ var cumulSumSquaresValues = [min * min];
for (var i = 1; i < count; i++) {
cumulativeValues.push(values[i] + cumulativeValues[i-1]);
+ cumulSumSquaresValues.push((values[i] * values[i]) +
+ cumulSumSquaresValues[i - 1]);
}
var sum = min;
+ var sumSquares = min * min;
var mean = min;
var thresholdBoundary = max;
@@ -42,8 +46,10 @@ var process_metrics = function (metrics, flushInterval, ts, flushCallback) {
for (key2 in pctThreshold) {
var pct = pctThreshold[key2];
+ var numInThreshold = count;
+
if (count > 1) {
- var numInThreshold = Math.round(Math.abs(pct) / 100 * count);
+ numInThreshold = Math.round(Math.abs(pct) / 100 * count);
if (numInThreshold === 0) {
continue;
}
@@ -51,22 +57,28 @@ var process_metrics = function (metrics, flushInterval, ts, flushCallback) {
if (pct > 0) {
thresholdBoundary = values[numInThreshold - 1];
sum = cumulativeValues[numInThreshold - 1];
+ sumSquares = cumulSumSquaresValues[numInThreshold - 1];
} else {
thresholdBoundary = values[count - numInThreshold];
sum = cumulativeValues[count - 1] - cumulativeValues[count - numInThreshold - 1];
+ sumSquares = cumulSumSquaresValues[count - 1] -
+ cumulSumSquaresValues[count - numInThreshold - 1];
}
mean = sum / numInThreshold;
}
var clean_pct = '' + pct;
clean_pct = clean_pct.replace('.', '_').replace('-', 'top');
+ current_timer_data["count_" + clean_pct] = numInThreshold;
current_timer_data["mean_" + clean_pct] = mean;
current_timer_data[(pct > 0 ? "upper_" : "lower_") + clean_pct] = thresholdBoundary;
current_timer_data["sum_" + clean_pct] = sum;
+ current_timer_data["sum_squares_" + clean_pct] = sumSquares;
}
sum = cumulativeValues[count-1];
+ sumSquares = cumulSumSquaresValues[count-1];
mean = sum / count;
var sumOfDiffs = 0;
@@ -84,6 +96,7 @@ var process_metrics = function (metrics, flushInterval, ts, flushCallback) {
current_timer_data["count"] = timer_counters[key];
current_timer_data["count_ps"] = timer_counters[key] / (flushInterval / 1000);
current_timer_data["sum"] = sum;
+ current_timer_data["sum_squares"] = sumSquares;
current_timer_data["mean"] = mean;
current_timer_data["median"] = median;
30 test/process_metrics_tests.js
@@ -46,7 +46,7 @@ module.exports = {
test.done();
},
timers_single_time: function(test) {
- test.expect(8);
+ test.expect(9);
this.metrics.timers['a'] = [100];
this.metrics.timer_counters['a'] = 1;
pm.process_metrics(this.metrics, 100, this.time_stamp, function(){});
@@ -57,12 +57,13 @@ module.exports = {
test.equal(1, timer_data.count);
test.equal(10, timer_data.count_ps);
test.equal(100, timer_data.sum);
+ test.equal(100 * 100, timer_data.sum_squares);
test.equal(100, timer_data.mean);
test.equal(100, timer_data.median);
test.done();
},
timers_multiple_times: function(test) {
- test.expect(8);
+ test.expect(9);
this.metrics.timers['a'] = [100, 200, 300];
this.metrics.timer_counters['a'] = 3;
pm.process_metrics(this.metrics, 100, this.time_stamp, function(){});
@@ -73,12 +74,14 @@ module.exports = {
test.equal(3, timer_data.count);
test.equal(30, timer_data.count_ps);
test.equal(600, timer_data.sum);
+ test.equal(100 * 100 + 200 * 200 + 300 * 300,
+ timer_data.sum_squares);
test.equal(200, timer_data.mean);
test.equal(200, timer_data.median);
test.done();
},
timers_single_time_single_percentile: function(test) {
- test.expect(3);
+ test.expect(4);
this.metrics.timers['a'] = [100];
this.metrics.timer_counters['a'] = 1;
this.metrics.pctThreshold = [90];
@@ -87,48 +90,63 @@ module.exports = {
test.equal(100, timer_data.mean_90);
test.equal(100, timer_data.upper_90);
test.equal(100, timer_data.sum_90);
+ test.equal(100 * 100, timer_data.sum_squares_90);
test.done();
},
timers_single_time_multiple_percentiles: function(test) {
- test.expect(6);
+ test.expect(9);
this.metrics.timers['a'] = [100];
this.metrics.timer_counters['a'] = 1;
this.metrics.pctThreshold = [90, 80];
pm.process_metrics(this.metrics, 100, this.time_stamp, function(){});
timer_data = this.metrics.timer_data['a'];
+ test.equal(1, timer_data.count_90);
test.equal(100, timer_data.mean_90);
test.equal(100, timer_data.upper_90);
test.equal(100, timer_data.sum_90);
+ test.equal(100 * 100, timer_data.sum_squares_90);
test.equal(100, timer_data.mean_80);
test.equal(100, timer_data.upper_80);
test.equal(100, timer_data.sum_80);
+ test.equal(100 * 100, timer_data.sum_squares_80);
test.done();
},
timers_multiple_times_single_percentiles: function(test) {
- test.expect(3);
+ test.expect(5);
this.metrics.timers['a'] = [100, 200, 300];
this.metrics.timer_counters['a'] = 3;
this.metrics.pctThreshold = [90];
pm.process_metrics(this.metrics, 100, this.time_stamp, function(){});
timer_data = this.metrics.timer_data['a'];
+ test.equal(3, timer_data.count_90);
test.equal(200, timer_data.mean_90);
test.equal(300, timer_data.upper_90);
test.equal(600, timer_data.sum_90);
+ test.equal(100 * 100 + 200 * 200 + 300 * 300,
+ timer_data.sum_squares_90);
test.done();
},
timers_multiple_times_multiple_percentiles: function(test) {
- test.expect(6);
+ test.expect(11);
this.metrics.timers['a'] = [100, 200, 300];
this.metrics.timer_counters['a'] = 3;
this.metrics.pctThreshold = [90, 80];
pm.process_metrics(this.metrics, 100, this.time_stamp, function(){});
timer_data = this.metrics.timer_data['a'];
+ test.equal(3, timer_data.count);
+ test.equal(3, timer_data.count_90);
test.equal(200, timer_data.mean_90);
test.equal(300, timer_data.upper_90);
test.equal(600, timer_data.sum_90);
+ test.equal(100 * 100 + 200 * 200 + 300 * 300,
+ timer_data.sum_squares_90);
+
+ test.equal(2, timer_data.count_80);
test.equal(150, timer_data.mean_80);
test.equal(200, timer_data.upper_80);
test.equal(300, timer_data.sum_80);
+ test.equal(100 * 100 + 200 * 200,
+ timer_data.sum_squares_80);
test.done();
},
timers_sampled_times: function(test) {