
Move counter and timer calculations out of graphite backend and into separate module #167

Merged
merged 17 commits into etsy:master on Nov 11, 2012

2 participants

@draco2003

An initial attempt at solving Issue #103

This replicates the graphite timer and counter calculations, but separates them out into a module.

It adds counter_rates and timer_data to the metrics_hash before it is passed to the backends.

Merging the results into the metrics_hash keeps the function signature and events the same, while keeping the processed values in their own data sets preserves backwards compatibility with an easy migration path.

The backends are still required to iterate over the different types of metrics in order to format them as needed per backend, but they no longer need to do any heavy lifting as far as calculations go (unless they want to).
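
For illustration, this is roughly the shape of the metrics_hash a backend would receive after processing. The counter_rates and timer_data field names come from this pull request; the concrete keys and values below are made up for the example (a flush interval of 10000 ms is assumed).

var metrics_hash = {
  counters:      { 'my.counter': 100 },       // raw counts, unchanged
  counter_rates: { 'my.counter': 10 },        // per-second rates: count / (flushInterval / 1000)
  timers:        { 'my.timer': [1, 2, 3] },   // raw samples, unchanged
  timer_data:    { 'my.timer': { std: 0.816, upper: 3, lower: 1, count: 3, sum: 6, mean: 2 } },
  gauges:        {},
  sets:          {}
};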

Also added an initial set of tests to run against the processedmetrics module to help get better test coverage for the calculations.
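
As a rough sketch, a test of that kind could look like the following, using Node's built-in assert rather than the project's actual test harness; the module path, export name, and metrics layout are assumptions based on the initial diff in this pull request (the API changes later in the review).

var assert = require('assert');
var pm = require('./lib/processedmetrics');

// 100 increments over a 10 second flush interval should yield a rate of 10 per second
var metrics = { counters: { 'my.counter': 100 }, timers: {} };
var processed = pm.ProcessedMetrics(metrics, 10000);
assert.equal(processed.counter_rates['my.counter'], 10);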

@mrtazz mrtazz commented on an outdated diff Oct 13, 2012
backends/graphite.js
for (key in counters) {
- var value = counters[key];
- var valuePerSecond = value / (flushInterval / 1000); // calculate "per second" rate
-
- statString += 'stats.' + key + ' ' + valuePerSecond + ' ' + ts + "\n";
- statString += 'stats_counts.' + key + ' ' + value + ' ' + ts + "\n";
+ statString += 'stats.' + key + ' ' + counter_rates[key] + ' ' + ts + "\n";
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 13, 2012

minor pet peeve, please keep the alignment of the string concatenation.

@mrtazz mrtazz and 1 other commented on an outdated diff Oct 13, 2012
backends/graphite.js
for (key in counters) {
- var value = counters[key];
- var valuePerSecond = value / (flushInterval / 1000); // calculate "per second" rate
-
- statString += 'stats.' + key + ' ' + valuePerSecond + ' ' + ts + "\n";
- statString += 'stats_counts.' + key + ' ' + value + ' ' + ts + "\n";
+ statString += 'stats.' + key + ' ' + counter_rates[key] + ' ' + ts + "\n";
+ statString += 'stats_counts.' + key + ' ' + counters[key] + ' ' + ts + "\n";
numStats += 1;
}
for (key in timers) {
if (timers[key].length > 0) {
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 13, 2012

shouldn't this be for (key in timer_data) and if (timer_data[key].length > 0)?

@draco2003
draco2003 added a note Oct 13, 2012
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 13, 2012

in this case it might be the same. But in general if you're acquiring keys from a different array than you're using them on, you're gonna have a bad time. And if we ever decide to drop timers from calculations or anything like that, this will bite us.

@draco2003
draco2003 added a note Oct 13, 2012

I was thinking more along the lines of things like the librato library, where they might want to do non-standard calculations in a single timer loop (similar to the counters loop above, where it uses the same key but different data sources). Though I can change it to timer_data easily enough.
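
A minimal sketch of the variant mrtazz is suggesting, keying the graphite loop off the processed data so it can never drift out of sync with the raw timers (the output format here is assumed to match the existing graphite backend):

for (key in timer_data) {
  for (timer_data_key in timer_data[key]) {
    statString += 'stats.timers.' + key + '.' + timer_data_key + ' ' +
                  timer_data[key][timer_data_key] + ' ' + ts + "\n";
  }
  numStats += 1;
}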

@mrtazz mrtazz and 1 other commented on an outdated diff Oct 13, 2012
lib/processedmetrics.js
@@ -0,0 +1,88 @@
+var ProcessedMetrics = function (metrics, flushInterval) {
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 13, 2012

I think process_metrics would be a better name here.

@draco2003
draco2003 added a note Oct 13, 2012

easy enough. Would you want the filename to follow the same naming?

@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 13, 2012

yeah I think that makes sense.

@mrtazz mrtazz commented on an outdated diff Oct 13, 2012
@@ -69,6 +71,8 @@ function flushMetrics() {
}
});
+ metrics_hash = pm.ProcessedMetrics(metrics_hash, flushInterval)
+
// Flush metrics to each backend.
backendEvents.emit('flush', time_stamp, metrics_hash);
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 13, 2012

This shouldn't be blocking. The metrics processing function should take a callback, which then emits to the backend.
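
A rough sketch of the non-blocking shape being asked for, assuming a callback that emits to the backends once processing has finished (the exact callback signature gets refined further down in this review):

pm.process_metrics(metrics_hash, flushInterval, time_stamp, function emitFlush() {
  // Flush metrics to each backend once processing is done.
  backendEvents.emit('flush', time_stamp, metrics_hash);
});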

@mrtazz
Etsy, Inc. member
mrtazz commented Oct 13, 2012

I took a quick glance and commented on what came to mind. Gonna do a test run of it as a whole, once the changes are in.

@draco2003

Did the quick changes. Will post the non-blocking metric processing call next.

@mrtazz mrtazz commented on an outdated diff Oct 17, 2012
lib/process_metrics.js
+ current_timer_data["std"] = stddev;
+ current_timer_data["upper"] = max;
+ current_timer_data["lower"] = min;
+ current_timer_data["count"] = count;
+ current_timer_data["sum"] = sum;
+ current_timer_data["mean"] = mean;
+
+ timer_data[key] = current_timer_data;
+
+ }
+ }
+
+ //add processed metrics to the metrics_hash
+ metrics.counter_rates = counter_rates;
+ metrics.timer_data = timer_data;
+ flushCallback();
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 17, 2012

We should pass the metrics object to the callback as a parameter and not rely on a global ref.
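
In other words, something along these lines at the end of process_metrics, with the callback receiving the processed hash rather than reaching back out to a shared reference (a sketch only; the error argument is added in the next round of review):

// add processed metrics to the metrics_hash and hand it to the callback
metrics.counter_rates = counter_rates;
metrics.timer_data = timer_data;
flushCallback(metrics);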

@mrtazz mrtazz commented on an outdated diff Oct 17, 2012
@@ -69,8 +71,11 @@ function flushMetrics() {
}
});
- // Flush metrics to each backend.
- backendEvents.emit('flush', time_stamp, metrics_hash);
+ pm.process_metrics(metrics_hash, flushInterval, time_stamp, function emitFlush() {
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 17, 2012

the callback should take the metrics object as a parameter, since we really don't want to access the global metrics object here. Also it might make sense to also pass in an exitcode/status parameter so that the callback can act on any errors that happen during metrics processing. And at least log it.

@draco2003

Based on the fact that we always have 2 counter_rates keys, this should never be true. We could change the counter_rates check to less than 3 to log when we aren't getting any non-statsd metrics to calculate.

Let me know if there are any other checks/catches you had in mind for triggering an error.

The calculations are bounds-checked fairly well. The other place I could think of was type checking the parameters coming into the function, if we think that makes sense.

@mrtazz mrtazz commented on an outdated diff Oct 20, 2012
lib/logger.js
}
- this.util.log(type + msg);
+ this.util.log(type + ": " + msg);
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 20, 2012

Why is this added here? It seems unrelated to the pull request.

@mrtazz mrtazz and 1 other commented on an outdated diff Oct 20, 2012
@@ -69,8 +72,14 @@ function flushMetrics() {
}
});
- // Flush metrics to each backend.
- backendEvents.emit('flush', time_stamp, metrics_hash);
+ pm.process_metrics(metrics_hash, flushInterval, time_stamp, function emitFlush(err, metrics) {
+ // Flush metrics to each backend.
+ if (err) {
+ l.log("Errored processing metrics with: " + err, 'debug');
+ }
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 20, 2012

Thinking about whether we want to wrap the backends emit in an else clause or let the backend somehow know that the metrics processing failed. Just passing the potentially erroneous metrics to the backend is probably not the right thing to do.

@draco2003
draco2003 added a note Oct 20, 2012
@draco2003

Reverted the log tweak commit. I'll submit that as a separate pull request.

Changed the error handling to be a fatal error.
If processing metrics errors, something else is most likely wrong and should be identified quickly.
I have it emit the error to all backends so they can subscribe to it and clean up before the application exits.

If we did an else, the flush wouldn't be triggered and the metrics wouldn't be cleaned up, most likely causing the same error after the next interval. It would also throw off any of the flushInterval-based calculations, since it could be two or more intervals before processing succeeds.

If we called clear_metrics and cleaned it up, we'd probably have gaps in our data and most likely hit the same error on the next flush interval.

@mrtazz mrtazz commented on an outdated diff Oct 27, 2012
@@ -69,8 +72,21 @@ function flushMetrics() {
}
});
- // Flush metrics to each backend.
- backendEvents.emit('flush', time_stamp, metrics_hash);
+ pm.process_metrics(metrics_hash, flushInterval, time_stamp, function emitFlush(err, metrics) {
+ // Flush metrics to each backend only if the metrics processing was sucessful.
+ // Add processing_errors counter to allow for monitoring
+ if (err) {
+ l.log("Exiting due to error processing metrics with: " + err);
+ // Send metrics to backends for any last minute processing
+ // and give backends a chance to cleanup before exiting.
+ backendEvents.emit('error', time_stamp, metrics, err);
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 27, 2012

After thinking about this some more I think the best way to handle this is to just call backendEvents.emit("error", time_stamp, metrics, err). This will work with the existing backends for now, but provides a possibility to check for the error and handle it accordingly.

@mrtazz mrtazz commented on an outdated diff Oct 27, 2012
@@ -69,8 +72,21 @@ function flushMetrics() {
}
});
- // Flush metrics to each backend.
- backendEvents.emit('flush', time_stamp, metrics_hash);
+ pm.process_metrics(metrics_hash, flushInterval, time_stamp, function emitFlush(err, metrics) {
+ // Flush metrics to each backend only if the metrics processing was sucessful.
+ // Add processing_errors counter to allow for monitoring
+ if (err) {
+ l.log("Exiting due to error processing metrics with: " + err);
+ // Send metrics to backends for any last minute processing
+ // and give backends a chance to cleanup before exiting.
+ backendEvents.emit('error', time_stamp, metrics, err);
+ // Only needed if other backends override the standard stacktrace/exit functionality
+ process.exit(1);
+ } else {
+ backendEvents.emit('flush', time_stamp, metrics);
@mrtazz
Etsy, Inc. member
mrtazz added a note Oct 27, 2012

I don't think we want to kill the whole daemon just because a calculation went wrong. It could be for metrics you don't even care about, for example. I also don't think we need an extra event just for errors. The flush case in the backend should be able to handle this just fine. But I think we should update a counter like statsd.calculation_error to actually record the failure.
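
A rough sketch of that alternative: record the failure on an internal counter and still emit the flush instead of exiting the daemon. The statsd.calculation_error name comes from the comment above; where exactly it gets incremented is an assumption.

pm.process_metrics(metrics_hash, flushInterval, time_stamp, function emitFlush(err, metrics) {
  if (err) {
    l.log("Error processing metrics: " + err);
    // record the failure so it can be monitored, instead of killing the daemon
    counters["statsd.calculation_error"] = (counters["statsd.calculation_error"] || 0) + 1;
  }
  backendEvents.emit('flush', time_stamp, metrics);
});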

@mrtazz
Etsy, Inc. member
mrtazz commented Nov 3, 2012

I tested the changes locally. Can you update the README with the new additional metrics that get passed through the backend API? Then I'll merge it into master.

@mrtazz mrtazz merged commit c9e09f8 into etsy:master Nov 11, 2012

1 check passed: The Travis build passed.