

Move stat key name sanitization to Graphite backend. #155

Merged
merged 1 commit into from

7 participants

@mheffner

Taking a stab at moving the metric name sanitization to the backend, as described originally in #110 (also refs #154).

This moves the Graphite-specific stat name regular expressions to the Graphite backend. Other backends should implement similar stat name cleanups based on the characters their storage permits.
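As a rough sketch of what "similar cleanups" could look like, a backend might describe its permitted character set as an ordered list of replace rules. The `makeKeySanitizer` helper and the `permissiveRules` set below are hypothetical and not part of statsd; only `graphiteRules` mirrors the regexes in this PR.

```javascript
// Hypothetical helper: build a sanitizer from an ordered list of
// [pattern, replacement] rules. Order matters: whitespace and '/' are
// rewritten before the final "strip everything else" rule runs.
function makeKeySanitizer(rules) {
  return function (key) {
    return rules.reduce(function (k, rule) {
      return k.replace(rule[0], rule[1]);
    }, key);
  };
}

// Rules copied from the Graphite-specific regexes in this PR.
var graphiteRules = [
  [/\s+/g, '_'],
  [/\//g, '-'],
  [/[^a-zA-Z_\-0-9\.]/g, '']
];

// Hypothetical rules for a backend that tolerates more characters:
// only whitespace is rewritten, everything else passes through.
var permissiveRules = [
  [/\s+/g, '_']
];

var graphiteSanitize = makeKeySanitizer(graphiteRules);
var permissiveSanitize = makeKeySanitizer(permissiveRules);

console.log(graphiteSanitize('api/v1 latency+p99'));   // api-v1_latencyp99
console.log(permissiveSanitize('api/v1 latency+p99')); // api/v1_latency+p99
```

Keeping the rules as data means each backend only declares what its storage accepts, while the replace loop stays the same.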

One issue that is not handled: because the original code performed these name changes at ingress, two different stat reports could map to the same name. For example, the following two packets would map to the same timing metric:

glork:320|ms
gl+ork:320|ms

With this patch, these metrics will be tracked separately, but both will be submitted to Graphite under the same name, "glork", so the second metric will likely overwrite the first. It's not clear whether users are (or should be) depending on this behavior. The backend could merge all metrics whose names map to the same key after sanitization; however, this might be tricky for metrics like gauges that track the last value (it would require a timestamp). Thoughts?
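To see the collision concretely, a backend (or a debugging script) could group the keys it is about to flush by their sanitized name. This is a minimal sketch: `findCollisions` is a hypothetical helper, and `sanitizeKeyName` repeats the regex chain from this PR.

```javascript
// Graphite sanitization as applied in this PR.
function sanitizeKeyName(key) {
  return key.replace(/\s+/g, '_')
            .replace(/\//g, '-')
            .replace(/[^a-zA-Z_\-0-9\.]/g, '');
}

// Hypothetical: group raw stat names by their sanitized form and keep
// only the groups where two or more distinct raw names collide.
function findCollisions(rawKeys) {
  var groups = Object.create(null);
  rawKeys.forEach(function (raw) {
    var clean = sanitizeKeyName(raw);
    (groups[clean] = groups[clean] || []).push(raw);
  });
  var collisions = {};
  Object.keys(groups).forEach(function (clean) {
    if (groups[clean].length > 1) {
      collisions[clean] = groups[clean];
    }
  });
  return collisions;
}

console.log(findCollisions(['glork', 'gl+ork', 'api.requests']));
// { glork: [ 'glork', 'gl+ork' ] }
```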

@mrtazz
Owner

Sorry for the delay, looks pretty good. Since the sanitization is now a distinct function, we can haz tests? :)

@mheffner

@mrtazz Yeah, I'll take a look at adding some tests to this.

@huyl

Hello, what's the status on this PR?

@mheffner

@huyl I just rebased the original patch and added some tests to verify the name regexp is running.

@draco2003 @mrtazz This is the updated pull request I spoke with you about. The metric name regexp is now applied in the graphite backend. There is still the open issue of whether anyone is depending on the fact that multiple metric names could map to the same name post-transform. If that's the case, the graphite backend would need to merge the metric buckets.
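If someone does depend on the old collapsing behavior, the Graphite backend would have to merge buckets before writing them out. For counters the merge is a simple sum, sketched below. `mergeCounters` is hypothetical, and it deliberately leaves gauges out, since keeping the "last" value would need a timestamp, as noted in the discussion.

```javascript
// Graphite sanitization as applied in this PR.
function sanitizeKeyName(key) {
  return key.replace(/\s+/g, '_')
            .replace(/\//g, '-')
            .replace(/[^a-zA-Z_\-0-9\.]/g, '');
}

// Hypothetical: collapse counters whose names sanitize to the same key.
// Summing is reasonable for counters because every bucket covers the
// same flush interval.
function mergeCounters(counters) {
  var merged = {};
  Object.keys(counters).forEach(function (raw) {
    var clean = sanitizeKeyName(raw);
    merged[clean] = (merged[clean] || 0) + counters[raw];
  });
  return merged;
}

console.log(mergeCounters({ 'glork': 3, 'gl+ork': 4, 'other': 1 }));
// { glork: 7, other: 1 }
```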

@draco2003

@mheffner this looks good. Can we add a config setting that defaults to the current behavior, and then update the backends to check for that setting and do the sanitization if it wasn't already done?
This lets us avoid breaking current functionality while allowing us to deprecate it down the road.

Thanks!

cc. @Dieterbe

@mheffner

@draco2003 Yeah, I'll take a stab at that.

@mheffner Stat key name sanitization is now configurable at the top-level.
Setting keyNameSanitize to false pushes the requirement of sanitizing
key names to the backends. This permits backends that have less strict
character set requirements to take advantage of an expanded stat name
character set. The default behavior remains unchanged, because collisions
in the key name space are not handled when two different stat names map
to the same sanitized key name.
2d8aaf0
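The split described in this commit can be modeled as two gates driven by one flag: ingress sanitizes only when `keyNameSanitize` is true, and the Graphite backend sanitizes only when it is false, so each key is cleaned exactly once. A minimal sketch; `ingressKey` and `graphiteKey` are hypothetical stand-ins for `sanitizeKeyName` in stats.js and `sk` in backends/graphite.js.

```javascript
// Regex chain shared by both gates in this PR.
function sanitize(key) {
  return key.replace(/\s+/g, '_')
            .replace(/\//g, '-')
            .replace(/[^a-zA-Z_\-0-9\.]/g, '');
}

// Mirrors sanitizeKeyName() in stats.js: clean on ingress only when the
// top-level flag is on (the default).
function ingressKey(key, keyNameSanitize) {
  return keyNameSanitize ? sanitize(key) : key;
}

// Mirrors sk() in backends/graphite.js: clean at flush time only when
// ingress did not already do it.
function graphiteKey(key, keyNameSanitize) {
  return keyNameSanitize ? key : sanitize(key);
}

[true, false].forEach(function (flag) {
  var stored = ingressKey('gl+ork', flag);
  console.log(flag, stored, graphiteKey(stored, flag));
});
// true glork glork
// false gl+ork glork
```

With the flag off, the in-memory bucket keeps the raw name `gl+ork`, which is what lets a less strict backend see it, while Graphite still receives `glork`.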
@mheffner

@draco2003 I finally pushed a new version of this that makes the top-level key name sanitization configurable -- on by default. Let me know what you think!

@ivantopo

Hello @mheffner @draco2003, is there any particular reason why this PR never moved forward?

@shaylang

Hi,
When is this PR planned to be pushed?
Thanks,
Shay

@mheffner

@ivantopo @shaylang This probably needs some cleanup before it can be merged into master again; if you're interested, it could use some assistance. I've been busy with other projects recently.

A cleaned-up PR may be more attractive given the age of this one.

@shaylang

I have created an updated pull request against current master:
#451
How can we continue?

@pathzzrd
Owner

@shaylang looking to rope other maintainers in, but we're on it now.

@ivantopo referenced this pull request in kamon-io/docker-grafana-graphite: patched statsd #8 (Closed)

@pathzzrd merged commit 2d8aaf0 into from
Commits on Sep 12, 2013
  1. @mheffner

    Stat key name sanitization is now configurable at the top-level.

    mheffner authored
28 backends/graphite.js
@@ -34,6 +34,7 @@ var prefixTimer;
var prefixGauge;
var prefixSet;
var globalSuffix;
+var globalKeySanitize = true;
// set up namespaces
var legacyNamespace = true;
@@ -98,15 +99,27 @@ var flush_stats = function graphite_flush(ts, metrics) {
var timer_data = metrics.timer_data;
var statsd_metrics = metrics.statsd_metrics;
+ // Sanitize key for graphite if not done globally
+ function sk(key) {
+ if (globalKeySanitize) {
+ return key;
+ } else {
+ return key.replace(/\s+/g, '_')
+ .replace(/\//g, '-')
+ .replace(/[^a-zA-Z_\-0-9\.]/g, '');
+ }
+ };
+
for (key in counters) {
- var namespace = counterNamespace.concat(key);
var value = counters[key];
var valuePerSecond = counter_rates[key]; // pre-calculated "per second" rate
+ var keyName = sk(key);
+ var namespace = counterNamespace.concat(keyName);
if (legacyNamespace === true) {
statString += namespace.join(".") + globalSuffix + valuePerSecond + ts_suffix;
if (flush_counts) {
- statString += 'stats_counts.' + key + globalSuffix + value + ts_suffix;
+ statString += 'stats_counts.' + keyName + globalSuffix + value + ts_suffix;
}
} else {
statString += namespace.concat('rate').join(".") + globalSuffix + valuePerSecond + ts_suffix;
@@ -119,8 +132,9 @@ var flush_stats = function graphite_flush(ts, metrics) {
}
for (key in timer_data) {
- var namespace = timerNamespace.concat(key);
+ var namespace = timerNamespace.concat(sk(key));
var the_key = namespace.join(".");
+
for (timer_data_key in timer_data[key]) {
if (typeof(timer_data[key][timer_data_key]) === 'number') {
statString += the_key + '.' + timer_data_key + globalSuffix + timer_data[key][timer_data_key] + ts_suffix;
@@ -138,13 +152,13 @@ var flush_stats = function graphite_flush(ts, metrics) {
}
for (key in gauges) {
- var namespace = gaugesNamespace.concat(key);
+ var namespace = gaugesNamespace.concat(sk(key));
statString += namespace.join(".") + globalSuffix + gauges[key] + ts_suffix;
numStats += 1;
}
for (key in sets) {
- var namespace = setsNamespace.concat(key);
+ var namespace = setsNamespace.concat(sk(key));
statString += namespace.join(".") + '.count' + globalSuffix + sets[key].values().length + ts_suffix;
numStats += 1;
}
@@ -238,6 +252,10 @@ exports.init = function graphite_init(startup_time, config, events) {
graphiteStats.flush_time = 0;
graphiteStats.flush_length = 0;
+ if (config.keyNameSanitize !== undefined) {
+ globalKeySanitize = config.keyNameSanitize;
+ }
+
flushInterval = config.flushInterval;
flush_counts = typeof(config.flush_counts) === "undefined" ? true : config.flush_counts;
3 exampleConfig.js
@@ -52,6 +52,9 @@ Optional Variables:
deleteCounters: don't send values to graphite for inactive counters, as opposed to sending 0 [default: false]
prefixStats: prefix to use for the statsd statistics data for this running instance of statsd [default: statsd]
applies to both legacy and new namespacing
+ keyNameSanitize: sanitize all stat names on ingress [default: true]
+ If disabled, it is up to the backends to sanitize keynames
+ as appropriate per their storage requirements.
console:
prettyprint: whether to prettyprint the console backend
20 stats.js
@@ -28,6 +28,7 @@ var flushInterval, keyFlushInt, server, mgmtServer;
var startup_time = Math.round(new Date().getTime() / 1000);
var backendEvents = new events.EventEmitter();
var healthStatus = config.healthStatus || 'up';
+var keyNameSanitize = true;
// Load and init the backend from the backends/ directory.
function loadBackend(config, name) {
@@ -135,6 +136,16 @@ var stats = {
}
};
+function sanitizeKeyName(key) {
+ if (keyNameSanitize) {
+ return key.replace(/\s+/g, '_')
+ .replace(/\//g, '-')
+ .replace(/[^a-zA-Z_\-0-9\.]/g, '');
+ } else {
+ return key;
+ }
+}
+
// Global for the logger
var l;
@@ -156,6 +167,10 @@ config.configFile(process.argv[2], function (config, oldConfig) {
counters[bad_lines_seen] = 0;
counters[packets_received] = 0;
+ if (config.keyNameSanitize !== undefined) {
+ keyNameSanitize = config.keyNameSanitize;
+ }
+
if (server === undefined) {
// key counting
@@ -180,10 +195,7 @@ config.configFile(process.argv[2], function (config, oldConfig) {
l.log(metrics[midx].toString());
}
var bits = metrics[midx].toString().split(':');
- var key = bits.shift()
- .replace(/\s+/g, '_')
- .replace(/\//g, '-')
- .replace(/[^a-zA-Z_\-0-9\.]/g, '');
+ var key = sanitizeKeyName(bits.shift());
if (keyFlushInterval > 0) {
if (! keyCounter[key]) {
19 test/graphite_tests.js
@@ -358,5 +358,24 @@ module.exports = {
});
});
});
+ },
+
+ metric_names_are_sanitized: function(test) {
+ var me = this;
+ this.acceptor.once('connection', function(c) {
+ statsd_send('fo/o:250|c',me.sock,'127.0.0.1',8125,function(){
+ statsd_send('b ar:250|c',me.sock,'127.0.0.1',8125,function(){
+ statsd_send('foo+bar:250|c',me.sock,'127.0.0.1',8125,function(){
+ collect_for(me.acceptor, me.myflush, function(strings){
+ var str = strings.join();
+ test.ok(str.indexOf('fo-o') !== -1, "Did not map 'fo/o' => 'fo-o'");
+ test.ok(str.indexOf('b_ar') !== -1, "Did not map 'b ar' => 'b_ar'");
+ test.ok(str.indexOf('foobar') !== -1, "Did not map 'foo+bar' => 'foobar'");
+ test.done();
+ });
+ });
+ });
+ });
+ });
}
}