many people want different things. also some backends (i.e. graphite) allow many runtime computations (and hence need fewer different metrics if one can be calculated from each other), here it makes sense to aim for keeping the carbon/whisper load lower.
other backends may be less advanced and having statsd store similar metrics becomes appropriate.
there should be a generic way to enable/disable certain metrics.
right now there's a bunch of issues/PR's where this is issue is treated for individual metrics, here are some of them:
We've been talking this over a bit and came to a few conclusions:
We have a variety of namespaces to contend with - legacy/new/suffix/statsd metrics/global prefixes, and whatever the next guy mind find useful.
A regex or wildcard match against the key name would probably be the right solution - it allows you to use the wildcard you need for your namespace setup, and still be able to do counter rates for some metrics, but not others, as an example.
Ideally, the disable would stop as much of the processing of the metric as possible to save resources...but that may not be as easily said as done.
A simple output filter at flush, just before being shipped to the backend, might be the easiest fix - a list of regexs to match against, and we check each key against the regex.
Anyone else have thoughts on how this could be universally codified?
we = ?
I think we should do the enabled/disabled check as early possible. for example in process_metrics where the rate and the count is being calculated for a counter you could just check against configuration like counter.rate and counter.count; this is before the prefixes and such are getting applied.
that makes it much less centralized to implement, though - you have a bunch of places to make the tests, instead of just one. (esp if you wanted to suppress internal stats, or some but not all counter rates..)
and every time a new metric is added to the system, you have to explicitly support the suppression system - instead of only each time a backend is implemented.
we = my team. ;)
your goal of "decentralization" is not very beneficial IMHO. it would only save a few if/else branches throughout the code; while making it impossible to prevent calculation of unneeded things.
note that for a bunch of metrics (like packets_seen) you don't need to wrap those early, the +1 calls don't need an if/else around them, they only need an if/else at the end, because the performance impact is neglible
Thanks for submitting this. I think we could have a a pull request for a generic enable/disable which would also save time in the process_metrics() part if you are still interested in having this in StatsD.