Add Graphite backend configuration flag to connect only when there are metrics to send #345

Open
wants to merge 3 commits into
from

Projects

None yet

5 participants

@urlgrey
urlgrey commented Sep 26, 2013

The Statsd Graphite backend in the master branch will connect to a Graphite carbon-relay or carbon-cache server even when there are no metrics to report. This can be useful for indicating that the Statsd process is alive and well, and capable of sending metrics if/when they occur; however, the frequency & volume of connections to Graphite could become a problem in some circumstances.

In my case, I have a large number of Amazon EC2 instances that can perform various tasks. Statsd metrics are generated for just one type of task. Using the master branch of Statsd, all EC2 instances connect to the Graphite server at 1 second intervals, usually reporting no metrics. I would prefer to have just the instances handling the task that produces metrics connect to Graphite and report stats. My applications produces several Statsd gauge values while the task is running.

I have added a 'quiet' boolean flag to the Statsd Graphite configuration. It has a default value of 'false', which will cause the Graphite backend to perform identically to the current master branch (always connect to Graphite). If the 'quiet' flag is set to 'true', then a connection will be made to Graphite only when there are metrics to report.

Here's an example Statsd config.js file showing its usage:

{
  graphitePort: "2013"
, graphiteHost: "x.x.x.x"
, flushInterval: "1000"
, port: 8125
, backends: [ "./backends/graphite" ]
, graphite: { quiet: true }
, deleteIdleStats: true
}
@urlgrey urlgrey Adding 'quiet' boolean flag to Graphite backend connector to avoid co…
…nnecting to Graphite when there are not metrics to report.
510a3a1
@mrtazz
Member
mrtazz commented Oct 1, 2013

Interesting patch, do you see any downsides of this just being the normal behaviour? I don't think connecting without any metrics is useful in any situation. So if you can't think of any problems that might arise, I'd say let's make this the default (and only) way of connecting.

@urlgrey
urlgrey commented Oct 1, 2013

It might be useful to have Statsd connect to Graphite at the configured flush interval to indicate that it's alive & well. This would help distinguish between an absence of application metrics because there are no metrics to report versus no metrics because the Statsd service is down. In my case, the application metrics are not considered mission critical. It's preferable to have Statsd stay quiet when no application metrics need to be pushed to Graphite.

I like the idea of making 'quiet=true' the default, and think that it's worth allowing that behavior to be controlled through the quiet configuration option.

@draco2003

Great idea/patch, thanks!

Lunch break braindump

Is it accurate that this functionality requires that deleteIdleStats is
enabled, otherwise those objects will always have a length after the first
set of metrics are sent i believe?
Might want to add that to the docs if that is the case.

Also Object.keys() is O(n) so we should make sure that we aren't making all
of those checks if quiet is disabled, or deleteIdleStats is disabled to
prevent performance issues.

Taking a step back, would it make sense to just skip the flush function all

together if packets received is 0 instead of all of the object.keys checks?

Will try to look at more of this on the train ride home this evening.

On Tue, Oct 1, 2013 at 2:55 PM, Scott Kidder notifications@github.comwrote:

It might be useful to have Statsd connect to Graphite at the configured
flush interval to indicate that it's alive & well. This would help
distinguish between an absence of application metrics because there are no
metrics to report versus no metrics because the Statsd service is down. In
my case, the application metrics are not considered mission critical. It's
preferable to have Statsd stay quiet when no application metrics need to be
pushed to Graphite.

I like the idea of making 'quiet=true' the default, and think that it's
worth allowing that behavior to be controlled through the quiet
configuration option.


Reply to this email directly or view it on GitHubhttps://github.com/etsy/statsd/pull/345#issuecomment-25477560
.

urlgrey added some commits Oct 10, 2013
@urlgrey urlgrey Applied feedback from pull-request. The quiet option is effective onl…
…y when the deleteIdleStats option is enabled. The check for empty stats associative arrays has been made more efficient.
6506508
@urlgrey urlgrey Updated unit test to include deleteIdleStats value of true 9468677
@urlgrey
urlgrey commented Oct 10, 2013

Thanks for the excellent feedback! I've updated the pull-request to perform the 'quiet' operation only when the 'deleteIdleStats' option is enabled. The documentation in the example configuration file has bee updated accordingly. I also modified the check for empty stats associative arrays to be more efficient.

@draco2003

Thanks for the PR and for updating the PR!

Just a heads up. I'm looking into how we can have this PR and #364 coexist.
I think both options valid use cases and would be good to include.

Thinking we move to a graphite connection_type config option so we can set quiet , standard(?) or persistent, and then handle the cases accordingly. Debating about including the current case as the "standard" option to support backwards compatibility.

Other option would be to try and do some combination of both automagically. Persisting the connection while regularly flushing metrics, but timeout and don't reconnect if we have a flushInterval that doesn't have metrics to send. Need to think this option through a little more, but like the idea of less config options for everyone :)

@draco2003

Just added a PR for Persistent connections for high traffic statsd servers.
If that gets the 👍 i think that we'll just need to wrap the
this.end() with an if quiet check in your PR so we only call end when in quiet mode, otherwise we reuse the connection, then merge this in as well.

https://github.com/urlgrey/statsd/blob/9468677584f8b45ea863a257f6eca7f3df32c13e/backends/graphite.js#L74

@urlgrey
urlgrey commented Dec 13, 2013

That sounds great, I'm looking forward to seeing both features be a part of the Statsd project!

@urlgrey
urlgrey commented Dec 13, 2013

On second thought, I'm not sure that we should close the connection when the 'quiet' flag is true. In my case I'd like to keep the connection open, if possible, in the interest of efficiency. Does the persistent-connection pull-request handle the case where the connection is left open & idle for an extended period of time (i.e. > 5 minutes)?

@draco2003

It doesn't timeout an idle connection currently. If the connection has
closed it reconnects persistently the next flush interval.

We could modify your PR to close the connection to graphite if it doesn't
have any metrics to send on the next flush interval.

On Fri, Dec 13, 2013 at 4:21 PM, Scott Kidder notifications@github.comwrote:

On second thought, I'm not sure that we should close the connection when
the 'quiet' flag is true. In my case I'd like to keep the connection open,
if possible, in the interest of efficiency. Does the persistent-connection
pull-request handle the case where the connection is left open & idle for
an extended period of time (i.e. > 5 minutes)?


Reply to this email directly or view it on GitHubhttps://github.com/etsy/statsd/pull/345#issuecomment-30544536
.

@skidder
skidder commented Dec 13, 2013

Awesome, that'd do it.

On Fri, Dec 13, 2013 at 1:34 PM, Dan Rowe notifications@github.com wrote:

It doesn't timeout an idle connection currently. If the connection has
closed it reconnects persistently the next flush interval.

We could modify your PR to close the connection to graphite if it doesn't
have any metrics to send on the next flush interval.

On Fri, Dec 13, 2013 at 4:21 PM, Scott Kidder notifications@github.comwrote:

On second thought, I'm not sure that we should close the connection when
the 'quiet' flag is true. In my case I'd like to keep the connection
open,
if possible, in the interest of efficiency. Does the
persistent-connection
pull-request handle the case where the connection is left open & idle
for
an extended period of time (i.e. > 5 minutes)?


Reply to this email directly or view it on GitHub<
https://github.com/etsy/statsd/pull/345#issuecomment-30544536>
.


Reply to this email directly or view it on GitHubhttps://github.com/etsy/statsd/pull/345#issuecomment-30545400
.

@mrtazz
Member
mrtazz commented Apr 14, 2014

Hey is this synced with the changes in #364 ? Or do we need to sync both up to merge this?

@cPanelScott

This feature looks useful to me. I've incorporated urlgrey's work into an up to date fork (I hope that's okay) so that I can get the latest version of statsd along with this feature. I'm willing to help where I can if there's outstanding work required to get this merged into mainline.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment