UDP fire-and-forget makes perfect sense for the app->statsd link when statsd runs on localhost. In many circumstances it's also fine when statsd runs on a remote host with a reliable link.
But where the risk of packet loss is high and the stats are important, it would be good to have a more reliable transport.
For this use case I envisage using UDP for the link from the app to a statsd on localhost. That statsd would then use the repeater backend to forward the stats via TCP to a central statsd that aggregates stats from many hosts.
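To make the two-hop idea concrete, here is a minimal sketch of the forwarding step: one UDP datagram in, one newline-terminated frame out over TCP. This is not the repeater backend's code; `to_stream_frame` and `relay_once` are hypothetical names, and the newline framing assumes the receiving side splits metrics on line breaks.

```python
import socket


def to_stream_frame(datagram: bytes) -> bytes:
    """Newline-terminate a datagram so metrics stay separable on a byte stream.

    UDP preserves message boundaries; TCP does not, so a delimiter is needed.
    """
    return datagram.rstrip(b"\n") + b"\n"


def relay_once(udp_sock: socket.socket, tcp_sock: socket.socket) -> bytes:
    """Receive one UDP datagram and forward it over an established TCP connection.

    Returns the frame that was written, mainly to make the sketch easy to check.
    """
    datagram, _sender = udp_sock.recvfrom(65535)  # blocks until a datagram arrives
    frame = to_stream_frame(datagram)
    tcp_sock.sendall(frame)  # blocks until the kernel has accepted every byte
    return frame
```

A local forwarder would call `relay_once` in a loop, with `udp_sock` bound on localhost and `tcp_sock` connected to the central host.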
Couldn't the central StatsD run within the "reliable" network then? I take it that normally not all network connectivity goes drastically downhill outside of localhost. Since we're already using TCP-based communication for a lot of the backend flushing (Graphite, for example), that could be leveraged to place the more reliable communication on the more unreliable link. I'm not completely opposed to a TCP option, but I also don't know if this is a serious enough problem to warrant making StatsD way more complex (which protocol is each instance running? Should we combine multiple metrics in a packet or repeat directly? How would we handle connection errors?).
I'd suggest having TCP metric input as an option (default off). That much should be very simple.
The next step would be to enable the repeater to use a tcp connection (default off). Probably also simple.
I'd keep it simple: just repeat metrics directly because tcp will look after putting them on the wire efficiently. (Someone else could write a more advanced buffering backend if they want. I don't see much need.)
Similarly, failure to connect would just keep retrying, and blocking on write would hang the local statsd. Both are simple and reasonable behaviours, because if the remote statsd or the link were down the UDP data would be lost anyway.
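A rough sketch of that behaviour, assuming a plain blocking socket: retry the connect until it succeeds, then write each metric with a blocking send. `connect_with_retry` and `send_metric` are hypothetical names, and the `delay` and `max_attempts` parameters are my additions for illustration (the default retries forever, as suggested above).

```python
import socket
import time
from typing import Optional


def connect_with_retry(host: str, port: int, delay: float = 1.0,
                       max_attempts: Optional[int] = None) -> socket.socket:
    """Keep trying to connect; with max_attempts=None this never gives up."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return socket.create_connection((host, port))
        except OSError:
            # Remote statsd or the link is down: wait, then try again.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            time.sleep(delay)


def send_metric(sock: socket.socket, metric: bytes) -> None:
    """Blocking write: if the peer stops reading, this call stalls the caller."""
    sock.sendall(metric.rstrip(b"\n") + b"\n")
```

While the sender is stuck in either function, nothing is read from the local UDP socket, so incoming datagrams are eventually dropped once the kernel's receive buffer fills. That is the same outcome as plain UDP to an unreachable host.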
A variation of "tcp" would be to have it support HTTP; with keepalive or even websockets it wouldn't necessarily be terribly inefficient. It'd make it easier to fit into odd infrastructures.
Yes, we are planning to adapt the StatsD internals a bit to make it easier to have different ways of connecting to it.
+1 for TCP. Great work though!
Any update on the TCP feature?
Let's make it an open source alternative to New Relic.
#448 adds a tcp interface to statsd. To use it, set the server configuration variable to "./servers/tcp"