How do you guys feel about adding compression to the protocol? Since we added the capability to do multiple metrics in a single packet, I think this would be beneficial. I think this could be done in at least two ways.
1. Listen on port 8126 and treat all datagrams there as compressed. This is simple and to the point, with low overhead, but it probably won't scale well as new features are added, and it leaves us stuck with one compression algorithm.
2. Update the protocol to have a header.
The complete header would look something like this:
If we don't see this header, continue as we do now. If we do see it, parse it for options, then mostly continue as we do now, except change behavior as the options dictate, e.g. decompressing the data. This should keep things backward compatible and also allow for future additions.
As for the compression algorithm to use, I'd suggest LZ4. It has a New BSD license, and there is an MIT-licensed node.js implementation. It's very fast, especially for decompression:
| Name | Ratio | C.speed | D.speed |
| --- | --- | --- | --- |
| LZ4 (r59) | 2.084 | 330 | 915 |
| LZO 2.05 1x_1 | 2.038 | 311 | 480 |
| QuickLZ 1.5 -1 | 2.233 | 257 | 277 |
| Snappy 1.0.5 | 2.024 | 227 | 729 |
| LZF | 2.076 | 197 | 465 |
| FastLZ | 2.030 | 190 | 420 |
| zlib 1.2.5 -1 | 2.728 | 39 | 195 |
| LZ4 HC (r66) | 2.712 | 18 | 1020 |
| zlib 1.2.5 -6 | 3.095 | 14 | 210 |

(For speeds, higher is faster.)
It also has a "high compression" mode that is comparable to zlib and uses the same decompression routine so it is transparent to the decompressing side.
In my simple tests, 99 metrics (sorry for the odd number) had an input size of 3062 bytes and a compressed output size of 495 bytes, a ratio of roughly 6:1. This did not include any header data.
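For anyone who wants to try a similar measurement, here is a rough sketch. It is not the test described above: it uses Python's standard-library `zlib` as a stand-in because LZ4 needs a third-party binding, and the metric names and values are made up, so the byte counts will not match the 3062/495 figures.

```python
import random
import zlib


def build_packet(count, seed=0):
    """Join `count` made-up statsd metrics into one newline-separated payload."""
    rng = random.Random(seed)
    lines = [
        # Hypothetical metric names; real traffic will compress differently.
        f"app.web.server{i % 4}.request.latency:{rng.randint(1, 500)}|ms"
        for i in range(count)
    ]
    return "\n".join(lines).encode("ascii")


def measure(payload, level=1):
    """Return (raw size, compressed size), using zlib as a stand-in for LZ4."""
    compressed = zlib.compress(payload, level)
    # Sanity check: the receiver must recover the exact original bytes.
    assert zlib.decompress(compressed) == payload
    return len(payload), len(compressed)


if __name__ == "__main__":
    raw, packed = measure(build_packet(99))
    print(f"raw={raw} bytes, compressed={packed} bytes, ratio={raw / packed:.2f}")
```

Multi-metric packets tend to compress well because the metric names repeat, but the exact ratio depends on the algorithm and on how similar the names are.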
Since we can pass different options with method 2, we can of course implement a long list of compression options, but I think LZ4 is a great place to start.
I'd personally also like to see the ability for clients to pass in headers on a packet. These could be used for a lot of namespacing information (both hierarchical namespaces and tags) as well as custom information such as authorization tokens and account information. This would make statsd more consumable by commercial and multi-tenant tools.
As for how it gets implemented: since statsd (today) is a text-based protocol, no (sensible) packet can ever arrive with a first octet of '\0'. I suggest extending the protocol to have packet types, reserving the first 4 bytes as a packet type and reserving all packet types with a leading octet of non-'\0' to be of type "legacy text protocol."
This will allow use of the same port, future extension of the packet format and not break any existing client/server.
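To make the framing rule concrete, here is a minimal sketch of how a receiver might tell the two formats apart. The specifics are assumptions for illustration only: the 4-byte type is read as a big-endian integer, the type value `1` for "compressed text" is hypothetical, and zlib stands in for whichever compression algorithm ends up being chosen.

```python
import struct
import zlib

# Hypothetical packet type; nothing in this thread assigns type values.
TYPE_COMPRESSED_TEXT = 1


def decode_packet(datagram):
    """Return the text-protocol payload carried by a datagram."""
    # Any non-NUL first octet means the legacy text protocol, passed through.
    if not datagram or datagram[0] != 0:
        return datagram.decode("utf-8")

    if len(datagram) < 4:
        raise ValueError("typed packet is shorter than its 4-byte type field")

    # Assumption: the 4 reserved bytes form one big-endian unsigned integer.
    # With a leading NUL octet, type values stay below 2**24.
    (packet_type,) = struct.unpack("!I", datagram[:4])
    body = datagram[4:]

    if packet_type == TYPE_COMPRESSED_TEXT:
        # zlib is a stand-in here; LZ4 would need a third-party binding.
        return zlib.decompress(body).decode("utf-8")
    raise ValueError(f"unknown packet type {packet_type}")


def encode_compressed(text):
    """Build a typed packet whose body is the compressed text payload."""
    header = struct.pack("!I", TYPE_COMPRESSED_TEXT)
    return header + zlib.compress(text.encode("utf-8"))
```

An existing client sending `b"gorets:1|c"` goes through the first branch unchanged, while `decode_packet(encode_compressed("gorets:1|c\nglork:320|ms"))` returns the original two-line payload, so both kinds of sender can share one port.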
@postwait the idea of a header does open up some really interesting possibilities, tags notwithstanding. It's a neat way to pass instructions to statsd itself. I like it.
I like the idea and I wouldn't be opposed to working with someone on a pull request to add this. But I'll close out this issue as it's not really an issue. But thank you for submitting, this is definitely an interesting topic to discuss on a pull request.