Adding compression to statsd protocol #221

PSUdaemon opened this Issue Dec 23, 2012 · 3 comments





Hello All,

How do you guys feel about adding compression to the protocol? Since we added the ability to send multiple metrics in a single packet, I think this would be beneficial. I see at least two ways to do it.

  1. Listen on port 8126 and treat all datagrams arriving there as compressed. This is simple and to the point, with low overhead, but it probably won't scale well as new features are added, and we would be stuck with a single compression algorithm.

  2. Update the protocol to have a header.

    • Add a "magic" value like 'STATSD' to the beginning of the packet
    • Add options after it, prefixed with a colon and separated by commas. For example ':OPT1,OPT2'
    • Add a \n to end the header

The complete header would look something like this:

STATSD:OPT1,OPT2\n

If we don't see this header, continue as we do now. If we do see it, parse the options and then mostly continue as we do now, changing behavior only as the options dictate (e.g., decompressing the data). This keeps things backward compatible and also allows for future additions.
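To make the proposed header concrete, here is a minimal Python sketch of building and parsing it, assuming the format from the bullets above: the magic value STATSD, optional colon-prefixed comma-separated options, and a terminating newline. The names MAGIC, build_packet and parse_packet are hypothetical, and falling back to the legacy path when the bytes after the magic value are not a colon is an assumption added so that an ordinary metric such as "STATSD.requests:1|c" still works.

```python
from __future__ import annotations

# Hypothetical constant: the magic value proposed in the thread.
MAGIC = b"STATSD"


def build_packet(body: bytes, options: list[str]) -> bytes:
    """Prefix a payload with MAGIC, optional ':OPT1,OPT2' and a newline."""
    header = MAGIC
    if options:
        header += b":" + ",".join(options).encode("ascii")
    return header + b"\n" + body


def parse_packet(packet: bytes) -> tuple[list[str], bytes]:
    """Return (options, body); packets without a header pass through unchanged."""
    if not packet.startswith(MAGIC):
        # Legacy packet: handle it exactly as today.
        return [], packet
    header, sep, body = packet.partition(b"\n")
    rest = header[len(MAGIC):]
    if rest and not rest.startswith(b":"):
        # Assumption: something like "STATSD.requests:1|c" is a legacy metric.
        return [], packet
    if not sep:
        raise ValueError("header is not terminated by a newline")
    # Skip the leading ':' and drop empty entries (e.g. a bare "STATSD\n").
    options = [opt for opt in rest[1:].decode("ascii").split(",") if opt]
    return options, body
```

One caveat this sketch makes visible: a legacy packet whose first metric is literally named STATSD (for example "STATSD:1|c") is ambiguous under a text header, which is one reason to consider a binary packet-type marker instead.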

As for the compression algorithm, I'd suggest LZ4. It has a New BSD license and an MIT-licensed node.js implementation, and it is very fast, especially at decompression:

Name            Ratio   C.speed (MB/s)   D.speed (MB/s)
LZ4 (r59)       2.084              330              915
LZO 2.05 1x_1   2.038              311              480
QuickLZ 1.5 -1  2.233              257              277
Snappy 1.0.5    2.024              227              729
LZF             2.076              197              465
FastLZ          2.030              190              420
zlib 1.2.5 -1   2.728               39              195
LZ4 HC (r66)    2.712               18             1020
zlib 1.2.5 -6   3.095               14              210

(Ratio is the compression ratio; C.speed and D.speed are compression and decompression speeds, where higher is faster.)

It also has a "high compression" mode (LZ4 HC) whose compression ratio is comparable to zlib's, and it uses the same decompression routine, so it is transparent to the decompressing side.

In my simple tests, 99 metrics (sorry for the odd number) had an input size of 3062 bytes and a compressed size of 495 bytes, roughly a 6.2:1 ratio. This did not include any header data.
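A measurement like this can be reproduced with a short Python sketch that joins metric lines into one newline-separated payload and compresses it. It uses the standard-library zlib as a stand-in because LZ4 is not in the standard library, so its numbers will not match the LZ4 figures above; the metric names are made up for illustration.

```python
import zlib


def make_batch(n: int) -> bytes:
    """Build a newline-separated multi-metric payload (hypothetical metric names)."""
    lines = [f"app.web{i % 10}.requests.count:{i}|c" for i in range(n)]
    return "\n".join(lines).encode("ascii")


def compression_report(payload: bytes, level: int) -> tuple[int, int, float]:
    """Return (raw size, compressed size, ratio) for one zlib level."""
    compressed = zlib.compress(payload, level)
    # Every zlib level is decoded by the same routine, the same property
    # described above for LZ4 HC versus plain LZ4.
    if zlib.decompress(compressed) != payload:
        raise RuntimeError("round trip failed")
    return len(payload), len(compressed), len(payload) / len(compressed)


if __name__ == "__main__":
    batch = make_batch(99)
    for level in (1, 9):
        raw, packed, ratio = compression_report(batch, level)
        print(f"level {level}: {raw} -> {packed} bytes ({ratio:.1f}:1)")
```

Real metric names are usually more varied than this synthetic batch, so the ratio from actual client traffic is the number that matters.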

Since we can pass different options with method 2, we can of course implement a long list of compression options, but I think LZ4 is a great place to start.
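Supporting several compression options could come down to a small registry keyed by header option name. The following Python sketch assumes the options have already been parsed from the header into a list of strings; the option names "ZLIB" and "LZ4", the registry, and decode_body are hypothetical. "LZ4" is registered only if the third-party python-lz4 package is installed, and using its frame format (lz4.frame) rather than the raw block format is an assumption.

```python
import zlib
from typing import Callable, Dict, List

# Hypothetical registry: header option name -> decompression function.
DECOMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {"ZLIB": zlib.decompress}

try:
    import lz4.frame  # third-party python-lz4, optional

    DECOMPRESSORS["LZ4"] = lz4.frame.decompress
except ImportError:
    pass


def decode_body(options: List[str], body: bytes) -> str:
    """Decompress the body if one known compression option is present."""
    codecs = [opt for opt in options if opt in DECOMPRESSORS]
    if len(codecs) > 1:
        raise ValueError(f"more than one compression option: {codecs}")
    if codecs:
        body = DECOMPRESSORS[codecs[0]](body)
    # Options that are not compression codecs are left for other features.
    return body.decode("utf-8")
```

Ignoring unrecognised options keeps room for non-compression options later; a stricter server could reject them instead.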



I'd personally also like to see the ability for clients to pass in headers on a packet. These could carry a lot of namespacing information (both hierarchical namespaces and tags) as well as custom information such as authorization tokens and account information. This would make statsd more consumable by commercial and multi-tenant tools.

As for how it gets implemented: since statsd is (today) a text-based protocol, no (sensible) packet can ever arrive with a first octet of '\0'. I suggest extending the protocol to have packet types, reserving the first 4 bytes as a packet type and reserving all packet types with a leading octet other than '\0' for the "legacy text protocol."

This allows us to use the same port and to extend the packet format in the future, without breaking any existing clients or servers.
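The packet-type idea can be sketched in Python as follows. It assumes the 4-byte type is encoded in network (big-endian) byte order, so a leading '\0' octet means the type's high byte is zero; the type values, make_typed and classify are hypothetical, since the proposal does not assign any numbers.

```python
from __future__ import annotations

import struct

# Hypothetical type values; big-endian encoding puts the high byte first,
# so any value below 2**24 starts with a '\0' octet on the wire.
TYPE_COMPRESSED = 0x000001
TYPE_WITH_HEADERS = 0x000002


def make_typed(ptype: int, payload: bytes) -> bytes:
    """Prefix a payload with a 4-byte big-endian packet type."""
    if not 0 <= ptype < 2**24:
        raise ValueError("typed packets must start with a '\\0' octet")
    return struct.pack("!I", ptype) + payload


def classify(packet: bytes) -> tuple[int | None, bytes]:
    """Return (packet type, payload); None means legacy text protocol."""
    if not packet or packet[0] != 0:
        # Text packets never start with '\0': handle them exactly as today.
        return None, packet
    if len(packet) < 4:
        raise ValueError("typed packet is shorter than its 4-byte type field")
    (ptype,) = struct.unpack("!I", packet[:4])
    return ptype, packet[4:]
```

Because the check is on the first octet alone, a server can dispatch every datagram on the existing port without any legacy client changing what it sends.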


@postwait the idea of a header does open up some really interesting possibilities, even beyond tags. It's a neat way to pass instructions to statsd itself. I like it.

Etsy, Inc. member

I like the idea, and I wouldn't be opposed to working with someone on a pull request to add this. However, I'll close this issue, as it's not really an issue. Thank you for submitting it; this is definitely an interesting topic to discuss on a pull request.

mrtazz closed this Jan 22, 2015