Storing binary values via API #1548

@smarterclayton

Currently etcd takes strings as value inputs and stores them in memory as strings on the node struct. We are looking at using etcd as a key-value store for Kubernetes at higher densities than many etcd deployments (or many Kubernetes deployments), and a large portion of our size overhead would be the size of the value in etcd: the JSON object representing the stored Kubernetes resource. A string representation of a medium-sized JSON object is roughly 3 times smaller when gzipped or msgpacked (1.1K ~> 400 bytes). However, base64 encoding the result to store it as a string eats some of the savings (~25%), and the break-even point for string -> gzip -> base64 compression is around 600 bytes or so, which is a common size for most JSON objects anyway.
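
For anyone who wants to reproduce the size comparison, here is a minimal sketch of the string -> gzip -> base64 pipeline. The `encoded_sizes` helper and the sample object are hypothetical stand-ins, not real Kubernetes resources, and the exact numbers will vary with the object and the gzip level.

```python
import base64
import gzip
import json


def encoded_sizes(obj):
    """Return byte sizes of obj as raw JSON, gzipped, and gzip+base64."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    compressed = gzip.compress(raw)
    # base64 turns every 3 input bytes into 4 output characters,
    # which is the overhead paid for storing binary as a string.
    as_string = base64.b64encode(compressed)
    return {
        "raw": len(raw),
        "gzip": len(compressed),
        "gzip_base64": len(as_string),
    }


def roundtrip(obj):
    """Encode obj through gzip+base64 and decode it back."""
    raw = json.dumps(obj).encode("utf-8")
    stored = base64.b64encode(gzip.compress(raw))
    return json.loads(gzip.decompress(base64.b64decode(stored)))


# Hypothetical resource, made up purely for illustration.
sample = {
    "kind": "Pod",
    "id": "example-pod",
    "labels": {"name": "example", "tier": "frontend"},
    "containers": [
        {"name": "c%d" % i, "image": "example/image", "ports": [8080 + i]}
        for i in range(10)
    ],
}

if __name__ == "__main__":
    print(encoded_sizes(sample))
```

Running this over a sample of real stored objects is one way to check where the break-even point falls for a given workload.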

We are testing etcd out to tens of millions of nodes, and we're at a point where storage overhead might be a factor in choosing between etcd and a different key-value store on which we would emulate etcd's features (which we like for many reasons).

Would it be possible to allow direct binary input during a PUT to a key, so that the value is stored in the node as []byte? The node would also need to store some indicator that the value must be base64 encoded for users, but this would allow larger data to be stored more efficiently. Potentially via a Content-Type header on PUT?
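
To make the idea concrete, here is a rough in-memory sketch of what is being asked for. Everything in it is hypothetical: the `Node`, `put`, and `render` names, the `binary` flag, the `encoding` response field, and the use of `application/octet-stream` as the trigger are assumptions for illustration, not etcd's actual node struct or API. It also assumes the base64 encoding happens when the value is rendered into a response.

```python
import base64
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node: raw bytes plus a flag marking binary values."""
    key: str
    value: bytes
    binary: bool = False


def put(store, key, body, content_type="text/plain"):
    """Store body as-is; remember whether the client sent binary data."""
    # Assumption: a binary content type is what marks the value as binary.
    binary = content_type == "application/octet-stream"
    store[key] = Node(key=key, value=body, binary=binary)
    return store[key]


def render(node):
    """Build a JSON-safe response; binary values are base64 encoded."""
    if node.binary:
        return {
            "key": node.key,
            "value": base64.b64encode(node.value).decode("ascii"),
            "encoding": "base64",  # hypothetical indicator for clients
        }
    return {"key": node.key, "value": node.value.decode("utf-8")}


if __name__ == "__main__":
    store = {}
    put(store, "/pods/a", b"\x1f\x8b\x08\x00", "application/octet-stream")
    put(store, "/message", b"hello")
    print(render(store["/pods/a"]))
    print(render(store["/message"]))
```

In this sketch the base64 cost is only paid on the wire in responses, while the stored value stays at its compressed size.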

Does this fit with the design goals for etcd (or within a scope that might be useful for others with larger keys)? Obviously there are other obstacles to running millions of keys, but this is not a high write volume scenario we're discussing (I think our transaction mix at that scale is more like 97% read, 3% write).
