Storing binary values via API #1548
This is related to #1073 in terms of benefit, but is about internal storage rather than wire format.
+1. This would allow spf13/viper and xordataexchange/crypt to skip base64 encoding, which adds storage overhead and a performance hit.
@smarterclayton @bketelsen Supporting binary values is on the roadmap for 0.6.
@xiangli-cmu great news. I searched but didn't find a value size limitation in etcd. Is there one? My tests with etcdctl and curl all broke the shell (zsh) before they even attempted to post.
@bketelsen There is no limitation currently. However, etcd can only commit a write after it replicates the data to a majority of nodes in the cluster, so committing a large value at once is not recommended: it would block the progress of the whole cluster. Most implementations (ZooKeeper, Chubby) set a hard limit to avoid behavior that is not in line with the goal of the project, which is a consistent data store that serves small values. If you want to put really large values, what you probably need is a database.
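Since etcdctl and a shell can choke on large arguments, here is a minimal sketch of putting a value through the v2 keys HTTP API from Go instead of curl (the endpoint, port, and key below are placeholders):

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	// Placeholder endpoint and key; the v2 keys API takes a form-encoded "value" field.
	endpoint := "http://127.0.0.1:4001/v2/keys/message"
	body := url.Values{"value": {"hello"}}.Encode()

	req, err := http.NewRequest("PUT", endpoint, strings.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out)) // the response echoes the stored node as JSON
}
```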
@smarterclayton The goal of 0.6 is to make etcd scale: a better proxying model (to scale reads and watches) and a better underlying storage system (low per-key-value overhead, fast snapshotting, and scaling the number of small keys that can live inside etcd).
This is absolutely something we want to do. We are working on writing down the pieces that will be needed to make this happen. The first is refactoring the internal store to support binary values in a reasonable way, and then adding a content-type option to the v2/keys endpoint to return just the original contents. Early next week Xiang will put up a doc that outlines the pieces; help would be great once we put that up.
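Purely as a hypothetical sketch of what that content-type option might look like on the read side (the doc isn't out yet, so the Accept-based negotiation below is a guess, not an existing API):

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	// HYPOTHETICAL: this negotiation is only a guess at what the option could look like.
	req, err := http.NewRequest("GET", "http://127.0.0.1:4001/v2/keys/blob", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Accept", "application/octet-stream") // hypothetical: ask for raw contents

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, _ := ioutil.ReadAll(resp.Body) // would be the original value, no JSON or base64 wrapper
	fmt.Println(len(raw))
}
```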
We'd be happy to help.
@xiangli-cmu can you clarify "scaling the number of small keys that can live inside etcd"? Is that all of the above?
Scale is relative: are you looking at scales of 1-10k keys or 10 million keys? I have done some initial investigations on the memory overhead of 0.5.
Independent of value size (which binary data may help), I see about a 13:1 ratio of total process memory use to the actual snapshot size on disk, and I am hoping we can reduce that ratio as the number of keys increases, so I can scale to tens of millions of keys instead of 100k keys.
@derekwaynecarr Can we discuss this at #1634?
"Storing value as a binary object" feature released in any of the stable versions? Also looking for a C binding to do the same. |
@deepakthipeswamy yes, the v3 API supports it; values are uninterpreted byte arrays.
@heyitsanthony where do you see that this is possible with v3? It looks like everything takes strings to me; the KV interface at https://godoc.org/github.com/coreos/etcd/clientv3#KV is string-typed for both keys and values.
@philips is there no way to pass in `[]byte`?
@aaronjwood internally it's a byte array on the backend and in the grpc protobuf wire protocol; it's only the clientv3 interface that exposes `string`. Have you profiled it?
@heyitsanthony bummer. No, I haven't profiled it, but I know that conversions between strings and byte slices aren't free (each one copies). I think we'll end up doing something different for our case since only strings are accepted. We were going to store protobuf bytes, but I think what I'll do now is just marshal it to a string, store it as a string, and unmarshal it as a string later on. Was just hoping to use bytes to save space/overhead. Not a huge deal I guess.
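For what it's worth, the `string` conversion round-trips arbitrary bytes losslessly. A minimal sketch (the endpoint address is assumed):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // assumed local cluster
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	raw := []byte{0x00, 0xff, 0x80, 0x10} // arbitrary binary, not valid UTF-8

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// string(raw) copies the bytes but does not re-encode them.
	if _, err := cli.Put(ctx, "binary-key", string(raw)); err != nil {
		panic(err)
	}

	resp, err := cli.Get(ctx, "binary-key")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", resp.Kvs[0].Value) // Value is []byte and matches raw exactly
}
```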
Are there any future plans to support `[]byte` values? What we are doing now is encoding structs before storing them. We can write a custom transcoder to get around this, but if binary data were accepted we would avoid the extra encode/decode step. Thoughts?
@aaronjwood I don't understand why `[]byte` would help; a Go `string` can already hold arbitrary bytes, so nothing forces a base64 step.
For people using protobufs, they can just take the output of `Marshal` (already a `[]byte`) and store it. Relating to the base64 stuff, I thought that you cannot fully represent something in bytes (like a struct) using a string, since you'd have many unprintable characters. Can you do this in Go?
@aaronjwood Go strings are immutable byte sequences; they are not NUL-terminated and may contain arbitrary, non-printable bytes, so `string(b)` and `[]byte(s)` round-trip binary data without loss.
@heyitsanthony you're right, thanks for pointing this out. I was not aware that Go doesn't really work with C-style strings underneath. That threw me off big time, and it looks like there are some quirks with raw string literals, too. It's good to know that no special encoding is needed for these situations in Go.
I'd recommend a high-speed compression/decompression library like lz4 or zstd; gzip just takes too many resources to pair well with etcd.
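As a sketch of that approach (assuming the third-party github.com/klauspost/compress/zstd package; any zstd or lz4 binding would do), values could be compressed before a put and decompressed after a get:

```go
package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd" // assumed third-party library
)

func main() {
	// A nil writer/reader is fine when only the EncodeAll/DecodeAll forms are used.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	dec, err := zstd.NewReader(nil)
	if err != nil {
		panic(err)
	}

	val := []byte(`{"kind":"Pod","metadata":{"name":"example","namespace":"default"}}`)

	compressed := enc.EncodeAll(val, nil)           // compress before PUT
	restored, err := dec.DecodeAll(compressed, nil) // decompress after GET
	if err != nil {
		panic(err)
	}
	fmt.Println(len(val), len(compressed), string(restored))
}
```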
Currently etcd takes strings as value inputs and stores them in memory as strings on the node struct. We are looking at using etcd as a key-value store for Kubernetes at higher densities than many etcd deployments (or many Kubernetes deployments), and a large portion of our size overhead would be the size of the value in etcd: the JSON object representing the stored Kubernetes resource. A string representation of a medium-sized JSON object is roughly 3 times smaller gzipped or msgpacked (1.1K ~> 400 bytes). However, base64 encoding eats some of the savings (~25%) when storing as a string, and the break-even point is around 600 bytes or so for string -> gzip -> base64, which is a common size for most JSON objects anyway.
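A rough way to check that break-even on your own payloads (illustrative only: the sample document below is made up, and highly repetitive input compresses far better than real objects):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
)

func main() {
	// Stand-in for a medium-sized Kubernetes resource; substitute a real object.
	doc := bytes.Repeat([]byte(`{"kind":"Pod","metadata":{"name":"x","labels":{"a":"b"}}}`), 20)

	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(doc); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil { // flush before reading buf
		panic(err)
	}

	enc := base64.StdEncoding.EncodeToString(buf.Bytes()) // base64 adds ~33% on top of gzip

	fmt.Printf("raw=%d gzip=%d gzip+base64=%d\n", len(doc), buf.Len(), len(enc))
	// Storing as a string pays off only when gzip+base64 < raw, hence the ~600 byte break-even.
}
```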
We are testing etcd out to tens of millions of nodes, and we're at a point where storage overhead might be a factor in choosing etcd vs a different key-value store and emulating etcd's features (which we like for many reasons).
Would it be possible to allow direct binary input during a PUT to a key, such that the value is stored in the node value as []byte? It would require storing some indicator on the node that the value must be base64 encoded when served back to JSON clients, but it would allow larger data to be stored more efficiently. Potentially via a Content-Type header on PUT?
Does this fit with the design goals for etcd (or within the scope where it might be useful for others with larger keys)? Obviously there are other obstacles to running millions of keys, but this is not a high-write-volume scenario we're discussing (I think our transaction mix at that scale is more like 97% read / 3% write).