Storing binary values via API #1548

Closed

smarterclayton opened this issue Oct 31, 2014 · 21 comments

@smarterclayton
Contributor

Currently etcd takes strings as value inputs and stores them in memory as strings on the node struct. We are looking at using etcd as a key value store for Kubernetes at higher densities than many etcd deployments (or many Kubernetes deployments), and a large portion of our size overhead would be the size of the value in etcd - the JSON object representing the stored Kubernetes resource. A string representation of a medium sized JSON object is roughly 3 times smaller when gzipped or msgpacked (1.1K ~> 400 bytes). However, base64 encoding (needed to store the result as a string) eats some of the savings (~25%), and the break-even point for string -> gzip -> base64 is around 600 bytes or so, which is a common size for most JSON objects anyway.

We are testing etcd out to tens of millions of nodes, and we're at a point where storage overhead might be a factor in choosing between etcd and a different key value store that emulates etcd's features (which we like for many reasons).

Would it be possible to allow direct binary input during a PUT to a key, such that the value is stored in the node value as []byte? It would require storing some indicator in the node that the value needs to be base64 encoded by users, but it would allow larger data to be stored more efficiently. Potentially via a ContentType header on PUT?

Does this fit with the design goals for etcd (or within the scope that it might be useful for others with larger keys)? Obviously there are other obstacles to running millions of keys, but this is not a high write volume scenario we're discussing (I think our transaction mix at that scale is more like 97% read -> 3% write).
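
As a point of reference, here is a minimal sketch of the string -> gzip -> base64 pipeline described above; the payload and names are illustrative only, not measurements from Kubernetes:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
)

// encodeValue gzips a JSON payload and base64-encodes the result so it can
// be stored as an etcd v2 string value.
func encodeValue(jsonPayload []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(jsonPayload); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	// base64 adds roughly a third on top of the compressed size, which is
	// the overhead discussed above.
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

func main() {
	payload := []byte(`{"kind":"Pod","metadata":{"name":"example"}}`)
	encoded, err := encodeValue(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%d bytes, gzip+base64=%d bytes\n", len(payload), len(encoded))
}
```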

@smarterclayton
Contributor Author

This is related to #1073 in terms of benefit, but is about internal storage vs wire format.

@bketelsen
Contributor

+1. This would allow spf13/viper and xordataexchange/crypt to skip base64 encoding, which adds storage overhead and a performance impact.

xiang90 added this to the v0.6.0 milestone Oct 31, 2014
@xiang90
Contributor

xiang90 commented Oct 31, 2014

@smarterclayton @bketelsen Supporting binary value is on the roadmap of 0.6.

@bketelsen
Contributor

@xiangli-cmu great news. I searched, but didn't find a value size limitation in etcd. Is there one? My tests with etcdctl and curl all broke the shell (zsh) before they even attempted to post.

@xiang90
Contributor

xiang90 commented Oct 31, 2014

@bketelsen There is no limitation currently. However, etcd can only commit a write after it has replicated the data to a majority of nodes in the cluster, so committing a large value at once is not recommended, since it would block the progress of the whole cluster. Most implementations (zk, chubby) set a hard limit to avoid behavior that is not in line with the goal of the project: a consistent data store that serves small values. If you want to put really large values, probably what you really need is a database.

@xiang90
Contributor

xiang90 commented Oct 31, 2014

@smarterclayton The goal of 0.6 is to make etcd scale: a better proxying model (to scale reads and watches) and a better underlying storage system (lower key-value pair overhead, fast snapshotting, and scaling the number of small keys that can live inside etcd).

@philips
Contributor

philips commented Oct 31, 2014

This is absolutely something we want to do.

We are working on writing down the pieces that will be needed to make this happen. The first is refactoring the internal store to support binary values in a reasonable way and then having a content-type option for the v2/keys endpoint to return just the original contents. Early next week Xiang will put up a doc that outlines the pieces; help would be great once we put that up.
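
Purely as an illustration of that proposal (not an implemented API), a client request under such a scheme might look something like the sketch below; the endpoint URL and the Accept header are both assumptions:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "http://127.0.0.1:4001/v2/keys/foo", nil)
	if err != nil {
		panic(err)
	}
	// Hypothetical: ask the server to return the stored value as raw bytes
	// instead of wrapping it in the usual JSON response. This header is part
	// of the proposal being discussed, not an existing etcd v2 option.
	req.Header.Set("Accept", "application/octet-stream")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	raw, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("value is %d bytes\n", len(raw))
}
```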

@smarterclayton
Contributor Author

We'd be happy to help.

@derekwaynecarr

@xiangli-cmu can you clarify "scale the number of small keys that can live in etcd"?

Does that cover all of the following?

  1. Total number of keys in system
  2. Total number of keys as first-level children of a directory
  3. Total number of directories in system

Scale is relative; are you looking at scales of 1-10k keys or 10 million keys?

I have done some initial investigations on memory overhead of 0.5 focused on the following use cases:

  1. large number of nodes as first level children of a directory
  2. large number of directories each with a single child

Independent of value size (which binary data may help with), I see about a 13:1 ratio of total process memory use to the actual snapshot size on disk, and I am hoping we can reduce that ratio as the number of keys increases so I can scale to tens of millions of keys instead of 100k keys.

@xiang90
Contributor

xiang90 commented Nov 6, 2014

@derekwaynecarr Can we discuss this at #1634?

@deepakthipeswamy

deepakthipeswamy commented Oct 14, 2016

"Storing value as a binary object" feature released in any of the stable versions? Also looking for a C binding to do the same.

@heyitsanthony
Contributor

@deepakthipeswamy yes, the v3 API supports it; values are uninterpreted byte arrays

@aaronjwood

@heyitsanthony where do you see that this is possible with v3? It looks like everything takes strings to me.

https://godoc.org/github.com/coreos/etcd/clientv3#KV states:

// Put puts a key-value pair into etcd.
// Note that key,value can be plain bytes array and string is
// an immutable representation of that bytes array.
// To get a string of bytes, do string([]byte{0x10, 0x20}).

@philips is there no way to pass in []byte directly? Do I always need to work with string representations of byte slices? This seems a bit heavy compared to having it all in bytes and letting the application convert it to a string when needed.

@heyitsanthony
Contributor

heyitsanthony commented Mar 6, 2017

@aaronjwood internally it's a byte array on the backend and in the gRPC protobuf wire protocol. It's a string in the client to encode constness in the type; it should still be bit-equivalent when stored in etcd.

This seems a bit heavy compared to having it all in bytes and letting the application convert it to a string when needed

Have you profiled it?
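
To make that concrete, here is a minimal sketch (assuming a local endpoint and the clientv3 package referenced above) showing that arbitrary bytes survive the string conversion at the client boundary:

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"}, // assumed endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	raw := []byte{0x00, 0xff, 0x10, 0x20} // arbitrary, non-UTF-8 bytes
	ctx := context.Background()

	// string(raw) copies the bytes unchanged; no re-encoding happens.
	if _, err := cli.Put(ctx, "binary-key", string(raw)); err != nil {
		panic(err)
	}

	resp, err := cli.Get(ctx, "binary-key")
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(resp.Kvs[0].Value, raw)) // true: Value is []byte
}
```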

@aaronjwood

@heyitsanthony bummer.

No, I haven't profiled it, but I know that conversions between []byte and string cause allocations. I think the overhead is better in newer versions of Go, from what I've heard. We'll be doing this in one of our hotspots, which is why I'm trying to avoid the extra conversion.

I think we'll end up doing something different for our case since only strings are accepted. We were going to store protobuf bytes, but I think what I'll do now is just marshal it to a string, store it as a string, and unmarshal it from a string later on. I was just hoping to use bytes to save space/overhead. Not a huge deal, I guess.

@aaronjwood

Are there any future plans to support []byte for values like some other KV stores do? We have run into a few significant issues in our application due to the fact that everything needs to be a string.

What we are doing now is encoding structs via gob, base64ing that binary data, and then sending the encoded string to etcd. When we need to retrieve things we do the reverse. This works okay, and initially we thought the amplification of the stored data due to base64 encoding would be acceptable. While that still needs to be evaluated on a larger cluster, I've found something much more serious.

The gob encoder will not send zero values (https://groups.google.com/forum/#!msg/golang-nuts/VnFs2Cv0_UY/_AowBInicbQJ), and some of the structs we encode contain other (nested) structs/types/slices that in some cases use 0, or a pointer to 0, for a value. So when we encode we are silently dropping anything that is a zero value and, at the same time, breaking ourselves when we go to decode and use the data.

We can write a custom transcoder to get around this, but if binary data were accepted we would:

  1. reduce the amount of data stored in etcd (and consequently reduce the amount of data replicated, which means less network overhead and faster consistency, etc.)
  2. not only reduce the work for anyone needing to write a custom transcoder, but also eliminate the possibility of others running into the zero-value encoding issue with Go, which I think may not be very well known

Thoughts?
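
For reference, a small self-contained sketch of the gob zero-value behaviour described above (the struct and values are made up for illustration):

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Counter struct {
	Count int
}

func main() {
	var buf bytes.Buffer
	// Count is zero, so gob omits the field entirely from the stream.
	if err := gob.NewEncoder(&buf).Encode(Counter{Count: 0}); err != nil {
		panic(err)
	}

	dst := Counter{Count: 42} // a reused destination, e.g. from a pool
	if err := gob.NewDecoder(&buf).Decode(&dst); err != nil {
		panic(err)
	}
	fmt.Println(dst.Count) // prints 42, not 0: the zero value was never sent
}
```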

@heyitsanthony
Contributor

heyitsanthony commented Aug 7, 2017

@aaronjwood I don't understand why []byte wouldn't have the same problems as string here. It's the same data, but string gives immutability. Why encode as base64 when string([]byte{1,2,3,4}) will be accepted by etcd?

@aaronjwood

For people using protobufs, they can just take the output of Marshal and store that directly. For regular structs, we were using the gob encoder to be more efficient since it's supposed to have a more compact binary format. If we use something like binary.Write() then we will have a much larger data amplification issue.

Relating to the base64 stuff, I thought that you cannot fully represent something in bytes (like a struct) using a string since you'd have many unprintable characters. Can you do this in Go?

@heyitsanthony
Contributor

@aaronjwood string(b) == string([]byte(string(b))) for all b []byte in Go, as far as I know. It's more of a problem for protobuf definitions: a string field won't handle raw binary the way a bytes field does, but etcd's messages use bytes in this case.
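
A quick way to convince yourself of this (plain Go, no etcd involved):

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	// Includes a zero byte and an invalid UTF-8 sequence.
	b := []byte{0x00, 0xc3, 0x28, 0xff}
	roundTripped := []byte(string(b))
	fmt.Println(bytes.Equal(b, roundTripped)) // true: every byte is preserved
}
```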

@aaronjwood

@heyitsanthony you're right, thanks for pointing this out. I was not aware that Go doesn't really work with C-style strings underneath:

In Go, a string is in effect a read-only slice of bytes. If you're at all uncertain about what a slice of bytes is or how it works, please read the previous blog post; we'll assume here that you have.

It's important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.

This threw me off big time. Looks like there are some quirks with raw strings:

In short, Go source code is UTF-8, so the source code for the string literal is UTF-8 text. If that string literal contains no escape sequences, which a raw string cannot, the constructed string will hold exactly the source text between the quotes. Thus by definition and by construction the raw string will always contain a valid UTF-8 representation of its contents. Similarly, unless it contains UTF-8-breaking escapes like those from the previous section, a regular string literal will also always contain valid UTF-8.

Some people think Go strings are always UTF-8, but they are not: only string literals are UTF-8. As we showed in the previous section, string values can contain arbitrary bytes; as we showed in this one, string literals always contain UTF-8 text as long as they have no byte-level escapes.

To summarize, strings can contain arbitrary bytes, but when constructed from string literals, those bytes are (almost always) UTF-8.

It's good to know that no special encoding is needed for these situations with Go.

@therevoman

I'd recommend a high-speed compression/decompression library, like lz4 or zstd. Gzip just takes too many resources to marry up with etcd.
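
As an illustration, here is a minimal sketch of compressing a value before storing it; github.com/klauspost/compress/zstd is assumed here purely as an example binding, and any lz4 or zstd library would do:

```go
package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	enc, err := zstd.NewWriter(nil) // nil writer: we only use EncodeAll
	if err != nil {
		panic(err)
	}
	defer enc.Close()

	dec, err := zstd.NewReader(nil) // nil reader: we only use DecodeAll
	if err != nil {
		panic(err)
	}
	defer dec.Close()

	value := []byte(`{"kind":"Pod","metadata":{"name":"example"}}`)

	compressed := enc.EncodeAll(value, nil) // store string(compressed) as the etcd value
	restored, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(restored) == string(value)) // true
}
```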
