New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store UUID in an optimized way for InnoDB #117

Closed
ramsey opened this Issue Apr 23, 2016 · 9 comments

Comments

4 participants
@ramsey
Owner

ramsey commented Apr 23, 2016

From Karthik Appigatla on the Percona blog ("Store UUID in an optimized way"):

  • UUID has 36 characters which makes it bulky.
  • InnoDB stores data in the PRIMARY KEY order and all the secondary keys also contain PRIMARY KEY. So having UUID as PRIMARY KEY makes the index bigger which can not be fit into the memory
  • Inserts are random and the data is scattered.

He goes on to "explain how to store UUID in an efficient way by re-arranging timestamp part of UUID."

While, ramsey/uuid provides COMB sequential UUIDs, these do not rearrange the timestamp part of the UUID to optimize for storage in InnoDB. I'd like to consider the addition of a codec that would help optimize UUIDs as primary keys for the InnoDB engine. Perhaps other database engines require similar optimization techniques, so we should thing about those, as well.

The problem with rearranging the timestamp part of a UUID is that the version number is lost, since it is shuffled around, and there would be no way to determine the original state of the UUID. Whether or not this is important is up for debate.

@ramsey ramsey added the enhancement label Apr 23, 2016

@ramsey ramsey added this to the Version 4.0.0 milestone Apr 23, 2016

@barryvdh

This comment has been minimized.

barryvdh commented Apr 24, 2016

The problem with rearranging the timestamp part of a UUID is that the version number is lost, since it is shuffled around, and there would be no way to determine the original state of the UUID.

But you rearrange in a fixed way, and keep the version number, right? So if your just rearranging in a fixed way, you can still convert back to the original?

This php lib also does this I think: https://github.com/vend/mysql-uuid

@ramsey

This comment has been minimized.

Owner

ramsey commented Apr 24, 2016

In the Percona blog post, the version number (1) gets rearranged so that it appears first, and now "a" appears in the version number spot of the UUID. If reading it later, there's no indication of how this UUID was created. It no longer conforms to RFC 4122, and since it doesn't tell us it's variant or version, we don't know that it was rearranged specifically for MySQL.

However, from an optimization standpoint, this may not be of concern to anyone.

@ramsey

This comment has been minimized.

Owner

ramsey commented Apr 24, 2016

BTW, thanks for the link to vend/mysql-uuid. I'll take a look at their approach, but since it's licensed under GPL3, I won't be able to require it or link against it from ramsey/uuid.

@barryvdh

This comment has been minimized.

barryvdh commented Apr 25, 2016

Okay I was under the impression that you wanted it more like a encoder/decoder, for just the bytes. A rearranged string version isn't really useful, right? If you want to optimize like this, you're going to use the binary version anyways.

Regular UUID v1 -> getBytes() -> Rearranged binary for storage
Rearranged binary -> fromBytes() -> original UUID v1

So you would only change the encodeBinary/decodeBytes, to rearrange the fields. Not the actual UUID itself. And your application would just generate (en perhaps decode) original v1 UUIDs, and the string version is the same.

But maybe I'm misunderstanding the point and thinking to simple about this.

@radmen

This comment has been minimized.

radmen commented Apr 27, 2016

We're using ordered variation of uuid1 (as in Percona blog) for few months. One thing I need to say is that I don't see any reason why we would like to determine version of uuid which is used. IMO you shouldn't care about it :)

Generating ordered uuid should be optional.

Btw - we've done a simple wrapper around this package to generate ordered uuids (https://github.com/DeSmart/support/blob/master/src/Uuid.php#L22).

@fabre-thibaud

This comment has been minimized.

Contributor

fabre-thibaud commented Apr 27, 2016

I agree that you shouldn't really care about whether the optimized version is RFC compliant, because this optimized version should not be exposed.

Ideally, you'd use the optimized codec to read/write the value in MySQL, but as soon as you try to convert to/from string outside of the data layer, you should use the regular string codec to stay compliant with the RFC.

This is probably a different issue, but it could be a nice feature for v4.0 to ease use of different codecs based on the context where the conversion is done. It's currently possible, but not clearly documented, and not facilitated by the global design.

@barryvdh

This comment has been minimized.

barryvdh commented Apr 27, 2016

@radmen You generate the string, but you're still storing as BINARY(16) right?

I've made an example PR to optimize the binary output in a new codec, but keep the strings the same. No data is lost (so version number is kept) and converting betweens optimized <-> original uuid string is possible. Let me know if that is what you were thinking about and fits your use-cases!

@radmen

This comment has been minimized.

radmen commented Apr 27, 2016

@barryvdh Postgresql has dedicated uuid field type so we're just using it. In case of MySQL we were storing this as.. string.

Your PR looks legit, although I'm not the author of uuid package :)

@ramsey ramsey removed this from the Version 4.0.0 milestone Apr 27, 2016

@ramsey

This comment has been minimized.

Owner

ramsey commented Apr 27, 2016

Moved this out of the 4.0.0 milestone. There's no reason this codec can't go out in an earlier release.

@aztech-dev, I agree about the need for a better design for using codecs in v4.0.

@ramsey ramsey closed this in #118 Aug 2, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment