Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UUID v1 is somewhat predictable #578

Closed
chadfurman opened this issue Sep 6, 2017 · 8 comments
Closed

UUID v1 is somewhat predictable #578

chadfurman opened this issue Sep 6, 2017 · 8 comments

Comments

@chadfurman
Copy link
Collaborator

https://www.graphile.org/postgraphile/postgresql-schema-design/#postgres-roles

We suggest v1mc in the docs, though as you can see here uuid1 is much less random than uuid4
https://www.sohamkamani.com/blog/2016/10/05/uuid1-vs-uuid4/

@benjie
Copy link
Member

benjie commented Sep 7, 2017

Yes, this is why we chose UUID v1. It's extremely expensive for postgres to write an id that is smaller than an existing id because it has to split the pages - this reduces write performance significantly. Because UUIDv1 is prefixed with the date stamp it is (generally) an increasing value and thus fairly safe to use as the PK unless you're expecting hundreds or thousands of inserts per second (and even in those cases the pages are probably still in memory so splitting them is less expensive than cold reads).

Neither option we recommend give you cryptographically secure random IDs; for that I'd recommend using a second column with a unique index and a simple int/bigint primary key, then don't expose the primary key. We do not provide any help in achieving this however 😞

@benjie
Copy link
Member

benjie commented Sep 7, 2017

@chadfurman
Copy link
Collaborator Author

@benjie some of the comments to that article mention a couple of points:

  1. Postgres doesn't have clustered indexes, so it's actually not a concern (https://www.postgresql.org/message-id/20151222124018.bee10b60b3d9b58d7b3a1839%40potentialtech.com)
  2. There is a CLUSTER command, but it's a one-time sort and so not relevant: https://www.postgresql.org/message-id/30034.1472566554%40sss.pgh.pa.us
  3. if you really really really want sequenced IDs (the whole point of a UUID imho is to not have sequential, guessable identifiers), you can mix the randomness of UUIDv4 with a sequential epoch prefix (someone is making a postgres extension to do this)

@benjie
Copy link
Member

benjie commented Sep 8, 2017

Interesting, thanks for researching this!

Personally I don't buy that the whole point of UUID is to not have sequential guessable identifiers - the reason UUIDv1 ids include the mac address is to make sure they don't class with other machines acting in good faith; it's not to protect against nefarious actors. UUIDv4 on the other hand is intended to give a truly random non-guessable identifier.

I've filed an issue against the docs site to research this further as Bill Moran does not cite any authoritative sources backing his claims; however his reasoning seems sound and so with the lack of any further evidence I'd definitely support someone removing that particular paragraph.

If you're looking for high-performance seemingly random IDs another option is a Feistel cipher - this will result in less work to generate (no randomness required), less storage required, and faster performance due to significantly smaller indexes (even with bigint you're looking at 8bytes per key vs 16 with UUID). You could do this as part of id generation if you believe that there won't be the fragmentation cost discussed above, or as part of encoding the input/output of the id field otherwise. However UUIDv4 is probably more standard so if you don't require this level of performance it's probably safe to go with that - it's also much more secure due to the amount of randomness contained in the generated identifiers.

@chadfurman
Copy link
Collaborator Author

I'm going to go with uuid v4, myself. Thank you for the talk on this. Will leave this open for you to finish as you see fit.

@benjie
Copy link
Member

benjie commented Sep 8, 2017

Closing as it's open on the website repo

@benjie benjie closed this as completed Sep 8, 2017
@hi2u
Copy link

hi2u commented Jun 9, 2018

I'm not commenting on this issue as a whole. I just came across this thread from searching for some general UUID stuff.

@benjie But just FYI on UUIDv1 in general, it's a common misconception that they're sequential. They're not. Yeah there's a timestamp before the MAC address, but it's kind of akin to using ss:hh:mm dd-mm-yyyy format (not sequential) instead of starting with yyyy-mm-dd (sequential).

Generate a bunch of UUIDv1s within a few seconds. You'll see that the first 8 characters (time) move very quickly, and the middle parts (date) are the slower ones.

That's why there's alternatives out there like "comb UUIDs" and "UUIDv6" which are similar to UUIDv1 except they've put things in the right order (year first).

No idea why they did the timestamp backwards for UUIDv1, seems silly to me.

Also I agree that the point of UUIDs has nothing to do with guessability. Especially any version that isn't v4. And even then, known issues in various UUIDv4 generators can make them guessable too.

As far as I've read, the order doesn't matter much in postgres, even UUIDv4 is ok compared to other databases. But sequential still feels like a good idea to me, especially seeing you might be also storing copies of the data outside postgres too.

I've made my own version of UUID that is (I think) is a good mix of v1 + comb + v4.

Consists of timestamp (in the right order, I've even made the date human readable), node (not tied to hardware) and also random at the end. I've heard of UUIDv4 (very) occasionally conflicting, so at least with my version the conflict will happen on the same node immediately and fail early, rather that accepting the new data and finding you have conflicting IDs later on when it comes time to merge.

@cymen
Copy link

cymen commented Sep 10, 2021

I also ended up here while searching and I think this article is relevant to the discussion:

https://www.2ndquadrant.com/en/blog/sequential-uuid-generators/

But there are disadvantages too – they may make the access patterns much more random compared to traditional sequential identifiers, cause WAL write amplification etc. So let’s look at an extension generating “sequential” UUIDs, and how it can reduce the negative consequences of using UUIDs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants