[Proposal] Add support for optimized binary UUID keys? #645
I did a lot of the legwork for this in a package to handle UUIDs in Eloquent, as well as tackling the efficient storage of them not too long ago. I generally find it easier to use the UUID in combination with an auto-incrementing integer as the primary key - it is my understanding that the combination is much more efficient for foreign keys anyway. As far as adding them to the core, it doesn't seem like that level of complexity fits with the philosophy of the framework, being more of an extension but I'd be happy to see them included in some way. One less external dependency to pull in, for sure 👍 |
Your second point about the 36 characters required: that is a workaround because MySQL does not support UUID types. If you want UUIDs, switch to something that supports them, like PostgreSQL or SQL Server, where they are stored as 16 bytes.
The current implementation of uuids-as-primary-keys requires you to disable So, it seems to basically boil down to using a database engine that doesn't properly support your application requirements. You want uuids, and you've picked a database engine that doesn't support them natively. Why should the framework provide you with the required (in my opinion hack-ish) workarounds? |
I would also like to point out that your 128-144 bytes argument is based on using the utf8mb4 collation. Just use another collation that fits the storage requirement, like the ascii collation, and you've reduced the key length to 32-36 bytes (depending on if you store the dashes or not). |
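The storage figures traded in the comments above can be checked with a little arithmetic. This is a minimal sketch in Python (not Laravel code), assuming the database reserves up to 4 bytes per character for utf8mb4 and 1 byte per character for ascii; the helper `max_key_bytes` and the sample UUID are illustrative only.

```python
import uuid

# Example value only; every UUID has the same textual and binary lengths.
u = uuid.UUID("6ccd780c-baba-1026-9564-5b8c656024db")

text_with_dashes = str(u)    # 36 characters
text_without_dashes = u.hex  # 32 characters
raw = u.bytes                # 16 bytes

# Assumed maximum bytes per character for each character set.
MAX_BYTES_PER_CHAR = {"utf8mb4": 4, "ascii": 1}


def max_key_bytes(text: str, charset: str) -> int:
    """Upper bound on the bytes needed to store `text` in `charset`."""
    return len(text) * MAX_BYTES_PER_CHAR[charset]


sizes = {
    "utf8mb4, with dashes": max_key_bytes(text_with_dashes, "utf8mb4"),
    "utf8mb4, without dashes": max_key_bytes(text_without_dashes, "utf8mb4"),
    "ascii, with dashes": max_key_bytes(text_with_dashes, "ascii"),
    "ascii, without dashes": max_key_bytes(text_without_dashes, "ascii"),
    "binary": len(raw),
}

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

Under these assumptions the textual key needs at most 128 to 144 bytes with utf8mb4 and 32 to 36 bytes with ascii, while the binary form is always 16 bytes.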
Surely, storing uuid's in binary form would be preferable, if you want to go that route. See: http://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/ |
Yes @sisve, I'm aware that there are DBMS with native binary UUID support and I understand that it's possible to save the plain text UUID with a different collation. As I said, that was just a short summary of the problem. All of your points are addressed in the link I provided. The main point, however, is not that some DBMS lack a dedicated binary UUID data type. It's more that you have to arrange the UUID in a specific way to make it as close to sequential as possible so it can be indexed more efficiently. So you don't just throw a plain UUID at the DB, you have to adjust it to your needs first. And yes, there are things like stored procedures and indexed virtual columns. MySQL supports them, PostgreSQL, however, doesn't. And yes, you could argue that I should "just use a database that supports my application requirements". But the thing is that actually every DBMS does support it if you massage the data into the right shape. And maybe I use MySQL in production but SQLite for my tests. So while it's quite hard to achieve optimized binary UUID primary keys on a DB level, there are still two things to keep in mind:
So there is a way to achieve this regardless of the DBMS. It's quite complex, but it's possible. And as I also mentioned, the real complexity is not how to enable this for one model (I know how to hook into the creating callback...) but to actually make Eloquent aware of it, use it within relationships and eager loads, and automatically convert between the original UUID (for the user) and the optimized binary version (for the DB). Thanks @michaeldyrynda - how could I miss your package while googling for "laravel uuid"? :-D Yes, using a good old auto-incrementing key under the hood sure makes things a lot easier. I also read that post on the Percona blog you referred to, but the statement that the combination of auto-incrementing keys and UUIDs is more efficient than UUIDs alone kind of flew over my head (it's hidden in the comments). I'm not quite sure if I really get the point though. I'll cite (what I think is) the relevant part:
While yes, I get that the index becomes larger in size for UUIDs, but how much is "MUCH smaller" when using auto-incrementing keys? If that's like 50MB instead of 100MB for 1 million records it would be "MUCH smaller" but I don't think that it would really matter. But I'm not overly familiar with the internals of a DBMS and with how indexing works and what has to be taken into account. Other than that, how exactly does this auto-incrementing-with-uuid thing work? You still would have to reference the auto-incrementing ID as the foreign key on related table, no? If so, you maybe would finally end up in the same situation that you were trying to avoid by switching to UUID primary keys in the first place, like running out of IDs (not very likely in an 'average' project, but surely possible) or making replication / multiple databases easier to maintain. So yeah... The question remains: Should this be part of Laravel core? I don't want it to be a Jurassic Park thing where they "were so preoccupied with whether or not they could, they didn’t stop to think if they should". However, while it sure adds some complexity, it also seems to address a quite common use case. |
If you choose to store them as binary, and manipulate the bytes stored in them, make sure they copy the functionality of MySQL 8's UUID_TO_BIN() & BIN_TO_UUID() functions so that it would be a seamless transition/upgrade to it in the future.
Source: https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_uuid-to-bin |
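For reference, here is a minimal sketch of what mirroring those two functions could look like outside the database. It is written in Python for illustration, not as Laravel code, and the function names `uuid_to_bin` and `bin_to_uuid` are hypothetical helpers named after the MySQL functions. The swap logic follows the behavior described in the MySQL manual: with the swap flag set, the time-low and time-high groups (first and third groups of hex digits) trade places.

```python
import uuid


def uuid_to_bin(text: str, swap_flag: bool = False) -> bytes:
    """Convert a textual UUID to 16 bytes, optionally swapping the time groups."""
    raw = uuid.UUID(text).bytes
    if not swap_flag:
        return raw
    # Layout of raw: time_low (4 bytes), time_mid (2), time_hi (2), rest (8).
    return raw[6:8] + raw[4:6] + raw[0:4] + raw[8:]


def bin_to_uuid(raw: bytes, swap_flag: bool = False) -> str:
    """Inverse of uuid_to_bin; swap_flag must match the one used for encoding."""
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    if swap_flag:
        # Layout of raw here: time_hi (2), time_mid (2), time_low (4), rest (8).
        raw = raw[4:8] + raw[2:4] + raw[0:2] + raw[8:]
    return str(uuid.UUID(bytes=raw))


text = "6ccd780c-baba-1026-9564-5b8c656024db"
print(uuid_to_bin(text).hex().upper())
print(uuid_to_bin(text, True).hex().upper())
print(bin_to_uuid(uuid_to_bin(text, True), True))
```

Keeping the byte layout identical to the database functions means values written by application code and values written by SQL would stay interchangeable.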
@sisve Yes, I understand the concept and I know that those functions exist in MySql. I'm just asking if it would make sense for Laravel to include a DBMS-agnostic approach to this out of the box. |
I think this absolutely makes sense. All the current RDBMSs support storing binary UUIDs in some way, be it a specific UUID column a la postgres, or a binary column a la older MySQL. |
@michaeldyrynda Your package just converts it to bytes, not re-arrange the order to be incrementing, or did I overlook something? ramsey/uuid also has a way to re-arrange the uuid1 bytes to be incrementing with time using the |
Nope, just handles conversion to/from bytes and storing them as such in a (MySQL) database @barryvdh |
hope this will be integrated. I personally really need this for a project. |
We just created a package doing exactly what OP is looking for. It's not built into Laravel of course, but maybe it could be a start to solve @sauladam's and other's problem. https://github.com/spatie/laravel-binary-uuid The README goes into more detail on how to use the package with models, and also shows some benchmark results. Internally we're using Ramsey's UUID package, with the addition of the bit switching mechanism that @barryvdh added in that package. |
@brendt There's no indexes in your benchmarks except for the binary column. Isn't that a bit unfair? And what about using the ascii collation for the textual checks? |
There is still another problem. I've already moved my test project to use optimized binary UUIDs as the primary key, and some of my relations use this UUID format as well. If I don't cast the relation key to text at all, the relation is found. But once I add the cast for the binary UUID, no relation that depends on this key is found. After a little digging, I found that Laravel also casts the key value before creating the relation's binding parameter, which is why the UUID cast causes all relations to be null (not found). I'm also confused about why the relation key value should be cast before creating the relation parameter binding. |
@sisve, you can checkout this https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/ in this article, it's a real table with some other indexes exists, and also the file size, benchmark result. |
@sisve You're right, and I've changed the benchmarks accordingly. The same result can still be seen, albeit with a less extreme difference. The blogpost @steve3d mentioned also has more thorough benchmarks, the same result can be observed. The package I linked to was actually inspired by the results in that blogpost. |
I suggest that we split this issue into three parts;
Optimization is a broad term, and has a context. Sequential uuids are suboptimal if I am partitioning on the first part of the uuid, for example. Binary form is suboptimal if I have to keep database compatibility with something else. Percona's blog entry focuses on storage space, in the form of both data and index space. The last graph suggests that the larger index size is hitting some buffer/cache limits and causing I/O requests. We're also talking about millions of rows, where the MERGE_THRESHOLD is relevant but not declared. The MySQL blog also acknowledges this:
@brendt's benchmark focuses on execution times. No benchmark so far is focusing on concurrency. A sequential uuid (based on time) means that all your servers will be writing to the same database page/extent. Would this be a problem? I guess my point here is to convince you that "optimized uuids" are too vague, they aren't optimized for everything, and rename them to "sequential uuids". It also closer resembles integer's auto_increment flag which has a similar behavior; an incrementally increasing value. So, there's five scenarios we need to support
On a related note; wouldn't sequential uuids have similar performance boosts over the "old way" even in text form? |
Imho, the optimizing/re-arranging is already handled by ramsey/uuid in the form of a different Codec: https://github.com/ramsey/uuid/tree/master/src/Codec (it's called OrderedTimeCodec there). It also handles text <-> byte form conversions. So the only thing required is to wrap it in a package, which is what https://github.com/spatie/laravel-binary-uuid does. Is there anything that's not possible to do with that but would be possible in the core? Because that's the only thing we need to worry about here. |
We had to add a "hack" to be able to save real From our point of view, there's two things missing in the core:
All the other things required to add sequential UUIDs could be done on project or package level. And to answer some of @sisve 's questions. It's clear that sequential UUIDs only have a positive performance impact when working with a lot of data. The benchmarks I ran only showed a positive effect from ~300k-400k records or more. Everything below that actually makes it worse. |
I've found this "limitation" recently as well, but unfortunately Blueprint has already a The Let me talk about vendor implementations:
Usually the Internally, Laravel does map
It does make a lot sense if you consider the This could be improved by introducing a new I'm commenting this just for future reference on this subject. 👍 |
@paulofreitas I don't think I would support a rename of binary() type to blob() as this would probably have compatibility repercussions. However, the ability to be more specific about whether you want a fixed binary column, or a blob representation, is probably something that should exist, considering at least two of the engines support it, and the others can fallback to blobs. For UUIDs, I think we really need a specific uuid() column specifier. I think the use case is common enough, and the underlying storage different enough across RDBMS to warrant this. Internally, this type would map to the following:
This is more optimal than anything binary() or blob() could give us, and it could also be used to handle any conversions or optimisations required for that particular underlying engine - such as converting to binary from display or vice versa, or rearranging the UUID bytes for better sequential values. On a side note, database engines that don't have a dedicated UUID type in 2017 need a kick in the behind! |
I see an obvious problem in defaulting to binary for some engines; your code will no longer be portable. Where is the responsibility to cast the strings that the user has into binary values? Imagine a Using a binary field means that we're treating the values as opaque blobs that the user may use, but not easily view or modify. This works on the database level for primary and foreign keys, but it causes issues as soon as we move out of the database world and into the web. How would these values be converted to a uuid and used in routes? How would model binding take incoming values (large readable strings) and convert them back to binary? And how would the model binding know if they are normal binary, or should have the values swapped around to become sequential? I disagree with the binary fields for mysql+sqlite because the closest match in functionality is a char(36). Basic queries that work on real uuid types will also work on a char(36). You can select it, update it, filter on it, etc. There's also the issue with breaking existing code by changing the Now, a |
What about something like casts? The easiest for developers would be to be able to just work with textual UUIDs, and not worry about conversions at all.
I'd also opt for a separate method and not changing the |
Yes, casts sounds like a good idea. We would need two of them; one for the binary format, and one for the sequential format. We would also need to expose the casting logic somewhere so they can be called manually, for example when you use the DB facade that doesn't know about the configured casts rules. |
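To make the two-cast idea concrete, here is a minimal sketch in Python (not Laravel code). Every name in it (`BinaryUuidCast`, `SequentialUuidCast`, `CASTS`, `to_database_row`, and the attribute names) is hypothetical and only illustrates the shape: one cast per storage format, configured per attribute, and callable by hand where no cast configuration is available.

```python
import uuid


class BinaryUuidCast:
    """Hypothetical cast: text UUID <-> plain 16-byte value."""

    def to_database(self, text: str) -> bytes:
        return uuid.UUID(text).bytes

    def from_database(self, raw: bytes) -> str:
        return str(uuid.UUID(bytes=raw))


class SequentialUuidCast:
    """Hypothetical cast: text UUID <-> 16 bytes with the time fields reordered."""

    def to_database(self, text: str) -> bytes:
        raw = uuid.UUID(text).bytes
        # time_hi (2 bytes) + time_mid (2) + time_low (4) + remaining 8 bytes
        return raw[6:8] + raw[4:6] + raw[0:4] + raw[8:]

    def from_database(self, raw: bytes) -> str:
        restored = raw[4:8] + raw[2:4] + raw[0:2] + raw[8:]
        return str(uuid.UUID(bytes=restored))


# Per-attribute cast configuration, so one model can mix both formats.
CASTS = {"id": SequentialUuidCast(), "api_token": BinaryUuidCast()}


def to_database_row(attributes: dict) -> dict:
    """Apply the configured casts; attributes without a cast pass through."""
    return {
        name: CASTS[name].to_database(value) if name in CASTS else value
        for name, value in attributes.items()
    }


sample = "6ccd780c-baba-1026-9564-5b8c656024db"
row = to_database_row({"id": sample, "api_token": sample, "name": "example"})
print(row["id"].hex(), row["api_token"].hex())

# Manual use, e.g. when building a raw query that bypasses the cast rules.
manual_key = CASTS["id"].to_database(sample)
```

Because the casts are ordinary objects, the same conversion can be reused by raw queries, which is the manual exposure asked for above.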
So what's next? Who eventually decides how the problem must be solved, and who is allowed to make the changes in the core? |
@brendt Are we even sure that there needs to be changes in the core? Perhaps this entire thing can be implemented as a third-party package if we just PR the necessary extension points into the core. |
@sisve As far as I'm concerned, that would be as good a solution. From our point of view, the most simple solution is to be able to add custom What do you consider to be these extension points? |
Binary uuids values in general
I imagine that we need two separate casts/conversions, one for the straight binary<->text conversion, and one for the rearranged uuids that has been discussed. This has to be done at the "casting" layer since we can have models using both the "normal" and the "rearranged" uuids, so we cannot just change what Binary uuids as primary keys There's also the model binding; we would need to make sure that Model::getRouteKey() does use the text value (which it seems that it does by reading the code). We would also need to extend Model::resolveRouteBinding to encode the text value (from the route/querystring) into the database format. I believe all these things can be implemented as opt-in traits. |
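The route binding round trip described above can be sketched as follows. This is Python for illustration only; `UserRepository` and its in-memory dictionary stand in for a model and its table, and the method names merely echo `getRouteKey()` and `resolveRouteBinding()` rather than reproducing Laravel's implementation.

```python
import uuid
from typing import Optional


def encode_key(text: str) -> bytes:
    """Text UUID from a route -> binary value as stored in the database."""
    return uuid.UUID(text).bytes


def decode_key(raw: bytes) -> str:
    """Binary database value -> text UUID suitable for a URL."""
    return str(uuid.UUID(bytes=raw))


class UserRepository:
    """Hypothetical stand-in for a model whose primary key is a binary UUID."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def insert(self, text_id: str, name: str) -> None:
        key = encode_key(text_id)
        self._rows[key] = {"id": key, "name": name}

    def resolve_route_binding(self, value: str) -> Optional[dict]:
        # Encode the incoming text before it is used as a lookup key.
        try:
            key = encode_key(value)
        except ValueError:
            return None  # malformed UUID in the URL: treat as "not found"
        return self._rows.get(key)

    @staticmethod
    def get_route_key(row: dict) -> str:
        # Always expose the readable text form to the outside world.
        return decode_key(row["id"])


users = UserRepository()
users.insert("6ccd780c-baba-1026-9564-5b8c656024db", "example")
found = users.resolve_route_binding("6ccd780c-baba-1026-9564-5b8c656024db")
print(UserRepository.get_route_key(found))
```

The sketch uses the plain binary encoding; a rearranged encoding would slot into `encode_key` and `decode_key` without changing the binding logic.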
There is another problem to consider: what if someone wants to use the optimized UUID as a foreign key? Then the core needs to distinguish between converting the key to its text value and using the binary value when fetching relations. |
@steve3d Could you explain that scenario some more? I don't see how that differs from the casts described in the "Binary uuids values in general". |
here, for example, I have a with this setup, if you just return the text value of uuid in and this problem don't exists when the and there is a dirty solution for this. for getting the text value of Hope you understand the problem here. |
You lost me here. The text values aren't persisted, they are still binary values in the database. You should be able to add foreign keys as usual. Could you show some code that would expose the problem? Are you writing custom sql queries? Associating models? |
I wanted to use UUIDs as primary keys in a project and did some research on that. In short, here's what I found out (https://mariadb.com/kb/en/mariadb/guiduuid-performance/ explains it in more detail):
The problem: Using raw UUIDs as primary keys in tables can be challenging, because 1. they are not technically sequential, so INSERTing into an index means jumping around a lot, and 2. the 36 characters of a UUID result in a pretty big index footprint, especially in InnoDB where every secondary key also includes the primary key. The Laravel default of utf8mb4 encoding makes this even worse, making the 36-character string 146 bytes big.
The solution: UUID v1 is time-based, so some parts of it are roughly sequential. If we rearrange the UUID and put those parts at the front, we can allow for better indexing. And if we strip out the dashes and encode it to binary, we end up with 16 bytes instead of 146.
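The effect of the rearrangement on sort order can be shown with a small experiment. This is a sketch in Python, not Laravel code; `make_v1` and `reorder` are hypothetical helpers, and the timestamps are hand-picked so that the low 32 bits of the timestamp wrap around between the second and third UUID.

```python
import uuid


def make_v1(timestamp: int, node: int = 0x5B8C656024DB) -> uuid.UUID:
    """Build a version 1 UUID from a 60-bit timestamp (fixed clock sequence)."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, node))


def reorder(u: uuid.UUID) -> bytes:
    """Put the most significant time fields first: time_hi, time_mid, time_low."""
    raw = u.bytes
    return raw[6:8] + raw[4:6] + raw[0:4] + raw[8:]


# Three consecutive timestamps; time_low overflows on the last one.
base = (0x1E0 << 48) | (0x0001 << 32) | 0xFFFFFFFE
created = [make_v1(base + i) for i in range(3)]

by_raw = sorted(created, key=lambda u: u.bytes)
by_reordered = sorted(created, key=reorder)

# Position of each UUID in creation order after sorting.
print([u.time - base for u in by_raw])        # [2, 0, 1]
print([u.time - base for u in by_reordered])  # [0, 1, 2]
```

Sorting the raw bytes places the newest UUID first because its low timestamp bits wrapped to zero, while the reordered bytes sort in creation order, which is the property that keeps index inserts close together.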
I thought it would be awesome if Laravel / Eloquent offered this functionality out of the box. So I started hacking on it and realized that it's quite a heavy change. I want to query the model with the original UUID, but I want it to be optimized and binary-encoded automatically before going into the DB and then decoded and rearranged back to the original UUID when coming back from the DB as a model. So far, this works great for $model->find() but it gets trickier when it comes to relationships. But I already got it working for belongsTo() and eager loads.
So, my question especially towards @taylorotwell is: Is this something you would consider for Laravel and should I keep working on it in a way that makes a good PR or is there maybe a reason why it's not supported in Laravel?