Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion: Please support DataTypes.UUID.BINARY #9271

Open
omidkrad opened this issue Apr 5, 2018 · 14 comments

Comments

6 participants
@omidkrad
Copy link

commented Apr 5, 2018

From the documentation

The CHAR and STRING types expose the BINARY property

I think this is also needed for UUID data type so we can store it as BINARy(16) is MySQL.

for example:

sequelize.define('model',` {
  uuid: {
    type: DataTypes.UUID.BINARY,
    defaultValue: DataTypes.UUIDV4,
    primaryKey: true
  }
})

I could not find any way to map a UUID field to BINARY(16) on MySQL. DataTypes.UUID and DataType.STRING.BINARY give ER_DATA_TOO_LONG error code.

@bradennapier

This comment has been minimized.

Copy link

commented Dec 11, 2018

@see https://mysqlserverteam.com/mysql-8-0-uuid-support/

So after tinkering all day on this I got it working well but there are some tradeoffs. I personally didn't want to implement the UUID stuff myself which is the better choice in the long-run but some of this would still be needed anyway. (ended up doing this, see next post - im a sucker for doing something the second I say I don't want to :-))

I may be missing some important pieces still but its working with my db well thus far.

db.sql.define(
    MODEL_NAME,
    {
      id: {
        type: 'binary(16)',
        allowNull: false,
        primaryKey: true,
      },
      uuid: {
        type: new Sequelize.VIRTUAL(Sequelize.STRING, ['id']),
      },
...

then each model using it needs to have some hooks defined:

export async function afterCreate(instance) {
  if (!instance.get) return;
  const originalID = instance.get('id');
  const [value] = await this.sequelize.query('SELECT BIN_TO_UUID(:binary, true) as uuidString', {
    type: this.sequelize.QueryTypes.SELECT,
    replacements: { binary: originalID },
  });
  instance.uuid = value.uuidString;
}

export async function beforeValidate(instance) {
  const [value] = await this.sequelize.query('SELECT UUID_TO_BIN(UUID(), true) as uuidBinary', {
    type: this.sequelize.QueryTypes.SELECT,
  });
  instance.id = value.uuidBinary;
}

export function afterFind(instances) {
  if (Array.isArray(instances)) {
    return Promise.all(instances.map(instance => afterCreate.call(this, instance)));
  }
  return afterCreate.call(this, instances);
}

then instance.get({ plain: true }) yields -->

{ id: <Buffer 11 e8 fd 11 73 0b cf 3b a1 ce 02 42 ac 11 00 02>,
  uuid: '730bcf3b-fd11-11e8-a1ce-0242ac110002' }

Clearly implementing the mutation of the uuid value ourselves would be key for an internal implementation but it'd be a really good idea as long as that were done.

  1. IMPORTANT - UUID must use UUIDv1 NOT v4 as it will break perf big time
  2. Shuffling of the bytes so that the fastest changing bytes do not occur first is critical

@sushantdhiman can you see a clear path to implementing something along these lines official? The space saving on the database is hugely significantly in many cases - especially when you start using associations! Not to mention unique values across all database id's is nice to have for many reasons.

image


I did write this but not sure if its the proper way to do it on the binary front. Obviously its not doing anything at all really but whatever... im tired now :( heh

class BINARYC extends Sequelize.DataTypes.ABSTRACT {
  constructor(length = 16) {
    super();
    this.key = `binary(${length})`;
  }
}

function BINARY(length) {
  return new BINARYC(length);
}
@bradennapier

This comment has been minimized.

Copy link

commented Dec 11, 2018

Here is a version to convert the uuidv1 into a binary representation binary(16) which is shuffled so it doesn't destroy indexing performance. This should allow changing the above to no longer needing to rely on calling the sql functions.

It also provides a VIRTUAL key to capture the string value of the uuid easily.

Best part is that now it wont require v8.0.0 or a bunch of db calls to operate!

// db-uuid.js
import uuid from 'uuid/v1';

export function getStringUUID(buf) {
  return [
    buf.toString('hex', 4, 8),
    buf.toString('hex', 2, 4),
    buf.toString('hex', 0, 2),
    buf.toString('hex', 8, 10),
    buf.toString('hex', 10, 16),
  ].join('-');
}

export function getBinaryUUID(id) {
  const buf = Buffer.from(id.replace(/-/g, ''), 'hex');
  return Buffer.concat([buf.slice(6, 8), buf.slice(4, 6), buf.slice(0, 4), buf.slice(8, 16)]);
}

export function createUUID(_id) {
  const id = _id || uuid();
  const buf = getBinaryUUID(id);
  return {
    uuid: id,
    buffer: buf,
  };
}
// lets use same uuid from example above to confirm byte-for-byte matching
// with the official mysql:
/*
{ id: <Buffer 11 e8 fd 11 73 0b cf 3b a1 ce 02 42 ac 11 00 02>,
  uuid: '730bcf3b-fd11-11e8-a1ce-0242ac110002' }
*/
const id = '730bcf3b-fd11-11e8-a1ce-0242ac110002';
console.log(id);
const { buffer } = createUUID(id);
console.log(buffer);
console.log(getStringUUID(buffer));

/*
730bcf3b-fd11-11e8-a1ce-0242ac110002
<Buffer 11 e8 fd 11 73 0b cf 3b a1 ce 02 42 ac 11 00 02>
730bcf3b-fd11-11e8-a1ce-0242ac110002
Hooray!
*/

Now we can simplify our implementation a bit and now require the hooks to be so involved! We do still need to add the one beforeValidate so that we can create a uuid when we create a new instance.

import { createUUID } from './db-uuid';

export function beforeValidate(instance) {
  const uuid = createUUID();
  instance.setDataValue('id', uuid.buffer);
  instance.setDataValue('uuid', uuid.uuid);
}

However, now a standard getter can be added as a type to our model directly!

// db-uuid-type.js
import { getStringUUID } from './db-uuid';

function attachStringUUID(instance, binaryID, stringID) {
  const id = instance.getDataValue(binaryID);
  if (!id) return null;
  const uuid = getStringUUID(id);
  instance.setDataValue(stringID, uuid);
  return uuid;
}

export default (Sequelize, binaryID = 'id', stringID = 'uuid') => ({
  type: new Sequelize.DataTypes.VIRTUAL(Sequelize.DataTypes.STRING, [binaryID]),
  get() {
    return this.getDataValue(stringID) || attachStringUUID(this, binaryID, stringID);
  },
});

Finally we just import the db-uuid-type.js and add to any models:

     id: {
        type: 'binary(16)',
        allowNull: false,
        primaryKey: true,
      },
      uuid: createUUIDType(Sequelize, 'id', 'uuid'),

Of note is that many libraries are using v4 for their uuids and that is not ideal for primary keys https://github.com/srcagency/mongo-uuid

@bradennapier bradennapier referenced this issue Dec 21, 2018

Closed

Feature/binary UUID #10290

4 of 5 tasks complete
@JSteunou

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2018

For those struggling with uuid as binary I do this

import {isString, isBuffer} from 'lodash'
import {uuidV4Bytes, uuidToBytes, bytesToUuid} from '../utils/uuid'

// special MySQL
export function getUuidField(name) {
    return {
        type: 'BINARY(16)',
        set(value) {
            return this.setDataValue(name, isString(value) ? uuidToBytes(value) : value)
        },
        get() {
            const uuid = this.getDataValue(name)
            return isBuffer(uuid) ? bytesToUuid(uuid) : uuid
        },
    }
}
export function getIdField(name = 'id') {
    return {
        ...getUuidField(name),
        primaryKey: true,
        allowNull: false,
        defaultValue: uuidV4Bytes,
    }
}

Then using it like this

// inside model definition
    static fields() {
        return {
            id: getIdField(),
            projectId: {
                ...getUuidField('projectId'),
                allowNull: false,
            },
        }
    }

This is easy and works fine

@bradennapier

This comment has been minimized.

Copy link

commented Dec 27, 2018

Except it’s breaking your database badly and should NEVER be done by anyone which reads this.

Your DB will basically not work at any scale and it’s going to be NOTICEABLY slower. Page swap city.

DONT USE V4 UUID

If you want easy you can use the package I wrote ahead of the PR I wrote which implements it automatically for you.

https://github.com/odo-network/sequelize-binary-uuid

@justinkalland

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2018

V4 has a place and use, telling everyone to never use it is misleading. It will not break your database, and has legitimate use cases. Both v1 and v4 have their places. Both have pros and cons.

@bradennapier

This comment has been minimized.

Copy link

commented Dec 27, 2018

It does not have its place as a primary key. It’s randomness will absolutely cause problems with indexing.

It also makes no difference to shuffle the v4 uuid as it does absolutely nothing for the database what so ever. The whole point is to make it so that the fastest changing pieces of the v1 do not happen first

Let me be clear

100% of the time using a v4 uuid that is indexed - esp as private key is a mistake. 100% not 99.9%

Your performance will degrade massively over the methods defined. I’ve linked examples showing this. Please do yourself a favor and research.

@JSteunou

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2018

Did you wrote some benchmark by yourself to validate your say?

Plus I am not saying to everyone to use my method over yours, dont start to panic, I am just sharing mine to help and inspire other people. Not to promote uuidv4 but getters and setters over virtual field which I feel like overenginering for this case.

@bradennapier

This comment has been minimized.

Copy link

commented Dec 27, 2018

Dude... I’m not panicking it’s fact not opinion. Using v4 breaks your database when it scales and yes using getters and setters without virtual works and it’s what is done in my PR... so I don’t have any issue with that but using v4 uuid as a primary key is a fatal mistake.

It’s not MY research it’s the fact of the situation.

Uuidv4 is inherently random so swapping it around is literally pointless and makes zero sense.

The reason to swap the uuid around is because of how V1 uuid works in the first place... it’s going to change in its first characters faster than in its latter. However this causes a problem with indexing the uuid because they use the first characters to create their hash tables for lookups.

Therefore you run into a MASSIVE problem with random values... this is why primary keys by default generally are numbers which increment by one...

Read:

https://mysqlserverteam.com/mysql-8-0-uuid-support/ As originally linked or as posted as a reference for uuid v1 in my PR I’ve mentioned many times now, the article that MYSQL based their system on: https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

“Ordering” a uuid v4 is pointless since it’s random. So your v4 implementation would actually produce an exponentially worse performance than the unordered uuid in this article. This is because uuidv1 are still “somewhat” ordered, just not as well as when they are shuffled a bit, which allows us to get to same or better performance as bigint

@JSteunou

This comment has been minimized.

Copy link
Contributor

commented Dec 27, 2018

Why do you talk about swapping my uuidv4 ? I am not swapping it I am well aware that they are not like v1 on which we could use this trick.

Also like @justinkalland said above stop saying this will break db and fear fear fear. Both uuid have pros and cons and usage. V4 will not break db and v4 bring better universal unicity when using several machines and dbs over the world.

Everyone have different needs that are not yours. Peace out.

@bradennapier

This comment has been minimized.

Copy link

commented Dec 27, 2018

Ok well any engineer reading this, please read the the research linked so you don’t mistakingly make a massive mistake like this guy has and can’t admit to. I’m simply trying to make it clear how big of a deal this is. I do this for a living on massive databases that handle billions of inserts a day.

It’s a mistake and it’s been specifically mentioned as a mistake BY THE TEAMS THAT BUILD THE DATABASE ITSELF.

If I wasn’t mobile I’d link 10+ articles of people specifically saying what i am saying in stack overflow, mysql official blogs, and other sources but they also go over the performance concern here a bit as well:

https://mysqlserverteam.com/storing-uuid-values-in-mysql-tables/

The answer here goes over the technicals a bit better

https://dba.stackexchange.com/questions/100468/mysql-innodb-clustering-index-with-uuid

@bradennapier

This comment has been minimized.

Copy link

commented Dec 28, 2018

Although, funny enough - trending on GitHub right now is possibly a better answer: https://github.com/ulid/spec

I am still reviewing but from what I can see it has randomness while simultaneously being Lexicographically sortable which is why we shuffle the uuidv1

@rmullenix

This comment has been minimized.

Copy link

commented Dec 28, 2018

There is an argument for using UUIDv4 for clustered DBs, such as Cassandra, where non-collision is more important than indexibility. Who's to say that the UUID is being used for anything other than a unique PK?

If you primarily want to look things up using a UUID and you also care about time, stock UUIDv1 isn't much better, since by spec it mixes the temporal bits all around. See the following:

https://www.percona.com/blog/2014/12/19/store-uuid-optimized-way/

Also if you only have a few machines communicating with the DB, you run the risk of collisions due to a small pool of Node-Ids (i.e. based off of the MAC addresses of your web servers).

For my purposes, I use UUIDv1 with the multi-bit set and then a random number each time for my Node ID (consistent RFC 4122, section 4.1.6, paragraph 2 https://tools.ietf.org/html/rfc4122#section-4.1.6). I then rearrange the UUID to front-load the time portions, as mentioned in the Percona blog above, and storing it as a BINARY16.

@rmullenix

This comment has been minimized.

Copy link

commented Dec 28, 2018

@bradennapier Just saw that you referenced the same blog post as me. Sorry for the collision.

Still, the point about Node IDs stands and not everybody gets the chance to move to MySQL 8 (or use MySQL at all).

@bradennapier

This comment has been minimized.

Copy link

commented Jan 15, 2019

@rmullenix if you view the PR i had submitted and the various libraries you wills ee they resolve the issue already.

#10290
https://github.com/odo-network/sequelize-binary-uuid
https://github.com/odo-network/binary-uuid

This is where the argument came that the v1 was the way to do it as shuffling a v4 is pointless. .

In addition, this will work regardless of if you are on whatever version of SQL as it does not depend on the native functions but provides an identical effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.