-
So Matt King found a few existing approaches to this same problem:

The Sami Fayoumi approach is similar, trading a larger sequence for fewer workers:

Both approaches seem pretty good: the former allows for more parallelization (which might be better suited to Laravel Vapor), while the latter allows for more individual throughput (you could generate 32 unique IDs inside a process before having to wait for the next tick). I definitely think that jumping from milliseconds to seconds is maybe too much, but I wonder if we could trim a few bits from the timestamp if we used a more recent epoch.
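To make the trade-off concrete, here's a quick back-of-the-envelope in PHP. The bit splits below are hypothetical, not the actual numbers from either linked approach:

```php
// Back-of-the-envelope capacity for a given bit split. The splits used below
// are hypothetical examples, not the layouts from either linked approach.
function capacity(int $sequenceBits, int $workerBits): string
{
    $idsPerTickPerWorker = 2 ** $sequenceBits; // IDs one worker can mint per timestamp tick
    $maxWorkers = 2 ** $workerBits;            // workers that can mint concurrently

    return "{$idsPerTickPerWorker} IDs per tick per worker, across up to {$maxWorkers} workers";
}

echo capacity(12, 5) . PHP_EOL; // 4096 IDs per tick per worker, across up to 32 workers
echo capacity(5, 12) . PHP_EOL; // 32 IDs per tick per worker, across up to 4096 workers
```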
-
FWIW, I've been using https://github.com/kra8/laravel-snowflake for generating snowflake primary keys in apps. It supports a shorter 53-bit snowflake version. I'm not sure which 11 bits it's cutting, though. Actually, I think they're dropping the unused bit plus the data center and worker bits, so maybe that's a bad example, because you lose parallelization support. Personally, I think for most uses it makes sense to prioritize a longer timestamp and longer sequence over parallelization. At least for me, I figure it'll be 2038 long before I need hundreds of servers. Although I wonder if the answer isn't to just use a ULID instead of a snowflake.
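For what that trimming implies: if the 11 dropped bits really are the unused sign bit plus the ten datacenter/worker bits, what's left is just timestamp + sequence. A sketch of that assumption (not the library's documented internals):

```php
// If the 11 dropped bits are the unused sign bit (1) plus the datacenter (5)
// and worker (5) bits, all that's left is a 41-bit timestamp + 12-bit sequence:
function make53BitId(int $millisecondsSinceEpoch, int $sequence): int
{
    return ($millisecondsSinceEpoch << 12) | $sequence; // 41 + 12 = 53 bits
}

// The largest possible value is exactly JavaScript's Number.MAX_SAFE_INTEGER:
var_dump(make53BitId((2 ** 41) - 1, 4095) === (2 ** 53) - 1); // bool(true)
```

With no worker or datacenter bits, two processes generating in the same millisecond can collide, which is the parallelization loss mentioned above.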
-
I've had a similar problem before with CockroachDB and Livewire. CockroachDB uses random IDs instead of auto-incrementing ones, so they become large numbers that JavaScript doesn't like. Surely the best thing is to cast the IDs to strings via the model casts, and in the event you need one for some kind of functionality, you just grab the ID and convert it to an int?
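Something like this, I mean (a minimal sketch; `Order` is just a stand-in model name):

```php
use Illuminate\Database\Eloquent\Model;

class Order extends Model
{
    // Cast the big-integer primary key to a string on the way out, so it
    // survives JSON encoding and JavaScript without losing precision.
    protected $casts = [
        'id' => 'string',
    ];
}

// When you need the numeric form back (64-bit PHP handles it fine):
// $numericId = (int) $order->id;
```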
-
Not sure if this is relevant, but it looks like Laravel just pushed support for unique string primary keys. PR here:
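If it's the PR I think it is, usage ends up looking roughly like this (a sketch assuming the `HasUlids` concern; swap in `HasUuids` for UUIDs):

```php
use Illuminate\Database\Eloquent\Concerns\HasUlids;
use Illuminate\Database\Eloquent\Model;

class Article extends Model
{
    // Primary key becomes a 26-character, lexicographically sortable ULID
    // string instead of an auto-incrementing integer.
    use HasUlids;
}
```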
-
I posted this in the Verbs Discord, but I'm going to re-post here for more visibility…

An issue with using Snowflakes everywhere is that JavaScript doesn't handle 64-bit integers very well (only the first 53 bits are accurately stored). That means that if you try to use a snowflake as an integer in JS, things may break. The solution is to treat snowflakes as strings, which is how the default `jsonSerialize` behavior of `glhd/bits` works, but that doesn't help people who are echoing the IDs directly to their page… I have a potential solution… but it's pretty technical.
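To make the string-serialization point concrete, here's a minimal stand-in (not the actual `glhd/bits` class, just the same idea):

```php
final class FakeSnowflake implements JsonSerializable
{
    public function __construct(
        public readonly int $id,
    ) {}

    public function jsonSerialize(): string
    {
        // Emit the ID as a string so JSON.parse in the browser receives
        // "230927004409862144" intact rather than a rounded float.
        return (string) $this->id;
    }
}

echo json_encode(['id' => new FakeSnowflake(230927004409862144)]);
// {"id":"230927004409862144"}
```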
Snowflakes are made up of 6 parts:
The 12 bits of sequence let you generate 4096 snowflakes per millisecond, per worker, per datacenter. That's a lot of parallel capacity. Arguably more than anyone other than Twitter (the creator of the snowflake format) would need…
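For reference, here's how those fields sit in the conventional Twitter-style 64-bit layout (a sketch; the example ID is made up, and the field order follows the common convention rather than any particular library):

```php
// 1 unused sign bit | 41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence
$id = 230927004409862144; // made-up example ID

$sequence   = $id & 0xFFF;        // bits 0-11: up to 4096 IDs per tick
$worker     = ($id >> 12) & 0x1F; // bits 12-16: up to 32 workers
$datacenter = ($id >> 17) & 0x1F; // bits 17-21: up to 32 datacenters
$timestamp  = $id >> 22;          // bits 22-62: milliseconds since the custom epoch
```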
We could, theoretically, get a 53-bit version with some trade-offs. Traditional 32-bit signed Unix timestamps have about +68 years of capacity (that's why we're hitting the year 2038 problem: 1970 + 68). But we don't need a sign bit, since snowflake IDs only move forward.
That means that if we accept second accuracy, rather than millisecond accuracy, we can fit about 136 years into 32 bits. So if we do the following:

- Use 32 bits for a second-precision timestamp
- Use 4 bits for the datacenter ID
- Use 5 bits for the worker ID
- Keep the 12-bit sequence
That would let you generate 4096 unique snowflakes per second, per worker, per datacenter, and produce a 53-bit value that could be represented in JS.
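Here's a rough sketch of generating one of those, assuming the hypothetical 32/4/5/12 split (the epoch constant is an arbitrary placeholder, not a proposed default):

```php
// Hypothetical 32/4/5/12 layout: seconds | datacenter | worker | sequence.
const CUSTOM_EPOCH = 1_672_531_200; // 2023-01-01 00:00:00 UTC, an arbitrary choice

function makeSnowflake53(int $datacenter, int $worker, int $sequence): int
{
    $seconds = time() - CUSTOM_EPOCH; // fits in 32 bits for ~136 years

    return ($seconds << 21)      // bits 21-52: second-precision timestamp
        | ($datacenter << 17)    // bits 17-20: datacenter (0-15)
        | ($worker << 12)        // bits 12-16: worker (0-31)
        | $sequence;             // bits 0-11: sequence (0-4095)
}

echo makeSnowflake53(datacenter: 3, worker: 7, sequence: 0);
```

Filling every field to its maximum yields exactly 2^53 - 1, which is JavaScript's Number.MAX_SAFE_INTEGER, so the entire range round-trips through JS safely.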
We'd need to pick a set of defaults that would cover most decent-scale apps (I'm not sure that 32/4/5/12 is right… but it's a decent example).
Any thoughts?