Skip to content

GenId offers few algorithms to generate non-conflicting IDs (mostly for databases/workflows) without a central coordinator.

License

Notifications You must be signed in to change notification settings

stamov/GenId.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

90 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GenId

GenId offers few algorithms to generate mostly non-conflicting and time-ordered IDs (mostly for databases/workflows) without a central coordinator.

Project Status: Active Stable Dev Build Status Coverage Aqua Code Style: Blue ColPrac: Contributor Guide on Collaborative Practices for Community Packages current status

About

A tiny library making it easy to generate most of the UUID flavors zoo.

At the lower level it provides a facility to easy combine user defined bit-fields with different semantics (e.g. random numbers, machine id, timestamp etc.) inside Integers. Combining them allows to construct specific UUID generators/parsers in just few lines of code. It also implements widely used (de-)serialization schemes.

Finally it offers example implementations for the following specific UUID schemes used in the industry:

Background

In distributed and/or IOT systems, the latency for acquiring unique IDs (e.g. for primary/technical keys, sequences in databases, queue middleware etc.) between different nodes/threads and a single coordinator (database/service etc.) is sometimes higher than desirable. In such contexts Universally Unique IDentifiers (UUIDs) can be used as they offer some uniqueness guarantees across number of machines/threads without round-trips to a central authority.

Different flavors of UUIDs have different trade-offs around performance, security, number of bytes used, uniqueness guarantees, (de-)serialization choices etc.

Julia currently offers implementations of UUID v1, v4 and v5 (see UUIDs in the Standard Library). While these provide industry standard algorithms and representations of the IDs (see RFC 4122), they are not always ideal for usage in databases as they can introduce unwanted side effects like index fragmentation/write amplification or require some configuration of the clients generating them in advance.

There are number of new UUID proposals (see New UUID Formats) which try to address under different trade-offs some of these shortcomings. Below are few examples:

As well about some security constraints/implications:

Features of the library

  • Support for 64- and 128-bit UUIDs;
  • Small DSL for defining the structure & semantic of a UUID (no syntax/but data oriented through data definitions)
  • Support for fields representing widely used UUID components like machine id, random number, timestamp etc.;
  • Ability to declaratively combine them in a single Integer with custom offsets and bit lengths;
  • Custom implementations of Base 16 (Hex), Base 32, Crockford Base 32 and Base 64 text encoding schemes to allow for phonetic sorting for UUIDs having a timestamp component (e.g. to use the IDs as keys in a database);
  • The text encodings are produced/parsed back in big endian byte order;
  • Allows to get back field values from UUIDs;
  • Ability to use own text encoding dictionaries with declarative case sensitivity for the simple ones;
  • Currently all UUID examples mask the highest bit, to allow transparent storage in databases which don't have large unsigned integer types.

Usage

Add the package to your project

julia> Pkg.add("GenId")

and import it.

using GenId, Dates

Specific UUIDs implemented

Snowflake ID

See https://en.wikipedia.org/wiki/Snowflake_ID

# SnowflakeIdDefinition(epoch_start_dt::DateTime, machine_id::Int64)
# 41 bits timestamp (ms), 10 bits machine id, 12 bits sequence numbers per machine
julia> iddef = SnowflakeIdDefinition(DateTime(2020, 1, 1, 0, 0, 0, 0), 1)
...

julia> iddef.name == :SnowflakeIdDefinition
true

julia> tsid_generate(iddef)
489485826766409729

julia> tsid_generate_string(iddef)
"DJR0RGDG0401"

# en-/decoded using Crockford Base 32
julia> tsid_to_string(iddef, 489485826766409729)
"DJR0RGDG0401"

julia> tsid_int_from_string(iddef, "DJR0RGDG0401")
489485826766409729

Firebase PushID

See https://github.com/arturictus/firebase_pushid

Uses modified Base 64 text encoding.

# FirebasePushIdDefinition()
# 48 bits timestamp (ms), 72 randomness
julia> iddef = FirebasePushIdDefinition()
...

julia> iddef.name == :FirebasePushIdDefinition
true

julia> tsid_generate(iddef)
301430602692632926610578560781911544

julia> tsid_generate_string(iddef)
"EWsj5l65EXH2G1Qfc0Nu"

julia> tsid_to_string(iddef, 301430602692632926610578560781911544)
"EWsj5l65EXH2G1Qfc0Nu"

julia> tsid_int_from_string(iddef, "EWsj5l65EXH2G1Qfc0Nu")
301430602692632926610578560781911544

Generic 64-bit UUID

Define an UUID structure

# The machine id for the current process unique in the infrastructure for the specific application (e.g. machine id in a cluster or VPC etc.).
# Can come from a configuration file, environment variable, last digits of an IP address etc.
machine_id = 1 

# the actual UUID definition
iddef = TsIdDefinition(
    # Data type used for storage of the ID
    Int64; 
    # Number of bits used for the timestamp section.
    bits_time=41, 
    # Number of bits used for the machine section.
    bits_group_1=10, 
    # Number of bits for the tail section.
    # Can be a random number or a local machine/thread specific sequence.
    bits_tail=12, 
    # Increment tail bits globally (independent of thread ids) for the node (machine/server)
    tail_algorithm=:machine_increment,
    # Use group_1 as the machine_id
    group_1=machine_id, 
    # Start of the epoch for this UUID scheme.
    # Time before that can't be represented.
    epoch_start_dt=DateTime(2020, 1, 1, 0, 0, 0, 0), 
    # End of the epoch for this UUID scheme.
    # Time after that can't be represented.
    epoch_end_dt=DateTime(2070, 12, 31, 23, 59, 59, 999)
)
# The sum of the desired bits must match the word size of the specified data type (e.g. 41+10+12=63, which (currently) are the number of bits used in a Int64 type).
# Start and end of the epoch of the ID must fit in the desired number of bits for time.

# or

iddef = SnowflakeIdDefinition(
  # Start of the epoch for this UUID scheme.
  SOME_EPOCH_START_2020, 
  # the machine_id
  1
)

Generate an ID using the ID definition:

julia> tsid_generate(iddef)
489485826766409729

# the ID is produced in the integer type of desired size
julia> typeof(489485826766409729)
Int64

And convert it if necessary to a used friendly text:

julia> tsid_to_string(489485826766409729)
"DJR0RGDG0401"

Or use a lower level function to customize the output:

julia> crockford32_encode_int64(489485826766409729, with_checksum=true)
"DJR0RGDG04014"

Once you have an ID, you can extract back its components:

julia> tsid_timestamp(iddef, 489485826766409729)
2023-09-12T17:21:55.308

julia> typeof(tsid_timestamp(iddef, 489485826766409729))
DateTime

julia> tsid_machine_id(iddef, 489485826766409729)
1

julia> tsid_machine_tail(iddef, 489485826766409729)
1

One can also decode it from a text representation:

julia> tsid_from_string(iddef, "DJR0RGDG0401")
489485826766409729

julia> crockford32_decode_int64("DJR0RGDG0401")
489485826766409729

julia> crockford32_decode_int64("DJR0-RGDG-0401")
489485826766409729

julia> crockford32_decode_int64("DJR0-RGDG-0401-4", with_checksum=true)
489485826766409729

FAQ

What is the status of the package?

Used in production.

Few unpolished nuances around (un-)signed integers and (un-)signed fields at first position.

Why variations as Ints instead of using wrapper types?

A design choice and not a necessity, between trade-offs at this moment.

A wrapper type would help distinguish between UUIDs and other integer in an application which is useful. Lack of a wrapper allows for transparent passing around UUIDs between an application and databases/drivers without explicit (de-)serialization, while errors around UUIDs used as keys are enought profound for a system, to discover them rather early then late. But, a wrapper type is planned, not just for higher type safety, but also for easier support of larger than UInt128 UUIDs (e.g. KSUID)

Why modified Base 32/64 encoding?

Stock Base 32/64 are not correctly sortable under standard ASCII or UTF variants (even under big endian schemes). The library uses encodings where at first come ASCII numbers, then capital letters, then small letters and finally punctuation characters, which allows for lexicographic sorting of encoded strings. E.g. in a standard (as per RFC 4648 and earlier RFC 3548), characters in the encoding table of the Base 64 encoding are ordered like "ABCDE....abcde...01223...+/", while we use "0123...ABCD...abcd...+-", which is in line with integer codes in ASCII/UTF variants.

Why Crockford Base 32 encoding?

  • More human readable and less error prone to dictation than some others (e.g. Base32, Base64, Base58 etc.), while still compressing a bit over Hex encoding for example (each character in Crockford Base 32 corresponds to 5 bits of input);
  • Simple, efficient;
  • Support in other languages (see Crockford 32 on Github).

Future plans

  • Add a wrapper type, which will allow for:
    • Typed UUIDs instead of flavors of Ints only;
    • Compile UUID definitions to minimal set of bit-shifts for calculations and (de-)encoding

License

This library is Open Source software released under the LGPL 3.0+ license.


Enjoy! 😃

About

GenId offers few algorithms to generate non-conflicting IDs (mostly for databases/workflows) without a central coordinator.

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages