Commit 34ed1a2

Add support for UUID version 7
Although the specification for UUIDv7 is still in draft, the UUIDv7 algorithm has been relatively stable as it progresses to completion.

Version 7 UUIDs can be very useful because they are lexicographically sortable, which can improve, for example, database index locality. See section 6.10 of the draft specification for further explanation:

  https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/

The specification allows up to 12 bits of extra timestamp precision, to make UUID generation closer to monotonically increasing. This provides between 1ms and ~244ns of timestamp precision. At the cost of some code complexity and a small performance penalty, a kwarg may specify any precision between 0 and 12 extra bits. Any stronger guarantees of monotonicity have considerably larger tradeoffs, so nothing more is implemented. This limitation is documented.

Ruby issue: https://bugs.ruby-lang.org/issues/19735
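The precision figures in the message follow from splitting one millisecond into 2**extra_timestamp_bits ticks, and each extra timestamp bit replaces one of the 74 random bits mentioned in the doc comment. A minimal sketch of that arithmetic is below; the helper names `uuid7_precision_ns` and `uuid7_random_bits` are hypothetical and are not part of this commit.

```ruby
# Hypothetical helpers (not part of the commit) that reproduce the
# precision arithmetic described in the commit message.

# Timestamp resolution in nanoseconds for a given number of extra bits.
# One millisecond (1_000_000 ns) is divided into 2**bits ticks.
def uuid7_precision_ns(extra_timestamp_bits)
  unless (0..12).cover?(extra_timestamp_bits)
    raise ArgumentError, "extra_timestamp_bits must be in 0..12"
  end
  1_000_000.fdiv(1 << extra_timestamp_bits)
end

# Random bits left in the UUID: 128 total, minus 48 timestamp bits,
# 4 version bits, 2 variant bits, and the extra timestamp bits.
def uuid7_random_bits(extra_timestamp_bits)
  128 - 48 - 4 - 2 - extra_timestamp_bits
end

[0, 4, 8, 12].each do |bits|
  puts format("%2d extra bits: %10.2f ns per tick, %d random bits",
              bits, uuid7_precision_ns(bits), uuid7_random_bits(bits))
end
```

With 12 extra bits this gives roughly 244 ns per tick and 62 random bits, which agrees with the numbers in the documentation added by the commit.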
1 parent 71d71db commit 34ed1a2

2 files changed: +167 -0 lines changed

lib/random/formatter.rb

Lines changed: 119 additions & 0 deletions
@@ -174,6 +174,125 @@ def uuid
     "%08x-%04x-%04x-%04x-%04x%08x" % ary
   end

+  alias uuid_v4 uuid
+
+  # Generate a random v7 UUID (Universally Unique IDentifier).
+  #
+  #   require 'random/formatter'
+  #
+  #   Random.uuid_v7 # => "0188d4c3-1311-7f96-85c7-242a7aa58f1e"
+  #   Random.uuid_v7 # => "0188d4c3-16fe-744f-86af-38fa04c62bb5"
+  #   Random.uuid_v7 # => "0188d4c3-1af8-764f-b049-c204ce0afa23"
+  #   Random.uuid_v7 # => "0188d4c3-1e74-7085-b14f-ef6415dc6f31"
+  #   #                    |<--sorted-->| |<----- random ---->|
+  #
+  #   # or
+  #   prng = Random.new
+  #   prng.uuid_v7 # => "0188ca51-5e72-7950-a11d-def7ff977c98"
+  #
+  # The version 7 UUID starts with the least significant 48 bits of a 64 bit
+  # Unix timestamp (milliseconds since the epoch) and fills the remaining bits
+  # with random data, excluding the version and variant bits.
+  #
+  # This allows version 7 UUIDs to be sorted by creation time.  Time ordered
+  # UUIDs can be used for better database index locality of newly inserted
+  # records, which may have a significant performance benefit compared to random
+  # data inserts.
+  #
+  # The result contains 74 random bits (9.25 random bytes).
+  #
+  # Note that this method cannot be made reproducible with Kernel#srand, which
+  # can only affect the random bits.  The sorted bits will still be based on
+  # Process.clock_gettime.
+  #
+  # See draft-ietf-uuidrev-rfc4122bis[https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/]
+  # for details of UUIDv7.
+  #
+  # ==== Monotonicity
+  #
+  # UUIDv7 has millisecond precision by default, so multiple UUIDs created
+  # within the same millisecond are not issued in monotonically increasing
+  # order.  To create UUIDs that are time-ordered with sub-millisecond
+  # precision, up to 12 bits of additional timestamp may be added with
+  # +extra_timestamp_bits+.  The extra timestamp precision comes at the expense
+  # of random bits.  Setting <tt>extra_timestamp_bits: 12</tt> provides ~244ns
+  # of precision, but only 62 random bits (7.75 random bytes).
+  #
+  #   prng = Random.new
+  #   Array.new(4) { prng.uuid_v7(extra_timestamp_bits: 12) }
+  #   # =>
+  #   ["0188d4c7-13da-74f9-8b53-22a786ffdd5a",
+  #    "0188d4c7-13da-753b-83a5-7fb9b2afaeea",
+  #    "0188d4c7-13da-754a-88ea-ac0baeedd8db",
+  #    "0188d4c7-13da-7557-83e1-7cad9cda0d8d"]
+  #   # |<--- sorted --->| |<-- random --->|
+  #
+  #   Array.new(4) { prng.uuid_v7(extra_timestamp_bits: 8) }
+  #   # =>
+  #   ["0188d4c7-3333-7a95-850a-de6edb858f7e",
+  #    "0188d4c7-3333-7ae8-842e-bc3a8b7d0cf9", # <- out of order
+  #    "0188d4c7-3333-7ae2-995a-9f135dc44ead", # <- out of order
+  #    "0188d4c7-3333-7af9-87c3-8f612edac82e"]
+  #   # |<--- sorted -->||<---- random --->|
+  #
+  # Any rollbacks of the system clock will break monotonicity.  UUIDv7 is based
+  # on UTC, which excludes leap seconds and can roll back the clock.  To avoid
+  # this, the system clock can synchronize with an NTP server configured to use
+  # a "leap smear" approach.  NTP or PTP will also be needed to synchronize
+  # across distributed nodes.
+  #
+  # Counters and other mechanisms for stronger guarantees of monotonicity are
+  # not implemented.  Applications with stricter requirements should follow
+  # {Section 6.2}[https://www.ietf.org/archive/id/draft-ietf-uuidrev-rfc4122bis-07.html#monotonicity_counters]
+  # of the specification.
+  #
+  def uuid_v7(extra_timestamp_bits: 0)
+    case (extra_timestamp_bits = Integer(extra_timestamp_bits))
+    when 0 # min timestamp precision
+      ms = Process.clock_gettime(Process::CLOCK_REALTIME, :millisecond)
+      rand = random_bytes(10)
+      rand.setbyte(0, rand.getbyte(0) & 0x0f | 0x70) # version
+      rand.setbyte(2, rand.getbyte(2) & 0x3f | 0x80) # variant
+      "%08x-%04x-%s" % [
+        (ms & 0x0000_ffff_ffff_0000) >> 16,
+        (ms & 0x0000_0000_0000_ffff),
+        rand.unpack("H4H4H12").join("-")
+      ]
+
+    when 12 # max timestamp precision
+      ms, ns = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
+        .divmod(1_000_000)
+      extra_bits = ns * 4096 / 1_000_000
+      rand = random_bytes(8)
+      rand.setbyte(0, rand.getbyte(0) & 0x3f | 0x80) # variant
+      "%08x-%04x-7%03x-%s" % [
+        (ms & 0x0000_ffff_ffff_0000) >> 16,
+        (ms & 0x0000_0000_0000_ffff),
+        extra_bits,
+        rand.unpack("H4H12").join("-")
+      ]
+
+    when (0..12) # the generic version is slower than the special cases above
+      rand_a, rand_b1, rand_b2, rand_b3 = random_bytes(10).unpack("nnnN")
+      rand_mask_bits = 12 - extra_timestamp_bits
+      ms, ns = Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
+        .divmod(1_000_000)
+      "%08x-%04x-%04x-%04x-%04x%08x" % [
+        (ms & 0x0000_ffff_ffff_0000) >> 16,
+        (ms & 0x0000_0000_0000_ffff),
+        0x7000 |
+          ((ns * (1 << extra_timestamp_bits) / 1_000_000) << rand_mask_bits) |
+          rand_a & ((1 << rand_mask_bits) - 1),
+        0x8000 | (rand_b1 & 0x3fff),
+        rand_b2,
+        rand_b3
+      ]
+
+    else
+      raise ArgumentError, "extra_timestamp_bits must be in 0..12"
+    end
+  end
+
   private def gen_random(n)
     self.bytes(n)
   end
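Because the first 48 bits of a version 7 UUID are the Unix timestamp in milliseconds, the creation time can be read back from the string form. The following is a minimal sketch using the example values from the doc comment above; `uuid7_unix_ms` and `uuid7_time` are hypothetical helpers, not methods added by this commit, and they ignore any extra timestamp bits.

```ruby
# Hypothetical helpers (not part of the commit): recover the embedded
# millisecond timestamp from the string form of a version 7 UUID.

# Returns the 48-bit Unix timestamp in milliseconds.
def uuid7_unix_ms(uuid)
  hex = uuid.delete("-")
  # 12 timestamp digits, version nibble 7, 3 digits, variant nibble
  # (8, 9, a or b), then 15 more digits: 32 hex digits in total.
  unless hex.match?(/\A\h{12}7\h{3}[89ab]\h{15}\z/i)
    raise ArgumentError, "not a version 7 UUID: #{uuid.inspect}"
  end
  hex[0, 12].to_i(16)
end

# Returns the creation time as a UTC Time with millisecond precision.
def uuid7_time(uuid)
  Time.at(Rational(uuid7_unix_ms(uuid), 1000)).utc
end

# Example values copied from the doc comment in this commit.
uuids = [
  "0188d4c3-1311-7f96-85c7-242a7aa58f1e",
  "0188d4c3-16fe-744f-86af-38fa04c62bb5",
  "0188d4c3-1af8-764f-b049-c204ce0afa23",
  "0188d4c3-1e74-7085-b14f-ef6415dc6f31",
]

uuids.each { |uuid| puts "#{uuid} -> #{uuid7_time(uuid).iso8601(3)}" }

# Lexicographic order of the strings matches timestamp order.
puts uuids.sort == uuids.sort_by { |uuid| uuid7_unix_ms(uuid) }
```

The last line illustrates the property the commit message relies on: sorting the strings is equivalent to sorting by creation time, at least down to the millisecond.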

test/ruby/test_random_formatter.rb

Lines changed: 48 additions & 0 deletions
@@ -75,6 +75,54 @@ def test_uuid
     assert_match(/\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/, uuid)
   end

+  def test_uuid_v7
+    t1 = current_uuid7_time
+    uuid = @it.uuid_v7
+    t3 = current_uuid7_time
+
+    assert_match(/\A\h{8}-\h{4}-7\h{3}-[89ab]\h{3}-\h{12}\z/, uuid)
+
+    t2 = get_uuid7_time(uuid)
+    assert_operator(t1, :<=, t2)
+    assert_operator(t2, :<=, t3)
+  end
+
+  def test_uuid_v7_extra_timestamp_bits
+    0.upto(12) do |extra_timestamp_bits|
+      t1 = current_uuid7_time extra_timestamp_bits: extra_timestamp_bits
+      uuid = @it.uuid_v7 extra_timestamp_bits: extra_timestamp_bits
+      t3 = current_uuid7_time extra_timestamp_bits: extra_timestamp_bits
+
+      assert_match(/\A\h{8}-\h{4}-7\h{3}-[89ab]\h{3}-\h{12}\z/, uuid)
+
+      t2 = get_uuid7_time uuid, extra_timestamp_bits: extra_timestamp_bits
+      assert_operator(t1, :<=, t2)
+      assert_operator(t2, :<=, t3)
+    end
+  end
+
+  # It would be nice to simply use Time#floor here.  But that is problematic
+  # due to the difference between decimal vs binary fractions.
+  def current_uuid7_time(extra_timestamp_bits: 0)
+    denominator = (1 << extra_timestamp_bits).to_r
+    Process.clock_gettime(Process::CLOCK_REALTIME, :nanosecond)
+      .then {|ns| ((ns / 1_000_000r) * denominator).floor / denominator }
+      .then {|ms| Time.at(ms / 1000r, in: "+00:00") }
+  end
+
+  def get_uuid7_time(uuid, extra_timestamp_bits: 0)
+    denominator     = (1 << extra_timestamp_bits) * 1000r
+    extra_chars     = extra_timestamp_bits / 4
+    last_char_bits  = extra_timestamp_bits % 4
+    extra_chars    += 1 if last_char_bits != 0
+    timestamp_re    = /\A(\h{8})-(\h{4})-7(\h{#{extra_chars}})/
+    timestamp_chars = uuid.match(timestamp_re).captures.join
+    timestamp       = timestamp_chars.to_i(16)
+    timestamp     >>= 4 - last_char_bits unless last_char_bits == 0
+    timestamp      /= denominator
+    Time.at timestamp, in: "+00:00"
+  end
+
   def test_alphanumeric
     65.times do |n|
       an = @it.alphanumeric(n)
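The comment on `current_uuid7_time` notes that Time#floor is unsuitable because it rounds to decimal digits, while the extra timestamp bits divide a millisecond into binary fractions (1/2, 1/4, ... 1/4096). A minimal sketch of the flooring step, isolated from the clock so it can be checked with fixed inputs, is below; `floor_to_uuid7_tick` is a hypothetical name and the function simply mirrors the Rational arithmetic used in the test helper.

```ruby
# Hypothetical helper (not part of the commit): floor a nanosecond count to
# the nearest UUIDv7 tick, returning exact milliseconds as a Rational.
# A tick is 1 / 2**extra_timestamp_bits of a millisecond.
def floor_to_uuid7_tick(ns, extra_timestamp_bits)
  denominator = (1 << extra_timestamp_bits).to_r
  ((ns / 1_000_000r) * denominator).floor / denominator
end

# 1.9999 ms floors to 1 ms when no extra bits are used.
p floor_to_uuid7_tick(1_999_900, 0)

# With 1 extra bit the tick is 0.5 ms, so 1.75 ms floors to 3/2 ms.
p floor_to_uuid7_tick(1_750_000, 1)

# With 12 extra bits the tick is 1/4096 ms (about 244 ns).
p floor_to_uuid7_tick(1_000_300, 12)
```

Using Rational keeps every intermediate value exact, so comparing the floored clock reading against the value decoded from a UUID does not suffer from floating-point rounding.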
