
Text ULID parsing squashes high bits #9

Closed
akx opened this issue May 24, 2017 · 4 comments


akx commented May 24, 2017

This issue was discovered by @larsyencken in valohai/ulid2#4; ulid2 uses the same decoding code as this library. Quoting that issue:

We discovered this problem by accident, when we realised that some ULIDs (very far-future ones) do not appear to be time-ordered.
For example, ULIDs that start with 0, 8, G or R are mapped to the same place.
It looks like the parsing of the first character has a cycle in it, instead of generating a new sequence of binary ULIDs.

I'm wondering whether this is a bug or a limitation of the encoding.

The same issue is reproducible using this library, too:

package main

import (
	"fmt"

	"github.com/oklog/ulid"
)

func main() {
	// All four strings decode to the same (all-zero) ULID because the two
	// excess high bits of the first character are silently discarded.
	fmt.Println(ulid.MustParse("00000000000000000000000000"))
	fmt.Println(ulid.MustParse("80000000000000000000000000"))
	fmt.Println(ulid.MustParse("G0000000000000000000000000"))
	fmt.Println(ulid.MustParse("R0000000000000000000000000"))
}

outputs

00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000

ghost commented Jul 20, 2017

As per specification:

Won't run out of space till the year 10895 AD

What's the timestamp you're encoding?

@larsyencken

After some discussion in my team, we realised that the string representation contains 130 bits (5 bits/character * 26 characters), whereas the binary form is 128 bits. This means there are two bits of redundancy in the first character.

We encountered this by using a library that randomly generated string ULIDs without regard for the current timestamp.

In my view, a ULID with a first character that is not between 0 and 7 should be invalid, since it encodes bits that will be ignored in the transition to binary. I'll file an issue with the spec and ask for it to be mentioned explicitly.
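
To illustrate the constraint, here is a minimal Go sketch; withinULIDRange is a hypothetical helper, not part of this library's API:

package main

import (
	"fmt"
	"strings"
)

// withinULIDRange (hypothetical) checks only the point made above: the first
// character must be '0'..'7', because 26 chars * 5 bits = 130 bits while a
// ULID holds 128, leaving two excess bits that must stay unset.
func withinULIDRange(s string) bool {
	return len(s) == 26 && strings.IndexByte("01234567", s[0]) >= 0
}

func main() {
	fmt.Println(withinULIDRange("7ZZZZZZZZZZZZZZZZZZZZZZZZZ")) // true
	fmt.Println(withinULIDRange("80000000000000000000000000")) // false: high bits overflow
}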


peterbourgon commented Nov 28, 2017

@alizain Do you have opinions on this issue?

Edit: Oh, I see you wrote in the README

Technically, a 26-character Base32 encoded string can contain 130 bits of information, whereas a ULID must only contain 128 bits. Therefore, the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, which corresponds to an epoch time of 281474976710655 or 2 ^ 48 - 1.

Any attempt to decode or encode a ULID larger than this should be rejected by all implementations, to prevent overflow bugs.

Which I guess means we need to make a fix.
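
For reference, a minimal caller-side sketch of the check that README passage implies; parseStrict is a hypothetical wrapper, not something this library exposes:

package main

import (
	"fmt"

	"github.com/oklog/ulid"
)

// parseStrict (hypothetical) rejects encodings above 7ZZZZZZZZZZZZZZZZZZZZZZZZZ
// instead of letting the excess high bits be silently dropped.
func parseStrict(s string) (ulid.ULID, error) {
	if len(s) == ulid.EncodedSize && s[0] > '7' {
		return ulid.ULID{}, fmt.Errorf("ulid: %q overflows the largest valid ULID", s)
	}
	return ulid.Parse(s)
}

func main() {
	if _, err := parseStrict("80000000000000000000000000"); err != nil {
		fmt.Println(err) // rejected rather than squashed to all zeroes
	}
}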


alizain commented Nov 28, 2017

Which I guess means we need to make a fix.

Yup ☺️

fancyweb added a commit to fancyweb/ulid that referenced this issue Jan 27, 2021
From what I understood in my tests and in oklog#9, the max possible base32 timestamp is `7ZZZZZZZZZ`. Consequently, the current max year in the README is wrong.
peterbourgon pushed a commit that referenced this issue Jan 27, 2021
From what I understood in my tests and in #9, the max possible base32 timestamp is `7ZZZZZZZZZ`. Consequently, the current max year in the README is wrong.