
Text ULID parsing squashes high bits #9

Closed
akx opened this issue May 24, 2017 · 4 comments


akx commented May 24, 2017

This issue was discovered by @larsyencken in valohai/ulid2#4; ulid2 uses the same decoding code as this library. Quoting that issue:

We discovered this problem by accident, when we realised that some ULIDs (very far-future ones) do not appear to be time-ordered.
For example, ULIDs that start with 0, 8, G or R are mapped to the same place.
It looks like the parsing of the first character has a cycle in it, instead of generating a new sequence of binary ULIDs.

I'm wondering whether this is a bug or a limitation of the encoding.

The same issue is reproducible using this library, too:

package main

import (
	"fmt"

	"github.com/oklog/ulid"
)

func main() {
	// All four strings decode to the same (all-zero) ULID because the two
	// excess high bits of the first character are silently discarded.
	fmt.Println(ulid.MustParse("00000000000000000000000000"))
	fmt.Println(ulid.MustParse("80000000000000000000000000"))
	fmt.Println(ulid.MustParse("G0000000000000000000000000"))
	fmt.Println(ulid.MustParse("R0000000000000000000000000"))
}

outputs

00000000000000000000000000
00000000000000000000000000
00000000000000000000000000
00000000000000000000000000

ghost commented Jul 20, 2017

As per specification:

Won't run out of space till the year 10895 AD

What's the timestamp you're encoding?

@larsyencken

After some discussion in my team, we realised that the string representation contains 130 bits (5 bits/character * 26 characters), whereas the binary form is 128 bits. This means there are two bits of redundancy in the first character.

We encountered this by using a library that randomly generated string ULIDs without regard for the current timestamp.

In my view, a ULID with a first character that is not between 0 and 7 should be invalid, since it encodes bits that will be ignored in the transition to binary. I'll file an issue with the spec and ask for it to be mentioned explicitly.
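
To illustrate the constraint, here is a minimal Go sketch; withinULIDRange is a hypothetical helper, not part of this library's API:

package main

import (
	"fmt"
	"strings"
)

// withinULIDRange (hypothetical) checks only the point made above: the first
// character must be '0'..'7', because 26 chars * 5 bits = 130 bits while a
// ULID holds 128, leaving two excess bits that must stay unset.
func withinULIDRange(s string) bool {
	return len(s) == 26 && strings.IndexByte("01234567", s[0]) >= 0
}

func main() {
	fmt.Println(withinULIDRange("7ZZZZZZZZZZZZZZZZZZZZZZZZZ")) // true
	fmt.Println(withinULIDRange("80000000000000000000000000")) // false: high bits overflow
}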


peterbourgon commented Nov 28, 2017

@alizain Do you have opinions on this issue?

Edit: Oh, I see you wrote in the README

Technically, a 26-character Base32 encoded string can contain 130 bits of information, whereas a ULID must only contain 128 bits. Therefore, the largest valid ULID encoded in Base32 is 7ZZZZZZZZZZZZZZZZZZZZZZZZZ, which corresponds to an epoch time of 281474976710655 or 2 ^ 48 - 1.

Any attempt to decode or encode a ULID larger than this should be rejected by all implementations, to prevent overflow bugs.

Which I guess means we need to make a fix.
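
For reference, a minimal caller-side sketch of the check that README passage implies; parseStrict is a hypothetical wrapper, not something this library exposes:

package main

import (
	"fmt"

	"github.com/oklog/ulid"
)

// parseStrict (hypothetical) rejects encodings above 7ZZZZZZZZZZZZZZZZZZZZZZZZZ
// instead of letting the excess high bits be silently dropped.
func parseStrict(s string) (ulid.ULID, error) {
	if len(s) == ulid.EncodedSize && s[0] > '7' {
		return ulid.ULID{}, fmt.Errorf("ulid: %q overflows the largest valid ULID", s)
	}
	return ulid.Parse(s)
}

func main() {
	if _, err := parseStrict("80000000000000000000000000"); err != nil {
		fmt.Println(err) // rejected rather than squashed to all zeroes
	}
}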


alizain commented Nov 28, 2017

Which I guess means we need to make a fix.

Yup ☺️

fancyweb added a commit to fancyweb/ulid that referenced this issue Jan 27, 2021
From what I understood in my tests and in oklog#9, the max possible base32 timestamp is `7ZZZZZZZZZ`. Consequently, the current max year in the README is wrong.
peterbourgon pushed a commit that referenced this issue Jan 27, 2021
From what I understood in my tests and in #9, the max possible base32 timestamp is `7ZZZZZZZZZ`. Consequently, the current max year in the README is wrong.