consider getting rid of base58 #599

Closed
MonsieurNicolas opened this issue Jul 9, 2015 · 3 comments · Fixed by #619

Comments

@MonsieurNicolas
Contributor

This is our last chance to do this or we'll regret it forever!

We should use a format like base32-zooko that has nice properties like:

  • human friendly (case insensitive, avoids similar characters)
  • uses bitwise logic and lookups (simple and fast)

http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
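To make the "bitwise logic and lookups" point concrete, here is a minimal sketch of an unpadded base32 encoder. It assumes the RFC 4648 alphabet purely for illustration (base32-zooko uses a different lookup table), and `b32encode_nopad` is a hypothetical helper name, not something from the codebase.

```python
import base64

# Assumption: RFC 4648 alphabet. A z-base-32 encoder would swap in its own table.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def b32encode_nopad(data: bytes) -> str:
    """Encode bytes as base32 without '=' padding, using only shifts, masks and a lookup."""
    out = []
    acc = 0    # bit accumulator
    nbits = 0  # number of not-yet-emitted bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 5:
            nbits -= 5
            out.append(ALPHABET[(acc >> nbits) & 0x1F])
        acc &= (1 << nbits) - 1  # keep only the leftover bits
    if nbits:
        # Left-align the final partial group, filling with zero bits.
        out.append(ALPHABET[(acc << (5 - nbits)) & 0x1F])
    return "".join(out)

# Cross-check against the standard library with its padding stripped.
sample = b"foobar"
print(b32encode_nopad(sample), base64.b32encode(sample).decode().rstrip("="))
```

There is no big-integer division anywhere, which is the main practical difference from a base58 encoder.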

This would also resolve #273 (where there is no real solution other than using a different encoding in the database).

Here are the differences in size between encodings:
hex => 64 characters
base32 => 51 characters
base64 => 42 characters
base58 => 46 or 47 characters
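For anyone who wants to recompute sizes like these, a small sketch follows. It assumes a bare 32-byte payload with no type byte, checksum, or padding characters, so its results can differ from the figures quoted above if those were computed for a different payload or count padding differently. `max_digits` is a hypothetical helper.

```python
def max_digits(n_bytes: int, base: int) -> int:
    """Digits needed to write the largest n_bytes-long value as one integer in `base`."""
    value = (1 << (8 * n_bytes)) - 1
    digits = 0
    while value:
        value //= base
        digits += 1
    return digits

# Assumption: 32 bytes of raw key material and nothing else.
for name, base in [("hex", 16), ("base32", 32), ("base58", 58), ("base64", 64)]:
    print(f"{name} => {max_digits(32, base)} characters")
```

Note that base58 output length varies with the value being encoded, since leading digits can drop out; the function reports the maximum.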

@MonsieurNicolas MonsieurNicolas added this to the Production milestone Jul 9, 2015
@jedmccaleb
Contributor

rough consensus that we should use the RFC base32 and change the checksum to a normal CRC

@graydon
Contributor

graydon commented Jul 10, 2015

I feel like I've been providing all the ammunition on this one, despite not really caring.

Our tendency so far has been to follow RFCs when they exist, so in this case https://tools.ietf.org/html/rfc4648 is the canonical encoding.

If we're going to do this -- especially if speed is any concern -- then we should also get rid of the double-sha256-truncated-to-32-bits "checksum" appended to the identifier. This is a pointlessly overpowered "check" against typos on the part of the user; a simple/cheap CRC32 (ISO 3309 / Gzip algorithm -- tons of libraries do this) or CRC16-CCITT or a check-digit scheme designed to catch real transposition/replacement errors (a la https://en.wikipedia.org/wiki/Check_digit ) is sufficient.
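As a rough illustration of how cheap these checks are, the sketch below computes both candidates with the Python standard library and also spells out the CRC-16 loop by hand. One assumption to flag: "CRC16-CCITT" has several variants, and this uses polynomial 0x1021 with an initial value of 0 (the XMODEM flavour, which is what `binascii.crc_hqx` implements when seeded with 0). The thread does not pin down a variant.

```python
import binascii
import zlib

def crc16_ccitt(data: bytes, crc: int = 0) -> int:
    """Bit-by-bit CRC-16 with polynomial 0x1021, MSB first, no final XOR."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = (crc << 1) ^ 0x1021
            else:
                crc <<= 1
            crc &= 0xFFFF
    return crc

payload = b"123456789"  # conventional CRC test string

print(f"CRC-32 (zlib):      {zlib.crc32(payload):08x}")
print(f"CRC-16 (binascii):  {binascii.crc_hqx(payload, 0):04x}")
print(f"CRC-16 (hand loop): {crc16_ccitt(payload):04x}")
```

Either one is a single pass over the bytes, compared with two SHA-256 invocations for the current checksum.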

Since we probably want to avoid padding characters (ew =) we will want to land on an encoding-group multiple. Encoding groups are 40 input bits => 8 output chars, so we'd probably be fine with 7 groups = 56 output chars = 280 input bits = 8 bits typecode/flags/whatever + 256 bits key + 16 bits CRC?
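A sketch of that 280-bit layout, to check that it lands on 56 characters with no padding. Everything here is an assumption for illustration: the names `encode_id` and `decode_id`, the field order (type byte, key, CRC), the big-endian CRC, and the CRC variant (`binascii.crc_hqx` seeded with 0).

```python
import base64
import binascii
import os

KEY_LEN = 32  # 256-bit key

def encode_id(type_byte: int, key: bytes) -> str:
    """Pack 1 type byte + 32 key bytes + 2 CRC bytes and base32-encode the 35 bytes."""
    if len(key) != KEY_LEN:
        raise ValueError("key must be 32 bytes")
    body = bytes([type_byte]) + key
    crc = binascii.crc_hqx(body, 0)
    raw = body + crc.to_bytes(2, "big")           # 35 bytes = 280 bits = 7 groups
    return base64.b32encode(raw).decode("ascii")  # 56 chars, no '=' needed

def decode_id(text: str):
    """Reverse encode_id, raising ValueError on a bad length, alphabet or checksum."""
    raw = base64.b32decode(text)
    if len(raw) != KEY_LEN + 3:
        raise ValueError("unexpected length")
    body, crc = raw[:-2], int.from_bytes(raw[-2:], "big")
    if binascii.crc_hqx(body, 0) != crc:
        raise ValueError("checksum mismatch")
    return body[0], body[1:]

encoded = encode_id(0, os.urandom(KEY_LEN))
print(encoded, len(encoded))
```

Because 35 bytes is a multiple of the 5-byte group size, the standard encoder never emits `=` for this layout.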

@MonsieurNicolas
Contributor Author

That sounds good, and yes, I came to the same conclusion on the encoding. With the RFC one it's very easy for somebody to understand which digits are allowed and which are not, and we avoid pushing logic to clients that would otherwise have to deal with denormalized forms.
As for the checksum: CRC-16 seems like a good candidate as, I think, it would detect any burst error of up to 16 bits (which includes any mutation of 3 consecutive digits, i.e. 15 bits) and other errors with good probability.
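The burst-error claim can be spot-checked empirically. The sketch below appends a CRC-16 to random 33-byte payloads, XORs in a random error burst of 1 to 16 bits somewhere in the codeword, and counts how many corrupted codewords still validate. Assumptions: polynomial 0x1021 with initial value 0 via `binascii.crc_hqx`, CRC appended big-endian; all function names are hypothetical.

```python
import binascii
import random

def with_crc(data: bytes) -> bytes:
    """Append the CRC-16 big-endian so that the CRC of the whole codeword is 0."""
    return data + binascii.crc_hqx(data, 0).to_bytes(2, "big")

def is_valid(codeword: bytes) -> bool:
    return binascii.crc_hqx(codeword, 0) == 0

def flip_burst(codeword: bytes, start: int, pattern: int) -> bytes:
    """XOR `pattern` into the codeword, with its lowest bit `start` bits from the end."""
    n = int.from_bytes(codeword, "big") ^ (pattern << start)
    return n.to_bytes(len(codeword), "big")

def count_undetected(trials: int, max_burst_bits: int = 16, seed: int = 0) -> int:
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        payload = bytes(rng.getrandbits(8) for _ in range(33))
        codeword = with_crc(payload)
        width = rng.randint(1, max_burst_bits)
        # Force the first and last bit of the burst on so it spans exactly `width` bits.
        pattern = rng.getrandbits(width) | 1 | (1 << (width - 1))
        start = rng.randint(0, len(codeword) * 8 - width)
        if is_valid(flip_burst(codeword, start, pattern)):
            misses += 1
    return misses

print("undetected bursts:", count_undetected(10_000))
```

Bursts wider than 16 bits are not always caught: XORing the 17-bit generator pattern `0x11021` into a valid codeword at any offset yields another valid codeword, which is the "good probability, not certainty" part of the claim.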
