Switch to base36 by default in all text output (overriding ipfs/go-ipfs/issues/4143 ) #7357

ribasushi · 2020-05-25T13:08:07Z

Now that base36 support is landing into go-ipfs for unrelated reasons I would like to resurrect my pitch from 3 years ago to default to base36 rendering as opposed to base32. The internals of go-ipfs operate over binary CIDs so the proposed change is strictly concerned with "UX".

Let me know your thoughts!

✅ Pros

Consistency across the board: libp2p-key and unixfs-entry pointers all look like what the wider public knows as "a CID"
Brings "semi-avalanche effect" to identity CIDs. E.g.


Data:	`sufficiently long payload`	`sufficient1y long payload`
B32Cid:	`bafkqagttovtgm2ldnfsw45dmpeqgy33om4qhaylznrxwczak`	`bafkqagttovtgm2ldnfsw45brpeqgy33om4qhaylznrxwczak`
B36Cid:	`kumq0v1xsl33xy9ospzfmf6uuibi0jxtdsrmhovipx6o0a`	`kumq0v1xsl33xy9ospzfmbp3zrpszmf3tbmvd50ou83pje`

Increases the usable size of an identity-cid from 34 to 36 bytes of "blocksize" while still usable in subdomains (max 63 characters): 1 + ceil( (4+blocksize) * log(256) / log(base32-or-36) )

❌ Cons

Early adopters are already familiar with bafy..., another switch would create further confusion
Somewhat slower ( on my macbook I can encode and decode ( 1 * 1000 * 1000 * 1000 ) / 2672 = 374251 b36 CIDs in 1 second ):

go-multibase$ go test -count=1 ./... -cpu=1 -bench . -benchmem
goos: darwin
goarch: amd64
pkg: github.com/multiformats/go-multibase
BenchmarkRoundTrip/base32         	 2190152	       541 ns/op	     256 B/op	       4 allocs/op
BenchmarkRoundTrip/base36         	  464097	      2672 ns/op	     336 B/op	       5 allocs/op
BenchmarkRoundTrip/base58btc      	  564238	      2220 ns/op	     336 B/op	       5 allocs/op
BenchmarkEncode/base32            	 5402474	       234 ns/op	     192 B/op	       3 allocs/op
BenchmarkEncode/base36            	  608466	      1966 ns/op	     192 B/op	       3 allocs/op
BenchmarkEncode/base58btc         	  739843	      1585 ns/op	     192 B/op	       3 allocs/op
BenchmarkDecode/base32            	 4150816	       278 ns/op	      64 B/op	       1 allocs/op
BenchmarkDecode/base36            	 1604271	       742 ns/op	     144 B/op	       2 allocs/op
BenchmarkDecode/base58btc         	 1804045	       643 ns/op	     144 B/op	       2 allocs/op

The text was updated successfully, but these errors were encountered:

lidel · 2020-05-25T17:14:00Z

There is a performance penalty, but this is mostly about UX.
Having that in mind, I'd like to highlight what @RubenKelevra said here:

If we use the same text to binary transformation it would be nice if the type of data we encode would be visible to a human

I wonder, perhaps there is UX next-positive in having different defaults (and canonical representation in subdomains) for mutable and immutable identifiers:

keep base32 for /ipfs/baf...
use base36 as the default for PeerIDs/libp2p-keys (/ipns/k2...)
- we will use it for longer keys such as ed25519 anyway, so we are half-way there

This would make it easy to eyeball identifiers (baf.. vs k2..), potentially reducing confusion between "CIDs" and "PeerIDs".

Any downsides?

I don't think we encounter a system which can't distinguish between lower and upper case

I wish that was the case :-)

Unfortunately the default text representation of CID needs to be case-insensitive for pragmatic reasons, some of them are:

DNS is one thing,
but actual code in browser vendors (and probably a bunch of libraries/SDKs used in the wild) is forcing all characters in the "authority" component of the URI to be lowercase, even if URI does not start with http* (!)
Windows filesystems are case-insensitive (only recently Windows 10 build 17107 introduced experimental opt-in per directory to make them case-sensitive)
probably much more

lidel · 2020-05-25T22:25:33Z

@RubenKelevra the limitation here is that IPFS uses CIDv0/CIDv1 spec, and adding checksum would mean creating CIDv2. I think its worth discussing, but as you noticed its out of scope here, so please fill an issue in https://github.com/multiformats/cid 🙏

Stebalien · 2020-05-26T00:37:29Z

This would make it easy to eyeball identifiers (baf.. vs k2..), potentially reducing confusion between "CIDs" and "PeerIDs".

Any downsides?

Well, people will start assuming that baf means CID and k2 means peer ID. Worse, people will start writing tools to match these prefixes. On the other hand, that's likely going to happen anyways.

bmwiedemann · 2020-05-27T07:32:04Z

Apart from performance and en/decoder-complexity,
another downside of base36 is decreased legibility, because it adds 0, 1 and 8 to the character set that can be mistaken for I, l or O
probably not worth it for the small increase in data density.

Stebalien · 2020-05-28T00:25:23Z

Well, people will start assuming that baf means CID and k2 means peer ID. Worse, people will start writing tools to match these prefixes. On the other hand, that's likely going to happen anyways.

To clarify, I wouldn't consider the ability to distinguish between CIDs and peer IDs based on encoding a feature.

I'm slightly leaning towards base36 everywhere:

To be consistent with CIDs.
To stop users from trying to distinguish between "is a cid" and "is not a cid" by looking at the base encoding. They're all CIDs.

@bmwiedemann

Apart from performance and en/decoder-complexity,
another downside of base36 is decreased legibility, because it adds 0, 1 and 8 to the character set that can be mistaken for I, l or O
probably not worth it for the small increase in data density.

The goal is:

To shave off two bytes from specific IPNS keys so they fit in subdomains.
Be consistent.

bmwiedemann · 2020-05-28T02:56:28Z

To shave off two bytes from specific IPNS keys so they fit in subdomains.

Maybe soon there will be other keys that need another 2 bytes?

Others have avoided the 63-char limit by allowing a dot:
https://tools.ietf.org/html/rfc6376#section-3.6.2.1

and code complexity is still a concern to me, because it makes alternative implementations harder and thereby reduces ipfs adoption:
https://github.com/bmwiedemann/perl-ipfs/blob/master/multibase.pm#L16
has just ~20 lines to implement encoding+decoding for base32+64+16+2
and no base58 or 36 support in there. Luckily both filestore and CIDv1 default to base32.

aschmahmann · 2020-05-28T03:28:08Z

TLDR: @Stebalien has accurately stated the goal, @bmwiedemann's solution is explored in #7318 and has serious problems for important use cases.

Maybe soon there will be other keys that need another 2 bytes?

Yes, and we'll likely have to deal with that at some point, however the "just use a dot" suggestion is not as simple as you are advertising. Take a look at #7318 for more context and feel free to add any suggestions there if you have some ideas.

Hopefully we can work to ensure that we don't have to deal with this problem "soon" and that by then some of the issues in #7318 will become manageable. Potentially we could even sidestep DNS name restrictions entirely by relying more heavily on an IPFS protocol handler instead just using a DNS name.

In the meanwhile we'd like to handle #6916 while still supporting resolution in subdomains. While we could do that by using base 36 in some places (e.g. subdomains) and base 32 in other places, this proposal is advocating for consistently outputting base 36.

ribasushi · 2020-05-28T08:10:20Z

@bmwiedemann I contributed a scaffold b36/b58 encoder here as a "oneliner" : bmwiedemann/perl-ipfs@83cb86c#r39496064

Maybe soon there will be other keys that need another 2 bytes?

We are now painfully aware about the hard limit for user-visible things being 40 bytes, and are going to aggressively design to not go over this in the future (case in point: the current situation was the result of an oversight).

ribasushi · 2020-07-10T13:13:37Z

Dropping a reference for when there is a decision on this: one of the main spots to fixup would be https://github.com/ipfs/go-ipfs/blob/79d571f272d640f0d4164f6f9d94c5fb44983822/core/corehttp/hostname.go#L381-L396

ribasushi · 2021-06-21T11:49:36Z

@aschmahmann @lidel seems to me we will never do this in go-ipfs. I will close this later today unless you think there's more to talk about.

lidel · 2021-06-21T13:23:49Z

@ribasushi closing this sgtm
(we default CIDv1 under /ipfs to base32 and since #7441 we use base36 for libp2p-keys under /ipns what would not fit in DNS label otherwise)

ribasushi added the kind/enhancement A net-new feature or improvement to an existing feature label May 25, 2020

This comment has been minimized.

Sign in to view

ribasushi mentioned this issue May 25, 2020

Create 2020-05-21--design-discussion-subdomains-dns-label-limit.md ipfs/team-mgmt#1159

Merged

RubenKelevra mentioned this issue May 25, 2020

Hardened CIDs/Peer-IDs with Argon2 parity section #7361

Open

ribasushi mentioned this issue Jun 29, 2020

feat: support ED25519 at subdomain gw with TLS #7441

Merged

ribasushi closed this as completed Jun 23, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to base36 by default in all text output (overriding ipfs/go-ipfs/issues/4143 ) #7357

Switch to base36 by default in all text output (overriding ipfs/go-ipfs/issues/4143 ) #7357

ribasushi commented May 25, 2020

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

lidel commented May 25, 2020 •

edited

Loading

This comment has been minimized.

lidel commented May 25, 2020

Stebalien commented May 26, 2020

bmwiedemann commented May 27, 2020

Stebalien commented May 28, 2020

bmwiedemann commented May 28, 2020

aschmahmann commented May 28, 2020

ribasushi commented May 28, 2020

ribasushi commented Jul 10, 2020

ribasushi commented Jun 21, 2021

lidel commented Jun 21, 2021

Switch to base36 by default in all text output (overriding ipfs/go-ipfs/issues/4143 ) #7357

Switch to base36 by default in all text output (overriding ipfs/go-ipfs/issues/4143 ) #7357

Comments

ribasushi commented May 25, 2020

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

lidel commented May 25, 2020 • edited Loading

This comment has been minimized.

lidel commented May 25, 2020

Stebalien commented May 26, 2020

bmwiedemann commented May 27, 2020

Stebalien commented May 28, 2020

bmwiedemann commented May 28, 2020

aschmahmann commented May 28, 2020

ribasushi commented May 28, 2020

ribasushi commented Jul 10, 2020

ribasushi commented Jun 21, 2021

lidel commented Jun 21, 2021

lidel commented May 25, 2020 •

edited

Loading