Provide Way of Representing CidV0 objects in an alternative base #34

kevina · 2017-08-29T07:47:35Z

It may be useful to represent a CidV0 object in an alternative base. For example if we need to include it in as the domain part of a URL.

I propose we provide allow a non non-standard encoding of:

<multibase prefix><version><multihash>

where version is always 0.

Related to ipfs/kubo#4143.

Note: I made this more complicated than it needs be, see this comment: #34 (comment)

The text was updated successfully, but these errors were encountered:

ghost · 2017-08-29T13:20:52Z

I've been thinking about this too, for the case where we want to represent a CIDv0 in base32 in a URL. The less obvious thing here is that we probably need to be able to recognize such CIDs and convert them back to CIDv0, so that DHT and Bitswap return the expected results.

Do you agree this is a concern? @whyrusleeping was skeptical when I first brought it up a while ago.

It could be done with a multicodec header that says "what's in here is CIDv0".

daviddias · 2017-08-29T14:52:31Z

for the case where we want to represent a CIDv0 in base32 in a URL.

For those cases, we upgrade the CID to CIDv1 and base encoded it with base32.

kevina · 2017-08-29T16:36:16Z

@lgierth

It could be done with a multicodec header that says "what's in here is CIDv0

Yes, and I think that should just be the byte '0x00' which is the variable-length encoding of the number 0. Hence the equivalent of setting the version number to 0.

@diasdavid

For those cases, we upgrade the CID to CIDv1 and base encoded it with base32.

The ideas is we want to be able to retrieve a Cidv0 object using base32. Upgrading the CID will not work in this case.

kevina · 2017-08-29T16:59:51Z

The rules for displaying CidV0 in an alternative base can be as follows:

If the base is in 'base58btc' than just encode the multihash without a base prefix, otherwise prepend a byte of value '0x00' (the version 0 string) to the binary representation and encode that with a base prefix.

This should be fairly easy to implement, I can create a p.r. later today if it will make my idea clearer.

ghost · 2017-08-29T17:12:16Z

The ideas is we want to be able to retrieve a Cidv0 object using base32. Upgrading the CID will not work in this case.

@diasdavid what Kevin and I are thinking about is the next step after receiving a gateway request with a CIDv0-in-CIDv1: content routing and bitswap, where we need the original CIDv0 because that's what provider records are published for, and what's used as a key in the blockstore of the provider nodes. It's kind of a matter of backward compatibility: there will always be nodes who continue to create CIDv0 stuff.

Stebalien · 2017-08-29T21:45:48Z

@kevina I'd go a step further and start using the CIDv1 formatted CIDv0s by default. I believe that will actually be less work and, given that we're planning on switching to base32 by default anyways, shouldn't cause any additional breakage. Does that make sense or am I missing something?

kevina · 2017-08-29T22:09:26Z

@Stebalien What exactly will be a CIDv1 formatted CIDv0?

The base used to encode a CID is for the most part an interface issue. The base is is not stored in the key of objects stored as part of a link or in the blockstore. The Cid version and hash used is, on the other hand, part of the key.

You can't retrieve an Cidv0 object if it is reformatted to be Cidv1 and sometimes you may still want to store a Cidv0 key in another object.

Stebalien · 2017-08-29T22:18:33Z

What exactly will be a CIDv1 formatted CIDv0?

The multibase-cidv0-multihash sequence. That is, if we switch to base32 as the default, we'll now be returning base32-cidv0-multihash from interfaces so we might as well start returning base58btc-cidv0-multihash when the user requests base58btc.

kevina · 2017-08-30T00:56:41Z

@Stebalien sorry I misinterpreted your reply (as I am prone to do :( ).

The idea behind not returning base58btc-cidv0-multihash was (1) to provide only one way to represent a Cid in a given base and (2) to simplify the U.I. If we do switch to base58btc-cidv0-multihash then we probably need to provide a way to get to the old style hash (for compatibility, if no other reason) and will need to provide an extra option for that. If we have the rule that base58btc implies the old style hash string (no multibase-version) prefix we avoid this problem.

Stebalien · 2017-08-30T02:06:35Z

I kind of agree... I'd just like to phase out non-multibase strings as much as possible as soon as possible. But, I guess, if we move to base32 by default we'll be doing that anyways.

kevina · 2017-08-30T03:19:42Z

It turns out I made this more complicated then necessary. The CidV0 bit is NOT needed. Just prefixing a CidV0 string with a multibase prefix will already work as expected. For example
QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco zQmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco BCIQIZOL2ACTMSCQU5TYYFWBWGWB5IAXWTEM6HDYETDW5TWJKLQBKPNA will all work already.

The only problem is this little bit of code (which gave me the impression that a CidV0 are incompatible with multibase)

func (c *Cid) StringOfBase(base mbase.Encoding) (string, error) {
	switch c.version {
	case 0:
		if base != mbase.Base58BTC {
			return "", ErrInvalidEncoding
		}
		return c.hash.B58String(), nil
	case 1:
		return mbase.Encode(base, c.bytesV1())
	default:
		panic("not possible to reach this point")
	}
}

which should likely work on CidV0 string and return a string with the multibase prefix if the code is not 'z'.

Thus I propose we fix this function and call it a day. 😄

Stebalien · 2017-08-30T20:44:22Z

Here's an update to the spec to take this into account: multiformats/cid#14

daviddias · 2017-08-31T09:36:08Z

@diasdavid what Kevin and I are thinking about is the next step after receiving a gateway request with a CIDv0-in-CIDv1: content routing and bitswap, where we need the original CIDv0 because that's what provider records are published for, and what's used as a key in the blockstore of the provider nodes. It's kind of a matter of backward compatibility: there will always be nodes who continue to create CIDv0 stuff.

There are two issues here. One is:

a) where we need the original CIDv0 because that's what provider records.
b) what's used as a key in the blockstore of the provider nodes.

a)

This was accepted as kind of fine since that is also a valid concern for when someone picks different hashes or bases for their content. Essentially, the content should be identified by it's multihash and not by a key that includes type and multibase. If the hash matches, it is the same content. If an app wants to interpret content "asdldasjdkla" as multicodec A and another app as multicodec B, that's fine. This way we ensure deduplication

b)

This is what I think will be the be crux.

From memory, the Lisbon Treaty (the 2~3 hack nights that me, @jbenet and @whyrusleeping spent figuring all of the details out), the actual solution for this problem was not to use CIDs as Keys in the Repo, instead, keep using the multihashes as it was before. This would solve this issue from the bottom because independently of reading CIDv0 or CIDv1, the repo would always pick up the block from the multihash and that would be the same for both.

Right now, go-ipfs uses base32.encode(rawCID) to get the keys for storing blocks in the repo. Eventually, js-ipfs followed so that both repos are compatible. However, I can't recall if there was a valid reason for this change. Please help me refresh my and everyone's memory if you know where that was discussed and approved.

Using base32.encode(rawCID) also means that if I have a block that for an IPLD graph, one of the node is considered _Raw) (because, for that graph, it really doesn't care for those bytes nor it knows what it is) and for another graph, it consuming the same content but now it is a git Object, now we effectively have to store the block 2x, one with CID having the multicodec for Raw and another with multicodec for CID.

PS: This brings a old subject where we had similar issues for having updates on how the Repo works (exactly the same module) without having those updates first proposed to the Repo Spec on https://github.com/ipfs/specs/tree/master/repo

kevina · 2017-08-31T10:17:11Z

@diasdavid Thanks for this. Was this tidbit of information documented somewhere?

Based on what you are saying all network requests for blocks should be made using the multihash only and not the full CID.

I agree that we should be storing blocks using just the multihash. The change to base32 was made before the CID proposal to fix ipfs/kubo#2601, some additional discussion was done here ipfs/go-datastore#39, and the change was made in ipfs/kubo#2903.

It should be rather simple to store just the multihash component in the datastore. The biggest pain will be the repo migration.

daviddias · 2017-08-31T10:19:55Z

@diasdavid Thanks for this. Was this tidbit of information should of been documented somewhere?

Agreed and I believe it was, at least 2 times. Post Lisbon Treaty and somewhere in a go-ipfs PR. It should have been, however, done through the IPFS Specs Repo.

Based on what you are saying all network requests for blocks should be made using the multihash only and not the full CID.

The CID can go on the wire (because of in the future, with graphswap that will be useful), however, the assertions should be done in the hash level. If the Repo was checking by Hash, that will be granted by default.

kevina · 2017-08-31T10:23:50Z

@diasdavid you replied before I edited my sentence 😄

Kubuxu · 2017-08-31T10:52:15Z

Right now, go-ipfs uses base32.encode(rawCID) to get the keys for storing blocks in the repo.

This was also cleared up but after Lisbon (I think this point might not have been considered , without this it would be impossible to enumerate blocks given only repo for example as in case of pinset loss, it has saved us once already). This also means that the data isn't self descriptive in storage in blockstore.

It might be possible to store it aside (in other part of the datastore) but it would require us to reevaluate the leveldb (it might just not scale to number of entries we keep).

EDIT: I had this comment sitting in comment windows for ~1h

kevina · 2017-08-31T11:12:25Z

@diasdavid I fully agree with all your points. Nevertheless, I still think it will be useful to be able to represent CidV0 in an alternative base. For example if we switch to Base32 users might still have a need to store a CidV0 string as part of a link in a dag. Also converting a CidV0 link in a dag to a Cidv1 for display in same cases can cause problems. If we don't allow multibase prefixed for CidV0 then it will be base32 for everything but the case where CidV0 is still required. That seams odd to me.

kevina · 2017-08-31T11:15:03Z

Could I have a link to the "Post Lisbon Treaty"?

ghost · 2017-09-12T16:40:20Z

Could I have a link to the "Post Lisbon Treaty"?

I want too

Kubuxu · 2018-03-01T16:40:30Z

I would like to renew this discussion. I don't think that supporting CIDv0 in multibase is wise, it will linger forever everywhere.

We should flesh out clear upgrade path that will lead to us not using CIDv0 internally anymore.

Quick example of possible upgrade path:

Finally upgrade to outputting CIDv1
Create repo migration of v0 -> v1
If we ever decode CIDv0, upgrade it to v1
Push out both CIDv0 and CIDv1 where compatible to the network.
In places where legacy CIDv0 is need (network, DHT, so on) make dual requests and track metrics on how many times CIDv0 exclusively allowed us to access content (so making requests to CIDv0 saved us)
When this number falls low enough (enough nodes upgraded), stop providing compatibility layer of publishing CIDv0.

This way we can get CIDv0 mostly out of our codebases (apart from UI layer) and upgrade to new protocol. If we don't do this soon enough we will be stuck with it forever and it decreases performance, increases complexity and reduces our velocity (complexity is bane of moving protocols forward).

We need a clear way to move forward with CIDv1 and how finally embrace it fully.

kevina · 2018-03-01T17:05:57Z

I don't think that supporting CIDv0 in multibase is wise

I do not agree with this and don't see the argument as "it will linger forever then everywhere" very convincing.

whyrusleeping · 2018-03-01T19:41:16Z

We do want to move away from supporting cidv0. Adding new features to it seems unwise if thats really our goal.

I'm not really sure what a valid usecase for this is, as @diasdavid said earlier, we should just upgrade them to cidv1 and provide an internal conversion layer in cases where it might be needed.

Stebalien · 2018-03-01T23:36:54Z

@Kubuxu ideally, the datastore wouldn't care about the CID, just the multihash. ~~However, to keep GC working correctly, we'd need to introduce a separate "linkstore" for keeping track of the local DAG. @lgierth indicated he was interested in working on this.~~ (#34 (comment))

kevina · 2018-06-08T08:50:50Z

I'm not really sure what a valid usecase for this is, as @diasdavid said earlier, we should just upgrade them to cidv1 and provide an internal conversion layer in cases where it might be needed.

I will repeat by earlier comment #34 (comment) so its not lost in the noise:

Nevertheless, I still think it will be useful to be able to represent CidV0 in an alternative base. For example if we switch to Base32 users might still have a need to store a CidV0 string as part of a link in a dag. Also converting a CidV0 link in a dag to a Cidv1 for display in same cases can cause problems. If we don't allow multibase prefixed for CidV0 then it will be base32 for everything but the case where CidV0 is still required. That seams odd to me.

kevina · 2018-06-08T08:59:48Z

Also, as a side note, we can not fully convert CidV0 Dag objects to CidV1 without changing the hash of the object as the links inside will also need to be converted to CidV1. If we do this it will no longer be possible to look up old content. If we don't do this we will need to support CidV0 forever, at least to some capacity.

kevina · 2018-06-09T22:20:28Z

However, to keep GC working correctly, we'd need to introduce a separate "linkstore" for keeping track of the local DAG.

I do not see why this will be necessary. Pinned hashes will still be stored by the full CID. The GC does not care about the structure of anything not pinned.

Stebalien · 2018-06-11T19:58:12Z

@kevina you're right.

kevina · 2018-06-29T14:48:56Z

In addition to all the points I said above I would also like to add that not allowing CidV0 to be represented in Base32 breaks abstraction and creates this weird special case. As I see it multibase is something that is independent of CID. That is it can be used to encode any sort of (short) binary string. If we first convert to binary and then decode the CID then any sort of base will be permissible (as the multibase code has no idea it is decoding a CID, and the CID decoder just sees the binary string), however if we do it all at once then the CID decoder will reject it.

I fully agree with should work to migrate away from CIDv0, however for this reason and reasons in previous comments I think not allowing CIDv0 to be encoded using multibase creates unnecessary complications.

In all fairness I did think of one valid use case for not allowing multibase prefixed for CIDv0 that is to distinguish raw multihashes from multihashes that are also CIDv0. That is we switch to storing raw multihashes in the datastore we will need to list them to the user. We could in this case list them with a multibase encoding and define any multihash with a multibase prefix as a raw multihash and not a CIDv0. However, as I see it this is a fragile distinction and it still doesn't solve the problem on how to handle working with existing CIDv0 stored as link in objects.

kevina · 2018-07-02T19:41:53Z

I'm not sure, but maybe this is something that we can work through in the upcoming developers meeting. See ipfs-inactive/developer-meetings#16.

lidel · 2020-02-06T15:22:07Z

Closing:

Current consensus is that when alternative base needs to be used,
one needs to convert it to CIDv1
Libp2p is adopting support for CIDv1 as text representation too.
See https://github.com/libp2p/specs/blob/master/RFC/0001-text-peerid-cid.md and Tracking issue for peer IDs as CIDs libp2p/specs#216.

To follow migration to CIDv1 (default base32) see: https://github.com/ipfs/ipfs/issues/337

kevina mentioned this issue Aug 29, 2017

Make base32 CIDv1 the default for go-ipfs ipfs/kubo#4143

Open

kevina mentioned this issue Aug 30, 2017

Fix StringOfBase to allow encoding CidV0 in alternative bases. #35

Closed

kevina mentioned this issue Aug 31, 2017

Update the spec from the implementation. multiformats/cid#14

Merged

Stebalien mentioned this issue Aug 31, 2017

Blocks should be addressed by Multihash, not CID ipfs/notes#259

Closed

This was referenced Oct 6, 2017

Discussion Point: Using Multihash not CID in blockstore and bitswap ipfs/kubo#4284

Closed

Monday Calls: Oct 09-15 ipfs/team-mgmt#505

Closed

momack2 added this to Inbox in ipfs/go-ipfs May 9, 2019

lidel closed this as completed Feb 6, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide Way of Representing CidV0 objects in an alternative base #34

Provide Way of Representing CidV0 objects in an alternative base #34

kevina commented Aug 29, 2017 •

edited

Loading

ghost commented Aug 29, 2017

daviddias commented Aug 29, 2017

kevina commented Aug 29, 2017 •

edited

Loading

kevina commented Aug 29, 2017 •

edited

Loading

ghost commented Aug 29, 2017

Stebalien commented Aug 29, 2017

kevina commented Aug 29, 2017 •

edited

Loading

Stebalien commented Aug 29, 2017

kevina commented Aug 30, 2017 •

edited

Loading

Stebalien commented Aug 30, 2017

kevina commented Aug 30, 2017 •

edited

Loading

Stebalien commented Aug 30, 2017

daviddias commented Aug 31, 2017 •

edited

Loading

kevina commented Aug 31, 2017 •

edited

Loading

daviddias commented Aug 31, 2017 •

edited

Loading

kevina commented Aug 31, 2017

Kubuxu commented Aug 31, 2017 •

edited

Loading

kevina commented Aug 31, 2017

kevina commented Aug 31, 2017

ghost commented Sep 12, 2017

Kubuxu commented Mar 1, 2018 •

edited

Loading

kevina commented Mar 1, 2018 •

edited

Loading

whyrusleeping commented Mar 1, 2018

Stebalien commented Mar 1, 2018 •

edited

Loading

kevina commented Jun 8, 2018

kevina commented Jun 8, 2018

kevina commented Jun 9, 2018

Stebalien commented Jun 11, 2018

kevina commented Jun 29, 2018 •

edited

Loading

kevina commented Jul 2, 2018

lidel commented Feb 6, 2020 •

edited

Loading

Provide Way of Representing CidV0 objects in an alternative base #34

Provide Way of Representing CidV0 objects in an alternative base #34

Comments

kevina commented Aug 29, 2017 • edited Loading

ghost commented Aug 29, 2017

daviddias commented Aug 29, 2017

kevina commented Aug 29, 2017 • edited Loading

kevina commented Aug 29, 2017 • edited Loading

ghost commented Aug 29, 2017

Stebalien commented Aug 29, 2017

kevina commented Aug 29, 2017 • edited Loading

Stebalien commented Aug 29, 2017

kevina commented Aug 30, 2017 • edited Loading

Stebalien commented Aug 30, 2017

kevina commented Aug 30, 2017 • edited Loading

Stebalien commented Aug 30, 2017

daviddias commented Aug 31, 2017 • edited Loading

kevina commented Aug 31, 2017 • edited Loading

daviddias commented Aug 31, 2017 • edited Loading

kevina commented Aug 31, 2017

Kubuxu commented Aug 31, 2017 • edited Loading

kevina commented Aug 31, 2017

kevina commented Aug 31, 2017

ghost commented Sep 12, 2017

Kubuxu commented Mar 1, 2018 • edited Loading

kevina commented Mar 1, 2018 • edited Loading

whyrusleeping commented Mar 1, 2018

Stebalien commented Mar 1, 2018 • edited Loading

kevina commented Jun 8, 2018

kevina commented Jun 8, 2018

kevina commented Jun 9, 2018

Stebalien commented Jun 11, 2018

kevina commented Jun 29, 2018 • edited Loading

kevina commented Jul 2, 2018

lidel commented Feb 6, 2020 • edited Loading

kevina commented Aug 29, 2017 •

edited

Loading

kevina commented Aug 29, 2017 •

edited

Loading

kevina commented Aug 29, 2017 •

edited

Loading

kevina commented Aug 29, 2017 •

edited

Loading

kevina commented Aug 30, 2017 •

edited

Loading

kevina commented Aug 30, 2017 •

edited

Loading

daviddias commented Aug 31, 2017 •

edited

Loading

kevina commented Aug 31, 2017 •

edited

Loading

daviddias commented Aug 31, 2017 •

edited

Loading

Kubuxu commented Aug 31, 2017 •

edited

Loading

Kubuxu commented Mar 1, 2018 •

edited

Loading

kevina commented Mar 1, 2018 •

edited

Loading

Stebalien commented Mar 1, 2018 •

edited

Loading

kevina commented Jun 29, 2018 •

edited

Loading

lidel commented Feb 6, 2020 •

edited

Loading