Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PartialEq, Hash, PartialOrd across Cid versions #66

Closed
koivunej opened this issue Aug 30, 2020 · 5 comments
Closed

PartialEq, Hash, PartialOrd across Cid versions #66

koivunej opened this issue Aug 30, 2020 · 5 comments

Comments

@koivunej
Copy link

Similar to #59 at the moment (0.5.1) the Cid is not equal across cid versions. I did not find anything from the specs or original posts on Cidv1 extension about this, but my intuition says cid::Version::V0 is merely a formatting option for std::fmt::Display.

What are the situations where V0 cid should not be equal to the same multihash and cid::Codec::DagProtobuf as V1? I guess the only valid use case is to separate on the Cid::version(&self) but for all content identification or addressing purposes these should be equal even if they have different string representations?

@vmx
Copy link
Member

vmx commented Aug 31, 2020

Thanks for bringing it up. This is something that should work the same across implementations, hence I've opened multiformats/cid#41 for general discussions about it.

@vmx
Copy link
Member

vmx commented Sep 9, 2020

What are the situations where V0 cid should not be equal to the same multihash and cid::Codec::DagProtobuf as V1?

They will have the same Multihash. You can always convert a CIDv0 to a CIDv1 (in case you don't want to be bothered about it, you can do that whenever you encoder a CIDv0), or you can compare the hash instead, e.g. cid0.hash() == cid1.hash().

As I understand the Hash trait, it's really about hashing the object itself, this would also include the version. Only taking the Multihash into account sounds wrong to me.

Other implementations (Go, JS) take the version into account for equality, so the idea is that "equal" means byte-identical (which CIDv0 and CIDv1 are not).

@koivunej
Copy link
Author

What are the situations where V0 cid should not be equal to the same multihash and cid::Codec::DagProtobuf as V1?

They will have the same Multihash. You can always convert a CIDv0 to a CIDv1 (in case you don't want to be bothered about it, you can do that whenever you encoder a CIDv0), or you can compare the hash instead, e.g. cid0.hash() == cid1.hash().

Thanks, we have been doing both of these in rust-ipfs. The reason why I was asking this was to find out the use case the equality is based on.

As I understand the Hash trait, it's really about hashing the object itself, this would also include the version.

I read this sentence as you find the documentation for Hash implying its implementations MUST account for all fields, but I don't think that is the case, and I would be surprised to learn it was. It is however what the default derive does and in most cases it is the correct thing to do. What is important regarding Hash trait is the k1 == k2 -> hash(k1) == hash(k2). The proposed "leaving the cid::Version out" would not have any issues with this.

Only taking the Multihash into account sounds wrong to me.

Apologies, I was not being clear enough but I was thinking the Cid should behave like (cid::Codec, multihash) tuple would, as in, ignore the version for expect the Display and to_bytes() purposes.

Other implementations (Go, JS) take the version into account for equality, so the idea is that "equal" means byte-identical (which CIDv0 and CIDv1 are not).

There must then be a need to retain the original formatting information. Perhaps it has just not occured to me yet :)

@koivunej
Copy link
Author

I replied the above before checking the linked multiformats/cid issue. I guess the ship has sailed already then, closing this as well. I'd still be interested learning about the use case for the byte-equality equality if you know of one.

@vmx
Copy link
Member

vmx commented Sep 21, 2020

implying its implementations MUST account for all fields,

No, I don't think it MUST. Just as you said, it's the common thing to do to include all fields.

I'd still be interested learning about the use case for the byte-equality equality if you know of one.

Good point. I'll watch out for one and let you know if I find one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants