Capture payload size in the multihash #163

Open
Gozala opened this issue Apr 5, 2024 · 4 comments

Comments

@Gozala

Gozala commented Apr 5, 2024

In almost all instances where we use a raw multihash, we find ourselves capturing the payload size on the side. It is also probably worth calling out the fact that, somewhat recently, the fr32-sha2-256-trunc254-padded-binary-tree multihash was defined to capture payload size to address potential vulnerabilities.

Given how common it is to want to capture payload size, I would like to propose a "multihash multihash" format: a multihash variant that uses multihash code 0x31 and encodes information about both the payload size and the digest. Here is the exact format I'd like to propose:

Format

<0x31><varint payload size in bytes><varint hash function code><varint digest size in bytes><hash function output>
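To make the layout above concrete, here is a minimal Python sketch of an encoder and decoder for it. It assumes the varints are the unsigned LEB128 varints that multiformats already use; the function names are hypothetical and the 0x31 code is simply the one proposed in this issue, so treat this as an illustration rather than a reference implementation.

```python
import hashlib
from typing import Tuple

SIZED_MULTIHASH_CODE = 0x31  # code proposed in this issue (assumption, not a final assignment)
SHA2_256_CODE = 0x12         # multicodec code for sha2-256


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at offset; return (value, next offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_sized_multihash(payload_size: int, hash_code: int, digest: bytes) -> bytes:
    """<0x31><payload size><hash function code><digest size><digest>"""
    return (
        encode_varint(SIZED_MULTIHASH_CODE)
        + encode_varint(payload_size)
        + encode_varint(hash_code)
        + encode_varint(len(digest))
        + digest
    )


def decode_sized_multihash(buf: bytes) -> Tuple[int, int, bytes]:
    """Return (payload size, hash function code, digest)."""
    code, offset = decode_varint(buf)
    if code != SIZED_MULTIHASH_CODE:
        raise ValueError("not a size-capturing multihash")
    payload_size, offset = decode_varint(buf, offset)
    hash_code, offset = decode_varint(buf, offset)
    digest_size, offset = decode_varint(buf, offset)
    digest = buf[offset:offset + digest_size]
    if len(digest) != digest_size:
        raise ValueError("truncated digest")
    return payload_size, hash_code, digest


payload = b"hello world"
digest = hashlib.sha256(payload).digest()
encoded = encode_sized_multihash(len(payload), SHA2_256_CODE, digest)
```

Note that the last three fields (`hash function code`, `digest size`, `hash function output`) are exactly a regular multihash, so the inner multihash can be recovered by dropping the first two fields.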

FAQ

  • Should CIDs adopt this multihash format instead of what they use now ?

    I don't have a use case for that. Unless anyone already has one, I'd say let's not until we do. Also, adopting it in CIDs would make their size arbitrary, which can introduce various problems.

  • Should it be possible for CIDs to use this multihash format ?

    I don't see why not. They could use whatever hashing algorithm they want, so it makes sense to do the same here.

  • Should blockstore keys use this format or should they be unwrapping and using inner multihash ?

    I think block stores do not need to capture size in the key, which probably means they should not use this format, to avoid duplication?

@Gozala
Author

Gozala commented Apr 5, 2024

I should note that it was suggested to me to create a PR for this repo and perhaps call this multihash v2. However, as per the FAQ, I don't feel that using it everywhere we use multihash is better, not to mention the pain of the upgrade it would introduce. That said, I think it is a good idea to have a format for a fairly common (at least in my experience) use case that can be recommended in place of a sidecar size field.

If there is both support and desire to make this into a real thing, I can write something more formal, but even then I could use some feedback regarding where this document should live and what format it should have.

@BHare1985

Can't the digest size be deduced? This would clean up all the space that multi-output hashes like blake2s and skein are taking up.

@Stebalien
Member

The digest size currently specifies truncation. For some hash functions (e.g., blake3), smaller digests are prefixes of larger digests so we only need one code. However, for hash functions like blake2b, different sizes produce entirely different digests.
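The distinction can be checked with a short Python sketch using only the standard library. blake3 is not in `hashlib`, so SHAKE256 (an extendable-output function whose shorter outputs are prefixes of longer ones) is used here as a stand-in for the prefix behaviour; that substitution is an assumption made for illustration.

```python
import hashlib

data = b"example payload"

# blake2b mixes the requested digest size into its parameters, so a
# 32-byte digest is NOT the first 32 bytes of the 64-byte digest.
blake2b_32 = hashlib.blake2b(data, digest_size=32).digest()
blake2b_64 = hashlib.blake2b(data, digest_size=64).digest()
blake2b_is_prefix = blake2b_64.startswith(blake2b_32)

# SHAKE256 (stand-in for blake3 here): a shorter output is a prefix of
# a longer one, so a single code plus a length is enough.
shake_16 = hashlib.shake_256(data).digest(16)
shake_32 = hashlib.shake_256(data).digest(32)
shake_is_prefix = shake_32.startswith(shake_16)

print("blake2b-256 is a prefix of blake2b-512:", blake2b_is_prefix)
print("shake256/16 is a prefix of shake256/32:", shake_is_prefix)
```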

@BHare1985

BHare1985 commented Aug 27, 2024

> The digest size currently specifies truncation. For some hash functions (e.g., blake3), smaller digests are prefixes of larger digests so we only need one code. However, for hash functions like blake2b, different sizes produce entirely different digests.

I understand, and that information is redundant if there is a payload size because you can deduce the hash size from the payload size.

3 participants