Skip to content
This repository has been archived by the owner on Feb 6, 2024. It is now read-only.

Unsized Cids #5

Closed
Stebalien opened this issue Oct 14, 2021 · 9 comments
Closed

Unsized Cids #5

Stebalien opened this issue Oct 14, 2021 · 9 comments

Comments

@Stebalien
Copy link

I'm currently trying to define an "object safe" blockstore but I'd also like the blockstore to be generic over CID sizes. Unfortunately, I can't have generic functions in an object safe type.

However:

  1. I'm taking CIDs by reference (don't need to keep copies).
  2. Don't really care about the size.

So I'd like to erase it. Luckily. This seems to be entirely possible in safe and stable rust using unsized types:

#[derive(Debug)]
#[doc(hidden)]
pub struct MultihashBase<T: ?Sized> {
    code: u64,
    size: u8,
    digest: T,
}

#[derive(Debug)]
pub enum Version {
    V0,
    V1,
}

pub type Codec = u64;

#[derive(Debug)]
#[doc(hidden)]
pub struct CidBase<T: ?Sized> {
    version: Version,
    codec: Codec,
    hash: MultihashBase<T>,
}

pub type Cid<const S: usize> = CidBase<[u8; S]>;
pub type Multihash<const S: usize> = MultihashBase<[u8; S]>;

pub type CidSlice = CidBase<[u8]>; // Not sure how to name this...
pub type MultihashSlice = MultihashBase<[u8]>;

fn main() {
    let c: Cid<32> = Cid {
        version: Version::V1,
        codec: 0x55,
        hash: Multihash {
            code: 0,
            size: 32,
            digest: [0; 32],
        },
    };
    let cs: &CidSlice = &c; // this seems to "just work" on the latest stable compiler, no unsafe needed.
    println!("{:?}", cs);
}

It may be possible to do this without redefining these types, but it likely requires liberal use of transmute and a custom deref function. The above implementation is probably faster and safer (and more likely to play well with the borrow checker, especially NLL.

Would you be open to a set of PRs to implement this?

@johnchandlerburnham
Copy link
Member

This looks like a an elegant way to have compile-time sizes and to erase that info when we don't want it.

It looks like we can even do:

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[doc(hidden)]
pub struct MultihashBase<T: ?Sized> {
  code: u64,
  size: u8,
  digest: T,
}

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[doc(hidden)]
pub struct CidBase<T: ?Sized> {
  version: Version,
  codec: Codec,
  hash: MultihashBase<T>,
}

impl<const S: usize> Copy for MultihashBase<[u8; S]> {}
impl<const S: usize> Copy for CidBase<[u8; S]> {}

And get Copy when we have known static sizes, which is pretty cool.

I wonder if as part of this we can also get rid of the size: u8 field on MultihashBase, since digest.len() should return the number of bytes in the digest.

So yes, definitely open to PRs on this!

@Stebalien
Copy link
Author

I wonder if as part of this we can also get rid of the size: u8 field on MultihashBase, since digest.len() should return the number of bytes in the digest.

It's redundant when unsized, but I think we still need it when sized. Otherwise, it's difficult to construct variable-sized multihashes/cids.

And get Copy when we have known static sizes, which is pretty cool.

#[derive(Copy)] seems to work. Am I missing something?

[ names ]

MultihashSlice doesn't really make sense because we don't actually support slicing, just unsizing. I believe the traditional approach would be:

  • MultihashArrayBuf<S> fixed-sized.
  • MultihashBuf sized but allocated (like PathBuf).
  • Multihash unsized (like Path).

But that would be a massively breaking change.

Thoughts?

  • DynCid
  • AnyCid
  • CidSlice
  • DstCid

I'm not a fan of any of the options, but I'm leaning towards DynCid.


OT:

  1. The tests don't build (multihash).
  2. Is there a rustfmt config lying around?

Test output:

error[E0432]: unresolved imports `sp_multihash::Sha1`, `sp_multihash::Sha1Digest`, `sp_multihash::Strobe256`, `sp_multihash::Strobe512`, `sp_multihash::StrobeDigest`
  --> tests/lib.rs:26:3
   |
26 |   Sha1,
   |   ^^^^ no `Sha1` in the root
27 |   Sha1Digest,
   |   ^^^^^^^^^^
   |   |
   |   no `Sha1Digest` in the root
   |   help: a similar name exists in the module: `Sha2Digest`
...
37 |   Strobe256,
   |   ^^^^^^^^^ no `Strobe256` in the root
38 |   Strobe512,
   |   ^^^^^^^^^ no `Strobe512` in the root
39 |   StrobeDigest,
   |   ^^^^^^^^^^^^ no `StrobeDigest` in the root

error[E0433]: failed to resolve: use of undeclared crate or module `rand`
 --> src/arb.rs:5:5
  |
5 | use rand::{
  |     ^^^^ use of undeclared crate or module `rand`

error[E0432]: unresolved import `quickcheck`
 --> src/arb.rs:1:5
  |
1 | use quickcheck::{
  |     ^^^^^^^^^^ use of undeclared crate or module `quickcheck`

error[E0432]: unresolved import `rand`
 --> src/arb.rs:5:5
  |
5 | use rand::{
  |     ^^^^ use of undeclared crate or module `rand`

error[E0433]: failed to resolve: use of undeclared type `WeightedIndex`
  --> src/arb.rs:21:16
   |
21 |     let dist = WeightedIndex::new(weights.iter()).unwrap();
   |                ^^^^^^^^^^^^^ use of undeclared type `WeightedIndex`

Some errors have detailed explanations: E0432, E0433.
For more information about an error, try `rustc --explain E0432`.
error: could not compile `sp-multihash` due to 4 previous errors
warning: build failed, waiting for other jobs to finish...
error[E0599]: no method named `write_all` found for struct `IdentityHasher` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `IdentityHasher<32_usize>`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Sha2_256` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Sha2_256`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Sha2_512` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Sha2_512`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Sha3_224` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Sha3_224`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Sha3_256` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Sha3_256`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Sha3_384` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Sha3_384`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Sha3_512` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Sha3_512`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Keccak224` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Keccak224`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Keccak256` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Keccak256`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Keccak384` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Keccak384`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `sp_multihash::Keccak512` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `sp_multihash::Keccak512`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `Blake2bHasher` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `Blake2bHasher<64_usize>`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `Blake2sHasher` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `Blake2sHasher<32_usize>`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

error[E0599]: no method named `write_all` found for struct `Blake3Hasher` in the current scope
   --> tests/lib.rs:203:24
    |
203 |                   hasher.write_all(b"helloworld").unwrap();
    |                          ^^^^^^^^^ method not found in `Blake3Hasher<32_usize>`
...
217 | /   assert_roundtrip!(
218 | |       Code::Identity, Identity256;
219 | |       Code::Sha1, Sha1;
220 | |       Code::Sha2_256, Sha2_256;
...   |
232 | |       Code::Blake3_256, Blake3_256;
233 | |   );
    | |____- in this macro invocation
    |
    = note: this error originates in the macro `assert_roundtrip` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: unused import: `std::io::Write`
 --> tests/lib.rs:2:5
  |
2 | use std::io::Write;
  |     ^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

Some errors have detailed explanations: E0432, E0599.
warning: `sp-multihash` (test "lib") generated 1 warning
error: build failed

@johnchandlerburnham
Copy link
Member

It's redundant when unsized, but I think we still need it when sized. Otherwise, it's difficult to construct variable-sized multihashes/cids.

Could you give me an example of a variable-sized multihash/cid where the size is actually used? Trying to see in what edge-cases where the digest length and the size would be different.

test issues

You need to do cargo test --all-features to turn on std in test. I'll see if I can fix this, and will add it to the README otherwise

#[derive(Copy)] seems to work. Am I missing something?

No, I just assumed the derive wouldn't work. Tried it now and it does though, which is nice.

names

I like DynCyn/DynMultihash, but I think

MultihashArrayBuf<S> fixed-sized.
MultihashBuf sized but allocated (like PathBuf).
Multihash unsized (like Path).T: ?Sized>

is worth a second look, since

 pub struct MultihashBuf {
    code: u64,
    size: u8,
    digest: Vec<u8>,
}

Seems like it might have some uses in some contexts.

rustfmt

This one?

@johnchandlerburnham
Copy link
Member

Oh, just realized what you meant by "variable-sized multihash", you mean like a variable-digest hash where the digest size isn't divisible by 8, right? E.g. a 100 bit Blake3 won't be distinguishable from a 104 bit Blake3 without the size field

@Stebalien
Copy link
Author

Could you give me an example of a variable-sized multihash/cid where the size is actually used? Trying to see in what edge-cases where the digest length and the size would be different.

I'm specifically thinking of cases where one might want to support different hash functions with different digests. At the moment, one would likely pick a reasonable maximum (e.g., what the standard Cid type does). This overestimation means the actual digest size needs to be stored.

Oh, just realized what you meant by "variable-sized multihash", you mean like a variable-digest hash where the digest size isn't divisible by 8, right? E.g. a 100 bit Blake3 won't be distinguishable from a 104 bit Blake3 without the size field

Well... the way multihash works, any hash digest can be truncated.

You need to do cargo test --all-features to turn on std in test. I'll see if I can fix this, and will add it to the README otherwise

Ah, got it. It should be fixable by adding a bunch of cfgs to the relevant tests.

rustfmt
This one?

Ah, I was in the multiformats repo. I'll copy it there.

@Stebalien
Copy link
Author

MultihashBuf
Seems like it might have some uses in some contexts.

It completes the picture, but given that multihashes generally aren't built piecewise, I can't think of one except maybe as some kind of reusable cid scratch pad? I think Box<DynMultihash> is probably sufficient in most cases.

@Stebalien
Copy link
Author

Ah, got it. It should be fixable by adding a bunch of cfgs to the relevant tests.

Oh, you want to test all features. Yeah.. that makes sense and I have no idea how to fix that.

@Stebalien
Copy link
Author

So, I looked into MultihashBuf anyways and it works, but it won't be castable to a DynMultihash which would make DynMultihash significantly less useful. To actually make that work, MultihashBuf would need to be implemented as Box<DynMultihash> under the hood, with some very fancy resizing.

@Stebalien
Copy link
Author

Multihash implementation in https://github.com/Stebalien/sp-multihash/tree/feat/unsized. But we should resolve lurk-lab/sp-multihash#7 first.

@samuelburnham samuelburnham closed this as not planned Won't fix, can't repro, duplicate, stale Feb 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants