Use inline storage for small hashes #47

rklaehn · 2020-02-15T16:43:12Z

This also gets rid of the Bytes dependency and replaces it with an Arc<[u8]>.

Implementation for #46

Note that the size of this new multihash is 40 bytes, whereas the size of the old Bytes-based multihash is 32 bytes plus whatever lives on the heap. I think this is a pretty good tradeoff!

src/storage.rs

dignifiedquire

I like this approach, though I still wonder whether just using smallvec and letting the user wrap it in an Arc if needed wouldn't make for a nicer separation of concerns.

dignifiedquire · 2020-02-15T17:44:02Z

src/storage.rs

@@ -0,0 +1,43 @@
+use std::sync::Arc;
+
+const MAX_INLINE: usize = 39;


So the total size is 40, a multiple of 8. 1 byte is needed for the enum discriminator.

39 fits all 256-bit hashes with some room to spare. 512-bit hashes won't fit, though, and I think once you go to such large hashes you are better off with an Arc<[u8]>.

But the size of the inline buffer can of course be adjusted when even larger hashes become common. The whole mechanism is completely opaque.
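A minimal sketch of the layout described above, assuming a two-variant enum with a 39-byte inline buffer and an Arc<[u8]> fallback. The name `Storage` and the helper `storage_size` are illustrative and not necessarily what the PR uses. Rust does not guarantee the layout of such an enum, so the 40-byte figure is what current compilers produce on 64-bit targets rather than a promise.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Largest payload stored inline; anything longer goes to the heap.
const MAX_INLINE: usize = 39;

/// Hypothetical stand-in for the storage type discussed in this PR.
#[allow(dead_code)]
enum Storage {
    /// Small hashes live directly inside the value, with no allocation.
    Inline([u8; MAX_INLINE]),
    /// Larger hashes share a reference-counted heap buffer.
    Heap(Arc<[u8]>),
}

/// Size of `Storage` in bytes as laid out by the compiler:
/// 1 byte of discriminant plus 39 bytes of inline payload.
#[allow(dead_code)]
fn storage_size() -> usize {
    size_of::<Storage>()
}
```

The heap variant only needs 16 bytes for the fat pointer, so the inline variant determines the overall size.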

rklaehn · 2020-02-15T18:09:26Z

@dignifiedquire smallvec has an overhead of one usize for storing the size (https://github.com/servo/rust-smallvec/blob/master/lib.rs#L374), which is not necessary because the hash already knows how big it is. It also has the overhead of the discriminator unless you use the "union" feature (https://github.com/servo/rust-smallvec/blob/master/lib.rs#L292), which is disabled by default (see https://docs.rs/smallvec/1.2.0/smallvec/#union-feature). Due to the alignment rules, this discriminator costs you 8 bytes on a 64-bit architecture, so the total overhead is 16 bytes.
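The alignment cost can be illustrated without smallvec itself. The sketch below uses a hypothetical `Tagged` enum whose payloads are word-aligned and have no spare bit patterns, so the compiler has to add a separate discriminant that gets padded up to a full word. It is not smallvec's actual definition, only the same effect in miniature.

```rust
use std::mem::size_of;

/// Two machine words, like a (pointer, length) pair or a small inline buffer.
type Payload = [usize; 2];

/// A two-variant enum whose payloads leave no niche for the compiler to reuse,
/// so a separate discriminant must be stored next to word-aligned data.
#[allow(dead_code)]
enum Tagged {
    Inline(Payload),
    Heap(usize, usize),
}

/// Bytes that `Tagged` occupies beyond its payload.
#[allow(dead_code)]
fn discriminant_overhead() -> usize {
    size_of::<Tagged>() - size_of::<Payload>()
}
```

On a 64-bit target the overhead comes out as one full word (8 bytes), even though the discriminant itself only needs a single byte.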

But most of all, it is a dependency containing unsafe code that can cause problems for others and that is not needed for a few LOC.

burdges · 2020-02-16T00:56:53Z

There is an old smallbox crate, but it comes with a silly inefficiency that's worse than smallvec here.

I'd think you could fix SmallBox<T, Space> with an inner union that discriminates based on size_of::<T>() < size_of::<Space>(), except...

I presume smallbox really exists to make SmallBox<dyn Trait, Space> work. Ideally, you'd make two vtables for dyn Trait that distinguished between smaller and larger than Space, but this would require deep rustc integration.

It appears possible to do this manually for individual traits, like we really could make a no_std no_alloc compatible SmallError that only takes 16 bytes and lets std::io traits work without std or alloc, so long as error types stay smaller than usize: rust-lang/rfcs#2820 (comment)

It's possible some #[dyn_smallbox] proc macro could abstract this over traits with the current rustc, but nobody has written one.

Apologies for the derail, but I'm agreeing with you that smallbox optimizations become messy and custom, especially whenever you want dyn Trait to work.

However, I'm also pointing out that making alloc optional comes with some benefits, although perhaps not for multihash.

rklaehn · 2020-02-16T19:48:02Z

@burdges interesting, but I think in this case just doing the branching manually is best. It is quite simple code.

rklaehn · 2020-02-17T08:29:30Z

Thought about this some more.

Given that a typical 256-bit hash is 32 bytes (34 bytes with metadata), there is enough room within 40 bytes to store the length separately, which avoids the awkward recomputation of the length from the content. I will update the PR. The storage is then just an efficient encoding of an arbitrary Vec, which gives better separation of concerns.
The way Bytes was used here, it offers no real benefit over Arc<[u8]>.
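A rough sketch of what storing the length explicitly could look like, assuming one length byte plus a 38-byte buffer in the inline variant. The method names `from_slice` and `bytes` are illustrative; the merged code may differ in detail.

```rust
use std::sync::Arc;

/// Inline capacity: 1 byte discriminant + 1 byte length + 38 bytes of data = 40.
const MAX_INLINE: usize = 38;

#[derive(Clone)]
pub enum Storage {
    /// Length (at most `MAX_INLINE`) followed by a fixed buffer; unused bytes stay zero.
    Inline(u8, [u8; MAX_INLINE]),
    /// Anything longer is kept in a shared heap allocation.
    Heap(Arc<[u8]>),
}

impl Storage {
    /// Copy `data` into inline storage if it fits, otherwise onto the heap.
    pub fn from_slice(data: &[u8]) -> Self {
        if data.len() <= MAX_INLINE {
            let mut buf = [0u8; MAX_INLINE];
            buf[..data.len()].copy_from_slice(data);
            // The cast cannot truncate because the length is at most 38.
            Storage::Inline(data.len() as u8, buf)
        } else {
            Storage::Heap(Arc::from(data))
        }
    }

    /// The stored bytes, without the padding of the inline buffer.
    pub fn bytes(&self) -> &[u8] {
        match self {
            Storage::Inline(len, buf) => &buf[..usize::from(*len)],
            Storage::Heap(data) => &data[..],
        }
    }
}
```

With the length stored next to the buffer, the storage no longer needs to know anything about the multihash encoding to return its contents.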

vmx

Nice!

I've run this code against Vec and Bytes, based on the use case that libp2p has: https://gist.github.com/vmx/0dc0d981b522e2834084559faa0dcb56

On my machine this code is always faster or similar (never slower).

rklaehn · 2020-02-17T19:26:07Z

@vmx thanks for the benchmark. Criterion is pretty nice.

For small hashes that fit, the improvement is quite good. I think we should give multiaddr the same treatment once this is merged...

vmx · 2020-02-18T13:31:29Z

@twittner Could you please have a look at this PR? You were in favour of using Bytes. Would this change work for your use case in libp2p?

twittner

Looks like a nice improvement to me.

vmx

@twittner Did all your concerns get addressed?

rklaehn added 2 commits February 15, 2020 17:39

Use inline storage for small hashes

20c5c89

Clippy

4e0632b

burdges reviewed Feb 15, 2020

dignifiedquire reviewed Feb 15, 2020

rklaehn force-pushed the inline-hash-storage branch from 70b19af to eecc3fa on February 15, 2020 17:56

Rename copy_from_slice to just from_slice

0559bed

rklaehn force-pushed the inline-hash-storage branch from eecc3fa to 0559bed on February 15, 2020 18:11

Explicity store the bytes size

02cf114

We can afford it since the average hash is 34 bytes, and we want this thing to be a multiple of 8 large.

rklaehn force-pushed the inline-hash-storage branch from c8cd5c0 to 02cf114 on February 17, 2020 14:20

rklaehn marked this pull request as ready for review February 17, 2020 14:22

vmx mentioned this pull request Feb 17, 2020

feat: Massive refactor with a new API #45

Merged

vmx reviewed Feb 17, 2020

Add comment about the rationale for the 38 byte limit.

9ab33d9

The next useful limit would be 62, which would make the whole thing 64 bytes large and fit any 384 bit hash, but I don't think those are very common yet.
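The arithmetic behind the 38 and 62 byte limits can be checked with a small sketch. It assumes a hypothetical const-generic `Storage<N>` with one length byte and an N-byte buffer, and a 2-byte multihash prefix (code plus length), which matches the "34 bytes with metadata" figure for 256-bit hashes. Neither the generic parameter nor the helper functions are part of the PR.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical storage with a configurable inline capacity `N`.
#[allow(dead_code)]
enum Storage<const N: usize> {
    /// One length byte followed by an `N`-byte buffer.
    Inline(u8, [u8; N]),
    /// Heap fallback for larger values.
    Heap(Arc<[u8]>),
}

/// Total size of the storage for inline capacity `N`.
#[allow(dead_code)]
fn total_size<const N: usize>() -> usize {
    size_of::<Storage<N>>()
}

/// Whether a digest of `bits` bits plus an assumed 2-byte prefix fits inline.
#[allow(dead_code)]
fn fits_inline<const N: usize>(bits: usize) -> bool {
    bits / 8 + 2 <= N
}
```

Under these assumptions a capacity of 38 gives a 40-byte value that holds 256-bit hashes, and a capacity of 62 gives a 64-byte value that also holds 384-bit hashes (48 + 2 = 50 bytes) but not 512-bit ones (66 bytes).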

rklaehn mentioned this pull request Feb 17, 2020

Consider an inline representation for small multiaddrs multiformats/rust-multiaddr#36

Open

twittner reviewed Feb 18, 2020

rklaehn added 3 commits February 18, 2020 18:56

PR feedback

5a91f48

- use pub(crate) for Storage - use Storage::from_slices to prevent allocations for small identity multihashes
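To illustrate why a `from_slices` constructor avoids allocations, here is a sketch that concatenates several slices directly into the inline buffer. The signature and the surrounding `Storage` definition are assumptions modelled on the discussion in this PR, not a copy of the merged code.

```rust
use std::sync::Arc;

const MAX_INLINE: usize = 38;

pub enum Storage {
    /// Length byte plus fixed buffer for small values.
    Inline(u8, [u8; MAX_INLINE]),
    /// Shared heap buffer for everything else.
    Heap(Arc<[u8]>),
}

impl Storage {
    /// Concatenate `parts` into one storage value. When the combined length
    /// fits inline, the parts are copied straight into the buffer and no
    /// intermediate Vec is allocated.
    pub fn from_slices(parts: &[&[u8]]) -> Self {
        let total: usize = parts.iter().map(|p| p.len()).sum();
        if total <= MAX_INLINE {
            let mut buf = [0u8; MAX_INLINE];
            let mut offset = 0;
            for part in parts {
                buf[offset..offset + part.len()].copy_from_slice(part);
                offset += part.len();
            }
            Storage::Inline(total as u8, buf)
        } else {
            // Too large for the inline buffer: join on the heap instead.
            let mut joined = Vec::with_capacity(total);
            for part in parts {
                joined.extend_from_slice(part);
            }
            Storage::Heap(joined.into())
        }
    }

    /// The stored bytes, without inline padding.
    pub fn bytes(&self) -> &[u8] {
        match self {
            Storage::Inline(len, buf) => &buf[..usize::from(*len)],
            Storage::Heap(data) => &data[..],
        }
    }
}
```

For an identity multihash, the caller would pass the prefix bytes and the payload as two separate slices, so a short payload never touches the allocator.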

Add quickcheck tests for from_slices

ec44135

...and also a property based test for the normal roundtrip, now that we have the dependency anyway.

Add check_invariants to make sure we don't create heap storage for sm…

b729685

…all vecs

rklaehn mentioned this pull request Feb 19, 2020

Change Cid to depend on Multihash for storage multiformats/rust-cid#22

Closed

vmx reviewed Feb 19, 2020

PR feedback

3456e07

no more extern crate

vmx approved these changes Feb 19, 2020

Make debug instance useful

62692e2

rklaehn force-pushed the inline-hash-storage branch from c4a5fa8 to 62692e2 on February 20, 2020 08:22

This was referenced Feb 20, 2020

Add ordering for Multihash and MultihashRef #50

Merged

Hash and Multihash arbitrary #51

Merged

dignifiedquire approved these changes Feb 20, 2020

vmx merged commit 0ec803a into multiformats:master Feb 21, 2020

rklaehn mentioned this pull request Mar 25, 2020

Multiaddr: Replace Arc<Vec<u8>> with inline storage libp2p/rust-libp2p#1510

Closed
