Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: specialize Encodable/Decodable for bytes #4145

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

dpc
Copy link
Contributor

@dpc dpc commented Jan 26, 2024

Re #4111

Re rust-bitcoin/rust-bitcoin#2390

Due to Rust limitations, we can't have impls for both Vec<T> and Vec<u8>. Handling byte case in a generic way is potentially very inefficient, and we do use it in some places, thought it's hard to find all of them (and then prevent even more from being introduced).

The most appealing approach seems to be specializing that case using TypeId check and unsafe.

Cons:

  • TypeId requires 'static bound
  • unsafe (but small and easy to reason about)

Pros:

  • zero cost (at runtime)
  • works automatically everywhere
  • no loss in ergonomics

Edit:

In idle just mprocs I can't tell if there's any performance difference.

chunk_size: usize,
}

/// Read `opts.len` bytes from reader, where `opts.len` could potentially be
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

codecov bot commented Jan 26, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (f7463d3) 58.10% compared to head (154b804) 58.11%.

Files Patch % Lines
fedimint-core/src/encoding/mod.rs 95.89% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4145   +/-   ##
=======================================
  Coverage   58.10%   58.11%           
=======================================
  Files         192      192           
  Lines       42981    43054   +73     
=======================================
+ Hits        24976    25020   +44     
- Misses      18005    18034   +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Re fedimint#4111

Re rust-bitcoin/rust-bitcoin#2390

Due to Rust limitations, we can't have impls for both `Vec<T>`
and `Vec<u8>`. Handling byte case in a generic way is potentially
*very* inefficient, and we do use it in some places, thought it's
hard to find all of them (and then prevent even more from being
introduced).

The most appealing approach seems to be specializing that case
using `TypeId` check and `unsafe`.

Cons:

* `TypeId` requires `'static` bound
* unsafe (but small and easy to reason about)

Pros:

* zero cost (at runtime)
* works automatically everywhere
* no loss in ergonomics
Copy link
Contributor

@elsirion elsirion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a not very scientific test this made running fedimint-dbtool dump 18% faster.

Comment on lines +337 to +340
if TypeId::of::<T>() == TypeId::of::<u8>() {
// unsafe: we've just checked that T is `u8` so the transmute here is a no-op
return consensus_encode_bytes(unsafe { mem::transmute::<&[T], &[u8]>(self) }, writer);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really cool pattern! 🤯

Comment on lines +307 to +308
let len: usize =
usize::try_from(len).map_err(|_| DecodeError::from_str("size exceeds memory"))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only error on <64bit platforms, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This errors if trying to decode byte array larger than total theoretical virtual memory available for the process (which we wouldn't be able to allocate anyway).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but for all reasonable platforms these days u64==usize

@dpc
Copy link
Contributor Author

dpc commented Jan 27, 2024

In a not very scientific test this made running fedimint-dbtool dump 18% faster.

I'm happy to hear you were able to confirm this improves something. :D

@tcharding
Copy link

TypeId requires 'static bound

That is a pretty big con, right? In a real life program when would one decode static things? (Excuse my ignorance but the only 'static decodable things are static strings in tests, are they not?)

@dpc
Copy link
Contributor Author

dpc commented Jan 28, 2024

That is a pretty big con, right? In a real life program when would one decode static things? (Excuse my ignorance but the only 'static decodable things are static strings in tests, are they not?)

AFAIR T: 'static means "any references held by this type need to be of static lifetime. Doesn't seem like a big con. It would interefere with zero-copy de/serialization, I guess. I mean - con is a con. Fedimint not being a general purpose library, might be OK with it.

@@ -336,14 +431,29 @@ where
}
}

// From <https://github.com/rust-lang/rust/issues/61956>
unsafe fn horribe_array_transmute_workaround<const N: usize, A, B>(mut arr: [A; N]) -> [B; N] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[inline(always)] to make array doesn't have to be copied

{
fn consensus_decode<D: std::io::Read>(
d: &mut D,
modules: &ModuleDecoderRegistry,
) -> Result<Self, DecodeError> {
if TypeId::of::<T>() == TypeId::of::<u8>() {
// unsafe: we've just checked that T is `u8` so the transmute here is a no-op
return Ok(unsafe { mem::transmute::<Vec<u8>, Vec<T>>(consensus_decode_bytes(d)?) });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait you can't transmute Vec you have to go through {to, from}_raw methods

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually it is correct, because T is u8

@maan2003
Copy link
Member

I think we can remove unsafe from this, using dyn Any api

@maan2003
Copy link
Member

maan2003 commented Jan 29, 2024

yes, I removed a lot of unsafe in 6ee8e87

but for &[T], rust complains about 'static

.map_err(DecodeError::from_err)?;
Ok(bytes)
}

impl_encode_decode_tuple!(T1, T2);
impl_encode_decode_tuple!(T1, T2, T3);
impl_encode_decode_tuple!(T1, T2, T3, T4);

impl<T> Encodable for &[T]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should impl for [T] instead of &[T]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but using Any still doesn't work
because you can't do .downcast_ref::<[u8]>(), because downcast_ref needs Sized

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could impl for Vec without unsafe, but not for Box<[u8]> or Arc, Rc

@Kixunil
Copy link

Kixunil commented Jan 29, 2024

To shed more light on why 'static is required: Rust doesn't want to allow non-'static casting because it's not clear how type IDs should interact with lifetimes. Also while apparently there could be a safe downcasting function it's not expressible with current syntax (at least I think so):

fn downcast_ref<'a, T, U>(value:: &'a T) -> Option<&'a U> { /* ... */ }

There needs to be a bound requiring that T outlives U (assuming T and U are covariant) to ban this code:

fn do_stuff<T>(value: &T, s: &mut &'static str) {
    *s = *downcast_ref(value).unwrap();
}

But such bound is currently inexpressible.

This PR wold be sound even without the 'static bound because instead of a generic U it uses a compile time known type u8, which never has any references (this is not the same as 'static!) and thus always outlives itself. (So yes, having an imagined U: 'always_static bound above would work too but it'd still be more restrictive than necessary.) However there's no way to check if T is u8 because you can't obtain TypeId for non-'static T.

@dpc
Copy link
Contributor Author

dpc commented Jan 29, 2024

@Kixunil 100% matches my understanding.

BTW. If Rust had something like TypeId::get_static<T> -> Option<TypeID> then one could have a cake and eat it, no T: 'static would be required - it would just return None.

@dpc
Copy link
Contributor Author

dpc commented Jan 29, 2024

@maan2003 I don't see a reason to worry about wrapping this unsafe into anything. These are the only places where it seems worthwhile and it's already small, isolated and easy to check that it's correct.

Copy link
Member

@maan2003 maan2003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@elsirion elsirion added this pull request to the merge queue Jan 29, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 29, 2024
@dpc dpc added this pull request to the merge queue Jan 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 30, 2024
@dpc dpc added this pull request to the merge queue Jan 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 30, 2024
@maan2003 maan2003 added this pull request to the merge queue Jan 30, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 30, 2024
@maan2003
Copy link
Member

wow github doesn't like this pr, maybe we broke something

@dpc dpc added this pull request to the merge queue Jan 30, 2024
@dpc
Copy link
Contributor Author

dpc commented Jan 30, 2024

wow github doesn't like this pr, maybe we broke something

It already passed before the CI. It must be some flakiness on master.

Merged via the queue into fedimint:master with commit 8dd5331 Jan 30, 2024
21 of 22 checks passed
@dpc dpc deleted the 24-01-26-fec-u8-perf branch January 30, 2024 04:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants