
Upgrade to bitvec 1.0.0 #246

Closed · wants to merge 1 commit
Conversation

@JuanPotato commented Jan 18, 2022

What changes I made

  1. `BitSlice<Msb0, u8>` -> `BitSlice<u8, Msb0>` (the type parameters are now ordered storage first, then bit order)
  2. `get_raw_slice` is gone; use `domain` instead
  3. `offset_from` is now `unsafe`, must be called on a `BitPtr`, and its argument order is reversed

Not related to the upgrade, but it made my tests pass: `DekuData.vis` was never read, which emitted a warning, and `cargo test` didn't like the extra warning, so I renamed it to `_vis` for now.

test tests/test_compile/cases/attribute_token_stream.rs ... mismatch

EXPECTED:
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
error[E0308]: mismatched types
 --> $DIR/attribute_token_stream.rs:5:19
  |
5 |     #[deku(cond = "0 == true")]
  |                   ^^^^^^^^^^^ expected integer, found `bool`

error[E0277]: can't compare `{integer}` with `bool`
 --> $DIR/attribute_token_stream.rs:5:19
  |
5 |     #[deku(cond = "0 == true")]
  |                   ^^^^^^^^^^^ no implementation for `{integer} == bool`
  |
  = help: the trait `PartialEq<bool>` is not implemented for `{integer}`
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈

ACTUAL OUTPUT:
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
warning: field is never read: `vis`
   --> deku-derive/src/lib.rs
    |
    |     vis: syn::Visibility,
    |     ^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(dead_code)]` on by default

error[E0308]: mismatched types
 --> tests/test_compile/cases/attribute_token_stream.rs:5:19
  |
5 |     #[deku(cond = "0 == true")]
  |                   ^^^^^^^^^^^ expected integer, found `bool`

error[E0277]: can't compare `{integer}` with `bool`
 --> tests/test_compile/cases/attribute_token_stream.rs:5:19
  |
5 |     #[deku(cond = "0 == true")]
  |                   ^^^^^^^^^^^ no implementation for `{integer} == bool`
  |
  = help: the trait `PartialEq<bool>` is not implemented for `{integer}`
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
note: If the actual output is the correct output you can bless it by rerunning
      your test with the environment variable TRYBUILD=overwrite

closes #243

@wcampbell0x2a (Collaborator) left a comment:
Thanks for the contribution! I have some comments but it looks good :)

@@ -77,7 +77,7 @@ fn emit_struct(input: &DekuData) -> Result<TokenStream, syn::Error> {
 let __deku_pad = 8 * ((__deku_rest.len() + 7) / 8) - __deku_rest.len();
 let __deku_read_idx = __deku_input_bits.len() - (__deku_rest.len() + __deku_pad);

-Ok(((__deku_input_bits[__deku_read_idx..].as_raw_slice(), __deku_pad), __deku_value))
+Ok(((__deku_input_bits[__deku_read_idx..].domain().region().unwrap().1, __deku_pad), __deku_value))
Collaborator:
I'd much prefer naming the tuple field.

Suggested change:

-Ok(((__deku_input_bits[__deku_read_idx..].domain().region().unwrap().1, __deku_pad), __deku_value))
+let (_, remaining_bytes, _) = __deku_input_bits[__deku_read_idx..].domain().region().unwrap();
+Ok(((remaining_bytes, __deku_pad), __deku_value))

Contributor:

I could also just make the tuples an actual named struct here.

@@ -308,7 +308,7 @@ fn emit_enum(input: &DekuData) -> Result<TokenStream, syn::Error> {
 let __deku_pad = 8 * ((__deku_rest.len() + 7) / 8) - __deku_rest.len();
 let __deku_read_idx = __deku_input_bits.len() - (__deku_rest.len() + __deku_pad);

-Ok(((__deku_input_bits[__deku_read_idx..].as_raw_slice(), __deku_pad), __deku_value))
+Ok(((__deku_input_bits[__deku_read_idx..].domain().region().unwrap().1, __deku_pad), __deku_value))
Collaborator:
same here.

@@ -453,7 +453,7 @@ fn emit_bit_byte_offsets(
 || byte_offset.is_some()
 {
 Some(quote! {
-let __deku_bit_offset = usize::try_from(__deku_input_bits.offset_from(__deku_rest))?;
+let __deku_bit_offset = usize::try_from(unsafe { __deku_rest.as_bitptr().offset_from(__deku_input_bits.as_bitptr()) })?;

Contributor:

Why are there so many `unsafe`s here?

 let mut res = HashMap::with_capacity_and_hasher(capacity.unwrap_or(0), S::default());

 let mut rest = input;
 let mut found_predicate = false;

 while !found_predicate {
 let (new_rest, kv) = <(K, V)>::read(rest, ctx)?;
-found_predicate = predicate(input.offset_from(new_rest) as usize, &kv);
+found_predicate = predicate(unsafe { new_rest.as_bitptr().offset_from(input.as_bitptr()) } as usize, &kv);
Collaborator:

same as above

@@ -34,11 +34,11 @@ macro_rules! ImplDekuTraits {

 let value = if pad == 0
 && bit_slice.len() == max_type_bits
-&& bit_slice.as_raw_slice().len() * 8 == max_type_bits
+&& bit_slice.domain().region().unwrap().1.len() * 8 == max_type_bits
Collaborator:

same as above

 {
 // if everything is aligned, just read the value

-let bytes: &[u8] = bit_slice.as_raw_slice();
+let bytes: &[u8] = bit_slice.domain().region().unwrap().1;
Collaborator:

same as above

@@ -81,7 +81,7 @@ macro_rules! ImplDekuTraits {
 bits
 };

-let bytes: &[u8] = bits.as_raw_slice();
+let bytes: &[u8] = bits.domain().region().unwrap().1;
Collaborator:

same as above

@@ -25,8 +25,8 @@ where
 let (new_rest, val) = u8::read(rest, ctx)?;
 rest = new_rest;

-let read_idx = input.offset_from(rest) as usize;
 value = input[..read_idx].as_raw_slice();
+let read_idx = unsafe { rest.as_bitptr().offset_from(input.as_bitptr()) } as usize;
Collaborator:

same as above

 match limit {
 // Read a given count of elements
 Limit::Count(mut count) => {
 // Handle the trivial case of reading an empty slice
 if count == 0 {
-return Ok((input, &input.as_raw_slice()[..0]));
+return Ok((input, &input.domain().region().unwrap().1[..0]));
Collaborator:

same as above

@@ -33,7 +33,7 @@ fn read_vec_with_predicate<

 // This unwrap is safe as we are pushing to the vec immediately before it,
 // so there will always be a last element
-if predicate(input.offset_from(rest) as usize, res.last().unwrap()) {
+if predicate(unsafe { rest.as_bitptr().offset_from(input.as_bitptr()) } as usize, res.last().unwrap()) {
Collaborator:

same as above

@wcampbell0x2a (Collaborator):

Running the benchmarks (which aren't a full holistic picture of this library) gives me some worrying numbers. 😬 These were run on my cheap laptop:

origin/master..JuanPotato/master

deku_read_byte          time:   [35.971 ns 36.308 ns 36.661 ns]
                        change: [+197.36% +202.61% +208.23%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

deku_write_byte         time:   [84.430 ns 85.312 ns 86.242 ns]
                        change: [+23.911% +26.359% +28.970%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

deku_read_enum          time:   [58.737 ns 59.303 ns 59.910 ns]
                        change: [+172.07% +178.52% +184.68%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

deku_write_enum         time:   [143.86 ns 145.50 ns 147.24 ns]
                        change: [+28.792% +31.542% +34.688%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

deku_read_vec           time:   [2.4821 us 2.5039 us 2.5270 us]
                        change: [+130.90% +134.30% +137.80%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

deku_write_vec          time:   [6.2461 us 6.3132 us 6.3845 us]
                        change: [+28.856% +31.085% +33.302%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

origin/master for reference:

deku_read_byte          time:   [11.810 ns 11.942 ns 12.085 ns]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

deku_write_byte         time:   [67.383 ns 68.076 ns 68.816 ns]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

deku_read_enum          time:   [20.725 ns 20.979 ns 21.251 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

deku_write_enum         time:   [110.09 ns 111.23 ns 112.46 ns]
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe

deku_read_vec           time:   [1.0543 us 1.0685 us 1.0835 us]
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

deku_write_vec          time:   [4.7405 us 4.7984 us 4.8617 us]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

@myrrlyn @sharksforarms

@wcampbell0x2a (Collaborator):

Running benchmarks on my adsb library after switching deku to this branch:

lax_messsages           time:   [1.5287 s 1.5305 s 1.5330 s]
                        change: [+56.845% +57.120% +57.449%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

@JuanPotato (Author):

Thank you for the feedback! Yeah, that's not great performance. I'm also thinking my code might have some bugs in very specific cases; I'll add some new tests and see.

Also not sure if bitvec itself regressed or if our usage of it is not ideal.

@JuanPotato JuanPotato marked this pull request as draft January 18, 2022 12:35
@wcampbell0x2a (Collaborator):

> Thank you for the feedback! Yeah that's not great performance, and I'm also thinking my code might have some bugs in it in very specific cases, Will add some new tests and see.
>
> Also not sure if bitvec itself regressed or if our usage of it is not ideal.

There is this issue, though it might not be related to our performance regression: ferrilab/bitvec#158

@sharksforarms (Owner):

Thanks for this work! Let's see if we can figure out the performance issue before merging.

@DerFetzer:

I ran the benchmark executables with perf; bitvec::Domain is definitely part of the problem.

(perf screenshots for master and JuanPotato:master omitted)

Just by refactoring https://github.com/JuanPotato/deku/blob/09427b8eb5e39402c4dec49070e9ae0a5ecbe20d/src/impls/primitive.rs#L35 with a match, and thus reusing bit_slice.domain().region().unwrap().1, performance improved again by up to 50%. (perf screenshot omitted)

But even this is of course still much slower than the master branch.

@JuanPotato (Author):

At some point I began working on a replacement for bitvec that would be much more performant for deku's usage. I got it looking pretty good, but it required a lot of refactoring and is definitely a breaking change. I can't remember how the performance compares to master, though.

@wcampbell0x2a (Collaborator):

@myrrlyn I see you updated bitvec w.r.t. performance with the ferrilab/bitvec@4de6277 commit.

When updating the Cargo.toml of this branch:

diff --git a/Cargo.toml b/Cargo.toml
index 14d1a0a..ebfb0c0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -26,7 +26,7 @@ const_generics = []

 [dependencies]
 deku_derive = { version = "^0.12.0", path = "deku-derive", default-features = false}
-bitvec = { version = "1.0.0", default-features = false }
+bitvec = { git = "https://github.com/bitvecto-rs/bitvec", default-features = false }

I get performance boosts vs the v1 version of bitvec:

deku_read_byte          time:   [15.993 ns 16.075 ns 16.167 ns]
                        change: [-21.903% -21.507% -20.952%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

deku_write_byte         time:   [53.931 ns 54.056 ns 54.216 ns]
                        change: [-2.3317% -1.1832% +0.5395%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

deku_read_enum          time:   [27.867 ns 28.048 ns 28.236 ns]
                        change: [-22.115% -21.262% -20.134%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 26 outliers among 100 measurements (26.00%)
  15 (15.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe

deku_write_enum         time:   [91.762 ns 91.857 ns 91.960 ns]
                        change: [-2.7815% -0.2585% +3.3560%] (p = 0.88 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe

deku_read_vec           time:   [1.0901 us 1.0914 us 1.0927 us]
                        change: [-36.372% -35.693% -35.277%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

deku_write_vec          time:   [4.4868 us 4.4926 us 4.4985 us]
                        change: [+4.6790% +5.9749% +7.6922%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

However, still not the kind of performance that current deku provides:

deku_read_byte          time:   [6.4593 ns 6.4657 ns 6.4736 ns]
                        change: [-60.561% -60.293% -60.044%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high severe

deku_write_byte         time:   [40.263 ns 40.318 ns 40.370 ns]
                        change: [-27.729% -26.536% -25.703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

deku_read_enum          time:   [11.972 ns 11.984 ns 11.996 ns]
                        change: [-58.424% -57.816% -57.379%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

deku_write_enum         time:   [66.074 ns 66.155 ns 66.233 ns]
                        change: [-31.005% -27.837% -24.929%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

deku_read_vec           time:   [638.18 ns 638.77 ns 639.49 ns]
                        change: [-41.653% -41.444% -41.180%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild
  4 (4.00%) high severe

deku_write_vec          time:   [2.8597 us 2.8650 us 2.8723 us]
                        change: [-37.915% -36.792% -35.948%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

@myrrlyn (Contributor) commented Jul 2, 2022:

By any chance, does the faster old version have #[inline] on non-public functions?

@wcampbell0x2a (Collaborator):

> By any chance, does the faster old version have #[inline] on non-public functions

I'm not at my computer, but I think this branch is the code we depend on. Looks like everything possible is inlined?

https://github.com/bitvecto-rs/bitvec/tree/v0/22/3/src

@wcampbell0x2a (Collaborator):

As of bitvec v1.0.1, compared to the old bitvec:

deku_read_byte          time:   [15.651 ns 15.732 ns 15.806 ns]
                        change: [+93.359% +94.661% +95.905%] (p = 0.00 < 0.05)
                        Performance has regressed.

deku_write_byte         time:   [54.336 ns 54.673 ns 55.005 ns]
                        change: [+24.738% +25.686% +26.695%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

deku_read_enum          time:   [27.266 ns 27.461 ns 27.667 ns]
                        change: [+80.433% +83.093% +85.122%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

deku_write_enum         time:   [89.895 ns 90.558 ns 91.282 ns]
                        change: [+26.135% +27.569% +28.872%] (p = 0.00 < 0.05)
                        Performance has regressed.

deku_read_vec           time:   [1.3311 us 1.3361 us 1.3412 us]
                        change: [+95.736% +100.29% +105.39%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

deku_write_vec          time:   [3.9468 us 3.9727 us 4.0026 us]
                        change: [+29.954% +31.057% +32.167%] (p = 0.00 < 0.05)
                        Performance has regressed.

@wcampbell0x2a (Collaborator):

Better, on https://github.com/wcampbell0x2a/deku/tree/update-bitvec-1.x.x:

deku_read_byte          time:   [11.777 ns 11.799 ns 11.822 ns]
deku_write_byte         time:   [50.278 ns 50.330 ns 50.383 ns]
deku_read_enum          time:   [20.374 ns 20.398 ns 20.422 ns]
deku_write_enum         time:   [84.699 ns 84.795 ns 84.896 ns]
deku_read_vec           time:   [991.05 ns 992.97 ns 995.64 ns]
deku_write_vec          time:   [3.7768 µs 3.7814 µs 3.7862 µs]

@sharksforarms (Owner):

Superseded by #281

Linked issue: bitvec 1.0.0 support
6 participants