Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take Writer/Reader by &mut in consensus en/decoding #1035

Merged
merged 3 commits into from Jun 29, 2022
Merged

Take Writer/Reader by &mut in consensus en/decoding #1035

merged 3 commits into from Jun 29, 2022

Conversation

dpc
Copy link
Contributor

@dpc dpc commented Jun 3, 2022

Fix #1020 (see more relevant discussion there)

This definitely makes the amount of generics compiler
has to generate by avoding generating the same functions
for R, &mut R, &mut &mut R and so on.

old:

> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9947832 Jun  2 22:42 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4463024 Jun  2 22:46 target/release/deps/bitcoin-07a9dabf1f3e0266

new:


> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9866800 Jun  2 22:44 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4393392 Jun  2 22:45 target/release/deps/bitcoin-07a9dabf1f3e0266

In the unit-test binary itself, it saves ~100KB of data.

I did not expect much performance gains, but turn out I was wrong(*):

old:

test blockdata::block::benches::bench_block_deserialize                 ... bench:   1,072,710 ns/iter (+/- 21,871)
test blockdata::block::benches::bench_block_serialize                   ... bench:     191,223 ns/iter (+/- 5,833)
test blockdata::block::benches::bench_block_serialize_logic             ... bench:      37,543 ns/iter (+/- 732)
test blockdata::block::benches::bench_stream_reader                     ... bench:   1,872,455 ns/iter (+/- 149,519)
test blockdata::transaction::benches::bench_transaction_deserialize     ... bench:         136 ns/iter (+/- 3)
test blockdata::transaction::benches::bench_transaction_serialize       ... bench:          51 ns/iter (+/- 8)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench:           5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size            ... bench:           3 ns/iter (+/- 0)

new:

test blockdata::block::benches::bench_block_deserialize                 ... bench:   1,028,574 ns/iter (+/- 10,910)
test blockdata::block::benches::bench_block_serialize                   ... bench:     162,143 ns/iter (+/- 3,363)
test blockdata::block::benches::bench_block_serialize_logic             ... bench:      30,725 ns/iter (+/- 695)
test blockdata::block::benches::bench_stream_reader                     ... bench:   1,437,071 ns/iter (+/- 53,694)
test blockdata::transaction::benches::bench_transaction_deserialize     ... bench:          92 ns/iter (+/- 2)
test blockdata::transaction::benches::bench_transaction_serialize       ... bench:          17 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench:           5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size            ... bench:           4 ns/iter (+/- 0)

(*) - I'm benchmarking on a noisy laptop. Take this with a grain of salt. But I think
at least it doesn't make anything slower.

While doing all this manual labor that will probably generate conflicts,
I took a liberty of changing generic type names and variable names to
r and R (reader) and w and W for writer.

@dpc
Copy link
Contributor Author

dpc commented Jun 3, 2022

In debug build (cargo test) the differene is around 900KB:

old:

> ls -al target/debug/deps/bitcoin-32a2c32b15dd4dc4
-rwxrwxr-x 1 dpc dpc 35997144 Jun  2 23:46 target/debug/deps/bitcoin-32a2c32b15dd4dc4

new:

> ls -al target/debug/deps/bitcoin-32a2c32b15dd4dc4
-rwxrwxr-x 1 dpc dpc 35097536 Jun  2 23:45 target/debug/deps/bitcoin-32a2c32b15dd4dc4

apoelstra
apoelstra previously approved these changes Jun 3, 2022
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 0833cc0

Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be a bore but can you fix patch 3 please? The consensus_decode_global changes should be in patch 1, right? And the other change:
a) Could be improved (removing the multiple muts)
b) Could be in a patch of its own with a commit log

Thanks

Ok($thing {
$($field: $crate::consensus::Decodable::consensus_decode(&mut d)?),+
$($field: $crate::consensus::Decodable::consensus_decode(r.by_ref())?),+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of interest, is there any technical reason for using r.by_ref() vs &mut r? What made you choose to do so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of doubt there is, though I could imagine some readers theoretically implementing by_ref() somehow in a "smarter" way. Maybe. Possibly. I kind of don't know what I'm talking about maybe way.


// consensus_encode always encodes in LE, and we want to encode in BE.
//TODO `len += io::Write::write(&mut e, &self.port.to_be_bytes())?;` when MSRV >= 1.32
len += self.port.swap_bytes().consensus_encode(e)?;
//TODO `len += io::Write::write(w, &self.port.to_be_bytes())?;` when MSRV >= 1.32
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn! I thought I got rid of this :( Added to my todo list.

src/network/address.rs Outdated Show resolved Hide resolved
@apoelstra
Copy link
Member

The first commit now fails to comipile with

error[E0046]: not all trait items implemented, missing: `consensus_decode`
   --> /tmp/tmpzshlvtp4/src/util/psbt/mod.rs:300:1
    |
300 | impl Decodable for PartiallySignedTransaction {
    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ missing `consensus_decode` in implementation
    |
   ::: /tmp/tmpzshlvtp4/src/consensus/encode.rs:365:5
    |
365 |     fn consensus_decode<R: io::Read>(reader: &mut R) -> Result<Self, Error>;
    |     ------------------------------------------------------------------------ `consensus_decode` from trait

For more information about this error, try `rustc --explain E0046`.
error: could not compile `bitcoin` due to previous error

@dpc
Copy link
Contributor Author

dpc commented Jun 6, 2022

Done.

tcharding
tcharding previously approved these changes Jun 6, 2022
Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 60a6c83

apoelstra
apoelstra previously approved these changes Jun 7, 2022
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 60a6c83

@apoelstra
Copy link
Member

@TheBlueMatt can you take a quick look at this and concept ACK it? It's a fairly agressive API change (though not to APIs we expect users to be touching..) but it's got solid benchmarks behind it.

@dpc
Copy link
Contributor Author

dpc commented Jun 10, 2022

Doesn't hurt to get a second opinion: https://users.rust-lang.org/t/taking-r-mut-r-vs-mut-r-r-where-r-io-read/76780

@dpc
Copy link
Contributor Author

dpc commented Jun 23, 2022

Doesn't hurt to get a second opinion: https://users.rust-lang.org/t/taking-r-mut-r-vs-mut-r-r-where-r-io-read/76780

Only one response, but positive.

dpc added 2 commits June 23, 2022 15:55
Fix #1020 (see more relevant discussion there)

This definitely makes the amount of generics compiler
has to generate by avoding generating the same functions
for `R`, &mut R`, `&mut &mut R` and so on.

old:

```
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9947832 Jun  2 22:42 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4463024 Jun  2 22:46 target/release/deps/bitcoin-07a9dabf1f3e0266
```

new:

```

> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9866800 Jun  2 22:44 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4393392 Jun  2 22:45 target/release/deps/bitcoin-07a9dabf1f3e0266
```

In the unit-test binary itself, it saves ~100KB of data.

I did not expect much performance gains, but turn out I was wrong(*):

old:

```
test blockdata::block::benches::bench_block_deserialize                 ... bench:   1,072,710 ns/iter (+/- 21,871)
test blockdata::block::benches::bench_block_serialize                   ... bench:     191,223 ns/iter (+/- 5,833)
test blockdata::block::benches::bench_block_serialize_logic             ... bench:      37,543 ns/iter (+/- 732)
test blockdata::block::benches::bench_stream_reader                     ... bench:   1,872,455 ns/iter (+/- 149,519)
test blockdata::transaction::benches::bench_transaction_deserialize     ... bench:         136 ns/iter (+/- 3)
test blockdata::transaction::benches::bench_transaction_serialize       ... bench:          51 ns/iter (+/- 8)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench:           5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size            ... bench:           3 ns/iter (+/- 0)
```

new:

```
test blockdata::block::benches::bench_block_deserialize                 ... bench:   1,028,574 ns/iter (+/- 10,910)
test blockdata::block::benches::bench_block_serialize                   ... bench:     162,143 ns/iter (+/- 3,363)
test blockdata::block::benches::bench_block_serialize_logic             ... bench:      30,725 ns/iter (+/- 695)
test blockdata::block::benches::bench_stream_reader                     ... bench:   1,437,071 ns/iter (+/- 53,694)
test blockdata::transaction::benches::bench_transaction_deserialize     ... bench:          92 ns/iter (+/- 2)
test blockdata::transaction::benches::bench_transaction_serialize       ... bench:          17 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench:           5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size            ... bench:           4 ns/iter (+/- 0)
```

(*) - I'm benchmarking on a noisy laptop. Take this with a grain of salt. But I think
at least it doesn't make anything slower.

While doing all this manual labor that will probably generate conflicts,
I took a liberty of changing generic type names and variable names to
`r` and `R` (reader) and `w` and `W` for writer.
@dpc dpc dismissed stale reviews from apoelstra and tcharding via a24a3b0 June 23, 2022 22:55
@dpc
Copy link
Contributor Author

dpc commented Jun 23, 2022

Rebased, no changes other than a tiny merge conflict.

apoelstra
apoelstra previously approved these changes Jun 23, 2022
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a24a3b0

tcharding
tcharding previously approved these changes Jun 23, 2022
Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK a24a3b0

@apoelstra
Copy link
Member

I'd still like a third ACK here because it's a fairly big change to the encoding API

@dpc
Copy link
Contributor Author

dpc commented Jun 24, 2022

@Kixunil What do you think about it?

@RCasatta
Copy link
Collaborator

I tested downstream RCasatta/blocks_iterator#56

I don't fully grasp the change I had to make to this line:

let len = block_extra.consensus_encode(&mut self.buffer[..]).unwrap();

where buffer is a Vec<u8>.

To solve:

error[E0277]: the size for values of type `[u8]` cannot be known at compilation time
   --> src/pipe.rs:41:52
    |
41  |             let len = block_extra.consensus_encode(&mut self.buffer[..]).unwrap(); // buffer is big enough, we can unwrap
    |                                   ---------------- ^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
    |                                   |
    |                                   required by a bound introduced by this call
    |
    = help: the trait `Sized` is not implemented for `[u8]`
note: required by a bound in `consensus_encode`
   --> /home/casatta/.cargo/git/checkouts/rust-bitcoin-804a7fa56d920f24/a24a3b0/src/consensus/encode.rs:318:25
    |
318 |     fn consensus_encode<W: io::Write>(&self, writer: &mut W) -> Result<usize, io::Error>;
    |                         ^ required by this bound in `consensus_encode`

To fix the error I had to pass the Vec directly instead of a mutable slice

let len = block_extra.consensus_encode(&mut self.buffer).unwrap();

@apoelstra
Copy link
Member

@RCasatta the issue is that consensus_encode is taking a &mut W so if you give it a &mut [u8] it will infer that W is [u8]. Then because we don't have a Sized? bound on W, it complains because it doesn't know the size of [u8].

Having said this, I'm a little surprised that this code ever worked ... does &mut [u8] impl Write? How does it know what place to write new data to?

@dpc
Copy link
Contributor Author

dpc commented Jun 28, 2022

Having said this, I'm a little surprised that this code ever worked ... does &mut [u8] impl Write? How does it know what place to write new data to?

https://doc.rust-lang.org/std/io/trait.Write.html#impl-Write-6

@dpc
Copy link
Contributor Author

dpc commented Jun 28, 2022

On bitcoin_iterator examples I'm seeing 50% of size improvement?! (too good to be true?) Haven't got a chance to run the perf. RCasatta/blocks_iterator#56 (comment)

@RCasatta the issue is that consensus_encode is taking a &mut W so if you give it a &mut [u8] it will infer that W is [u8]. Then because we don't have a Sized? bound on W, it complains because it doesn't know the size of [u8].

Should I change to <W: io::Write + ?Sized> then?

@apoelstra
Copy link
Member

@dpc I'm not sure.... I don't know why you'd ever not want Sized? but on the other hand there must be a reason it's not the default, right?

Anyway it's a yes vote from me.

@dpc
Copy link
Contributor Author

dpc commented Jun 28, 2022

AFAIR, : Sized? makes the T just harder to store - since it doesn't have a statically known size (though it is possible to Box it or something). That's why it's not the default, because 99% of the time people want to store their T somewhere. In this case we only ever use the writer as variable we pass around, plus we take a reference to it anyway, so should be OK.

The only downside I see (other than me having to go and do more work) is the users having to do more work by adding this : Sized?. But I guess we should do it to unlock maximum potential.

@dpc dpc dismissed stale reviews from tcharding and apoelstra via 92a297f June 29, 2022 01:24
@dpc dpc requested review from apoelstra and tcharding June 29, 2022 02:12
Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 1fea098

@RCasatta
Copy link
Collaborator

ACK 1fea098 tested in downstream lib, space saving in compiled code confirmed

Thanks a lot for adding ?Sized I think it's very useful not for the example I previously mentioned but when you want to write on mutable slice not from the beginning, see for example https://github.com/RCasatta/blocks_iterator/blob/27d3cd2d75a36167aed3de514136383df73cbf0b/src/utxo/db.rs#L51

The only thing to mention is the slightly surprising syntax to write in a mutable slice, requiring now (IIUC) a double &mut: https://github.com/RCasatta/blocks_iterator/blob/e8ef94e3ef8f3a8723a80aa9ca980a3fee334520/src/pipe.rs#L42

@dpc
Copy link
Contributor Author

dpc commented Jun 29, 2022

AFAIU, ready to merge. @apoelstra

@apoelstra apoelstra merged commit f401cdc into rust-bitcoin:master Jun 29, 2022
ChallengeDev210 pushed a commit to ChallengeDev210/rust-bitcoin that referenced this pull request Aug 1, 2022
… consensus en/decoding

1fea098 Support unsized `R` and `W` in consensus encode/decode (Dawid Ciężarkiewicz)
a24a3b0 Forward `consensus_decode` to `consensus_decode_from_finite_reader` (Dawid Ciężarkiewicz)
9c754ca Take Writer/Reader by `&mut` in consensus en/decoding (Dawid Ciężarkiewicz)

Pull request description:

  Fix #1020 (see more relevant discussion there)

  This definitely makes the amount of generics compiler
  has to generate by avoding generating the same functions
  for `R`, `&mut R`, `&mut &mut R` and so on.

  old:

  ```
  > ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
  -rwxrwxr-x 1 dpc dpc 9947832 Jun  2 22:42 target/release/deps/bitcoin-07a9dabf1f3e0266
  > strip target/release/deps/bitcoin-07a9dabf1f3e0266
  > ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
  -rwxrwxr-x 1 dpc dpc 4463024 Jun  2 22:46 target/release/deps/bitcoin-07a9dabf1f3e0266
  ```

  new:

  ```

  > ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
  -rwxrwxr-x 1 dpc dpc 9866800 Jun  2 22:44 target/release/deps/bitcoin-07a9dabf1f3e0266
  > strip target/release/deps/bitcoin-07a9dabf1f3e0266
  > ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
  -rwxrwxr-x 1 dpc dpc 4393392 Jun  2 22:45 target/release/deps/bitcoin-07a9dabf1f3e0266
  ```

  In the unit-test binary itself, it saves ~100KB of data.

  I did not expect much performance gains, but turn out I was wrong(*):

  old:

  ```
  test blockdata::block::benches::bench_block_deserialize                 ... bench:   1,072,710 ns/iter (+/- 21,871)
  test blockdata::block::benches::bench_block_serialize                   ... bench:     191,223 ns/iter (+/- 5,833)
  test blockdata::block::benches::bench_block_serialize_logic             ... bench:      37,543 ns/iter (+/- 732)
  test blockdata::block::benches::bench_stream_reader                     ... bench:   1,872,455 ns/iter (+/- 149,519)
  test blockdata::transaction::benches::bench_transaction_deserialize     ... bench:         136 ns/iter (+/- 3)
  test blockdata::transaction::benches::bench_transaction_serialize       ... bench:          51 ns/iter (+/- 8)
  test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench:           5 ns/iter (+/- 0)
  test blockdata::transaction::benches::bench_transaction_size            ... bench:           3 ns/iter (+/- 0)
  ```

  new:

  ```
  test blockdata::block::benches::bench_block_deserialize                 ... bench:   1,028,574 ns/iter (+/- 10,910)
  test blockdata::block::benches::bench_block_serialize                   ... bench:     162,143 ns/iter (+/- 3,363)
  test blockdata::block::benches::bench_block_serialize_logic             ... bench:      30,725 ns/iter (+/- 695)
  test blockdata::block::benches::bench_stream_reader                     ... bench:   1,437,071 ns/iter (+/- 53,694)
  test blockdata::transaction::benches::bench_transaction_deserialize     ... bench:          92 ns/iter (+/- 2)
  test blockdata::transaction::benches::bench_transaction_serialize       ... bench:          17 ns/iter (+/- 0)
  test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench:           5 ns/iter (+/- 0)
  test blockdata::transaction::benches::bench_transaction_size            ... bench:           4 ns/iter (+/- 0)
  ```

  (*) - I'm benchmarking on a noisy laptop. Take this with a grain of salt. But I think
  at least it doesn't make anything slower.

  While doing all this manual labor that will probably generate conflicts,
  I took a liberty of changing generic type names and variable names to
  `r` and `R` (reader) and `w` and `W` for writer.

ACKs for top commit:
  RCasatta:
    ACK 1fea098 tested in downstream lib, space saving in compiled code confirmed
  apoelstra:
    ACK 1fea098

Tree-SHA512: bc11994791dc97cc468dc9d411b9abf52ad475f23adf5c43d563f323bae0da180c8f57f2f17c1bb7b9bdcf523584b0943763742b81362880206779872ad7489f
@Kixunil Kixunil added API break This PR requires a version bump for the next release perf Performance optimizations/issues labels Aug 1, 2022
@Kixunil Kixunil added this to the 0.29.0 milestone Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API break This PR requires a version bump for the next release perf Performance optimizations/issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

consensus_decode should take &mut D?
5 participants