Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add buffering during encode #138

Merged
merged 3 commits into from
Nov 6, 2023
Merged

Conversation

tcharding
Copy link
Member

@tcharding tcharding commented Oct 16, 2023

Currently we write char at a time to both writers (fmt::Write and std::io::Write). We can improve performance by first collecting the ASCII bytes of the encoded bech32 string into a buffer on the stack then writing the buffer as a single call.

This is purely an optimization.

The second patch is surprising, at least to me, would love to learn what I'm doing wrong. I thought this would be valid

let mut buf = [0_u8; Ck::CODE_LENGTH]

But the compiler does not like the usage of an associated const.

@tcharding
Copy link
Member Author

Fuzzing for the win, uncovered #140

@tcharding
Copy link
Member Author

Needs redoing on top of #142

@tcharding tcharding marked this pull request as draft October 19, 2023 01:29
@tcharding tcharding mentioned this pull request Oct 22, 2023
@tcharding tcharding force-pushed the 10-17-buffered branch 3 times, most recently from da82e4e to a957f7f Compare October 30, 2023 20:11
@tcharding tcharding marked this pull request as ready for review October 30, 2023 20:31
@apoelstra
Copy link
Member

I'm not thrilled with the casts from char to u8. There are a couple approaches we could take:

Alternately I can just ACK this as-is. What do you think?

@apoelstra
Copy link
Member

ACK a957f7f other than that.

@tcharding
Copy link
Member Author

I'm not thrilled with the casts from char to u8.

Why? HRP is checked to be valid ASCII and the fe32s are valid ASCII also so its safe to cast.

@apoelstra
Copy link
Member

@tcharding because every time I read the code I need to mentally re-check those facts. I agree it's safe, but it's a code smell.

@tcharding
Copy link
Member Author

tcharding commented Nov 1, 2023

Ah yes, I guess I did it in some place without code comments. I agree its smelly, I'll see if I can come up with something better.

Add an iterator adapter that is identical to `encode::CharIter` but
yields `u8` byte values instead. This makes encoding cleaner because
users do not need to cast `char`s to `u8`.
Currently we call `fmt::Write::write_char` and `io::Write::write_all`
in a loop while iterating over the encoded characters of a bech32
string. This is inefficient.

Add a buffer and copy ASCII bytes into it while looping the write the
whole buffer out in a single call. The buffer has a size of 90 bytes
because we know the bech32 string is guaranteed to be 90 chars or less.
As we did when writing in the `segwit` module; add a buffer to cache
characters and do writes a buffer at a time.

Although we do know the maximum length of the string we are encoding,
and we enforce this limit using `encoded_length::<Ck>()` we use an
unrelated fixed buffer size and loop over it as many times as needed.
This is because, surprisingly the following is not valid Rust

    `let mut buf = [0_u8, Ck::CODE_LENGTH];`

We use a buffer of length 1024 which is larger than the 1023 code length
of the two `Checksum` implementations we provide with this
crate (`Bech32` and `Bech32m`).
@tcharding
Copy link
Member Author

Added primitives::encode::ByteIter and used that to remove the casts in crate::segwit.

@apoelstra
Copy link
Member

Perfect, thanks! I'm a tiny bit disappointed that ByteIter goes through CharIter rather than directly converting fes to bytes and pulling bytes out of the Hrp, but I see that this would be a ton of complexity, and anyway it's just an optimization that's unlikely to be measurable.

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c215c3d

@apoelstra apoelstra merged commit 2b3f42f into rust-bitcoin:master Nov 6, 2023
12 checks passed
@apoelstra apoelstra deleted the 10-17-buffered branch November 6, 2023 13:38
@apoelstra apoelstra mentioned this pull request Jan 23, 2024
@apoelstra
Copy link
Member

I wonder if we could stick a .buffered(buffer_size) adaptor onto CharIter (or whatever our "final" encoding iterator is), which wouldn't be an iterator anymore but would be fmt::Write and io::Write.

@tcharding
Copy link
Member Author

Flagging this, is the comment above actionable before #168 merges?

@apoelstra
Copy link
Member

@tcharding no. The more I think about this the less clear it is to me what we actually want to do here, buffering-wise. Let's just do 0.10.0 and then I think we'll be "done" our part (except for the error correction stuff which I continue to have on my todo list) and can ask Kix to come take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants