Skip to content

Conversation

@Kixunil
Copy link
Collaborator

@Kixunil Kixunil commented Jul 10, 2024

Most base58 strings in Bitcoin are somewhat short. There was previously an "optimization" putting part of the input on stack which was removed in #2759 because it actually made the code slower. This appears to be mostly because of branches caused by using iter::Chain.

Manually splitting the iterations into two helped bring the performance close to what #2759 achieved but that still wasn't worth it. But given that we know the input length in many cases (it's just a slice) we can determine whether it'll fit a buffer upfront and then just call different functions which don't have the branches in loops. To avoid having two functions this uses generics instead. Further, we increase the buffer length to 128 and use ArrayVec from internals which internally avoids initializing the buffer thanks to MaybeUninit

In total this increases performance by around 4% on my machine.

@Kixunil Kixunil added the perf Performance optimizations/issues label Jul 10, 2024
@Kixunil Kixunil requested a review from RCasatta July 10, 2024 13:22
@github-actions github-actions bot added the C-base58 PRs modifying the base58 crate label Jul 10, 2024
@coveralls
Copy link

coveralls commented Jul 10, 2024

Pull Request Test Coverage Report for Build 9885605832

Details

  • 50 of 51 (98.04%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 83.094%

Changes Missing Coverage Covered Lines Changed/Added Lines %
base58/src/lib.rs 50 51 98.04%
Totals Coverage Status
Change from base Build 9884211646: 0.03%
Covered Lines: 19586
Relevant Lines: 23571

💛 - Coveralls

@RCasatta
Copy link
Collaborator

I confirm I see the ~4% improvement (~5% on my machine)

ACK 56759ecc672f25c03f6279b0fb7aae52e50ef8ed

I just have the following doubt, if I understand correctly the ArrayVec is never resizing but we use it because it offers the ergonomics of push and length tracking but we could use a plain array instead? Most probably same performance and would have a more complex format_iter but also advantage like not needing the Buffer trait so worth mentioning I guess

@Kixunil
Copy link
Collaborator Author

Kixunil commented Jul 10, 2024

ArrayVec is never resizing

Correct.

we use it because it offers the ergonomics of push and length tracking

That and also the management of MaybeUninit so that it can be reviewed once to do the correct thing and then we don't need to check every other crate for UB.

we could use a plain array instead

Of course, ArrayVec is just array and length. We'd just have to deal with tracking and calling unsafe { buf.assume_init() } in this crate which we already do in internals so it'd duplicate the same work.

like not needing the Buffer trait

I don't think this is worth pursuing. All solutions I can come up with either have extra assumptions (only known lengths) or require insane complexity beyond what the Buffer trait has. E.g. pretending we have a custom allocator and then make "realloc" change from stack to heap. This also involves writing a bunch of unsafe which I'd prefer to avoid. (But honestly, it's a pretty interesting abstraction that would be generally useful.)

@apoelstra
Copy link
Member

I think the Buffer trait is fine because it's small and private. concept ACK this PR. Will try to review and ACK tomorrow.

Although if you need to change it, there are a couple lines of whitespace added in this diff that you could remove.

Most base58 strings in Bitcoin are somewhat short. There was previously
an "optimization" putting part of the input on stack which was removed
in rust-bitcoin#2759 because it actually made the code slower. This appears to be
mostly because of branches caused by using `iter::Chain`.

Manually splitting the iterations into two helped bring the performance
close to what rust-bitcoin#2759 achieved but that still wasn't worth it. But given
that we know the input length in many cases (it's just a slice) we can
determine whether it'll fit a buffer upfront and then just call
different functions which don't have the branches in loops. To avoid
having two functions this uses generics instead. Further, we increase
the buffer length to 128 and use `ArrayVec` from `internals` which
internally avoids initializing the buffer thanks to `MaybeUninit`

In total this increases performance by around 4% on my machine.
@Kixunil
Copy link
Collaborator Author

Kixunil commented Jul 11, 2024

I've removed the whitespace and fixed a grammar error in the commit message.

@Kixunil
Copy link
Collaborator Author

Kixunil commented Jul 11, 2024

Also I had an idea how to make no-alloc Address feasible: we know the length of largest possible address, so if we just expose a function that can encode into a pre-allocated buffer (or use a const generic) we can get rid of the allocation by using a fixed-size buffer.

One thing that I'm thinking about though is that currently passing a string effectively creates two buffers and copies between them. We could instead try to reuse the buffer and avoid copies. The hard part is figuring out a good API and preferably a way to do it without unsafe.

Copy link
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d05723c

@apoelstra apoelstra mentioned this pull request Jul 11, 2024
@apoelstra
Copy link
Member

Promoted your comment @Kixunil to #3013

Copy link
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK d05723c

@apoelstra apoelstra merged commit fec8ec3 into rust-bitcoin:master Jul 11, 2024
@Kixunil Kixunil deleted the optimize-base58 branch July 11, 2024 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-base58 PRs modifying the base58 crate perf Performance optimizations/issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants