Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reducing allocations for transfer-heavy protocols by using borrowed data in protobuf files #4781

Open
thomaseizinger opened this issue Nov 2, 2023 · 8 comments

Comments

@thomaseizinger
Copy link
Contributor

thomaseizinger commented Nov 2, 2023

Description

This is inspired by PRs from @joshuef (#4751, #4753, #4754).

The idea is to use Bytes to avoid potentially costly clones of large chunks of memory as we are processing the messages. For gossipsub, a single message might be sent to multiple clients. For kademlia, we might serve the same record multiple times from the record store. In both cases, there shouldn't be a need for any additional allocations.

The main problem though is that the generated protobuf structs use Vec<u8> and thus force an allocation onto us every time we want to send one of these. https://github.com/tafia/quick-protobuf does have an option to generate Cow instead of Vec<u8> and thus allow encoding of borrowed data. In the past, we weren't able to use this because asynchronous-codec couldn't encode borrowed data. But it can now! See mxinden/asynchronous-codec#9.

As a result, I think it is worth experimenting whether or not we can now use the generated protobuf structs.

To actually make use of this, we need to:

  • Re-run pb-rs with the correct options to use Cow instead of Vec:
    pb-rs **/generated/*.proto`
  • Work out whether or not we can adapt quick-protobuf-codec to work with the new protobuf files
  • Update the CI check to not fail on the now differing files
  • Actually make use of this in (all?) protocols to avoid allocations where possible

Motivation

Less memory allocations.

Current Implementation

We allocate each message again just to serialize it to the wire.

Are you planning to do it yourself in a pull request ?

No

@thomaseizinger
Copy link
Contributor Author

Re-run pb-rs with the correct options to use Cow instead of Vec:

I started with this an it yields a lot of compile errors (understandably). Those will all need to be fixed. Perhaps in a first iteration by just using Cow::Owned everywhere as that is essentially what we are currently doing by using Vec<u8>.

The lifetime would have to be 'static in many places.

To actually make use of borrowed data in gossipsub for example, we need to defer encoding to the protobuf format for as long as possible. Essentially, we need to carry some data type that uses Bytes all the way down to the encoding step. Then we can, in a single function, construct e.g. a proto::RPC with a borrow using a local lifetime to the Bytes instance and pass it to the Encoder. For gossipsub, this would be e.g. here:

match Sink::start_send(Pin::new(&mut substream), message) {

@joshuef Are you interested in working on this?

@thomaseizinger
Copy link
Contributor Author

To actually make use of borrowed data in gossipsub for example, we need to defer encoding to the protobuf format for as long as possible.

To achieve this, we'd need to do something similar as @mxinden described here. I.e. we need to extend HandlerIn to send different types into the handler. However, contrary to @mxinden's example, these cannot be instances of proto::RPC as those will have a lifetime.

@joshuef
Copy link
Contributor

joshuef commented Nov 2, 2023

@joshuef Are you interested in working on this?

Let me check out the scope of this and see how deep things go. If I have time to get in amongst it in the next week I'll definitely do that, or may be able to get someone from the team to have a look if no. I'll report back here when it's a bit clearer for me. 👍

@joshuef
Copy link
Contributor

joshuef commented Nov 2, 2023

I did start looking at this, but won't have time to dig in until next week sometime, I think.

@thomaseizinger
Copy link
Contributor Author

No worries, it is an internal change so we can ship it at any point, doesn't need to go into the current release :)

@joshuef
Copy link
Contributor

joshuef commented Nov 13, 2023

So I've been digging about further on this this morning, but I am blocked by the Keypairs essentially. Right now we zeroize the inbound bytes when we form the SKs. But with the Cow<u8> setup we don't necessarily have ownership of this to perform this...

Error[E0596]: cannot borrow data in dereference of `Cow<'_, [u8]>` as mutable --> identity/src/keypair.rs:294:59 | 294 | return rsa::Keypair::try_decode_pkcs1(&mut private_key.Data).map(|sk| { | ^^^^^^^^^^^^^^^^^^^^^ cannot borrow as mutable | = help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Cow<'_, [u8]>

I've been trying out some dereffing, borrows, passing vecs, but it all comes down to that same issue that I don't think we can guarantee that we properly `Zeroize` the bytes therin...

I'm not really sure if there's a way around this? Suggestions very welcome

(My WIP branch: https://github.com/joshuef/rust-libp2p/tree/GossipBytesPbCow)

@thomaseizinger
Copy link
Contributor Author

thomaseizinger commented Nov 14, 2023

If the data is borrowed, we don't need to zeroize it, I'd say? Whoever owns the data should zeroize it when it gets dropped :)

Meaning, we can just match on the Cow and do nothing if it is Cow::Borrowed?

@joshuef
Copy link
Contributor

joshuef commented Nov 14, 2023

Okay, I'll have a look at implementing in that fashion then, thanks @thomaseizinger ! 🙇

mergify bot pushed a commit that referenced this issue Nov 24, 2023
We can reduce the number of allocations during the encoding of messages by not depending on the `Encoder` implementation of `unsigned-varint` but doing the encoding ourselves.

This allows us to directly plug the `BytesMut` into the `Writer` and directly encode into the target buffer. Previously, we were allocating each message as an additional buffer that got immediately thrown-away again.

Related: #4781.

Pull-Request: #4782.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants