refactor(relay): reduce allocations during relaying #4453

Merged
merged 4 commits into from
Apr 2, 2024

thomaseizinger
Member

Previously, we would allocate each message twice:

  1. When receiving the original packet.
  2. When forming the resulting channel-data message.

We can optimise this to only one allocation each by:

  1. Carrying around the original ChannelData message for traffic from clients to peers.
  2. Pre-allocating enough space for the channel-data header for traffic from peers to clients.

Local flamegraphing still shows most of user-space activity as allocations. I did occasionally see a throughput of ~10GBps with these patches. I'd like to still work towards #4095 to ensure we handle anything time-sensitive better.


github-actions bot commented Apr 2, 2024

Terraform Cloud Plan Output

Plan: 15 to add, 14 to change, 15 to destroy.

Terraform Cloud Plan

We need to pass a `BufMut` to `encode_to_slice` to retain the semantics of
`BytesMut` of appending to the buffer. If we coerce to a slice, `BufMut`
expects the length to be sufficient to write the data.

This is a bit messy but good enough for now.
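
The append-versus-fixed-length distinction in this commit message can be illustrated with a std-only analogy. The sketch below uses `std::io::Write` instead of `bytes::BufMut` (an assumption made to keep it dependency-free, and the helper names are hypothetical): a `Vec<u8>` grows when written to, while a `&mut [u8]` can only accept as many bytes as it already has room for.

```rust
use std::io::Write;

/// Appends `payload` to a growable buffer; the `Vec` grows as needed.
/// This mirrors the "append" semantics the commit message wants to keep.
fn write_appending(buf: &mut Vec<u8>, payload: &[u8]) -> std::io::Result<()> {
    buf.write_all(payload)
}

/// Writes `payload` into a fixed-size slice, overwriting from the start.
/// Returns an error if the slice is shorter than `payload`, because a
/// slice cannot grow.
fn write_into_slice(mut buf: &mut [u8], payload: &[u8]) -> std::io::Result<()> {
    buf.write_all(payload)
}
```

With `bytes`, the failure mode for a too-short slice is expected to be a panic rather than an `Err`, so the analogy covers the semantics only, not the exact error behaviour.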

github-actions bot commented Apr 2, 2024

Performance Test Results

TCP

| Test Name | Received/s | Sent/s | Retransmits |
|---|---|---|---|
| direct-tcp-client2server | 243.1 MiB (-3%) | 245.2 MiB (-3%) | 263 (+7%) |
| direct-tcp-server2client | 249.0 MiB (+4%) | 250.9 MiB (+4%) | 457 (+44%) |
| relayed-tcp-client2server | 175.3 MiB (+2%) | 176.2 MiB (+2%) | 166 (-20%) |
| relayed-tcp-server2client | 190.1 MiB (+4%) | 190.6 MiB (+4%) | 288 (+41%) |

UDP

| Test Name | Total/s | Jitter | Lost |
|---|---|---|---|
| direct-udp-client2server | 50.0 MiB (+0%) | 0.03ms (-3%) | 0.00% (NaN%) |
| direct-udp-server2client | 50.0 MiB (-0%) | 0.01ms (-29%) | 0.00% (NaN%) |
| relayed-udp-client2server | 50.0 MiB (-0%) | 0.09ms (+16%) | 0.00% (NaN%) |
| relayed-udp-server2client | 50.0 MiB (-0%) | 0.04ms (-7%) | 0.00% (NaN%) |

Collaborator

ReactorScram left a comment

I skimmed through it and it looks simple, but I don't actually see where the allocation is removed because I am not familiar with the code yet.

Nice to see some <'a>s disappearing.

rust/relay/src/server.rs
    receiver: channel.peer_address,
});
self.pending_commands
    .push_back(Command::ForwardDataClientToPeer {
Collaborator

Is this just a Deque? What was the reason not to limit the capacity?

Member Author

It is emptied on every iteration of Eventloop::poll and only a small number of Commands can build up.

Collaborator

How small? 100?

Member Author

More like 2-3.

Collaborator

Could we panic if it hits 100?
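
A minimal sketch of the suggestion in this thread, assuming a queue that is fully drained on every poll iteration. `Command`, `Server`, `push_command`, `next_command`, and the constant are hypothetical stand-ins, not the relay's actual types; the limit of 100 is the number floated above.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the relay's command type.
#[derive(Debug, PartialEq)]
enum Command {
    ForwardData { id: u32, data: Vec<u8> },
}

/// Assumed upper bound; far above the 2-3 commands expected in practice.
const MAX_PENDING_COMMANDS: usize = 100;

#[derive(Default)]
struct Server {
    pending_commands: VecDeque<Command>,
}

impl Server {
    /// Queues a command and panics if the queue has grown past the bound,
    /// which would indicate that the event loop stopped draining it.
    fn push_command(&mut self, cmd: Command) {
        assert!(
            self.pending_commands.len() < MAX_PENDING_COMMANDS,
            "pending_commands is not being drained"
        );
        self.pending_commands.push_back(cmd);
    }

    /// Called by the event loop until it returns `None`.
    fn next_command(&mut self) -> Option<Command> {
        self.pending_commands.pop_front()
    }
}
```

The check turns an unbounded queue into one that fails loudly on a logic bug, without pre-sizing or capping the `VecDeque` itself.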

channel,
msg,
length,
}
}

// Panics if self.data.len() > u16::MAX
Collaborator

This comment is outdated, right?

Member Author

Yes!
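
For context, the length field in a channel-data header is 16 bits wide, which is where a `u16::MAX` limit comes from. One way to avoid a panic on oversized payloads is a fallible conversion, sketched below. `encode_header` is a hypothetical helper for illustration and not necessarily how the PR handles it.

```rust
use std::convert::TryFrom;

/// Channel-data header: 2 bytes channel number followed by 2 bytes length,
/// both big-endian.
const HEADER_LEN: usize = 4;

/// Hypothetical helper: builds the header, or returns `None` if the payload
/// length does not fit into the 16-bit length field.
fn encode_header(channel: u16, payload_len: usize) -> Option<[u8; HEADER_LEN]> {
    let len = u16::try_from(payload_len).ok()?;

    let mut header = [0u8; HEADER_LEN];
    header[..2].copy_from_slice(&channel.to_be_bytes());
    header[2..].copy_from_slice(&len.to_be_bytes());

    Some(header)
}
```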

@@ -11,6 +11,7 @@ use crate::net_ext::IpAddrExt;
use crate::{ClientSocket, IpStack, PeerSocket, TimeEvents};
use anyhow::Result;
use bytecodec::EncodeExt;
use bytes::BytesMut;
Collaborator

Is it the use of bytes that makes it easier to reduce allocations?

@@ -776,7 +788,7 @@ where

self.pending_commands.push_back(Command::ForwardData {
    id: channel.allocation,
    data: data.to_vec(),
Member Author

thomaseizinger Apr 2, 2024

This is one of the allocations we are avoiding.
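
A rough sketch of how the `data.to_vec()` copy can be avoided for client-to-peer traffic: keep the received buffer and hand out the payload as a slice into it. This uses a plain `Vec<u8>` instead of the `bytes` types from the PR, and `ClientToPeer`, `parse`, and `data` are hypothetical names chosen for illustration.

```rust
/// Channel-data header: 2 bytes channel number + 2 bytes length (big-endian).
const HEADER_LEN: usize = 4;

/// Hypothetical parsed message that owns the original packet buffer.
struct ClientToPeer {
    channel: u16,
    length: usize,
    /// The full message as received, header included.
    msg: Vec<u8>,
}

impl ClientToPeer {
    /// Takes ownership of the received packet; nothing is copied.
    fn parse(msg: Vec<u8>) -> Option<Self> {
        if msg.len() < HEADER_LEN {
            return None;
        }
        let channel = u16::from_be_bytes([msg[0], msg[1]]);
        let length = usize::from(u16::from_be_bytes([msg[2], msg[3]]));
        if msg.len() - HEADER_LEN < length {
            return None; // declared length exceeds what was received
        }
        Some(Self { channel, length, msg })
    }

    /// The payload to forward to the peer, borrowed from the original buffer.
    fn data(&self) -> &[u8] {
        &self.msg[HEADER_LEN..HEADER_LEN + self.length]
    }
}
```

The only allocation in this path is the one that holds the received packet; forwarding borrows from it instead of creating a second `Vec`.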


impl PeerToClient {
    pub fn new(msg: &[u8]) -> Self {
        let mut buf = BytesMut::zeroed(msg.len() + 4);
Member Author

We still allocate here but we leave enough space to put the channel data header in front of it.
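
The idea can be sketched with a plain `Vec<u8>` standing in for `BytesMut` (an assumption to keep the example dependency-free). The `into_channel_data` method is hypothetical and only illustrates filling the reserved four bytes in place once the channel number is known.

```rust
use std::convert::TryFrom;

/// Channel-data header: 2 bytes channel number + 2 bytes length (big-endian).
const HEADER_LEN: usize = 4;

struct PeerToClient {
    /// Four zeroed header bytes followed by the payload.
    buf: Vec<u8>,
}

impl PeerToClient {
    /// The single allocation: room for the header plus a copy of the payload.
    fn new(msg: &[u8]) -> Self {
        let mut buf = vec![0u8; msg.len() + HEADER_LEN];
        buf[HEADER_LEN..].copy_from_slice(msg);
        Self { buf }
    }

    /// Hypothetical: writes the header into the reserved space and returns
    /// the finished wire message without allocating again.
    fn into_channel_data(mut self, channel: u16) -> Option<Vec<u8>> {
        let len = u16::try_from(self.buf.len() - HEADER_LEN).ok()?;
        self.buf[..2].copy_from_slice(&channel.to_be_bytes());
        self.buf[2..HEADER_LEN].copy_from_slice(&len.to_be_bytes());
        Some(self.buf)
    }
}
```

Because the header space is reserved up front, building the outgoing message is two small in-place writes rather than a fresh buffer plus a copy of the payload.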

@@ -100,7 +100,7 @@ async fn forward_incoming_relay_data(
tokio::select! {
    result = socket.recv() => {
        let (data, sender) = result?;
        relayed_data_sender.send((data.to_vec(), PeerSocket::new(sender), id)).await?;
Member Author

This used to be the first allocation. Note that PeerToClient::new still allocates.

thomaseizinger
Member Author

@ReactorScram I'd recommend patch-by-patch review. I added some comments on how we are avoiding the allocations :)

thomaseizinger added this pull request to the merge queue Apr 2, 2024
Merged via the queue into main with commit 5f718ad Apr 2, 2024
151 checks passed
thomaseizinger deleted the chore/relay/less-allocations branch April 2, 2024 22:14