perf: no more channels for UDP send/recv #1579

Merged
merged 36 commits into from
Oct 19, 2023

Conversation

Frando
Member

@Frando Frando commented Oct 5, 2023

Description

Do not use channels for the hot path of sending and receiving QUIC UDP packets.

  • Put PeerMap into a Mutex so that the AsyncUdpSocket poll_send/poll_recv methods can access it directly for address remapping
  • Call poll_recv and poll_send on the underlying UDP socket directly in MagicSock's AsyncUdpSocket implementation, and only forward Derp, Disco and Stun packets over channels
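
To illustrate the first point, here is a minimal sketch of remapping destination addresses under a lock, directly in the send path. The names AddrMap and remap_all are hypothetical and not taken from the PR, and std::sync::Mutex stands in for whatever mutex the real code uses.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::Mutex;

/// Hypothetical stand-in for the peer map: maps the address handed to the
/// QUIC layer onto the real UDP address of the peer.
#[derive(Default)]
struct AddrMap {
    inner: Mutex<HashMap<SocketAddr, SocketAddr>>,
}

impl AddrMap {
    fn insert(&self, mapped: SocketAddr, real: SocketAddr) {
        self.inner.lock().unwrap().insert(mapped, real);
    }

    /// Rewrites each known destination in place and returns how many were
    /// remapped. The lock is taken once per batch, with no channel round trip.
    fn remap_all(&self, destinations: &mut [SocketAddr]) -> usize {
        let map = self.inner.lock().unwrap();
        let mut remapped = 0;
        for dest in destinations.iter_mut() {
            if let Some(real) = map.get(&*dest) {
                *dest = *real;
                remapped += 1;
            }
        }
        remapped
    }
}
```

Unknown destinations are left untouched in this sketch; what the real code does with them is not shown in this PR description.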

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

@Arqu
Collaborator

Arqu commented Oct 5, 2023

/netsim

@github-actions

github-actions bot commented Oct 5, 2023

net-less-channels.37feb8d177d41507c33b391e80cf9dee1d74aae1
Perf report:

test                case     throughput_gbps  throughput_transfer
iroh_latency_20ms   1_to_1   3.15             3.70
iroh_latency_20ms   1_to_3   6.80             6.86
iroh_latency_20ms   1_to_5   8.88             8.88
iroh_latency_20ms   1_to_10  7.75             7.19
iroh_latency_20ms   2_to_2   4.59             4.72
iroh_latency_20ms   2_to_4   10.60            11.61
iroh_latency_20ms   2_to_6   12.65            13.04
iroh_latency_20ms   2_to_10  15.44            15.39
iroh                1_to_1   2.43             2.20
iroh                1_to_3   6.86             7.15
iroh                1_to_5   9.41             9.70
iroh                1_to_10  7.60             7.09
iroh                2_to_2   6.10             7.09
iroh                2_to_4   10.54            10.82
iroh                2_to_6   13.03            13.91
iroh                2_to_10  16.35            15.70
iroh_latency_200ms  1_to_1   2.71             3.05
iroh_latency_200ms  1_to_3   7.66             8.47
iroh_latency_200ms  1_to_5   8.01             8.15
iroh_latency_200ms  1_to_10  7.31             6.88
iroh_latency_200ms  2_to_2   6.07             7.08
iroh_latency_200ms  2_to_4   10.50            11.24
iroh_latency_200ms  2_to_6   13.39            13.95
iroh_latency_200ms  2_to_10  15.08            14.69

@Arqu
Collaborator

Arqu commented Oct 5, 2023

/netsim

@github-actions

github-actions bot commented Oct 5, 2023

net-less-channels.37feb8d177d41507c33b391e80cf9dee1d74aae1
Perf report:

test                case     throughput_gbps  throughput_transfer
iroh_latency_20ms   1_to_1   2.27             2.45
iroh_latency_20ms   1_to_3   7.00             6.74
iroh_latency_20ms   1_to_5   8.79             9.07
iroh_latency_20ms   1_to_10  8.18             7.68
iroh_latency_20ms   2_to_2   5.97             6.27
iroh_latency_20ms   2_to_4   10.20            10.32
iroh_latency_20ms   2_to_6   13.02            13.24
iroh_latency_20ms   2_to_10  14.61            14.55
iroh                1_to_1   2.95             3.37
iroh                1_to_3   6.43             6.21
iroh                1_to_5   8.20             8.38
iroh                1_to_10  7.57             7.01
iroh                2_to_2   6.16             7.17
iroh                2_to_4   11.55            13.24
iroh                2_to_6   12.41            12.83
iroh                2_to_10  16.12            16.04
iroh_latency_200ms  1_to_1   3.15             3.69
iroh_latency_200ms  1_to_3   7.19             7.31
iroh_latency_200ms  1_to_5   7.35             6.76
iroh_latency_200ms  1_to_10  7.58             7.05
iroh_latency_200ms  2_to_2   5.89             6.13
iroh_latency_200ms  2_to_4   11.23            12.17
iroh_latency_200ms  2_to_6   12.86            13.49
iroh_latency_200ms  2_to_10  15.27            15.48

@Frando Frando changed the title from "(wip/ignore) less channels on hot path in iroh-net" to "refactor: no more channels for UDP send/recv" Oct 5, 2023
@Frando
Member Author

Frando commented Oct 5, 2023

/netsim

@github-actions

github-actions bot commented Oct 5, 2023

net-less-channels.08f3eec5ffeb6254f730a08abcec5d3f3010f261
Perf report:

test                case     throughput_gbps  throughput_transfer
iroh_latency_20ms   1_to_1   4.15             3.80
iroh_latency_20ms   1_to_3   8.76             7.97
iroh_latency_20ms   1_to_5   7.35             6.63
iroh_latency_20ms   1_to_10  6.95             6.33
iroh_latency_20ms   2_to_2   8.15             7.42
iroh_latency_20ms   2_to_4   12.39            12.50
iroh_latency_20ms   2_to_6   14.70            14.15
iroh_latency_20ms   2_to_10  13.27            12.31
iroh                1_to_1   3.38             3.07
iroh                1_to_3   7.34             6.69
iroh                1_to_5   7.29             6.57
iroh                1_to_10  6.48             5.86
iroh                2_to_2   7.85             7.20
iroh                2_to_4   13.11            12.60
iroh                2_to_6   13.71            12.47
iroh                2_to_10  13.82            12.74
iroh_latency_200ms  1_to_1   3.27             2.98
iroh_latency_200ms  1_to_3   7.95             8.02
iroh_latency_200ms  1_to_5   7.35             6.64
iroh_latency_200ms  1_to_10  6.97             6.29
iroh_latency_200ms  2_to_2   8.00             7.32
iroh_latency_200ms  2_to_4   12.41            12.52
iroh_latency_200ms  2_to_6   13.46            12.61
iroh_latency_200ms  2_to_10  12.77            11.62

} else {
&self.pconn4
};
let n = ready!(conn.poll_send(&self.udp_state, cx, &transmits))?;
Contributor

Can the ? here result in a situation where there's a problem with the UDP socket and the derp channel would still work, but we never try it because we bailed out?

Member Author

Good catch, it might. In the latest commit it logs a warning instead of returning.

It is a good question when we should return an error to quinn and when we should fail silently.

Contributor

ideally we should only error to quinn when all three options fail

Contributor

we have the same issue on recv at the moment I believe
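
The policy discussed in this thread (log a failure on one path, keep trying the others, and only surface an error to quinn when every path failed) could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's code: send_with_fallback and PathResult are hypothetical names, and each send path is modelled as a plain closure.

```rust
use std::io;

/// Hypothetical result of trying one send path (for example a UDP socket or
/// the derp channel): the number of transmits sent, or an error.
type PathResult = io::Result<usize>;

/// Tries each path in order and returns the first success. Only when every
/// path failed is the last error returned to the caller.
fn send_with_fallback(attempts: Vec<Box<dyn FnMut() -> PathResult>>) -> PathResult {
    let mut last_err = io::Error::new(io::ErrorKind::NotConnected, "no send path available");
    for mut attempt in attempts {
        match attempt() {
            Ok(n) => return Ok(n),
            Err(err) => {
                // Stand-in for a warn-level log; the failure is not returned yet.
                eprintln!("send path failed: {err}");
                last_err = err;
            }
        }
    }
    Err(last_err)
}
```

The same shape would apply on the receive side, which the thread notes has the same issue.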

@Frando Frando force-pushed the net-less-channels branch 2 times, most recently from b506be1 to d49c8f3 Compare October 6, 2023 17:10
@Frando
Member Author

Frando commented Oct 9, 2023

So I looked into how quinn parses packets, and found a reliable way to make it ignore the non-QUIC packets (STUN and disco) that we pass straight through to quinn with the approach in this PR. With grease_quic_bit set to false in the endpoint config, quinn ignores all packets that do not have the QUIC FIXED_BIT set to 1. So if we set the first byte of non-QUIC packets to 0, they are ignored directly, without much perf overhead. I'd guess that's a fine approach to keep, because it lets us keep the buffers as they are, without restructuring them when there's a non-QUIC packet inside.
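
As a small sketch of the byte-level idea: in QUIC the fixed bit is bit 0x40 of a packet's first byte, so zeroing that byte clears it. The helper names below are hypothetical and this is not quinn's parsing code.

```rust
/// The QUIC fixed bit: bit 0x40 of the first byte of a packet.
const QUIC_FIXED_BIT: u8 = 0x40;

/// Returns true if the datagram is non-empty and has the fixed bit set.
fn has_fixed_bit(datagram: &[u8]) -> bool {
    datagram.first().map_or(false, |b| b & QUIC_FIXED_BIT != 0)
}

/// Overwrites the first byte with 0, so that a parser which insists on the
/// fixed bit will reject the datagram. The rest of the buffer is untouched.
fn mark_non_quic(datagram: &mut [u8]) {
    if let Some(first) = datagram.first_mut() {
        *first = 0;
    }
}
```

Because only one byte is written, the datagram can stay in place in the receive buffer, which is the benefit described above.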

We still have failures though, which I'll investigate next.

}
// TODO: This is the remaining alloc on the hot path for send.
// Unfortunately I don't see a way around this because we have to modify the transmits.
let mut transmits = transmits[..n].to_vec();
Contributor

we could use a fixed buffer, that might help

Member Author

Added a fixed buffer, so one alloc less. Instead there's a parking_lot::Mutex::lock, but it should never be contended, so it should be fast.

We will still allocate when sending over Derp, because the derp actor expects a Vec<Bytes>. Not sure if there's a way to reduce allocs here as well, likely not. I would defer that to a followup.

Contributor

If this is always or frequently a single Bytes, we could reach for smallvec for such issues.
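
The fixed-buffer idea from this thread can be sketched like this: rather than calling to_vec() on every send, a scratch Vec kept behind a mutex is cleared and refilled, so its allocation is reused once it has grown to the largest batch. The Transmit type here is a simplified, hypothetical stand-in (not quinn_udp::Transmit), and std::sync::Mutex replaces parking_lot to keep the example dependency-free.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified transmit: a destination port and a payload length.
#[derive(Clone, Debug, PartialEq)]
struct Transmit {
    dest_port: u16,
    len: usize,
}

struct Sender {
    /// Scratch space reused across calls.
    send_buffer: Mutex<Vec<Transmit>>,
}

impl Sender {
    fn new() -> Self {
        Sender { send_buffer: Mutex::new(Vec::new()) }
    }

    /// Copies `transmits` into the scratch buffer and rewrites the
    /// destinations there, leaving the caller's slice unmodified.
    /// Returns the number of transmits prepared.
    fn prepare(&self, transmits: &[Transmit], new_port: u16) -> usize {
        let mut buf = self.send_buffer.lock().unwrap();
        buf.clear(); // drops the old entries but keeps the capacity
        buf.extend_from_slice(transmits);
        for t in buf.iter_mut() {
            t.dest_port = new_port;
        }
        buf.len()
    }

    fn capacity(&self) -> usize {
        self.send_buffer.lock().unwrap().capacity()
    }
}
```

After the first call the buffer only reallocates when a larger batch arrives, so the steady-state send path does no allocation for this copy.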

// This logs "ping is too new" for each send whenever the endpoint does *not* need
// a ping. Pretty sure this is not a useful log, but maybe there was a reason?
// if !needs_ping {
// debug!("ping is too new: {}ms", elapsed.as_millis());
Contributor

it was useful to understand why pings would not be sent, maybe set it to trace?

@Frando
Member Author

Frando commented Oct 19, 2023

/netsim

@github-actions

net-less-channels.8d0009b3e2b86235eafcaea555b0186ec3555820
Perf report:

test                case     throughput_gbps  throughput_transfer
iroh_latency_20ms   1_to_1   3.89             3.58
iroh_latency_20ms   1_to_3   7.98             7.26
iroh_latency_20ms   1_to_5   7.20             6.72
iroh_latency_20ms   1_to_10  7.22             6.49
iroh_latency_20ms   2_to_2   5.79             5.27
iroh_latency_20ms   2_to_4   12.63            12.15
iroh_latency_20ms   2_to_6   15.29            13.94
iroh_latency_20ms   2_to_10  14.35            13.12
iroh                1_to_1   2.72             2.48
iroh                1_to_3   8.92             8.15
iroh                1_to_5   8.14             7.36
iroh                1_to_10  7.29             6.58
iroh                2_to_2   4.70             5.10
iroh                2_to_4   13.54            12.39
iroh                2_to_6   16.25            15.61
iroh                2_to_10  14.75            13.35
iroh_latency_200ms  1_to_1   3.05             2.76
iroh_latency_200ms  1_to_3   7.87             7.54
iroh_latency_200ms  1_to_5   8.08             7.48
iroh_latency_200ms  1_to_10  7.01             6.30
iroh_latency_200ms  2_to_2   7.58             6.94
iroh_latency_200ms  2_to_4   13.76            12.59
iroh_latency_200ms  2_to_6   14.64            13.34
iroh_latency_200ms  2_to_10  14.25            12.99

udp_state: quinn_udp::UdpState,

// Send buffer used in `poll_send_udp`
send_buffer: parking_lot::Mutex<Vec<quinn_udp::Transmit>>,
Contributor

nit: I think the length of this doesn't actually change over time, so we could potentially use Box<[Transmit]> instead

Contributor

@Frando I filed quinn-rs/quinn#1692 so hopefully we can get rid of this buffer entirely

Member Author

Sounds good. Let's defer to that then for now.

@@ -260,12 +259,21 @@ impl Endpoint {
}
}

let addr = pong.from.as_socket_addr();
let trust_best_addr_until = pong.pong_at + Duration::from_secs(60 * 60);
Contributor

we should really change that number (but let's do that in a follow up)

}

pub(super) fn read<T>(&self, f: impl FnOnce(&PeerMapInner) -> T) -> T {
let inner = self.inner.lock();
Contributor

would it be worth it to use a RwLock instead of a Mutex?

Member Author

@Frando Frando Oct 19, 2023

Good question, I don't know. Most accesses need mut access (especially all calls from poll_send and poll_recv), so I don't think it would matter much.
This benchmark suggests a slight advantage for Mutex when most accesses need write locks, but the difference is minor: https://gist.github.com/Amanieu/6a4b4151b89b78224992106f9bc4374f
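
For context, the closure-based accessor quoted above can be sketched in a self-contained form as follows. The read signature mirrors the quoted snippet; the write counterpart, the last_seen field and the use of std::sync::Mutex (instead of parking_lot) are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical inner state; the real PeerMapInner is not shown in this thread.
#[derive(Default)]
struct PeerMapInner {
    last_seen: HashMap<u64, u64>,
}

#[derive(Default)]
struct PeerMap {
    inner: Mutex<PeerMapInner>,
}

impl PeerMap {
    /// Runs `f` with shared access to the inner state while holding the lock.
    fn read<T>(&self, f: impl FnOnce(&PeerMapInner) -> T) -> T {
        let inner = self.inner.lock().unwrap();
        f(&*inner)
    }

    /// Assumed counterpart to `read`: runs `f` with mutable access.
    fn write<T>(&self, f: impl FnOnce(&mut PeerMapInner) -> T) -> T {
        let mut inner = self.inner.lock().unwrap();
        f(&mut *inner)
    }
}
```

Since callers never hold the guard themselves, the lock is released as soon as the closure returns, and switching the lock type later would only touch these two methods.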

Contributor

@dignifiedquire dignifiedquire left a comment

some small comments left, but other than that, looks good

@Frando Frando enabled auto-merge October 19, 2023 13:31
@Frando Frando added this pull request to the merge queue Oct 19, 2023
Merged via the queue into main with commit d6657bd Oct 19, 2023
15 checks passed
@dignifiedquire dignifiedquire deleted the net-less-channels branch November 1, 2023 14:21