#729: proto: write outgoing packets to caller-supplied memory #1697

Merged · 9 commits · Nov 1, 2023

Conversation

@lmpn (Contributor) commented Oct 25, 2023

I've used BytesMut for convenience, but I can change it to &mut [u8].

- buffer as an argument
- the proto Transmit struct carries the size of the payload instead of the buffer (rough sketch below)
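
Roughly, the shape of the change as a minimal sketch (stand-in types and placeholder bodies, not the actual diff; the real quinn-proto types carry more fields):

```rust
use std::time::Instant;

use bytes::BytesMut;

// Approximate shape only; the real quinn_proto::Transmit also carries
// destination, ECN, segment size, etc.
pub struct Transmit {
    pub size: usize, // bytes written into the caller's buffer
}

pub struct Connection; // stand-in for quinn_proto::Connection

impl Connection {
    // The caller supplies the memory: poll_transmit appends the next
    // datagram into `buf` and reports its length via Transmit::size.
    pub fn poll_transmit(
        &mut self,
        _now: Instant,
        _max_datagrams: usize,
        buf: &mut BytesMut,
    ) -> Option<Transmit> {
        let start = buf.len();
        buf.extend_from_slice(b"encoded packet bytes"); // placeholder
        Some(Transmit { size: buf.len() - start })
    }
}

fn main() {
    let mut conn = Connection;
    let mut buf = BytesMut::with_capacity(1500);
    if let Some(t) = conn.poll_transmit(Instant::now(), 1, &mut buf) {
        assert_eq!(t.size, buf.len());
    }
}
```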
@djc (Collaborator) left a comment

This seems pretty nice! Have you measured the performance impact?

Review threads (outdated, resolved) on quinn-proto/src/connection/mod.rs, quinn-proto/src/tests/util.rs (×2), and quinn/src/endpoint.rs (×2).
Luis Neto added 2 commits October 25, 2023 22:59
- use quinn-proto to get buffer size
- remove unnecessary includes
- revert Transmit variables order
Review threads (outdated, resolved) on quinn-proto/src/connection/mod.rs (×2) and quinn/src/connection.rs.
@lmpn (Contributor, Author) commented Oct 26, 2023

I've not forgotten about the perf. I will share the comparison later 👍

@Ralith (Collaborator) commented Oct 26, 2023

I'd be surprised if there was a significant performance improvement from this alone, but it brings us much closer to being able to reuse transmit buffers (perhaps by doing UDP transmits directly from connection tasks), which is more likely to have a direct impact.

Still, always good to check for the impact of changes on the I/O path.

@lmpn (Contributor, Author) commented Oct 26, 2023

I've run cargo bench for main and this branch. Results below:

poll_transmit_pre_alloc
test large_data_10_streams ... bench: 30,968,912 ns/iter (+/- 11,592,202) = 338 MB/s
test large_data_1_stream ... bench: 26,580,878 ns/iter (+/- 20,358,295) = 39 MB/s
test small_data_100_streams ... bench: 14,363,537 ns/iter (+/- 17,569,828)
test small_data_1_stream ... bench: 19,154,729 ns/iter (+/- 5,427,772)

latest main
test large_data_10_streams ... bench: 30,603,466 ns/iter (+/- 11,396,640) = 342 MB/s
test large_data_1_stream ... bench: 21,525,054 ns/iter (+/- 19,954,110) = 48 MB/s
test small_data_100_streams ... bench: 17,087,725 ns/iter (+/- 22,559,371)
test small_data_1_stream ... bench: 19,227,774 ns/iter (+/- 5,343,771)

If this is not enough, please give me some pointers and I will try to do an analysis that fits the project.

@lmpn (Contributor, Author) commented Oct 26, 2023

@djc I've decided to expose the MTU from the connection.
If there is already an approach to retrieve or calculate this value, tell me 🙏
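
For illustration, how a caller might use it to size the transmit buffer (stand-in types; only the current_mtu name comes from this PR, everything else is assumed):

```rust
use bytes::BytesMut;

// Stand-in for quinn_proto::Connection; only the accessor name is
// from this PR.
struct Connection {
    path_mtu: u16,
}

impl Connection {
    // The connection's current estimate of the largest datagram the
    // path can carry.
    fn current_mtu(&self) -> u16 {
        self.path_mtu
    }
}

fn main() {
    let conn = Connection { path_mtu: 1452 };
    // Size the caller-supplied transmit buffer from the MTU instead of
    // a hard-coded constant; multiply by max_datagrams for GSO batches.
    let buf = BytesMut::with_capacity(conn.current_mtu() as usize);
    assert!(buf.capacity() >= conn.current_mtu() as usize);
}
```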

@djc (Collaborator) commented Oct 27, 2023

So this shows as a ~18% regression for small_data_100_streams? That seems bad -- and surprising...

@lmpn (Contributor, Author) commented Oct 27, 2023

So I've executed cargo bench 10 times for main and this branch and compared the times (image below).
[Screenshot: benchmark comparison, 2023-10-27]

There is indeed a regression (on average) for large_data_1_stream and small_data_100_streams: 3% and 6%, respectively.

@Ralith (Collaborator) left a comment

Exposing the MTU makes sense to me; other quinn-proto users will want it for the same reason.

Review threads (outdated, resolved) on quinn-proto/src/connection/mod.rs (×2).
@Ralith (Collaborator) commented Oct 27, 2023

@lmpn I'm having some difficulty reading your chart screenshot. Are the 4 aggregated figures for each separate test, in order? So there's a 20% speedup in small_data_1_stream? That's confusing since the original single run you posted indicated no change there, and a similar speedup in small_data_100_streams. That original run also indicates a very high variance, so maybe this data is too noisy to be meaningful anyway. I'll try to get some of my own numbers to compare.

We should be able to get identical performance to main by having the quinn layer create a fresh BytesMut for each poll_transmit call, right?

@lmpn (Contributor, Author) commented Oct 28, 2023

Sorry for the bad image. Check the one below and tell me if it is clearer.
The speedups are calculated as (1 - main_time / poll_transmit_pre_alloc_time) * 100.

[Screenshot: benchmark comparison with speedup calculations, 2023-10-28]

The single run showed an 18% regression for small_data_100_streams. I assumed this was due to noise (a bad run) and to running on my own computer, so I took 10 runs of each branch.

As you can see above, I calculated the speedup for the:

  • average of all times
  • max of all times
  • min of all times
  • top 3 (average of the 3 lowest times)
  • top 5 (average of the 5 lowest times)

Even with noise, these values show:

  • small_data_1_stream has improved in every scenario
  • there is a regression for large_data_1_stream / small_data_100_streams
  • for large_data_10_streams, the difference seems negligible (I think)

We should be able to get identical performance to main by having the quinn layer create a fresh BytesMut for each poll_transmit call, right?

Yes, I would assume so.
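
Just to sketch the idea (the capacity constant is a placeholder; quinn would size it from the MTU and max_datagrams rather than a constant):

```rust
use bytes::BytesMut;

// Placeholder capacity, not quinn's real sizing.
const BATCH_CAPACITY: usize = 1500 * 10;

// Allocating a fresh BytesMut per poll_transmit call reproduces main's
// allocation pattern, where a buffer was allocated internally each call.
fn fresh_buffer() -> BytesMut {
    BytesMut::with_capacity(BATCH_CAPACITY)
}

fn main() {
    for _ in 0..3 {
        let buf = fresh_buffer(); // one new allocation per call
        assert!(buf.capacity() >= BATCH_CAPACITY);
    }
}
```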

New commits:
- rename get_current_mtu to current_mtu
- add proper docstring to current_mtu
@Ralith (Collaborator) commented Oct 28, 2023

Thanks for the detailed analysis! To summarize: apparent massive speedup for small_data_1_stream, maybe a slight slowdown (or just noise) elsewhere. Perhaps reusing a single BytesMut is letting a single allocation serve multiple small packets.

I think this nets out as a win as written, personally. I've experimented on a couple of my machines (both Windows and Linux) and wasn't able to show a significant difference either way.
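
For intuition, a minimal sketch of that reuse effect with the bytes crate (the encode step is a stand-in for poll_transmit writing into the buffer):

```rust
use bytes::{Bytes, BytesMut};

// Write several small "packets" through one BytesMut: split().freeze()
// hands out views into the shared allocation, and the spare capacity
// keeps serving subsequent packets without a new allocation.
fn drain_packets(buf: &mut BytesMut) -> Vec<Bytes> {
    let mut out = Vec::new();
    for payload in [&b"ping"[..], b"pong", b"ack"] {
        buf.extend_from_slice(payload); // stand-in for packet encoding
        out.push(buf.split().freeze()); // view into the shared allocation
    }
    out
}

fn main() {
    let mut buf = BytesMut::with_capacity(64 * 1024);
    let packets = drain_packets(&mut buf);
    assert_eq!(packets.len(), 3);
    assert_eq!(&packets[0][..], b"ping");
}
```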

Review threads (outdated, resolved) on quinn-proto/src/connection/mod.rs, quinn-proto/src/lib.rs, and quinn/src/connection.rs.
@djc (Collaborator) commented Oct 31, 2023

@lmpn was this motivated by Hacktoberfest, and if so, is there a merge deadline to make this count for your stats?

@lmpn (Contributor, Author) commented Nov 1, 2023

@lmpn was this motivated by Hacktoberfest, and if so, is there a merge deadline to make this count for your stats?

No...
I just have an interest in networking and performance programming.

@Ralith (Collaborator) left a comment

Thanks!

Review thread (outdated, resolved) on quinn/src/endpoint.rs.
@lmpn (Contributor, Author) commented Nov 1, 2023

[Screenshot: benchmark results, 2023-11-01]

Above are the results of the benchmark with the latest commit on this branch.
It would be great if someone could double-check on their side.

djc enabled auto-merge (squash) November 1, 2023 12:06
djc merged commit 49aa4b6 into quinn-rs:main on Nov 1, 2023 (7 of 8 checks passed)
@Ralith (Collaborator) commented Nov 1, 2023

My environment still registers too much noise to get meaningful data, but it's exciting that it had such an impact in yours! Maybe the impact is larger on Windows, where GSO isn't doing as much heavy lifting.
