perf: don't allocate in UDP recv path #2076
Conversation
Previously `neqo-udp` kept one long-lived receive buffer, but after reading from the socket into that buffer, it would allocate a new `Vec` for each UDP segment. This commit drops the per-segment allocation: instead, it passes the single reused receive buffer directly to `neqo_transport::Connection::process_input`. Part of mozilla#1693.
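To make the change concrete, here is a minimal, hypothetical sketch of the pattern the commit describes: one long-lived buffer is filled per socket read, and the UDP segments are handed out as borrowed slices rather than copied into fresh `Vec`s. The names (`RecvBuf`, `segments`) are illustrative and not neqo's actual API.

```rust
/// A long-lived receive buffer, reused across socket reads.
/// (Illustrative sketch; not neqo's real types.)
struct RecvBuf {
    buf: Vec<u8>,       // allocated once, reused for every read
    filled: usize,      // bytes written by the last socket read
    segment_len: usize, // GRO segment size reported by the kernel
}

impl RecvBuf {
    /// Yield the coalesced UDP segments as borrowed slices of the
    /// single buffer -- no per-segment `Vec` allocation.
    fn segments(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.buf[..self.filled].chunks(self.segment_len)
    }
}

fn main() {
    // Pretend the kernel delivered two coalesced 4-byte segments.
    let rb = RecvBuf {
        buf: vec![1, 2, 3, 4, 5, 6, 7, 8],
        filled: 8,
        segment_len: 4,
    };
    let segs: Vec<&[u8]> = rb.segments().collect();
    assert_eq!(segs, vec![&[1u8, 2, 3, 4][..], &[5u8, 6, 7, 8][..]]);
    println!("{} segments, zero per-segment allocations", segs.len());
}
```

Each slice can then be fed to the connection's input processing directly, which is what passing the reused buffer to `process_input` achieves.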
Benchmark results

Performance differences relative to 910a7cd.

- `coalesce_acked_from_zero 1+1 entries`: Change within noise threshold.
  time: [98.743 ns 99.096 ns 99.452 ns]
  change: [-1.4056% -0.8996% -0.3504%] (p = 0.00 < 0.05)
- `coalesce_acked_from_zero 3+1 entries`: 💚 Performance has improved.
  time: [116.86 ns 117.16 ns 117.50 ns]
  change: [-2.6326% -1.6901% -1.0195%] (p = 0.00 < 0.05)
- `coalesce_acked_from_zero 10+1 entries`: Change within noise threshold.
  time: [116.40 ns 116.80 ns 117.30 ns]
  change: [-2.1473% -1.4459% -0.7655%] (p = 0.00 < 0.05)
- `coalesce_acked_from_zero 1000+1 entries`: No change in performance detected.
  time: [97.168 ns 101.41 ns 110.93 ns]
  change: [-2.7203% +0.8236% +6.4501%] (p = 0.82 > 0.05)
- `RxStreamOrderer::inbound_frame()`: No change in performance detected.
  time: [111.52 ms 111.66 ms 111.88 ms]
  change: [-0.2530% -0.1178% +0.0977%] (p = 0.20 > 0.05)
- `transfer/pacing-false/varying-seeds`: No change in performance detected.
  time: [25.985 ms 26.869 ms 27.751 ms]
  change: [-7.4673% -2.9356% +2.1097%] (p = 0.24 > 0.05)
- `transfer/pacing-true/varying-seeds`: No change in performance detected.
  time: [34.891 ms 36.585 ms 38.286 ms]
  change: [-4.9325% +1.5534% +8.5890%] (p = 0.64 > 0.05)
- `transfer/pacing-false/same-seed`: No change in performance detected.
  time: [31.107 ms 31.822 ms 32.519 ms]
  change: [-4.9157% -1.8751% +1.1009%] (p = 0.24 > 0.05)
- `transfer/pacing-true/same-seed`: No change in performance detected.
  time: [39.949 ms 42.962 ms 45.965 ms]
  change: [-12.157% -4.0154% +5.2027%] (p = 0.39 > 0.05)
- `1-conn/1-100mb-resp` (aka Download)/client: 💚 Performance has improved.
  time: [111.05 ms 111.36 ms 111.65 ms]
  thrpt: [895.64 MiB/s 898.02 MiB/s 900.48 MiB/s]
  change: time: [-2.9276% -2.4889% -2.0409%] (p = 0.00 < 0.05) thrpt: [+2.0835% +2.5525% +3.0159%]
- `1-conn/10_000-parallel-1b-resp` (aka RPS)/client: No change in performance detected.
  time: [310.54 ms 314.53 ms 318.58 ms]
  thrpt: [31.390 Kelem/s 31.793 Kelem/s 32.202 Kelem/s]
  change: time: [-1.5604% +0.2356% +1.9649%] (p = 0.79 > 0.05) thrpt: [-1.9270% -0.2350% +1.5852%]
- `1-conn/1-1b-resp` (aka HPS)/client: No change in performance detected.
  time: [40.418 ms 41.136 ms 41.853 ms]
  thrpt: [23.893 elem/s 24.310 elem/s 24.741 elem/s]
  change: time: [-2.2850% +0.0149% +2.5328%] (p = 1.00 > 0.05) thrpt: [-2.4703% -0.0149% +2.3385%]

Client/server transfer results: transfer of 33554432 bytes over loopback.
Failed Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server
Is it possible to change the code so that input datagrams are taken with a reference to the underlying buffer instead?
You might need to split Datagram in two parts to do that, one with an owned buffer and one with a borrowed one (maybe; for output). Ideally though we never have to own the buffer.
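The split suggested above could look roughly like the following hedged sketch: an owned datagram for the output path and a borrowed one for the input path, so the receive side never has to own the buffer. The names `Datagram`/`DatagramRef` are illustrative; the follow-up PR (#2093) ultimately addressed this differently, by making the datagram type generic over its buffer.

```rust
/// Owned datagram: suited to the output path, where the payload must
/// outlive the call that produced it. (Illustrative, not neqo's API.)
struct Datagram {
    payload: Vec<u8>,
}

/// Borrowed datagram: suited to the input path, viewing into the
/// long-lived socket receive buffer without copying.
struct DatagramRef<'a> {
    payload: &'a [u8],
}

impl<'a> DatagramRef<'a> {
    /// Promote to an owned datagram only when a copy is unavoidable.
    fn to_owned(&self) -> Datagram {
        Datagram {
            payload: self.payload.to_vec(),
        }
    }
}

fn main() {
    let recv_buf = [0xde, 0xad, 0xbe, 0xef];
    // Input path: borrow straight from the receive buffer.
    let borrowed = DatagramRef { payload: &recv_buf };
    // Copy happens only if ownership is explicitly requested.
    let owned = borrowed.to_owned();
    assert_eq!(owned.payload, recv_buf);
    println!("borrowed len = {}", borrowed.payload.len());
}
```

With this shape, `process_input` could accept a `DatagramRef` and the allocation would disappear from the hot receive path entirely.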
This change seems like it would be harder to undo than doing this right from the outset.
Thank you for taking a look @martinthomson.
Yes. I created #2093 implementing your suggestion above. It is a draft for now, but I would argue it goes beyond a proof of concept. Closing here in favor of #2093.
Draft for now. Want to see benchmark results before investing further.