perf: don't allocate in UDP recv path #2076
Conversation
Previously `neqo-udp` kept one long-lived receive buffer, but after reading from the socket into that buffer, it would allocate a new `Vec` for each UDP segment. This commit drops the per-segment allocation: instead, it passes the single reused receive buffer directly to `neqo_transport::Connection::process_input`. Part of mozilla#1693.
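To make the change concrete, here is a minimal, hypothetical sketch of the pattern the commit describes: one long-lived buffer is filled per socket read, and the UDP segments are handed out as borrowed slices rather than copied into fresh `Vec`s. The names (`RecvBuf`, `segments`) are illustrative and not neqo's actual API.

```rust
/// A long-lived receive buffer, reused across socket reads.
/// (Illustrative sketch; not neqo's real types.)
struct RecvBuf {
    buf: Vec<u8>,       // allocated once, reused for every read
    filled: usize,      // bytes written by the last socket read
    segment_len: usize, // GRO segment size reported by the kernel
}

impl RecvBuf {
    /// Yield the coalesced UDP segments as borrowed slices of the
    /// single buffer -- no per-segment `Vec` allocation.
    fn segments(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.buf[..self.filled].chunks(self.segment_len)
    }
}

fn main() {
    // Pretend the kernel delivered two coalesced 4-byte segments.
    let rb = RecvBuf {
        buf: vec![1, 2, 3, 4, 5, 6, 7, 8],
        filled: 8,
        segment_len: 4,
    };
    let segs: Vec<&[u8]> = rb.segments().collect();
    assert_eq!(segs, vec![&[1u8, 2, 3, 4][..], &[5u8, 6, 7, 8][..]]);
    println!("{} segments, zero per-segment allocations", segs.len());
}
```

Each slice can then be fed to the connection's input processing directly, which is what passing the reused buffer to `process_input` achieves.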
Benchmark results

Performance differences relative to 910a7cd.

- `coalesce_acked_from_zero 1+1 entries`: Change within noise threshold.
  time: [98.743 ns 99.096 ns 99.452 ns]
  change: [-1.4056% -0.8996% -0.3504%] (p = 0.00 < 0.05)
- `coalesce_acked_from_zero 3+1 entries`: 💚 Performance has improved.
  time: [116.86 ns 117.16 ns 117.50 ns]
  change: [-2.6326% -1.6901% -1.0195%] (p = 0.00 < 0.05)
- `coalesce_acked_from_zero 10+1 entries`: Change within noise threshold.
  time: [116.40 ns 116.80 ns 117.30 ns]
  change: [-2.1473% -1.4459% -0.7655%] (p = 0.00 < 0.05)
- `coalesce_acked_from_zero 1000+1 entries`: No change in performance detected.
  time: [97.168 ns 101.41 ns 110.93 ns]
  change: [-2.7203% +0.8236% +6.4501%] (p = 0.82 > 0.05)
- `RxStreamOrderer::inbound_frame()`: No change in performance detected.
  time: [111.52 ms 111.66 ms 111.88 ms]
  change: [-0.2530% -0.1178% +0.0977%] (p = 0.20 > 0.05)
- `transfer/pacing-false/varying-seeds`: No change in performance detected.
  time: [25.985 ms 26.869 ms 27.751 ms]
  change: [-7.4673% -2.9356% +2.1097%] (p = 0.24 > 0.05)
- `transfer/pacing-true/varying-seeds`: No change in performance detected.
  time: [34.891 ms 36.585 ms 38.286 ms]
  change: [-4.9325% +1.5534% +8.5890%] (p = 0.64 > 0.05)
- `transfer/pacing-false/same-seed`: No change in performance detected.
  time: [31.107 ms 31.822 ms 32.519 ms]
  change: [-4.9157% -1.8751% +1.1009%] (p = 0.24 > 0.05)
- `transfer/pacing-true/same-seed`: No change in performance detected.
  time: [39.949 ms 42.962 ms 45.965 ms]
  change: [-12.157% -4.0154% +5.2027%] (p = 0.39 > 0.05)
- `1-conn/1-100mb-resp` (aka Download)/client: 💚 Performance has improved.
  time: [111.05 ms 111.36 ms 111.65 ms]
  thrpt: [895.64 MiB/s 898.02 MiB/s 900.48 MiB/s]
  change: time: [-2.9276% -2.4889% -2.0409%] (p = 0.00 < 0.05) thrpt: [+2.0835% +2.5525% +3.0159%]
- `1-conn/10_000-parallel-1b-resp` (aka RPS)/client: No change in performance detected.
  time: [310.54 ms 314.53 ms 318.58 ms]
  thrpt: [31.390 Kelem/s 31.793 Kelem/s 32.202 Kelem/s]
  change: time: [-1.5604% +0.2356% +1.9649%] (p = 0.79 > 0.05) thrpt: [-1.9270% -0.2350% +1.5852%]
- `1-conn/1-1b-resp` (aka HPS)/client: No change in performance detected.
  time: [40.418 ms 41.136 ms 41.853 ms]
  thrpt: [23.893 elem/s 24.310 elem/s 24.741 elem/s]
  change: time: [-2.2850% +0.0149% +2.5328%] (p = 1.00 > 0.05) thrpt: [-2.4703% -0.0149% +2.3385%]

Client/server transfer results: transfer of 33554432 bytes over loopback.
Failed Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests (QUIC Interop Runner, client vs. server):
- neqo-latest as client
- neqo-latest as server
Is it possible to change the code so that input datagrams are taken with a reference to the underlying buffer instead?
You might need to split Datagram in two parts to do that, one with an owned buffer and one with a borrowed one (maybe; for output). Ideally though we never have to own the buffer.
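The split suggested above could look roughly like the following hedged sketch: an owned datagram for the output path and a borrowed one for the input path, so the receive side never has to own the buffer. The names `Datagram`/`DatagramRef` are illustrative; the follow-up PR (#2093) ultimately addressed this differently, by making the datagram type generic over its buffer.

```rust
/// Owned datagram: suited to the output path, where the payload must
/// outlive the call that produced it. (Illustrative, not neqo's API.)
struct Datagram {
    payload: Vec<u8>,
}

/// Borrowed datagram: suited to the input path, viewing into the
/// long-lived socket receive buffer without copying.
struct DatagramRef<'a> {
    payload: &'a [u8],
}

impl<'a> DatagramRef<'a> {
    /// Promote to an owned datagram only when a copy is unavoidable.
    fn to_owned(&self) -> Datagram {
        Datagram {
            payload: self.payload.to_vec(),
        }
    }
}

fn main() {
    let recv_buf = [0xde, 0xad, 0xbe, 0xef];
    // Input path: borrow straight from the receive buffer.
    let borrowed = DatagramRef { payload: &recv_buf };
    // Copy happens only if ownership is explicitly requested.
    let owned = borrowed.to_owned();
    assert_eq!(owned.payload, recv_buf);
    println!("borrowed len = {}", borrowed.payload.len());
}
```

With this shape, `process_input` could accept a `DatagramRef` and the allocation would disappear from the hot receive path entirely.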
This change seems like it would be harder to undo than doing this right from the outset.
Thank you for taking a look @martinthomson.
Yes. I created #2093 implementing your suggestion above. It is a draft for now, but I would argue it goes beyond a proof of concept. Closing here in favor of #2093.
Draft for now. Want to see benchmark results before investing further.