-
-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial support for PLPMTUD #1510
Conversation
e740135
to
e4fa2ed
Compare
Per RFC 9000 § Retransmission of Information, QUIC retransmission operates at frame granularity, not packet granularity. In particular note:
So, there should be no need to bypass the normal loss detection machinery. However, note that there is a subtle interaction with congestion control:
|
@Ralith thanks for the feedback! I think I finally figured out how to use Next week I plan to hunt down the test failure (unfortunately, the test runner doesn't give info about which test actually failed, other than |
Hmm, we should probably set RUST_BACKTRACE=1 in the test environment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick incomplete pass, will spend some more time soon.
quinn-proto/src/connection/mod.rs
Outdated
if buf.capacity() < probe_size as usize { | ||
buf.reserve(probe_size as usize - buf.capacity()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reserve
is relative to the current len
, not the capacity; further, we already know len
is zero, and reserve
already noops if redundant, so we might as well call it unconditionally.
if buf.capacity() < probe_size as usize { | |
buf.reserve(probe_size as usize - buf.capacity()); | |
} | |
buf.reserve(probe_size as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was a bit confused by the idea of tracking capacity separately from the buffer itself. I might explore extracting that logic into its own struct in a future PR, to remove that bit of complexity
Thanks for the suggestions! Next week I'll have time to go through them ;) By the way, we'll probably want to tweak a few things (please let me know if you think differently):
Also interesting: how do you think this change should affect versioning? Once this is merged, endpoints will be doing a few extra requests than usual, which might be unexpected for the crate's users. Related to this, do you think it is worthwhile to provide a way to disable PLPMTUD? |
I don't think this is reason to do a semver-incompatible version bump, if that's what you are referring to, as long as we don't remove any API that people might currently be referring to. That said, I think we'll want to release 0.10 in the not-too-distant future (see #1515), so it might make sense to release this at the same time. |
I think it's fine to silently enable it by default, but I can imagine use cases where it might be unnecessary (known networks) or harmful (endpoints which want to sleep for very long periods). A good approach might be to make the interval configurable through an |
Today I addressed @Ralith's feedback, found the test that was crashing Here are some interesting bits and a few questions:
|
Why once rather than never? |
No strong preference here, I just thought you meant once. I'll go with never when the interval is |
6fe50b5
to
fae41a8
Compare
Just pushed an updated (and rebased) version of the PR, with updated docs, passing tests and all the features we discussed above. There are still a few things I want to look into (tomorrow), before marking this PR as "ready for review" (instead of it being a draft):
Also, there is this comment in the old PR, mentioning that we should only enable PLPMTUD on platforms where we are setting the DF IP bit on. Is that still relevant @Ralith? Finally, I added a commit to make |
Do we strictly need an upper bound? We could do open-ended "binary search" by growing the probe size exponentially until it starts failing. That'd be cool for Just Working on localhost, LANs with arbitrarily large jumbo frames, and so forth. Though, we probably should take care not to overly sacrifice rate of convergence in a typical environment for these niche cases. I'm not strongly opposed to a dedicated knob, but it's nice to have universal logic when possible. We might want to tune it carefully to test typical internet limits first regardless.
Capturing our discussion in chat: I'm happy to turn this into an initial guess rather than a hard floor.
Yes; I believe we currently only set that bit on Linux and Windows. |
The upper bound is not strictly needed, but since the peer and the local endpoint have a configured
Note that each failed attempt at a particular MTU results in 3 probes being sent. After the third lost probe, we conclude that the probed MTU is not supported by the network path. I took the time to experiment using exponential search to find the upper bound before doing binary search. Here are the results:
This leads me to think we should keep the current approach, and require users to configure |
Nice analysis! I concur that prioritizing convergence for realistic cases should take precedence.
Hmm, the RFC says:
and specifies the default value as 65527, so this may not be a good hint in practice, and our current default of 1480 might be inappropriate. For the sake of fast convergence I'd be happy to have a dedicated transport config field for MTUD upper bound, and default it to the appropriate value for a typical internet path. Future work could perhaps look at occasional exponential probes above this if the upper bound is reached. |
70ac892
to
c48e2dc
Compare
Thanks for the feedback. I ended up introducing a Here's another update and some additional questions:
|
When did you request an invite?
Looks like it's
If you think it's reasonably close, mark it as ready and I'll do a full review pass. |
Yesterday, 10:00 CET (so quite recently). I am not sure my email went through, though, as I didn't receive a confirmation or anything of the sort (I wrote to quic-chairs@ietf.org, as recommended on https://quicwg.org)
Thanks! I'll probably do that next week (I want to look into the DF bit stuff first, and maybe refactor the |
So today I've been looking into updating the congestion controller's I also looked into enabling PLPMTUD only on platforms that set the IP DF bit. The technical side is straightforward, but the API design side less so. Assuming we want to keep
All in all I think option 2 is the most viable, so the current implementation disables PLPMTUD by default. With all this said, @djc I think the PR is ready for a thorough review, so I have marked it as ready. |
That sounds correct to me. This breakage is fine; we already have breakage pending/planned, and most users probably aren't using this interface. That said, happy to leave this for future work.
Since |
Just rebased on top of latest main and pushed a commit modifying the benchmarks, perf, and examples. I think there isn't anything left to be done on my side. @djc could you have a look when you have the time? The next things I plan to pick up in follow-up PRs:
|
quinn-proto/src/config.rs
Outdated
pub fn initial_max_udp_payload_size(&mut self, value: u16) -> &mut Self { | ||
self.initial_max_udp_payload_size = value.max(INITIAL_MAX_UDP_PAYLOAD_SIZE); | ||
self | ||
} | ||
|
||
/// Specifies the PLPMTUD config (see [`PlpmtudConfig`] for details). Defaults to `None`, which |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: for rustdoc a single-line/single-sentence description as the first line of a docstring usually works best. I think defaults value are often mentioned at the end? (Especially in this case, since it's an obvious default.)
Also I think using the PLPMTUD acronym without expanding it is not great here? We should definitely use that acronym somewhere here (and fully expand it), but I think I would call the option mtu_discovery_config
(and the type accordingly).
quinn-proto/src/config.rs
Outdated
/// If you are using the `quinn` crate, you can safely enable PLPMTUD on Windows and Linux. If | ||
/// you are using `quinn-proto`, you will need to ensure your IO layer properly sets the IP DF | ||
/// bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nits: the limitations here are kind of abstract and vague. What should I expect to happen if I configure this on macOS or iOS? Even quinn can be used with custom implementations of AsyncUdpSocket
, so I think it would be better to be more precise about where/how support for this stuff is enabled.
(Maybe something like: "Support for MTU discovery depends on platform support for disabling UDP packet fragmentation. This is handled by the quinn-udp implementation used in Quinn by default, but quinn-proto users and custom AsyncUdpSocket
implementers should ensure .. . If they don't, ..".)
quinn-proto/src/config.rs
Outdated
@@ -316,6 +338,67 @@ impl fmt::Debug for TransportConfig { | |||
} | |||
} | |||
|
|||
/// Parameters governing PLPMTUD. | |||
/// | |||
/// PLPMTUD performs a binary search, trying to find the highest packet size that is still |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a bit of context on what MTU is and the goals of discovery?
quinn-proto/src/config.rs
Outdated
/// Specifies the time to wait after completing PLPMTUD before starting a new MTU discovery | ||
/// round. Defaults to 600 seconds, as recommended by | ||
/// [RFC 8899](https://www.rfc-editor.org/rfc/rfc8899). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: let's add an empty line between these two sentences to help rustdoc.
quinn-proto/src/config.rs
Outdated
#[derive(Clone, Debug)] | ||
pub struct PlpmtudConfig { | ||
pub(crate) interval: Duration, | ||
pub(crate) max_udp_payload_size_upper_bound: u16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we could do away with the max_udp_payload_size_
prefix on this one, or use a shorter one? Seems pretty obvious from the context.
quinn-proto/src/connection/paths.rs
Outdated
} | ||
|
||
/// Notifies the [`MtuDiscovery`] that the in-flight MTU probe was lost | ||
pub fn on_probe_packet_lost(&mut self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming: maybe lose the packet
word in both on_probe_packet_lost()
and on_non_probe_packet_lost()
?
quinn-proto/src/connection/paths.rs
Outdated
if packet_size > BASE_UDP_PAYLOAD_SIZE | ||
&& self | ||
.largest_acked_mtu_sized_packet | ||
.map_or(true, |pn| packet_number > pn) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: invert this condition so we can return early?
quinn-proto/src/connection/paths.rs
Outdated
#[derive(Debug, Clone)] | ||
struct SearchState { | ||
/// The lower bound for the current binary search | ||
search_lower_bound: u16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe drop the search_
prefixes here?
quinn-proto/src/connection/paths.rs
Outdated
/// Creates a new search state, with the specified lower bound (the upper bound is derived from | ||
/// the config and the peer's `max_udp_payload_size` transport parameter) | ||
fn new(lower_bound: u16, peer_max_udp_payload_size: u16, config: &PlpmtudConfig) -> Self { | ||
let search_upper_bound = config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using clamp()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, clamp
will panic if min > max
, which is something we should handle robustly here (i.e. we want config.upper_bound.clamp(lower_bound, peer_max_udp_payload_size)
, but it is possible that lower_bound
is higher than peer_max_udp_payload_size
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, maybe add a comment about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While adding the comment I figured out a way to use clamp after all, just pushed it ;)
@@ -142,6 +142,8 @@ pub fn transport_config(opt: &Opt) -> quinn::TransportConfig { | |||
// High stream windows are chosen because the amount of concurrent streams | |||
// is configurable as a parameter. | |||
let mut config = quinn::TransportConfig::default(); | |||
#[cfg(any(windows, os = "linux"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having this all over the place, could we centralize this somehow? A single const
governing MTU_DISCOVERY_SUPPORTED
? Maybe there should also be a method on quinn_udp::UdpState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A method seems like overkill since the value won't ever change for a given build. Applications not using quinn-udp at all logically shouldn't care about quinn-udp's constants.
Sponsored by Stormshield
f9837ec
to
ecd4126
Compare
@djc I just pushed an amended (and rebased) version of the commits that address all your suggestions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! A few minor things found in the second round.
quinn-proto/src/connection/mtud.rs
Outdated
/// | ||
/// See [`MtuDiscoveryConfig`] for details | ||
#[derive(Clone)] | ||
pub struct MtuDiscovery { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this (and other stuff in this module) should be pub(crate)
, not pub
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I assume using pub(crate)
on the structs suffices, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1526 about the standard I'd be going for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I just pushed a change to use pub(crate)
instead of pub
in mtud.mod
@djc just pushed changes addressing your comments |
Sponsored by Stormshield
Sponsored by Stormshield
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff, thank you! This feature's been long-awaited.
This PR resurrects @djc's old #804. As of yet, it is just a draft. IMO it makes sense to look at the commits separately (we might want to squash them in the end, though).
Below are some bits of feedback from the original PR that I'm planning to include before considering this PR as ready for review (feel free to suggest more):
MtuDiscovery::max_udp_payload_size
, to better convey that onlyMtuDiscovery
itself is meant to modify itLooking at the original PR, there is one question that comes up: I am tracking ACKs for MTU probes manually (e.g. without usingI messed up and finally figured out how to get things working :)PackageBuilder::finalize_and_track
), because otherwise things like automatic retransmission kick in. That seems different from what @djc did in the original PR (he seems to get ACK and packet loss detection automatically, without additional code). Am I working in the right direction, or did I miss a function I could use to get ACKs + lost packets for free like in the original PR?Closes #69