Path MTU Discovery #64

martinthomson · 2016-12-01T23:52:46Z

Quoting Martin Duke:

RFC-compliant ICMP PMTU messages include the
oversize packet's IP header and the first 8 bytes of the transport header. This
allows TCP to demultiplex the ICMP message to the relevant connection/path, and to
check that the echoed sequence number corresponds to an in-flight segment
and that the segment would indeed have violated the advertised MTU. As TCP
windows increase, it's true that this check may be less effective.

For QUIC, these messages will contain only the UDP header. I believe this is
sufficient to demux to the right connection (QUIC allows the same 4-tuple
for multiple connection IDs, but it would be fine to apply it to all such
connections), but I see no clear mechanism to prevent a blind attacker that
guesses the 4-tuple from forcing a QUIC connection to revert to the min path
MTU. Some possibilities:
(a) Require routers to send more bytes. This is probably not deployable,
and in any case, because packet numbers always start at zero, this doesn't add
much security.
(b) Store UDP or IP data. We could require QUIC to store the UDP
checksum or IP ID in its send queue, thus allowing it to associate the
returned headers with a packet in its queue. This is ugly from a layering
perspective and may not be plausible with hardware offload of checksums,
various middleboxes, etc.
(c) Isolated DF packets. QUIC could pick a long time interval (10
seconds?) in which it sets DF on a single MTU-sized packet (which may have
to be padded). QUIC would only accept ICMP messages within an RTT or so of
having sent one of these MTU probe packets, or until it's acked. Thus the
probability that an off-path attacker could time it correctly is (RTT / 10
sec). If there's an extended burst of ICMP to overcome this, that's probably
an attack indication to handle either in IP, the firewall, or at worst in
QUIC. I have some language to codify this, and if we agree this is the
right approach I can share it.
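A minimal sketch of the acceptance rule in option (c), assuming the sender records when its last isolated DF probe went out and whether that probe has been acknowledged. The class and method names and the 10-second default interval are hypothetical illustrations, not text from any draft; the window is taken to close after one RTT or at the probe's ACK, whichever comes first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeWindowFilter:
    """Accept ICMP Packet Too Big (PTB) messages only shortly after a DF probe.

    Hypothetical sketch of option (c): DF is set on one MTU-sized probe per
    probe_interval, and PTB messages are honoured only within about one RTT
    of sending that probe, and only while the probe is still unacknowledged.
    """
    probe_interval: float = 10.0           # seconds between isolated DF probes (assumed)
    probe_sent_at: Optional[float] = None  # send time of the most recent probe
    probe_acked: bool = False

    def on_probe_sent(self, now: float) -> None:
        self.probe_sent_at = now
        self.probe_acked = False

    def on_probe_acked(self) -> None:
        # The path carried the probe, so later PTBs about it are not credible.
        self.probe_acked = True

    def accept_ptb(self, now: float, rtt: float) -> bool:
        """Return True if a PTB arriving at `now` falls inside the window."""
        if self.probe_sent_at is None or self.probe_acked:
            return False
        return 0.0 <= now - self.probe_sent_at <= rtt

    def blind_hit_probability(self, rtt: float) -> float:
        """Chance that a PTB forged at a uniformly random time lands in the window."""
        return min(1.0, rtt / self.probe_interval)


f = ProbeWindowFilter()
f.on_probe_sent(now=100.0)
print(f.accept_ptb(now=100.05, rtt=0.1))  # True: inside the window
print(f.accept_ptb(now=100.5, rtt=0.1))   # False: arrived too late
```

The last method restates the RTT / 10 sec estimate: with a 100 ms RTT, a blind attacker sending one forged message at a random moment has roughly a 1% chance of landing inside the window.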

ianswett · 2016-12-03T21:37:12Z

Agreed that (a) seems largely impractical. (b) is semi-practical, but the checksum is still computable by a passive observer.

Something along the lines of (c) is what I would recommend. It's somewhat similar to what our current experiment does today, though we ignore ICMP entirely and rely on loss as a signal. The negative of course is that

I don't think we want to support small MTUs for QUIC, so we should probably pick some minimum size that we expect all forthcoming CHLOs to fit into. Multipacket CHLOs make stateless rejects impractical, and networks that don't support packets of at least ~1200 bytes are exceedingly rare.

alagoutte · 2016-12-05T08:02:56Z

@ianswett, if I remember correctly, the CHLO is always padded to the maximum size (1370 bytes for IPv4 and 1350 bytes for IPv6)? Is that to avoid amplification?

ianswett · 2016-12-05T14:47:34Z

All handshake packets are always padded to the maximum packet size to ensure the path supports the chosen MTU. So if the path doesn't, the handshake fails. This has proven to be a very practical approach, losing fewer than 1% of users, namely those whose paths don't support 1350/1370-byte packets.

Conveniently, the CHLO needs to be padded for anti-amplification reasons as well.
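A minimal sketch of padding a handshake packet to the full size for its address family, using the 1370/1350-byte figures quoted above. The function name is hypothetical, and the padding is modelled as plain zero bytes standing in for PADDING frames; a real implementation would pad inside the payload before packet protection is applied.

```python
# Maximum QUIC packet sizes quoted in this thread (bytes).
MAX_PACKET_SIZE = {"ipv4": 1370, "ipv6": 1350}


def pad_handshake_packet(packet: bytes, family: str) -> bytes:
    """Pad a handshake packet up to the maximum packet size for its address family.

    A full-size CHLO sent with DF set checks that the path carries packets of
    that size (if it doesn't, the handshake fails), and the padding also serves
    the anti-amplification purpose mentioned above.
    """
    target = MAX_PACKET_SIZE[family]
    if len(packet) > target:
        raise ValueError(f"{len(packet)}-byte packet exceeds the {target}-byte maximum")
    # Zero bytes stand in for PADDING frames (assumed encoding).
    return packet + bytes(target - len(packet))


print(len(pad_handshake_packet(b"CHLO...", "ipv4")))  # 1370
```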

martinduke · 2016-12-05T23:14:23Z

"(b) is semi-practical, but the checksum is still computable by a passive observer."

Indeed, all ICMP is vulnerable to a passive observer, but if the header echo isn't there it's vulnerable to off-path attacks that guess the 4-tuple.

It seems odd to spend an enormous amount of energy to save a handful of bytes in headers and frames, and then turn around and use a conservatively low MTU, when a very common IPv4 UDP MSS is 1472 bytes.
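The 1472-byte figure is simply a 1500-byte Ethernet MTU minus the IPv4 and UDP headers. The short calculation below makes the gap to the handshake sizes explicit; it assumes the 1370/1350-byte figures quoted earlier are UDP payload sizes and that no IP options or IPv6 extension headers are present.

```python
ETHERNET_MTU = 1500  # bytes; the most common link MTU
IPV4_HEADER = 20     # bytes, no IP options
IPV6_HEADER = 40     # bytes, fixed header without extension headers
UDP_HEADER = 8       # bytes

# QUIC packet sizes quoted in this thread, assumed here to be UDP payload sizes.
QUIC_MAX_PACKET = {"ipv4": 1370, "ipv6": 1350}


def max_udp_payload(mtu: int, ip_header: int) -> int:
    """Largest UDP payload that fits in a single unfragmented IP packet."""
    return mtu - ip_header - UDP_HEADER


v4 = max_udp_payload(ETHERNET_MTU, IPV4_HEADER)
v6 = max_udp_payload(ETHERNET_MTU, IPV6_HEADER)
print(v4, v4 - QUIC_MAX_PACKET["ipv4"])  # 1472 102
print(v6, v6 - QUIC_MAX_PACKET["ipv6"])  # 1452 102
```

On a 1500-byte path, both address families leave about 100 bytes of each packet unused.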

Regarding option (c), it makes a lot of sense for CHLO to be a max-size packet for these purposes, but it's not sufficient. Furthermore, using loss as the signal instead of ICMP seems like an inefficient solution.

ianswett · 2016-12-05T23:53:25Z

In practice today, we're losing less than 10% of the potential packet size, but I agree it isn't ideal.

Can you clarify why padding the CHLO is not sufficient?

martinduke · 2016-12-06T00:28:01Z

The server must probe the server->client path.
If the path changes (particularly with multipath), there must be some means of probing the MTU or you will have lots of packet fragmentation.

vasilvv · 2016-12-12T22:41:49Z

I am not sure QUIC can in general rely on the ICMP messages for path MTU discovery, since the ICMP messages are normally consumed by the kernel, which may not expose them to individual clients.

martinduke · 2016-12-13T01:00:33Z

I agree that user-space implementations will be reliant on OSes that do whatever they want with UDP ICMP messages. Perhaps the Path MTU discovery section needs lots of SHOULDs instead of MUSTs, because as QUIC migrates into the kernel (?) it can fix these problems. But at the very least, we can set the sockopts to not use the DF bit most of the time.
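As an illustration of the socket-option point, here is a Linux-only sketch of toggling DF on a UDP socket from user space. `IP_MTU_DISCOVER`, `IP_PMTUDISC_DO` and `IP_PMTUDISC_DONT` are real Linux socket options (see ip(7)); the numeric fallbacks are the values from `<linux/in.h>`, used in case Python's `socket` module does not export the names. The helper `set_df` is hypothetical.

```python
import socket
import sys

# Linux values from <linux/in.h>, used as fallbacks if the socket module
# does not export these names.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DONT = getattr(socket, "IP_PMTUDISC_DONT", 0)  # never set DF
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)      # always set DF


def set_df(sock: socket.socket, enabled: bool) -> bool:
    """Set or clear DF on outgoing IPv4 datagrams of a UDP socket.

    Linux-only sketch; returns False where the option is unavailable or
    the kernel rejects it.
    """
    if not sys.platform.startswith("linux"):
        return False
    value = IP_PMTUDISC_DO if enabled else IP_PMTUDISC_DONT
    try:
        sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, value)
    except OSError:
        return False
    return True
```

A stack might keep DF cleared for ordinary traffic and call `set_df(sock, True)` only around full-size probes such as a padded CHLO. On Linux, a user-space stack that also wants to read the resulting ICMP errors would additionally enable `IP_RECVERR` and read them from the socket's error queue with `MSG_ERRQUEUE`; that part is not shown.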

I should really put together a detailed proposal, but the outline would be something like this:

Conditions under which QUIC MUST or MAY send a full-size packet (using PAD frames as necessary) with the DF bit set. This would certainly include the first CHLO and probably the second server-generated packet, in addition to packets involving new 4-tuples.
Reactions to loss of those packets.
Reactions to ICMP Packet Too Big messages. This would include SHOULDs involving storage of IP header fields, which would be most secure. I think some would say that QUIC SHOULD ignore these messages entirely, but I would disagree.

Relatedly, we could set a relatively high minimum MTU for QUIC connections (~1000 bytes?). For legitimately low-MTU links, this would cause lots of fragmentation, but it would substantially limit the damage off-path attackers can do.

What do all of you think?

martinduke · 2016-12-13T01:05:17Z

I should also add that the very first packet is a poor one to use for waiting on loss, as the RTT is entirely unknown and the sender must use a conservative RTO value. It might be better to start with a relatively conservative MTU, as the draft does, and probe upwards to see if there's more capacity to unlock.

rjshade · 2016-12-13T01:09:39Z

What's the benefit of relying on ICMP responses vs. doing MTU discovery at the QUIC layer ("packetization layer PMTUD" RFC 4821)?

martinduke · 2016-12-13T01:22:27Z

"What's the benefit of relying on ICMP responses vs. doing MTU discovery at the QUIC layer ('packetization layer PMTUD' RFC 4821)?"

Thanks for the reference; I had not seen that RFC.

Though ICMP messages have their disadvantages, they have four advantages over a loss-based scheme:

Immediate reporting of the actual PMTU, rather than a search process that is likely to undershoot the actual PMTU.
A loss-based search gains precision only with many probes, which means many losses that must be retransmitted.
Loss is an overloaded signal, meaning congestion or RF problems in different contexts, so a loss can easily be misinterpreted.
In many cases, detecting a loss will take much longer than receiving an ICMP message sent from an on-path router.
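To make the "many probes, many losses" point concrete, here is a minimal search in the spirit of RFC 4821's packetization layer PMTUD. RFC 4821 leaves the search strategy to implementations; a binary search is used here as one plausible choice. The `probe` callback is hypothetical and stands for sending a padded DF packet of the given size and learning whether it was acknowledged.

```python
from typing import Callable, Tuple


def plpmtud_search(low: int, high: int, probe: Callable[[int], bool],
                   precision: int = 1) -> Tuple[int, int, int]:
    """Binary-search for the largest deliverable packet size in [low, high].

    `low` must already be known to work (for example, the handshake size) and
    sizes above `high` are never tried. The search stops once the remaining
    interval is narrower than `precision` bytes, so a coarse precision may
    undershoot the real PMTU by up to precision - 1 bytes.
    Returns (size found, probes sent, probes lost).
    """
    if precision < 1:
        raise ValueError("precision must be at least 1")
    probes = lost = 0
    while high - low >= precision:
        size = (low + high + 1) // 2  # round up so the loop always makes progress
        probes += 1
        if probe(size):
            low = size      # delivered: size is known to fit
        else:
            lost += 1       # lost: must be detected, then its data resent
            high = size - 1
    return low, probes, lost


path_ok = lambda size: size <= 1452  # simulated path with a 1452-byte limit
print(plpmtud_search(1200, 1500, path_ok))                 # (1452, 9, 2)
print(plpmtud_search(1200, 1500, path_ok, precision=32))   # (1444, 4, 1)
```

Finding the exact size here costs nine probes, two of them lost. Stopping at 32-byte precision halves the probe count but undershoots by 8 bytes, which illustrates the trade-off described above.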

Meanwhile, an MTU underestimate not only increases per-packet overhead but also directly limits the gross throughput possible under congestion control (which operates in multiples of the MSS).

ianswett · 2016-12-13T15:33:38Z

I think I'd like a 'trust but verify' approach to path MTU. In an ideal case, QUIC would get the ICMP message and verify that it really could get a non-fragmented packet through with that size. As long as the size was larger than the chosen handshake size, it would try it.

Some of what we've discussed (i.e., padding the CHLO and SHLO and setting the DF bit) is what the implementation does today, and not including it in the draft was an oversight we really need to fix.

Actually, QUIC's congestion control (and I believe FreeBSD's) operates in bytes, not MSS. But I agree it's likely the network and host are more efficient with larger packets.

But please do a pull request with what you describe above, because I think you're going in a good direction, and it's just a matter of working out some details, which is easy to do in the comments of a PR.

martinduke · 2016-12-13T16:25:37Z

"I think I'd like a 'trust but verify' approach to path MTU. In an ideal case, QUIC would get the ICMP message and verify that it really could get a non-fragmented packet through with that size. As long as the size was larger than the chosen handshake size, it would try it."

Stacks should ignore ICMP messages that increase the PMTU. The "only" issues with ICMP are non-conforming routers, and attackers (especially "off-path" attackers) that drive the MTU down to the minimum value.
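A sketch of the two ICMP sanity rules discussed here: ignore any Packet Too Big message that would raise the PMTU, and never let one push the PMTU below a floor. The floor value follows the ~1000-byte minimum floated earlier in the thread and, like the function name, is a hypothetical assumption.

```python
from typing import Optional

MIN_PMTU = 1000  # hypothetical floor, after the ~1000-byte minimum floated in this thread


def validate_ptb(current_pmtu: int, reported_mtu: int,
                 floor: int = MIN_PMTU) -> Optional[int]:
    """Return the PMTU to adopt after an ICMP Packet Too Big, or None to ignore it."""
    if reported_mtu >= current_pmtu or current_pmtu <= floor:
        # A PTB may only lower the PMTU, and never below the floor; ignore
        # messages claiming a larger or unchanged value.
        return None
    # Clamp so that forged messages cannot drive the PMTU below the floor.
    return max(reported_mtu, floor)


print(validate_ptb(1400, 1500))  # None: would increase the PMTU
print(validate_ptb(1400, 1280))  # 1280
print(validate_ptb(1400, 576))   # 1000: clamped at the floor
```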

"Some of what we've discussed (i.e., padding the CHLO and SHLO and setting the DF bit) is what the implementation does today, and not including it in the draft was an oversight we really need to fix."

That's great, but again, CHLO and SHLO will often have long RTOs, so loss-based MTU discovery is uniquely ill-suited to these packets.

"Actually, QUIC's congestion control (and I believe FreeBSD's) operates in bytes, not MSS. But I agree it's likely the network and host are more efficient with larger packets."

I believe there's already a comment that QUIC congestion control is poorly spelled out in the draft. But the draft says it uses TCP congestion controls, which define their initial cwnd in multiples of MSS. In the absence of ABC (Appropriate Byte Counting, RFC 3465), which the draft does not list, acknowledgments also increment cwnd in multiples of MSS.
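A small sketch of why MSS-denominated congestion control ties throughput to the packet size. It assumes an initial window of 10 segments (IW10, RFC 6928), which the draft may not use, and contrasts classic per-ACK packet counting with RFC 3465's Appropriate Byte Counting using its limit L = 2. Function names are hypothetical.

```python
INITIAL_WINDOW_SEGMENTS = 10  # assumed IW10 (RFC 6928); the draft may use another value


def initial_cwnd_bytes(mss: int, segments: int = INITIAL_WINDOW_SEGMENTS) -> int:
    """Initial congestion window in bytes when it is defined in MSS units."""
    return segments * mss


def slow_start_increase(bytes_acked: int, mss: int, abc: bool, limit: int = 2) -> int:
    """Growth of cwnd for one ACK during slow start.

    Classic packet counting adds one MSS per ACK that acknowledges new data,
    regardless of how many bytes it covers. With ABC (RFC 3465), growth is
    the number of bytes acknowledged, capped at `limit` * MSS.
    """
    if bytes_acked <= 0:
        return 0  # ACKs that acknowledge no new data do not grow cwnd
    if abc:
        return min(bytes_acked, limit * mss)
    return mss


print(initial_cwnd_bytes(1350), initial_cwnd_bytes(1452))  # 13500 14520
```

With either counting method, every increment is a multiple of, or capped by, the MSS, so an underestimated MSS shrinks the window in bytes for the same number of round trips.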

"But please do a pull request with what you describe above, because I think you're going in a good direction, and it's just a matter of working out some details, which is easy to do in the comments of a PR."

It might take me a week or two to get to it, but I will do so. Thanks for the encouragement!

ianswett · 2016-12-13T17:04:53Z

"Stacks should ignore ICMP messages that *increase* the PMTU. The "only" issues with ICMP are non-conforming routers, and attackers (especially "off-path" attackers) that drive the MTU down to the minimum value."

Right, what I had in mind was completing the handshake, then sending out an RFC 4821-style PMTUD probe, and if an ICMP message comes back, trying that size one more time to see if it gets through. If the original probe got through, even though an ICMP message was received, then QUIC should ignore the ICMP message and stick with the probed size.

"That's great, but again, CHLO and SHLO will often have long RTOs, so loss-based MTU discovery is uniquely ill-suited to these packets."

Today, most paths either block all UDP or support largish (i.e., >1400-byte) MTUs, so there needs to be good handling for timeouts. Use of ICMP messages in the handshake could be useful when available, but it's an optimization to allow a few extra people to speak QUIC, at least on today's public internet, where it's necessary to have a TCP fallback. But we should specify what should happen in a fallback-free world where ICMP is available to clients and servers.

"I believe there's already a comment that QUIC congestion control is poorly spelled out in the draft. But the draft says it uses TCP congestion controls, which define their initial cwnd in multiples of MSS. In the absence of ABC, which is not listed in the draft, then acknowledgments increment cwnd in multiples of MSS as well."

Good point, I'll make sure ABC gets added along with a more fleshed out congestion control section.

"It might take me a week or two to get to it, but I will do so. Thanks for the encouragement!"

Looking forward to it.

martinduke · 2016-12-28T20:00:50Z

I'm not sure if the PR pings people who are tracking this issue, but I submitted two pull requests:
#105
#106

The first is my preferred version, which strongly recommends ICMP-based PMTU discovery. It makes somewhat bold assumptions about the real world:

ICMP black holes are rare enough to be handled by a MAY if people want to use PLPMTUD in addition to ICMP.
I personally tend to work in kernel space, but my quick glance at UDP socket APIs suggests that it's not a very hard problem to modify normal DF settings/ICMP handling in a user-space implementation.
It strongly discourages a fixed, conservative PMTU. IMO it seems perverse to leave ~100 bytes on the table given all the complexity we've introduced to save a handful of bytes in packet and frame headers.

The second PR trashes all those assumptions, and is a very permissive (and wordy) spec that basically allows anything. It still adds a bunch of SHOULDs that make ICMP-based discovery work better in a QUIC context. This is the one piece where I feel strongly that QUIC's packetization section should not just reference a bunch of MTU RFCs.

I am interested in feedback on one or both, in particular which PR is a better basis for further editing.

MikeBishop · 2016-12-29T01:05:39Z

I think the spirit of QUIC so far has been "Be as efficient as possible; if we break something, there's always TCP." In that vein, the first seems more in keeping. On a purely technical basis, I don't have enough context to opine.

martinduke · 2016-12-31T05:45:04Z

On Tue, Dec 13, 2016 at 9:04 AM, ianswett wrote:

"Right, what I had in mind was completing the handshake, then sending out a RFC 4821 style PMTUD packet and if an ICMP message comes back, try that size one more time to see if it gets through. If the original probe got through, even though an ICMP message was received, then QUIC should ignore the ICMP message and stick with the probed size."

Because there is no retransmission ambiguity, there is no need to try the larger size again. If the ACK comes back for the original packet, then the ICMP message is spurious. I probably should have put this consideration in the pull request.
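Ian's 'trust but verify' idea, combined with the observation above that QUIC ACKs are unambiguous, can be sketched as follows. This is a minimal illustration assuming a single outstanding probe; the class and method names are hypothetical and not taken from either PR.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ProbeTracker:
    """Hypothetical 'trust but verify' handling of ICMP for PLPMTUD probes.

    Assumes one outstanding probe at a time. QUIC packet numbers are never
    reused, so an ACK for a probe's packet number proves that the full-size
    probe itself crossed the path.
    """
    pmtu: int                                             # confirmed size, e.g. the handshake size
    probes: Dict[int, int] = field(default_factory=dict)  # packet number -> probe size
    ptb_hint: Optional[int] = None                        # MTU reported by an unverified PTB

    def send_probe(self, packet_number: int, size: int) -> None:
        self.probes[packet_number] = size

    def on_ptb(self, reported_mtu: int) -> None:
        # Trust, but verify: remember the hint without lowering pmtu.
        # Hints at or below the confirmed size are ignored.
        if reported_mtu > self.pmtu:
            self.ptb_hint = reported_mtu

    def on_ack(self, packet_number: int) -> None:
        size = self.probes.pop(packet_number, None)
        if size is not None:
            # The probe got through, so any PTB about it was spurious.
            self.pmtu = max(self.pmtu, size)
            self.ptb_hint = None

    def on_loss(self, packet_number: int) -> Optional[int]:
        """Return a PTB-reported size worth verifying with a new probe, if any."""
        size = self.probes.pop(packet_number, None)
        hint, self.ptb_hint = self.ptb_hint, None
        if size is not None and hint is not None and self.pmtu < hint < size:
            return hint
        return None
```

If the probe is acknowledged, the PTB is simply discarded. If it is lost, the reported size becomes the next candidate to probe rather than being adopted outright.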

mcmanus · 2017-01-03T17:59:03Z

Both #105 and #106 shift us from MAY use some kind of PMTUD to SHOULD use some kind of PMTUD (the details of which vary). Functionally, that's creating a requirement for implementations that I don't think is justified as necessary, given the experience so far and the complexity laid out in the PRs.

Given Ian's experience of 90% effectiveness (from his earlier comment in #64), I would be wary of introducing ICMP into this at all.

martinduke · 2017-01-03T18:11:25Z

"Given ian's experience of 90% effectiveness in comment in #64, I would be wary of introducing ICMP into this at all."

'90% effectiveness' means we're leaving about 150 bytes per datagram on the table. It would be fine to have a protocol that didn't packetize data all that efficiently in the name of simplicity, but if that is the goal then we should absolutely get rid of the many variable-length header fields, which introduce a ton of complexity for less than 100 bytes of savings in most cases.

mcmanus · 2017-01-03T19:17:10Z

I should have been clearer that I was making two different (but related) comments: 1] PMTUD overall ought to remain a MAY; 2] in describing PMTUD, we could choose to detail a loss-based in-band approach or an ICMP approach (or a hybrid, etc.). I meant to advocate for the in-band approach because of concerns over the complexity of ICMP, given its rather small impact here.

* Part of the complexity is simply that ICMP is a whole different protocol stack, often with different consumers on the same host than the QUIC stack (as has been mentioned).
* A bigger part of the complexity, in my opinion, is that ICMP introduces unauthenticated and unencrypted inputs into the system. So you have to at least add the complexity of verifying them independently, which undermines a lot of their original advantages over a loss-based approach anyhow (e.g., partially the argument about search space, and the argument about speed), and who knows whether this also enables meaningful traffic analysis, such as identifying reliable vs. non-reliable streams. Much better, in my opinion, to keep all QUIC inputs authenticated as much as possible, and this is a place where that seems possible.

I'm not actually a big fan of the variable-length header fields, but your argument isn't really apples to apples. Variable-length-encoded bytes are truly saved bandwidth, while the 150-byte MTU shortfall relates to packet overhead ratios. Adding 150 bytes of data to each packet has about the same bandwidth impact as saving 6 or 7 actual bytes, if my arithmetic worked out.
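The closing arithmetic can be checked with a simple model, assuming bulk transfer in which every packet is full-size and carries a fixed per-packet overhead `h` (headers plus authentication tag). Growing packets from `small` to `large` bytes then matches the payload efficiency gained by shaving x = h(large - small)/large bytes of overhead off each `small`-byte packet. The overhead values below are illustrative assumptions, not measurements.

```python
def payload_efficiency(packet_size: int, overhead: float) -> float:
    """Fraction of each full-size packet that carries payload."""
    return (packet_size - overhead) / packet_size


def equivalent_overhead_saving(small: int, large: int, overhead: float) -> float:
    """Overhead bytes saved per small packet that match the gain of using large packets.

    Solves (small - overhead + x) / small == (large - overhead) / large for x.
    """
    return overhead * (large - small) / large


for h in (60, 70):  # assumed per-packet overhead in bytes
    print(h, equivalent_overhead_saving(1350, 1500, h))  # prints "60 6.0", then "70 7.0"
```

With 60 to 70 bytes of per-packet overhead, the 150-byte difference between 1350- and 1500-byte packets is worth 6 to 7 bytes of header savings, consistent with the estimate above.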

martinthomson · 2017-01-20T07:34:26Z

I think that #106 is closer now, we should discuss at the interim.

mnot · 2017-01-26T02:13:47Z

As discussed in Tokyo, @martinduke to propose text for PR #106 to reduce the default packet size to the IPv6 default and recommend PLPMTUD with optional usage of ICMP information

martinthomson · 2017-02-09T02:57:06Z

#106 was merged, so this is now done.

martinthomson added the design and -transport labels Dec 1, 2016

ianswett mentioned this issue Dec 3, 2016

Minimum packet size #69

Closed

This was referenced Dec 28, 2016

PMTUD (ICMP variant) #105

Closed

PMTUD #106

Merged

mnot changed the title from PMTUD to Path MTU Discovery Jan 20, 2017

martinthomson added the proposal-ready label Jan 20, 2017

mnot added the editor-ready label Jan 26, 2017

mnot removed the proposal-ready label Jan 26, 2017

mnot assigned mnot and martinduke and unassigned mnot Jan 26, 2017

mnot mentioned this issue Jan 26, 2017

Minimum MTU #139

Closed

martinthomson closed this as completed Feb 9, 2017

mnot removed the editor-ready label Mar 7, 2017

mnot added the has-consensus label Apr 19, 2017
