Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path MTU Discovery #64

Closed
martinthomson opened this issue Dec 1, 2016 · 23 comments
Closed

Path MTU Discovery #64

martinthomson opened this issue Dec 1, 2016 · 23 comments
Assignees
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@martinthomson
Copy link
Member

Quoting Martin Duke:

RFC-Compliant ICMP PMTU messages include the
oversize packet's IP header and the first 8 bytes of the transport. This
allows TCP to demultiplex the ICMP to the relevant connection/path, and
check that the echoed sequence number corresponds to an in-flight segment,
and that the segment that would have violated the advertised MTU. As TCP
windows increase, it's true that this may be less effective.

For QUIC, these messages will contain only the UDP header. I believe this is
sufficient to demux to the right connection (QUIC allows the same 4-tuple
for multiple connection IDs, but it would be fine to apply it to all such
connections), but I see no clear mechanism to prevent a blind attacker that
guesses the 4-tuple from forcing a QUIC connection to revert to the min path
MTU. Some possibilities:
(a) Require routers to send more bytes. This is probably not deployable,
and in any case because packet numbers always start at zero this doesn't add
much security.
(b) Store UDP or IP data. We could require QUIC to store the UDP
checksum or IP ID in its send queue, thus allowing it to associate the
returned headers with a packet in its queue. This is ugly from a layering
perspective and may not be plausible with hardware offload of checksums,
various middleboxes, etc.
(c) Isolated DF packets. QUIC could pick a long time interval (10
seconds?) in which is sets DF on a single MTU-sized packet (which may have
to be padded). QUIC would only accept ICMP messages within an RTT or so of
having sent one of these MTU probe packets, or until it's acked. Thus the
probabability an off-path attacker could time it correctly is (RTT / 10
sec). If there's an extended burst of ICMP to overcome this, that's probably
an attack indication to handle either in IP, the firewall, or at worst in
QUIC. I have some language to codify this, and if we agree this is the
right approach I can share it.

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Dec 1, 2016
@ianswett
Copy link
Contributor

ianswett commented Dec 3, 2016

Agreed that (a) seems largely impractical. (b) is semi-practical, but the checksum is still computable by a passive observer.

Something along the lines of (c) is what I would recommend. It's somewhat similar to what our current experiment does today, though we ignore ICMP entirely and rely on loss as a signal. The negative of course is that

I don't think we want to support small MTUs for QUIC, so we should probably pick some minimum size that we expect all forthcoming CHLOs to fit into. Multipacket CHLOs make stateless rejects impractical, and networks that don't support packets at least ~1200 bytes in size are exceedingly rare.

@alagoutte
Copy link
Contributor

@ianswett if i remember CHLO is always padded to maximum size (1370 bytes for IPv4 and 1350 bytes for IPv6) ? For avoid amplification ?

@ianswett
Copy link
Contributor

ianswett commented Dec 5, 2016

All handshake packets are always padded to the maximum packet size to ensure the path supports the chosen MTU. So if the path doens't, the handshake fails. This has proven to be a very practical approach, only removing <1% of users who don't support 1350/1370 MTU sizes.

Conveniently, the CHLO needs to be padded for anti-amplifications reasons as well.

@martinduke
Copy link
Contributor

"(b) is semi-practical, but the checksum is still computable by a passive observer."

Indeed, all ICMP is vulnerable to a passive observer, but if the header echo isn't there it's vulnerable to off-path attacks that guess the 4-tuple.

It seems odd to spend an enormous amount of energy to save a handful of bytes in headers and frames, and then turn around and use a conservatively low MTU, when a very common IPv4 UDP MSS is 1472 Bytes.

Regarding option (c), it makes a lot of sense for CHLO to be a max-size packet for these purposes, but it's not sufficient. Furthermore, using loss as the signal instead of ICMP seems like an inefficient solution.

@ianswett
Copy link
Contributor

ianswett commented Dec 5, 2016

In practice today, we're losing less than 10% of the potential packet size, but I agree it isn't ideal.

Can you clarify why padding the CHLO is not sufficient?

@martinduke
Copy link
Contributor

  1. The server must probe the server->client path.

  2. If the path changes (particularly with multipath), there must be some means of probing the MTU or you will have lots of packet fragmentation.

@vasilvv
Copy link
Contributor

vasilvv commented Dec 12, 2016

I am not sure QUIC can in general rely on the ICMP messages for path MTU discovery, since the ICMP messages are normally consumed by the kernel, which may not expose them to individual clients.

@martinduke
Copy link
Contributor

I agree that user-space implementations will be reliant on OSes that do whatever they want with UDP ICMP messages. Perhaps the Path MTU discovery section needs lots of SHOULDs instead of MUSTs, because as QUIC migrates into the kernel (?) it can fix these problems. But at the very least, we can set the sockopts to not use the DF bit most of the time.

I should really put something together in a detailed proposal. But the outline should be something like this.

  1. Conditions in which QUIC MUST or MAY send a packet at full-size (using PAD frames as necessary) with the DF bit set. -- this would certainly include the first CHLO and, probably, the second server-generated packet, in addition to packets involving new 4-tuples.
  2. Reactions to loss of those packets
  3. Reactions to ICMP packet too big messages -- would include SHOULDs involving storage of IP header fields, which would be most secure. I think some would say that QUIC SHOULD ignore these packets entirely, but I would disagree.

Relatedly, we could set a relatively high minimum MTU for QUIC connections (~1000 Bytes?). For legitimately low-MTU links, this would cause lots of fragmentation, but it would substantially mitigate off-path attackers.

What do all of you think?

@martinduke
Copy link
Contributor

I should also add that the very first packet is a poor one to use to wait for loss, as the RTT is entirely unknown and must use a conservative RTO value. It might be better to use a relatively conservative value, as the draft does, at first, and probe upwards to see if there's more capacity to unlock.

@rjshade
Copy link

rjshade commented Dec 13, 2016

What's the benefit of relying on ICMP responses vs. doing MTU discovery at the QUIC layer ("packetization layer PMTUD" RFC 4821)?

@martinduke
Copy link
Contributor

What's the benefit of relying on ICMP responses vs. doing MTU discovery at the QUIC layer ("packetization layer PMTUD" RFC 4821)?

Thanks for the reference; I had not seen that RFC.

Though ICMP messages have their disadvantages, there are four advantages over a loss-based scheme.

  1. Immediate reporting of the actual PMTU, rather than a search process that is likely to undershoot the actual PMTU.
  2. Precision only comes with many probes -> many losses that must be retransmitted.
  3. Loss is an overloaded signal, meaning congestion or RF problems in different contexts. Therefore there is the possibility of error in interpreting a loss.
  4. In many cases, detecting a loss will take much longer than an ICMP message sent from an in-path router.

Meanwhile, a MTU underestimate has not only packet overhead considerations, but also directly impacts the gross throughput possible via congestion control (which operates in multiples of the MSS).

@ianswett
Copy link
Contributor

I think I'd like a 'trust but verify" approach to path MTU. In an ideal case, QUIC would get the ICMP message and verify that it really could get a non-fragmented packet through with that size. As long as the size was larger than the chosen handshake size, it would try it.

Some of what we've discussed(ie: padding the CHLO and SHLO and setting the DF bit) is what the implementation does today, and not including it in the draft was an oversight we really need to fix.

Actually, QUIC's congestion control(and I believe FreeBSD's) operates in bytes, not MSS. But I agree it's likely the network and host are more efficient with larger packets.

But please do a pull request with what you describe above, because I think you're going in a good direction, and it's just a matter of working out some details, which is easy to do in the comments of a PR.

@martinduke
Copy link
Contributor

I think I'd like a 'trust but verify" approach to path MTU. In an ideal case, QUIC would get the ICMP message and verify that it really could get a non-fragmented packet through with that size. As long as the size was larger than the chosen handshake size, it would try it.

Stacks should ignore ICMP messages that increase the PMTU. The "only" issues with ICMP are non-conforming routers, and attackers (especially "off-path" attackers) that drive the MTU down to the minimum value.

Some of what we've discussed(ie: padding the CHLO and SHLO and setting the DF bit) is what the implementation does today, and not including it in the draft was an oversight we really need to fix.

That's great, but again, CHLO and SHLO will often have long RTOs, so loss-based MTU discovery is uniquely ill-suited to these packets.

Actually, QUIC's congestion control(and I believe FreeBSD's) operates in bytes, not MSS. But I agree it's likely the network and host are more efficient with larger packets.

I believe there's already a comment that QUIC congestion control is poorly spelled out in the draft. But the draft says it uses TCP congestion controls, which define their initial cwnd in multiples of MSS. In the absence of ABC, which is not listed in the draft, then acknowledgments increment cwnd in multiples of MSS as well.

But please do a pull request with what you describe above, because I think you're going in a good direction, and it's just a matter of working out some details, which is easy to do in the comments of a PR.

It might take me a week or two to get to it, but I will do so. Thanks for the encouragement!

@ianswett
Copy link
Contributor

ianswett commented Dec 13, 2016 via email

This was referenced Dec 28, 2016
@martinduke
Copy link
Contributor

martinduke commented Dec 28, 2016

I'm not sure if the PR pings people who are tracking this issue, but I submitted two pull requests:
#105
#106

The first is my preferred version, which strongly recommends ICMP-based PMTU discovery. It makes somewhat bold assumptions about the real world:

  • ICMP black holes are rare enough to be handled by a MAY if people want to use PLPMTUD in addition to ICMP.
  • I personally tend to work in kernel space, but my quick glance at UDP socket APIs suggest that it's not a very hard problem to modify normal DF settings/ICMP handling in a user space implementation.
  • It strongly discourages a fixed, conservative PMTU. IMO it seems perverse to leave ~100 bytes on the table given all the complexity we're introduced to save a handful of bytes in packet and frame headers.

The second PR trashes all those assumptions, and is a very permissive (and wordy) spec that basically allows anything. It still adds a bunch of SHOULDs that make ICMP-based discovery work better in a QUIC context. This is the one piece where I feel strongly that QUIC's packetization section should not just reference a bunch of MTU RFCs.

I am interested in feedback on one or both, in particular which PR is a better basis for further editing.

@MikeBishop
Copy link
Contributor

I think the spirit of QUIC so far has been "Be as efficient as possible; if we break something, there's always TCP." In that vein, the first seems more in keeping. On a purely technical basis, I don't have enough context to opine.

@martinduke
Copy link
Contributor

martinduke commented Dec 31, 2016 via email

@mcmanus
Copy link
Contributor

mcmanus commented Jan 3, 2017

both 105 and 106 shift us from MAY use some kind of pmtud onto SHOULD use some kind of pmtud (the details of which vary). Functionally, that's creating a requirement of implementations that I don't think is justified as necessary by the experience so far and the complexity laid out in the PR.

given ian's experience of 90% effectiveness in comment in #64 (comment) I would be wary of introducing ICMP into this at all.

@martinduke
Copy link
Contributor

given ian's experience of 90% effectiveness in comment in #64 (comment) I would be wary of introducing ICMP into this at all.

'90% effectiveness' means we're leaving about 150 bytes per datagram on the table. It would be fine to have a protocol that didn't packetize data all that efficiently in the name of simplicity, but if that is the goal then we should absolutely get rid of the many variable-length header fields, which introduce a ton of complexity for less than 100 bytes of savings in most cases.

@mcmanus
Copy link
Contributor

mcmanus commented Jan 3, 2017 via email

@mnot mnot changed the title PMTUD Path MTU Discovery Jan 20, 2017
@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Jan 20, 2017
@martinthomson
Copy link
Member Author

I think that #106 is closer now, we should discuss at the interim.

@mnot
Copy link
Member

mnot commented Jan 26, 2017

As discussed in Tokyo, @martinduke to propose text for PR #106 to reduce the default packet size to the IPv6 default and recommend PLPMTUD with optional usage of ICMP information

@mnot mnot removed the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Jan 26, 2017
@mnot mnot assigned mnot and martinduke and unassigned mnot Jan 26, 2017
@mnot mnot mentioned this issue Jan 26, 2017
@martinthomson
Copy link
Member Author

#106 was merged, so this is now done.

@mnot mnot removed the editor-ready label Mar 7, 2017
@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Apr 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
None yet
Development

No branches or pull requests

9 participants