Migration before handshake completed is very messy #2309

huitema · 2019-01-07T04:54:22Z

In migration interop tests, the Picoquic client was starting to simulate NAT rebinding just after obtaining the 1-RTT read and write key. That turns out to be very messy, because at that point there is no proof that the server has received the client finished message, let alone acknowledged it. As far as tests are concerned, there is an easy fix: only start the migration tests after the client has received from the server an ACK of one of its 1-RTT packet. That's the moment when the client knows for sure that the handshake is successful.

The migration before handshake complete is messy because the client will try to re-transmit handshake messages that have not been acknowledged, and has some kind of dilemma. Should these re-transmissions aims at the original DCID used during the handshake, or at one of the new CID selected during the migration? What if the continuity of the new path has not been yet verified? What if the continuity test was lost?

Do we have an actual need to support migration before the handshake is fully completed? In practice, the handshake packets create a NAT mapping, and it is rather unlikely that this mapping will just disappear before the few RTT required to complete the handshake. In fact, it is possible, maybe likely, that packets received from an alternative address during the handshake comes from attackers, not from the actual client.

I think the implementations would be simpler and more robust if the spec stated that clients should not initiate "voluntary" migration before the handshake is fully complete. This means 1-RTT ACK received, or ACK of last handshake packet from the client, whichever comes first. Similarly, servers should not try to migrate connections before receiving at least one 1-RTT packet from the client through the same addresses and ports as the Initial messages.

marten-seemann · 2019-01-07T05:02:24Z

Since we decided to use timers to drop drop Handshake keys, a peer can never be sure when the other side actually finished the handshake, unless it proactively sends 1-RTT packets. This leads to problems in the case where you just pre-connect a QUIC connection, but neither peer has any data to send.

huitema · 2019-01-07T05:24:46Z

In theory the key event is "server receives the client Finished message", which the server sees directly, and the client notices when the Finished message is acknowledged. So there is always that, even if the connection is idle. Plus, migration supports requires sending NEW CONNECTION ID frames to the peer, which I suppose do count as "proactively sent 1-RTT packets".

erickinnear · 2019-01-07T05:28:35Z

I think stating that endpoints should not initiate migration before the handshake is fully complete would make a lot of sense.

In addition to reducing the potential for added complexity around those interactions and the handshake, it seems like the need for migration at that point in the timeline can mostly be satisfied by a new connection on an alternate path.

If we don't have a lot of activity happening yet on the connection (although interesting thought about the NEW_CONNECTION_ID frames), it seems like you don't really lose a whole lot by opening a new connection instead of migrating a pre-connected connection that hasn't otherwise been used.
That makes me less concerned about when the threshold should be.

marten-seemann · 2019-01-07T05:40:14Z

In theory the key event is "server receives the client Finished message", which the server sees directly, and the client notices when the Finished message is acknowledged.

The ACK for that could be lost. The client will then start the handshake timer and keep retransmitting the Finished, until it receives an ACK (assuming that the server keeps the keys around for long enough, which it currently doesn't, see #2267).

huitema · 2019-01-07T07:09:24Z

I get the issue in #2267. But regardless, there is an issue with engaging in migration when the handshake is not complete...

ianswett · 2019-01-07T14:45:59Z

I thought we decided that we shouldn't support migration before completing the handshake. It's probably not written down, so we should state that explicitly. The value is tiny and the complexity is substantial.

DavidSchinazi · 2019-01-07T19:51:53Z

Isn't this already made explicit in Section 9 Connection Migration?

An endpoint MUST NOT initiate connection migration before the handshake is finished and the endpoint has 1-RTT keys. The design of QUIC relies on endpoints retaining a stable address for the duration of the handshake.

huitema · 2019-01-07T20:05:58Z

@DavidSchinazi I think there are two separate issues:

There is some fuzziness about what "handshake is finished" actually means. The messiness that we observed in the tests was due to repeating the CFIN handshake packet while engaged in migration. Thinks get much better if the client waits until CFIN is acknowledged.
The problem is with server response to NAT Rebinding. Even if the client does not initiate migration, a NAT could decide to change its translation. My take is that both sides should ignore such NAT rebinding before the handshake is finished. That is, have exactly one address pair for the connection and stick to it until handshake done, don't even think of changing midway.

The justification for (2) is that changing addresses mid-handshake in response to NAT rebinding opens an interesting attack surface, with man-on-the-side sending copies of good packets with spoofed IP addresses. Given that actual rebinding just after a mapping is created is pretty rare, chances are that most "rebinding during handshake" will be of the "man on the side" variety. It is more robust to just ignore them.

ianswett · 2019-01-07T20:09:18Z

Thanks @DavidSchinazi for finding the text. Ah, the CFIN, the gift that keeps on giving.

I agree with @huitema on 2. We previously discussed whether changing the 5-tuple during the handshake was allowed, and decided it opened up a huge number of attacks we weren't sure we could fix, so we should require it be fixed until handshake completion.

DavidSchinazi · 2019-01-07T20:30:14Z

I agree with @huitema, we could add text to that section to clarify what handshake finished means and what servers should do if they detect a NAT rebinding during the handshake.

mikkelfj · 2019-01-07T20:50:00Z

There is also the client cert auth post handshake thing (which I don't understand, but I would assume it also depends on a clear definition of when the handshake is complete.

martinthomson · 2019-01-07T23:41:47Z

Based on there being existing text, I think that we'll just use this for clarifying where the boundary is. PRs sought.

huitema · 2019-01-27T03:26:24Z

Started Pr #2370

martinthomson · 2019-01-31T02:25:35Z

Tokyo conclusion: the answer to #2267 will provide a basis for defining the transition point. @huitema has a PR and can update that when we have an answer there.

This frame solves several problems. 1. it allows us to get rid of Initial keys very quickly, without relying on the existing implicit mess. 2. it allows us to get rid of Handshake keys quickly, without timers. 3. it allows us to regulate the rate of key update, without risk of becoming unsynchronized. The big design question here is whether to include an explicit marker for the current key phase, or to pull that in from the packet. I've chosen the latter, but I am willing to be convinced that this is a bad idea. There is a risk that retransmissions of KEYS_READY will cause problems, but that risk is easily addressed by purging unacknowledged KEYS_READY frames any time you initiate a key update. Or, you could not retransmit frames blindly. The alternative means pulling in the notion of an epoch, likely copying from DTLS (0=Initial, 1=0-RTT, 2=Handshake, 3+=1-RTT). Thats complicated and we would still need to work out what to do with the validation code in that case. I've tried to keep the request to minimize the number of active keys, by allowing KEYS_READY to be deferred by up to 3PTO. That seems to work well. Closes #2267, #1951. Prerequisite for fixing #2309.

huitema · 2019-06-05T05:47:58Z

Updated the text to reference the output of PR #2673 as agreed during the London interim.

martinthomson · 2019-06-05T06:02:16Z

@mnot, @larseggert, this one looks good to go.

larseggert · 2019-06-11T08:28:49Z

Quick question: We're forbidding active migration during the handshake here, but what about a passive NAT rebinding that may happen? We obviously can't forbid that, so we must either handle it or have the connection die quickly?

martinthomson · 2019-06-11T09:48:27Z

That's an issue, but a different one.

huitema · 2019-06-11T14:28:09Z

What @martinthomson said. I would personally treat NAT rebinding during handshake as an error, on the assumption that it will be extremely rare and is caused by a broken NAT.

We generally assume that NAT keep a UDP mapping active for at least 30 seconds -- the BEHAVE recommendation was 2 minutes. Having a mapping change in fewer than 2 RTT means the NAT is broken.

ianswett · 2019-06-11T14:29:38Z

Closing the connection introduces yet another way an attacker can interrupt the handshake. It may be the preferable option, but it's worth noting.

huitema · 2019-06-11T16:28:50Z

@ianswett You don't have to close the connection to protect against NAT rebinding. Ignoring packets from the wrong addresses during the handshake is simpler and more robust. The picoquic implementation does something like that: accept the content of the handshake packet (not initial) if it decrypts correctly, ignore any address change until the handshake is confirmed.

ianswett · 2019-06-11T16:42:11Z

Good point, that SGTM.

huitema · 2019-06-23T05:42:55Z

Waiting on PR #2370 to close this issue.

martinthomson added editorial An issue that does not affect the design of the protocol; does not require consensus. -transport labels Jan 7, 2019

huitema mentioned this issue Jan 28, 2019

Stronger migration handshake #2370

Merged

martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. and removed editorial An issue that does not affect the design of the protocol; does not require consensus. labels Jan 29, 2019

martinthomson added this to Handshake in Transport / Recovery / TLS Jan 29, 2019

martinthomson mentioned this issue Jan 31, 2019

loss of only two packets can lead to an unrecoverable situation #2267

Closed

martinthomson assigned huitema Jan 31, 2019

mnot added this to Design Issues in Late Stage Processing Feb 27, 2019

martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Jun 5, 2019

mnot moved this from Design Issues to Consensus Emerging in Late Stage Processing Jun 7, 2019

mnot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Jul 1, 2019

mnot added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Jul 8, 2019

mnot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Jul 8, 2019

martinthomson closed this as completed in #2370 Jul 8, 2019

Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Jul 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migration before handshake completed is very messy #2309

Migration before handshake completed is very messy #2309

huitema commented Jan 7, 2019

marten-seemann commented Jan 7, 2019

huitema commented Jan 7, 2019

erickinnear commented Jan 7, 2019

marten-seemann commented Jan 7, 2019

huitema commented Jan 7, 2019

ianswett commented Jan 7, 2019

DavidSchinazi commented Jan 7, 2019

huitema commented Jan 7, 2019

ianswett commented Jan 7, 2019

DavidSchinazi commented Jan 7, 2019

mikkelfj commented Jan 7, 2019

martinthomson commented Jan 7, 2019

huitema commented Jan 27, 2019

martinthomson commented Jan 31, 2019

huitema commented Jun 5, 2019

martinthomson commented Jun 5, 2019

larseggert commented Jun 11, 2019

martinthomson commented Jun 11, 2019

huitema commented Jun 11, 2019

ianswett commented Jun 11, 2019

huitema commented Jun 11, 2019

ianswett commented Jun 11, 2019

huitema commented Jun 23, 2019

Migration before handshake completed is very messy #2309

Migration before handshake completed is very messy #2309

Comments

huitema commented Jan 7, 2019

marten-seemann commented Jan 7, 2019

huitema commented Jan 7, 2019

erickinnear commented Jan 7, 2019

marten-seemann commented Jan 7, 2019

huitema commented Jan 7, 2019

ianswett commented Jan 7, 2019

DavidSchinazi commented Jan 7, 2019

huitema commented Jan 7, 2019

ianswett commented Jan 7, 2019

DavidSchinazi commented Jan 7, 2019

mikkelfj commented Jan 7, 2019

martinthomson commented Jan 7, 2019

huitema commented Jan 27, 2019

martinthomson commented Jan 31, 2019

huitema commented Jun 5, 2019

martinthomson commented Jun 5, 2019

larseggert commented Jun 11, 2019

martinthomson commented Jun 11, 2019

huitema commented Jun 11, 2019

ianswett commented Jun 11, 2019

huitema commented Jun 11, 2019

ianswett commented Jun 11, 2019

huitema commented Jun 23, 2019