Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration before handshake completed is very messy #2309

Closed
huitema opened this issue Jan 7, 2019 · 23 comments · Fixed by #2370
Closed

Migration before handshake completed is very messy #2309

huitema opened this issue Jan 7, 2019 · 23 comments · Fixed by #2370
Assignees
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@huitema
Copy link
Contributor

huitema commented Jan 7, 2019

In migration interop tests, the Picoquic client was starting to simulate NAT rebinding just after obtaining the 1-RTT read and write key. That turns out to be very messy, because at that point there is no proof that the server has received the client finished message, let alone acknowledged it. As far as tests are concerned, there is an easy fix: only start the migration tests after the client has received from the server an ACK of one of its 1-RTT packet. That's the moment when the client knows for sure that the handshake is successful.

The migration before handshake complete is messy because the client will try to re-transmit handshake messages that have not been acknowledged, and has some kind of dilemma. Should these re-transmissions aims at the original DCID used during the handshake, or at one of the new CID selected during the migration? What if the continuity of the new path has not been yet verified? What if the continuity test was lost?

Do we have an actual need to support migration before the handshake is fully completed? In practice, the handshake packets create a NAT mapping, and it is rather unlikely that this mapping will just disappear before the few RTT required to complete the handshake. In fact, it is possible, maybe likely, that packets received from an alternative address during the handshake comes from attackers, not from the actual client.

I think the implementations would be simpler and more robust if the spec stated that clients should not initiate "voluntary" migration before the handshake is fully complete. This means 1-RTT ACK received, or ACK of last handshake packet from the client, whichever comes first. Similarly, servers should not try to migrate connections before receiving at least one 1-RTT packet from the client through the same addresses and ports as the Initial messages.

@marten-seemann
Copy link
Contributor

Since we decided to use timers to drop drop Handshake keys, a peer can never be sure when the other side actually finished the handshake, unless it proactively sends 1-RTT packets. This leads to problems in the case where you just pre-connect a QUIC connection, but neither peer has any data to send.

@huitema
Copy link
Contributor Author

huitema commented Jan 7, 2019

In theory the key event is "server receives the client Finished message", which the server sees directly, and the client notices when the Finished message is acknowledged. So there is always that, even if the connection is idle. Plus, migration supports requires sending NEW CONNECTION ID frames to the peer, which I suppose do count as "proactively sent 1-RTT packets".

@erickinnear
Copy link
Contributor

I think stating that endpoints should not initiate migration before the handshake is fully complete would make a lot of sense.

In addition to reducing the potential for added complexity around those interactions and the handshake, it seems like the need for migration at that point in the timeline can mostly be satisfied by a new connection on an alternate path.

If we don't have a lot of activity happening yet on the connection (although interesting thought about the NEW_CONNECTION_ID frames), it seems like you don't really lose a whole lot by opening a new connection instead of migrating a pre-connected connection that hasn't otherwise been used.
That makes me less concerned about when the threshold should be.

@marten-seemann
Copy link
Contributor

In theory the key event is "server receives the client Finished message", which the server sees directly, and the client notices when the Finished message is acknowledged.

The ACK for that could be lost. The client will then start the handshake timer and keep retransmitting the Finished, until it receives an ACK (assuming that the server keeps the keys around for long enough, which it currently doesn't, see #2267).

@huitema
Copy link
Contributor Author

huitema commented Jan 7, 2019

I get the issue in #2267. But regardless, there is an issue with engaging in migration when the handshake is not complete...

@ianswett
Copy link
Contributor

ianswett commented Jan 7, 2019

I thought we decided that we shouldn't support migration before completing the handshake. It's probably not written down, so we should state that explicitly. The value is tiny and the complexity is substantial.

@DavidSchinazi
Copy link
Contributor

Isn't this already made explicit in Section 9 Connection Migration?

An endpoint MUST NOT initiate connection migration before the handshake is finished and the endpoint has 1-RTT keys. The design of QUIC relies on endpoints retaining a stable address for the duration of the handshake.

@huitema
Copy link
Contributor Author

huitema commented Jan 7, 2019

@DavidSchinazi I think there are two separate issues:

  1. There is some fuzziness about what "handshake is finished" actually means. The messiness that we observed in the tests was due to repeating the CFIN handshake packet while engaged in migration. Thinks get much better if the client waits until CFIN is acknowledged.

  2. The problem is with server response to NAT Rebinding. Even if the client does not initiate migration, a NAT could decide to change its translation. My take is that both sides should ignore such NAT rebinding before the handshake is finished. That is, have exactly one address pair for the connection and stick to it until handshake done, don't even think of changing midway.

The justification for (2) is that changing addresses mid-handshake in response to NAT rebinding opens an interesting attack surface, with man-on-the-side sending copies of good packets with spoofed IP addresses. Given that actual rebinding just after a mapping is created is pretty rare, chances are that most "rebinding during handshake" will be of the "man on the side" variety. It is more robust to just ignore them.

@ianswett
Copy link
Contributor

ianswett commented Jan 7, 2019

Thanks @DavidSchinazi for finding the text. Ah, the CFIN, the gift that keeps on giving.

I agree with @huitema on 2. We previously discussed whether changing the 5-tuple during the handshake was allowed, and decided it opened up a huge number of attacks we weren't sure we could fix, so we should require it be fixed until handshake completion.

@DavidSchinazi
Copy link
Contributor

I agree with @huitema, we could add text to that section to clarify what handshake finished means and what servers should do if they detect a NAT rebinding during the handshake.

@mikkelfj
Copy link
Contributor

mikkelfj commented Jan 7, 2019

There is also the client cert auth post handshake thing (which I don't understand, but I would assume it also depends on a clear definition of when the handshake is complete.

@martinthomson martinthomson added editorial An issue that does not affect the design of the protocol; does not require consensus. -transport labels Jan 7, 2019
@martinthomson
Copy link
Member

Based on there being existing text, I think that we'll just use this for clarifying where the boundary is. PRs sought.

@huitema
Copy link
Contributor Author

huitema commented Jan 27, 2019

Started Pr #2370

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. and removed editorial An issue that does not affect the design of the protocol; does not require consensus. labels Jan 29, 2019
@martinthomson
Copy link
Member

Tokyo conclusion: the answer to #2267 will provide a basis for defining the transition point. @huitema has a PR and can update that when we have an answer there.

martinthomson added a commit that referenced this issue Feb 13, 2019
This frame solves several problems.

1. it allows us to get rid of Initial keys very quickly, without relying
on the existing implicit mess.

2. it allows us to get rid of Handshake keys quickly, without timers.

3. it allows us to regulate the rate of key update, without risk of
becoming unsynchronized.

The big design question here is whether to include an explicit marker
for the current key phase, or to pull that in from the packet.  I've
chosen the latter, but I am willing to be convinced that this is a bad
idea.  There is a risk that retransmissions of KEYS_READY will cause
problems, but that risk is easily addressed by purging unacknowledged
KEYS_READY frames any time you initiate a key update.  Or, you could not
retransmit frames blindly.

The alternative means pulling in the notion of an epoch, likely copying
from DTLS (0=Initial, 1=0-RTT, 2=Handshake, 3+=1-RTT).  Thats
complicated and we would still need to work out what to do with the
validation code in that case.

I've tried to keep the request to minimize the number of active keys, by
allowing KEYS_READY to be deferred by up to 3PTO.  That seems to work
well.

Closes #2267, #1951.

Prerequisite for fixing #2309.
@mnot mnot added this to Design Issues in Late Stage Processing Feb 27, 2019
@huitema
Copy link
Contributor Author

huitema commented Jun 5, 2019

Updated the text to reference the output of PR #2673 as agreed during the London interim.

@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Jun 5, 2019
@martinthomson
Copy link
Member

@mnot, @larseggert, this one looks good to go.

@mnot mnot moved this from Design Issues to Consensus Emerging in Late Stage Processing Jun 7, 2019
@larseggert
Copy link
Member

Quick question: We're forbidding active migration during the handshake here, but what about a passive NAT rebinding that may happen? We obviously can't forbid that, so we must either handle it or have the connection die quickly?

@martinthomson
Copy link
Member

That's an issue, but a different one.

@huitema
Copy link
Contributor Author

huitema commented Jun 11, 2019

What @martinthomson said. I would personally treat NAT rebinding during handshake as an error, on the assumption that it will be extremely rare and is caused by a broken NAT.

We generally assume that NAT keep a UDP mapping active for at least 30 seconds -- the BEHAVE recommendation was 2 minutes. Having a mapping change in fewer than 2 RTT means the NAT is broken.

@ianswett
Copy link
Contributor

Closing the connection introduces yet another way an attacker can interrupt the handshake. It may be the preferable option, but it's worth noting.

@huitema
Copy link
Contributor Author

huitema commented Jun 11, 2019

@ianswett You don't have to close the connection to protect against NAT rebinding. Ignoring packets from the wrong addresses during the handshake is simpler and more robust. The picoquic implementation does something like that: accept the content of the handshake packet (not initial) if it decrypts correctly, ignore any address change until the handshake is confirmed.

@ianswett
Copy link
Contributor

Good point, that SGTM.

@huitema
Copy link
Contributor Author

huitema commented Jun 23, 2019

Waiting on PR #2370 to close this issue.

@mnot mnot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Jul 1, 2019
@mnot mnot added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Jul 8, 2019
@mnot mnot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Jul 8, 2019
Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Jul 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
Late Stage Processing
  
Issue Handled
Development

Successfully merging a pull request may close this issue.

9 participants