Migration before handshake completed is very messy #2309
Since we decided to use timers to drop Handshake keys, a peer can never be sure when the other side has actually finished the handshake, unless it proactively sends 1-RTT packets. This leads to problems in the case where you just pre-connect a QUIC connection, but neither peer has any data to send.
In theory the key event is "server receives the client Finished message", which the server sees directly, and the client notices when the Finished message is acknowledged. So there is always that, even if the connection is idle. Plus, migration support requires sending NEW_CONNECTION_ID frames to the peer, which I suppose do count as "proactively sent 1-RTT packets".
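As a rough illustration of the two signals described in the comment above, here is a minimal sketch. The class and flag names are hypothetical; a real QUIC stack would derive these flags from its packet-processing and loss-recovery machinery.

```python
# Sketch of the "handshake finished" signal discussed above.
# Flag names are made up; they are not from any real QUIC implementation.

class HandshakeTracker:
    def __init__(self, is_server: bool):
        self.is_server = is_server
        self.saw_client_finished = False   # server: processed client Finished
        self.finished_acked = False        # client: its Finished was ACKed

    def confirmed(self) -> bool:
        if self.is_server:
            # The server observes the client Finished directly.
            return self.saw_client_finished
        # The client learns of completion when its Finished is acknowledged.
        return self.finished_acked
```

The point of the sketch is the asymmetry: the server has a direct observation, while the client depends on an ACK that may be lost, which is exactly the problem raised in the following comments.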
I think stating that endpoints should not initiate migration before the handshake is fully complete would make a lot of sense. In addition to reducing the potential complexity of the interaction between migration and the handshake, any need for migration that early in the timeline can mostly be satisfied by a new connection on an alternate path. If there isn't much activity on the connection yet (although that's an interesting thought about the NEW_CONNECTION_ID frames), you don't really lose a whole lot by opening a new connection instead of migrating a pre-connected connection that hasn't otherwise been used.
The ACK for that could be lost. The client will then start the handshake timer and keep retransmitting the Finished, until it receives an ACK (assuming that the server keeps the keys around for long enough, which it currently doesn't, see #2267).
I get the issue in #2267. But regardless, there is an issue with engaging in migration when the handshake is not complete...
I thought we decided that we shouldn't support migration before completing the handshake. It's probably not written down, so we should state that explicitly. The value is tiny and the complexity is substantial.
Isn't this already made explicit in Section 9?
@DavidSchinazi I think there are two separate issues:
The justification for (2) is that changing addresses mid-handshake in response to NAT rebinding opens an interesting attack surface, with a man-on-the-side attacker sending copies of good packets with spoofed IP addresses. Given that actual rebinding just after a mapping is created is pretty rare, chances are that most "rebinding during handshake" will be of the man-on-the-side variety. It is more robust to just ignore them.
Thanks @DavidSchinazi for finding the text. Ah, the CFIN, the gift that keeps on giving. I agree with @huitema on 2. We previously discussed whether changing the 5-tuple during the handshake was allowed, and decided it opened up a huge number of attacks we weren't sure we could fix, so we should require that the 5-tuple stay fixed until handshake completion.
I agree with @huitema, we could add text to that section to clarify what "handshake finished" means and what servers should do if they detect a NAT rebinding during the handshake.
There is also the client cert auth post-handshake thing (which I don't understand, but I would assume it also depends on a clear definition of when the handshake is complete).
Based on there being existing text, I think that we'll just use this for clarifying where the boundary is. PRs sought.
Started PR #2370
This frame solves several problems:

1. It allows us to get rid of Initial keys very quickly, without relying on the existing implicit mess.
2. It allows us to get rid of Handshake keys quickly, without timers.
3. It allows us to regulate the rate of key updates, without risk of becoming unsynchronized.

The big design question here is whether to include an explicit marker for the current key phase, or to pull that in from the packet. I've chosen the latter, but I am willing to be convinced that this is a bad idea. There is a risk that retransmissions of KEYS_READY will cause problems, but that risk is easily addressed by purging unacknowledged KEYS_READY frames any time you initiate a key update. Or, you could not retransmit frames blindly.

The alternative means pulling in the notion of an epoch, likely copying from DTLS (0 = Initial, 1 = 0-RTT, 2 = Handshake, 3+ = 1-RTT). That's complicated, and we would still need to work out what to do with the validation code in that case.

I've tried to keep the request to minimize the number of active keys by allowing KEYS_READY to be deferred by up to 3 PTO. That seems to work well.

Closes #2267, #1951. Prerequisite for fixing #2309.
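The DTLS-style epoch alternative mentioned above could be numbered roughly as follows. This is only a sketch of the numbering scheme the comment describes; the names and helper are made up for illustration.

```python
from enum import IntEnum

# Hypothetical epoch numbering copied from the DTLS-style scheme
# described above: 0 = Initial, 1 = 0-RTT, 2 = Handshake, 3+ = 1-RTT,
# with each key update bumping the 1-RTT epoch by one.

class Epoch(IntEnum):
    INITIAL = 0
    ZERO_RTT = 1
    HANDSHAKE = 2
    ONE_RTT = 3     # first 1-RTT epoch; key updates go to 4, 5, ...

def epoch_after_key_updates(n_updates: int) -> int:
    """Epoch in use after n_updates 1-RTT key updates."""
    return int(Epoch.ONE_RTT) + n_updates
```

This makes concrete why the comment calls the alternative "complicated": every key update mints a new epoch, and the validation code would have to track which epochs are still acceptable.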
Updated the text to reference the output of PR #2673 as agreed during the London interim.
@mnot, @larseggert, this one looks good to go.
Quick question: We're forbidding active migration during the handshake here, but what about a passive NAT rebinding that may happen? We obviously can't forbid that, so we must either handle it or let the connection die quickly?
That's an issue, but a different one.
What @martinthomson said. I would personally treat NAT rebinding during the handshake as an error, on the assumption that it will be extremely rare and is caused by a broken NAT. We generally assume that NATs keep a UDP mapping active for at least 30 seconds -- the BEHAVE recommendation was 2 minutes. Having a mapping change in fewer than 2 RTTs means the NAT is broken.
Closing the connection introduces yet another way an attacker can interrupt the handshake. It may be the preferable option, but it's worth noting.
@ianswett You don't have to close the connection to protect against NAT rebinding. Ignoring packets from the wrong addresses during the handshake is simpler and more robust. The picoquic implementation does something like that: accept the content of a Handshake packet (not Initial) if it decrypts correctly, but ignore any address change until the handshake is confirmed.
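A minimal sketch of that policy, using hypothetical names (this is not picoquic's actual API, just an illustration of the rule described above):

```python
# Sketch of the "ignore address changes until handshake confirmed" policy.
# The helper name and string addresses are hypothetical.

def effective_peer_addr(current_addr: str,
                        packet_addr: str,
                        handshake_confirmed: bool) -> str:
    """Peer address to record after receiving a valid packet.

    Before handshake confirmation, an apparent address change is ignored
    (the packet's content is still processed if it decrypts correctly);
    after confirmation, a change can trigger normal migration handling.
    """
    if packet_addr != current_addr and not handshake_confirmed:
        return current_addr   # ignore suspected spoof or broken NAT
    return packet_addr
```

The design choice is that the suspicious address is dropped while the decrypted payload is kept, so a genuine but rare mid-handshake rebinding still makes forward progress once the handshake completes.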
Good point, that SGTM.
Waiting on PR #2370 to close this issue.
In migration interop tests, the Picoquic client was starting to simulate NAT rebinding just after obtaining the 1-RTT read and write keys. That turns out to be very messy, because at that point there is no proof that the server has received the client Finished message, let alone acknowledged it. As far as tests are concerned, there is an easy fix: only start the migration tests after the client has received from the server an ACK of one of its 1-RTT packets. That's the moment when the client knows for sure that the handshake has succeeded.
Migration before the handshake is complete is messy because the client will try to retransmit handshake messages that have not been acknowledged, and faces a dilemma: should these retransmissions be aimed at the original DCID used during the handshake, or at one of the new CIDs selected during the migration? What if the continuity of the new path has not yet been verified? What if the continuity test was lost?
Do we have an actual need to support migration before the handshake is fully complete? In practice, the handshake packets create a NAT mapping, and it is rather unlikely that this mapping will just disappear within the few RTTs required to complete the handshake. In fact, it is possible, maybe even likely, that packets received from an alternative address during the handshake come from attackers rather than the actual client.
I think implementations would be simpler and more robust if the spec stated that clients should not initiate "voluntary" migration before the handshake is fully complete. That means a 1-RTT ACK received, or an ACK of the client's last Handshake packet, whichever comes first. Similarly, servers should not try to migrate connections before receiving at least one 1-RTT packet from the client via the same address and port as the Initial messages.
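The client-side gate proposed above can be sketched as a single predicate. The flag names are hypothetical; a real stack would feed them from its ACK processing.

```python
# Sketch of the proposed client-side rule: initiate "voluntary" migration
# only after a 1-RTT ACK is received, or after the client's last Handshake
# packet has been acknowledged, whichever comes first. Names are made up.

def may_initiate_migration(received_1rtt_ack: bool,
                           last_handshake_packet_acked: bool) -> bool:
    return received_1rtt_ack or last_handshake_packet_acked
```

Either condition gives the client proof that the server has processed its Finished, which is exactly the guarantee the earlier comments note is otherwise missing on an idle connection.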