Fixed handshake key deadlock issue #3093
I don't think that this fixes the problem. What we had discussed was requiring that some ack-eliciting 1-RTT data is sent after handshake completion. With that, the existing definition of "confirmed" is adequate. |
Note: at the moment, recovery states "When a PTO timer expires, a sender MUST send at least one ack-eliciting packet as a probe, unless there is no data available to send." We need to remove the "unless" clause. |
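To illustrate what dropping the "unless" clause would mean in practice, here is a minimal sketch of a PTO handler that always emits an ack-eliciting probe. The `Sender` class, its `pending_data` queue, and the string frame names are hypothetical and not taken from any implementation; the only assumption about QUIC itself is that PING is an ack-eliciting frame that carries no data.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    # Hypothetical queue of frames waiting to be (re)transmitted.
    pending_data: list = field(default_factory=list)
    # Record of every probe packet sent, as a list of frame lists.
    sent: list = field(default_factory=list)

    def on_pto_expired(self) -> list:
        """Send an ack-eliciting probe whenever the PTO timer fires."""
        if self.pending_data:
            # Prefer real data when there is some available.
            frames = [self.pending_data.pop(0)]
        else:
            # With the "unless" removed, an idle sender still probes.
            frames = ["PING"]
        self.sent.append(frames)
        return frames
```

With this shape, an endpoint that has nothing to send still elicits an acknowledgement from its peer, which is the property the handshake-confirmation discussion relies on.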
@martinthomson I believe discussion in Cupertino resolved your concern. I believe @marten-seemann's chart has been addressed by discussion in Cupertino, pending further analysis that shows a different problem. |
Belay that last comment from me. The friendly amendment here that we need is to the text on discarding Handshake keys that says "no sooner than handshake confirmed" rather than requiring Handshake keys to be discarded at that point. |
@marten-seemann has a fairly silly corner case that breaks this:
This is fixable by disallowing ACK+PADDING in response to a non-ack-eliciting packet. At the moment, section 13.2.1 of quic-transport does not explicitly forbid this. IMO this is an oversight and we should change 13.2.1. After a brief offline chat with some people, I'll add that change to 13.2.1 to this PR if people agree with this conclusion. |
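One way to read the proposed change to 13.2.1 is as a receiver-side gate: an ACK-only packet (ACK, optionally with PADDING) may be sent only if an ack-eliciting packet has arrived since the last ACK went out. The sketch below assumes that reading; `AckScheduler` and its method names are hypothetical.

```python
class AckScheduler:
    """Hypothetical helper deciding whether an ACK-only packet may be sent."""

    def __init__(self) -> None:
        # True once an ack-eliciting packet has arrived since the last ACK.
        self.ack_eliciting_pending = False

    def on_packet_received(self, ack_eliciting: bool) -> None:
        # Non-ack-eliciting packets (for example ACK+PADDING) never set the flag.
        if ack_eliciting:
            self.ack_eliciting_pending = True

    def may_send_ack_only(self) -> bool:
        # An ACK-only packet is allowed only in response to ack-eliciting data.
        return self.ack_eliciting_pending

    def on_ack_sent(self) -> None:
        self.ack_eliciting_pending = False
```

Under this rule an endpoint can still bundle an ACK frame with ack-eliciting frames it was going to send anyway; only the standalone ACK+PADDING reply to a non-ack-eliciting packet is suppressed.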
In my opinion, what this corner case shows is that option 4 is pretty brittle. Disallowing ACK+PADDING seems to fix this particular corner case, but it doesn't guarantee that there isn't some other corner case that will lead to a deadlock. |
ACK+PADDING packets aren't ack-eliciting, so the server shouldn't discard keys at step 4. |
@ianswett The server already received a PING in 1-RTT packets in step 2. |
@ianswett it has received an ack-eliciting packet in step 2 and a 1-RTT ack in 4. This is also fixable by changing the condition to "ack of an ack-eliciting packet" |
@martinduke The ACK sent in step 4 is an acknowledgement for an ack-eliciting packet. |
@marten-seemann No it isn't, it's an ack of an ack. |
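The disagreement above turns on whether a given 1-RTT ACK covers an ack-eliciting packet or only an earlier ACK. A minimal sketch of the stricter condition ("ack of an ack-eliciting packet") follows. The frame names are plain strings, and `OneRttSentTracker` is a hypothetical bookkeeping class. The classification assumes the draft definition of the time, where every frame other than ACK and PADDING is ack-eliciting; the published RFC 9000 also excludes CONNECTION_CLOSE.

```python
# Frames that do not elicit an acknowledgement (draft-era definition, assumed here).
NON_ACK_ELICITING = {"ACK", "PADDING"}


def is_ack_eliciting(frames) -> bool:
    """A packet is ack-eliciting if it carries any frame outside the set above."""
    return any(frame not in NON_ACK_ELICITING for frame in frames)


class OneRttSentTracker:
    """Hypothetical record of sent 1-RTT packets, keyed by packet number."""

    def __init__(self) -> None:
        self.sent = {}  # packet number -> was the packet ack-eliciting?
        self.next_pn = 0

    def on_sent(self, frames) -> int:
        pn = self.next_pn
        self.next_pn += 1
        self.sent[pn] = is_ack_eliciting(frames)
        return pn

    def on_ack_received(self, acked_pns) -> bool:
        """Return True only if the ACK newly covers an ack-eliciting packet."""
        covered = False
        for pn in acked_pns:
            # pop() drops the entry so the same packet is not counted twice.
            if self.sent.pop(pn, False):
                covered = True
        return covered
```

In this sketch an ACK that only acknowledges an ACK+PADDING packet returns `False`, so it would not count toward the discard condition.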
As we concluded at the meeting, we're going to keep Handshake keys indefinitely. FWIW, I don't believe that it is possible to find a confirmation step that doesn't involve both peers actively driving the state machine. Otherwise, peers might not reach the same state within a reasonable time. This change still has problems (more below) and only adds delay to the confirmed state by making the requirements more complex. Here's an example of how this fix breaks. I'm sure that there are more: web sequence diagram code
|
Thanks for the diagram, and I'm happy to stick with the approach of keeping the keys forever. If we change nothing about the existing text, what you describe occurs. But as I suggested elsewhere, if we are going to delay dropping the Handshake keys (somewhat or indefinitely), I think we should update the text of transport or recovery to be clear that you can stop retransmitting any outstanding Handshake data once you've received a 1-RTT ACK. |
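The suggestion above separates two things that the original text coupled: keeping the Handshake keys and continuing to retransmit Handshake data. A rough sketch of that separation, with a hypothetical `HandshakeSpace` class and string frame names, might look like this:

```python
class HandshakeSpace:
    """Hypothetical state for the Handshake packet number space."""

    def __init__(self) -> None:
        self.outstanding = []       # unacknowledged Handshake frames
        self.keys_available = True  # keys are retained in this sketch

    def on_one_rtt_ack_received(self) -> None:
        # Stop retransmitting Handshake data, but do not discard the keys.
        self.outstanding.clear()

    def frames_to_retransmit(self) -> list:
        return list(self.outstanding)
```

After `on_one_rtt_ack_received()` the loss-recovery machinery has nothing left to probe for in this space, while incoming Handshake packets can still be decrypted.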
This diagram is incorrect. There is an ack from the client of 0.5-RTT data,
but the ack-eliciting packet is not present in the diagram. If it were, the
client would have discarded its handshake keys and there is no deadlock.
|
@martinduke, the diagram is correct. There is no ack-eliciting packet in 0.5-RTT. The ACK of the 0.5-RTT data is bundled with the other ack-eliciting frames in the packet, that's all. |
@martinthomson Ah, I see that now. Yeah, I think we need an explicit signal, which I would prefer over keeping the keys forever. |
I don't see the problem with an explicit signal. The handshake is uniquely complex already and it helps isolate that complexity. You could even say that if you cannot drive a handshake to completion within a timeout period, the connection is broken. An explicit signal helps deal with cases where one endpoint otherwise has nothing to send. Now it has. |
Along with some other PRs, I believe this fixes #2863.
Fundamentally, there are four ways to solve the problem of one endpoint throwing away the handshake keys while the other has no 1-RTT data.
I believe that the discard keys group might have a plurality for option 4. The added condition makes sure that the peer's PTO machinery has started, and it will have little effect on most applications, including HTTP/3.
This PR is a concrete statement of the fourth option.