unrecoverable loss pattern leads to deadlock #2863

marten-seemann · 2019-06-30T05:14:15Z

There's a loss pattern that QUIC cannot recover from:
Assume client and server perform a handshake up to the point just after the server received the ClientFinished message. No packet loss happens so far. Both peers have now completed (but not yet confirmed) the handshake.

Right after sending the ClientFinished, the client sends a 1-RTT packet (e.g. containing application data). This packet is lost. The server sends an acknowledgement for the ClientFinished (in a Handshake packet), and this packet is lost as well. Furthermore, the server also sends a 1-RTT packet (e.g. containing a session ticket). This packet is successfully received and acknowledged by the client.

At this point the endpoint are in the following state: Since it received an acknowledgement for a packet sent with 1-RTT keys, the server now confirmed the handshake, and drops the Handshake keys. On the other hand, the client is still waiting for an acknowledgement for the ClientFinished, and retransmits it. It will however never receive an acknowledgement, since the server dropped the keys already.

nibanks · 2019-06-30T14:33:15Z

Sounds like receiving a 1-RTT packet on the client side might need to be another way to confirm the handshake.

martinthomson · 2019-07-01T00:03:30Z

A server can send a 1-RTT packet before even completing the handshake, so a client receiving one doesn't mean anything.

In this scenario, the client will continue retransmitting the last cryptographic handshake flight. That sounds terrible. But if it ever has cause to send 1-RTT data (as it does in Marten's example, which I'm not sure is relevant), then it will retransmit that data on PTO. When that is eventually acknowledged, that will confirm the handshake.

The more troublesome scenario is where the client sends nothing, only acknowledgments. In that case the client will have a complete handshake that cannot be confirmed.

Would it be enough to say that "unless an endpoint sends some ack-eliciting data in 1-RTT packets, the handshake might remain unconfirmed indefinitely" and leave it at that?

kazuho · 2019-07-01T00:17:59Z

@martinthomson

In this scenario, the client will continue retransmitting the last cryptographic handshake flight. That sounds terrible. But if it ever has cause to send 1-RTT data (as it does in Marten's example, which I'm not sure is relevant), then it will retransmit that data on PTO. When that is eventually acknowledged, that will confirm the handshake.

Agreed. IIRC, the assumption in the design team was that the client would have something to send.

Would it be enough to say that "unless an endpoint sends some ack-eliciting data in 1-RTT packets, the handshake might remain unconfirmed indefinitely" and leave it at that?

+1. We may hint that "a client MAY send a PING packet to confirm the handshake."

marten-seemann · 2019-07-01T00:29:23Z

But if it ever has cause to send 1-RTT data (as it does in Marten's example, which I'm not sure is relevant), then it will retransmit that data on PTO.

@martinthomson Unfortunately, that wouldn't work. As long as the client has outstanding crypto data, it will only retransmit the crypto packets, see https://quicwg.org/base-drafts/draft-ietf-quic-recovery.html#rfc.appendix.A.9.

+1. We may hint that "a client MAY send a PING packet to confirm the handshake."

@kazuho Agreed, although I'd prefer a SHOULD here.

kazuho · 2019-07-01T01:19:25Z

FWIW, what I had in mind was something like "once the handshake is complete, endpoints SHOULD send ack-eliciting packets to confirm the handshake. If there is no application data to be sent, such packets can contain PING frames."

mikkelfj · 2019-07-01T04:22:06Z

Would it be enough to say that "unless an endpoint sends some ack-eliciting data in 1-RTT packets, the handshake might remain unconfirmed indefinitely" and leave it at that?

No I think the state needs to progress. Resources might be tied up and timeouts might be in place to weed out bad handshakes before a longer or indefinite idle timeout.

I would say the client MUST send at least one ack-eliciting 1-RTT packet once it is able to do so, and within, say, 1 PTO.

This leads to indefinite Handshake messages from one side, as Handshake packets can never be acknowledged. As discussed, sending anything that elicits an ACK should fix this. But making a connection where you sit entirely idle leads to a weird corner case. Closes #2863.

tatsuhiro-t · 2019-07-14T06:19:11Z

If I understand the issue correctly, the problem is that server drops handshake key too early before client gets ack for its last handshake flight. Server can ack any client Short packet without client last Handshake flight. That means that sending any ack-eliciting 1RTT packet from client does not fix this issue. Coalescing missing Handshake and Short packet would suffer MTU overflow as disscused in PR.
I think server has to wait for client's confirmation of handshake before confirming handshake. It might be good idea to reconsider explicit signal again?

DavidSchinazi · 2019-07-21T15:54:31Z

From the Discarding Handshake Keys Section:

Most application protocols will send data after the handshake, resulting in acknowledgements that allow both endpoints to discard their handshake keys promptly. Endpoints that do not have reason to send immediately after completing the handshake MAY send ack-eliciting frames, such as PING, which will cause the handshake to be confirmed when they are acknowledged.

We could change this MAY to a SHOULD and it'll resolve this issue.

ekr · 2019-07-21T15:56:09Z

I'm sorry, but I'm having trouble following the initial example.

You said that the client sent a 1-RTT packet and it got lost. In this case, it will continue to retransmit it and will eventually get an ACK, at which point the handshake will be confirmed and it will drop the keys and stop retransmitting ClientFinished.

And as David indicates above, if the client is not sending at all, then it can just send ack-eliciting frames.

marten-seemann · 2019-07-21T15:56:39Z

We could change this MAY to a SHOULD and it'll resolve this issue.

Unfortunately, it doesn't. As I described in the initial example, and in #2863 (comment), the client won't retransmit 1-RTT data until the handshake is completed.

ekr · 2019-07-21T17:22:04Z

@marten-seemann well in that case, the source of the problem is the retransmission algorithm, not this. Why does it make sense to not retransmit 1-RTT data when there is an outstanding CFIN?

Just consider the simple case where the client's second flight (CFIN + some data) is lost. When the PTO expires, you should resend all of that.

marten-seemann · 2019-07-21T17:48:07Z

@ekr The client could have sent an arbitrary amount of 1-RTT data, so you wouldn't want to retransmit all of this. As discussed on the PR (#2881), this can have some weird interaction with the congestion controller as well.

ekr · 2019-07-21T18:16:25Z

Sure, you would, however, want to retransmit some of it

ianswett · 2019-07-21T19:01:43Z

Technically, 1RTT packets can be sent on the PTO when there is unacked Handshake data, but you're correct that if there are two MSS or more of Handshake data, the client would repeatedly retransmit that data instead of sending any 1RTT packets as written today.

I think MT's PR is heading in the right direction.

@marten-seemann do you have an alternate suggestion of how to deal with this situation?

larseggert · 2019-10-16T19:10:02Z

Discussed in Cupertino. With the change in #3066 and #3093, we believe this deadlock issue is addressed. (also, @kazuho has a pending edit).

kazuho · 2019-10-16T19:11:32Z

@larseggert My point is #3093 (comment).

After many iterations of solutions to this problem, we have decided to first forbid discard. We would like to have a concrete discard point, but we couldn't agree on one that was provably safe. If we can come up with a mechanism for discarding earlier, or a signal that comes earlier than the end of the connection, we will first carefully look at it, but we might accept further changes. Closes #2863.

kazuho · 2019-10-18T23:37:42Z

@RyanatGoogle

it sounds like you are recommending something like the previous RETIRE_KEYS proposal?

That might be the case (honestly I do not remember the details of the RETIRES_KEYS proposal), though I think I'm inclined to suggesting using a continuous signal (i.e., a Key Update or burning an unused header bit). The reasons being i) using a continuous signal allows us to abandon the Handshake keys earlier ii) we do not need to define a new signal that needs to be retransmitted in the transport draft, or rely on a particular behavior defined in the recovery draft that is not generally considered as necessary for a recovery logic (i.e. PTO "MUST" trigger a PING when there is no data to send).

martinduke · 2019-10-19T00:16:47Z

I remember thinking RETIRE_KEYS was extraneous given all the implicit signals. But it seems that screwy-but-legal behavior seems to make those implicit signals risky. So let's stop being cute and just use the signal.

…

On Fri, Oct 18, 2019, 4:37 PM Kazuho Oku ***@***.***> wrote: @RyanatGoogle <https://github.com/RyanAtGoogle> it sounds like you are recommending something like the previous RETIRE_KEYS proposal? That might be the case (honestly I do not remember the details of the RETIRES_KEYS proposal), though I think I'm inclined to suggesting using a continuous signal (i.e., a Key Update or burning an unused header bit). The reasons being i) using a continuous signal allows us to abandon the Handshake keys earlier ii) we do not need to define a new signal that needs to be retransmitted in the transport draft, or rely on a particular behavior defined in the recovery draft that is not generally considered as necessary for a recovery logic (i.e. PTO "MUST" trigger a PING when there is no data to send). — You are receiving this because you were assigned. Reply to this email directly, view it on GitHub <#2863?email_source=notifications&email_token=AF2EYEJOXBO3V7VFTTX5MJDQPJCFDA5CNFSM4H4LZFRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBWONZQ#issuecomment-544007910>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AF2EYEOL7XJAQAYKIY6MYLLQPJCFDANCNFSM4H4LZFRA> .

ianswett · 2019-10-19T03:08:12Z

In response to @kazuho "when the client intentionally migrates to a new path while still having unacknowledged handshake data, how does it choose the new client CID when sending the Handshake packet?"

I'm arguing the client should never have unacknowledged Handshake data when it migrates, because it waits for handshake confirmation to migrate and at that point, all Handshake data should be discarded.

There is a possibility the server does not believe the handshake is confirmed but the client does. In that case, if the client didn't keep listening on the old path when it migrated AND the server didn't send anything in 1-RTT, there would be a problem. But don't we require both client and server to validate the path in this case, which would mean both would achieve handshake confirmation on the new path, if they had not done so previously?

kazuho · 2019-10-19T12:34:51Z

@ianswett

There is a possibility the server does not believe the handshake is confirmed but the client does. In that case, if the client didn't keep listening on the old path when it migrated AND the server didn't send anything in 1-RTT, there would be a problem.

Yes. This is the case I am pointing out.

But don't we require both client and server to validate the path in this case, which would mean both would achieve handshake confirmation on the new path, if they had not done so previously?

I am not sure if there is a guarantee that handshake confirmation would be obtained before the path is validated, because the path might become validated while all the packets carrying 1-RTT ACKs get dropped. And at that point, the endpoints might cease sending 1-RTT packets to each other.

janaiyengar · 2019-10-22T08:14:25Z

What @kazuho points out is a problem. Keeping HS keys forever prevents the deadlock, but it does so by admitting that HS packets can be sent forever. This means after migration as well. That is something we don't consider in the spec.

@ianswett: I don't think you're right about this:

I'm arguing the client should never have unacknowledged Handshake data when it migrates, because it waits for handshake confirmation to migrate and at that point, all Handshake data should be discarded.

Handshake data is discarded when the handshake keys are dropped, not when confirmation is reached (see section 5.3 in recovery draft. Now with handshake keys forever, the client can keep retransmitting forever. And as @kazuho notes, the server can surely send keep retransmitting as well since if it hasn't reached confirmation yet.

There are a few ways to curb this... but I'm strongly inclined towards an explicit signal from both sides indicating handshake confirmation, after which migration can be initiated.

ad-l · 2019-10-25T13:43:15Z

There is a much larger issue at stake here. The currently published proofs for the TLS 1.3 handshake do not allow the server to process 1-RTT packets until the client finished has been verified. It does not mean there is an attack, because of the defensive design of TLS 1.3 which implicitly authenticates the transcript in the transport secret derivation. However, this is still a weaker argument than checking the finish, particularly with respect to downgrade attacks. If a server supports a transport ciphersuite with broken integrity (e.g. someone discovers a way to forge CCM8 tags), then clients of this server that support strong ciphersuites may still be attacked.

There is also a clear risk with client authentication. If a server is configured to require a client certificate, but QUIC gives to the application 1-RTT packets before the client finished flight has been received, the application may incorrectly believe that this data has passed the client authentication check.

A reliable way to prevent such issues is to mandate that every 1-RTT packet sent by the client before the client finished frame is acknowledged by the server must include in the same datagram a copy of the finished frame (in the form of a separate handshake packet, before the short 1-RTT packet). The overhead is limited because finished messages are short, so even with the overhead of the long packet headers and AEAD tag, less than 100 bytes are used in the datagram (leaving the rest for the 1-RTT packet). If this is done, it is possible to enforce that a server must have checked the client finished before processing 1-RTT packets from the client. Adding some separate signal for handshake confirmation is a terrible idea - in TLS, receiving the peer's finished message IS the the explicit handshake confirmation signal (which is why CCS is removed in 1.3). Restoring a transport-level signal is a clear step backwards and must be avoided.

DavidSchinazi · 2019-10-25T18:35:23Z

Hi @ad-l, since you're describing a "much larger issue" that I agree should be discussed, could you please open a separate issue? This issue is about a specific deadlock that can happen during the handshake, not about the security properties of the handshake.

janaiyengar · 2019-11-06T00:21:26Z

This was closed erroneously... apparently we're not quite fully zen with github's AI yet.

mnot · 2019-11-19T02:46:10Z

Discussed in Singapore; CfC #3145.

martinthomson added the -transport label Jun 30, 2019

mnot added this to Triage in Late Stage Processing Jul 1, 2019

mnot added the design An issue that affects the design of the protocol; resolution requires consensus. label Jul 1, 2019

mnot moved this from Triage to Design Issues in Late Stage Processing Jul 1, 2019

martinthomson mentioned this issue Jul 9, 2019

Explain asymmetric confirmation condition #2881

Closed

ianswett mentioned this issue Sep 24, 2019

Change PTO to be per packet number space #3066

Merged

martinduke mentioned this issue Oct 15, 2019

Fixed handshake key deadlock issue #3093

Closed

larseggert assigned ianswett and martinduke Oct 16, 2019

martinthomson mentioned this issue Oct 17, 2019

Forbid Handshake key discarding #3121

Closed

martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Oct 17, 2019

mnot moved this from Design Issues to Consensus Emerging in Late Stage Processing Oct 21, 2019

mnot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Oct 22, 2019

kazuho mentioned this issue Oct 24, 2019

use key update for confirming the handshake #3141

Closed

marten-seemann mentioned this issue Oct 24, 2019

use a HANDSHAKE_DONE frame to drive the handshake to confirmation #3145

Merged

ad-l mentioned this issue Oct 28, 2019

Server should not accept 1-RTT traffic before handshake completion #3159

Closed

mnot removed the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Oct 31, 2019

mnot moved this from Consensus Call issued to Design Issues in Late Stage Processing Oct 31, 2019

janaiyengar closed this as completed in f7f22f9 Nov 5, 2019

Late Stage Processing automation moved this from Design Issues to Text Incorporated Nov 5, 2019

janaiyengar reopened this Nov 6, 2019

Late Stage Processing automation moved this from Text Incorporated to Triage Nov 6, 2019

mnot moved this from Triage to Design Issues in Late Stage Processing Nov 6, 2019

martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Dec 3, 2019

mnot moved this from Design Issues to Consensus Emerging in Late Stage Processing Dec 9, 2019

tatsuhiro-t mentioned this issue Dec 23, 2019

Detecting when handshake is confirmed ngtcp2/ngtcp2#204

Closed

mnot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Jan 6, 2020

mnot added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Jan 17, 2020

mnot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Jan 17, 2020

janaiyengar closed this as completed in #3145 Jan 17, 2020

Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Jan 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

unrecoverable loss pattern leads to deadlock #2863

unrecoverable loss pattern leads to deadlock #2863

marten-seemann commented Jun 30, 2019

nibanks commented Jun 30, 2019 via email •

edited by martinthomson

Loading

martinthomson commented Jul 1, 2019

kazuho commented Jul 1, 2019

marten-seemann commented Jul 1, 2019

kazuho commented Jul 1, 2019

mikkelfj commented Jul 1, 2019 •

edited

Loading

tatsuhiro-t commented Jul 14, 2019 •

edited

Loading

DavidSchinazi commented Jul 21, 2019

ekr commented Jul 21, 2019

marten-seemann commented Jul 21, 2019 •

edited

Loading

ekr commented Jul 21, 2019

marten-seemann commented Jul 21, 2019

ekr commented Jul 21, 2019

ianswett commented Jul 21, 2019

larseggert commented Oct 16, 2019

kazuho commented Oct 16, 2019

kazuho commented Oct 18, 2019

martinduke commented Oct 19, 2019 via email

ianswett commented Oct 19, 2019

kazuho commented Oct 19, 2019

janaiyengar commented Oct 22, 2019 •

edited

Loading

ad-l commented Oct 25, 2019

DavidSchinazi commented Oct 25, 2019

janaiyengar commented Nov 6, 2019

mnot commented Nov 19, 2019

unrecoverable loss pattern leads to deadlock #2863

unrecoverable loss pattern leads to deadlock #2863

Comments

marten-seemann commented Jun 30, 2019

nibanks commented Jun 30, 2019 via email • edited by martinthomson Loading

martinthomson commented Jul 1, 2019

kazuho commented Jul 1, 2019

marten-seemann commented Jul 1, 2019

kazuho commented Jul 1, 2019

mikkelfj commented Jul 1, 2019 • edited Loading

tatsuhiro-t commented Jul 14, 2019 • edited Loading

DavidSchinazi commented Jul 21, 2019

ekr commented Jul 21, 2019

marten-seemann commented Jul 21, 2019 • edited Loading

ekr commented Jul 21, 2019

marten-seemann commented Jul 21, 2019

ekr commented Jul 21, 2019

ianswett commented Jul 21, 2019

larseggert commented Oct 16, 2019

kazuho commented Oct 16, 2019

kazuho commented Oct 18, 2019

martinduke commented Oct 19, 2019 via email

ianswett commented Oct 19, 2019

kazuho commented Oct 19, 2019

janaiyengar commented Oct 22, 2019 • edited Loading

ad-l commented Oct 25, 2019

DavidSchinazi commented Oct 25, 2019

janaiyengar commented Nov 6, 2019

mnot commented Nov 19, 2019

nibanks commented Jun 30, 2019 via email •

edited by martinthomson

Loading

mikkelfj commented Jul 1, 2019 •

edited

Loading

tatsuhiro-t commented Jul 14, 2019 •

edited

Loading

marten-seemann commented Jul 21, 2019 •

edited

Loading

janaiyengar commented Oct 22, 2019 •

edited

Loading