Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unrecoverable loss pattern leads to deadlock #2863

Closed
marten-seemann opened this issue Jun 30, 2019 · 36 comments · Fixed by #3145
Closed

unrecoverable loss pattern leads to deadlock #2863

marten-seemann opened this issue Jun 30, 2019 · 36 comments · Fixed by #3145
Assignees
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@marten-seemann
Copy link
Contributor

There's a loss pattern that QUIC cannot recover from:
Assume client and server perform a handshake up to the point just after the server received the ClientFinished message. No packet loss happens so far. Both peers have now completed (but not yet confirmed) the handshake.

Right after sending the ClientFinished, the client sends a 1-RTT packet (e.g. containing application data). This packet is lost. The server sends an acknowledgement for the ClientFinished (in a Handshake packet), and this packet is lost as well. Furthermore, the server also sends a 1-RTT packet (e.g. containing a session ticket). This packet is successfully received and acknowledged by the client.

At this point the endpoint are in the following state: Since it received an acknowledgement for a packet sent with 1-RTT keys, the server now confirmed the handshake, and drops the Handshake keys. On the other hand, the client is still waiting for an acknowledgement for the ClientFinished, and retransmits it. It will however never receive an acknowledgement, since the server dropped the keys already.

@nibanks
Copy link
Member

nibanks commented Jun 30, 2019 via email

@martinthomson
Copy link
Member

A server can send a 1-RTT packet before even completing the handshake, so a client receiving one doesn't mean anything.

In this scenario, the client will continue retransmitting the last cryptographic handshake flight. That sounds terrible. But if it ever has cause to send 1-RTT data (as it does in Marten's example, which I'm not sure is relevant), then it will retransmit that data on PTO. When that is eventually acknowledged, that will confirm the handshake.

The more troublesome scenario is where the client sends nothing, only acknowledgments. In that case the client will have a complete handshake that cannot be confirmed.

Would it be enough to say that "unless an endpoint sends some ack-eliciting data in 1-RTT packets, the handshake might remain unconfirmed indefinitely" and leave it at that?

@kazuho
Copy link
Member

kazuho commented Jul 1, 2019

@martinthomson

In this scenario, the client will continue retransmitting the last cryptographic handshake flight. That sounds terrible. But if it ever has cause to send 1-RTT data (as it does in Marten's example, which I'm not sure is relevant), then it will retransmit that data on PTO. When that is eventually acknowledged, that will confirm the handshake.

Agreed. IIRC, the assumption in the design team was that the client would have something to send.

Would it be enough to say that "unless an endpoint sends some ack-eliciting data in 1-RTT packets, the handshake might remain unconfirmed indefinitely" and leave it at that?

+1. We may hint that "a client MAY send a PING packet to confirm the handshake."

@marten-seemann
Copy link
Contributor Author

But if it ever has cause to send 1-RTT data (as it does in Marten's example, which I'm not sure is relevant), then it will retransmit that data on PTO.

@martinthomson Unfortunately, that wouldn't work. As long as the client has outstanding crypto data, it will only retransmit the crypto packets, see https://quicwg.org/base-drafts/draft-ietf-quic-recovery.html#rfc.appendix.A.9.

+1. We may hint that "a client MAY send a PING packet to confirm the handshake."

@kazuho Agreed, although I'd prefer a SHOULD here.

@kazuho
Copy link
Member

kazuho commented Jul 1, 2019

FWIW, what I had in mind was something like "once the handshake is complete, endpoints SHOULD send ack-eliciting packets to confirm the handshake. If there is no application data to be sent, such packets can contain PING frames."

@mnot mnot added this to Triage in Late Stage Processing Jul 1, 2019
@mnot mnot added the design An issue that affects the design of the protocol; resolution requires consensus. label Jul 1, 2019
@mnot mnot moved this from Triage to Design Issues in Late Stage Processing Jul 1, 2019
@mikkelfj
Copy link
Contributor

mikkelfj commented Jul 1, 2019

Would it be enough to say that "unless an endpoint sends some ack-eliciting data in 1-RTT packets, the handshake might remain unconfirmed indefinitely" and leave it at that?

No I think the state needs to progress. Resources might be tied up and timeouts might be in place to weed out bad handshakes before a longer or indefinite idle timeout.

I would say the client MUST send at least one ack-eliciting 1-RTT packet once it is able to do so, and within, say, 1 PTO.

martinthomson added a commit that referenced this issue Jul 9, 2019
This leads to indefinite Handshake messages from one side, as Handshake
packets can never be acknowledged.  As discussed, sending anything that
elicits an ACK should fix this.  But making a connection where you sit
entirely idle leads to a weird corner case.

Closes #2863.
@tatsuhiro-t
Copy link
Contributor

tatsuhiro-t commented Jul 14, 2019

If I understand the issue correctly, the problem is that server drops handshake key too early before client gets ack for its last handshake flight. Server can ack any client Short packet without client last Handshake flight. That means that sending any ack-eliciting 1RTT packet from client does not fix this issue. Coalescing missing Handshake and Short packet would suffer MTU overflow as disscused in PR.
I think server has to wait for client's confirmation of handshake before confirming handshake. It might be good idea to reconsider explicit signal again?

@DavidSchinazi
Copy link
Contributor

From the Discarding Handshake Keys Section:

Most application protocols will send data after the handshake, resulting in acknowledgements that allow both endpoints to discard their handshake keys promptly. Endpoints that do not have reason to send immediately after completing the handshake MAY send ack-eliciting frames, such as PING, which will cause the handshake to be confirmed when they are acknowledged.

We could change this MAY to a SHOULD and it'll resolve this issue.

@ekr
Copy link
Collaborator

ekr commented Jul 21, 2019

I'm sorry, but I'm having trouble following the initial example.

You said that the client sent a 1-RTT packet and it got lost. In this case, it will continue to retransmit it and will eventually get an ACK, at which point the handshake will be confirmed and it will drop the keys and stop retransmitting ClientFinished.

And as David indicates above, if the client is not sending at all, then it can just send ack-eliciting frames.

@marten-seemann
Copy link
Contributor Author

marten-seemann commented Jul 21, 2019

We could change this MAY to a SHOULD and it'll resolve this issue.

Unfortunately, it doesn't. As I described in the initial example, and in #2863 (comment), the client won't retransmit 1-RTT data until the handshake is completed.

@ekr
Copy link
Collaborator

ekr commented Jul 21, 2019

@marten-seemann well in that case, the source of the problem is the retransmission algorithm, not this. Why does it make sense to not retransmit 1-RTT data when there is an outstanding CFIN?

Just consider the simple case where the client's second flight (CFIN + some data) is lost. When the PTO expires, you should resend all of that.

@marten-seemann
Copy link
Contributor Author

@ekr The client could have sent an arbitrary amount of 1-RTT data, so you wouldn't want to retransmit all of this. As discussed on the PR (#2881), this can have some weird interaction with the congestion controller as well.

@ekr
Copy link
Collaborator

ekr commented Jul 21, 2019

Sure, you would, however, want to retransmit some of it

@ianswett
Copy link
Contributor

Technically, 1RTT packets can be sent on the PTO when there is unacked Handshake data, but you're correct that if there are two MSS or more of Handshake data, the client would repeatedly retransmit that data instead of sending any 1RTT packets as written today.

I think MT's PR is heading in the right direction.

@marten-seemann do you have an alternate suggestion of how to deal with this situation?

@larseggert
Copy link
Member

Discussed in Cupertino. With the change in #3066 and #3093, we believe this deadlock issue is addressed. (also, @kazuho has a pending edit).

@kazuho
Copy link
Member

kazuho commented Oct 16, 2019

@larseggert My point is #3093 (comment).

martinthomson added a commit that referenced this issue Oct 17, 2019
After many iterations of solutions to this problem, we have decided to
first forbid discard.  We would like to have a concrete discard point,
but we couldn't agree on one that was provably safe.

If we can come up with a mechanism for discarding earlier, or a signal
that comes earlier than the end of the connection, we will first
carefully look at it, but we might accept further changes.

Closes #2863.
@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Oct 17, 2019
@kazuho
Copy link
Member

kazuho commented Oct 18, 2019

@RyanatGoogle

it sounds like you are recommending something like the previous RETIRE_KEYS proposal?

That might be the case (honestly I do not remember the details of the RETIRES_KEYS proposal), though I think I'm inclined to suggesting using a continuous signal (i.e., a Key Update or burning an unused header bit). The reasons being i) using a continuous signal allows us to abandon the Handshake keys earlier ii) we do not need to define a new signal that needs to be retransmitted in the transport draft, or rely on a particular behavior defined in the recovery draft that is not generally considered as necessary for a recovery logic (i.e. PTO "MUST" trigger a PING when there is no data to send).

@martinduke
Copy link
Contributor

martinduke commented Oct 19, 2019 via email

@ianswett
Copy link
Contributor

In response to @kazuho "when the client intentionally migrates to a new path while still having unacknowledged handshake data, how does it choose the new client CID when sending the Handshake packet?"

I'm arguing the client should never have unacknowledged Handshake data when it migrates, because it waits for handshake confirmation to migrate and at that point, all Handshake data should be discarded.

There is a possibility the server does not believe the handshake is confirmed but the client does. In that case, if the client didn't keep listening on the old path when it migrated AND the server didn't send anything in 1-RTT, there would be a problem. But don't we require both client and server to validate the path in this case, which would mean both would achieve handshake confirmation on the new path, if they had not done so previously?

@kazuho
Copy link
Member

kazuho commented Oct 19, 2019

@ianswett

There is a possibility the server does not believe the handshake is confirmed but the client does. In that case, if the client didn't keep listening on the old path when it migrated AND the server didn't send anything in 1-RTT, there would be a problem.

Yes. This is the case I am pointing out.

But don't we require both client and server to validate the path in this case, which would mean both would achieve handshake confirmation on the new path, if they had not done so previously?

I am not sure if there is a guarantee that handshake confirmation would be obtained before the path is validated, because the path might become validated while all the packets carrying 1-RTT ACKs get dropped. And at that point, the endpoints might cease sending 1-RTT packets to each other.

@mnot mnot moved this from Design Issues to Consensus Emerging in Late Stage Processing Oct 21, 2019
@mnot mnot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Oct 22, 2019
@janaiyengar
Copy link
Contributor

janaiyengar commented Oct 22, 2019

What @kazuho points out is a problem. Keeping HS keys forever prevents the deadlock, but it does so by admitting that HS packets can be sent forever. This means after migration as well. That is something we don't consider in the spec.

@ianswett: I don't think you're right about this:

I'm arguing the client should never have unacknowledged Handshake data when it migrates, because it waits for handshake confirmation to migrate and at that point, all Handshake data should be discarded.

Handshake data is discarded when the handshake keys are dropped, not when confirmation is reached (see section 5.3 in recovery draft. Now with handshake keys forever, the client can keep retransmitting forever. And as @kazuho notes, the server can surely send keep retransmitting as well since if it hasn't reached confirmation yet.

There are a few ways to curb this... but I'm strongly inclined towards an explicit signal from both sides indicating handshake confirmation, after which migration can be initiated.

@ad-l
Copy link
Contributor

ad-l commented Oct 25, 2019

There is a much larger issue at stake here. The currently published proofs for the TLS 1.3 handshake do not allow the server to process 1-RTT packets until the client finished has been verified. It does not mean there is an attack, because of the defensive design of TLS 1.3 which implicitly authenticates the transcript in the transport secret derivation. However, this is still a weaker argument than checking the finish, particularly with respect to downgrade attacks. If a server supports a transport ciphersuite with broken integrity (e.g. someone discovers a way to forge CCM8 tags), then clients of this server that support strong ciphersuites may still be attacked.

There is also a clear risk with client authentication. If a server is configured to require a client certificate, but QUIC gives to the application 1-RTT packets before the client finished flight has been received, the application may incorrectly believe that this data has passed the client authentication check.

A reliable way to prevent such issues is to mandate that every 1-RTT packet sent by the client before the client finished frame is acknowledged by the server must include in the same datagram a copy of the finished frame (in the form of a separate handshake packet, before the short 1-RTT packet). The overhead is limited because finished messages are short, so even with the overhead of the long packet headers and AEAD tag, less than 100 bytes are used in the datagram (leaving the rest for the 1-RTT packet). If this is done, it is possible to enforce that a server must have checked the client finished before processing 1-RTT packets from the client. Adding some separate signal for handshake confirmation is a terrible idea - in TLS, receiving the peer's finished message IS the the explicit handshake confirmation signal (which is why CCS is removed in 1.3). Restoring a transport-level signal is a clear step backwards and must be avoided.

@DavidSchinazi
Copy link
Contributor

Hi @ad-l, since you're describing a "much larger issue" that I agree should be discussed, could you please open a separate issue? This issue is about a specific deadlock that can happen during the handshake, not about the security properties of the handshake.

@mnot mnot removed the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Oct 31, 2019
@mnot mnot moved this from Consensus Call issued to Design Issues in Late Stage Processing Oct 31, 2019
Late Stage Processing automation moved this from Design Issues to Text Incorporated Nov 5, 2019
@janaiyengar
Copy link
Contributor

This was closed erroneously... apparently we're not quite fully zen with github's AI yet.

@janaiyengar janaiyengar reopened this Nov 6, 2019
Late Stage Processing automation moved this from Text Incorporated to Triage Nov 6, 2019
@mnot mnot moved this from Triage to Design Issues in Late Stage Processing Nov 6, 2019
@mnot
Copy link
Member

mnot commented Nov 19, 2019

Discussed in Singapore; CfC #3145.

@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Dec 3, 2019
@mnot mnot moved this from Design Issues to Consensus Emerging in Late Stage Processing Dec 9, 2019
@mnot mnot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Jan 6, 2020
@mnot mnot added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Jan 17, 2020
@mnot mnot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Jan 17, 2020
Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Jan 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
Late Stage Processing
  
Issue Handled