Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explain asymmetric confirmation condition #2881

Closed
wants to merge 9 commits into from

Conversation

martinthomson
Copy link
Member

This leads to indefinite Handshake messages from one side, as Handshake
packets can never be acknowledged. As discussed, sending anything that
elicits an ACK should fix this. But making a connection where you sit
entirely idle leads to a weird corner case.

Closes #2863.

This leads to indefinite Handshake messages from one side, as Handshake
packets can never be acknowledged.  As discussed, sending anything that
elicits an ACK should fix this.  But making a connection where you sit
entirely idle leads to a weird corner case.

Closes #2863.
@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Jul 9, 2019
@marten-seemann
Copy link
Contributor

I don’t think this fixes the issue I raised. You don’t only have to send, but also to retransmit the 1-RTT packets. We have a bunch of text in the recovery document that tells us to do not do this.

@martinthomson
Copy link
Member Author

I considered that, and you refer to the fact that the cryptographic handshake packets are the ones that are sent when you have a loss detection timeout. And that these are the only packets that are sent. I worded this carefully to say that you send 1-RTT, ack-eliciting packets. Should I say "In addition to retransmitting crypto packets, ..." or something along those lines?

@marten-seemann
Copy link
Contributor

If that’s the way we’re going to fix this (I’m not sure if it’s the best way), we would need to edit the recovery document as well. That would be a pretty ugly special case in the retransmission algorithm we describe.

@martinthomson
Copy link
Member Author

Well, keep in mind that "recovery" for PING, which is the way you would approach this if you never had anything to send, depends on you generating a new PING with every transmission. So it's not that terrible.

For our implementation, we would simply crank the lever and spit out another packet when the PTO fires, which would produce a packet that coalesced the missing Handshake with a 1-RTT packet.

It only gets hairy when the lost Handshake packet overflows the MTU. Which, sadly, is possible. If we lost more than a PMTU, it gets a little hairy, because we will probably won't send any 1-RTT packets ever. I might be willing to let the connection time out in that case though.

@marten-seemann
Copy link
Contributor

Well, keep in mind that "recovery" for PING, which is the way you would approach this if you never had anything to send, depends on you generating a new PING with every transmission.

If I understand your point correctly, you would either need to change RetransmitUnackedCryptoData() into RetransmitUnackedCryptoDataAndAtLeastOne1RTTPacket() or RetransmitUnackedCryptoDataAndSend1RTTPing(). That sounds doable, but it gets complicated when you take the congestion controller into account: It could happen that you have enough cwnd to send the retransmissions for the Handshake packets, but not for the 1-RTT packet. What you really want is to send the retransmission for the 1-RTT packet as a probe packet (i.e. never blocked by congestion control).

I might be willing to let the connection time out in that case though.

I don't think timing out is acceptable in the case where just 2 packets are lost. In my opinion, the only case where a timeout is justifiable is if the connection is (effectively) blackholed.

You can see the root cause of #2863 as not a problem with loss recovery at all, but with our definition of "handshake confirmed": The server is dropping the keys too early. If the server kept the keys for longer, this loss pattern would harm the connection at all.

@martinthomson
Copy link
Member Author

Don't the Handshake packets consume cwnd also?

@marten-seemann
Copy link
Contributor

Don't the Handshake packets consume cwnd also?

@martinthomson That's exactly the problem. They consume cwnd, although they're not even lost (as in "dropped by the network"), just dropped as undecryptable by the peer. In this situation, freeing up this cwnd is only possible after completion of the handshake, when all loss recovery state for non-1-RTT packets is dropped.

@martinthomson
Copy link
Member Author

So whatever we do to get to confirmation helps. But until that point, we're headed to a completely unrecoverable state.

@marten-seemann
Copy link
Contributor

We would have to say that the 1-RTT packet has to be treated like a probe packet (i.e. not blocked by the congestion controller).

@martinthomson
Copy link
Member Author

Why? The retransmitted Handshake packets aren't and in this case these are the ones that will run the CWND down. The 1-RTT PING barely counts.

Can't we just let the CWND burn down? Do you really think that by this point you will want to send more than CWND worth of these? I know that it's small at this point (maybe just one packet more than baseline), but if we use INIT_CWND = not_too_small, then we get quite a few goes before this becomes terminal.

@marten-seemann
Copy link
Contributor

Depending on the application this is actually quite likely, or even the common case. Assume that the client has a bit of data to send in the first flight, then this would fill up the cwnd completely. You don't even need must, just 10 packets, minus the Handshake packets in flight (which might contain a certificate chain), and their retransmissions.

Why? The retransmitted Handshake packets aren't and in this case these are the ones that will run the CWND down. The 1-RTT PING barely counts.

The problem is not the size of the 1-RTT PING, but the fact that we might already be congestion limited once we try to send it.

@martinthomson
Copy link
Member Author

The problem is not the size of the 1-RTT PING, but the fact that we might already be congestion limited once we try to send it.

Right, but this is a condition where 1-RTT packets might not be processed by the server. The client can't know if the server really needs the Handshake packets or the 1-RTT packet. If we want to exempt anything from CWND, wouldn't we need to exempt both?

Copy link
Contributor

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to conflict with #2806 so I'll try to merge that soon.

draft-ietf-quic-tls.md Show resolved Hide resolved

If the handshake is complete, but not confirmed (see Section 4.1.1 and Section
4.1.2 of {{QUIC-TLS}}), in addition to sending unacknowledged crytographic
handshake data, endpoints SHOULD send an ack-eliciting 1-RTT packet. This can
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
handshake data, endpoints SHOULD send an ack-eliciting 1-RTT packet. This can
Handshake data, endpoints SHOULD send an ack-eliciting 1-RTT packet. This can

@marten-seemann
Copy link
Contributor

This is going to conflict with #2806 so I'll try to merge that soon.

@ianswett I don't think this is an editorial change. Furthermore, I don't believe that a change to the loss recovery algorithm is necessarily the best fix for #2863.
Can we not merge this until Montreal?

@ianswett
Copy link
Contributor

This PR and that issue are labeled transport+design, so presumably it'd need a consensus call?

@ianswett
Copy link
Contributor

I merged #2806, so you'll need to resolve conflicts, unless you want me to try to do that?

@@ -629,6 +629,14 @@ MAY use alternate strategies for determining the content of probe packets,
including sending new or retransmitted data based on the application's
priorities.

If the handshake is complete, but not confirmed (see Section 4.1.1 and Section
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might make sense to work this into the "Handshakes and New Paths" section above?

The way I'm now thinking of this is: It's the client's job to drive the handshake to confirmation, and here's what is necessary to ensure that occurs... Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, now that I read the existing text again, I'm wondering if the "Sending Probe Packets" section should be before "Handshakes and New Paths"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will leave that change to you, but I have moved this up to the Handshakes and New Paths section.

Copy link
Contributor

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more suggestion: I think line 519 should now say confirmed, not completed?

@martinthomson
Copy link
Member Author

OBE

@martinthomson martinthomson deleted the unrecoverable-without-1rtt branch December 3, 2019 05:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unrecoverable loss pattern leads to deadlock
5 participants