Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lost Server Initial Takes Too Long to be Retransmitted #3078

Closed
nibanks opened this issue Oct 4, 2019 · 14 comments · Fixed by #3080
Closed

Lost Server Initial Takes Too Long to be Retransmitted #3078

nibanks opened this issue Oct 4, 2019 · 14 comments · Fixed by #3080
Assignees
Labels
-recovery design has-consensus

Comments

@nibanks
Copy link
Member

@nibanks nibanks commented Oct 4, 2019

Generally, the server's first flight (of a 1-RTT connection attempt) has a single Initial packet and the rest is made up of 1 or more Handshake packets. If the UDP datagram containing the Initial packet is lost, the server only ends up retransmitting it based off the default RTT based PTO timer.

If/when the client receives the Handshake packets that it can't decrypt (because the Initial was lost) then it has a (possible) signal that the server has responded, but a packet was lost. The client could use this info to immediately retransmit its Initial. If/when the server receives this, it could use this info to retransmit its Initial.

@marten-seemann
Copy link
Contributor

@marten-seemann marten-seemann commented Oct 4, 2019

Isn't this a special case of the EMPTY_ACK frame discussed earlier?

@nibanks
Copy link
Member Author

@nibanks nibanks commented Oct 4, 2019

Yes, but the problem is the worst in this case, because the server doesn't have an RTT estimate, so it can't have an accurate PTO for retransmitting. Once it has received a single ACK, the PTO becomes a lot more accurate, and falling back to the PTO is a lot less painful.

@ianswett
Copy link
Contributor

@ianswett ianswett commented Oct 7, 2019

This problem also occurs when the server receives 0-RTT packets it cannot decrypt. In that case, it would be ideal if it could do something to indicate something was received but not processed. I guess it could send a PING in an Initial packet as an alternative to EMPTY_ACK?

I sent out a PR that I believe covers the most important points you mention. PTAL.

@kazuho
Copy link
Member

@kazuho kazuho commented Oct 7, 2019

I think the problems here are:

  • Receiving a retransmit client Initial does not necessarily indicate a loss. It could simply be the case that the actual RTT is greater than the client's initial RTT. Do we want the server to retransmit packets in that case? Note that a client might use a small initial RTT when it is connecting to a server that it has previously connected to.
  • How the server detects a retransmission will become complicated if we adopt #3045.

Considering these aspects, I might be weakly inclined to addressing the issue using an explicit signal, defined either within v1 or an extension.

@mikkelfj
Copy link
Contributor

@mikkelfj mikkelfj commented Oct 7, 2019

This problem also occurs when the server receives 0-RTT packets it cannot decrypt. In that case, it would be ideal if it could do something to indicate something was received but not processed.

I think this falls into the category of black-holing invalid packets as they might easily come from an unwanted source.

@ianswett
Copy link
Contributor

@ianswett ianswett commented Oct 11, 2019

I don't believe #3045 complicates things substantially, because if only one UDP datagram is lost, then you would expect to receive an ACK in a relatively short period(no ack delay) of time and loss recovery could function quite well.

@ianswett ianswett added the design label Oct 14, 2019
@huitema
Copy link
Contributor

@huitema huitema commented Oct 16, 2019

Does this need to be standardized? Or does it fall in the range of heuristics that applications may do, but don't need to be specified.

@huitema
Copy link
Contributor

@huitema huitema commented Oct 16, 2019

Of course the handshake packet that you receive can be a bunch of random bytes sent by an attacker...

@larseggert
Copy link
Member

@larseggert larseggert commented Oct 16, 2019

Discussed in Cupertino. @ianswett to update #3080 based on discussion.

@martinthomson
Copy link
Member

@martinthomson martinthomson commented Oct 16, 2019

The proposed "fix" could lead to a deadlock where the server consumes its entire congestion window without making progress. If the server receives too many Initial packets (bad RTT estimate at the client, say), then it will send lots of data without getting an ACK. As suggested, this is just an optimization, so limiting the number of times this optimization can be used might avoid that problem.

@mnot mnot added this to Design Issues in Late Stage Processing Nov 25, 2019
@larseggert
Copy link
Member

@larseggert larseggert commented Feb 5, 2020

Discussed in ZRH. No change since Cupertino.

@ianswett ianswett added the proposal-ready label Feb 18, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Feb 18, 2020
@LPardue LPardue moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Feb 19, 2020
@LPardue LPardue removed the proposal-ready label Feb 19, 2020
@LPardue LPardue added the call-issued label Feb 26, 2020
@LPardue LPardue added has-consensus and removed call-issued labels Mar 4, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Mar 4, 2020
@martinthomson
Copy link
Member

@martinthomson martinthomson commented Mar 13, 2020

Apparently this has consensus. @ianswett, is #3080 ready?

@mikkelfj
Copy link
Contributor

@mikkelfj mikkelfj commented Mar 13, 2020

My only problem is that I wouldn't know how to detect duplicate CRYPTO data from a cursory read. Normally you'd feed the data to a TLS blackbox and it would not necessarily provide the information needed. The alternative seems to be to inspect CRYPTO frames, but who is to say that duplicate data is always framed the same?

Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Mar 13, 2020
@ianswett
Copy link
Contributor

@ianswett ianswett commented Mar 14, 2020

@mikkelfj detecting duplicate CRYPTO data is definitely implementation dependent, so I don't want to describe that in the recovery doc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-recovery design has-consensus
Projects
Late Stage Processing
  
Issue Handled
Development

Successfully merging a pull request may close this issue.

9 participants