Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds substantive descriptions of loss detection mechanisms #801

Merged
merged 7 commits into from
Oct 11, 2017

Conversation

janaiyengar
Copy link
Contributor

This isn't perfect, but I'd like to commit soon and iterate.

Copy link
Contributor

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of comments, but I like the content and structure.

* Handshake packets, which contain STREAM frames for stream 0, are
critical to QUIC transport and crypto negotiation, so a separate alarm
period is used for them.
QUIC uses both ack information and timeouts to detect lost packets, and this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, ack information and timeouts is correct, but not super-descriptive and it might be worth saying once that this is sender side, even though it's obvious to us. Maybe something like: "QUIC senders use both received acknowledgements and timeouts to detect and recover from lost packets. This section provides..."


### Fast Retransmit

An unacknowledged packet is marked as lost when an acknowledgment is received
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given I just had the conversation about "acknowledgement" and what it meant, do we need to clarify it somewhere in this doc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a TODO, but not doing this in this PR. I think we're going to need an early terminology section.

packet was received, while kReorderingThreshold provides some tolerance for
reordering of packets in the network.

The RECOMMENDED value for kReorderingThreshold is 3.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RECOMMENDED initial value? Esp given you talk about changing it below.


We derive this default from recommendations for TCP loss recovery {{!RFC5681}}
{{!RFC6675}}. It is possible for networks to exhibit higher degrees of
reordering, causing a sender to spuriously detect losses. Spurious loss
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think "detect spurious losses" reads a bit better, but I'm not sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Spurious loss detection is sort of odd. I'd go with "Detecting spurious losses leads..."

Unacknowledged packets close to the tail may have fewer than
kReorderingThreshold number of packets sent after them. Loss of such packets
cannot be detected via Fast Retransmit. To enable ack-based loss detection of
such packets, receipt of an acknowledgment for the last outstanding packet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to clarify that it's the last outstanding ackable packet?

Also, now that I read through this, it seems like it'd be worth doing early retransmit for any packets within kReorderingThreshold from the tail, but that's a sidethought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm -- yeah, I think this is important. I'm adding ackable in a few places, including in TLP.


A Retransmission Timeout (RTO) alarm is the final backstop for loss
detection. The algorithm used in QUIC is based on the RTO algorithm for TCP
{{!RFC5681}} and is additionally resilient to spurious RTOs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you mention spurious RTOs, maybe reference 5682 as well?


* When the final TLP packet is sent, RTO is set to max(SRTT + 4*RTTVAR, minRTO)

* When an RTO alarm fires, RTO is set to twice its current value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; "RTO is doubled." seems a bit less ambiguous.

them.

The initial flight has no prior RTT sample. A client SHOULD remember
the previous RTT it observed when resumption is attempted and use that for an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly a good recommendation, but if the client changes networks, does it still apply?


Endpoints MUST retransmit handshake frames if not acknowledged within a
time limit. This time limit will start as the largest of twice the RTT value
and MinTLPTimeout. Each consecutive handshake retransmission doubles the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MinTLPTimeout is not defined in this text, it's just used above as 10ms when calculating the TLP alarm.

initial RTT value. If no previous RTT is available, the initial RTT defaults
to 100ms.

Endpoints MUST retransmit handshake frames if not acknowledged within a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we retransmit "all unacknowledged handshake frames", so we should probably say that?

in the underlying network RTT.

When an ack is received after one or more consecutive RTO alarm events, any
packets with lower packet numbers than those acked MUST be marked as lost.
Copy link
Contributor

@lucas-clemente lucas-clemente Sep 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this apply even if the ACK doesn't ACK the RTO packets, i.e. it was an earlier delayed ACK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I think it should not. In that case, we should fall out of RTO mode and go back into standard ack based recovery.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the text here, PTAL.

@martinthomson
Copy link
Member

I can't work out what circle is complaining about here. Can you rebase this branch and push again? It might be easier to see any problem with the new CI setup.

* Handshake packets, which contain STREAM frames for stream 0, are
critical to QUIC transport and crypto negotiation, so a separate alarm
period is used for them.
QUIC uses both ack information and timeouts to detect lost packets, and this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/acked/acknowledged/g
s/ack information/acknowledgment information/g
s/ack-based/ACK-based/g or s/ack-based/acknowledgment-based/g

Copy link
Contributor

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement. We can tackle the TODOs next.

@ianswett ianswett merged commit 8cf88aa into master Oct 11, 2017
@martinthomson martinthomson deleted the newrecovery branch October 12, 2017 23:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants