Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning up Loss Detection overview #536

Closed
wants to merge 1 commit into from
Closed

Cleaning up Loss Detection overview #536

wants to merge 1 commit into from

Conversation

janaiyengar
Copy link
Contributor

Addresses comments from Ian and Martin in #397

@janaiyengar janaiyengar requested a review from ianswett May 17, 2017 20:56
@janaiyengar janaiyengar added editorial An issue that does not affect the design of the protocol; does not require consensus. -recovery labels May 17, 2017
Copy link
Contributor

@ianswett ianswett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions.


* When a short period of time has elapsed after the last sent packet is
acknowledged, with unacknowledged packets near the tail (fewer than
kReorderingThreshold packets sent after them). The sender cannot expect to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we could combine early retransmit and time based here, just like the implementation does.

I realize this is a bit harder to write, but maybe something like:
When a larger packet than the unacknowledged packet has been acked and the reordering threshold in time has elapsed since a larger packet was acknowledged. In order to implement a mechanism equivalent to TCP Early Retransmit, this only applies when the acked packet is the largest sent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading below, i see what you're trying to do by describing standard FACK style loss detection first, and it likely makes sense.

In which case, how about "for unacknowledged packets within kReorderingThreshold from the tail." instead of "with unacknowledged..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should document the "simpler" packet based ones for now, and if we can describe the time-based ones separately, then that's useful too. But we should keep them separable so (i) it's easier to reason about, and (ii) it's easier to implement if you only want to implement one.

TCP's Tail Loss Probe.
* When a Retransmission Timeout (RTO) alarm fires. An RTO alarm is set after
Tail Loss Probe attempts at evoking acknowledgements (see below). When this
alarm fires, all unacknowledged packets are marked as lost.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, they're only marked as lost when a larger packet has been acknowledged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was trying to figure out exactly what is reasonable to say here, but didn't spend time figuring it out. Do you want to suggest an edit here? Specifically, this gets into the details of spurious detection, but that may be worth describing here. I'll try to get to this today.

critical to QUIC transport and crypto negotiation, so a separate alarm
period is used for them.
Handshake packets, which contain STREAM frames for stream 0, are critical to
QUIC transport and crypto negotiation, so a separate (more aggressive) alarm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would just say "more aggressive" and drop the separate and ()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say shorter rather than more aggressive. "Aggressiveness" isn't really a property we care about.

General comment: if you intend to talk about specific tuning variables (as you do for the reordering threshold), please be consistent about referencing all of those variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, no change apparent here.

alarm to detect loss. Specifically, when the last sent packet is
acknowledged, the sender waits a short period of time to allow for
reordering and then marks any unacknowledged packets as lost. This mechanism
* When at least one packet that was sent a threshold number of packets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding "a threshold number of packets" really hard to parse. I think that you want to say that a packet is declared lost when a packet with a significantly higher packet number is acknowledged. (This is an area where packet number skipping really hurts us.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's a bit hard to read. Packet number skipping doesn't matter, because the sender knows it skipped packets. It does introduce extra work in that case, but it's easily solvable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still very hard to parse for me. Does this work?

"If a packet is received, packets with significantly lower packet numbers (determined by kReorderingThreshold) that have not been received are marked as lost. This mechanism combines both TCP's ... [citation needed]"

(kReorderingThreshold) after an unacknowledged packet is acknowledged. This
indicates that the unacknowledged packet is either lost or reordered beyond
the specified threshold. This mechanism combines both TCP's FastRetransmit
and FACK mechanisms.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[citation needed]


* When a short period of time has elapsed after the last sent packet is
acknowledged, with unacknowledged packets near the tail (fewer than
kReorderingThreshold packets sent after them). The sender cannot expect to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that I'm tired, but I simply cannot parse this sentence at all. I can't offer advice on how to make this comprehensible, but I do think that you need to explain that this isn't loss according to the definitions of the reordering threshold and RTO. This is a way of using the dead air after transmission ceases to more proactively repair gaps near the end of the transmission.

Structurally, I think that you want to come at this differently. There are two elementary parameters: the reordering threshold and the retransmission timeout. Explain those first. Then explain the limitations of those mechanisms: when the sender has nothing more to send, reordering no longer works, which slows repair toward the end of a transmission. Repairing gaps near the end on a shorter timer ensures that you don't fall back on the relatively slow RTO. Preemptively retransmitting data in an additional packet triggers that shorter timer and can be used to provide redundancy for trailing data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If a short period of time has elapsed since the last packet was sent and there are fewer than the reordering threshold (kReorderingThreshold) packets that have been sent since the highest acknowledged packet. A sender will not be able to detect loss based on the reordering threshold in this case, so these packets are considered lost based on a timer. This design is based on the Linux implementation... [citation needed]"

lost when a packet larger than it is acknowledged and a threshold amount of
time has passed since the packet was sent.
The above mechanisms use a packet threshold to tolerate
reordering. Alternatively, a QUIC sender may use a time threshold. This allows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we need to be clearer about how the time threshold plays into this. I would prefer to see the reordering and time thresholds described together. If it makes sense in some cases to disable the time threshold, then describe those conditions. If it makes sense to disable the time threshold except for certain scenarios, you can effectively disable the time threshold by increasing its value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see any change in response to this comment.

critical to QUIC transport and crypto negotiation, so a separate alarm
period is used for them.
Handshake packets, which contain STREAM frames for stream 0, are critical to
QUIC transport and crypto negotiation, so a separate (more aggressive) alarm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say shorter rather than more aggressive. "Aggressiveness" isn't really a property we care about.

General comment: if you intend to talk about specific tuning variables (as you do for the reordering threshold), please be consistent about referencing all of those variables.

Copy link
Member

@martinthomson martinthomson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I didn't see a notification when the extra commits were pushed. Sorry about taking so long to get to this.

I think that some of this depends on @ianswett, so this should be a reminder to look at this again. This is generally an improvement, so I'd like to see it merged, but it's a little hard to follow as it is.

alarm to detect loss. Specifically, when the last sent packet is
acknowledged, the sender waits a short period of time to allow for
reordering and then marks any unacknowledged packets as lost. This mechanism
* When at least one packet that was sent a threshold number of packets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still very hard to parse for me. Does this work?

"If a packet is received, packets with significantly lower packet numbers (determined by kReorderingThreshold) that have not been received are marked as lost. This mechanism combines both TCP's ... [citation needed]"


* When a short period of time has elapsed after the last sent packet is
acknowledged, with unacknowledged packets near the tail (fewer than
kReorderingThreshold packets sent after them). The sender cannot expect to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If a short period of time has elapsed since the last packet was sent and there are fewer than the reordering threshold (kReorderingThreshold) packets that have been sent since the highest acknowledged packet. A sender will not be able to detect loss based on the reordering threshold in this case, so these packets are considered lost based on a timer. This design is based on the Linux implementation... [citation needed]"

lost when a packet larger than it is acknowledged and a threshold amount of
time has passed since the packet was sent.
The above mechanisms use a packet threshold to tolerate
reordering. Alternatively, a QUIC sender may use a time threshold. This allows
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see any change in response to this comment.

critical to QUIC transport and crypto negotiation, so a separate alarm
period is used for them.
Handshake packets, which contain STREAM frames for stream 0, are critical to
QUIC transport and crypto negotiation, so a separate (more aggressive) alarm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, no change apparent here.

@janaiyengar
Copy link
Contributor Author

Superseded by #801

@martinthomson martinthomson deleted the recovery branch October 12, 2017 23:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-recovery editorial An issue that does not affect the design of the protocol; does not require consensus.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants