Including ACK delay in packet loss detection time threshold #3951

djc · 2020-07-23T12:32:29Z

In quinn-rs/quinn#815, I've been going through the handling of PTO and other timer-related code in the Quinn implementation to make sure that it is consistent and compliant with the spec. One issue I ran into is whether and when the ACK delay is relevant to a particular timer. This seemed somewhat inconsistent to me in the code, so I dug through the recovery spec and found the following:

Time Threshold for detecting packet loss says the time threshold is max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity), so it does not include the ACK delay. This seems surprising to me, as the ACK delay seems highly relevant in this case...
Idle Timeout says "To avoid excessively small idle timeout periods, endpoints MUST increase the idle timeout period to be at least three times the current Probe Timeout (PTO)."; it seems like the ACK delay is relevant here: we wouldn't want to consider a connection idle if we haven't received ACKs because they are delayed.
Failed Path Validation says validation_timeout = max(3*PTO, 6*kInitialRtt). However, path validation should be based on a PATH_RESPONSE frame, not an ACK frame, so including the max_ack_delay here doesn't seem relevant from first principles.
Closing And Draining Connection States says "These states SHOULD persist for at least three times the current Probe Timeout (PTO) interval as defined in [QUIC-RECOVERY].", so I think this should keep the ACK delay as well.
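For reference, the timers quoted above can be sketched as follows. This is a minimal illustration, not Quinn's code: the function names are hypothetical, all durations are plain floats in seconds, and the constants (kTimeThreshold = 9/8, kGranularity = 1 ms, kInitialRtt = 333 ms) and the PTO formula are taken from the recovery draft as I understand it. The point is only to show where max_ack_delay enters: it is part of PTO, and therefore part of everything derived from PTO, but not part of the loss time threshold.

```python
# Constants as recommended in the QUIC recovery draft (seconds).
K_TIME_THRESHOLD = 9 / 8
K_GRANULARITY = 0.001
K_INITIAL_RTT = 0.333


def loss_time_threshold(smoothed_rtt, latest_rtt):
    """Time threshold for declaring a packet lost; no max_ack_delay term."""
    return max(K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt), K_GRANULARITY)


def probe_timeout(smoothed_rtt, rttvar, max_ack_delay):
    """PTO; this is the only place max_ack_delay is added."""
    return smoothed_rtt + max(4 * rttvar, K_GRANULARITY) + max_ack_delay


def path_validation_timeout(pto):
    """Failed path validation timeout, derived from PTO."""
    return max(3 * pto, 6 * K_INITIAL_RTT)


def minimum_idle_timeout(pto):
    """Lower bound on the idle timeout (also the closing/draining period)."""
    return 3 * pto


if __name__ == "__main__":
    pto = probe_timeout(smoothed_rtt=0.040, rttvar=0.010, max_ack_delay=0.025)
    print("loss time threshold:", loss_time_threshold(0.040, 0.048))
    print("PTO:", pto)
    print("path validation timeout:", path_validation_timeout(pto))
    print("minimum idle timeout:", minimum_idle_timeout(pto))
```

Because the last two functions take PTO as input, any max_ack_delay advertised by the peer is implicitly multiplied by three in those timers.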

So concrete questions I came up with:

Should the packet loss detection threshold take the max_ack_delay into account somehow?
Should the failed path validation timeout use a different calculation that doesn't include max_ack_delay?
For closing/draining connection timers, does it make sense to include the max_ack_delay?

martinthomson · 2020-07-23T15:04:36Z

I am comfortable with this discrepancy (or all three).

The loss detection threshold is based on observing a gap, so it only applies in cases where the receiver is obliged to acknowledge immediately. Thus the max_ack_delay doesn't apply.

The failed path validation timeout simply uses a value that allows for some loss and a response. Any value suffices here, but this value ensures that there is ample opportunity to get a response back.

The closing and draining states follow similar logic to path validation. Any value would suffice, and in this case the goal is only to avoid silly values being set and connections timing out before you give PTO a chance to repair losses.

(I just successfully simulated a handshake with a 25s RTT, 10% loss, and a 30s idle timeout. This last safeguard is relevant there. As long as you avoid an idle timeout that is less than the RTT you get a "usable" connection.)

Ralith · 2020-07-23T17:25:28Z

While in a typical case the inclusion/omission does seem insignificant, max_ack_delay can take values over 16 seconds, leading to a quite noticeable impact on behavior. If that's nonetheless deemed within reasonable bounds, I think it'd be helpful to have an explicit statement of intent in the text stating that the inclusion of max_ack_delay in non-ACK-related timers is for simplicity.

ianswett · 2020-07-24T17:12:15Z

max_ack_delay isn't included in time threshold loss detection because the transport document says you need to acknowledge packets which were previously missing immediately and that out of order packets are acknowledged immediately. So not including it is because it's not necessary to include it, and including it would slow down loss detection, in some cases substantially.
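To make the cost concrete, here is a rough sketch comparing the time threshold as specified with a hypothetical variant that adds max_ack_delay. The variant and both function names are invented for illustration and are not in any draft; the constants are the recovery draft's recommended values, and durations are floats in seconds.

```python
K_TIME_THRESHOLD = 9 / 8   # recommended value in the recovery draft
K_GRANULARITY = 0.001      # 1 ms


def time_threshold(smoothed_rtt, latest_rtt):
    """Loss delay as specified: no max_ack_delay term."""
    return max(K_TIME_THRESHOLD * max(smoothed_rtt, latest_rtt), K_GRANULARITY)


def time_threshold_with_ack_delay(smoothed_rtt, latest_rtt, max_ack_delay):
    """Hypothetical variant (NOT in the spec) that waits out the ACK delay too."""
    return time_threshold(smoothed_rtt, latest_rtt) + max_ack_delay


if __name__ == "__main__":
    rtt = 0.050
    # 25 ms is the default max_ack_delay; 16.383 s is the largest legal value.
    for max_ack_delay in (0.025, 16.383):
        spec = time_threshold(rtt, rtt)
        variant = time_threshold_with_ack_delay(rtt, rtt, max_ack_delay)
        print(f"max_ack_delay={max_ack_delay}s: spec={spec:.5f}s variant={variant:.5f}s")
```

Since a receiver that sees a gap or an out-of-order packet acknowledges immediately, the extra wait in the variant would buy nothing and would only postpone the loss declaration by the peer's advertised delay.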

janaiyengar · 2020-07-24T22:37:04Z

Is there anything to be done here?

Ralith · 2020-07-26T17:57:44Z

Some clarifying language expressing that the ack delay is included in path validation/closing/draining timers/etc only for simplicity would be nice. Given the large potential impact, the reader is otherwise left trying to guess why that's appropriate.

ianswett · 2020-07-26T19:46:48Z

Thanks, that sounds editorial.

ianswett · 2020-08-02T21:33:16Z

@Ralith or @djc either of you want to write a PR for this?

martinthomson · 2020-08-05T05:16:50Z

Let's close this one, because this title is really throwing me off. I opened #3987 for the specific suggestion.

djc mentioned this issue Jul 23, 2020

PTO handling quinn-rs/quinn#815

Open

LPardue added this to Triage in Late Stage Processing Jul 23, 2020

LPardue added the post-wglc label Jul 23, 2020

ianswett added the editorial An issue that does not affect the design of the protocol; does not require consensus. label Jul 26, 2020

project-bot bot moved this from Triage to Editorial Issues in Late Stage Processing Jul 26, 2020

martinthomson mentioned this issue Aug 5, 2020

Explain why we include max_ack_delay in some timers #3987

Closed

martinthomson closed this as completed Aug 5, 2020

Late Stage Processing automation moved this from Editorial Issues to Issue Handled Aug 5, 2020
