Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Including ACK delay in packet loss detection time threshold #3951

Closed
djc opened this issue Jul 23, 2020 · 8 comments
Closed

Including ACK delay in packet loss detection time threshold #3951

djc opened this issue Jul 23, 2020 · 8 comments
Labels
editorial An issue that does not affect the design of the protocol; does not require consensus.

Comments

@djc
Copy link

djc commented Jul 23, 2020

In quinn-rs/quinn#815, I've been going through the handling of PTO and other timer-related code in the Quinn implementation to make sure that it is consistent and compliant with the spec. One issue I bumped on is whether/when the ACK delay is relevant to a particular timer. This seemed somewhat inconsistent to me in the code, so I dug through the recovery spec and found the following:

  • Time Treshold for detecting packet loss says the time threshold is max(kTimeThreshold * max(smoothed_rtt, latest_rtt), kGranularity), so it does not include the ACK delay. This seems surprising to me, as the ACK delay seems highly relevant in this case...

  • Idle Timeout says "To avoid excessively small idle timeout periods, endpoints MUST increase the idle timeout period to be at least three times the current Probe Timeout (PTO)."; it seems like the ACK delay is relevant here: we wouldn't want to consider a connection idle if we haven't received ACKs because they are delayed.

  • Failed Path Validation says validation_timeout = max(3*PTO, 6*kInitialRtt). However, path validation should be based on a PATH_RESPONSE frame, not an ACK frame, so including the max_ack_delay here doesn't seem relevant from first principles.

  • Closing And Draining Connection States says "These states SHOULD persist for at least three times the current Probe Timeout (PTO) interval as defined in [QUIC-RECOVERY].", so I think this should keep the ACK delay as well.

So concrete questions I came up with:

  • Should the packet loss detection threshold take the max_ack_delay into account somehow?
  • Should the failed path validation timeout use a different calculation that doesn't include max_ack_delay?
  • For closing/draining connection timers, does it make sense to include the max_ack_delay?
@martinthomson
Copy link
Member

martinthomson commented Jul 23, 2020

I am comfortable with this discrepancy (or all three).

The loss detection threshold is based on observing a gap, so it only applies when there is an obligation to acknowledge immediately if the ACK is received. Thus the max_ack_delay doesn't apply.

The failed path validation timeout simply uses a value that allows for some loss and a response. Any value suffices here, but this value ensures that there is ample opportunity to get a response back.

The closing and draining states follow similar logic to path validation. Any value would suffice, and in this case the goal is only to avoid silly values being set and connections timing out before you give PTO a chance to repair losses.

(I just successfully simulated a handshake with a 25s RTT, 10% loss, and a 30s idle timeout. This last safeguard is relevant there. As long as you avoid an idle timeout that is less than the RTT you get a "usable" connection.)

@Ralith
Copy link
Contributor

Ralith commented Jul 23, 2020

While in a typical case the inclusion/omission does seem insignificant, max_ack_delay can take values over 16 seconds, leading to a quite noticable impact in behavior. If that's nonetheless deemed within reasonable bounds, I think it'd be helpful to have an explicit statement of intent in the text stating that the inclusion of max_ack_delay in non-ACK-related timers is for simplicity.

@ianswett
Copy link
Contributor

max_ack_delay isn't included in time threshold loss detection because the transport document says you need to acknowledge packets which were previously missing immediately and that out of order packets are acknowledged immediately. So not including it is because it's not necessary to include it, and including it would slow down loss detection, in some cases substantially.

@janaiyengar
Copy link
Contributor

janaiyengar commented Jul 24, 2020

Is there anything to be done here?

@Ralith
Copy link
Contributor

Ralith commented Jul 26, 2020

Some clarifying language expressing that the ack delay is included in path validation/closing/draining timers/etc only for simplicity would be nice. Given the large potential impact, the reader is otherwise left trying to guess why that's appropriate.

@ianswett ianswett added the editorial An issue that does not affect the design of the protocol; does not require consensus. label Jul 26, 2020
@project-bot project-bot bot moved this from Triage to Editorial Issues in Late Stage Processing Jul 26, 2020
@ianswett
Copy link
Contributor

Thanks, that sounds editorial.

@ianswett
Copy link
Contributor

ianswett commented Aug 2, 2020

@Ralith or @djc either of you want to write a PR for this?

@martinthomson
Copy link
Member

Let's close this one, because this title is really throwing me off. I opened #3987 for the specific suggestion.

Late Stage Processing automation moved this from Editorial Issues to Issue Handled Aug 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
editorial An issue that does not affect the design of the protocol; does not require consensus.
Projects
Late Stage Processing
  
Issue Handled
Development

No branches or pull requests

6 participants