Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignoring ACK delay can result in wrong RTT calculations #3350

Closed
kixelated opened this issue Jan 15, 2020 · 7 comments
Closed

Ignoring ACK delay can result in wrong RTT calculations #3350

kixelated opened this issue Jan 15, 2020 · 7 comments
Labels
-recovery design has-consensus

Comments

@kixelated
Copy link
Contributor

@kixelated kixelated commented Jan 15, 2020

I'm implementing the RTT estimation code in quic-recovery at the moment and I've run into a potential edge case with ACK delays. This is entirely hypothetical for the moment.

For example, suppose min_rtt = 100 and our actual round trip time is 99, 100, or 101with equal frequency. This minimum RTT was calculated earlier (ex. handshake) but now the peer is delaying ACKs.

If we receive an ACK with latest_rtt = 126 and ack_delay = 25, then adjusted_rtt = 101.
If we receive an ACK with latest_rtt = 125 and ack_delay = 25, then adjusted_rtt = 125.
If we receive an ACK with latest_rtt = 124 and ack_delay = 25, then adjusted_rtt = 124.

The computed result is smoothed_rtt=117 and rttvar=11.
The "actual" result should be closer to smoothed_rtt=100 and rttvar=2/3.

Any delayed ACKs sent faster than min_rtt will skew the results based on the ack_delay. This includes ACKs sent over a faster path.

These RTT estimates can be brought closer to reality with a more accurate min_rtt. This can only be accomplished if the peer transmits an ACK with ack_delay=0 and gets lucky.

However, a peer will only send an immediate ACK if it receives two ack-eliciting packets within max_ack_delay of each other (and the peer follows the RFC recommendation). Ironically, this is the situation where estimating the RTT is the most important, as the packet rate is too low for fast recovery.

@ianswett
Copy link
Contributor

@ianswett ianswett commented Jan 15, 2020

I'll note that persistent delayed acks are bad for performance for a variety of reasons, but let's put that aside for now.

I believe the problematic text is:

MUST NOT apply the adjustment if the resulting RTT sample is smaller than the min_rtt. This limits the underestimation that a misreporting peer can cause to the smoothed_rtt.

The alternative would be to allow the RTT sample to be as small as min_rtt, but no smaller, rather than ignoring the ack_delay entirely when it created a sample less than min_rtt. The negative of this is it allows a buggy/malicious peer to always specify an overly large ack_delay, which drives smoothed to min_rtt and results in an RTTVar of 0. This would make PTO quite aggressive, though given we're also adding max_ack_delay onto the PTO, possibly it's acceptable.

In real-world scenarios, I suspect it doesn't matter that much, which is why we went with the more conservative option.

@kixelated
Copy link
Contributor Author

@kixelated kixelated commented Jan 15, 2020

Using the above example, where we expect smoothed_rtt=100 and rttvar=0.6

Here is the problematic code. It produces smoothed_rtt=116.6 and rttvar=10.4

    adjusted_rtt = latest_rtt
     if (latest_rtt > min_rtt + ack_delay):
       adjusted_rtt = latest_rtt - ack_delay

Like @ianswett said, an improvement would be to not let adjusted_rtt go below min_rtt:

adjusted_rtt = max(latest_rtt - ack_delay, min_rtt)

This produces smoothed_rtt=100.3 and rttvar=0.4. However, this will over-estimate the smooth average and under-estimate the variance.

@janaiyengar
Copy link
Contributor

@janaiyengar janaiyengar commented Jan 15, 2020

Our principle was that an endpoint should not use an RTT sample that is smaller than any RTT it has directly measured.

What is in the draft, as Ian pointed out, is the conservative approach that protects the endpoint from poor reporting of ack delay. A poor min_rtt will lead you astray but that is because the endpoint does not have direct observation of a smaller RTT. Until it does, it is better to be conservative.

@ianswett
Copy link
Contributor

@ianswett ianswett commented Jan 15, 2020

I'm not suggesting we should change the existing text, but I believe the change suggested above does fit with the principle you're stating @janaiyengar because you would still never allow the adjusted_rtt to go below min_rtt.

I will note that I hadn't anticipated the RTTVar inflation caused by sometimes ignoring ack_delay completely.

@mirjak
Copy link
Contributor

@mirjak mirjak commented Jan 23, 2020

The other option would be to not update smoothed_rtt at all... but the we need a way to ensure that we update still frequently enough.

Also the code should probably say at least (adding =):

if (latest_rtt => min_rtt + ack_delay):

@martinthomson
Copy link
Member

@martinthomson martinthomson commented Jan 23, 2020

Assuming that you mean latest_rtt >= min_rtt + ack_delay, that seems reasonable.

@larseggert larseggert added the design label Feb 4, 2020
@project-bot project-bot bot moved this from Triage to Design Issues in Late Stage Processing Feb 4, 2020
@larseggert
Copy link
Member

@larseggert larseggert commented Feb 5, 2020

Discussed in ZRH. Proposed resolution is to close with no action.

@larseggert larseggert added the proposal-ready label Feb 5, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Feb 5, 2020
@LPardue LPardue moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Feb 19, 2020
@LPardue LPardue removed the proposal-ready label Feb 19, 2020
@LPardue LPardue added the call-issued label Feb 26, 2020
@LPardue LPardue added has-consensus and removed call-issued labels Mar 4, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Mar 4, 2020
Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Mar 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-recovery design has-consensus
Projects
Late Stage Processing
  
Issue Handled
Development

No branches or pull requests

7 participants