
compensation of ack_delay is fragile against errors #2060

Closed
kazuho opened this issue Nov 28, 2018 · 36 comments
Labels
-recovery · design (An issue that affects the design of the protocol; resolution requires consensus.) · has-consensus (An issue that the Chairs have determined has consensus, by canvassing the mailing list.)

Comments

@kazuho
Member

kazuho commented Nov 28, 2018

The logic in the UpdateRtt function that compensates for ack_delay is as follows:

    // Adjust for ack delay if it's plausible.
    if (latest_rtt - min_rtt > ack_delay):
      latest_rtt -= ack_delay

The issue here is that the delay is completely ignored if ack_delay is even slightly greater than the true value due to measurement errors, etc. If this happens on a congested network where ACKs get lost, latest_rtt ends up at an extremely high value.

IMO this should be latest_rtt = max(latest_rtt - ack_delay, min_rtt).
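
To make the discontinuity concrete, here is a minimal sketch in Python-style pseudocode with hypothetical example values (the numbers are illustrative, not from any measurement):

    # Hypothetical values: the true ack delay is 30ms, but the peer reports 31ms.
    min_rtt = 100     # smallest RTT observed so far (ms)
    latest_rtt = 130  # raw RTT sample for this ACK (ms)
    ack_delay = 31    # reported ack delay (ms), slightly over-reported

    # Current draft behavior: the guard fails (130 - 100 = 30 is not > 31),
    # so the sample keeps the full ack delay and stays at 130ms.
    current = latest_rtt - ack_delay if latest_rtt - min_rtt > ack_delay else latest_rtt

    # Proposed behavior: clamp to min_rtt instead of skipping the adjustment.
    proposed = max(latest_rtt - ack_delay, min_rtt)   # 100ms, close to the ~100ms truth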

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -recovery labels Nov 28, 2018
@ianswett
Contributor

Thanks, I think that's a good suggestion. Writing a PR now.

@janaiyengar
Contributor

You still want to have the if check, to avoid a large value due to overflow, which unfortunately leaves us in the same place. I think what you want is:

    if (latest_rtt - min_rtt > ack_delay)
        latest_rtt -= ack_delay
    else
        latest_rtt = min_rtt

That said, I'm not sure that this is a good change. Basically if you have a bad clock at the receiver, this means that you constantly underestimate the latest RTT sample. That is bad for two reasons: (i) you miss any queues that might be building up because you underestimate latest_rtt, and (ii) the smoothed_rtt value that might be used for pacing uses this underestimated value.

OTOH, in the current world, if the receiver is consistently bad at reporting ack_delay, the sender becomes less aggressive than it should be, since the latest_rtt is larger than it should be. Pacing rate is lower, and any sender queueing estimate that uses latest_rtt detects a queue when there isn't one. This isn't unsafe, it's just poor performance, which should be expected for a receiver with a poor clock.

@martinthomson
Member

That code produces the same outcome as Ian's patch, though it's 4 lines instead of 1. I tend to think that this sort of defensive guard is useful though. If a peer under- or over-reports the ACK delay, that's going to produce bad results, but someone with a clock that janky is going to have a bad time no matter what.

@kazuho
Member Author

kazuho commented Nov 29, 2018

@janaiyengar

I think what you want is:

   if (latest_rtt - min_rtt > ack_delay)
       latest_rtt -= ack_delay
   else
       latest_rtt = min_rtt

Isn't that mathematically equivalent to latest_rtt = max(latest_rtt - ack_delay, min_rtt)?

Basically if you have a bad clock at the receiver, this means that you constantly underestimate the latest RTT sample.

What type of behavior are you assuming for a "bad clock"?

The issue with the current approach is that it considers the true ack_delay to be zero when the value reported by the peer is huge.

Assuming that we cannot tell how good the clock of the peer (or the endpoint) is, we should round ack_delay to the nearest reasonable value. That is, if ack_delay is so large that latest_rtt would fall below min_rtt, clamp ack_delay to latest_rtt - min_rtt. That is what the issue proposes.

@janaiyengar
Contributor

@kazuho @martinthomson : Mathematically, yes, but I wanted to protect against overflow.

@kazuho: I'm concerned that if the ack_delay being reported is too large, I don't know how much of that is in error and how much of that is accurate. The patch assumes that exactly latest_rtt - min_rtt is the correct ack_delay, but there's no evidence of it. The problem with the proposed patch is that a client that reports an ack_delay that is too large will automatically cause the sender to always believe that the RTT is not growing at all. (Imagine the case where ack_delay is always reported as MAX_INT. This means that the sender will always end up with latest_rtt = min_rtt.)
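
As a minimal illustration of that concern (hypothetical numbers, applying the proposed max() formula):

    # A peer that always reports a huge ack_delay pins the proposed
    # adjustment at min_rtt, hiding any real growth in the RTT.
    min_rtt = 100            # ms
    latest_rtt = 180         # ms; the path has actually gotten slower
    ack_delay = 2**62 - 1    # effectively MAX_INT
    adjusted = max(latest_rtt - ack_delay, min_rtt)   # always 100ms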

I'm suggesting that if there's an indication that the peer is feeding in values that are too large for ack_delay (which should not happen consistently or all the time), then we should ignore ack_delay.

@martinthomson
Member

I thought that we had accepted the costs of integrating ack_delay in calculations (it might be wrong) against the benefits (we get better accounting in the majority of the cases where it is set accurately).

@janaiyengar
Contributor

janaiyengar commented Nov 29, 2018 via email

@martinthomson
Member

Would it be better to disregard ack_delay for all future packets on the connection then?

@janaiyengar
Contributor

janaiyengar commented Nov 29, 2018 via email

@martinthomson
Member

I don't feel strongly. I'm just trying to understand. It's a little odd that we would take ack_delay into account conditionally on a per-packet basis.

@janaiyengar
Contributor

In general, I think having an inflated RTT is better than a lower than actual RTT, since inflated RTT leads to a less aggressive sender and smaller than actual RTT leads to a more aggressive sender.

Admittedly, it's possible for there to be error when latest_rtt - (inflated) ack_delay > min_rtt. That's the error that I had considered an accepted cost of using ack_delay.

It's fair to ask what we should do about a receiver that we suspect might have a bad clock... I wouldn't be opposed to ignoring ack_delay entirely for the rest of the connection... I wonder if there are counter cases that I'm missing, though, where this ends up being too aggressive. In general, the receiver should be rounding down when reporting delay, not up.

@kazuho
Member Author

kazuho commented Nov 29, 2018

@janaiyengar

I'm concerned that if the ack_delay being reported is too large, I don't know how much of that is in error and how much of that is accurate. The patch assumes that exactly latest_rtt - min_rtt is the correct ack_delay, but there's no evidence of it.

I do not think that that is an issue, because we always use max(srtt, latest_rtt) for loss recovery. Things would work fine as long as SRTT does not become too small.
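
For context, a rough sketch of where that max() appears in time-based loss detection (the 9/8 factor is an assumption here; the draft's constant may differ):

    # Loss recovery keys off the larger of the two estimates, so an
    # occasional too-small latest_rtt sample does not shorten the loss timer.
    kTimeThreshold = 9 / 8   # assumed reordering factor

    def loss_delay(latest_rtt, smoothed_rtt):
        return kTimeThreshold * max(latest_rtt, smoothed_rtt)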

If the clock has jitter, the observed SRTT using the proposed formula will end up either equal to or slightly higher than the true SRTT.

If the clock occasionally jumps (to adjust time), the observed SRTT using the proposed formula would also be either equal to or slightly higher than the true SRTT.

So I would assume that the proposed formula has no issues in the cases above.

The problem with the proposed patch is that a client that reports an ack_delay that is too large will automatically cause the sender to always believe that the RTT is not growing at all. (Imagine the case where ack_delay is always reported as MAX_INT. This means that the sender will always end up with latest_rtt = min_rtt.)

I can understand your concern about broken implementations, but I am not sure the current defense is good enough if we do need to take care of them. Consider the case of an endpoint that always sets ack_delay to 25ms. Assuming that the true RTT is greater than that value, the observed RTT will be 25ms less than the true value.

To put it another way, it seems to me that there is no reliable way of detecting peers that send broken ack_delay values. IMO we should accept that QUIC would not work well with broken stacks.

Or alternatively, we might want to state that QUIC stacks running with an unreliable clock SHOULD set ack_delay to zero.

@ianswett
Contributor

FYI, my experience with clocks is that they're either noisy, and if so over very short timescales (<1ms), or they jump suddenly. In the case of a sudden jump, any protection that largely ignores the jump should be fine.

I like Kazuho's suggestion because it changes a case that's currently discontinuous (1us of extra ack delay and the RTT sample can jump by tens of ms) into one that's continuous. Discontinuities in congestion control and loss recovery tend to lead to weird corner cases. The current pseudocode is what gQUIC currently does, which is why it's there, but especially now that max ack delay is explicitly communicated, I think Kazuho's suggestion is a good one.

There is the question of what to do if the reported ack_delay is larger than the value specified in the transport params. Possibly we should only adjust by min(ack_delay, max_ack_delay), and then if the alarm is always waking up late, SRTT will include that extra delay even though max_ack_delay does not?
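
A minimal sketch of that capping idea (assuming max_ack_delay is the peer's advertised transport parameter and ack_delay comes from the ACK frame):

    # Cap the reported delay at the advertised maximum before compensating,
    # so a delayed-ack alarm that persistently fires late shows up in SRTT.
    ack_delay = min(ack_delay, max_ack_delay)
    latest_rtt = max(latest_rtt - ack_delay, min_rtt)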

@kazuho
Member Author

kazuho commented Nov 30, 2018

Yeah. Discontinuity sounds like the best word to describe the issue with the current approach.

There is the question of what to do if the reported ack_delay is larger than the value specified in the transport params. Possibly we should only adjust by min(ack_delay, max_ack_delay), and then if the alarm is always waking up late, SRTT will include that extra delay even though max_ack_delay does not?

I am not sure min(ack_delay, max_ack_delay) is desirable, because ack_delay would typically be greater than max_ack_delay for retransmitted ACKs.

@ianswett
Contributor

In regards to min(ack_delay, max_ack_delay), it's critical to consider what srtt is used for, which is loss detection and timers (TLP and RTO).

If there is a high rate of ACK loss or if the peer's alarms are persistently waking up late, then the SRTT and/or RTTVar will increase, thereby increasing these timers. If this happens very infrequently, then the EWMA will forget them relatively quickly. I think that seems appropriate?
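
For reference, a sketch of the RFC 6298-style EWMA that the smoothed_rtt/rttvar update follows (the draft's pseudocode may differ in detail):

    # An occasionally inflated sample nudges smoothed_rtt and rttvar upward,
    # then decays out of the estimate over the next several samples.
    def update_estimator(smoothed_rtt, rttvar, adjusted_rtt):
        rttvar = 3 / 4 * rttvar + 1 / 4 * abs(smoothed_rtt - adjusted_rtt)
        smoothed_rtt = 7 / 8 * smoothed_rtt + 1 / 8 * adjusted_rtt
        return smoothed_rtt, rttvar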

To think of the pathological case, what if a peer communicates a max_ack_delay of 0? I think it's important that the rest of the recovery machinery still functions correctly in that case.

@kazuho
Member Author

kazuho commented Nov 30, 2018

If there is a high rate of ACK loss or if the peer's alarms are persistently waking up late, then the SRTT and/or RTTVar will increase, thereby increasing these timers. If this happens very infrequently, then the EWMA will forget them relatively quickly. I think that seems appropriate?

I had not thought about that, and admittedly I am unfamiliar with loss recovery; however, I wonder if that's appropriate.

IIUC, the reason we use X * SRTT, where X is slightly greater than one, is that we expect an ACK to arrive within SRTT plus some jitter.

However, in the case of ACK loss, the SRTT computed this way would become too large for the case where there is no loss, or too small for the case where there is a loss. For example, with 33% ACK loss, SRTT becomes 33% greater than the RTT. I am not sure that is a desirable behavior.

If we can assume that ACK loss happens infrequently, IMO the correct thing to do would be to ignore the effect of ACK retransmissions.


@ianswett
Contributor

It's important to consider the fact that typically, even if there is some ACK loss, so many immediate ACKs are being sent (particularly when acking every other packet) that the skew is minimal. The interesting case is at the tail, where the final ACK is lost and the next ACK arrives a long time afterwards.

@kazuho
Member Author

kazuho commented Nov 30, 2018

I think I might have a simpler case that illustrates the problem of capping the compensation to min(ack_delay, max_ack_delay).

What happens if an endpoint receives an ACK-only packet, sleeps for 10 seconds, then sends some data? The sent packet will contain data and an ACK frame with an ack-delay field of 10 seconds.

Don't we need to subtract 10 seconds in this case?

@janaiyengar
Contributor

Responding to the most recent couple of comments first: SRTT does not include ack_delay, which is why we add max_ack_delay back into the RTO computation. So, even in the case of ack loss, the SRTT should be the same... the reported ack_delay might keep increasing.
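
A sketch of that RTO-style computation (the smoothed_rtt + 4 * rttvar shape is from the draft; treat the exact form here as approximate):

    # Since ack_delay is excluded from smoothed_rtt, the expected peer delay
    # is added back when arming the retransmission timeout.
    def rto_timeout(smoothed_rtt, rttvar, max_ack_delay):
        return smoothed_rtt + 4 * rttvar + max_ack_delay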

There is the question of what to do if ack_delay gets larger than max_ack_delay, which is why I had argued that we should prefer observed ack_delays over explicitly stated ones (in an earlier thread). I think we had discussed monitoring ack_delays and using that value over time, but left it for later... Ideally, we'd use an EWMA or some such to track actual ack_delays to see if they're increasing, and add that into the RTO computation.

@ianswett
Contributor

I think there should be a goal of making the case where ack_delay is always 0 and the case where max_ack_delay is 0 converge to the same behavior?

I think the old approach was problematic because it used a max filter for something that potentially had some loss and tail latency that was not typical, but could permanently drive up tail latency.

As such, I think limiting the adjustment to no more than max_ack_delay is a good real-world compromise.

@janaiyengar
Contributor

I understand the argument about sudden increases when the ack_delay is increased... but that is exactly my point as well. It's easy to see a sudden decrease in latest_rtt with the patch when ack_delay is increased.

Consider a clock adjustment at the client where the clock "caught up" with real time (I've seen this happen), and the reported ack_delay is quite large. If the actual RTT had departed a fair bit from min_rtt, then this single misreported ack_delay will cause the sender's estimate of latest_rtt to drop to min_rtt.

As @ianswett notes, what matters is where this erroneous value -- too small or too large -- gets used. It's used in RTO, TLP, and perhaps in pacing. My argument is that having too large a value -- as would happen with the spec -- is better than having too small a value, because that's the conservative thing to do.

For context, we should be mindful that any sudden and occasional increase/decrease in ack_delay is easily absorbed by the RTT filters; that's in fact exactly why the EWMA filter is weighted as it is. Any single sample does not change the SRTT or RTTVAR by much. Individual samples can be noisy. Ultimately, as long as it's an RTT spike, it actually doesn't matter much. It only matters if it's persistent.

@janaiyengar
Contributor

janaiyengar commented Nov 30, 2018

@ianswett : I agree that we compromised. I was just noting that the ideal solution might be to use an EWMA, but we're settling for simplicity right now. That has a cost... and these corner cases are that cost, I think.

I'm tempted to suggest that we simply drop the RTT samples where the ack_delay does not make sense. This should not be the common case anyway.

@kazuho
Member Author

kazuho commented Nov 30, 2018

Thanks to @ianswett and @janaiyengar for all the conversation.

For connections that are advancing, I now think that what @ianswett says makes sense; capping the compensation to min(ack_delay, max_ack_delay) would not be an issue, because ACK loss would not be happening too often in that case.

However, in the case of a heavily congested network (or after a disruption), capping the number means that we could have an unreasonably high SRTT. Seeing that happen is why I raised the issue.

Considering that, I think I might prefer @janaiyengar's suggestion to skip updating RTT information based on ACK frames with ack_delays that do not make sense.

@janaiyengar
Contributor

janaiyengar commented Nov 30, 2018

@kazuho, I see two issues here that we are discussing, and I'll state them explicitly, to ensure that I'm not confused :-)

  1. The issue you opened was about having an ack_delay reported that doesn't make sense, since it leads to a latest_rtt that is too small. This is what I was suggesting we solve by ignoring RTT samples when the ack_delay is too large.

  2. The issue both of you have raised about ack_delay being larger than max_ack_delay is an important but separate one. The easiest way out, as @ianswett says, is to let SRTT and RTTVAR go up, by capping the actual ack_delay to min(ack_delay, max_ack_delay). After all, we include only max_ack_delay in the RTO computation... any additional actual ack delays should be accounted for somewhere and SRTT seems like a fine place. Alternatively, we could do an EWMA to manage ack_delay, but that is more mechanism.

Does this seem right?

@ianswett
Contributor

Yes, those are the two issues, and they're somewhat separate.

For 1, I'm concerned we could ignore samples for a long time under some circumstances. If the RTT is 25ms and a connection is always sending 1 packet and hitting the delayed ack timer, then the min_rtt will be 50ms, not 25ms. And I believe ignoring SRTT samples that are less than min_rtt would result in ignoring every SRTT sample until an immediate ack was sent. Admittedly, we say to ACK crypto packets immediately, but if one used a 1ms timer as 'immediate', I think the same problem results?

For 2, I think putting it into SRTT is the natural thing. Adding another EWMA doesn't seem worthwhile, and in most cases it is very similar to incorporating it into SRTT.

@ianswett
Contributor

@kazuho up above you were concerned about a 10 second RTT sample due to bundling an ACK of an ACK. That should not be an issue, because an RTT sample should only be taken when the largest acked is a retransmittable packet.

But now that I look at the text and pseudocode, the retransmittable qualification is missing, so I'll write up a PR to fix that.

@dtikhonov
Member

@janaiyengar writes:

Consider a clock adjustment at the client where the clock "caught up" with real time (I've seen this happen), and the reported ack_delay is quite large.

I hope you reported this as a bug to the client's implementers: most operating systems provide a monotonic clock.

@kazuho
Member Author

kazuho commented Nov 30, 2018

@ianswett

@kazuho up above you were concerned about a 10 second RTT sample due to bundling an ACK of an ACK. That should not be an issue, because an RTT sample should only be taken when the largest acked is a retransmittable packet.

But now that I look at the text and pseudocode, the retransmittable qualification is missing, so I'll write up a PR to fix that.

Thank you for the clarification and the PR. They make me less worried about the issue :-)

ianswett added a commit that referenced this issue Dec 5, 2018
@janaiyengar
Contributor

@dtikhonov : The clock jumped up. It was monotonic.

@janaiyengar
Contributor

@ianswett: Responding to your comment (here) and PR #2062. For (1), yes, that is true, but we have already put a stake in the ground: min_rtt is only driven by what a sender measures, not by reported ack_delay. Under that framework, no SRTT that is smaller than the measured min_rtt will be valid. That is a consequence we know, and hopefully we are measuring the true min_rtt often enough that this isn't a problem.

Basically, my core argument is this: SRTT is used for loss detection and potentially congestion control and pacing.

  • A too-small SRTT is more aggressive: it causes too-early loss detection and might cause a congestion controller that uses RTT samples to get a one-off min_rtt sample. It might also cause a pacer to go a bit too fast.
  • A too-large SRTT is more conservative.

Given that this is a mis-reporting receiver, a natural consequence of mis-reporting should be poor performance, not more aggressive sending.

I still think the sender should ignore a computed latest_rtt sample which seems to be smaller than min_rtt.

@ianswett
Contributor

ianswett commented Dec 5, 2018

There are certainly cases where this does not require a mis-reporting receiver (i.e., what I described above), but I think your argument is that you don't care about them? If so, this does make it somewhat critical to send a reasonable number of non-delayed ACKs to ensure min_rtt is discovered relatively quickly.

I completely agree that I don't want to change the definition of min_rtt, FWIW.

@janaiyengar
Contributor

@ianswett : Sorry, I was referring to a mis-reporting receiver in the case that the min_rtt is known.

In the case you mention, yes, that is a potential condition where a sender might have poor performance too, but I imagine this is not something we would expect to happen for very long, especially given that all the handshake packets should be acked immediately. Using a 1ms timer for 'immediate' means that's just a part of the min_rtt... nothing a sender can do about that.

Basically, I don't think optimistically compensating at the sender for receiver delays is the way to go. If the RTT sample seems broken, I would argue it should be ignored.

@ianswett
Contributor

ianswett commented Dec 7, 2018

As commented on the PR, I'm increasingly thinking we should limit by max_ack_delay and then if it still seems invalid, ignore ack_delay, as in the current text, but not ignore the entire sample, since even without the ack_delay adjustment, we expect the RTT sample to be somewhat useful.
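
A minimal sketch of that combined behavior (assuming this lives in UpdateRtt and that max_ack_delay is the peer's advertised transport parameter):

    # Cap the reported delay, subtract it only if the result stays at or
    # above min_rtt, and otherwise ignore ack_delay but keep the raw sample.
    def adjust_latest_rtt(latest_rtt, min_rtt, ack_delay, max_ack_delay):
        ack_delay = min(ack_delay, max_ack_delay)
        if latest_rtt - ack_delay >= min_rtt:
            return latest_rtt - ack_delay
        return latest_rtt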

@janaiyengar
Contributor

Limiting to max_ack_delay sounds fine to me as I've said earlier.

If the RTT sample is used without the ack_delay adjustment (as in the current spec text) instead of being ignored, this causes the SRTT to increase, which is safe. I'm fine with it.

@kazuho?

@kazuho
Member Author

kazuho commented Dec 7, 2018

Limiting to max_ack_delay sounds good to me, especially considering the fact that now we only sample ACK-eliciting packets.

@ianswett
Contributor

ianswett commented Dec 7, 2018

SGTM, closing the PR and this issue then. We can always revisit it if we have some new data that indicates the current text is a problem.

@ianswett ianswett closed this as completed Dec 7, 2018
@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019