
compensation of ack_delay is fragile against errors #2060

Closed
kazuho opened this issue Nov 28, 2018 · 36 comments
Labels
-recovery · design (An issue that affects the design of the protocol; resolution requires consensus.) · has-consensus (An issue that the Chairs have determined has consensus, by canvassing the mailing list.)

Comments

@kazuho
Member

kazuho commented Nov 28, 2018

The logic in the UpdateRtt function that compensates for ack_delay is as follows:

    // Adjust for ack delay if it's plausible.
    if (latest_rtt - min_rtt > ack_delay):
      latest_rtt -= ack_delay

The issue here is that the delay is completely ignored if ack_delay is even slightly greater than the true value due to measurement errors, etc. If this happens on a congested network where ACKs get lost, latest_rtt ends up at an extremely high value.

IMO this should be latest_rtt = max(latest_rtt - ack_delay, min_rtt).
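
To make the discontinuity concrete, here is a minimal sketch in Python-style pseudocode with hypothetical example values (the numbers are illustrative, not from any measurement):

    # Hypothetical values: the true ack delay is 30ms, but the peer reports 31ms.
    min_rtt = 100     # smallest RTT observed so far (ms)
    latest_rtt = 130  # raw RTT sample for this ACK (ms)
    ack_delay = 31    # reported ack delay (ms), slightly over-reported

    # Current draft behavior: the guard fails (130 - 100 = 30 is not > 31),
    # so the sample keeps the full ack delay and stays at 130ms.
    current = latest_rtt - ack_delay if latest_rtt - min_rtt > ack_delay else latest_rtt

    # Proposed behavior: clamp to min_rtt instead of skipping the adjustment.
    proposed = max(latest_rtt - ack_delay, min_rtt)   # 100ms, close to the ~100ms truth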

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -recovery labels Nov 28, 2018
@ianswett
Contributor

Thanks, I think that's a good suggestion. Writing a PR now.

@janaiyengar
Contributor

You still want to have the if check, to avoid a large value due to overflow, which unfortunately leaves us in the same place. I think what you want is:

    if (latest_rtt - min_rtt > ack_delay)
        latest_rtt -= ack_delay
    else
        latest_rtt = min_rtt

That said, I'm not sure that this is a good change. Basically if you have a bad clock at the receiver, this means that you constantly underestimate the latest RTT sample. That is bad for two reasons: (i) you miss any queues that might be building up because you underestimate latest_rtt, and (ii) the smoothed_rtt value that might be used for pacing uses this underestimated value.

OTOH, in the current world, if the receiver is consistently bad at reporting ack_delay, the sender becomes less aggressive than it should be, since the latest_rtt is larger than it should be. Pacing rate is lower, and any sender queueing estimate that uses latest_rtt detects a queue when there isn't one. This isn't unsafe, it's just poor performance, which should be expected for a receiver with a poor clock.

@martinthomson
Member

That code produces the same outcome as Ian's patch, though it's 4 lines instead of 1. I tend to think that this sort of defensive guard is useful though. If a peer under- or over-reports the ACK delay, that's going to produce bad results, but someone with a clock that janky is going to have a bad time no matter what.

@kazuho
Member Author

kazuho commented Nov 29, 2018

@janaiyengar

I think what you want is:

   if (latest_rtt - min_rtt > ack_delay)
       latest_rtt -= ack_delay
   else
       latest_rtt = min_rtt

Isn't that mathematically equivalent to latest_rtt = max(latest_rtt - ack_delay, min_rtt)?

Basically if you have a bad clock at the receiver, this means that you constantly underestimate the latest RTT sample.

What type of behavior are you assuming for a "bad clock"?

The issue with the current approach is that it considers the true ack_delay to be zero when the value reported by the peer is huge.

Assuming that we cannot tell how good the clock of the peer (or the endpoint) is, we should round ack_delay to the nearest reasonable value. That is, if ack_delay is so large that latest_rtt would fall below min_rtt, clamp ack_delay to latest_rtt - min_rtt. That is what the issue proposes.

@janaiyengar
Contributor

@kazuho @martinthomson : Mathematically, yes, but I wanted to protect against overflow.

@kazuho: I'm concerned that if the ack_delay being reported is too large, I don't know how much of that is in error and how much of that is accurate. The patch assumes that exactly latest_rtt - min_rtt is the correct ack_delay, but there's no evidence of it. The problem with the proposed patch is that a client that reports an ack_delay that is too large will automatically cause the sender to always believe that the RTT is not growing at all. (Imagine the case where ack_delay is always reported as MAX_INT. This means that the sender will always end up with latest_rtt = min_rtt.)
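
As a minimal illustration of that concern (hypothetical numbers, applying the proposed max() formula):

    # A peer that always reports a huge ack_delay pins the proposed
    # adjustment at min_rtt, hiding any real growth in the RTT.
    min_rtt = 100            # ms
    latest_rtt = 180         # ms; the path has actually gotten slower
    ack_delay = 2**62 - 1    # effectively MAX_INT
    adjusted = max(latest_rtt - ack_delay, min_rtt)   # always 100ms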

I'm suggesting that if there's an indication that the peer is feeding in values that are too large for ack_delay (which should not happen consistently or all the time), then we should ignore ack_delay.

@martinthomson
Member

I thought that we had accepted the costs of integrating ack_delay in calculations (it might be wrong) against the benefits (we get better accounting in the majority of the cases where it is set accurately).

@janaiyengar
Contributor

janaiyengar commented Nov 29, 2018 via email

@martinthomson
Member

Would it be better to disregard ack_delay for all future packets on the connection then?

@janaiyengar
Contributor

janaiyengar commented Nov 29, 2018 via email

@martinthomson
Member

I don't feel strongly. I'm just trying to understand. It's a little odd that we would take ack_delay into account conditionally on a per-packet basis.

@janaiyengar
Contributor

In general, I think having an inflated RTT is better than a lower than actual RTT, since inflated RTT leads to a less aggressive sender and smaller than actual RTT leads to a more aggressive sender.

Admittedly, it's possible for there to be error when latest_rtt - (inflated) ack_delay > min_rtt. That's the error that I had considered an accepted cost of using ack_delay.

It's fair to ask what we should do about a receiver that we suspect might have a bad clock... I wouldn't be opposed to ignoring ack_delay entirely for the rest of the connection... I wonder if there are counter cases that I'm missing, though, where this ends up being too aggressive. In general, the receiver should be rounding down when reporting delay, not up.

@kazuho
Member Author

kazuho commented Nov 29, 2018

@janaiyengar

I'm concerned that if the ack_delay being reported is too large, I don't know how much of that is in error and how much of that is accurate. The patch assumes that exactly latest_rtt - min_rtt is the correct ack_delay, but there's no evidence of it.

I do not think that that is an issue, because we always use max(srtt, latest_rtt) for loss recovery. Things would work fine as long as SRTT does not become too small.
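
For context, a rough sketch of where that max() appears in time-based loss detection (the 9/8 factor is an assumption here; the draft's constant may differ):

    # Loss recovery keys off the larger of the two estimates, so an
    # occasional too-small latest_rtt sample does not shorten the loss timer.
    kTimeThreshold = 9 / 8   # assumed reordering factor

    def loss_delay(latest_rtt, smoothed_rtt):
        return kTimeThreshold * max(latest_rtt, smoothed_rtt)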

If the clock has jitter, the observed SRTT using the proposed formula will end up either equal to or slightly higher than the true SRTT.

If the clock occasionally jumps (to adjust time), the observed SRTT using the proposed formula would also be either equal to or slightly higher than the true SRTT.

So I would assume that the proposed formula has no issues in the cases above.

The problem with the proposed patch is that a client that reports an ack_delay that is too large will automatically cause the sender to always believe that the RTT is not growing at all. (Imagine the case where ack_delay is always reported as MAX_INT. This means that the sender will always end up with latest_rtt = min_rtt.)

I can understand your concern about broken implementations, but I am not sure the current defense is good enough if we do need to take care of them. Consider the case of an endpoint that always sets ack_delay to 25ms. Assuming that the true RTT is greater than that value, the observed RTT will be 25ms less than the true value.

To put it another way, it seems to me that there is no reliable way of detecting peers that send broken ack_delay values. IMO we should accept that QUIC would not work well with broken stacks.

Or alternatively, we might want to state that QUIC stacks running with an unreliable clock SHOULD set ack_delay to zero.

@ianswett
Contributor

FYI, my experience with clocks is that they're either noisy, and if so over very short timescales (<1ms), or they jump suddenly. In the case of a sudden jump, any protection that largely ignores the jump should be fine.

I like Kazuho's suggestion because it changes a case that's currently discontinuous (1us of extra ack delay and the RTT sample can jump by tens of ms) into one that's continuous. Discontinuities in congestion control and loss recovery tend to lead to weird corner cases. The current pseudocode is what gQUIC currently does, which is why it's there, but especially now that max ack delay is explicitly communicated, I think Kazuho's suggestion is a good one.

There is the question of what to do if the reported ack_delay is larger than the value specified in the transport params. Possibly we should only adjust by min(ack_delay, max_ack_delay), and then if the alarm is always waking up late, SRTT will include that extra delay even though max_ack_delay does not?
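
A minimal sketch of that capping idea (assuming max_ack_delay is the peer's advertised transport parameter and ack_delay comes from the ACK frame):

    # Cap the reported delay at the advertised maximum before compensating,
    # so a delayed-ack alarm that persistently fires late shows up in SRTT.
    ack_delay = min(ack_delay, max_ack_delay)
    latest_rtt = max(latest_rtt - ack_delay, min_rtt)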

@kazuho
Member Author

kazuho commented Nov 30, 2018

Yeah. Discontinuity sounds like the best word to describe the issue with the current approach.

There is the question of what to do if the reported ack_delay is larger than the value specified in the transport params. Possibly we should only adjust by min(ack_delay, max_ack_delay), and then if the alarm is always waking up late, SRTT will include that extra delay even though max_ack_delay does not?

I am not sure min(ack_delay, max_ack_delay) is desirable, because ack_delay would typically be greater than max_ack_delay for retransmitted ACKs.

@ianswett
Contributor

In regards to min(ack_delay, max_ack_delay), it's critical to consider what srtt is used for, which is loss detection and timers (TLP and RTO).

If there is a high rate of ACK loss or if the peer's alarms are persistently waking up late, then the SRTT and/or RTTVar will increase, thereby increasing these timers. If this happens very infrequently, then the EWMA will forget them relatively quickly. I think that seems appropriate?
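
For reference, a sketch of the RFC 6298-style EWMA that the smoothed_rtt/rttvar update follows (the draft's pseudocode may differ in detail):

    # An occasionally inflated sample nudges smoothed_rtt and rttvar upward,
    # then decays out of the estimate over the next several samples.
    def update_estimator(smoothed_rtt, rttvar, adjusted_rtt):
        rttvar = 3 / 4 * rttvar + 1 / 4 * abs(smoothed_rtt - adjusted_rtt)
        smoothed_rtt = 7 / 8 * smoothed_rtt + 1 / 8 * adjusted_rtt
        return smoothed_rtt, rttvar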

To think of the pathological case, what if a peer communicates a max_ack_delay of 0? I think it's important that the rest of the recovery machinery still functions correctly in that case.

@kazuho
Member Author

kazuho commented Nov 30, 2018

If there is a high rate of ACK loss or if the peer's alarms are persistently waking up late, then the SRTT and/or RTTVar will increase, thereby increasing these timers. If this happens very infrequently, then the EWMA will forget them relatively quickly. I think that seems appropriate?

I had not thought about that, and admittedly I am unfamiliar with loss recovery; however, I wonder if that's appropriate.

IIUC, the reason we use X * SRTT, where X is slightly greater than one, is that we expect an ACK to arrive within SRTT plus some jitter.

However, in the case of ACK loss, the SRTT computed this way would become too large for the case where there is no loss, or too small for the case where there is a loss. For example, with 33% ACK loss, SRTT becomes 33% greater than the RTT. I am not sure that is a desirable behavior.

If we can assume that ACK loss happens infrequently, IMO the correct thing to do would be to ignore the effect of ACK retransmissions.


@ianswett
Contributor

It's important to consider the fact that typically, even if there is some ACK loss, so many immediate ACKs are being sent (particularly when acking every other packet) that the skew is minimal. The interesting case is at the tail, where the final ACK is lost and the next ACK arrives a long time afterwards.

@kazuho
Member Author

kazuho commented Nov 30, 2018

I think I might have a simpler case that illustrates the problem of capping the compensation to min(ack_delay, max_ack_delay).

What happens if an endpoint receives an ACK-only packet, sleeps for 10 seconds, then sends some data? The sent packet will contain data and an ACK frame with an ack-delay field of 10 seconds.

Don't we need to subtract 10 seconds in this case?

@janaiyengar
Contributor

Responding to the most recent couple of comments first: SRTT does not include ack_delay, which is why we add max_ack_delay back into the RTO computation. So, even in the case of ack loss, the SRTT should be the same... the reported ack_delay might keep increasing.
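
A sketch of that RTO-style computation (the smoothed_rtt + 4 * rttvar shape is from the draft; treat the exact form here as approximate):

    # Since ack_delay is excluded from smoothed_rtt, the expected peer delay
    # is added back when arming the retransmission timeout.
    def rto_timeout(smoothed_rtt, rttvar, max_ack_delay):
        return smoothed_rtt + 4 * rttvar + max_ack_delay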

There is the question of what to do if ack_delay gets larger than max_ack_delay, which is why I had argued that we should prefer observed ack_delays over explicitly stated ones (in an earlier thread). I think we had discussed monitoring ack_delays and using that value over time, but left it for later... Ideally, we'd use an EWMA or some such to track actual ack_delays to see if they're increasing, and add that into the RTO computation.

@ianswett
Contributor

I think there should be a goal of making the case where ack_delay is always 0 and the case where max_ack_delay is 0 converge to the same behavior?

I think the old approach was problematic because it used a max filter for something that potentially had some loss and tail latency that was not typical, but could permanently drive up tail latency.

As such, I think limiting the adjustment to no more than max_ack_delay is a good real-world compromise.

@janaiyengar
Contributor

I understand the argument about sudden increases when the ack_delay is increased... but that is exactly my point as well. It's easy to see a sudden decrease in latest_rtt with the patch when ack_delay is increased.

Consider a clock adjustment at the client where the clock "caught up" with real time (I've seen this happen), and the reported ack_delay is quite large. If the actual RTT had departed a fair bit from min_rtt, then this single misreported ack_delay will cause the sender's estimate of latest_rtt to drop to min_rtt.

As @ianswett notes, what matters is where this erroneous value -- too small or too large -- gets used. It's used in RTO, TLP, and perhaps in pacing. My argument is that having too large a value -- as would happen with the spec -- is better than having too small a value, because that's the conservative thing to do.

For context, we should be mindful that any sudden and occasional increase/decrease in ack_delay is easily absorbed by the RTT filters; that's in fact exactly why the EWMA filter is weighted as it is. Any single sample does not change the SRTT or RTTVAR by much. Individual samples can be noisy. Ultimately, as long as it's an RTT spike, it actually doesn't matter much. It only matters if it's persistent.

@janaiyengar
Contributor

janaiyengar commented Nov 30, 2018

@ianswett : I agree that we compromised. I was just noting that the ideal solution might be to use an EWMA, but we're settling for simplicity right now. That has a cost... and these corner cases are that cost, I think.

I'm tempted to suggest that we simply drop the RTT samples where the ack_delay does not make sense. This should not be the common case anyway.

@kazuho
Member Author

kazuho commented Nov 30, 2018

Thanks to @ianswett and @janaiyengar for all the conversation.

For connections that are advancing, I now think that what @ianswett says makes sense; capping the compensation to min(ack_delay, max_ack_delay) would not be an issue, because ACK loss would not be happening too often in that case.

However, in the case of a heavily congested network (or after a disruption), capping the number means that we could have an unreasonably high SRTT. Seeing that happen is why I raised the issue.

Considering that, I think I might prefer @janaiyengar's suggestion to skip updating RTT information based on ACK frames with ack_delays that do not make sense.

@janaiyengar
Contributor

janaiyengar commented Nov 30, 2018

@kazuho, I see two issues here that we are discussing, and I'll state them explicitly, to ensure that I'm not confused :-)

  1. The issue you opened was about having an ack_delay reported that doesn't make sense, since it leads to a latest_rtt that is too small. This is what I was suggesting we solve by ignoring RTT samples when the ack_delay is too large.

  2. The issue both of you have raised about ack_delay being larger than max_ack_delay is an important but separate one. The easiest way out, as @ianswett says, is to let SRTT and RTTVAR go up, by capping the actual ack_delay to min(ack_delay, max_ack_delay). After all, we include only max_ack_delay in the RTO computation... any additional actual ack delays should be accounted for somewhere and SRTT seems like a fine place. Alternatively, we could do an EWMA to manage ack_delay, but that is more mechanism.

Does this seem right?

@ianswett
Contributor

Yes, those are the two issues, and they're somewhat separate.

For 1, I'm concerned we could ignore samples for a long time under some circumstances. If the RTT is 25ms and a connection is always sending 1 packet and hitting the delayed ack timer, then the min_rtt will be 50ms, not 25ms. And I believe ignoring SRTT samples that are less than min_rtt would result in ignoring every SRTT sample until an immediate ack was sent. Admittedly, we say to ACK crypto packets immediately, but if one used a 1ms timer as 'immediate', I think the same problem results?

For 2, I think putting it into SRTT is the natural thing. Adding another EWMA doesn't seem worthwhile, and in most cases it is very similar to incorporating it into SRTT.

@ianswett
Contributor

@kazuho up above you were concerned about a 10 second RTT sample due to bundling an ACK of an ACK. That should not be an issue, because an RTT sample should only be taken when the largest acked is a retransmittable packet.

But now that I look at the text and pseudocode, the retransmittable qualification is missing, so I'll write up a PR to fix that.

@dtikhonov
Member

@janaiyengar writes:

Consider a clock adjustment at the client where the clock "caught up" with real time (I've seen this happen), and the reported ack_delay is quite large.

I hope you reported this as a bug to the client's implementers: most operating systems provide a monotonic clock.

@kazuho
Member Author

kazuho commented Nov 30, 2018

@ianswett

@kazuho up above you were concerned about a 10 second RTT sample due to bundling an ACK of an ACK. That should not be an issue, because an RTT sample should only be taken when the largest acked is a retransmittable packet.

But now that I look at the text and pseudocode, the retransmittable qualification is missing, so I'll write up a PR to fix that.

Thank you for the clarification and the PR. They make me less worried about the issue :-)

ianswett added a commit that referenced this issue Dec 5, 2018
@janaiyengar
Contributor

@dtikhonov : The clock jumped up. It was monotonic.

@janaiyengar
Contributor

@ianswett: Responding to your comment (here) and PR #2062. For (1), yes, that is true, but we have already put a stake in the ground: min_rtt is only driven by what a sender measures, not by reported ack_delay. Under that framework, no SRTT that is smaller than the measured min_rtt will be valid. That is a consequence we know, and hopefully we are measuring the true min_rtt often enough that this isn't a problem.

Basically, my core argument is this: SRTT is used for loss detection and potentially congestion control and pacing.

  • A too-small SRTT is more aggressive: it causes too-early loss detection and might cause a congestion controller that uses RTT samples to get a one-off min_rtt sample. It might also cause a pacer to go a bit too fast.
  • A too-large SRTT is more conservative.

Given that this is a mis-reporting receiver, a natural consequence of mis-reporting should be poor performance, not more aggressive sending.

I still think the sender should ignore a computed latest_rtt sample which seems to be smaller than min_rtt.

@ianswett
Contributor

ianswett commented Dec 5, 2018

There are certainly cases where this does not require a mis-reporting receiver (i.e., what I described above), but I think your argument is that you don't care about them? If so, this does make it somewhat critical to send a reasonable number of non-delayed ACKs to ensure min_rtt is discovered relatively quickly.

I completely agree that I don't want to change the definition of min_rtt, FWIW.

@janaiyengar
Contributor

@ianswett : Sorry, I was referring to a mis-reporting receiver in the case that the min_rtt is known.

In the case you mention, yes, that is a potential condition where a sender might have poor performance too, but I imagine this is not something we would expect to happen for very long, especially given that all the handshake packets should be acked immediately. Using a 1ms timer for 'immediate' means that's just a part of the min_rtt... nothing a sender can do about that.

Basically, I don't think optimistically compensating at the sender for receiver delays is the way to go. If the RTT sample seems broken, I would argue it should be ignored.

@ianswett
Contributor

ianswett commented Dec 7, 2018

As commented on the PR, I'm increasingly thinking we should limit by max_ack_delay and then if it still seems invalid, ignore ack_delay, as in the current text, but not ignore the entire sample, since even without the ack_delay adjustment, we expect the RTT sample to be somewhat useful.
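
A minimal sketch of that combined behavior (assuming this lives in UpdateRtt and that max_ack_delay is the peer's advertised transport parameter):

    # Cap the reported delay, subtract it only if the result stays at or
    # above min_rtt, and otherwise ignore ack_delay but keep the raw sample.
    def adjust_latest_rtt(latest_rtt, min_rtt, ack_delay, max_ack_delay):
        ack_delay = min(ack_delay, max_ack_delay)
        if latest_rtt - ack_delay >= min_rtt:
            return latest_rtt - ack_delay
        return latest_rtt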

@janaiyengar
Contributor

Limiting to max_ack_delay sounds fine to me as I've said earlier.

If the RTT sample is used without the ack_delay adjustment (as in the current spec text) instead of being ignored, this causes the SRTT to increase, which is safe. I'm fine with it.

@kazuho?

@kazuho
Member Author

kazuho commented Dec 7, 2018

Limiting to max_ack_delay sounds good to me, especially considering the fact that now we only sample ACK-eliciting packets.

@ianswett
Contributor

ianswett commented Dec 7, 2018

SGTM, closing the PR and this issue then. We can always revisit it if we have some new data that indicates the current text is a problem.

@ianswett ianswett closed this as completed Dec 7, 2018
@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019