Unify TLP and RTO into Probe Timeout #2114
Conversation
In general, I like the simplification here. It would be interesting to see the results of an A/B test before and after these changes.
I like this change in general, but I think it makes sense to keep existing behavior more similar to the status quo and allow a sender to do a normal loss-style reduction if the first or second PTO are the ones acknowledged.
Haven't reviewed, but these might help
Does this have an issue?
Looking much better.
Specific suggestion in pseudocode to prevent a single packet loss from leading to massive CWND reduction.
I've updated the description text and the PR quite a bit based on the comments. Specifically, I've removed the verification of a timeout, since the behavior with and without it was the same. I've also added a clause that allows the cwnd to be collapsed to the minimum cwnd only after 2 PTOs, to retain parity with TLP/RTO behavior. I'll open an issue to discuss the problems this PR addresses; we should have the discussion there.
Worthy change. Really nice that we can get rid of minRTO. Adding some comments.
draft-ietf-quic-recovery.md
Outdated
flight but an acknowledgement is not received within the expected period of
time. A PTO enables a connection to recover from loss of tail packets or acks.
The PTO algorithm used in QUIC implements the reliability functions of Tail Loss
Probe {{?TLP=I-D.dukkipati-tcpm-tcp-loss-probe}}, RTO {{?RFC5681}} and F-RTO
TLP is now subsumed by RACK draft so we should cite that instead.
The original implementation and algorithms preceded RACK, so the benefit of referencing TLP is that it's more specific and time-relevant. But if RACK was an RFC, then we would definitely prefer that. Either works for me.
I think the function we're talking about is TLP, though I'm happy to add and cite RACK separately somewhere. I'll do that separately.
TLP is a draft that expired in 2013, while RACK is standards-track and active. Linux, FreeBSD, and Windows are all implementing the RACK draft, and TLP as specified in the RACK draft. I would just cite the RACK draft. There is a chance it may go into last call within the next two IETFs.
draft-ietf-quic-recovery.md
Outdated
Probe packets sent on a PTO MUST be ack-eliciting. A probe packet SHOULD carry
new data when possible. A probe packet MAY carry retransmitted unacknowledged
data when new data is unavailable, when flow control does not permit new data to
be sent, or to opportunistically reduce loss recovery delay. Implementers MAY
"Implementations MAY" is better
draft-ietf-quic-recovery.md
Outdated
acknowledges no packets sent prior to the first retransmission timeout. In this
case, the congestion window MUST be reduced to the minimum congestion window and
slow start is re-entered.
If two consecutive PTOs have occurred (pto_count is at least 3), the network is
I am not sure why pto_count needs to be 3? Shouldn't it be 2 after two PTOs?
+1, I'm confused. Also, this should be more about the number of PTOs that were lost than the number sent. ie: I could send 3 PTOs, then the network returns to normal, no packets were lost, and the connection continues.
This is based on the number of timeouts that happened, not based on packets sent. We want cwnd collapse at the equivalent of the first RTO's occurrence. This is the third timeout (2 TLP timeouts and then 1 RTO), which means pto_count is at least 3. Or am I doing this math wrong?
To me it seems that "pto_count is at least 3" is correct, however "If two consecutive PTOs have occurred" is incorrect and needs to be changed to "If three...".
IIRC Ian was going to get data on whether we need 2 TLPs or whether just one is sufficient. FWIW, we only do one TLP in Windows TCP. Yes, at the least please fix it as Kazuho says: "if three consecutive PTOs have occurred".
Ah, right. Ok, fixed. I'll leave it at 3 for now, and I've turned it into a constant, kPersistentCongestion. We can discuss the value in a separate issue.
Thanks for the update, this looks very close. I'm concerned about the gap between number of PTOs that have occurred vs the number that were lost, but possibly it doesn't matter much in practice?
draft-ietf-quic-recovery.md
Outdated
time. A PTO enables a connection to recover from loss of tail packets or acks.
The PTO algorithm used in QUIC implements the reliability functions of Tail Loss
Probe {{?TLP=I-D.dukkipati-tcpm-tcp-loss-probe}}, RTO {{?RFC5681}} and F-RTO
algorithms for TCP {{?RFC5682}}, and the computation is based on TCP's
"the computation" is vague. How about "the timeout computation"?
draft-ietf-quic-recovery.md
Outdated
@@ -473,8 +474,7 @@ consecutive PTO expiration due to a single packet loss.
 Consecutive PTO periods increase exponentially, and as a result, connection
 recovery latency increases exponentially as packets continue to be dropped in
 the network. Sending two packets on PTO expiration increases resilience to
-packet drop in both directions, thus reducing the probability of consecutive PTO
-events.
+packet drop, thus reducing the probability of consecutive PTO events.
packet drops
Thanks for your comments! Responses inline.
draft-ietf-quic-recovery.md
Outdated
@@ -1157,6 +1095,9 @@ window.
 congestion_window *= kLossReductionFactor
 congestion_window = max(congestion_window, kMinimumWindow)
 ssthresh = congestion_window
+// Collapse congestion window if persistent congestion
+if (pto_count > 2):
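As a rough illustration, the hunk above can be sketched as runnable Python. The constant values here (kLossReductionFactor = 0.5, a two-packet minimum window assuming 1200-byte packets, and a threshold of 3 consecutive PTOs) are assumptions for illustration, not normative draft values:

```python
# Illustrative constants -- assumed values, not normative.
kLossReductionFactor = 0.5
kMinimumWindow = 2 * 1200    # two packets, assuming 1200-byte packets
kPersistentCongestion = 3    # consecutive PTOs before cwnd collapse

def on_congestion_event(congestion_window, pto_count):
    """Sketch of the hunk above: reduce cwnd on a congestion event,
    then collapse it to the minimum window if persistent congestion
    (enough consecutive PTOs) has been established."""
    congestion_window *= kLossReductionFactor
    congestion_window = max(congestion_window, kMinimumWindow)
    ssthresh = congestion_window
    # Collapse congestion window if persistent congestion
    if pto_count >= kPersistentCongestion:
        congestion_window = kMinimumWindow
    return congestion_window, ssthresh
```

Below the threshold this is an ordinary multiplicative decrease; at or above it, cwnd drops to the minimum window while ssthresh keeps the halved value, so the sender restarts slow start from the floor.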
2 is a magic number. Previously we said you SHOULD have 2 TLPs, but you could have fewer. I think it makes sense to define this as a constant and explain its reason for existing there?
I commented on the thread above this. Yes we should settle on one or two TLPs. I personally prefer one but unfortunately I don't have data to say if two is better since Windows TCP has been doing just one TLP.
Ok, added constant and explained it.
I think this condition will bring back the issue I raised previously of a longer timeout. Consider this case:
1 [acked], 2, 3 [pto], 4 [pto], 5 [pto]
-> ack 5
Now we get into the state:
1 [acked], 2 [lost], 3 [outstanding], 4 [outstanding], 5 [acked]
Since minCwnd = 2, you still end up consuming the cwnd fully and will have to wait for another PTO to get out of this case. The previous logic intentionally cleared all packets sent before the rto_verified packet to avoid this, which I had raised in a previous issue.
@siyengar : In this case, when 5 is acked, 3 and 4 should already be marked as lost by the time-threshold loss detection mechanism, since they will be more than 1.125 x SRTT in the past.
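A minimal Python sketch of that time-threshold check, where 9/8 expresses the 1.125 x SRTT figure from the comment. The function name and the simplified timing model are illustrative assumptions, not draft pseudocode:

```python
# A packet is deemed lost once it was sent more than
# kTimeThreshold * SRTT before the newly acknowledged packet's
# ack is processed.
kTimeThreshold = 9 / 8  # 1.125

def detect_lost_packets(send_times, ack_time, smoothed_rtt):
    """Return packet numbers whose send time is more than the
    time threshold before ack_time."""
    threshold = kTimeThreshold * smoothed_rtt
    return sorted(pn for pn, sent_at in send_times.items()
                  if ack_time - sent_at > threshold)
```

In the scenario above, with packets 2-4 sent around time 0, the probe (5) sent one RTT later, and its ack arriving shortly after, packets 2, 3, and 4 all exceed the threshold when 5's ack is processed, so the sender need not wait for another PTO.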
Added a few more comments.
LGTM modulo #2114 (comment).
timer be scheduled for a second time. When the TLP timer expires the second
time, a second TLP packet is sent, and an RTO timer SHOULD be scheduled {{rto}}.
~~~
PTO = max(smoothed_rtt + 4*rttvar + max_ack_delay, kGranularity)
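The formula quoted above, together with the exponential backoff discussed in this thread, can be sketched as follows. The 1 ms kGranularity and the use of pto_count as the backoff exponent are assumptions based on the surrounding discussion:

```python
# Sketch of the PTO computation quoted above, plus exponential
# backoff on consecutive PTOs. Assumed values, not draft pseudocode.
kGranularity = 0.001  # 1 ms minimum timer granularity, in seconds

def probe_timeout(smoothed_rtt, rttvar, max_ack_delay, pto_count):
    """Base PTO from the quoted formula, doubled for each
    consecutive PTO that has already occurred."""
    pto = max(smoothed_rtt + 4 * rttvar + max_ack_delay, kGranularity)
    return pto * (2 ** pto_count)
```

For srtt = 100 ms, rttvar = 10 ms, and max_ack_delay = 25 ms, the first PTO is about 165 ms and the second about 330 ms, so even with a max_ack_delay of 0 the second probe fires at roughly 2 x (SRTT + 4 x RTTVAR).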
Sorry, just noticed another difference. If the peer advertises max_ack_delay as 0 (which seems to be legal), then the first PTO becomes SRTT + 4*RTTVAR, whereas in TCP the first PTO is 2*SRTT. If variance is low, this may be too aggressive? Should we also do PTO = max(PTO, 2*SRTT)?
What is the rationale in TCP for using 2xSRTT when RTTVAR is available? If variance is low, that still seems like the right thing to do. Why inflate the PTO when you don't need to?
I also just realized that the first PTO is akin to the first TLP. The second PTO would be backed off, so you will automatically get 2 x (SRTT + 4xRTTVAR).
Quoting RACK draft:
Aiming for a PTO value of 2*SRTT allows a sender to wait long enough
to know that an ACK is overdue. Under normal circumstances, i.e. no
losses, an ACK typically arrives in one SRTT. But choosing PTO to be
exactly an SRTT is likely to generate spurious probes given that
network delay variance and even end-system timings can easily push an
ACK to be above an SRTT. We chose PTO to be the next integral
multiple of SRTT.
Similarly, network delay variations and end-system processing
latencies and timer granularities can easily delay ACKs beyond
2*SRTT, so senders SHOULD add at least 2ms to a computed PTO value
(and MAY add more if the sending host OS timer granularity is more
coarse than 1ms).
@pravb: Thanks for the context -- I should have just looked at the RACK draft. So, to be clear, the RACK draft does not use this 2ms when WCDelAckT is used, that is, when it accounts for the worst case delayed ack timer. This is what we expect max_ack_delay to do for us, but as you note, a receiver might send 0 here.
I think we should put the onus of this onto the receiver. A receiver should send the worst case ack delay in max_ack_delay. A receiver that doesn't do that shoots itself in the foot by causing spurious retransmissions at the sender. We may want to add some strong wording about how a receiver should choose max ack delay, I've filed #2186 for that.
Sound reasonable?
That's reasonable. So this is how we'd justify the deviation from TCP:
- Include the RTTVAR component in the computation to account for in-network variability.
- Require the receiver to include the perturbation for system processing delays. So #2186 needs to add a SHOULD for adding a perturbation value on top of the actual max ack delay. I'll comment there.
I've added a constant (kPersistentCongestion) and described it. PTAL.
draft-ietf-quic-recovery.md
Outdated
Loss Probe (TLP) in TCP {{TLP}} {{RACK}}. Once the number of consecutive PTOs
reaches this threshold - that is, persistent congestion is established - the
sender responds by collapsing its congestion window, similar to a
Retransmission Timeout (RTO) in TCP {{RFC5681}}. The RECOMMENDED value is 3,
The pseudocode below decreases CWND to kMinimumWindow, which is 2 packets. I think sticking with that makes sense, at least for now. It also requires less text to justify.
The recommendation is 3 for kPersistentCongestion, not for the value of cwnd. I'll change the text a bit to clarify.
I really like this PR, since it's a big simplification.
The one thing I don't understand is how we can get rid of the (R/P)TO verified. I think I understand @janaiyengar's response to @siyengar's comment in #2114 (comment), i.e. that when receiving the ACK for the PTO, packets sent before the PTO are declared lost because the time-threshold based loss detection kicks in. But shouldn't that also have applied to RTO verified in the first place? Why did we need RTO verified in the first place then?
I think I found the answer to my question. Sorry for the noise.
@marten-seemann : Exactly. This PR builds on #1974, which started the movement in this direction. This PR fixes multiple issues at the same time because the design that comes out at the end is clearer than the one you'd get by fixing the issues piecewise.
There's exactly one question pending resolution in this PR: @pravb's question about using 2*SRTT for the initial PTO. I'm happy to open a new issue for that specific question if Praveen thinks we should continue that discussion, but I'd like to merge this soon, as (i) I'm hearing pushback on a detail or two but not on the overall PR, which we can handle in separate issues, and (ii) this PR is fast approaching GitHub's limit on comments (hence the unicorn tag).
Thanks Jana, looks even better.
LGTM
Closes #2166, #2168, #1017.
This PR builds on #1974 and simplifies QUIC's timeout-based loss detection mechanisms. Unfortunately, this PR cuts through text in many places, so I'm summarizing the key changes here; the compiled version of the draft may be easier to read than the diffs.
The high-level description is that TLP and RTO are combined in this PR into a single "Probe Timeout" (PTO). Consecutive PTOs are exponentially longer, just as the RTO used to be. The rationale for calling this a "Probe" Timeout is that loss detection in QUIC happens only when an ack is received; this timer and timeout only cause probe packets to be sent out to elicit acks. This unification is made possible by the following constituent changes.
(Issue #1017, "Eliminating the fixed minimum RTO") There was general agreement in #1017 that adding max_ack_delay should allow us to remove the minRTO. max_ack_delay has since been added to QUIC as a transport parameter, which allows us to include it in the PTO computation and remove minRTO/minTLP.
Timer granularity is added for both crypto and PTO timeouts, ensuring that even when the computed network RTT is small enough to be represented as 0, timeouts are set for at least one clock tick. This is in keeping with the use of granularity in TCP (RFC 6298). A minimum granularity of 1 ms is recommended.
(Issue #2166, "TLP and RTO arbitrariness") In line with the resolution above for removing the min RTO, the min TLP and min crypto timeout values are now removed.
(Issue #2168, "Verification of RTO no longer necessary") PTO verification has been removed. PTOs cause the congestion window to collapse to the minimum cwnd only after 3 consecutive PTOs, to retain parity with current behavior, where an RTO collapses the window after 2 TLPs.
One side effect to note is that the single timer starts backing off exponentially from the first PTO, unlike TLP and RTO, where the TLP timers did not back off at all and the RTO timer started backing off exponentially only after the TLPs. I could restore the old behavior by adjusting the backoff expression in SetLossDetectionTimer(), but I'm not convinced this is necessary.
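To illustrate that side effect, here is a hypothetical comparison of the two timeout series; the base timeout values and function names are made up for illustration:

```python
# Hypothetical comparison of timeout schedules. The old scheme fires
# two TLPs without backoff and then doubles the RTO; the unified PTO
# doubles from the first timeout onward.
def old_timeout_series(tlp, rto, n):
    """First n timeouts under TLP+RTO: two un-backed-off TLPs,
    then an exponentially backed-off RTO."""
    series = [tlp, tlp]
    series += [rto * 2 ** i for i in range(max(0, n - 2))]
    return series[:n]

def pto_series(pto, n):
    """First n timeouts under a unified PTO: exponential backoff
    starting from the first timeout."""
    return [pto * 2 ** i for i in range(n)]
```

With a 200 ms base timeout and a 400 ms RTO, the first four timeouts would be 0.2, 0.2, 0.4, 0.8 seconds under the old scheme versus 0.2, 0.4, 0.8, 1.6 seconds under a unified PTO, which is the difference described above.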