
Unify TLP and RTO into Probe Timeout #2114

Merged
merged 15 commits into from
Dec 15, 2018

Conversation

janaiyengar
Contributor

@janaiyengar janaiyengar commented Dec 11, 2018

Closes #2166, #2168, #1017.

This PR builds on #1974 and simplifies QUIC's timeout-based loss detection mechanisms. Unfortunately, this PR cuts through text in many places, so I'm summarizing the key changes here. The compiled version of the draft, here, might be easier to read than the diffs.

The high-level description is that this PR combines TLP and RTO into a single "Probe Timeout" (PTO). Consecutive PTOs are exponentially longer, just as RTO used to be. The rationale for calling this a "Probe" Timeout is that loss detection in QUIC only happens when an ack is received; this timer and its timeout only cause probe packets to be sent out to elicit acks. This unification is made possible by the following constituent changes.

  • (Issue #1017, "Eliminating the fixed minimum RTO") There was general agreement in #1017 that adding a max_ack_delay should allow us to remove the minRTO. max_ack_delay has since been added to QUIC as a transport parameter, which allows us to include it in the PTO computation and remove minRTO/minTLP.

  • Timer granularity is added for both crypto and PTO timeouts, ensuring that even when the computed network RTT is small enough to be represented as 0, timeouts are set for at least 1 clock tick. This is in keeping with the use of granularity in TCP (RFC 6298). A minimum granularity of 1 ms is recommended.

  • (Issue #2166, "TLP and RTO arbitrariness") In line with the resolution above for removing the min RTO, the min TLP and min crypto timeout values are now removed.

  • (Issue #2168, "Verification of RTO no longer necessary") PTO validation has been removed. PTOs cause the congestion window to collapse to the minimum cwnd only after 3 consecutive PTOs, retaining parity with current behavior, where an RTO collapses the window after 2 TLPs.

One side-effect to note is that a single timer will start exponentially backing off right from the first PTO, unlike TLP and RTO, where the TLP timers did not back off at all and the RTO timer started exponentially backing off only after the TLPs. I can restore this by adjusting the backoff expression in SetLossDetectionTimer(), but I'm not convinced this is necessary.
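To make the unified behavior concrete, here is a rough sketch in the draft's pseudocode style; it is not the PR's text. The formula and the names smoothed_rtt, rttvar, max_ack_delay, kGranularity, and pto_count come from the draft, while the timer variable and the probe-sending helper are illustrative placeholders.

~~~
SetLossDetectionTimer():
  // Probe timeout: expected RTT plus ack-delay allowance,
  // floored at the timer granularity.
  timeout = max(smoothed_rtt + 4 * rttvar + max_ack_delay,
                kGranularity)
  // Every consecutive PTO doubles the timeout, as RTO did in TCP.
  timeout = timeout * (2 ^ pto_count)
  loss_detection_timer.set(time_of_last_ack_eliciting_packet + timeout)

OnLossDetectionTimeout():
  // Loss is only declared when an ack arrives; the timeout just
  // sends one or two probe packets to elicit an ack, then backs off.
  SendOneOrTwoAckElicitingPackets()
  pto_count++
  SetLossDetectionTimer()
~~~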

Member

@nibanks nibanks left a comment
In general, I like the simplification here. It would be interesting to see the results of an A/B test before and after these changes.

Contributor

@ianswett ianswett left a comment

I like this change in general, but I think it makes sense to stay closer to the status quo and allow a sender to do a normal loss-style reduction if the first or second PTO is the one acknowledged.

Member

@martinthomson martinthomson left a comment

Haven't reviewed, but these might help

@martinthomson martinthomson added the design (An issue that affects the design of the protocol; resolution requires consensus.) and -recovery labels on Dec 11, 2018
@martinthomson
Member

Does this have an issue?

Contributor

@ianswett ianswett left a comment

Looking much better.

Contributor

@ianswett ianswett left a comment

Specific suggestion in pseudocode to prevent a single packet loss from leading to massive CWND reduction.

@janaiyengar
Contributor Author

I've updated the description text and the PR quite a bit based on the comments. Specifically, I've removed the verification of a timeout, since the behavior with and without it was the same. I've also added a clause that allows for the cwnd to be collapsed to min cwnd only after 2 PTOs, to retain parity with TLP/RTO behavior.

I'll open an issue to discuss the issues this PR addresses; we should have the discussion there.

@pravb pravb left a comment

Worthy change. Really nice that we can get rid of minRTO. Adding some comments.

flight but an acknowledgement is not received within the expected period of
time. A PTO enables a connection to recover from loss of tail packets or acks.
The PTO algorithm used in QUIC implements the reliability functions of Tail Loss
Probe {{?TLP=I-D.dukkipati-tcpm-tcp-loss-probe}}, RTO {{?RFC5681}} and F-RTO

TLP is now subsumed by the RACK draft, so we should cite that instead.

Contributor

The original implementation and algorithms preceded RACK, so the benefit of referencing TLP is that it's more specific and time-relevant. But if RACK were an RFC, then we would definitely prefer that. Either works for me.

Contributor Author

I think the function we're talking about is TLP, though I'm happy to add and cite RACK separately somewhere. I'll do that separately.

@pravb pravb Dec 14, 2018

TLP is a draft that expired in 2013, while RACK is standards-track and active. Linux, FreeBSD, and Windows are all implementing the RACK draft, with TLP as specified in the RACK draft. I would just cite the RACK draft. There is a chance it may go into last call within the next two IETFs.

Probe packets sent on a PTO MUST be ack-eliciting. A probe packet SHOULD carry
new data when possible. A probe packet MAY carry retransmitted unacknowledged
data when new data is unavailable, when flow control does not permit new data to
be sent, or to opportunistically reduce loss recovery delay. Implementers MAY

"Implementations MAY" is better

acknowledges no packets sent prior to the first retransmission timeout. In this
case, the congestion window MUST be reduced to the minimum congestion window and
slow start is re-entered.
If two consecutive PTOs have occurred (pto_count is at least 3), the network is

I am not sure why pto_count needs to be 3? Shouldn't it be 2 after two PTOs?

Contributor

+1, I'm confused. Also, this should be more about the number of PTOs that were lost than the number sent. i.e.: I could send 3 PTOs, then the network returns to normal, no packets were lost, and the connection continues.

Contributor Author

This is based on the number of timeouts that happened, not based on packets sent. We want cwnd collapse at the equivalent of the first RTO's occurrence. This is the third timeout (2 TLP timeouts and then 1 RTO), which means pto_count is at least 3. Or am I doing this math wrong?

Member

To me it seems that "pto_count is at least 3" is correct; however, "If two consecutive PTOs have occurred" is incorrect and needs to be changed to "If three...".


IIRC Ian was going to get data on whether we need 2 TLPs or whether just one is sufficient. FWIW we only do one TLP in Windows TCP. Yes, at the least please fix as Kazuho says: "if three consecutive PTOs have occurred".

Contributor Author

Ah, right. Ok, fixed. I'll leave it at 3 for now, and I've turned it into a constant, kPersistentCongestion. We can discuss the value in a separate issue.

Contributor

@ianswett ianswett left a comment

Thanks for the update; this looks very close. I'm concerned about the gap between the number of PTOs that have occurred and the number that were lost, but possibly it doesn't matter much in practice?

time. A PTO enables a connection to recover from loss of tail packets or acks.
The PTO algorithm used in QUIC implements the reliability functions of Tail Loss
Probe {{?TLP=I-D.dukkipati-tcpm-tcp-loss-probe}}, RTO {{?RFC5681}} and F-RTO
algorithms for TCP {{?RFC5682}}, and the computation is based on TCP's
Contributor

"the computation" is vague. How about "the timeout computation"?

@@ -473,8 +474,7 @@ consecutive PTO expiration due to a single packet loss.
Consecutive PTO periods increase exponentially, and as a result, connection
recovery latency increases exponentially as packets continue to be dropped in
the network. Sending two packets on PTO expiration increases resilience to
packet drop in both directions, thus reducing the probability of consecutive PTO
events.
packet drop, thus reducing the probability of consecutive PTO events.
Contributor

packet drops



Contributor Author

@janaiyengar janaiyengar left a comment

Thanks for your comments! Responses inline.



@janaiyengar janaiyengar changed the title from "Simplify TLP and RTO into Probe Timeout" to "Unify TLP and RTO into Probe Timeout" on Dec 14, 2018
@@ -1157,6 +1095,9 @@ window.
congestion_window *= kLossReductionFactor
congestion_window = max(congestion_window, kMinimumWindow)
ssthresh = congestion_window
// Collapse congestion window if persistent congestion
if (pto_count > 2):
Contributor

2 is a magic number. Previously we said you SHOULD have 2 TLPs, but you could have fewer. I think it makes sense to define this as a constant and explain its reason for existing there?


I commented on the thread above this. Yes, we should settle on one or two TLPs. I personally prefer one, but unfortunately I don't have data to say whether two is better, since Windows TCP has been doing just one TLP.

Contributor Author

Ok, added constant and explained it.

@siyengar siyengar Dec 14, 2018

I think this condition will bring back the issue that I raised previously of a longer timeout.

Consider this case:

1 [acked], 2, 3 [pto], 4 [pto], 5 [pto]

-> ack 5

Now we get into the state

1 [acked], 2 [lost], 3 [outstanding], 4 [outstanding], 5 [acked]

Since minCwnd = 2, you still end up consuming the cwnd fully and will have to wait until another PTO to get out of this case.

The previous logic intentionally cleared all packets < the rto_verified packet to avoid this, which I had raised in a previous issue.

Contributor Author

@siyengar : In this case, by the time 5 is acked, 3 and 4 should be marked as lost by the time-threshold loss detection mechanism, since they're going to be > 1.125 x SRTT in the past.
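For reference, a minimal sketch of the time-threshold check being relied on here (not part of this PR's diff; the constant name kTimeThreshold stands in for the 9/8, i.e. 1.125, factor, and the helper names are illustrative):

~~~
DetectLostPackets():
  loss_delay = kTimeThreshold * max(latest_rtt, smoothed_rtt)
  // Any unacked packet sent earlier than the largest acked packet
  // and more than loss_delay ago (packets 3 and 4 above, once the
  // ack for 5 arrives) is declared lost.
  for packet in unacked_packets_sent_before(largest_acked_packet):
    if (now() - packet.time_sent > loss_delay):
      DeclarePacketLost(packet)
~~~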

@pravb pravb left a comment

Added a few more comments.

Member

@kazuho kazuho left a comment

LGTM modulo #2114 (comment).

timer be scheduled for a second time. When the TLP timer expires the second
time, a second TLP packet is sent, and an RTO timer SHOULD be scheduled {{rto}}.
~~~
PTO = max(smoothed_rtt + 4*rttvar + max_ack_delay, kGranularity)

Sorry, just noticed another difference. If the peer advertises max_ack_delay as 0 (which seems to be legal), then the first PTO becomes Srtt + 4 * RttVar, whereas in TCP the first PTO is 2 * Srtt. If variance is low, this may be too aggressive. Should we also do PTO = max(PTO, 2 * Srtt)?

Contributor Author

@janaiyengar janaiyengar Dec 14, 2018

What is the rationale in TCP for using 2xSRTT when RTTVAR is available? If variance is low, that still seems like the right thing to do. Why inflate the PTO when you don't need to?

I also just realized that the first PTO is akin to the first TLP. The second PTO would be backed off, so you will automatically get 2 x (SRTT + 4xRTTVAR).
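For concreteness (illustrative numbers, not from the thread): with smoothed_rtt = 100ms, rttvar = 5ms, and max_ack_delay = 0, the first PTO is 100 + 4*5 = 120ms, and the backed-off second PTO is 240ms, already above TCP's 2*SRTT of 200ms.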


Quoting RACK draft:

Aiming for a PTO value of 2*SRTT allows a sender to wait long enough
to know that an ACK is overdue. Under normal circumstances, i.e. no
losses, an ACK typically arrives in one SRTT. But choosing PTO to be
exactly an SRTT is likely to generate spurious probes given that
network delay variance and even end-system timings can easily push an
ACK to be above an SRTT. We chose PTO to be the next integral
multiple of SRTT.

Similarly, network delay variations and end-system processing
latencies and timer granularities can easily delay ACKs beyond
2*SRTT, so senders SHOULD add at least 2ms to a computed PTO value
(and MAY add more if the sending host OS timer granularity is more
coarse than 1ms).

Contributor Author

@pravb: Thanks for the context -- I should have just looked at the RACK draft. So, to be clear, the RACK draft does not use this 2ms when WCDelAckT is used, that is, when it accounts for the worst case delayed ack timer. This is what we expect max_ack_delay to do for us, but as you note, a receiver might send 0 here.

I think we should put the onus of this onto the receiver. A receiver should send the worst-case ack delay in max_ack_delay. A receiver that doesn't do that shoots itself in the foot by causing spurious retransmissions at the sender. We may want to add some strong wording about how a receiver should choose max_ack_delay; I've filed #2186 for that.

Sound reasonable?


That's reasonable. So this is how we'd justify the deviation from TCP:

  1. Include the RttVar component in the computation to account for network variability.
  2. Require the receiver to include the perturbation for system processing delays. So #2186 needs to add a SHOULD for adding a perturbation value on top of the actual max ack delay. I'll comment there.

Contributor Author

@janaiyengar janaiyengar left a comment

I've added a constant (kPersistentCongestion) and described it. PTAL.



Loss Probe (TLP) in TCP {{TLP}} {{RACK}}. Once the number of consecutive PTOs
reaches this threshold - that is, persistent congestion is established - the
sender responds by collapsing its congestion window, similar to a
Retransmission Timeout (RTO) in TCP {{RFC5681}}. The RECOMMENDED value is 3,
Contributor

The pseudocode below decreases CWND to kMinimumWindow, which is 2 packets. I think sticking with that makes sense, at least for now. It also requires less text to justify.

Contributor Author

The recommendation is 3 for kPersistentCongestion, not for the value of cwnd. I'll change the text a bit to clarify.
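A minimal sketch of how the two values fit together, extending the pseudocode quoted earlier (kPersistentCongestion and kMinimumWindow are the draft's constants; the enclosing function name is assumed, and the exact form of the check is illustrative):

~~~
CongestionEvent():
  congestion_window *= kLossReductionFactor
  congestion_window = max(congestion_window, kMinimumWindow)
  ssthresh = congestion_window
  // Collapse only once persistent congestion is established:
  // kPersistentCongestion (RECOMMENDED value 3) counts consecutive
  // PTOs; kMinimumWindow (2 packets) is what cwnd collapses to.
  if (pto_count >= kPersistentCongestion):
    congestion_window = kMinimumWindow
~~~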

Contributor

@marten-seemann marten-seemann left a comment

I really like this PR, since it's a big simplification.

The one thing I don't understand is how we can get rid of the (R/P)TO verified. I think I understand @janaiyengar's response to @siyengar's comment in #2114 (comment), i.e. that when receiving the ACK for the PTO, packets sent before the PTO are declared lost because the time-threshold based loss detection kicks in. But shouldn't that also have applied to RTO verified? Why did we need RTO verified in the first place then?

@marten-seemann
Contributor

I think I found the answer to my question:
Since in #1974 we make time-threshold based loss recovery mandatory (before that you could choose between packet-threshold and time-threshold), we don't need the RTO verified anymore (see #2168) anyway. This is independent of this PR; this PR just happens to fix multiple issues at the same time.

Sorry for the noise.

@janaiyengar
Contributor Author

@marten-seemann : Exactly. This PR builds on #1974, which started the movement in this direction. This PR fixes multiple issues at the same time because the design that comes out at the end is clearer than the one you'd get by fixing the issues piecewise.

@janaiyengar
Contributor Author

There's exactly one question pending resolution in this PR: @pravb's question about using 2*SRTT for the initial PTO. I'm happy to open a new issue for that specific question if Praveen thinks we should continue that discussion, but I'd like to merge this soon, as (i) I'm hearing pushback on a detail or two but not on the overall PR, which we can handle in separate issues, and (ii) this PR is fast approaching GitHub's limit on comments (hence the unicorn tag).

Contributor

@ianswett ianswett left a comment

Thanks Jana, looks even better.

Member

@nibanks nibanks left a comment

LGTM

Labels
design (An issue that affects the design of the protocol; resolution requires consensus.), -recovery

9 participants