Unify TLP and RTO into Probe Timeout #2114
Conversation
In general, I like the simplification here. It would be interesting to see the results of an A/B test before and after these changes.
I like this change in general, but I think it makes sense to keep existing behavior more similar to the status quo and allow a sender to do a normal loss-style reduction if the first or second PTO are the ones acknowledged.
Haven't reviewed, but these might help
Does this have an issue?
Looking much better.
Specific suggestion in pseudocode to prevent a single packet loss from leading to massive CWND reduction.
I've updated the description text and the PR quite a bit based on the comments. Specifically, I've removed the verification of a timeout, since the behavior with and without it was the same. I've also added a clause that allows the cwnd to be collapsed to the minimum cwnd only after 2 PTOs, to retain parity with TLP/RTO behavior. I'll open an issue to discuss the problems this PR addresses; we should have the discussion there.
Worthy change. Really nice that we can get rid of minRTO. Adding some comments.
draft-ietf-quic-recovery.md
Outdated
flight but an acknowledgement is not received within the expected period of
time. A PTO enables a connection to recover from loss of tail packets or acks.
The PTO algorithm used in QUIC implements the reliability functions of Tail Loss
Probe {{?TLP=I-D.dukkipati-tcpm-tcp-loss-probe}}, RTO {{?RFC5681}} and F-RTO
TLP is now subsumed by RACK draft so we should cite that instead.
The original implementation and algorithms preceded RACK, so the benefit of referencing TLP is that it's more specific and time-relevant. But if RACK was an RFC, then we would definitely prefer that. Either works for me.
I think the function we're talking about is TLP, though I'm happy to add and cite RACK separately somewhere. I'll do that separately.
TLP is a draft that expired in 2013, while RACK is standards-track and active. Linux, FreeBSD, and Windows are all implementing the RACK draft, and TLP as specified in the RACK draft. I would just cite the RACK draft. There is a chance it may go into last call within the next two IETFs.
draft-ietf-quic-recovery.md
Outdated
Probe packets sent on a PTO MUST be ack-eliciting. A probe packet SHOULD carry
new data when possible. A probe packet MAY carry retransmitted unacknowledged
data when new data is unavailable, when flow control does not permit new data to
be sent, or to opportunistically reduce loss recovery delay. Implementers MAY
"Implementations MAY" is better
draft-ietf-quic-recovery.md
Outdated
acknowledges no packets sent prior to the first retransmission timeout. In this
case, the congestion window MUST be reduced to the minimum congestion window and
slow start is re-entered.
If two consecutive PTOs have occurred (pto_count is at least 3), the network is
I am not sure why pto_count needs to be 3? Shouldn't it be 2 after two PTOs?
+1, I'm confused. Also, this should be more about the number of PTOs that were lost than the number sent. ie: I could send 3 PTOs, then the network returns to normal, no packets were lost, and the connection continues.
This is based on the number of timeouts that happened, not based on packets sent. We want cwnd collapse at the equivalent of the first RTO's occurrence. This is the third timeout (2 TLP timeouts and then 1 RTO), which means pto_count is at least 3. Or am I doing this math wrong?
To me it seems that "pto_count is at least 3" is correct, however "If two consecutive PTOs have occurred" is incorrect and needs to be changed to "If three...".
IIRC Ian was going to get data on whether we need 2 TLPs or whether just one is sufficient. FWIW, we only do one TLP in Windows TCP. Yes, at the least please fix it as Kazuho says: "if three consecutive PTOs have occurred".
Ah, right. Ok, fixed. I'll leave it at 3 for now, and I've turned it into a constant, kPersistentCongestion. We can discuss the value in a separate issue.
Thanks for the update, this looks very close. I'm concerned about the gap between number of PTOs that have occurred vs the number that were lost, but possibly it doesn't matter much in practice?
draft-ietf-quic-recovery.md
Outdated
time. A PTO enables a connection to recover from loss of tail packets or acks.
The PTO algorithm used in QUIC implements the reliability functions of Tail Loss
Probe {{?TLP=I-D.dukkipati-tcpm-tcp-loss-probe}}, RTO {{?RFC5681}} and F-RTO
algorithms for TCP {{?RFC5682}}, and the computation is based on TCP's
"the computation" is vague. How about "the timeout computation"?
draft-ietf-quic-recovery.md
Outdated
@@ -473,8 +474,7 @@ consecutive PTO expiration due to a single packet loss.
 Consecutive PTO periods increase exponentially, and as a result, connection
 recovery latency increases exponentially as packets continue to be dropped in
 the network. Sending two packets on PTO expiration increases resilience to
-packet drop in both directions, thus reducing the probability of consecutive PTO
-events.
+packet drop, thus reducing the probability of consecutive PTO events.
packet drops
Thanks for your comments! Responses inline.
draft-ietf-quic-recovery.md
Outdated
@@ -1157,6 +1095,9 @@ window.
 congestion_window *= kLossReductionFactor
 congestion_window = max(congestion_window, kMinimumWindow)
 ssthresh = congestion_window
+// Collapse congestion window if persistent congestion
+if (pto_count > 2):
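As a rough illustration, the hunk above can be sketched as runnable Python. The constant values here (kLossReductionFactor = 0.5, a two-packet minimum window assuming 1200-byte packets, and a threshold of 3 consecutive PTOs) are assumptions for illustration, not normative draft values:

```python
# Illustrative constants -- assumed values, not normative.
kLossReductionFactor = 0.5
kMinimumWindow = 2 * 1200    # two packets, assuming 1200-byte packets
kPersistentCongestion = 3    # consecutive PTOs before cwnd collapse

def on_congestion_event(congestion_window, pto_count):
    """Sketch of the hunk above: reduce cwnd on a congestion event,
    then collapse it to the minimum window if persistent congestion
    (enough consecutive PTOs) has been established."""
    congestion_window *= kLossReductionFactor
    congestion_window = max(congestion_window, kMinimumWindow)
    ssthresh = congestion_window
    # Collapse congestion window if persistent congestion
    if pto_count >= kPersistentCongestion:
        congestion_window = kMinimumWindow
    return congestion_window, ssthresh
```

Below the threshold this is an ordinary multiplicative decrease; at or above it, cwnd drops to the minimum window while ssthresh keeps the halved value, so the sender restarts slow start from the floor.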
2 is a magic number. Previously we said you SHOULD have 2 TLPs, but you could have fewer. I think it makes sense to define this as a constant and explain its reason for existing there?
I commented on the thread above this. Yes we should settle on one or two TLPs. I personally prefer one but unfortunately I don't have data to say if two is better since Windows TCP has been doing just one TLP.
Ok, added constant and explained it.
I think this condition will bring back the issue I raised previously of a longer timeout. Consider this case:
1 [acked], 2, 3 [pto], 4 [pto], 5 [pto]
-> ack 5
Now we get into the state:
1 [acked], 2 [lost], 3 [outstanding], 4 [outstanding], 5 [acked]
Since minCwnd = 2, you still end up consuming the cwnd fully and will have to wait for another PTO to get out of this case. The previous logic intentionally cleared all packets sent before the rto_verified packet to avoid this, which I had raised in a previous issue.
@siyengar : In this case, when 5 is acked, 3 and 4 should already be marked as lost by the time-threshold loss detection mechanism, since they will be more than 1.125 x SRTT in the past.
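A minimal Python sketch of that time-threshold check, where 9/8 expresses the 1.125 x SRTT figure from the comment. The function name and the simplified timing model are illustrative assumptions, not draft pseudocode:

```python
# A packet is deemed lost once it was sent more than
# kTimeThreshold * SRTT before the newly acknowledged packet's
# ack is processed.
kTimeThreshold = 9 / 8  # 1.125

def detect_lost_packets(send_times, ack_time, smoothed_rtt):
    """Return packet numbers whose send time is more than the
    time threshold before ack_time."""
    threshold = kTimeThreshold * smoothed_rtt
    return sorted(pn for pn, sent_at in send_times.items()
                  if ack_time - sent_at > threshold)
```

In the scenario above, with packets 2-4 sent around time 0, the probe (5) sent one RTT later, and its ack arriving shortly after, packets 2, 3, and 4 all exceed the threshold when 5's ack is processed, so the sender need not wait for another PTO.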
Added a few more comments.
LGTM modulo #2114 (comment).
timer be scheduled for a second time. When the TLP timer expires the second
time, a second TLP packet is sent, and an RTO timer SHOULD be scheduled {{rto}}.
~~~
PTO = max(smoothed_rtt + 4*rttvar + max_ack_delay, kGranularity)
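The formula quoted above, together with the exponential backoff discussed in this thread, can be sketched as follows. The 1 ms kGranularity and the use of pto_count as the backoff exponent are assumptions based on the surrounding discussion:

```python
# Sketch of the PTO computation quoted above, plus exponential
# backoff on consecutive PTOs. Assumed values, not draft pseudocode.
kGranularity = 0.001  # 1 ms minimum timer granularity, in seconds

def probe_timeout(smoothed_rtt, rttvar, max_ack_delay, pto_count):
    """Base PTO from the quoted formula, doubled for each
    consecutive PTO that has already occurred."""
    pto = max(smoothed_rtt + 4 * rttvar + max_ack_delay, kGranularity)
    return pto * (2 ** pto_count)
```

For srtt = 100 ms, rttvar = 10 ms, and max_ack_delay = 25 ms, the first PTO is about 165 ms and the second about 330 ms, so even with a max_ack_delay of 0 the second probe fires at roughly 2 x (SRTT + 4 x RTTVAR).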
Sorry, just noticed another difference. If the peer advertises max_ack_delay as 0 (which seems to be legal), then the first PTO becomes SRTT + 4*RTTVAR, whereas in TCP the first PTO is 2*SRTT. If variance is low, this may be too aggressive? Should we also do PTO = max(PTO, 2*SRTT)?
What is the rationale in TCP for using 2xSRTT when RTTVAR is available? If variance is low, that still seems like the right thing to do. Why inflate the PTO when you don't need to?
I also just realized that the first PTO is akin to the first TLP. The second PTO would be backed off, so you will automatically get 2 x (SRTT + 4xRTTVAR).
Quoting RACK draft:
Aiming for a PTO value of 2*SRTT allows a sender to wait long enough
to know that an ACK is overdue. Under normal circumstances, i.e. no
losses, an ACK typically arrives in one SRTT. But choosing PTO to be
exactly an SRTT is likely to generate spurious probes given that
network delay variance and even end-system timings can easily push an
ACK to be above an SRTT. We chose PTO to be the next integral
multiple of SRTT.
Similarly, network delay variations and end-system processing
latencies and timer granularities can easily delay ACKs beyond
2*SRTT, so senders SHOULD add at least 2ms to a computed PTO value
(and MAY add more if the sending host OS timer granularity is more
coarse than 1ms).
@pravb: Thanks for the context -- I should have just looked at the RACK draft. So, to be clear, the RACK draft does not use this 2ms when WCDelAckT is used, that is, when it accounts for the worst case delayed ack timer. This is what we expect max_ack_delay to do for us, but as you note, a receiver might send 0 here.
I think we should put the onus of this onto the receiver. A receiver should send the worst case ack delay in max_ack_delay. A receiver that doesn't do that shoots itself in the foot by causing spurious retransmissions at the sender. We may want to add some strong wording about how a receiver should choose max ack delay, I've filed #2186 for that.
Sound reasonable?
That's reasonable. So this is how we'd justify the deviation from TCP:
- Include the RTTVAR component in the computation to account for in-network variability.
- Require the receiver to include the perturbation for system processing delays. So #2186 needs to add a SHOULD for adding a perturbation value on top of the actual max ack delay. I'll comment there.
I've added a constant (kPersistentCongestion) and described it. PTAL.
draft-ietf-quic-recovery.md
Outdated
Loss Probe (TLP) in TCP {{TLP}} {{RACK}}. Once the number of consecutive PTOs
reaches this threshold - that is, persistent congestion is established - the
sender responds by collapsing its congestion window, similar to a
Retransmission Timeout (RTO) in TCP {{RFC5681}}. The RECOMMENDED value is 3,
The pseudocode below decreases CWND to kMinimumWindow, which is 2 packets. I think sticking with that makes sense, at least for now. It also requires less text to justify.
The recommendation is 3 for kPersistentCongestion, not for the value of cwnd. I'll change the text a bit to clarify.
I really like this PR, since it's a big simplification.
The one thing I don't understand is how we can get rid of the (R/P)TO verified. I think I understand @janaiyengar's response to @siyengar's comment in #2114 (comment), i.e. that when receiving the ACK for the PTO, packets sent before the PTO are declared lost because the time-threshold based loss detection kicks in. But shouldn't that also have applied to RTO verified in the first place? Why did we need RTO verified in the first place then?
I think I found the answer to my question. Sorry for the noise.
@marten-seemann : Exactly. This PR builds on #1974, which started the movement in this direction. This PR fixes multiple issues at the same time because the design that comes out at the end is clearer than the one you'd get by fixing the issues piecewise.
There's exactly one question pending resolution in this PR: @pravb's question about using 2*SRTT for the initial PTO. I'm happy to open a new issue for that specific question if Praveen thinks we should continue that discussion, but I'd like to merge this soon, as (i) I'm hearing pushback on a detail or two but not on the overall PR, which we can handle in separate issues, and (ii) this PR is fast approaching GitHub's limit on comments (hence the unicorn tag).
Thanks Jana, looks even better.
LGTM
Closes #2166, #2168, #1017.
This PR builds on #1974 and simplifies QUIC's timeout-based loss detection mechanisms. Unfortunately, this PR cuts through text in many places, so I'm summarizing the key changes here; the compiled version of the draft may be easier to read than the diffs.
The high-level description is that TLP and RTO are combined in this PR into a single "Probe Timeout" (PTO). Consecutive PTOs are exponentially longer, just as the RTO used to be. The rationale for calling this a "Probe" Timeout is that loss detection in QUIC happens only when an ack is received; this timer and timeout only cause probe packets to be sent out to elicit acks. This unification is made possible by the following constituent changes.
(Issue #1017, "Eliminating the fixed minimum RTO") There was general agreement in #1017 that adding max_ack_delay should allow us to remove the minRTO. max_ack_delay has since been added to QUIC as a transport parameter, which allows us to include it in the PTO computation and remove minRTO/minTLP.
Timer granularity is added for both crypto and PTO timeouts, ensuring that even when the computed network RTT is small enough to be represented as 0, timeouts are set for at least one clock tick. This is in keeping with the use of granularity in TCP (RFC 6298). A minimum granularity of 1 ms is recommended.
(Issue #2166, "TLP and RTO arbitrariness") In line with the resolution above for removing the min RTO, the min TLP and min crypto timeout values are now removed.
(Issue #2168, "Verification of RTO no longer necessary") PTO verification has been removed. PTOs cause the congestion window to collapse to the minimum cwnd only after 3 consecutive PTOs, to retain parity with current behavior, where an RTO collapses the window after 2 TLPs.
One side effect to note is that the single timer starts backing off exponentially from the first PTO, unlike TLP and RTO, where the TLP timers did not back off at all and the RTO timer started backing off exponentially only after the TLPs. I could restore the old behavior by adjusting the backoff expression in SetLossDetectionTimer(), but I'm not convinced this is necessary.
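To illustrate that side effect, here is a hypothetical comparison of the two timeout series; the base timeout values and function names are made up for illustration:

```python
# Hypothetical comparison of timeout schedules. The old scheme fires
# two TLPs without backoff and then doubles the RTO; the unified PTO
# doubles from the first timeout onward.
def old_timeout_series(tlp, rto, n):
    """First n timeouts under TLP+RTO: two un-backed-off TLPs,
    then an exponentially backed-off RTO."""
    series = [tlp, tlp]
    series += [rto * 2 ** i for i in range(max(0, n - 2))]
    return series[:n]

def pto_series(pto, n):
    """First n timeouts under a unified PTO: exponential backoff
    starting from the first timeout."""
    return [pto * 2 ** i for i in range(n)]
```

With a 200 ms base timeout and a 400 ms RTO, the first four timeouts would be 0.2, 0.2, 0.4, 0.8 seconds under the old scheme versus 0.2, 0.4, 0.8, 1.6 seconds under a unified PTO, which is the difference described above.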