-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No RTT samples, no persistent congestion #3889
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is correct.
The problem being discussed in #3875 is that PTO could be significantly reduced when obtaining the first RTT sample, and that persistent congestion is declared because that RTT sample would be used.
I think we need to do something along the lies of forbidding declaration of persistent congestion using packets sent before RTT samples were obtained, or until the RTT estimation becomes stable.
So a slight rewording might help, but you would have to disqualify losses associated with packets sent prior to getting an RTT estimate, as noted. But then RTT estimates are poor to start with (the first packet to establish a NAT binding can add noticeable delays, for instance). |
Can we just forbid persistent congestion detection for the Initial and the Handshake PN space? That would be easier to reason about than the RTT estimate you had a while ago, when you sent the first packet. |
@marten-seemann I'm sot sure if that would be sufficient as a fix, because packets belonging to ApplicationData PN space are also sent before an RTT sample is obtained (e.g., 0-RTT packets, 0.5-RTT data). |
Right, this is what #3831 is about. Maybe we should say that only packets sent after handshake completion should be considered for persistent congestion detection? |
@marten-seemann I agree that using handshake completion (or confirmation) would be better than status quo, though it is still questionable if that's good enough. IIUC, there are two issues:
Using handshake completion (or confirmation) as the signal concentrates on fixing the former, but does not address the second concern. If we are to make a design change, I think it would be beneficial to spend some time looking into addressing both of the issues. PS. Note that PTO taking RTTVAR into account does not work as a counterargument here, because we are using RTTVAR in an exceptional way (i.e. to estimate PTO of the past). |
Yeah, this is tricky. Can we move the discussion to the issue please? |
Based on the discussion we've had, don't we need a more comprehensive set of changes that deal with persistent congestion, sending probe packets, and calculating RTT? |
@martinthomson I think that depends on the following two aspects: First, the accuracy we want to have in the definition of persistent congestion (see #3875 (comment)). If we think of persistent congestion as a ballpark figure that resembles a blackhole period of somewhere around 3 PTO, during which an endpoint would have sent at least 2 or 3 packets, then I think the change proposed in this PR is sufficient. Second, if your concern is about endpoints not arming PTO for ApplicationData and that becoming an edge of persistent congestion, then I'd assume that the wort case that we might want to discuss is when all the following conditions are met:
Under such scenario, persistent congestion would be declared. But then, the condition matches the definition of the previous paragraph. Therefore, I think we can call this a non-issue, if we are fine with what's written in the previous paragraph. |
I think we might need to do something to avoid or at least caution against terrible RTT samples, but that's being discussed in #3821 |
This is separable from #3821, and I wouldn't mix the two. I agree with @kazuho that the design issue here is about whether 3 x PTO is about time, about packets, or both. That needs consensus. This PR is good if consensus is on the issue is that PTO is (mostly) about time. We should discuss that on #3875. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add some pseudo code?
@marten-seemann : That's probably a good idea. |
draft-ietf-quic-recovery.md
Outdated
The persistent congestion period SHOULD NOT start until there is at | ||
least one RTT sample, both because the length of the period is unknown and | ||
the PTO may be excessively conservative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a wordier version is clearer:
"Persistent congestion SHOULD NOT consider packets that were sent prior to obtaining an RTT sample. Correctly detecting persistent congestion can depend on packets being sent on PTO expiration. The initial RTT, which is used to set the PTO timer, might be too conservative. If the first RTT sample results in a much smaller RTT, the resulting persistent congestion interval might contain too few packets for their loss to be indicative of congestion."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the first sentence potentially confusing, but I can make this more verbose along the lines of the second two sentences.
A variation on @martinthomson suggestion
Update persistent congestion pseudocode
I added a few lines of pseudocode and a comment. Hopefully that's helpful. |
draft-ietf-quic-recovery.md
Outdated
@@ -1581,6 +1588,10 @@ Invoked when DetectAndRemoveLostPackets deems packets lost. | |||
|
|||
~~~ | |||
InPersistentCongestion(lost_packets): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this function is expected to check for losses across all packet number spaces. Assuming that this understanding is correct, I think we should not pass lost_packets
as an argument (to this function or to the AreAllPacketsLost
that is being called from this function), because lost_packets
is a per-PN value that is obtained by calling DetectAndRemoveLostPackets(pn_space)
.
And I think that this mitigates the concern raised in #3831.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, it needs to be across PN spaces. Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. While strictly speaking this change is not wrong, it makes things more complex than necessary: An ACK can only ever establish loss in its own packet number space, never across packet number spaces. So it's totally fine to pass lost_packets
to InPersistentCongestion
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that passing loss_packets
in the pseudo-code gives the readers false impression that persistent congestion is something to be observed per-PN-space.
An endpoint can pass arguments whatever necessary (or beneficial), but that does not mean that those arguments should appear in pseudo-code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment now says:
Determine if all packets in the time period before the largest newly lost packet, including the edges and across all packet number spaces, are marked lost.
My point is that this is unnecessarily complicated, as only packets within a single packet number space, namely the packet number space the ACK was received in, can be newly lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marten-seemann It is impossible to rely on largest_acked_packet_send_time
, because an ACK on another PN space might have arrived later than the persistent congestion period to be declared. See the example below.
As stated in #3889 (comment), I do not think it is possible can use loss_packets
, unless you implement some lossy behavior to bound state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kazuho That makes sense to me. The algorithm I envisioned doesn't work. Not sure how one would correctly implement an InPersistentCongestion
that yields correct results across packet number spaces then...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marten-seemann IIUC, one trivial way of implementing InPersistentCongestion
is as follows:
- Retain the entries in sentmap (the list of packets being sent) for at least 3 PTO.
- When receiving an acknowledgement, mark a hole in the sentmap for the packet being acked.
- When a loss is detected in a packet number space, check if that loss spans across more than 3 PTO, by traversing the sentmap. Then, if did, check the sentmap of other packet number spaces (by traversing through those sentmaps) to see if any ACK-eliciting packet sent in that period has been acked.
This approach might sound too trivial, as it is O(N) where N is the number of packets being lost (or in case there are multiple PN spaces in action, the number of inflight packets in other PN spaces). But in practice, I think this approach is sufficient in terms of performance, because N would be like 3 for the current packet number space, and because we would not be sending that many packets on Initial and Handshake packet number spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When receiving an acknowledgement, mark a hole in the sentmap for the packet being acked.
I don't think that's trivial. Creating a "hole" is extra state that has to be cleaned up at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the discussion to #3939, starting from #3939 (comment). @marten-seemann I will respond there.
Co-authored-by: Jana Iyengar <jri.ietf@gmail.com>
Co-authored-by: Jana Iyengar <jri.ietf@gmail.com>
Co-authored-by: Jana Iyengar <jri.ietf@gmail.com>
Don't pass all the lost packets in, since this is across PN spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me quite a while to confirm that "is first rtt sample" is good. It's slightly more expansive than what I had implemented (which records
This has the effect of including packets sent prior to getting that sample. But as this is only packets sent within a single RTT, it's guaranteed to be less than a PTO.
Co-authored-by: Martin Thomson <mt@lowentropy.net>
// Persistent congestion cannot be declared on the | ||
// first RTT sample. | ||
if (is first RTT sample): | ||
return false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if that's correct. What matters is that the packet you use for the start of the persistent congestion period was sent after you already had an RTT sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @marten-seemann here.
Consider the case where an endpoint receives the first ACK 10 seconds after sending the first packet (with an RTT of 10ms), then receives the next ACK in 20ms (with RTT of 10ms).
When the second ACK is being processed, the execution would pass through this if
, and invoke AreAllPacketsLost
. Because all the packets that were sent during the first few seconds are not acked, persistent congestion would be declared.
I think we can simply remove these lines, and rely on the fact that the comment in the pseudo-code talking about "edges." We can clarify what "edge" means, if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simply remove these lines, and rely on the fact that the comment in the pseudo-code talking about "edges." We can clarify what "edge" means, if necessary.
I'm not sure if that's sufficient. I think I'd prefer to make the pseudo-code more closely resemble what you'd have to implement. Specifically, to implement this, you'll need a new global variable time_of_first_rtt_measurement
, or alternative first_packet_sent_with_measured_rtt
, and then you'll have to implement a comparison with the time of the packets in lost_packets
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, to implement this, you'll need a new global variable
time_of_first_rtt_measurement
, or alternativefirst_packet_sent_with_measured_rtt
, and then you'll have to implement a comparison with the time of the packets inlost_packets
.
Can you implement that way? Persistent congestion is declared across all packet number spaces. That means that you'd have to consult if any acks were received in other packet number spaces during the loss period observed in lost_packets
. But you cannot remember all the moments when acks were received, because the state would be unbounded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kazuho can you explain your reasoning around "edges" being sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mjoras If we define "edge" as a lost packet next to a packet that has been acked, having both edges means that a series of packets being lost are surrounded by acks. As there would have been RTT samples obtained for each of the surrounding acks, there is no need for a separate condition that checks if an RTT sample has been previously obtained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very cute. I would be okay with that too if we cleaned up what "edge" means and also clarify the notion of which contiguous segment should be considered for the persistent period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline with @janaiyengar, it has turned out that the approach of tweaking the definion of edges does not work well, because the edges (from current time) can be any packet, while RTT can only be established with an ack-eliciting packet.
FWIW this is what mvfst (and perhaps others?) has been doing since persistent congestion was implemented, since at the time it seemed silly to calculate the period using the initial RTT. However, at the time we still had the handshake timeout and so persistent congestion did not even apply to initial and handshake packets (see #2649). More recently our deployment has relied on PTOs for the handshake and thus this check, and there are no obvious problems with it on the Internet. I am in favor of this change even if it is perhaps not exhaustive in its coverage of unlikely pathological cases. |
My reaction is the same as @mjoras. I had to think a few times over to make sure that this was adequate, and I believe it is. It covers an egregious case explicitly, which is important. |
I'm merging this so Jana can pull any conflicts into #3961 |
Fixes #3875
Adds a SHOULD NOT and clarifies that persistent congestion is across PN spaces.
I was thinking that this is how it would be implemented, but I suspect this'll end up as a design change.