
No RTT samples, no persistent congestion #3889

Merged
10 commits merged on Jul 29, 2020

Conversation

ianswett
Contributor

@ianswett ianswett commented Jul 9, 2020

Fixes #3875

Adds a SHOULD NOT and clarifies that persistent congestion is across PN spaces.

I was thinking that this is how it would be implemented, but I suspect this'll end up as a design change.

@ianswett ianswett added editorial An issue that does not affect the design of the protocol; does not require consensus. -recovery labels Jul 9, 2020
Member

@kazuho kazuho left a comment


I'm not sure if this is correct.

The problem being discussed in #3875 is that PTO could be significantly reduced when obtaining the first RTT sample, and that persistent congestion is declared because that RTT sample would be used.

I think we need to do something along the lines of forbidding declaration of persistent congestion using packets sent before RTT samples were obtained, or until the RTT estimation becomes stable.

@martinthomson
Member

So a slight rewording might help, but you would have to disqualify losses associated with packets sent prior to getting an RTT estimate, as noted. But then RTT estimates are poor to start with (the first packet to establish a NAT binding can add noticeable delays, for instance).

@marten-seemann
Contributor

Can we just forbid persistent congestion detection for the Initial and Handshake PN spaces? That would be easier than reasoning about the RTT estimate you had a while ago, when you sent the first packet.

@kazuho
Member

kazuho commented Jul 10, 2020

@marten-seemann I'm not sure if that would be sufficient as a fix, because packets belonging to the ApplicationData PN space are also sent before an RTT sample is obtained (e.g., 0-RTT packets, 0.5-RTT data).

@marten-seemann
Contributor

> @marten-seemann I'm not sure if that would be sufficient as a fix, because packets belonging to the ApplicationData PN space are also sent before an RTT sample is obtained (e.g., 0-RTT packets, 0.5-RTT data).

Right, this is what #3831 is about. Maybe we should say that only packets sent after handshake completion should be considered for persistent congestion detection?

@kazuho
Member

kazuho commented Jul 10, 2020

@marten-seemann I agree that using handshake completion (or confirmation) would be better than status quo, though it is still questionable if that's good enough.

IIUC, there are two issues:

  • there is no continuity between initial RTT and real RTT
  • the quality of RTT estimate is bad during the early stages of a connection

Using handshake completion (or confirmation) as the signal concentrates on fixing the former, but does not address the latter. If we are to make a design change, I think it would be beneficial to spend some time looking into addressing both of these issues.

PS. Note that PTO taking RTTVAR into account does not work as a counterargument here, because we are using RTTVAR in an exceptional way (i.e. to estimate PTO of the past).

@janaiyengar
Contributor

Yeah, this is tricky. Can we move the discussion to the issue please?

@ianswett ianswett removed the editorial An issue that does not affect the design of the protocol; does not require consensus. label Jul 14, 2020
@martinthomson
Member

Based on the discussion we've had, don't we need a more comprehensive set of changes that deal with persistent congestion, sending probe packets, and calculating RTT?

@kazuho
Member

kazuho commented Jul 16, 2020

@martinthomson I think that depends on the following two aspects:

First, the accuracy we want to have in the definition of persistent congestion (see #3875 (comment)). If we think of persistent congestion as a ballpark figure that resembles a blackhole period of somewhere around 3 PTO, during which an endpoint would have sent at least 2 or 3 packets, then I think the change proposed in this PR is sufficient.

Second, if your concern is about endpoints not arming PTO for ApplicationData and that becoming an edge of persistent congestion, then I'd assume that the worst case we might want to discuss is when all of the following conditions are met:

  • the TLS handshake transcript sent by the server does not fit in 3 datagrams (therefore the server would have an RTT estimate when sending the first 0.5-RTT data)
  • the client spends a long time verifying the certificate chain
  • both the 1-RTT packets that carried 0.5-RTT data and the first 1-RTT data (i.e., the one containing HANDSHAKE_DONE) are lost

Under such a scenario, persistent congestion would be declared. But then, the condition matches the definition in the previous paragraph. Therefore, I think we can call this a non-issue, if we are fine with what's written there.

@ianswett ianswett added the design An issue that affects the design of the protocol; resolution requires consensus. label Jul 16, 2020
@ianswett
Contributor Author

I think we might need to do something to avoid, or at least caution against, terrible RTT samples, but that's being discussed in #3821.

@janaiyengar
Contributor

This is separable from #3821, and I wouldn't mix the two. I agree with @kazuho that the design issue here is about whether 3 x PTO is about time, about packets, or both. That needs consensus. This PR is good if the consensus on the issue is that PTO is (mostly) about time. We should discuss that on #3875.

Contributor

@marten-seemann marten-seemann left a comment


Do we need to add some pseudo code?

@janaiyengar
Contributor

@marten-seemann : That's probably a good idea.

Comment on lines 845 to 847
The persistent congestion period SHOULD NOT start until there is at
least one RTT sample, both because the length of the period is unknown and
the PTO may be excessively conservative.
Member


Maybe a wordier version is clearer:

"Persistent congestion SHOULD NOT consider packets that were sent prior to obtaining an RTT sample. Correctly detecting persistent congestion can depend on packets being sent on PTO expiration. The initial RTT, which is used to set the PTO timer, might be too conservative. If the first RTT sample results in a much smaller RTT, the resulting persistent congestion interval might contain too few packets for their loss to be indicative of congestion."
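For a sense of the magnitudes involved: the persistent congestion duration is a PTO scaled by kPersistentCongestionThreshold. The following is an illustrative sketch of that computation, not the draft's verbatim pseudocode; a sharply smaller first RTT sample shrinks this duration, which is the hazard the suggested text describes.

```python
# Illustrative sketch; constant names follow the recovery draft.
kGranularity = 0.001                  # 1 ms timer granularity, in seconds
kPersistentCongestionThreshold = 3

def persistent_congestion_duration(smoothed_rtt, rttvar, max_ack_delay):
    # A PTO computed from the current RTT estimates; if these shrink
    # sharply after the first RTT sample, so does this duration.
    pto = smoothed_rtt + max(4 * rttvar, kGranularity) + max_ack_delay
    return pto * kPersistentCongestionThreshold
```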

Contributor Author


I find the first sentence potentially confusing, but I can make this more verbose along the lines of the latter two sentences.

@ianswett
Copy link
Contributor Author

I added a few lines of pseudocode and a comment. Hopefully that's helpful.

@@ -1581,6 +1588,10 @@ Invoked when DetectAndRemoveLostPackets deems packets lost.

~~~
InPersistentCongestion(lost_packets):
~~~
Member


IIUC, this function is expected to check for losses across all packet number spaces. Assuming that understanding is correct, I think we should not pass lost_packets as an argument (to this function or to the AreAllPacketsLost that is called from it), because lost_packets is a per-PN-space value that is obtained by calling DetectAndRemoveLostPackets(pn_space).

And I think that this mitigates the concern raised in #3831.

Contributor Author


Agreed, it needs to be across PN spaces. Done.

Contributor


I disagree. While strictly speaking this change is not wrong, it makes things more complex than necessary: an ACK can only ever establish loss in its own packet number space, never across packet number spaces. So it's totally fine to pass lost_packets to InPersistentCongestion.

Member


My point is that passing lost_packets in the pseudo-code gives readers the false impression that persistent congestion is something to be observed per PN space.

An implementation can pass whatever arguments are necessary (or beneficial), but that does not mean those arguments should appear in the pseudo-code.

Contributor


The comment now says:

> Determine if all packets in the time period before the largest newly lost packet, including the edges and across all packet number spaces, are marked lost.

My point is that this is unnecessarily complicated, as only packets within a single packet number space, namely the packet number space the ACK was received in, can be newly lost.

Member


@marten-seemann It is impossible to rely on largest_acked_packet_send_time, because an ACK on another PN space might arrive later than the persistent congestion period to be declared. See the example below.

[attached image illustrating the example]

As stated in #3889 (comment), I do not think it is possible to use lost_packets, unless you implement some lossy behavior to bound state.

Contributor


@kazuho That makes sense to me. The algorithm I envisioned doesn't work. Not sure how one would correctly implement an InPersistentCongestion that yields correct results across packet number spaces then...

Member


@marten-seemann IIUC, one trivial way of implementing InPersistentCongestion is as follows:

  • Retain the entries in sentmap (the list of packets being sent) for at least 3 PTO.
  • When receiving an acknowledgement, mark a hole in the sentmap for the packet being acked.
  • When a loss is detected in a packet number space, check if that loss spans more than 3 PTO by traversing the sentmap. Then, if it did, check the sentmaps of the other packet number spaces (by traversing through them) to see if any ACK-eliciting packet sent in that period has been acked.

This approach might sound too trivial, as it is O(N), where N is the number of packets being lost (or, in case there are multiple PN spaces in action, the number of in-flight packets in other PN spaces). But in practice, I think this approach is sufficient in terms of performance, because N would be around 3 for the current packet number space, and because we would not be sending that many packets in the Initial and Handshake packet number spaces.
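The traversal described above might be sketched roughly as follows. The SentEntry type and function name are hypothetical, and the 3-PTO duration is taken as an input rather than computed here.

```python
# Hypothetical sketch of the sentmap traversal described above; acked
# packets are the "holes" left in the sentmap.
from dataclasses import dataclass

@dataclass
class SentEntry:
    send_time: float
    ack_eliciting: bool
    acked: bool        # True marks a "hole": the packet was acked

def spans_persistent_congestion(sentmaps, start, end, duration):
    # The candidate loss period must exceed the persistent congestion
    # duration (roughly 3 PTO).
    if end - start < duration:
        return False
    # Traverse every PN space's sentmap: if any ACK-eliciting packet
    # sent within the period was acked, the period was not a blackout.
    for sentmap in sentmaps:
        for entry in sentmap:
            if (start <= entry.send_time <= end
                    and entry.ack_eliciting and entry.acked):
                return False
    return True
```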

Contributor


> When receiving an acknowledgement, mark a hole in the sentmap for the packet being acked.

I don't think that's trivial. Creating a "hole" is extra state that has to be cleaned up at some point.

Member


Moving the discussion to #3939, starting from #3939 (comment). @marten-seemann I will respond there.

ianswett and others added 4 commits July 21, 2020 17:23
Co-authored-by: Jana Iyengar <jri.ietf@gmail.com>
Co-authored-by: Jana Iyengar <jri.ietf@gmail.com>
Co-authored-by: Jana Iyengar <jri.ietf@gmail.com>
Don't pass all the lost packets in, since this is across PN spaces.
Contributor

@janaiyengar janaiyengar left a comment


I think this is good.

Member

@martinthomson martinthomson left a comment


It took me quite a while to confirm that "is first rtt sample" is good. It's slightly more expansive than what I had implemented (which records …).

This has the effect of including packets sent prior to getting that sample. But as this is only packets sent within a single RTT, it's guaranteed to be less than a PTO.

Co-authored-by: Martin Thomson <mt@lowentropy.net>
~~~
// Persistent congestion cannot be declared on the
// first RTT sample.
if (is first RTT sample):
  return false
~~~
Contributor


I'm not sure if that's correct. What matters is that the packet you use for the start of the persistent congestion period was sent after you already had an RTT sample.

Member


I agree with @marten-seemann here.

Consider the case where an endpoint receives the first ACK 10 seconds after sending the first packet (with an RTT of 10ms), then receives the next ACK in 20ms (with RTT of 10ms).

When the second ACK is being processed, the execution would pass through this if, and invoke AreAllPacketsLost. Because all the packets that were sent during the first few seconds are not acked, persistent congestion would be declared.

I think we can simply remove these lines and rely on the comment in the pseudo-code that talks about "edges." We can clarify what "edge" means, if necessary.

Contributor


> I think we can simply remove these lines and rely on the comment in the pseudo-code that talks about "edges." We can clarify what "edge" means, if necessary.

I'm not sure that's sufficient. I think I'd prefer to make the pseudo-code more closely resemble what you'd have to implement. Specifically, to implement this, you'll need a new global variable time_of_first_rtt_measurement, or alternatively first_packet_sent_with_measured_rtt, and then you'll have to implement a comparison with the send times of the packets in lost_packets.
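A rough sketch of this suggestion, with hypothetical names (first_rtt_sample_time, Packet) and with the "all packets in between are lost" check elided for brevity:

```python
# Hypothetical sketch: only packets sent after the first RTT sample
# may establish a persistent congestion period.
from dataclasses import dataclass

@dataclass
class Packet:
    time_sent: float

def in_persistent_congestion(lost_packets, first_rtt_sample_time, duration):
    if first_rtt_sample_time is None:
        return False  # no RTT sample yet: never declare
    # Ignore losses of packets sent before the first RTT sample; this
    # sketch elides the check that every packet in between is lost.
    eligible = [p.time_sent for p in lost_packets
                if p.time_sent > first_rtt_sample_time]
    if not eligible:
        return False
    return max(eligible) - min(eligible) > duration
```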

Member

@kazuho kazuho Jul 22, 2020


> Specifically, to implement this, you'll need a new global variable time_of_first_rtt_measurement, or alternatively first_packet_sent_with_measured_rtt, and then you'll have to implement a comparison with the send times of the packets in lost_packets.

Can you implement it that way? Persistent congestion is declared across all packet number spaces. That means you'd have to check whether any ACKs were received in other packet number spaces during the loss period observed in lost_packets. But you cannot remember all the moments when ACKs were received, because that state would be unbounded.



@kazuho can you explain your reasoning around "edges" being sufficient?

Member


@mjoras If we define "edge" as a lost packet next to a packet that has been acked, having both edges means that a series of packets being lost are surrounded by acks. As there would have been RTT samples obtained for each of the surrounding acks, there is no need for a separate condition that checks if an RTT sample has been previously obtained.
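The "both edges" condition might be sketched as follows; the function and its inputs are hypothetical, assuming packets ordered by packet number with a per-packet acked flag. (Later in the thread this edge-based approach turned out to have limitations, so treat this purely as an illustration of the reasoning.)

```python
# Hypothetical sketch: a run of lost packets qualifies only if it is
# bounded on both sides by acked packets, which guarantees RTT samples
# existed around the loss period.

def has_both_edges(acked, lo, hi):
    # acked: booleans ordered by packet number; True means acked.
    # [lo, hi] is the candidate run of lost packets (inclusive indices).
    before = lo > 0 and acked[lo - 1]
    after = hi + 1 < len(acked) and acked[hi + 1]
    return before and after
```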



That's very cute. I would be okay with that too if we cleaned up what "edge" means and also clarified the notion of which contiguous segment should be considered for the persistent congestion period.

Member


Having discussed this offline with @janaiyengar, it turns out that the approach of tweaking the definition of edges does not work well, because the edges (from the current time) can be any packet, while an RTT sample can only be established with an ack-eliciting packet.

@mjoras

mjoras commented Jul 22, 2020

FWIW, this is what mvfst (and perhaps others?) has been doing since persistent congestion was implemented, because at the time it seemed silly to calculate the period using the initial RTT. However, at the time we still had the handshake timeout, and so persistent congestion did not even apply to Initial and Handshake packets (see #2649).

More recently our deployment has relied on PTOs for the handshake and thus this check, and there are no obvious problems with it on the Internet. I am in favor of this change even if it is perhaps not exhaustive in its coverage of unlikely pathological cases.

@janaiyengar
Contributor

My reaction is the same as @mjoras. I had to think a few times over to make sure that this was adequate, and I believe it is. It covers an egregious case explicitly, which is important.

@ianswett
Contributor Author

I'm merging this so Jana can pull any conflicts into #3961

Labels
-recovery design An issue that affects the design of the protocol; resolution requires consensus.
Development

Successfully merging this pull request may close these issues.

Persistent congestion threshold is unreliable during the early stages of a connection
6 participants