
Congestion window reduction should be based on size of inflight #3943

Closed
janaiyengar opened this issue Jul 23, 2020 · 9 comments

Comments

@janaiyengar
Contributor

The recovery draft says that on congestion, NewReno halves the congestion window. The corresponding pseudo-code is as follows:

congestion_window *= kLossReductionFactor

This isn't quite correct, however: when the sender is application-limited, the amount of data in flight can be less than what the congestion window allows. The reduction should be based on what was actually in flight, rather than on what the sender was allowed to send.

For reference, RFC 5681, which defines NewReno for TCP, says this on Page 7 (Eq. 4):

 ssthresh = max (FlightSize / 2, 2*SMSS)

(and cwnd is later set to this value).
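For concreteness, here is a minimal sketch of the two reduction rules under discussion. kLossReductionFactor follows the recovery draft; kMinimumWindow and the datagram size are assumptions for illustration, not values from either document:

    # Sketch of the two competing reduction rules, in Python.
    # kLossReductionFactor matches the recovery draft (0.5 for NewReno);
    # kMinimumWindow and the datagram size are assumed for illustration.
    kLossReductionFactor = 0.5
    kMinimumWindow = 2 * 1200  # 2 * max_datagram_size (assumed)

    def reduce_on_cwnd(congestion_window):
        # What the recovery draft's pseudo-code does: halve the window
        # the sender was *allowed* to use.
        return max(congestion_window * kLossReductionFactor, kMinimumWindow)

    def reduce_on_flight_size(bytes_in_flight):
        # What RFC 5681 Eq. 4 does for TCP: halve what was actually in
        # flight: ssthresh = max(FlightSize / 2, 2*SMSS).
        return max(bytes_in_flight * kLossReductionFactor, kMinimumWindow)

The two rules only diverge when bytes_in_flight is well below congestion_window, i.e. when the sender is application-limited.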

@junhochoi
Contributor

@janaiyengar I can make a PR if you're ok with it.

@ianswett
Contributor

I would suggest not making this change. This is a case where I believe implementations typically use CWND and not flightsize, and we should make the same choice.

Feel free to correct me if I'm wrong about what implementations do.

@janaiyengar
Contributor Author

@ianswett: You're right. I had thought that this is what Linux did, but I was wrong. After chatting with Yuchung a bit, it's become clear to me that this is ok.

The reason to consider making this change is that you might have a situation where the cwnd is 100 packets but you have only sent 10; reducing the cwnd from 100 to 50 is not meaningful, because the actual network load was 10 packets.

On the other hand, if you sent 100 packets and the loss happened at packet 95, the flight size is now 4, causing the cwnd to become 2. That does not make sense either.

If there is in fact congestion in the application-limited case with a large cwnd (the first case above), there will be losses in subsequent rounds, and the cwnd will come down within a few RTTs by halving each RTT. Reducing the cwnd to a very small number (the second case above), on the other hand, means that the connection potentially has to go through many RTTs to build back up, since the increase is linear at one packet per RTT.
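To put rough numbers on that asymmetry (counting cwnd in packets, halving once per congested RTT on the way down and growing by one packet per RTT on the way up, as described above):

    # Rough illustration of the recovery-time asymmetry, in Python.
    # Assumes one halving per congested RTT and a linear increase of
    # one packet per RTT, per the reasoning above.
    def rtts_to_halve_down(cwnd, target):
        rtts = 0
        while cwnd > target:
            cwnd //= 2
            rtts += 1
        return rtts

    def rtts_to_grow_up(cwnd, target):
        return max(0, target - cwnd)  # +1 packet per RTT

    print(rtts_to_halve_down(100, 10))  # case 1: 4 RTTs for 100 -> <= 10
    print(rtts_to_grow_up(2, 100))      # case 2: 98 RTTs for 2 -> 100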

There is a way to do this correctly: track what was in flight when a given packet was sent, so that on loss you use the appropriate flight size. But that is quite a departure from the simple Reno algorithm.
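A sketch of what that per-packet tracking could look like (the structure and function names here are hypothetical, not from the draft):

    # Hypothetical sketch: record the flight size at send time so that
    # a later loss halves the load the network actually saw.
    from dataclasses import dataclass

    @dataclass
    class SentPacket:
        packet_number: int
        sent_bytes: int
        flight_size_at_send: int  # bytes_in_flight after this send

    sent_packets = {}
    kMinimumWindow = 2 * 1200  # assumed, as above

    def on_packet_sent(pn, size, bytes_in_flight):
        sent_packets[pn] = SentPacket(pn, size, bytes_in_flight + size)

    def on_packet_lost(pn):
        lost = sent_packets.pop(pn)
        # Reduce based on what was in flight when the lost packet was sent.
        return max(lost.flight_size_at_send // 2, kMinimumWindow)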

On balance, I now think we should keep what we have. I know this is a departure from RFC 5681, but I think that's ok, and we have a convincing argument for why it is.

I'll leave this issue open for a bit for others to comment, but I'm happy with closing this issue with no action.

@janaiyengar
Contributor Author

I should note that both the Reno and Cubic implementations in Linux use cwnd/2.

@kazuho
Member

kazuho commented Jul 24, 2020

+1 to retaining the status quo, based on the view that there is a "bug" in what RFC 5681 does (the 100-packet case that @janaiyengar points out), in addition to what Linux has been doing.

@junhochoi
Contributor

junhochoi commented Jul 24, 2020

Seems like FreeBSD NewReno also does cwnd/2.

> On the other hand, if you sent 100 packets and the loss happened at packet 95, the flight size is now 4, causing the cwnd to become 2. That does not make sense either.

Just thinking about how this is possible: assuming cwnd=100, knowing that packet 95 is lost without having received ACKs for packets 0-94 is not realistic, so the flight size will be much higher than 4.

@junhochoi
Contributor

Also, RFC 5681 Section 3.1 has the following text, indicating that we need to consider the receiver window too?

> Implementation Note: An easy mistake to make is to simply use cwnd, rather than FlightSize, which in some implementations may incidentally increase well beyond rwnd.

@janaiyengar
Contributor Author

I don't think that is an issue as such. The point of the note was, I suspect, to say that the cwnd might grow even larger than the rwnd. That's not a problem for us, since we don't really have the concept of an rwnd. Our flow control limit is a changing credit, which in TCP terms reflects a dynamic rwnd value.

@janaiyengar
Contributor Author

Ok, I'm closing this issue. Please reopen if anyone thinks this discussion needs to continue.

Late Stage Processing automation moved this from Triage to Issue Handled Jul 24, 2020