
congestion window increase on every ACKed packet could result in bursty sends #3094

Closed
goelvidhi opened this issue Oct 15, 2019 · 36 comments
Assignees
Labels
-recovery design has-consensus

Comments

@goelvidhi commented Oct 15, 2019

When an ACK arrives that acknowledges N packets (where N could be very large), QUIC will increase the congestion window by more than 1 MSS at once during Congestion Avoidance, causing a bursty send. This could also affect slow start.

if (congestion_window < ssthresh):
  // Slow start.
  congestion_window += acked_packet.size
else:
  // Congestion avoidance.
  congestion_window += kMaxDatagramSize * acked_packet.size
                       / congestion_window
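To make the concern concrete, here is a small Python sketch (the function name and the 1200-byte datagram size are illustrative assumptions, not from the draft) showing how a single stretch ACK covering 50 packets inflates cwnd by 50 MSS during slow start:

```python
K_MAX_DATAGRAM_SIZE = 1200  # assumed MSS in bytes, for illustration only

def on_packets_acked(congestion_window, ssthresh, acked_sizes):
    # Apply the draft's per-packet increase for every newly acked packet.
    for size in acked_sizes:
        if congestion_window < ssthresh:
            # Slow start: grow by every acked byte.
            congestion_window += size
        else:
            # Congestion avoidance.
            congestion_window += K_MAX_DATAGRAM_SIZE * size // congestion_window
    return congestion_window

# One ACK frame acknowledging 50 full-sized packets during slow start:
cwnd = on_packets_acked(10 * K_MAX_DATAGRAM_SIZE, float("inf"),
                        [K_MAX_DATAGRAM_SIZE] * 50)
# cwnd jumped by 50 MSS (60000 bytes) from a single ACK.
```

Every one of those newly granted bytes can be spent immediately, which is the burst this issue is about.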

One could argue that QUIC ACKs every other packet, but those ACK packets could get lost, or the ACK that acknowledges an ACK could get lost.

The draft probably assumes packet pacing, but not all implementations support it.

I discussed this with Jana, and he thinks that we should add something to the pseudocode or some specific text in the draft.

@ianswett (Contributor) commented Oct 16, 2019

PR #2675 added more specific text on avoiding bursts into the network, FYI.

@goelvidhi (Author) commented Oct 16, 2019

Please note that PR #2675 addresses the app-limited and pacing-limited cases by not increasing cwnd when it is underutilized.

My concern is cwnd being inflated by an ACK frame that acknowledges a large number of packets.

@janaiyengar (Contributor) commented Oct 16, 2019

@ianswett: The problem here is that we don't talk about the burstiness that can occur when pacing is not in place and a stretch ACK is received. PRR handles this, but we don't do PRR in the draft.

@huitema (Contributor) commented Oct 16, 2019

Do we need to spend lots of time on Reno, when everybody appears to be doing Cubic or BBR?

@janaiyengar (Contributor) commented Oct 16, 2019

Cubic has this problem. BBR cannot be assumed.

@huitema (Contributor) commented Oct 16, 2019

Except Cubic opens the window based on time elapsed, not just on ACKs received.

@huitema (Contributor) commented Oct 16, 2019

I don't get the scenario. It seems that @goelvidhi is worried about receiving an ACK for a large number of packets. This is not a congestion window problem, because it happens even if the congestion window size remains constant. The root cause is that the ACK of N packets reduces the number of packets in flight by N. So this is strictly about pacing, not about congestion control.

@janaiyengar (Contributor) commented Oct 16, 2019

@huitema: yes, and it can still burst because of stretch ACKs.

@huitema (Contributor) commented Oct 16, 2019

Not just stretch ACKs -- deliberate ACK compression by the network happens too. There are very few alternatives to pacing.

@junhochoi (Contributor) commented Oct 16, 2019

https://tools.ietf.org/html/rfc3465 describes:

The limit, L, chosen for the cwnd increase during slow start, ... This document specifies that TCP implementations MAY use L=2SMSS bytes and MUST NOT use L > 2SMSS bytes. This choice balances between being conservative (L=1SMSS bytes) and being potentially very aggressive. In addition, L=2SMSS bytes exactly balances the negative impact of the delayed ACK algorithm (as discussed in more detail in section 3.2).
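The ABC rule quoted above can be sketched in Python (a hedged illustration; the SMSS value and the function name are assumptions, not from RFC 3465):

```python
SMSS = 1460  # assumed sender MSS in bytes, for illustration
L = 2 * SMSS  # RFC 3465: MAY use L = 2*SMSS, MUST NOT use L > 2*SMSS

def abc_slow_start_increase(cwnd, bytes_acked_by_this_ack):
    # Appropriate Byte Counting: count the bytes newly acked, but cap the
    # slow-start increase at L per incoming ACK.
    return cwnd + min(bytes_acked_by_this_ack, L)

cwnd = abc_slow_start_increase(10 * SMSS, 50 * SMSS)  # a stretch ACK
# Growth is capped at 2 * SMSS even though 50 packets' worth was acked.
```

The cap means a stretch ACK no longer translates directly into an equally large window jump.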

@larseggert (Member) commented Oct 16, 2019

Discussed in Cupertino. @goelvidhi to prepare a PR that hopefully can incorporate some ideas from RFC3465 to add safety features for non-pacing stacks.

@larseggert larseggert added the design label Oct 16, 2019
@huitema (Contributor) commented Oct 16, 2019

But RFC 3465 does not solve the burst problem. It limits the growth of the window, but it does not do pacing of any kind.

@junhochoi (Contributor) commented Oct 16, 2019

I think we can change it to something like this:

// Slow start.
congestion_window += min(2 * kMaxDatagramSize, acked_packet.size)

ianswett added a commit that referenced this issue Oct 17, 2019
@goelvidhi (Author) commented Oct 17, 2019

@junhochoi, I think we need to apply the 2 MSS clamp per ACK frame received, not per packet acked.

I am thinking that we can remove OnPacketAckedCC and instead do OnAckReceivedCC. The pseudocode change would look something like this:

OnAckReceived(ack, pn_space):
  total_bytes_acked = 0
  for acked_packet in newly_acked_packets:
    total_bytes_acked += acked_packet.size
    OnPacketAcked(acked_packet.packet_number, pn_space)
  OnAckReceivedCC(total_bytes_acked)

OnAckReceivedCC(total_bytes_acked):
  bytes_in_flight -= total_bytes_acked
  if (congestion_window < ssthresh):
    // Slow start.
    congestion_window += min(total_bytes_acked, 2 * kMaxDatagramSize)
  else:
    // Congestion avoidance.
    congestion_window += kMaxDatagramSize * total_bytes_acked
                         / congestion_window
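A runnable Python rendering of the proposal above (a sketch; the class name, the 1200-byte datagram size, and the ssthresh value are illustrative assumptions):

```python
K_MAX_DATAGRAM_SIZE = 1200  # assumed MSS in bytes

class CongestionController:
    # Sketch of the proposal: cwnd is updated once per ACK frame, with
    # slow-start growth clamped to 2 MSS per frame (RFC 3465 style).
    def __init__(self, congestion_window, ssthresh):
        self.congestion_window = congestion_window
        self.ssthresh = ssthresh
        self.bytes_in_flight = 0

    def on_ack_received_cc(self, total_bytes_acked):
        self.bytes_in_flight -= total_bytes_acked
        if self.congestion_window < self.ssthresh:
            # Slow start, clamped per ACK frame rather than per packet.
            self.congestion_window += min(total_bytes_acked,
                                          2 * K_MAX_DATAGRAM_SIZE)
        else:
            # Congestion avoidance.
            self.congestion_window += (K_MAX_DATAGRAM_SIZE * total_bytes_acked
                                       // self.congestion_window)

cc = CongestionController(congestion_window=12_000, ssthresh=10**9)
cc.bytes_in_flight = 60_000
cc.on_ack_received_cc(60_000)  # one ACK frame covering 50 packets
# cwnd grows by only 2 MSS (2400 bytes) despite the stretch ACK.
```

Note that bytes_in_flight still drops by the full acked amount, so the sender may immediately send a window's worth of data; the clamp only limits window growth, not the burst itself.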

@junhochoi (Contributor) commented Oct 17, 2019

I think we can keep OnPacketAckedCC() but just move cwnd update to OnAckReceivedCC() as you propose.

@janaiyengar (Contributor) commented Oct 17, 2019

Yup, that's an important missing fix, though it needs to be applied only when not pacing. Additionally, #3106 helps with burstiness in general. @goelvidhi : I would make a pacing variable, and apply the min here when not pacing.

@huitema (Contributor) commented Oct 17, 2019

The proposed change has the effect of reducing the effective window if ACKs are compressed. Say you have a window of 12 packets, ACKs are delayed, and a single ACK acknowledges all 12 packets. Now you have a window of 2 packets, your transmission rate is 6 times slower than before, and it will take you 10 RTTs to recover. Is that really what you want?

@janaiyengar (Contributor) commented Oct 17, 2019

@huitema: This is what ABC (RFC 3465) does for TCP. ACK compression has side-effects, and the way to deal with it is, as you say, to pace packets out. This reduction in performance is only when the endpoint does not pace, and it is the only way to limit bursts in that case.

@huitema (Contributor) commented Oct 17, 2019

Also, ACK compression does not necessarily imply ACK pruning. I have traces in which one RTT's worth of ACKs is compressed and delivered back to back. In that case, the proposed algorithm does nothing: you get 2 MSS added for each ACK, back to back, within less than 1 ms.

@junhochoi (Contributor) commented Oct 17, 2019

FYI: FreeBSD implements RFC 3465 (enabled by default): https://github.com/freebsd/freebsd/blob/master/sys/netinet/cc/cc_newreno.c#L174
Linux removed its tcp_abc implementation but says it achieves the same effect a different way: https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_cong.c#L385
Considering that FreeBSD doesn't pace by default, I think the currently proposed algorithm is no worse (or more aggressive) than what TCP implementations do.

ianswett added a commit that referenced this issue Nov 1, 2019
When not pacing packets, limit the CWND increase to 2 packets per ACK frame received and cite RFC3465.

Fixes #3094
@goelvidhi (Author) commented Nov 4, 2019

@ianswett: I am working on the PR for pseudocode changes.

@tfpauly commented Jan 15, 2020

Bringing conversation in from the list:

With regards to #3232, one of the things we found in testing was that it could (a) still allow an implementation to be vulnerable to too much bursting (the original problem), but more importantly (b) locked receivers into a pattern of acking every other packet. My main concern is that this causes problems if receivers want to experiment with and change acking strategies going forward, in ways that would otherwise be fine.

One way to look at the problem described by this issue is that the existing text has a minor underspecification: it has the notion of a burst limit but says nothing about the frequency of such bursts. Two "bursts" of 10 MSS that are sent within a couple of nanoseconds are really one burst that's too big. Another way of fixing this is to remove even the specification of the 10 MSS limit, if we really don't want too much text about the non-pacing cases.

Essentially, what this comes down to is: pace; or even if you don't pace, you must not send bursts above some frequency; and if you don't set pacing timers, you'll likely underutilize the link.
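For implementations that skip per-packet pacing timers, the "bursts below some frequency" requirement is often met with a token bucket. A minimal sketch (the class name and the rate/burst numbers are illustrative assumptions, not from the draft):

```python
class TokenBucketPacer:
    # Release bytes at `rate` (bytes/sec, e.g. cwnd / smoothed_rtt) while
    # permitting bursts of up to `burst` bytes.
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start with a full burst allowance
        self.last = 0.0

    def can_send(self, now, packet_size):
        # Refill tokens for the elapsed time, capped at the burst allowance.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_size:
            self.tokens -= packet_size
            return True
        return False

# cwnd = 120000 B over a 100 ms RTT, with a 10-packet burst allowance:
pacer = TokenBucketPacer(rate=120_000 / 0.100, burst=10 * 1_200)
sent = sum(pacer.can_send(0.0, 1_200) for _ in range(12))
# Only the first 10 packets pass at t=0; the rest must wait for tokens.
```

Even a coarse pacer like this caps the instantaneous burst regardless of how many packets a single ACK frame releases.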

@martinthomson (Member) commented Jan 16, 2020

I'm going to put this out there: the answer is MUST pace.

You might formalize this by saying the number of packets sent in any interval t cannot exceed cwnd * 2 * t / RTT by more than the initial cwnd (i.e., the 10MSS). That allows for bursting, but requires a period of quiet either preceding or following any burst.
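That formalization can be written as a simple budget check (a sketch; the function name and the numeric values below are illustrative):

```python
def max_bytes_in_interval(cwnd, rtt, t, initial_cwnd):
    # Proposed bound: bytes sent in any interval t may not exceed the paced
    # allowance cwnd * 2 * t / RTT by more than the initial cwnd.
    return cwnd * 2 * t / rtt + initial_cwnd

# cwnd = 120000 B, RTT = 100 ms, initial window = 10 * 1200 B:
# within any 1 ms interval, at most 2400 + 12000 bytes may be sent.
budget = max_bytes_in_interval(120_000, 0.100, 0.001, 12_000)
```

The additive initial_cwnd term is what permits a bounded burst while still forcing quiet periods around it.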

@mnot mnot removed the proposal-ready label Jan 17, 2020
@mnot mnot moved this from Consensus Call issued to Design Issues in Late Stage Processing Jan 17, 2020
@tfpauly commented Jan 21, 2020

@martinthomson Yes, it does largely reduce to "implementations MUST pace" (even if the pacing algorithm isn't very nuanced, in which case you'll perform poorly).

The formalization of the rate, one option for which was specified in #3351, is, I think, the heart of what should be done here.

@goelvidhi (Author) commented Feb 5, 2020

There are two routes that we can take:

1. Specify clearly what `delay` refers to in the statement below, which is done in #3351:

   Sending multiple packets into the network without any delay between them
   creates a packet burst that might cause short-term congestion and losses.

OR

2. Change MUST pace to SHOULD, remove the 10 MSS burst size limit, and provide a general recommendation for implementations that don't pace. Something like:

   Implementations SHOULD either use pacing or use another mechanism
   (for example, limiting congestion window increase as specified by RFC 3465)
   to introduce delay between successive transmissions. Implementations
   that don't pace are at risk of underutilizing the network.

@janaiyengar (Contributor) commented Feb 5, 2020

I'm sympathetic to saying MUST pace, and I like the formalization that @martinthomson suggests. I've heard some pushback on saying MUST pace in the past, but it might be worth giving it a shot again. @erickinnear @tfpauly @goelvidhi: What do you think?

@janaiyengar (Contributor) commented Feb 5, 2020

That said... I will note that @martinthomson's formalization allows for a burst at the end of an RTT. Consider, for example, when a single ACK is received that acknowledges an entire window. That would be legal by this definition. I think that's ok.

@larseggert (Member) commented Feb 5, 2020

Discussed in ZRH. Proposed resolution is to define and REQUIRE pacing.

@larseggert larseggert added the proposal-ready label Feb 5, 2020
@project-bot project-bot bot moved this from Design Issues to Consensus Emerging in Late Stage Processing Feb 5, 2020
@tfpauly commented Feb 5, 2020

@janaiyengar I think requiring pacing makes good sense, with some formalization of what that means.

@LPardue (Member) commented Apr 4, 2020

To clarify the status here, my understanding is that the proposal is to close this design issue with no action.

The discussions have identified editorial improvements that will be addressed independently.

@LPardue LPardue added call-issued and removed proposal-ready labels Apr 4, 2020
@project-bot project-bot bot moved this from Consensus Emerging to Consensus Call issued in Late Stage Processing Apr 4, 2020
@janaiyengar (Contributor) commented Apr 10, 2020

Let's continue the editorial discussion on #3122.

@goelvidhi (Author) commented Apr 12, 2020

Closing this issue and continuing discussion on #3122

@goelvidhi goelvidhi closed this Apr 12, 2020
Late Stage Processing automation moved this from Consensus Call issued to Text Incorporated Apr 12, 2020
@LPardue (Member) commented Apr 12, 2020

This needs to run through the consensus call procedure, so we'll keep it open for another week.

@LPardue LPardue reopened this Apr 12, 2020
Late Stage Processing automation moved this from Text Incorporated to Triage Apr 12, 2020
@martinthomson martinthomson moved this from Triage to Consensus Call issued in Late Stage Processing Apr 14, 2020
@LPardue LPardue added has-consensus and removed call-issued labels Apr 18, 2020
@project-bot project-bot bot moved this from Consensus Call issued to Consensus Declared in Late Stage Processing Apr 18, 2020
@ianswett ianswett closed this Apr 18, 2020
Late Stage Processing automation moved this from Consensus Declared to Text Incorporated Apr 18, 2020