
When the sender is using standard Reno congestion control, ack every ~2 packets #1428

Closed
ianswett opened this issue Jun 7, 2018 · 22 comments
Assignees
Labels
-recovery
design: An issue that affects the design of the protocol; resolution requires consensus.
has-consensus: An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@ianswett
Contributor

ianswett commented Jun 7, 2018

We should recommend acking every 2 packets or something similar when the sender is using Reno or another window-based congestion control like Cubic.

Other congestion controllers like BBR are able to tolerate less frequent acking, but those controllers are not standardized and quite a bit more complex.

This makes me a bit sad, since I like acking less frequently, but the recovery doc should be self-consistent.

@janaiyengar
Contributor

The receiver doesn't know the sender's congestion controller, so that recommendation is not going to be useful in practice.

The receiver is usually capable of sending acks as frequently as necessary, in response to packet receive events. The sender knows how sensitive it is to the ack ratio, so it needs to suggest the max ratio. I think we need a transport param for this.

@ianswett
Contributor Author

To clarify, my proposal is that we should document an acknowledgement approach we expect to work well with the described congestion controller, Reno. Do you think that would not be useful?

If we want to add a mechanism to indicate to the receiver that acknowledging packets less frequently is ok, then as you said, we need one or more transport parameters.

What transport parameter(s) do you have in mind? The two I can think of are the max number of packets newly acked at once and the interval between acks, as a fraction of an RTT or in absolute time. The current BBR approach is very adaptive, so it doesn't need to know the details of the receiver's behavior. It's not clear to me how other congestion controllers will adapt to this, so it's not clear to me what information they need.

@janaiyengar
Contributor

janaiyengar commented Jun 11, 2018

The Reno spec is a simple CC for those senders who need something simple and reasonably effective to implement on the send side. I don't think we can afford to specify a receiver that assumes only Reno senders, though. Given that senders (such as Google's servers, for instance) will use non-Reno CCs from the start, we would want receivers to do something better than simply ack every two packets.

Any sender should be adaptive to this, but it's useful to have a knob by which we can control receiver behavior and reduce the number of ACK-only packets that are being sent on the wire. The point of acking every other packet is basically an attempt to reduce the number of ACK-only packets, and we ought to be able to make it more generalizable.

As I mentioned in my earlier comment, I think the sender can specify something like:
ack_ratio: number of retransmittable packets received after which an ack must be sent. 2 by default.
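For concreteness, a minimal sketch of how a receiver might apply such a parameter (the names, the packet counting, and the delayed-ack fallback are illustrative only, not taken from the draft):

```cpp
#include <cstdint>

struct AckState {
  uint64_t unacked_retransmittable = 0;  // ack-eliciting packets since last ACK
  uint64_t ack_ratio = 2;                // peer-advertised value, default 2
};

// Called for each retransmittable packet received.
bool ShouldAckImmediately(AckState& s) {
  ++s.unacked_retransmittable;
  if (s.unacked_retransmittable >= s.ack_ratio) {
    s.unacked_retransmittable = 0;
    return true;   // send an ACK now
  }
  return false;    // otherwise rely on the delayed-ack timer
}
```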

We could also have an ack_delay param that the receiver tells the sender about, but I think that can be a separate issue (and there's already #981).

What do you think?

@ianswett
Contributor Author

We shouldn't assume only Reno, but as EKR pointed out, the text should describe something we expect to perform reasonably well.

Yes, let's keep ack delay as a separate issue. And I completely agree that we don't want to specify only acking every two packets.

If we specify ack ratio, I tend to think it should be in bytes, not packets, but possibly it doesn't really matter? However, it seems like a min number of acks per RTT is just as valuable as a max number of bytes received before sending an acknowledgement.

At the moment, I'm fairly concerned that we are only aware of one congestion controller that deals well with intentional receiver-side aggregation/decimation, and it is OK with any value of ack_ratio. More important to BBR is getting somewhat regular feedback, i.e., at least once per RTT, ideally more often.

@mikkelfj
Contributor

I don't know that much about congestion control algorithms, but at some point I made a note of the following:

http://modong.github.io/pcc-page/

@ianswett
Contributor Author

Good point, I think PCC (or any other primarily rate-based algorithm) would still perform well with a significantly reduced ack rate.

@ianswett added the design label Jul 12, 2018
@ianswett
Contributor Author

Jana and I discussed this today at lunch and we think the best approach is for the sender to communicate a value for the max number of retransmittable packets or bytes received before sending an immediate ack.

Number of packets seems simpler, but the resource we're trying not to constrain is CWND, which is in bytes, so that seems more correct. I suspect either will work fine in typical circumstances.

Given we recommend Reno currently, the default value would probably be 2 packets or 2400 bytes.
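A minimal sketch of the byte-counting variant, assuming a hypothetical sender-communicated threshold with the 2400-byte default mentioned above (roughly two 1200-byte packets):

```cpp
#include <cstdint>

struct ByteAckState {
  uint64_t unacked_bytes = 0;
  uint64_t ack_threshold_bytes = 2400;  // sender-communicated, default shown
};

// Called for each retransmittable packet received.
bool ShouldAckImmediately(ByteAckState& s, uint64_t packet_bytes) {
  s.unacked_bytes += packet_bytes;
  if (s.unacked_bytes >= s.ack_threshold_bytes) {
    s.unacked_bytes = 0;
    return true;   // threshold crossed: send an immediate ACK
  }
  return false;    // otherwise rely on the delayed-ack timer
}
```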

@mikkelfj
Contributor

Bytes seem more future-proof.

@mirjak
Contributor

mirjak commented Jul 12, 2018

+1 for communicating the value

@ianswett
Contributor Author

Jana and I were discussing this, and I believe we should update the TLP calculation to assume an immediate ack (similar to TCP assuming immediate acks for every two packets) if more than the specified number of bytes are in flight.
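A very rough sketch of what that could look like, assuming a hypothetical byte threshold; the timeout formula and the 10 ms floor are illustrative, not the draft's TLP definition:

```cpp
#include <algorithm>
#include <cstdint>

uint64_t TlpTimeoutMs(uint64_t srtt_ms, uint64_t max_ack_delay_ms,
                      uint64_t bytes_in_flight, uint64_t ack_threshold_bytes) {
  // If more than the peer's threshold is in flight, an immediate ack is
  // expected, so don't budget for the peer's delayed-ack timer.
  uint64_t ack_delay =
      (bytes_in_flight > ack_threshold_bytes) ? 0 : max_ack_delay_ms;
  return std::max<uint64_t>(2 * srtt_ms + ack_delay, 10);
}
```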

@pravb

pravb commented Jul 18, 2018

I am not sure I understand the issue here. Even if the sender is using Reno, it should do appropriate byte counting without any limits (since we are requiring pacing). If this is the case, why does it matter if there is stretch ACKing? The issue is not the number of ACKs that need to be generated but that ACKs might get delayed. Any ACKing scheme should generate an immediate ACK after processing a batch of packets. We should recommend that the receiver generate an immediate ACK if it processes more than 2 packets in the batch, but that it could go slower if it is willing to live with slower congestion window growth and less RTT/congestion feedback to the sender.
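A minimal sketch of that batch rule, with hypothetical helper names:

```cpp
#include <cstddef>

void SendAckImmediately();    // hypothetical helper
void ArmDelayedAckTimer();    // hypothetical helper; fires after max_ack_delay

// Called once the whole batch of received packets (e.g. from recvmmsg)
// has been processed.
void OnBatchProcessed(size_t ack_eliciting_in_batch) {
  if (ack_eliciting_in_batch > 2) {
    SendAckImmediately();
  } else if (ack_eliciting_in_batch > 0) {
    ArmDelayedAckTimer();
  }
}
```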

@gloinul
Contributor

gloinul commented Jul 18, 2018

So, supportive of the general function. However, my question is: if the threshold is in bytes, how does that interact with path MTU discovery? If one starts at the default 1280-byte PMTU and the path actually has a 4K MTU, then something that was configured for 10 packets' worth of bytes (12800 bytes) will suddenly end up acking only every 3-4 packets (12800 / 4096 ≈ 3).

@nibanks
Member

nibanks commented Jul 18, 2018

@gloinul assuming the value can be updated dynamically, and isn't just a transport parameter, couldn't you send the update in your PMTUD probe packet? If the packet makes it and gets acknowledged, then everything is good. If the packet was too big, it doesn't make it, doesn't get acknowledged, and then the peer doesn't even know you tried to increase it.

@ianswett
Contributor Author

For both this and explicit max ack delay, the ability to update would be very useful. I'm not sure if that should be done via an UPDATE_TRANSPORT_PARAMS frame as I proposed for initial stream flow control, or something else.

@gloinul
Contributor

gloinul commented Jul 18, 2018

@nibanks I don't think driving it from the probe packet is necessarily the right way of updating it. Acknowledging a probe does not necessarily result in the PMTU being updated there and then. I'd rather see other ways.

@ianswett self-assigned this Aug 8, 2018
ianswett added a commit that referenced this issue Aug 30, 2018
Possibly it needs to be mentioned in recovery as well?

Fixed #1428
@huitema
Contributor

huitema commented Sep 1, 2018

I commented on #1715 already. I don't think this works very well, because whatever byte or packet limit we use depends on the network bandwidth, which is going to vary during the course of the connection. IMHO, that's why specifications like "ACK at least every 1/4th of the min RTT" are better, because they do not depend on the bandwidth.

The reason for frequent ACKs in TCP Reno is not congestion control but "ACK clocking". The received ACK is a signal to send new packets. Van Jacobson explains that in his original design: he basically wanted to keep the number of packets in transit constant, and ACK clocking achieves that -- modulo changes in the congestion window, of course. If the sender is not using some form of rate control, infrequent ACKs will cause ACK clocking to degenerate. The ACK will open the window for a large number of packets, which will be sent in a single batch. This can cause spikes in network usage, stress buffers, etc. That's the main argument for sending ACKs at least every 2 packets.

However, even if receivers send an ACK every 2 packets, ACK clocking can still fail. For example, queues on the path from receiver to sender can cause ACKs to bunch. This will have a cumulative effect, as bunched ACKs will cause packets to be sent in bunches, causing more bunches of ACKs, etc. So in practice senders who want to achieve high rates of transmission have to implement some form of rate control.

So my preference would be to not specify anything but a simple min-RTT-based limit on ACK delay.
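A minimal sketch of such a min-RTT-based limit, with the 1/4 fraction used purely as an example:

```cpp
#include <algorithm>
#include <cstdint>

// Ack no later than a fraction of min_rtt, capped by the advertised
// max ack delay.
uint64_t AckDelayBudgetMs(uint64_t min_rtt_ms, uint64_t max_ack_delay_ms) {
  return std::min(min_rtt_ms / 4, max_ack_delay_ms);
}
```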

@mikkelfj
Contributor

mikkelfj commented Sep 1, 2018

Thanks for the informative background.
If it is only for clocking, then frequent ACKs look an awful lot like a huge hammer. We already have flow control, and we could add rules for rate limiting rather than relying on very frequent ACKs, which can be expensive, for example for IoT power use and for context switching on servers handling many connections.

@ianswett
Contributor Author

ianswett commented Sep 11, 2018

Thanks Christian, specifying the preferred peer max ack delay as a fraction of RTT is what gQUIC does today, and works well for BBR and other rate-based congestion controllers. However, at the moment, gQUIC also sends an ACK every 10 packets. Experiments have been done to raise that to 20, and no degradation was observed, so I don't believe that limit is important.

My concern is that I never got fraction-of-RTT based acking to work as well as acking every 2 packets for TCP Cubic, but I never tried a value lower than 1/8 RTT.

In general, I think this suggestion is good and easy to implement, but I'm a bit concerned about regressions vs the "ACK every 2 packets" approach with Reno or Cubic.

We could have a 'status quo' knob: if the ack delay isn't specified, just ack every two packets with the recommended max ack delay?

@huitema
Contributor

huitema commented Sep 12, 2018

Ian, you probably have vastly more data than I do. When we are concerned with Reno and Cubic, is that a theoretical concern based on simulation, or a practical issue visible in your telemetry? I say that because ACK compression and ACK suppression do happen a lot to TCP "in the wild". Cable modems do it, for example. If I were shipping a TCP implementation on a major platform, I would do something about it, so I suppose that most of the big releases have.

In any case, you have to go back to the reasons why implementations want fewer ACKs -- first to reduce CPU load caused by packet-sent interrupts, and second to ease potential bottlenecks on the reverse path. So there's no point being dogmatic. If ACKs are spaced "enough", be it in packets or in milliseconds, you win already; going further will only improve your CPU cost a little, but it might risk a lot.

I suspect that is how gQUIC came to adopt a 10 packet cap. Or maybe a 10 ms cap as well.

@ianswett
Contributor Author

The data was from experiments in the wild looking at YouTube and Search metrics. The results weren't awful, but there were small regressions in some metrics, which appeared to indicate Cubic wasn't achieving as much bandwidth with 1/4 RTT decimation enabled. One possible issue with the experiments is that QUIC's Cubic implementation currently compares bytes_in_flight to CWND to determine if it's app-limited (https://cs.chromium.org/chromium/src/net/third_party/quic/core/congestion_control/tcp_cubic_sender_bytes.cc?q=cubic_sen&sq=package:chromium&g=0&l=369), and if it's app-limited it doesn't increase the CWND. I wouldn't expect less frequent ACKs to cause that to be more likely, but if it did, it could explain the behavior.

In traces, it was clear that this experiment was further increasing gaps between ACKs vs acking every 2 packets, at least for some users. It's very possible this is a WiFi artifact and sending more ACKs caused the WiFi access point to release ACKs a bit more often. The other potential issue is that QUIC paces at 1.25 * CWND/RTT in congestion avoidance for Cubic, because that's what the Linux kernel does. This pretty much guarantees that a flow will be CWND-limited when a new ACK arrives. We discussed lowering this to 1 * CWND/RTT, but never ran an experiment with that.

That brings up another point. If one doesn't have pacing enabled (which I wouldn't advise, but I know pacing isn't always enabled at this point), then being ACK-clocked should smooth out traffic some.

The 10 packet cap was a random number I came up with, and there was no fundamental reasoning behind it. I have no evidence it's valuable or necessary.

I'd like to be in a situation where multiple implementations have data to share on this, in case my results were due to an implementation bug, but we're not there yet. So my goal is to provide some knobs to allow experimentation, and we need to decide what those should be.

Related to this is the explicit max ack delay, which feeds into the TLP and RTO timeouts.

@ianswett
Contributor Author

The decision in New York is to ACK every two packets, except for the recvmmsg case, where multiple packets may be received on a connection at once.

Another decision is that we want to communicate an explicit Max Ack Delay and stop trying to use a max filter to track Max Ack Delay for TLP and RTO.

As with TCP, any fraction-of-RTT approach besides "no more than an RTT of delay" will be via experimental extension only.

@martinthomson
Member

Fixed in #1781.

@mnot added the has-consensus label Mar 5, 2019