On-path calculation of loss and congestion #632

Closed
britram opened this issue Jun 14, 2017 · 9 comments
Labels
-transport
design: An issue that affects the design of the protocol; resolution requires consensus.
has-consensus: An issue that the Chairs have determined has consensus, by canvassing the mailing list.
parked: An issue that we can't immediately address; for future discussion.

Comments

@britram
Contributor

britram commented Jun 14, 2017

It might be useful for on-path devices to be able to calculate the (1) loss and (2) non-loss indication of congestion (i.e., ECN) experienced by each side of a QUIC flow from a single observation point.

Since #609 essentially marks every congestion window of a QUIC flow with an alternating spin, evaluation of the time-series of congestion window sizes could allow passive analysis of congestion window reductions, from which loss can be inferred if the assumption that the sender is using loss-based congestion control is valid.
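To make the inference described above concrete, here is a minimal sketch of how an observer might do it. It assumes the observer has already extracted a `(spin_bit, payload_bytes)` pair for each packet in one direction of a flow; the function names and the 0.7 reduction threshold are illustrative assumptions, not anything defined in #609 or in QUIC.

```python
from typing import List, Optional, Tuple


def bytes_per_spin_period(packets: List[Tuple[int, int]]) -> List[int]:
    """Sum payload bytes over each run of packets sharing the same spin value.

    Each run approximates one flight (roughly one congestion window) of data.
    """
    totals: List[int] = []
    last_spin: Optional[int] = None
    for spin, size in packets:
        if spin != last_spin:
            # The spin value flipped: a new period begins.
            totals.append(0)
            last_spin = spin
        totals[-1] += size
    return totals


def window_reductions(totals: List[int], threshold: float = 0.7) -> List[int]:
    """Return indices of periods whose byte total fell below threshold * previous.

    The threshold is a hypothetical tuning knob; a reduction is only a hint of
    loss if the sender really uses loss-based congestion control.
    """
    return [
        i for i in range(1, len(totals))
        if totals[i] < threshold * totals[i - 1]
    ]


# Synthetic trace: flights of 10, 20, 10, and 11 packets of 1200 bytes.
trace = [(0, 1200)] * 10 + [(1, 1200)] * 20 + [(0, 1200)] * 10 + [(1, 1200)] * 11
totals = bytes_per_spin_period(trace)
reduced = window_reductions(totals)
```

The first and last periods of a capture are usually truncated, so a real observer would likely discard them before comparing.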

A less inference-based solution would be something like the loss bit discussed in #279.

An even more explicit way to do this would be to implement something like ConEx in the QUIC short header.

@janaiyengar
Contributor

janaiyengar commented Jun 14, 2017 via email

@britram
Contributor Author

britram commented Jun 16, 2017

Yes, this is to cover loss estimation. We didn't discuss this at the interim, but it did come up in my survey of exported, passively measured metrics. This is here for completeness.

It's less important to react to loss on a per-flow basis than it is to react to latency, though. So as long as (1) we can assume UDP and TCP traffic are treated equivalently by the link/queue at which the loss occurs and (2) there is some TCP traffic through that link/queue, inference-based observation of TCP will continue to work. The real question is whether this is important in a QUIC-only world.

@huitema
Contributor

huitema commented Jun 19, 2017

Brian, how important really is this loss estimation? I don't think that losses can really be estimated by watching the congestion window. The variations of that window will depend on the specific algorithm implemented by the nodes, and also on the actual load. If this is really important, we may need to dedicate a "loss bit" in addition to the "latency bit".

@britram
Contributor Author

britram commented Jun 19, 2017

Loss measurement is currently useful in a couple of network troubleshooting and capacity planning tasks: isolating congested links (and upgrading or shifting load away from them), as well as differentiating actual loss at the link layer from congestion (and repairing lossy links). It is useful to measure these conditions both on one's own network (where you can repair them yourself) as well as isolating them to some upstream or downstream path on other networks (so you can work to shift traffic away from those networks).

Universal deployment of useful AQM + ECN would both reduce the absolute magnitude of the congestion problem as well as make loss and congestion easily distinguishable. I'm optimistic about the time horizon here, but it's still measured on the order of decades.

You're right that the inference-based method, assuming that flight size reduction is loss- or congestion-signal related, is not a very high-fidelity signal, and may be useless depending on how the sender does transmission scheduling. So if we're serious about supporting this, I think we do need to take a good look at the ConEx mechanism and see if there's a low-overhead way to support it in QUIC.

@larseggert
Member

I think it's actually a feature that loss information is NOT explicitly visible with QUIC, because packet loss ratio is a pretty useless metric in the first place. (The 5G people still dream about the "lossless" network, no matter how little sense that makes.)

With RTT-scale delay information and if we - say - required QUIC to try and negotiate ECN, those two signals (esp. correlated) are much more useful.

@ianswett
Contributor

I believe a single bit for loss would be extremely low fidelity, because losses are commonly correlated, so it may actually be more misinformation than information.
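A small sketch of the fidelity concern, under a loudly stated assumption: it models a hypothetical loss bit that is set at most once per observation period (for example, once per RTT) whenever at least one packet was lost in that period. This is not the mechanism from #279, whose details are not given here; the function name is illustrative.

```python
from typing import List


def loss_bit_per_period(lost_counts: List[int]) -> List[int]:
    """Collapse per-period lost-packet counts into a one-bit signal.

    Hypothetical model: the bit is 1 if any packet was lost in the period.
    """
    return [1 if n > 0 else 0 for n in lost_counts]


isolated = [0, 0, 1, 0]    # one packet lost in the third period
bursty = [0, 0, 40, 0]     # a correlated burst of 40 losses in the same period

# Both traces produce the same on-path signal under this model.
same_signal = loss_bit_per_period(isolated) == loss_bit_per_period(bursty)
```

Under this model an observer cannot tell a single drop from a large burst, which is one way a one-bit signal could mislead.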

I've seen one case where this would have been nice to have, but I think there are other ways to monitor loss that are more useful (i.e., "I'm seeing a lot of loss at this peering point", etc.).

Brian, I believe you mentioned writing up a proposal for an extended 'debug' header that would not be default enabled, but be available for detailed network debugging. If we were to adopt that, I think loss would be helpful there.

@britram
Contributor Author

britram commented Jun 20, 2017

So it turns out I actually agree with Lars here: loss, in and of itself, is a relatively useless metric without other things to correlate with. Requiring ECN negotiation for QUIC would actually do more for the (long-term) measurability of the protocol than mandating loss exposure, in part because future QUICs may have different transmission schedulers which would react differently to loss. The question is how useful CE mark based measurement will be in the intermediate term, as we transition from relatively little marking on path to "some" marking on path.

More useful would be a generalization of the ConEx mechanism, which could certainly go into a debug header, and would allow a single point to get information about CE markings (as well as loss) both on the upstream and the downstream. Would need to look into whether any TCP-centric assumptions were built into that mechanism.

martinthomson added the -transport and design labels Jun 20, 2017
mnot changed the title from "Explicit support for on-path calculation of loss and congestion of QUIC flows" to "On-path calculation of loss and congestion" Jun 20, 2017
mnot added the arch label Jun 21, 2017
@britram
Contributor Author

britram commented Nov 9, 2017

After discussion in the RTT DT, I agree less with Lars here than I thought I did. For steady-state (large flow, "infinite" application demand) traffic, RTT is sufficient: RTT gives you bandwidth per RTT, which gives you effective window size, which gives you an indication of loss/congestion reaction. However, for application-limited traffic (effectively anything machine-to-machine, logging and control, ABR media, CBR media, etc.) measuring RTT is difficult, and loss/ECN is a better indicator of application-visible performance problems.
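The steady-state chain of reasoning can be written as simple arithmetic. The sketch below assumes an observer that has counted the bytes a flow sent over some interval and has an RTT estimate (for instance from the spin bit); the function and parameter names are illustrative assumptions.

```python
def effective_window_bytes(bytes_observed: int, interval_s: float, rtt_s: float) -> float:
    """Estimate the sender's effective window as bytes sent per RTT.

    rate (bytes/s) = bytes_observed / interval_s
    window (bytes) = rate * rtt_s
    Only meaningful when the sender is window-limited, not application-limited.
    """
    if interval_s <= 0 or rtt_s <= 0:
        raise ValueError("interval_s and rtt_s must be positive")
    rate_bytes_per_s = bytes_observed / interval_s
    return rate_bytes_per_s * rtt_s


# 1,000,000 bytes observed over 2 seconds with a 250 ms RTT.
window = effective_window_bytes(1_000_000, 2.0, 0.25)  # 125000.0 bytes
```

A drop in this estimate between successive intervals hints at a loss or congestion reaction. For application-limited traffic the observed rate tracks application demand rather than the congestion window, so the estimate says little there.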

martinthomson added the parked label Jun 4, 2018
@martinthomson
Member

Tokyo conclusion: this requires something more concrete, and nothing along these lines is going to happen for QUIC version 1. Closing this.

In the future, if there are more concrete proposals for what can be measured (and ideally how), then we will need a thorough analysis along the lines of that performed for the spin bit.

mnot added the has-consensus label Mar 5, 2019
7 participants