On-path calculation of loss and congestion #632

britram · 2017-06-14T14:39:14Z

It might be useful for on-path devices to be able to calculate the (1) loss and (2) non-loss indication of congestion (i.e., ECN) experienced by each side of a QUIC flow from a single observation point.

Since #609 essentially marks every congestion window of a QUIC flow with an alternating spin, evaluation of the time-series of congestion window sizes could allow passive analysis of congestion window reductions, from which loss can be inferred if the assumption that the sender is using loss-based congestion control is valid.
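As a rough illustration of the inference described above, here is a minimal sketch of how an observer might turn spin transitions into a per-period byte series and flag sharp drops. The `Packet` record layout, both function names, and the 0.7 threshold are assumptions made for this example and are not taken from #609. It also assumes in-order observation of one direction of the flow; reordering or an application-limited sender would produce false signals.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical observer record: (timestamp_s, spin_bit, payload_bytes).
Packet = Tuple[float, int, int]


def bytes_per_spin_period(packets: Iterable[Packet]) -> List[int]:
    """Sum observed bytes over each run of consecutive packets sharing a spin value."""
    periods: List[int] = []
    last_spin: Optional[int] = None
    for _ts, spin, size in packets:
        if spin != last_spin:
            # Spin flipped: a new period (roughly one flight) starts here.
            periods.append(0)
            last_spin = spin
        periods[-1] += size
    return periods


def window_reductions(periods: List[int], factor: float = 0.7) -> List[int]:
    """Return indices of periods whose byte count fell below factor * the previous period.

    The threshold is an arbitrary assumption; a reduction is only a hint of
    loss if the sender uses loss-based congestion control.
    """
    return [
        i for i in range(1, len(periods))
        if periods[i] < factor * periods[i - 1]
    ]


if __name__ == "__main__":
    trace = [
        (0.00, 0, 1000), (0.01, 0, 1000),
        (0.02, 1, 1000), (0.03, 1, 1000), (0.04, 1, 1000),
        (0.05, 0, 1000),
    ]
    series = bytes_per_spin_period(trace)
    print(series, window_reductions(series))
```

The first and last periods in a capture are usually truncated, so a real analysis would likely discard them before looking for reductions.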

A less inference-based solution would be something like the loss bit discussed in #279.

An even more explicit way to do this would be to implement something like ConEx in the QUIC short header.

janaiyengar · 2017-06-14T18:36:33Z

I'm confused. Is this issue asking for a way for mboxes to do loss estimation? I didn't hear folks really asking for this, so before digging into solutions, I'd like to ask for motivation.


britram · 2017-06-16T14:41:21Z

Yes, this is to cover loss estimation. We didn't discuss this at the interim, but it did come up in my survey of exported, passively measured metrics. This is here for completeness.

It's less important to react to loss on a per-flow basis than it is to react to latency, though. So as long as (1) we can assume UDP and TCP traffic are treated equivalently by the link/queue at which the loss occurs and (2) there is some TCP traffic through that link/queue, inference-based observation of TCP will continue to work. The real question is whether this is important in a QUIC-only world.

huitema · 2017-06-19T03:54:13Z

Brian, how important really is this loss estimation? I don't think that losses can really be estimated by watching the congestion window. The variations of that window will depend on the specific algorithm implemented by the nodes, and also on the actual load. If this is really important, we may need to dedicate a "loss bit" in addition to the "latency bit".

britram · 2017-06-19T07:26:59Z

Loss measurement is currently useful in a couple of network troubleshooting and capacity planning tasks: isolating congested links (and upgrading or shifting load away from them), as well as differentiating actual loss at the link layer from congestion (and repairing lossy links). It is useful both to measure these conditions on one's own network (where you can repair them yourself) and to isolate them to some upstream or downstream path on other networks (so you can work to shift traffic away from those networks).

Universal deployment of useful AQM + ECN would both reduce the absolute magnitude of the congestion problem and make loss and congestion easily distinguishable. I'm optimistic about the time horizon here, but it's still measured on the order of decades.

You're right that the inference-based method, assuming that flight size reduction is loss- or congestion-signal related, is not a very high-fidelity signal, and may be useless depending on how the sender does transmission scheduling. So if we're serious about supporting this, I think we do need to take a good look at the ConEx mechanism and see if there's a low-overhead way to support it in QUIC.

larseggert · 2017-06-19T07:34:24Z

I think it's actually a feature that loss information is NOT explicitly visible with QUIC, because packet loss ratio is a pretty useless metric in the first place. (The 5G people still dream about the "lossless" network, no matter how little sense that makes.)

With RTT-scale delay information, and if we, say, required QUIC to try to negotiate ECN, those two signals (especially when correlated) would be much more useful.

ianswett · 2017-06-19T13:56:02Z

I believe a single bit for loss would be extremely low fidelity, because losses are commonly correlated, so it may actually be more misinformation than information.

I've seen one case where this would have been nice to have, but I think there are other ways to monitor loss that are more useful (e.g., "I'm seeing a lot of loss at this peering point").

Brian, I believe you mentioned writing up a proposal for an extended 'debug' header that would not be default enabled, but be available for detailed network debugging. If we were to adopt that, I think loss would be helpful there.

britram · 2017-06-20T08:22:40Z

So it turns out I actually agree with Lars here: loss, in and of itself, is a relatively useless metric without other things to correlate with. Requiring ECN negotiation for QUIC would actually do more for the (long-term) measurability of the protocol than mandating loss exposure, in part because future QUICs may have different transmission schedulers which would react differently to loss. The question is how useful CE mark based measurement will be in the intermediate term, as we transition from relatively little marking on path to "some" marking on path.

More useful would be a generalization of the ConEx mechanism, which could certainly go into a debug header, and would allow a single observation point to get information about CE markings (as well as loss) on both the upstream and the downstream. We would need to look into whether any TCP-centric assumptions are built into that mechanism.

britram · 2017-11-09T08:58:02Z

After discussion in the RTT DT, I agree less with Lars here than I thought I did. For steady-state (large flow, "infinite" application demand) traffic, RTT is sufficient: RTT gives you bandwidth per RTT, which gives you the effective window size, which in turn gives you an indication of the sender's reaction to loss or congestion. However, for application-limited traffic (effectively anything machine-to-machine, logging and control, ABR media, CBR media, etc.), measuring RTT is difficult, and loss/ECN is a better indicator of application-visible performance problems.
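To make the steady-state chain above concrete, here is a minimal sketch of the arithmetic an observer could do: bytes seen over an interval give a rate, and rate times RTT approximates the effective window. The function name and parameters are assumptions for illustration, and the estimate only means something when the flow is not application-limited.

```python
def effective_window_bytes(bytes_observed: int, interval_s: float, rtt_s: float) -> float:
    """Approximate the sender's effective window as observed rate * RTT.

    Assumes steady-state traffic with unlimited application demand, so the
    observed rate is limited by the window and not by the application.
    """
    if interval_s <= 0 or rtt_s < 0:
        raise ValueError("interval_s must be positive and rtt_s non-negative")
    rate_bytes_per_s = bytes_observed / interval_s
    return rate_bytes_per_s * rtt_s


if __name__ == "__main__":
    # 1.25 MB observed in one second at a 40 ms RTT.
    print(effective_window_bytes(1_250_000, 1.0, 0.040))
```

A falling estimate across successive intervals at a stable RTT would then suggest a window reduction; for an application-limited flow the same drop could just mean the application sent less.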

martinthomson · 2019-01-30T07:38:53Z

Tokyo conclusion: this requires something more concrete, and nothing along these lines is going to happen for QUIC version 1. Closing this.

In the future, if there are more concrete proposals for what can be measured (and ideally how), then we will need a thorough analysis along the lines of that performed for the spin bit.

britram mentioned this issue Jun 14, 2017

Latency Spin Bit #609

Closed

britram mentioned this issue Jun 20, 2017

Explicit Congestion Notification (ECN) #68

Closed

martinthomson added -transport design An issue that affects the design of the protocol; resolution requires consensus. labels Jun 20, 2017

mnot changed the title ~~Explicit support for on-path calculation of loss and congestion of QUIC flows~~ On-path calculation of loss and congestion Jun 20, 2017

mnot mentioned this issue Jun 21, 2017

Public Packet Number Echo #269

Closed

mnot added the arch label Jun 21, 2017

martinthomson added the parked An issue that we can't immediately address; for future discussion. label Jun 4, 2018

martinthomson removed the arch label Jan 22, 2019

martinthomson closed this as completed Jan 30, 2019

mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Mar 5, 2019

martinthomson mentioned this issue Nov 5, 2019

QUIC lacks on-path exposure of packet loss #3189

Closed
