On-path calculation of RTT #631

britram · 2017-06-14T14:25:01Z

As discussed at the interim, it might be desirable to allow devices on-path observing two sides of a QUIC connection (or the subset of a multipath QUIC connection) to accurately estimate the RTT of the flow, either for in-band operational purposes (AQM) or general monitoring and measurement purposes (cf. an informal survey of IPFIX information elements suggesting multiple vendors shipping devices that perform this measurement for TCP.)

This facility should require bilateral agreement of both endpoints before an RTT signal is available on path; i.e. it should be optional.

This was a large part of the requirement behind #269, #391, and #393, which were expressed in terms of mechanism instead of requirements. #609 contains a proposal addressing this issue which is separate from packet numbering, and is therefore compatible with proposals to make packet numbering inaccessible to on-path devices (see #231).

This issue was filed in part to uplevel the discussion on #609 to requirements.

janaiyengar · 2017-06-14T19:49:11Z

Thanks for filing this issue -- I'd like to move the discussion out of the PR and to this issue.

On your issue description: I'm not sure that we need to have bilateral agreement. If the mechanism is simple, and if we think the information exposed is reasonable, then we could mandate it.

Two issues were raised on PR #609:

Maybe use a known/fixed packet type instead of spin bit. I went back and forth on this, and I'm leaning towards using a packet type. Irrespective of greasing, we have packet types, such as Client Initial, that middleboxes will want to identify; this type is no different. We need only specify a single short packet type (with a 4-byte packet number).
Specify that this signal is per-path. Endpoints will continue receiving and sending packets as usual, so the signal will continuously be rallied between the endpoints despite any path changes. A middlebox that is on one of the two paths will stop or start seeing all packets, and there's nothing about the marked packet that requires it to be the "first" packet. The only place where it might matter is with multipath, but we're not there yet.

janaiyengar · 2017-06-14T19:55:22Z

There's another use for this bit/packet type -- it gives a data receiver one RTT sample per round trip, without requiring a PING frame.

BTW, calling it Latency Spin Bit doesn't make sense -- there's positive or negative spin, no latency spin (unless you're thinking that the endpoints could fool the network about path latency with this bit :-))

mcmanus · 2017-06-15T13:49:21Z

On Wed, Jun 14, 2017 at 3:49 PM, janaiyengar wrote:

Thanks for filing this issue -- I'd like to move the discussion out of the PR and to this issue.

On your issue description: I'm not sure that we need to have bilateral agreement. If the mechanism is simple, and if we think the information exposed is reasonable, then we could mandate it.

reasonable is going to be in the eye of the beholder. In this case rather than needing negotiation it might be as simple as saying "if the endpoint does not wish to signal the path then always set the bit to 0". Otherwise you might get less deterministic results :)

…

-P

martinthomson · 2017-06-15T23:33:52Z

@janaiyengar,

Maybe use a known/fixed packet type instead of spin bit. I went back and forth on this, and I'm leaning towards using a packet type. Irrespective of greasing, we have packet types, such as Client Initial, that middleboxes will want to identify; this type is no different. We need only specify a single short packet type (with a 4-byte packet number).

This isn't correct. Well, unless you decide that sending shorter packet numbers isn't necessary every other round trip. Whatever we do here, we're burning a bit on this.

Ian made another point being that this needs to be very clear about how to flip the bit locally. Packet reordering will cause extra edges if we aren't careful. Edges need to be driven based on receiving a different bit, but only if the packet number is larger than the last observed bit flip.

britram · 2017-06-16T06:23:52Z

In this case rather than needing negotiation it might be as simple as saying "if the endpoint does not wish to signal the path then always set the bit to 0".

This works nicely, and doesn't require any coordination to get the property of requiring bilateral cooperation for it to work.

If we decided that we wanted unilateral exposure to work (i.e. that one endpoint could allow an on-path device to calculate its observed RTT and flight size without the cooperation of the other endpoint), then the spin bit could be driven off the transport protocol's control loop, simply flipping the bit once per RTT. This loses the (IMO very nice) property that this proposal has, though, that the latency exposure is completely separate from the transport mechanics.

Edges need to be driven based on receiving a different bit, but only if the packet number is larger than the last observed bit flip.

Yes, although no amount of care will keep signaling from trailing or being otherwise inaccurate on pathologically lossy and reordering prone paths. I need to think about this a bit more, but I think these cases are easy enough to recognize heuristically that we don't need loss and reordering exposure to allow such samples to be detected and used as a "bad path" indication.

ianswett · 2017-06-16T14:24:31Z

As I mentioned in the PR, I think we can deal with reordering by specifying that the spin to be sent is based on the spin of the largest received packet (for a given path) instead of the last received packet.

britram · 2017-06-16T14:36:27Z

largest received packet number?

ianswett · 2017-06-16T14:43:38Z

Yes, the "spin of the packet with the largest received packet number for a given path."

janaiyengar · 2017-06-17T18:39:03Z

@martinthomson,

Maybe use a known/fixed packet type instead of spin bit. I went back and forth on this, and I'm leaning towards using a packet type. Irrespective of greasing, we have packet types, such as Client Initial, that middleboxes will want to identify; this type is no different. We need only specify a single short packet type (with a 4-byte packet number).

This isn't correct. Well, unless you decide that sending shorter packet numbers isn't necessary every other round trip. Whatever we do here, we're burning a bit on this.

Yes, that's what I said above -- burn bits on having a 4-byte packet number. This is a 2-byte overhead per RTT, which is fine. We don't have to burn an entire bit.

Ian made another point being that this needs to be very clear about how to flip the bit locally. Packet reordering will cause extra edges if we aren't careful. Edges need to be driven based on receiving a different bit, but only if the packet number is larger than the last observed bit flip.

I don't think this matters (or maybe I'm missing something). I don't think we need to make this more complicated than it is. A middlebox is basically recording the time of this signal in one direction, and measuring RTT when it sees the signal in the opposite direction. I don't see why reordering matters here... the measured time is clearly a network RTT measurement. Perhaps you're assuming that the middlebox needs to see the "largest" RTT, but I think that assumes too much. RTT measurements at endpoints have to be more careful, since RTT is basically tied entirely to retransmissions, but I wouldn't presume that a middlebox wants something so specific.

…


janaiyengar · 2017-06-17T18:42:49Z

As I mentioned in the PR, I think we can deal with reordering by specifying that the spin to be sent is based on the spin of the largest received packet (for a given path) instead of the last received packet.

Sorry, I missed this. As I said in my earlier response, you're assuming that the middlebox needs to see the "largest" RTT, but I think that assumes too much. RTT measurements at endpoints have to use the largest RTT, since RTT is basically tied entirely to retransmissions. I don't think that's necessarily what a middlebox would want. I would go with the simplest mechanism. If there's substantial reordering, the middlebox should see two distinct RTTs over several rounds. The middlebox can deal with anomalies (by using appropriate filters, min/max/avg/ewma) based on what exactly it wants to do with the RTT measurement.
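As a rough illustration of the "simplest mechanism plus filters" idea, here is a minimal sketch of an on-path observer. It assumes the observer watches one direction of a flow and treats the time between two successive spin edges as one RTT sample; the class name `SpinObserver`, the millisecond timestamps, and the EWMA gain of 1/8 are all assumptions made for the example, not anything specified in the PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SpinObserver:
    """Hypothetical middlebox state for one direction of one flow."""
    alpha: float = 0.125                  # assumed EWMA gain
    last_spin: Optional[int] = None       # spin value of the previous packet
    last_edge_ms: Optional[int] = None    # arrival time of the previous edge
    samples: List[int] = field(default_factory=list)
    ewma: Optional[float] = None

    def on_packet(self, time_ms: int, spin: int) -> None:
        """Record one observed packet; emit an RTT sample on each spin edge."""
        if self.last_spin is None:
            self.last_spin = spin
            return
        if spin == self.last_spin:
            return
        # The spin value changed, so this packet is an edge.
        if self.last_edge_ms is not None:
            sample = time_ms - self.last_edge_ms
            self.samples.append(sample)
            if self.ewma is None:
                self.ewma = float(sample)
            else:
                self.ewma = (1 - self.alpha) * self.ewma + self.alpha * sample
        self.last_edge_ms = time_ms
        self.last_spin = spin

    def summary(self) -> Tuple[int, int, float]:
        """Return (min, max, mean) of the samples collected so far."""
        return min(self.samples), max(self.samples), sum(self.samples) / len(self.samples)


obs = SpinObserver()
for t, s in [(0, 0), (10, 0), (100, 1), (110, 1), (200, 0), (320, 1)]:
    obs.on_packet(t, s)
```

Which of the filtered values the observer actually uses (min, max, mean, or the EWMA) is left to the observer, as suggested above.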

ianswett · 2017-06-17T20:00:19Z

Since the algorithm from Christian's PR is not described in this issue, I'll describe it as best I can, because I think there are two different algorithms that people may have in their minds, and I'm worried it's causing confusion.

There are two roles, the initiator, who flips the spin value, and the responder, who reflects the spin value.

1. The initiator sends a value on a flight of packets (e.g., 0).
2. The responder reflects back the 0.
3. When the initiator receives the first 0, it starts sending out 1's.
4. The responder reflects back the 1's.
5. The initiator starts sending 0's once it receives a 1, and we go back to step 2.

This means the spin signal may be present on any packet type, though we could decide to make it present only in 1RTT packets if we wanted to conserve packet types.
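The five steps above can be sketched as a small lockstep simulation. This is an idealized model with no loss, no reordering, and fixed-size flights; the class and function names are made up for the illustration.

```python
from typing import List


class Initiator:
    """Flips its spin value when it sees its current value reflected back."""

    def __init__(self) -> None:
        self.spin = 0

    def send(self) -> int:
        return self.spin

    def receive(self, spin: int) -> None:
        # The first reflected copy of our current value triggers the flip;
        # later copies in the same flight no longer match and are ignored.
        if spin == self.spin:
            self.spin ^= 1


class Responder:
    """Reflects whatever spin value it received most recently."""

    def __init__(self) -> None:
        self.spin = 0

    def send(self) -> int:
        return self.spin

    def receive(self, spin: int) -> None:
        self.spin = spin


def simulate(round_trips: int, flight: int = 3) -> List[List[int]]:
    """Return the spin values the initiator sends in each round trip."""
    initiator, responder = Initiator(), Responder()
    sent = []
    for _ in range(round_trips):
        outbound = [initiator.send() for _ in range(flight)]
        sent.append(outbound)
        for spin in outbound:
            responder.receive(spin)
        inbound = [responder.send() for _ in range(flight)]
        for spin in inbound:
            initiator.receive(spin)
    return sent


flights = simulate(4)
```

In this model the value on the wire alternates once per round trip, which is the square wave an observer would look for.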

The reordering issue I was concerned about is if either side sees 1111010000 instead of 1111100000 as it was sent. In that case I was proposing that it should treat the packets as though they'd arrived in order, which makes the bits sequential again, and the responder would reflect 1111000000. Otherwise any reordering permanently sticks around and at some point, it may look like noise instead of an RTT signal.
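To make the difference concrete, here is a small sketch comparing the two reflection rules on that example. It assumes ten packets numbered 1 to 10 were sent with spins 1111100000 and that packets 5 and 6 swapped places in transit; the function names are hypothetical.

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (packet number, spin value)


def reflect_last_received(arrivals: List[Packet]) -> List[int]:
    """Reflect the spin of whichever packet arrived most recently."""
    return [spin for _, spin in arrivals]


def reflect_largest_pn(arrivals: List[Packet]) -> List[int]:
    """Reflect the spin of the packet with the largest packet number seen so far."""
    largest_pn = -1
    current = 0
    reflected = []
    for pn, spin in arrivals:
        if pn > largest_pn:
            largest_pn, current = pn, spin
        reflected.append(current)
    return reflected


# Packets 1..10 sent as 1111100000; packet 6 overtakes packet 5.
arrivals = [(1, 1), (2, 1), (3, 1), (4, 1), (6, 0),
            (5, 1), (7, 0), (8, 0), (9, 0), (10, 0)]

naive = "".join(map(str, reflect_last_received(arrivals)))
fixed = "".join(map(str, reflect_largest_pn(arrivals)))
```

With the largest-packet-number rule the late packet 5 is ignored, so the reflected signal has a single edge and the glitch is not carried into later rounds.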

The original design I had in mind was that one packet would be marked with the spin bit, initially sent by the connection initiator, and it would ping-pong back and forth. This design is more friendly to designating a single packet type instead of using a dedicated bit, but it takes more work to deal with packet loss, since one side needs to re-start the signal if the packet with the spin bit set is lost. On the other hand, reordering doesn't need anything special.

martinthomson · 2017-06-18T22:54:48Z

@ianswett, it seems like watching for edges is more robust than having a single-bit signal.

@janaiyengar, if there is any value in having a shorter packet number, then you just proposed halving that value. Either that or you created incentive not to implement this scheme. Just spend the bit. The single packet type only works in Ian's proposed alternative design, but that design isn't anywhere near as robust.

huitema · 2017-06-19T03:48:59Z

We could certainly work with packet types instead of dedicated bits. The intermediate nodes would need to observe transitions of the first octet instead of looking at just one bit. But then, we also have a proposal to grease the first octet, for example by doing an XOR with a function of the last byte. That type of greasing would randomize the first octet. So we have a choice: if we want to grease the first octet, we need to dedicate a bit that is exempted from the greasing. If we don't then we can use a packet type.
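As a sketch of what "a bit exempted from the greasing" could look like: the function below XORs the first octet with a pad derived from the last byte of the packet, but masks one bit out of the pad so that it passes through unchanged. The bit position `SPIN_MASK = 0x20` and the choice of using the last byte directly as the pad are assumptions for illustration only; the greasing proposal itself is not specified in this thread.

```python
SPIN_MASK = 0x20  # hypothetical position of the exempted bit in the first octet


def grease_first_octet(packet: bytes, exempt_mask: int = SPIN_MASK) -> bytes:
    """XOR the first octet with the last byte, leaving the exempted bit alone."""
    if len(packet) < 2:
        raise ValueError("need at least a first octet and a last byte")
    # Clear the exempted bit in the pad so XOR cannot change it.
    pad = packet[-1] & ~exempt_mask & 0xFF
    return bytes([packet[0] ^ pad]) + packet[1:]


original = bytes([0x25, 0xAA, 0xFF])   # first octet has the exempted bit set
greased = grease_first_octet(original)
restored = grease_first_octet(greased)  # XOR is its own inverse
```

Here the remaining seven bits of the first octet vary with the last byte, while the exempted bit keeps its value on the wire.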

martinthomson · 2017-06-19T04:06:36Z

Why do you say that the flip bit needs to be exempt from greasing? It's not like the intermediary is unable to reverse the greased transform as proposed. If greasing == encryption then that would be a whole different story of course.

huitema · 2017-06-19T04:21:04Z

OK, yes we can say that the latency bit should be exempt from "encrypted greasing". If greasing is easily decrypted by middle-boxes, then of course there is no issue. But then, I don't believe that a form of greasing that is easily decrypted is particularly useful.

huitema · 2017-06-23T00:50:45Z

By the way, I think I need to change the phrasing in the PR from "latest packet" to "packet with the highest sequence number". Otherwise, as Ian pointed out, the mechanism can degenerate. For example, with a window of 5, the client will send "11111" and the server will echo "11111", after which the client will send "00000", the server "00000", etc. But suppose that the client sends "11111" and the server receives "111-gap-1". The client sends "00000", but due to the reordering the server receives "101111". Now it is echoing "101111", and the client is sending "010000". And so on. Reordering and gaps create flips in the sequence.

There are natural healing processes such as gaps and ack coalescing, but it is better to avoid the whole problem. If the server only echoes the bit from the highest sequence number received, then the process is much less sensitive to reordering.

huitema · 2017-07-05T00:24:45Z

Any idea how to resolve this issue? The thread is silent now. We have three plausible options:

1. Do nothing. This means that monitoring RTT will be hard, and will have to rely on heuristics. What we described in the discussion as "using machine learning to derive information from traffic patterns." It is probably doable, albeit error prone and somewhat expensive.
2. Adopt the dedicated bit design.
3. Change the design to have dedicated packet types instead. Currently we have "1-RTT Protected (key phase 0)" and "1-RTT Protected (key phase 1)". We could have "1-RTT Protected (key phase 0) with spin 0" and "1-RTT Protected (key phase 0) with spin 1". For the short packet form, we could either keep the bit, much like we have the epoch bit, or create two packet types for the spin values, so we don't burn the bit.

The behavior of option 3 would be about the same as the current PR, but would only enable measurement on 1RTT packets.

Personally, I don't care much. But if we are going to mess with the packet format, it is probably better to decide sooner rather than later.

britram · 2017-07-05T08:21:02Z

Since (1) would be a significant step back from the heuristics currently used for TCP latency, and we have a proof of concept that shows it is avoidable while honoring other goals we have for linkability resistance across five-tuple migrations (see #231, #598), I'm not very enthusiastic about doing nothing.

I think there are a couple of additional points in the design space beyond (2) and (3):

(4) Simple packet number echo as in (#269) would also support RTT measurement via demonstration of receipt, regardless of whether the packet number is encrypted or not.

(5) Simple echo of N (protected) bytes at offset M would also support RTT measurement via demonstration of receipt, without relying on the packet number being decodable in the header.

I'm pretty sure it's okay to restrict measurement to 1RTT packets. Any of these seem reasonable to me. (4) may have more or less utility than (2) and (3) depending on what we decide with respect to the semantics attached to path-visible packet numbers.

ianswett · 2017-07-05T16:19:36Z

I'm happy with either 2 or 3 and prefer them over 1, 4 or 5.

2 or 3 provides middleboxes both end to end RTT and downstream RTT, whereas 4 and 5 only allow measurement of downstream RTT.

ianswett · 2017-07-05T23:27:09Z

I did think of one quirk with 2 or 3 that we likely need to add some text for, which is what to do about upstream reordering. Even if the hosts are fixing reordering, if reordering is observed upstream, we either need to:

1. Supply a heuristic on how to filter it out.
2. Expose packet number so the middlebox can fix reordering.
3. Assume it doesn't matter in aggregate.

For #1, we could do a two packet edge trigger instead of 1 if packet number doesn't end up being exposed. Based on looking at traces and the metrics I've seen, I think that would fix the vast majority of reordering.
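A minimal sketch of such a two-packet edge trigger, as an observer without packet numbers might implement it. The function name and the `confirm` parameter are made up for the example; the input is just the sequence of spin values in arrival order.

```python
from typing import List


def accepted_edges(spins: List[int], confirm: int = 2) -> List[int]:
    """Return the indices where an edge is accepted.

    An edge is accepted only after `confirm` consecutive packets carry the
    new spin value; the index reported is the first packet of that run.
    """
    current = None   # spin value the observer currently believes in
    run_len = 0      # consecutive packets seen with the opposite value
    edges = []
    for i, spin in enumerate(spins):
        if current is None:
            current = spin
            continue
        if spin == current:
            run_len = 0          # a stray packet breaks the run
            continue
        run_len += 1
        if run_len >= confirm:
            edges.append(i - confirm + 1)
            current = spin
            run_len = 0
    return edges


observed = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]      # one packet arrived early
one_packet = accepted_edges(observed, confirm=1)
two_packet = accepted_edges(observed, confirm=2)
```

On this input a single-packet trigger reports three edges while the two-packet trigger reports one, at the cost of attributing the edge slightly late when the early packet was in fact the true edge.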

As long as packet number is exposed, I'd suggest we use that to fix the problem.

martinthomson · 2017-07-06T00:18:37Z

Right now, without packet number encryption, those are available. If they go away, then the heuristic approach is fine, and there is no harm in also saying that significant reordering might be impossible to detect, so anyone doing measurements has to accept the possibility that their measurements will have some noise.

@ianswett's suggestion to use multiple packets as a heuristic is fine, but you might only do that if you notice a large change in RTT (a severe drop is likely to appear in the case that you get reordering, for instance).

mirjak · 2017-07-10T12:26:28Z

If we keep exposing the packet number, I'm actually still in favor of having the packet number echo. I don't think the overhead is an issue because you don't have to send it on every packet; however, if you send it you can also use it as a confirmation signal where needed without adding extra bits. Also I'm in favor of exposing the packet number because this gives the network a simple estimate of reordering and so-far-on-the-path packet loss. While it is good to discuss these issues/properties separately, I don't think these things are completely independent for the resulting wire format design (at least as long as overhead is a concern).

ihlar · 2017-07-20T12:08:00Z

Magnus Westerlund sent a proposal to the mailing list where the n least significant bits of the packet number are strictly increased by 1. We could let this portion of the packet number remain visible to the path, it would wrap quite frequently (based on the value of n) but the information would be good enough for most cases of reordering that otherwise might mess up the spin bit signal.
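As a rough sketch of how an observer might use such a truncated counter: compare the visible n bits with wrap-around arithmetic and only update the observed spin when the packet looks newer than the last one accepted. The function names and the "less than half the number space ahead" rule are assumptions for the example, not part of the proposal.

```python
from typing import List, Optional, Tuple


def is_newer(pn_lsb: int, last_lsb: int, n: int) -> bool:
    """True if pn_lsb is ahead of last_lsb by less than half the n-bit space."""
    modulus = 1 << n
    diff = (pn_lsb - last_lsb) % modulus
    return 0 < diff < modulus // 2


def observed_spin(packets: List[Tuple[int, int]], n: int) -> List[Optional[int]]:
    """Track the spin value, ignoring packets that appear to be reordered.

    Each packet is (visible n low bits of the packet number, spin value).
    """
    last_lsb = None
    spin = None
    out = []
    for lsb, value in packets:
        if last_lsb is None or is_newer(lsb, last_lsb, n):
            last_lsb, spin = lsb, value
        out.append(spin)
    return out


# Packet numbers 5, 6, 7, 9, 8, 10 with only the low 3 bits visible.
packets = [(5, 1), (6, 1), (7, 1), (1, 0), (0, 1), (2, 0)]
tracked = observed_spin(packets, n=3)
```

Reordering deeper than half the n-bit space would be misclassified, which matches the caveat that this is only good enough for most cases.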

Magnus's original proposal:
https://mailarchive.ietf.org/arch/msg/quic/tCcEECogBErU1_SWGZ9odtYAGEc

martinthomson · 2018-05-10T05:04:39Z

We've discussed this. We're undertaking an experiment. We'll need to leave this issue open until that experiment resolves one way or the other.

britram · 2018-11-07T14:47:52Z

Unparking, per rough consensus to go forward with the spin bit in BKK.

I think the right way forward here is to:

refer to the spin-exp document from the new short header definition, once the first-octet discussion is complete (I checked the notes, but it's not clear that the discussion converged)
work out the details of a spin-bit based approach to signaling RTT in -spin-exp
work out the details of measuring the spin bit in -manageability (note: much of this is already in section 3.6, and needs only minor tweaks)
merge -spin-exp into -transport once we settle on the details.

janaiyengar · 2018-11-19T20:52:32Z

I'm closing this, since the spin bit is now in the draft. If we find that this doesn't yet have consensus, happy to reopen it. We can figure out editorial considerations amongst the editors.

This was referenced Jun 14, 2017

Latency Spin Bit #609

Closed

Packet number echo with fixed-length, 32-bit packet and echo numbers #393

Closed

Packet number echo with variable-length numbering #391

Closed

janaiyengar added -transport arch design An issue that affects the design of the protocol; resolution requires consensus. labels Jun 14, 2017

mnot changed the title ~~Explicit support for on-path calculation of RTT of QUIC flows~~ On-path calculation of RTT Jun 20, 2017

mnot mentioned this issue Jun 21, 2017

Public Packet Number Echo #269

Closed

britram mentioned this issue Jan 9, 2018

Latency Spin Bit, 2018 edition #1046

Closed

MikeBishop mentioned this issue Jan 21, 2018

Public routing info #205

Closed

MikeBishop added the needs-discussion An issue that needs more discussion before we can resolve it. label Mar 14, 2018

martinthomson removed the needs-discussion An issue that needs more discussion before we can resolve it. label May 10, 2018

martinthomson added the parked An issue that we can't immediately address; for future discussion. label Jun 4, 2018

martinthomson mentioned this issue Oct 16, 2018

Spin bit should be applied per each 5-tuple rather than per connection #1828

Closed

britram removed the parked An issue that we can't immediately address; for future discussion. label Nov 7, 2018

janaiyengar added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. labels Nov 19, 2018

janaiyengar closed this as completed Nov 19, 2018

mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Nov 19, 2018
