On-path calculation of RTT #631

Closed
britram opened this issue Jun 14, 2017 · 26 comments
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@britram
Contributor

britram commented Jun 14, 2017

As discussed at the interim, it might be desirable to allow devices on-path observing two sides of a QUIC connection (or the subset of a multipath QUIC connection) to accurately estimate the RTT of the flow, either for in-band operational purposes (AQM) or general monitoring and measurement purposes (cf. an informal survey of IPFIX information elements suggesting multiple vendors shipping devices that perform this measurement for TCP.)

This facility should require bilateral agreement of both endpoints before an RTT signal is available on path; i.e. it should be optional.

This was a large part of the requirement behind #269, #391, and #393, which were expressed in terms of mechanism instead of requirements. #609 contains a proposal addressing this issue which is separate from packet numbering, and is therefore compatible with proposals to make packet numbering inaccessible to on-path devices (see #231).

This issue was filed in part to uplevel the discussion on #609 to requirements.

@janaiyengar janaiyengar added -transport arch design An issue that affects the design of the protocol; resolution requires consensus. labels Jun 14, 2017
@janaiyengar
Contributor

Thanks for filing this issue -- I'd like to move the discussion out of the PR and to this issue.

On your issue description: I'm not sure that we need to have bilateral agreement. If the mechanism is simple, and if we think the information exposed is reasonable, then we could mandate it.

Two issues were raised on PR #609:

  • Maybe use a known/fixed packet type instead of spin bit. I went back and forth on this, and I'm leaning towards using a packet type. Irrespective of greasing, we have packet types, such as Client Initial, that middleboxes will want to identify; this type is no different. We need only specify a single short packet type (with a 4-byte packet number).

  • Specify that this signal is per-path. Endpoints will continue receiving and sending packets as usual, so the signal will continuously be rallied between the endpoints despite any path changes. A middlebox that is on one of the two paths will stop or start seeing all packets, and there's nothing about the marked packet that requires it to be the "first" packet. The only place where it might matter is with multipath, but we're not there yet.

@janaiyengar
Contributor

There's another use for this bit/packet type -- it gives a data receiver one RTT sample per round trip, without requiring a PING frame.

BTW, calling it Latency Spin Bit doesn't make sense -- there's positive or negative spin, no latency spin (unless you're thinking that the endpoints could fool the network about path latency with this bit :-))

@mcmanus
Contributor

mcmanus commented Jun 15, 2017 via email

@martinthomson
Member

@janaiyengar,

> Maybe use a known/fixed packet type instead of spin bit. I went back and forth on this, and I'm leaning towards using a packet type. Irrespective of greasing, we have packet types, such as Client Initial, that middleboxes will want to identify; this type is no different. We need only specify a single short packet type (with a 4-byte packet number).

This isn't correct. Well, unless you decide that sending shorter packet numbers isn't necessary every other round trip. Whatever we do here, we're burning a bit on this.

Ian made another point: this needs to be very clear about how to flip the bit locally. Packet reordering will cause extra edges if we aren't careful. Edges need to be driven based on receiving a different bit, but only if the packet number is larger than that of the last observed bit flip.

@britram
Contributor Author

britram commented Jun 16, 2017

> In this case, rather than needing negotiation, it might be as simple as saying "if the endpoint does not wish to signal the path then always set the bit to 0".

This works nicely, and doesn't require any coordination to get the property of requiring bilateral cooperation for it to work.

If we decided that we wanted unilateral exposure to work (i.e. that one endpoint could allow an on-path device to calculate its observed RTT and flight size without the cooperation of the other endpoint), then the spin bit could be driven off the transport protocol's control loop, simply flipping the bit once per RTT. This loses the (IMO very nice) property that this proposal has, though, that the latency exposure is completely separate from the transport mechanics.

> Edges need to be driven based on receiving a different bit, but only if the packet number is larger than the last observed bit flip.

Yes, although no amount of care will keep signaling from trailing or being otherwise inaccurate on pathologically lossy and reordering-prone paths. I need to think about this a bit more, but I think these cases are easy enough to recognize heuristically that we don't need loss and reordering exposure to allow such samples to be detected and used as a "bad path" indication.

@ianswett
Contributor

As I mentioned in the PR, I think we can deal with reordering by specifying that the spin to be sent is based on the spin of the largest received packet (for a given path) instead of the last received packet.

@britram
Contributor Author

britram commented Jun 16, 2017

largest received packet number?

@ianswett
Contributor

Yes, the "spin of the packet with the largest received packet number for a given path."

@janaiyengar
Contributor

janaiyengar commented Jun 17, 2017 via email

@janaiyengar
Contributor

janaiyengar commented Jun 17, 2017 via email

@ianswett
Contributor

Since the algorithm from Christian's PR is not described in this issue, I'll describe it as best I can, because I think there are two different algorithms that people may have in their minds, and I'm worried it's causing confusion.

There are two roles, the initiator, who flips the spin value, and the responder, who reflects the spin value.

  1. The initiator sends a value on a flight of packets (e.g., 0).
  2. The responder reflects back the 0.
  3. When the initiator receives the first 0, it starts sending out 1's.
  4. The responder reflects back the 1's.
  5. The initiator starts sending 0's once it receives a 1, then goes back to step 2.

This means the spin signal may be present on any packet type, though we could decide to make it present only in 1RTT packets if we wanted to conserve packet types.
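
The five steps above can be sketched as a tiny simulation. This is only an illustration under simplifying assumptions: the path is lossless and in-order, the endpoints exchange flights in lock-step, and the names `Initiator`, `Responder`, and `simulate` are hypothetical, not taken from the PR.

```python
class Initiator:
    """Flips its spin value once it sees the current value reflected back."""

    def __init__(self):
        self.spin = 0

    def on_receive(self, spin):
        # Seeing our own current value come back means one round trip completed.
        if spin == self.spin:
            self.spin ^= 1

    def next_spin(self):
        return self.spin


class Responder:
    """Reflects the spin value it most recently received."""

    def __init__(self):
        self.spin = 0

    def on_receive(self, spin):
        self.spin = spin

    def next_spin(self):
        return self.spin


def simulate(round_trips, flight=3):
    """Return the spin values the initiator sends in each flight.

    Assumes an idealized path: no loss, no reordering, lock-step flights.
    """
    initiator, responder = Initiator(), Responder()
    sent = []
    for _ in range(round_trips):
        outgoing = [initiator.next_spin() for _ in range(flight)]
        sent.append(outgoing)
        for bit in outgoing:
            responder.on_receive(bit)
        reflected = [responder.next_spin() for _ in range(flight)]
        for bit in reflected:
            initiator.on_receive(bit)
    return sent
```

In this sketch the initiator's flights alternate between all 0's and all 1's, one flip per round trip, which is the square wave an on-path observer would time.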

The reordering issue I was concerned about is this: if either side sees 1111010000 instead of 1111100000 as it was sent, then I was proposing it should treat the packets as though they'd arrived in order, which makes the bits sequential again, and the responder would reflect 1111000000. Otherwise any reordering permanently sticks around and at some point it may look like noise instead of an RTT signal.
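
A rough sketch of that rule, reflecting the spin of the packet with the largest packet number received so far rather than the last one to arrive. The function name `reflect` and the packet numbering (ten packets numbered 0 to 9, with packets 4 and 5 swapped in transit) are assumptions made for the example.

```python
def reflect(received):
    """Return the spin the responder would send after each arrival.

    `received` is a list of (packet_number, spin) pairs in arrival order.
    Only a packet with a new largest packet number may change the spin.
    """
    largest_pn = -1
    spin = 0
    reflected = []
    for pn, bit in received:
        if pn > largest_pn:
            largest_pn = pn
            spin = bit
        # A late (reordered) packet leaves the current spin untouched.
        reflected.append(spin)
    return reflected


# Packets 0-4 carry spin 1, packets 5-9 carry spin 0; 4 and 5 arrive swapped.
arrival_order = [0, 1, 2, 3, 5, 4, 6, 7, 8, 9]
received = [(pn, 1 if pn < 5 else 0) for pn in arrival_order]
seen = "".join(str(bit) for _, bit in received)
echoed = "".join(str(bit) for bit in reflect(received))
```

Here `seen` is the reordered pattern from the example above and `echoed` has a single clean edge, so the stray 1 is not propagated back to the initiator.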

The original design I had in mind was that one packet would be marked with the spin bit, initially sent by the connection initiator, and it would ping-pong back and forth. This design is more friendly to designating a single packet type instead of using a dedicated bit, but it takes more work to deal with packet loss, since one side needs to re-start the signal if the packet with the spin bit set is lost. On the other hand, reordering doesn't need anything special.

@martinthomson
Member

@ianswett, it seems like watching for edges is more robust than having a single-bit signal.

@janaiyengar, if there is any value in having a shorter packet number, then you just proposed halving that value. Either that or you created incentive not to implement this scheme. Just spend the bit. The single packet type only works in Ian's proposed alternative design, but that design isn't anywhere near as robust.

@huitema
Contributor

huitema commented Jun 19, 2017

We could certainly work with packet types instead of dedicated bits. The intermediate nodes would need to observe transitions of the first octet instead of looking at just one bit. But then, we also have a proposal to grease the first octet, for example by doing an XOR with a function of the last byte. That type of greasing would randomize the first octet. So we have a choice: if we want to grease the first octet, we need to dedicate a bit that is exempted from the greasing. If we don't then we can use a packet type.

@martinthomson
Member

Why do you say that the flip bit needs to be exempt from greasing? It's not like the intermediary is unable to reverse the greased transform as proposed. If greasing == encryption then that would be a whole different story of course.

@huitema
Contributor

huitema commented Jun 19, 2017

OK, yes we can say that the latency bit should be exempt from "encrypted greasing". If greasing is easily decrypted by middle-boxes, then of course there is no issue. But then, I don't believe that a form of greasing that is easily decrypted is particularly useful.

@mnot mnot changed the title Explicit support for on-path calculation of RTT of QUIC flows On-path calculation of RTT Jun 20, 2017
@huitema
Contributor

huitema commented Jun 23, 2017

By the way, I think I need to change the phrasing in the PR from "latest packet" to "packet with the highest sequence number". Otherwise, as Ian pointed out, the mechanism can degenerate. For example, with a window of 5, the client will send "11111" and the server will echo "11111", after which the client will send "00000", the server will echo "00000", etc. But suppose that the client sends "11111" and the server receives "111-gap-1". The client sends "00000", but due to the reordering the server receives "101111". Now it is echoing "101111", and the client is sending "010000". And so on. Reordering and gaps create flips in the sequence.

There are natural healing processes such as gaps and ack coalescing, but it is better to avoid the whole problem. If the server only echoes the bit from the highest sequence number received, then the process is much less sensitive to reordering.

@huitema
Contributor

huitema commented Jul 5, 2017

Any idea how to resolve this issue? The thread is silent now. We have three plausible options:

  1. Do nothing. This means that monitoring RTT will be hard and will have to rely on heuristics,
    what we described in the discussion as "using machine learning to derive information from
    traffic patterns." It is probably doable, albeit error-prone and somewhat expensive.

  2. Adopt the dedicated bit design.

  3. Change the design to have dedicated packet types instead. Currently we have "1-RTT Protected
    (key phase 0)" and "1-RTT Protected (key phase 1)". We could have "1-RTT Protected (key phase 0)
    with spin 0" and "1-RTT Protected (key phase 0) with spin 1". For the short packet form, we could
    either keep the bit, much like we have the epoch bit, or create two packet types for the spin values,
    so we don't burn the bit.

The behavior of option 3 would be about the same as the current PR, but would only enable measurement on 1RTT packets.

Personally, I don't care much. But if we are going to mess with the packet format, it is probably better to decide sooner rather than later.

@britram
Contributor Author

britram commented Jul 5, 2017

Since (1) would be a significant step back from the heuristics currently used for TCP latency, and we have a proof of concept that shows it is avoidable while honoring other goals we have for linkability resistance across five-tuple migrations (see #231, #598), I'm not very enthusiastic about doing nothing.

I think there are a couple of additional points in the design space beyond (2) and (3):

(4) Simple packet number echo as in (#269) would also support RTT measurement via demonstration of receipt, regardless of whether the packet number is encrypted or not.

(5) Simple echo of N (protected) bytes at offset M would also support RTT measurement via demonstration of receipt, without relying on the packet number being decodable in the header.

I'm pretty sure it's okay to restrict measurement to 1RTT packets. Any of these seem reasonable to me. (4) may have more or less utility than (2) and (3) depending on what we decide with respect to the semantics attached to path-visible packet numbers.

@ianswett
Contributor

ianswett commented Jul 5, 2017

I'm happy with either 2 or 3 and prefer them over 1, 4 or 5.

2 or 3 provide middleboxes with both end-to-end RTT and downstream RTT, whereas 4 and 5 only allow measurement of downstream RTT.

@ianswett
Contributor

ianswett commented Jul 5, 2017

I did think of one quirk with 2 or 3 that we likely need to add some text for, which is what to do about upstream reordering. Even if the hosts are fixing reordering, if reordering is observed upstream, we either need to:

  1. Supply a heuristic on how to filter it out.
  2. Expose packet number so the middlebox can fix reordering.
  3. Assume it doesn't matter in aggregate.

For option 1, we could do a two-packet edge trigger instead of a one-packet trigger if the packet number doesn't end up being exposed. Based on looking at traces and the metrics I've seen, I think that would fix the vast majority of reordering.

As long as packet number is exposed, I'd suggest we use that to fix the problem.
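
One possible reading of the two-packet edge trigger, sketched for an on-path observer watching a single direction. The function names, the `confirm` parameter, and the choice to timestamp an edge at the first packet of the confirming run are all assumptions for illustration, not something specified in this thread.

```python
def edge_times(samples, confirm=2):
    """Return timestamps of accepted spin edges.

    `samples` is a list of (timestamp, spin) pairs in the order observed.
    An edge is accepted only after `confirm` consecutive packets carry the
    flipped value; it is timestamped at the first packet of that run.
    """
    edges = []
    current = None
    run_start, run_len = None, 0
    for ts, bit in samples:
        if current is None:
            current = bit  # first packet just establishes the baseline
            continue
        if bit == current:
            run_len = 0  # an isolated flipped packet was likely reordered
            continue
        if run_len == 0:
            run_start = ts
        run_len += 1
        if run_len >= confirm:
            edges.append(run_start)
            current = bit
            run_len = 0
    return edges


def rtt_samples(edges):
    """Time between consecutive edges in one direction approximates one RTT."""
    return [later - earlier for earlier, later in zip(edges, edges[1:])]


# Timestamps in milliseconds; the lone 1 at t=20 is a reordered packet.
samples = [(0, 0), (10, 0), (20, 1), (30, 0), (100, 1), (110, 1),
           (120, 1), (200, 0), (210, 0)]
filtered = rtt_samples(edge_times(samples, confirm=2))
naive = rtt_samples(edge_times(samples, confirm=1))
```

With `confirm=1` the stray packet produces two spurious edges and two far-too-short RTT samples, while `confirm=2` ignores it at the cost of needing one extra packet to confirm each edge.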

@martinthomson
Member

Right now, without packet number encryption, packet numbers are available. If they go away, then the heuristic approach is fine, and there is no harm in also saying that significant reordering might be impossible to detect, so anyone doing measurements has to accept the possibility that their measurements will have some noise.

@ianswett's suggestion to use multiple packets as a heuristic is fine, but you might only do that if you notice a large change in RTT (a severe drop is likely to appear in the case that you get reordering, for instance).

@mirjak
Contributor

mirjak commented Jul 10, 2017

If we keep exposing the packet number, I’m actually still in favor of having the packet number echo. I don’t think the overhead is an issue because you don’t have to send it on every packet; however, if you do send it, you can also use it as a confirmation signal where needed without adding extra bits. Also, I’m in favor of exposing the packet number because this gives the network a simple estimate of reordering and so-far-on-the-path packet loss. While it is good to discuss these issues/properties separately, I don’t think these things are completely independent for the resulting wire format design (at least as long as overhead is a concern).

@ihlar
Contributor

ihlar commented Jul 20, 2017

Magnus Westerlund sent a proposal to the mailing list where the n least significant bits of the packet number are strictly increased by 1. We could let this portion of the packet number remain visible to the path, it would wrap quite frequently (based on the value of n) but the information would be good enough for most cases of reordering that otherwise might mess up the spin bit signal.

Magnus's original proposal:
https://mailarchive.ietf.org/arch/msg/quic/tCcEECogBErU1_SWGZ9odtYAGEc
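
As a sketch of how an observer might use such a wrapping n-bit counter, the following drops packets that arrive behind the newest one seen, using a modular "ahead by less than half the counter space" comparison. The comparison rule and the names `is_newer` and `filter_reordered` are assumptions for this example and are not taken from the linked proposal.

```python
def is_newer(seq, ref, n):
    """True if n-bit counter `seq` is ahead of `ref` by less than half the space."""
    mod = 1 << n
    diff = (seq - ref) % mod
    return 0 < diff < mod // 2


def filter_reordered(samples, n):
    """Keep only samples that advance the visible n-bit counter.

    `samples` is a list of (low_bits, spin) pairs in arrival order.
    """
    kept = []
    newest = None
    for low, spin in samples:
        if newest is None or is_newer(low, newest, n):
            newest = low
            kept.append((low, spin))
    return kept


# n = 3: packet numbers 5, 6, 7, 9, 8, 10, 11 show low bits 5, 6, 7, 1, 0, 2, 3.
# Packet 8 (spin 1) arrives after packet 9 (spin 0) and would add a false edge.
samples = [(5, 1), (6, 1), (7, 1), (1, 0), (0, 1), (2, 0), (3, 0)]
kept_spins = [spin for _, spin in filter_reordered(samples, n=3)]
```

The comparison is only reliable while reordering stays within half the counter space, which is why the useful range depends on the value of n.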

@MikeBishop MikeBishop added the needs-discussion An issue that needs more discussion before we can resolve it. label Mar 14, 2018
@martinthomson martinthomson removed the needs-discussion An issue that needs more discussion before we can resolve it. label May 10, 2018
@martinthomson
Member

We've discussed this. We're undertaking an experiment. We'll need to leave this issue open until that experiment resolves one way or the other.

@martinthomson martinthomson added the parked An issue that we can't immediately address; for future discussion. label Jun 4, 2018
@britram britram removed the parked An issue that we can't immediately address; for future discussion. label Nov 7, 2018
@britram
Contributor Author

britram commented Nov 7, 2018

Unparking, per rough consensus to go forward with the spin bit in BKK.

I think the right way forward here is to:

  • refer to the spin-exp document from the new short header definition, once the first-octet discussion is complete (I checked the notes, but it's not clear that the discussion converged)
  • work out the details of a spin-bit based approach to signaling RTT in -spin-exp
  • work out the details of measuring the spin bit in -manageability (note: much of this is already in section 3.6, and needs only minor tweaks)
  • merge -spin-exp into -transport once we settle on the details.

@janaiyengar janaiyengar added has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. and removed has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. labels Nov 19, 2018
@janaiyengar
Contributor

I'm closing this, since the spin bit is now in the draft. If we find that this doesn't yet have consensus, happy to reopen it. We can figure out editorial considerations amongst the editors.

@mnot mnot added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label Nov 19, 2018