On-path calculation of RTT #631
Thanks for filing this issue -- I'd like to move the discussion out of the PR and to this issue. On your issue description: I'm not sure that we need to have bilateral agreement. If the mechanism is simple, and if we think the information exposed is reasonable, then we could mandate it. Two issues were raised on PR #609:
|
There's another use for this bit/packet type -- it gives a data receiver one RTT sample per round trip, without requiring a PING frame. BTW, calling it Latency Spin Bit doesn't make sense -- there's positive or negative spin, no latency spin (unless you're thinking that the endpoints could fool the network about path latency with this bit :-)) |
On Wed, Jun 14, 2017 at 3:49 PM, janaiyengar wrote:
> Thanks for filing this issue -- I'd like to move the discussion out of the
> PR and to this issue.
> On your issue description: I'm not sure that we need to have bilateral
> agreement. If the mechanism is simple, *and* if we think the information
> exposed is reasonable, then we could mandate it.
"Reasonable" is going to be in the eye of the beholder. In this case, rather than needing negotiation, it might be as simple as saying "if the endpoint does not wish to signal the path, then always set the bit to 0". Otherwise you might get less deterministic results :)
-P
|
This isn't correct. Well, unless you decide that sending shorter packet numbers isn't necessary every other round trip. Whatever we do here, we're burning a bit on this. Ian made another point: this needs to be very clear about how to flip the bit locally. Packet reordering will cause extra edges if we aren't careful. Edges need to be driven by receiving a different bit, but only if the packet number is larger than that of the last observed bit flip. |
This works nicely, and it gets the property of requiring bilateral cooperation without any explicit coordination. If we decided that we wanted unilateral exposure to work (i.e., that one endpoint could allow an on-path device to calculate its observed RTT and flight size without the cooperation of the other endpoint), then the spin bit could be driven off the transport protocol's control loop, simply flipping the bit once per RTT. This would lose the (IMO very nice) property of this proposal that the latency exposure is completely separate from the transport mechanics.
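A minimal sketch of the unilateral alternative, assuming a hypothetical `TimerDrivenSpin` helper whose `srtt` field stands in for the endpoint's own smoothed RTT estimate. It shows the coupling the comment objects to: the flip schedule comes from the transport's RTT estimate rather than from anything the peer sends. It does not model flight size.

```python
from dataclasses import dataclass


@dataclass
class TimerDrivenSpin:
    """Hypothetical unilateral spin: one endpoint flips the bit once per RTT.

    `srtt` (must be > 0) stands in for the endpoint's own smoothed RTT
    estimate, so the signal is driven by the transport's control loop.
    """

    srtt: float
    spin: int = 0
    last_flip: float = 0.0

    def value_at(self, now: float) -> int:
        # Flip once for every full RTT that has elapsed since the last flip.
        while now - self.last_flip >= self.srtt:
            self.last_flip += self.srtt
            self.spin ^= 1
        return self.spin


s = TimerDrivenSpin(srtt=10.0)
print([s.value_at(t) for t in (0.0, 9.9, 10.0, 15.0, 20.0, 35.0)])  # [0, 0, 1, 1, 0, 1]
```

The peer plays no part here, which is what makes the exposure unilateral, and also why any error in the endpoint's RTT estimate leaks straight into the signal.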
Yes, although no amount of care will keep signaling from trailing or being otherwise inaccurate on pathologically lossy and reordering-prone paths. I need to think about this a bit more, but I think these cases are easy enough to recognize heuristically that we don't need loss and reordering exposure to allow such samples to be detected and used as a "bad path" indication. |
As I mentioned in the PR, I think we can deal with reordering by specifying that the spin to be sent is based on the spin of the largest received packet (for a given path) instead of the last received packet. |
Largest received packet number? |
Yes, the "spin of the packet with the largest received packet number for a given path." |
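A minimal sketch of that per-path rule, using a hypothetical `SpinReflector` class (paths are just hashable keys here). Only a packet that raises the largest packet number seen on a path may change the spin value sent on that path, so a late packet cannot flip it back.

```python
from collections.abc import Hashable
from typing import Dict, Tuple


class SpinReflector:
    """Spin state kept per path (hypothetical helper).

    The spin value sent on a path follows the spin bit of the packet with
    the largest packet number received on that path.
    """

    def __init__(self) -> None:
        # path -> (largest packet number seen, spin value to send)
        self._state: Dict[Hashable, Tuple[int, int]] = {}

    def on_packet(self, path: Hashable, pn: int, spin_bit: int) -> None:
        largest, _ = self._state.get(path, (-1, 0))
        if pn > largest:                 # late packets are ignored
            self._state[path] = (pn, spin_bit)

    def spin_to_send(self, path: Hashable) -> int:
        return self._state.get(path, (-1, 0))[1]


r = SpinReflector()
r.on_packet("path-a", 1, 1)
r.on_packet("path-a", 3, 0)
r.on_packet("path-a", 2, 1)     # reordered: arrives after packet 3
print(r.spin_to_send("path-a"))  # 0
```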
Maybe use a known/fixed packet type instead of a spin bit. I went back and forth on this, and I'm leaning towards using a packet type. Irrespective of greasing, we have packet types, such as Client Initial, that middleboxes will want to identify; this type is no different. We need only specify a single short packet type (with a 4-byte packet number).
> This isn't correct. Well, unless you decide that sending shorter
> packet numbers isn't necessary every other round trip. Whatever we do here,
> we're burning a bit on this.
Yes, that's what I said above -- burn bits on having a 4-byte packet
number. This is a 2-byte overhead per RTT, which is fine. We don't have to
burn an entire bit.
> Ian made another point: this needs to be very clear about how to
> flip the bit locally. Packet reordering will cause extra edges if we aren't
> careful. Edges need to be driven by receiving a different bit, but
> only if the packet number is larger than that of the last observed bit flip.
I don't think this matters (or maybe I'm missing something). I don't think
we need to make this more complicated than it is. A middlebox is basically
recording the time of this signal in one direction, and measuring RTT when
it sees the signal in the opposite direction. I don't see why reordering
matters here... the measured time is clearly a network RTT measurement.
Perhaps you're assuming that the middlebox needs to see the "largest" RTT,
but I think that assumes too much. RTT measurements at endpoints have to be
more careful, since RTT is basically tied entirely to retransmissions, but
I wouldn't presume that a middlebox wants something so specific.
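A minimal sketch of the middlebox behavior described above, using a hypothetical `EdgeObserver` class and direction labels `"c2s"`/`"s2c"`. It timestamps the signal in one direction and takes a sample when the answering signal passes in the other. Each such sample covers only the loop between the observer and the far endpoint; in the example the observer's two samples (30 toward the server, 20 toward the client) add up to the 50-unit end-to-end RTT.

```python
from typing import Dict, Optional


class EdgeObserver:
    """Hypothetical on-path observer for a spin/ping-pong signal."""

    def __init__(self) -> None:
        self.pending: Dict[str, float] = {}  # direction -> time signal seen

    def on_signal(self, direction: str, now: float) -> Optional[float]:
        other = "s2c" if direction == "c2s" else "c2s"
        sample = None
        if other in self.pending:
            # Signal passed us one way, the answer now passes us the other
            # way: the loop between us and the far endpoint.
            sample = now - self.pending.pop(other)
        self.pending[direction] = now
        return sample


obs = EdgeObserver()
print(obs.on_signal("c2s", 0.0))   # None
print(obs.on_signal("s2c", 30.0))  # 30.0 (observer <-> server)
print(obs.on_signal("c2s", 50.0))  # 20.0 (observer <-> client)
```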
|
> As I mentioned in the PR, I think we can deal with reordering by
> specifying the spin to be sent is based on the spin of the largest received
> packet (for a given path) instead of the last received packet.
Sorry, I missed this. As I said in my earlier response, you're assuming
that the middlebox needs to see the "largest" RTT, but I think that assumes
too much. RTT measurements at endpoints have to use the largest RTT, since
RTT is basically tied entirely to retransmissions. I don't think that's
necessarily what a middlebox would want. I would go with the simplest
mechanism. If there's substantial reordering, the middlebox should see two
distinct RTTs over several rounds. The middlebox can deal with anomalies
(by using appropriate filters, min/max/avg/ewma) based on what exactly it
wants to do with the RTT measurement.
|
Since the algorithm from Christian's PR is not described in this issue, I'll describe it as best I can, because I think there are two different algorithms that people may have in their minds, and I'm worried it's causing confusion. There are two roles: the initiator, who flips the spin value, and the responder, who reflects the spin value.
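The two roles can be sketched as a tiny simulation. This assumes the rule as it was later written into RFC 9000 (Section 17.4): the responder (server) sets its spin to the spin bit of the highest-numbered packet it has received, and the initiator (client) sets its spin to the inverse. Whether that matches the PR text exactly is an assumption, as are the model details: `spin_edges` is hypothetical, both sides send one packet per tick, and the one-way delay is constant. An observer watching client-to-server packets sees one edge per round trip.

```python
from collections import deque
from typing import Deque, List, Tuple

Packet = Tuple[int, int, int]  # (arrival_tick, packet_number, spin_bit)


def spin_edges(one_way_delay: int, ticks: int) -> List[int]:
    """Ticks at which the client-to-server spin value flips (hypothetical model)."""
    c2s: Deque[Packet] = deque()  # packets on their way to the server
    s2c: Deque[Packet] = deque()  # packets on their way to the client
    client_spin = server_spin = 0
    client_largest = server_largest = -1
    edges: List[int] = []
    prev_sent = client_spin

    for t in range(ticks):
        # Deliver what arrives this tick (constant delay keeps FIFO order).
        while s2c and s2c[0][0] == t:
            _, pn, spin = s2c.popleft()
            if pn > client_largest:      # initiator: invert what it sees
                client_largest, client_spin = pn, 1 - spin
        while c2s and c2s[0][0] == t:
            _, pn, spin = c2s.popleft()
            if pn > server_largest:      # responder: reflect what it sees
                server_largest, server_spin = pn, spin

        # Each side sends one packet; its packet number is the tick.
        c2s.append((t + one_way_delay, t, client_spin))
        s2c.append((t + one_way_delay, t, server_spin))

        # An observer on the client-to-server path notes each edge.
        if client_spin != prev_sent:
            edges.append(t)
        prev_sent = client_spin
    return edges


print(spin_edges(one_way_delay=5, ticks=40))  # [5, 15, 25, 35]: one edge per 10-tick RTT
```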
This means the spin signal may be present on any packet type, though we could decide to make it present only in 1RTT packets if we wanted to conserve packet types. The reordering issue I was concerned about is this: if either side sees 1111010000 instead of 1111100000 as it was sent, then I was proposing that it should treat the packets as though they had arrived in order, which makes the bits sequential again, and the responder would reflect 1111000000. Otherwise any reordering permanently sticks around, and at some point it may look like noise instead of an RTT signal.
The original design I had in mind was that one packet would be marked with the spin bit, initially sent by the connection initiator, and it would ping-pong back and forth. This design is more friendly to designating a single packet type instead of using a dedicated bit, but it takes more work to deal with packet loss, since one side needs to restart the signal if the packet with the spin bit set is lost. On the other hand, reordering doesn't need anything special. |
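The reordering example can be checked directly. This sketch assumes the responder sends one packet per packet received, and compares a responder that copies the last packet received with one that copies the packet with the largest packet number. The helper names are hypothetical. Packets 0 to 9 carry `1111100000`, and packet 4 arrives after packet 5.

```python
from typing import List, Tuple

Arrival = Tuple[int, int]  # (packet_number, spin_bit), in arrival order


def reflect_last_received(arrivals: List[Arrival]) -> str:
    """Spin the responder would send if it copied the last packet received."""
    return "".join(str(spin) for _, spin in arrivals)


def reflect_largest_pn(arrivals: List[Arrival]) -> str:
    """Spin it would send if it copied the packet with the largest number."""
    largest, current = -1, 0
    out: List[str] = []
    for pn, spin in arrivals:
        if pn > largest:          # late packets leave the value unchanged
            largest, current = pn, spin
        out.append(str(current))
    return "".join(out)


sent = "1111100000"                      # spin bits of packets 0..9 as sent
order = [0, 1, 2, 3, 5, 4, 6, 7, 8, 9]   # packet 4 arrives after packet 5
arrivals = [(pn, int(sent[pn])) for pn in order]

print(reflect_last_received(arrivals))   # 1111010000
print(reflect_largest_pn(arrivals))      # 1111000000
```

The largest-packet-number rule turns the spurious 0-1-0 glitch into a single edge that is at most one packet early.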
@ianswett, it seems like watching for edges is more robust than having a single-bit signal. @janaiyengar, if there is any value in having a shorter packet number, then you just proposed halving that value. Either that, or you created an incentive not to implement this scheme. Just spend the bit. The single packet type only works in Ian's proposed alternative design, but that design isn't anywhere near as robust. |
We could certainly work with packet types instead of dedicated bits. The intermediate nodes would need to observe transitions of the first octet instead of looking at just one bit. But then, we also have a proposal to grease the first octet, for example by doing an XOR with a function of the last byte. That type of greasing would randomize the first octet. So we have a choice: if we want to grease the first octet, we need to dedicate a bit that is exempted from the greasing. If we don't then we can use a packet type. |
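A minimal sketch of that choice, assuming a hypothetical spin-bit position `SPIN_MASK = 0x20` and using the last byte itself as "a function of the last byte". With `exempt_spin=True` the spin bit stays readable as-is. Without the exemption it is scrambled, but because the last byte travels unmodified, anyone (including an on-path device) can recompute the mask and undo the XOR.

```python
SPIN_MASK = 0x20  # hypothetical position of the spin bit in the first octet


def grease_first_octet(packet: bytes, exempt_spin: bool = True) -> bytes:
    """XOR the first octet with a mask derived from the last byte (hypothetical).

    Applying it twice restores the packet, since the last byte is untouched.
    """
    if len(packet) < 2:
        raise ValueError("need a first octet and a separate last byte")
    mask = packet[-1]
    if exempt_spin:
        mask &= ~SPIN_MASK & 0xFF   # leave the spin bit in the clear
    return bytes([packet[0] ^ mask]) + packet[1:]


p = bytes([0x30, 0xAA, 0xFF])
g = grease_first_octet(p)
print(hex(g[0]), g[0] & SPIN_MASK == p[0] & SPIN_MASK)  # 0xef True
u = grease_first_octet(p, exempt_spin=False)
print(grease_first_octet(u, exempt_spin=False) == p)    # True: reversible on path
```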
Why do you say that the flip bit needs to be exempt from greasing? It's not like the intermediary is unable to reverse the greased transform as proposed. If greasing == encryption then that would be a whole different story of course. |
OK, yes, we can say that the latency bit should be exempt from "encrypted greasing". If greasing is easily decrypted by middleboxes, then of course there is no issue. But then, I don't believe that a form of greasing that is easily decrypted is particularly useful. |
By the way, I think I need to change the phrasing in the PR from "latest packet" to "packet with the highest sequence number". Otherwise, as Ian pointed out, the mechanism can degenerate. For example, with a window of 5, the client will send "11111" and the server will echo "11111", after which the client will send "00000", the server "00000", etc. But suppose that the client sends "11111" and the server receives "111-gap-1". The client sends "00000", but due to the reordering the server receives "101111". Now it is echoing "101111", and the client is sending "010000". And so on. Reordering and gaps create flips in the sequence. There are natural healing processes such as gaps and ack coalescing, but it is better to avoid the whole problem. If the server only echoes the bit from the highest sequence number received, then the process is much less sensitive to reordering. |
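To see the degeneration, here is a small hypothetical event-driven model loosely following that example. The client keeps a window of 5 packets in flight and sends one new packet per packet it receives. The server answers every packet. One client packet (number 9, the last of a run of 1s) is held up long enough to arrive just after the first 0 of the next run. All names and parameters are assumptions. Delays are constant and nothing is lost, so none of the natural healing mentioned above occurs. With the "last received" rule, the glitch is reflected back and is still present a round trip later. With the highest-packet-number rule, the client's runs stay clean.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Endpoint:
    spin: int = 0
    largest_pn: int = -1
    next_pn: int = 0


def update(ep: Endpoint, pn: int, bit: int, invert: bool, use_largest_pn: bool) -> None:
    if use_largest_pn and pn <= ep.largest_pn:
        return                      # late packet: keep the current spin value
    ep.largest_pn = max(ep.largest_pn, pn)
    ep.spin = 1 - bit if invert else bit


def client_spins(use_largest_pn: bool, late_pn: Optional[int] = None,
                 extra_delay: float = 6.5, one_way: float = 5.0,
                 window: int = 5, horizon: float = 50.0) -> List[int]:
    """Spin values the client sends, in packet-number order (hypothetical model)."""
    events: List[Tuple[float, int, str, int, int]] = []  # time, seq, dest, pn, spin
    seq = 0
    client, server = Endpoint(), Endpoint()
    sent: List[int] = []

    def transmit(now: float, dest: str, ep: Endpoint) -> None:
        nonlocal seq
        delay = one_way
        if dest == "server" and ep.next_pn == late_pn:
            delay += extra_delay    # this one packet is held up on the path
        heapq.heappush(events, (now + delay, seq, dest, ep.next_pn, ep.spin))
        seq += 1
        ep.next_pn += 1
        if dest == "server":
            sent.append(ep.spin)

    for t in range(window):         # initial window, one packet per tick
        transmit(float(t), "server", client)

    while events and events[0][0] < horizon:
        now, _, dest, pn, bit = heapq.heappop(events)
        if dest == "server":        # responder reflects, then answers
            update(server, pn, bit, invert=False, use_largest_pn=use_largest_pn)
            transmit(now, "client", server)
        else:                       # initiator inverts, then sends new data
            update(client, pn, bit, invert=True, use_largest_pn=use_largest_pn)
            transmit(now, "server", client)
    return sent


def edges(bits: List[int]) -> int:
    return sum(a != b for a, b in zip(bits, bits[1:]))


print(edges(client_spins(True, late_pn=9)))   # 4: same as with no reordering
print(edges(client_spins(False, late_pn=9)))  # 8: the glitch keeps bouncing back
```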
Any idea how to resolve this issue? The thread is silent now. We have three plausible options:
The behavior of option 3 would be about the same as the current PR, but would only enable measurement on 1RTT packets. Personally, I don't care much. But if we are going to mess with the packet format, it is probably better to decide sooner rather than later. |
Since (1) would be a significant step back from the heuristics currently used for TCP latency, and we have a proof of concept that shows it is avoidable while honoring other goals we have for linkability resistance across five-tuple migrations (see #231, #598), I'm not very enthusiastic about doing nothing. I think there are a couple of additional points in the design space beyond (2) and (3):
(4) Simple packet number echo as in #269 would also support RTT measurement via demonstration of receipt, regardless of whether the packet number is encrypted or not.
(5) Simple echo of N (protected) bytes at offset M would also support RTT measurement via demonstration of receipt, without relying on the packet number being decodable in the header.
I'm pretty sure it's okay to restrict measurement to 1RTT packets. Any of these seem reasonable to me. (4) may have more or less utility than (2) and (3) depending on what we decide with respect to the semantics attached to path-visible packet numbers. |
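A minimal sketch of how an observer could use option (4), the packet number echo, with a hypothetical `EchoObserver` class. It remembers when each packet number first passed in one direction and produces a sample when the peer's echo of that number passes back. The sample covers only the loop between the observer and the echoing endpoint. A real device would also need to bound the table it keeps.

```python
from typing import Dict, Optional


class EchoObserver:
    """Hypothetical observer for a packet-number echo."""

    def __init__(self) -> None:
        self._first_seen: Dict[int, float] = {}

    def on_forward(self, pn: int, now: float) -> None:
        self._first_seen.setdefault(pn, now)   # keep the first sighting only

    def on_echo(self, echoed_pn: int, now: float) -> Optional[float]:
        start = self._first_seen.pop(echoed_pn, None)
        return None if start is None else now - start


obs = EchoObserver()
obs.on_forward(7, 1.0)
print(obs.on_echo(7, 21.0))   # 20.0
print(obs.on_echo(8, 22.0))   # None: packet 8 was never seen going forward
```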
I'm happy with either 2 or 3 and prefer them over 1, 4, or 5: 2 or 3 provide middleboxes with both end-to-end RTT and downstream RTT, whereas 4 and 5 only allow measurement of downstream RTT. |
I did think of one quirk with 2 or 3 that we likely need to add some text for: what to do about upstream reordering. Even if the hosts are fixing reordering, if reordering is observed upstream, we either need to:
For #1, we could do a two-packet edge trigger instead of a one-packet trigger if the packet number doesn't end up being exposed. Based on looking at traces and the metrics I've seen, I think that would fix the vast majority of reordering. As long as the packet number is exposed, I'd suggest we use that to fix the problem. |
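A sketch of the two-packet edge trigger at an observer, using a hypothetical `confirmed_edges` function. A change is accepted only once the new value has been seen on `needed` consecutive packets, and the edge is dated at the first of them (`needed=1` is the plain one-packet trigger). On the reordered sequence `1111010000`, the one-packet trigger reports three edges and the two-packet trigger reports one. The price is that the accepted edge is dated two packets late.

```python
from typing import List


def confirmed_edges(bits: List[int], needed: int = 2) -> List[int]:
    """Indices at which an observer accepts a spin edge (hypothetical heuristic)."""
    edges: List[int] = []
    current = bits[0]          # value the observer currently believes in
    run_start, run_len = 0, 0  # candidate run of the opposite value
    for i in range(1, len(bits)):
        if bits[i] != current:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= needed:
                edges.append(run_start)
                current = bits[i]
                run_len = 0
        else:
            run_len = 0        # a lone flipped packet is discarded
    return edges


observed = [int(b) for b in "1111010000"]   # packet 4 overtaken by packet 5
print(confirmed_edges(observed, needed=1))  # [4, 5, 6]
print(confirmed_edges(observed, needed=2))  # [6]
```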
Right now, without packet number encryption, those are available. If they go away, then the heuristic approach is fine, and there is no harm in also saying that significant reordering might be impossible to detect, so anyone doing measurements has to accept the possibility that their measurements will have some noise. @ianswett's suggestion to use multiple packets as a heuristic is fine, but you might only do that if you notice a large change in RTT (a severe drop is likely to appear in the case that you get reordering, for instance). |
If we keep exposing the packet number, I’m actually still in favor of having the packet number echo. I don’t think the overhead is an issue because you don’t have to send it on every packet; however, if you send it, you can also use it as a confirmation signal where needed without adding extra bits. I’m also in favor of exposing the packet number because this gives the network a simple estimate of reordering and of packet loss so far along the path. While it is good to discuss these issues/properties separately, I don’t think they are completely independent for the resulting wire format design (at least as long as overhead is a concern). |
Magnus Westerlund sent a proposal to the mailing list where the n least significant bits of the packet number strictly increase by 1 for each packet sent. We could let this portion of the packet number remain visible to the path; it would wrap quite frequently (depending on the value of n), but the information would be good enough to handle most cases of reordering that might otherwise mess up the spin bit signal. Magnus's original proposal: |
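The following is an illustrative sketch, not Magnus's proposal itself, of how an observer could use such a visible n-bit tail. It assumes the tail increases by exactly 1 per packet and wraps modulo 2**n, and `classify` is a hypothetical helper. Comparisons are made modulo 2**n, so they become ambiguous once more than half the space (2**(n-1) packets) is skipped. An observer would keep the highest tail seen as `prev` and not advance it on late packets.

```python
def classify(prev: int, cur: int, n: int) -> str:
    """Compare an observed n-bit packet-number tail with the previous one."""
    modulus = 1 << n
    diff = (cur - prev) % modulus
    if diff == 1:
        return "in-order"
    if diff == 0:
        return "duplicate"
    if diff < modulus // 2:
        return "gap"       # diff - 1 packets skipped: lost or still in flight
    return "late"          # arrived after a higher-numbered packet


print(classify(15, 0, n=4))  # in-order (wrapped)
print(classify(5, 8, n=4))   # gap
print(classify(5, 4, n=4))   # late
```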
We've discussed this. We're undertaking an experiment. We'll need to leave this issue open until that experiment resolves one way or other. |
Unparking, per rough consensus to go forward with the spin bit in BKK. I think the right way forward here is to:
|
I'm closing this, since the spin bit is now in the draft. If we find that this doesn't yet have consensus, happy to reopen it. We can figure out editorial considerations amongst the editors. |
As discussed at the interim, it might be desirable to allow devices on-path observing two sides of a QUIC connection (or the subset of a multipath QUIC connection) to accurately estimate the RTT of the flow, either for in-band operational purposes (AQM) or general monitoring and measurement purposes (cf. an informal survey of IPFIX information elements suggesting multiple vendors shipping devices that perform this measurement for TCP.)
This facility should require bilateral agreement of both endpoints before an RTT signal is available on path; i.e. it should be optional.
This was a large part of the requirement behind #269, #391, and #393, which were expressed in terms of mechanism instead of requirements. #609 contains a proposal addressing this issue that is separate from packet numbering and is therefore compatible with proposals to make packet numbering inaccessible to on-path devices (see #231).
This issue was filed in part to uplevel the discussion on #609 to requirements.