-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SHOULD implement adaptive packet threshold loss detection #3571
Comments
Another normative dependency on an experimental RFC. And one that isn't in the downref registry. Great. This seems fine, but it's a design change. |
I would like to not see a dependency on the RACK draft here. If we want this to become a SHOULD, we really need to spell out what an implementation SHOULD do more precisely and not just point to RACK. That said, I'm not convinced that we should do a SHOULD here. I understand that you see an improvement (that might be worth noting in the spec), which is a perfectly good reason encourage others to implement and experiment, but why is a MAY not adequate, given that we have limited experience? I also am not convinced that this is important enough to warrant a design change at this late stage. Do you think this performance impact is significant enough to argue for this change? |
As Matt Olson pointed out, we already reference RACK informationally. We could remove all the references to RACK, but I think that's a separate question? We can more clearly explain what this means or re-instate the reference to RFC4653. TCP has been doing this for a very long time, so I don't think the fact we have limited experience in QUIC is as important. In terms of 'new information', the performance improvement from adaptive packet threshold was measurable for both application latency and by reducing the rate of spurious retransmits. And I finally read the RACK draft and noticed it was a SHOULD there. The original intent was to change a MAY to a SHOULD, which is a very small design change, but then I realized the existing text wasn't that clear, so I tried to improve it. We have to do another round of consensus calls in the next few weeks, so though this change is very small, it seems like the right thing to do to me. |
I understand that this might have performance benefit that is worth exploring further. I question the need for doing it at this late stage, given that we will find more and more things that help with performance as we gain further experience with the protocol. RACK is currently an informational reference, meaning that it is not required reading for an implementer. This change would make it a normative reference, and we don't have such dependencies on any TCP document at the moment. (I see that F-RTO is listed as normative, but it isn't really, I'll send a PR to fix that.) This isn't a small change because just changing a MAY to a SHOULD isn't enough. We cannot change a "MAY adapt" to a "SHOULD adapt" without saying exactly what "SHOULD adapt" means for an implementer. At a minimum, this requires changes to the text and to the pseudo-code that include a precise description and an algorithm for adapting. (The RACK draft suggests one, but again, it's not clear to me whether that one is what we should be recommending. I want to try it out for sure though.) On TCP's experience with this: While I understand that RACK has been deployed for a few years, I don't believe we know how much improvement this specific part of it brings, especially given the DSACK dependency in TCP (Praveen can perhaps shed some light on DSACK deployment at clients, but ... this isn't my central argument.). If there is public information about this, I would love to look at it to have a better understanding. |
I still intend RACK to be an informational reference(PR uses a ?, not a !), not normative. The intent was to update from an older informational reference(TCP-NCR) to a newer one. This mechanism was deployed in Linux long before RACK. It seems your point is that by changing from MAY to SHOULD, that puts a requirement upon us to fully describe the algorithm in text and pseudocode? I don't see that as a hard requirement, and as you can see in my PR, I didn't intent to do that. |
Does it need to be normative at all? What about simply noting that adaptive strategies such as X, Y, and Z have proven to be useful in TCP, and would apply equally to QUIC? |
@MikeBishop Sure, that could work. |
That works for me too, thanks! |
My thinking is that "adaptive strategies" would not only "apply equally to QUIC" but be "significantly more important for QUIC". Operators are aware of the havoc that excessive reordering in their networks causes to TCP and have deployed middle boxes for restoring TCP ordering in such networks. Such network "optimizations" are not possible with QUIC, so QUIC will see more TCP reordering. We've seen such extreme reordering of QUIC in production. |
@igorlord that's a good point. Do you think it warrants adding a statement like "Re-ordering could be more common with QUIC than TCP, because network elements cannot observe and fix the order of out-of-order packets."? |
@ianswett I like this. |
@igorlord do you have a reference of those data? It'd be good to understand how the reordering pattern looks like in the production network. |
@junhochoi The public data on this can be found in slides presented for IETF 105. Take a look at red "below the line" red dots on the graphs. At the time we thought the red dots represented loss in our observer infrastructure, but additional analysis after the IETF meeting confirmed that the signal was mostly due to very high reordering (more than 31 subsequent packets arrive before an older packet has arrived). |
Some other MAPRG UDP reordering data I'll update the PR to point out that UDP reordering may be more prevalent, but not add any new normative statements. |
Thanks, @ianswett. We were deliberately focusing data collection on both "boring" and "interesting" network locations to get a wide range of scenarios. |
The RACK draft has this as a SHOULD, and this was worthwhile in our deployment experience, so I think we should change our MAY to a SHOULD.
From Section 4 of https://tools.ietf.org/html/draft-ietf-tcpm-rack-08
"3. The RACK reordering window SHOULD leverage Duplicate Selective
Acknowledgement (DSACK) information [RFC3708] to adaptively
estimate the duration of reordering events."
The text was updated successfully, but these errors were encountered: