Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SHOULD implement adaptive packet threshold loss detection #3571

Closed
ianswett opened this issue Apr 8, 2020 · 15 comments · Fixed by #3572
Closed

SHOULD implement adaptive packet threshold loss detection #3571

ianswett opened this issue Apr 8, 2020 · 15 comments · Fixed by #3572
Assignees
Labels
-recovery design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.

Comments

@ianswett
Copy link
Contributor

ianswett commented Apr 8, 2020

The RACK draft has this as a SHOULD, and this was worthwhile in our deployment experience, so I think we should change our MAY to a SHOULD.

From Section 4 of https://tools.ietf.org/html/draft-ietf-tcpm-rack-08
"3. The RACK reordering window SHOULD leverage Duplicate Selective
Acknowledgement (DSACK) information [RFC3708] to adaptively
estimate the duration of reordering events."

@ianswett ianswett self-assigned this Apr 8, 2020
ianswett added a commit that referenced this issue Apr 8, 2020
@martinthomson
Copy link
Member

Another normative dependency on an experimental RFC. And one that isn't in the downref registry. Great.

This seems fine, but it's a design change.

@ianswett ianswett added the design An issue that affects the design of the protocol; resolution requires consensus. label Apr 9, 2020
@ianswett ianswett removed the design An issue that affects the design of the protocol; resolution requires consensus. label Apr 9, 2020
@janaiyengar
Copy link
Contributor

I would like to not see a dependency on the RACK draft here. If we want this to become a SHOULD, we really need to spell out what an implementation SHOULD do more precisely and not just point to RACK.

That said, I'm not convinced that we should do a SHOULD here. I understand that you see an improvement (that might be worth noting in the spec), which is a perfectly good reason encourage others to implement and experiment, but why is a MAY not adequate, given that we have limited experience?

I also am not convinced that this is important enough to warrant a design change at this late stage. Do you think this performance impact is significant enough to argue for this change?

@larseggert larseggert added the design An issue that affects the design of the protocol; resolution requires consensus. label Apr 9, 2020
@ianswett
Copy link
Contributor Author

ianswett commented Apr 9, 2020

As Matt Olson pointed out, we already reference RACK informationally. We could remove all the references to RACK, but I think that's a separate question? We can more clearly explain what this means or re-instate the reference to RFC4653.

TCP has been doing this for a very long time, so I don't think the fact we have limited experience in QUIC is as important. In terms of 'new information', the performance improvement from adaptive packet threshold was measurable for both application latency and by reducing the rate of spurious retransmits. And I finally read the RACK draft and noticed it was a SHOULD there.

The original intent was to change a MAY to a SHOULD, which is a very small design change, but then I realized the existing text wasn't that clear, so I tried to improve it. We have to do another round of consensus calls in the next few weeks, so though this change is very small, it seems like the right thing to do to me.

@janaiyengar
Copy link
Contributor

I understand that this might have performance benefit that is worth exploring further. I question the need for doing it at this late stage, given that we will find more and more things that help with performance as we gain further experience with the protocol.

RACK is currently an informational reference, meaning that it is not required reading for an implementer. This change would make it a normative reference, and we don't have such dependencies on any TCP document at the moment. (I see that F-RTO is listed as normative, but it isn't really, I'll send a PR to fix that.)

This isn't a small change because just changing a MAY to a SHOULD isn't enough. We cannot change a "MAY adapt" to a "SHOULD adapt" without saying exactly what "SHOULD adapt" means for an implementer. At a minimum, this requires changes to the text and to the pseudo-code that include a precise description and an algorithm for adapting. (The RACK draft suggests one, but again, it's not clear to me whether that one is what we should be recommending. I want to try it out for sure though.)

On TCP's experience with this: While I understand that RACK has been deployed for a few years, I don't believe we know how much improvement this specific part of it brings, especially given the DSACK dependency in TCP (Praveen can perhaps shed some light on DSACK deployment at clients, but ... this isn't my central argument.). If there is public information about this, I would love to look at it to have a better understanding.

@ianswett
Copy link
Contributor Author

I still intend RACK to be an informational reference(PR uses a ?, not a !), not normative. The intent was to update from an older informational reference(TCP-NCR) to a newer one.

This mechanism was deployed in Linux long before RACK.

It seems your point is that by changing from MAY to SHOULD, that puts a requirement upon us to fully describe the algorithm in text and pseudocode? I don't see that as a hard requirement, and as you can see in my PR, I didn't intent to do that.

@MikeBishop
Copy link
Contributor

Does it need to be normative at all? What about simply noting that adaptive strategies such as X, Y, and Z have proven to be useful in TCP, and would apply equally to QUIC?

@ianswett
Copy link
Contributor Author

@MikeBishop Sure, that could work.

@janaiyengar
Copy link
Contributor

That works for me too, thanks!

@martinthomson martinthomson added the proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. label Apr 20, 2020
@igorlord
Copy link
Contributor

What about simply noting that adaptive strategies such as X, Y, and Z have proven to be useful in TCP, and would apply equally to QUIC?

My thinking is that "adaptive strategies" would not only "apply equally to QUIC" but be "significantly more important for QUIC". Operators are aware of the havoc that excessive reordering in their networks causes to TCP and have deployed middle boxes for restoring TCP ordering in such networks. Such network "optimizations" are not possible with QUIC, so QUIC will see more TCP reordering. We've seen such extreme reordering of QUIC in production.

@ianswett
Copy link
Contributor Author

@igorlord that's a good point. Do you think it warrants adding a statement like "Re-ordering could be more common with QUIC than TCP, because network elements cannot observe and fix the order of out-of-order packets."?

@igorlord
Copy link
Contributor

@ianswett I like this.

@LPardue LPardue added call-issued An issue that the Chairs have issued a Consensus call for. and removed proposal-ready An issue which has a proposal that is believed to be ready for a consensus call. labels Apr 29, 2020
@junhochoi
Copy link
Contributor

We've seen such extreme reordering of QUIC in production

@igorlord do you have a reference of those data? It'd be good to understand how the reordering pattern looks like in the production network.

@igorlord
Copy link
Contributor

@junhochoi The public data on this can be found in slides presented for IETF 105. Take a look at red "below the line" red dots on the graphs. At the time we thought the red dots represented loss in our observer infrastructure, but additional analysis after the IETF meeting confirmed that the signal was mostly due to very high reordering (more than 31 subsequent packets arrive before an older packet has arrived).

@ianswett
Copy link
Contributor Author

30 packet reordering certainly happens, though it is quite rare.

Some other MAPRG UDP reordering data

I'll update the PR to point out that UDP reordering may be more prevalent, but not add any new normative statements.

@igorlord
Copy link
Contributor

Thanks, @ianswett. We were deliberately focusing data collection on both "boring" and "interesting" network locations to get a wide range of scenarios.

@LPardue LPardue removed the call-issued An issue that the Chairs have issued a Consensus call for. label May 13, 2020
@LPardue LPardue added the has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list. label May 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-recovery design An issue that affects the design of the protocol; resolution requires consensus. has-consensus An issue that the Chairs have determined has consensus, by canvassing the mailing list.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants