-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert Spurious Congestion Event #4836
Comments
With my chair hat on, this sounds like a substantial change to the recovery draft and we are too far along to consider it in v1. I also think that trying to triage issues for extensions or v2 should not be done on this repo, That said, it seems like something with merit that could easily be done as an extension. There are a suite of other I-Ds that can improve recovery without defining a congestion control algorithm so they could fall into the scope of the future charter, My advice would be to write a proposal in I-D format and solicit it to the mailing list. |
I am not proposing this as a change to v1. I also don't think this qualifies as an extension. There is no negotiation required here. It's purely a sender side, implementation detail (like much of loss recovery / congestion control). I also disagree that it's a substantial change (67 lines of C code to msquic). A whole doc/draft just for this seems like overkill. Waiting for v2 would be my preference (which may be sooner rather than later if we truly want to exercise VN soon). |
with chair hat off
My comment was a substantial change to the document, I don't care about code :)
Thanks for the clarification. I guess your proposal sounds like an extension to the interface, hence my suggestion that it is an extension, even if it is not part of the wire image and is not negotiated. Either way it seems like something that would be in scope for the WG's new charter. I honestly don't know at this time what a v2 would look like. I might suspect that we would rev the transport and recovery docs independently. It sounds useful, why couple yourself to an unknown v2 process timeline? I-Ds are cheap and even a document that has 1 page of prose can give people an anchor for implementation, deployment and discussion. |
Picoquic has implemented this since 2019, and I can confirm that (a) this is useful and (b) this is a unilateral decision by the implementation. I don't think we need to change the recovery draft, because that draft is only meant to specify minimal rules for developers who don't know any better. Those who know better read the litterature, and react accordingly. I documented the interaction between Cubic-based congestion control and spurious transmission in a blog published in November 2019. The handling has since been added to the Cubic "bis" draft, see section 4.9 of the bis draft. So we can say that this is now documented. There is a missing part for Cubic implementation. The Cubic draft does not explain how implementations of QUIC actually detect spurious retransmission. The solution implemented in picoquic is rather simple: if a packet is marked lost, remember information about that packet for a while -- maybe 3*RTT. If an ACK frame acknowledging the packet number is received during that interval, detect the spurious retransmission. It would probably be nice to add something like that to a revision of the recovery specification, but I don't think developers will spend too much time figuring this one out by themselves. So I assume that we can wait for the bis edition of the recovery RFC. By the way, the November 2019 blog mentioned three problems. Two of those are addressed in the revised Cubic draft: specify the units of the constants that Cubic uses, and specify the handling of spurious retransmission. The third problem is not so much with Cubic as with Hystart. Hystart uses RTT measurements to detect queue build-up and exit slow start before starting to lose packets, which is generally great. However, on some links, the RTT measurements are affected by a lot of jitter. This can cause early exit, which is not great. The solution implemented in picoquic is to implement a kind of low pass filter before using RTT measurements in Hystart -- I use the minimum of the last N measurements, with currently N=6. I also use the time stamp extension when available to isolate variations on the data path from those on the high path. I don't know if there is appetite in the WG to further document this problem. |
Thanks for the additional insight Christian. Is the hystart stuff something that could go into the hystart++ draft or is it something specific to QUIC? |
It should go in the Hystart++ draft. Is there one? Updated: I did find the hystart draft |
There was some discussion on NTAP/rfc8312bis#45
(Christian already explained his algorithm in the thread).
I also did some similar experiment a long time ago (
cloudflare/quiche#308) with Reno but the same thing
can be done with Cubic easily.
I believe it's beneficial for the loss/reorder recovered in a short time.
p.s. hystart++ I-D is here:
https://tools.ietf.org/html/draft-balasubramanian-tcpm-hystartplusplus-03
…On Fri, Mar 12, 2021 at 12:39 PM Christian Huitema ***@***.***> wrote:
It should go in the Hystart++ draft. Is there one?
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#4836 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJMHESHEUWY6KGDYUOFO3DTDJUWPANCNFSM4ZCPHQTQ>
.
--
Junho Choi <junho dot choi at gmail.com> | https://saturnsoft.net
|
I'd be supportive of an update to the recovery doc to incorporate detection of spurious retransmissions and any other changes which make sense(Scope TBD). I agree with @nibanks that this isn't well suited to an ID, given it's unilateral. I was going to put parked on this, indicating it might warrant discussion in v2, but I realize I really want the quicv2 label. |
Marking as |
The RFC shipped, closing this. We'll revist the issue as part of any |
This is not necessarily an "issue" but more of a "feature recommendation" and I'm writing it up after a Gather discussion with Jana. This could possibly be integrated into v2, though I don't know if there is any official process for that yet.
Problem space:
Currently, large enough packet reordering triggers the congestion event logic, possibly resulting in a significant degradation of performance. When the ACK eventually comes in to indicate to loss recovery that the "loss" was spurious, the damage has already been done to congestion control.
Proposed solution:
I propose to add a new interface to CC: "Spurious Congestion Event", which is simply triggered by loss recovery if/when all lost packets since the last congestion event end up being acknowledged. When this happens, CC should be able to "undo" the effect of the previous congestion event.
This should be quite trivial for existing QUIC implementations to implement since the spec already recommends tracking "lost" packets to be able to detect spurious loss. Assuming that logic exists the change to determine when these are found to all be acknowledged is likely fairly trivial. The changes on the CC side generally amount to saving a "backup" of the state whenever there is a congestion event, and then reverting to that state when it's found to be spurious.
We've implemented this in msquic and it effectively eliminates the effect of large reordering on performance (as measured in our simulated WAN perf testing). I believe picoquic also has similar logic and saw similar benefit (fyi @huitema to confirm).
The text was updated successfully, but these errors were encountered: