Complicated Retransmission Corner Cases #765
@martinduke, this is where I'd like your opinion on #743. Is this a duplicate of that issue? The model I prefer is to hold the data in one place, but maintain a separate structure that tracks where various pieces of that data were sent. Having just reviewed code for DTLS that does exactly that, and having then worked on improving the retransmission logic, I can say with fair confidence that it's a much better strategy for managing this sort of thing. The idea that you can just copy data from the last/lost packet to the next one is rarely true, enough so that attempting to do so is wasteful.
When I was toying with this, I wound up tracking the delivery state of frames, not packets. Each retransmittable frame was pending send, in-flight, or delivered. Each packet that was outstanding was simply a list of the frames it contained, and changing a packet to lost or delivered was really walking its frames and updating their state. Every time I generated a packet, I filled it as full as possible with frames that were pending send, with retransmissions going first. And I agree, it's hairy. You're right that you eventually need to get rid of the lists from old lost packets, but you ought to maintain them for a little while in case the packet (or the ACK) is merely delayed. However, there's no great harm in throwing one away too early (as you say, you'll get an ACK for a packet you no longer know anything about, and so accidentally resend a frame), and I'd say that can be left to the implementation.
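The frame-state model described above can be sketched as follows. This is a minimal illustration, not any particular implementation; the class and method names are invented for the example, and real code would also order retransmitted frames ahead of fresh ones.

```python
from enum import Enum, auto

class FrameState(Enum):
    PENDING_SEND = auto()   # queued, or returned to the queue by a loss
    IN_FLIGHT = auto()      # carried by an outstanding packet
    DELIVERED = auto()      # carried by an acknowledged packet

class Frame:
    """A retransmittable frame (hypothetical minimal representation)."""
    def __init__(self, data):
        self.data = data
        self.state = FrameState.PENDING_SEND

class Sender:
    def __init__(self):
        self.frames = []    # all retransmittable frames, in queue order
        self.packets = {}   # packet number -> list of frames it carried
        self.next_pn = 0

    def queue(self, data):
        frame = Frame(data)
        self.frames.append(frame)
        return frame

    def send_packet(self, max_frames=3):
        """Fill a packet as full as possible with pending frames."""
        payload = [f for f in self.frames
                   if f.state is FrameState.PENDING_SEND][:max_frames]
        for f in payload:
            f.state = FrameState.IN_FLIGHT
        pn = self.next_pn
        self.next_pn += 1
        self.packets[pn] = payload
        return pn

    def on_acked(self, pn):
        # "Changing a packet to delivered" is walking its frame list.
        for f in self.packets.pop(pn, []):
            f.state = FrameState.DELIVERED

    def on_lost(self, pn):
        # A lost packet just returns its frames to the send queue.
        for f in self.packets.pop(pn, []):
            if f.state is FrameState.IN_FLIGHT:
                f.state = FrameState.PENDING_SEND
```

Note that the packet map holds only references to frames; losing a packet never copies data, it just flips states back to pending.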
@martinthomson I think the issue is related, in that streams pretty much have to remain open if they're holding in-flight data. But that's not how I got here. After playing with it both ways, I've settled on this question the same way you have. I think this section could use an implementation note to save people some of the trouble I went through. I'll mull a PR.
So, GQUIC tracks the delivery state of packets, in part because loss recovery and congestion control work at those scales. As I wrote to Martin earlier, GQUIC has a data structure that holds per-packet metadata about what frames were contained in a sent packet, and a deque of these per-packet structs. Once a packet is acked you can remove its entry. When a packet is marked as lost, we hang on to its entry along with the packet number of the retransmission (which is a new packet in the same data structure), to allow spurious retransmission detection and to prevent future retransmissions. We also remember only one past transmission in a chain of retransmitted packets, to avoid keeping an indefinitely long chain of past transmissions. (You'd want to be careful about removing old state around TLP, just to be safe.)

@MikeBishop In addition to reliability, an ack also contributes towards RTT measurement, and a late ack is a very useful signal: it moves the RTT estimate higher than it otherwise would have been. I would strongly recommend holding on to packets for a while after they are marked as lost.

@martinduke I think you can make the implementation a ton easier if you don't try to re-bundle retransmissions. Meaning, simply send the retransmittable frames from a lost packet in a retransmission, without trying to pack it maximally by mixing and matching. Yes, it's a pain if the PMTU reduces mid-connection, but that situation is rare enough (I've never really seen it in practice) that I'm comfortable punting on it in practice. Does that make sense?

I think it's a good idea to keep the data in the streams and have frame-level metadata per packet, with offsets into the stream for the data that has been sent. This allows you to keep one copy of stream data without having to maintain a separate copy for sent data. @martinduke I'd love to look at a PR if you could send one along.
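The per-packet-metadata deque with a bounded retransmission chain could look roughly like this. It is a sketch of the idea only, not GQUIC's actual code; all names here are invented, and the "deque" is an insertion-ordered map keyed by packet number.

```python
from collections import OrderedDict

class SentPacket:
    """Hypothetical per-packet metadata entry."""
    def __init__(self, pn, frames):
        self.pn = pn
        self.frames = frames
        self.lost = False
        self.retransmitted_as = None       # pn of the retransmission, if any
        self.previous_transmission = None  # one hop back in the rtx chain

class PacketHistory:
    def __init__(self):
        # pn -> SentPacket, oldest first (deque-like, pruned from the front)
        self.sent = OrderedDict()

    def on_sent(self, pn, frames):
        self.sent[pn] = SentPacket(pn, frames)

    def on_acked(self, pn):
        """Remove the entry; return True if this detects a spurious rtx."""
        pkt = self.sent.pop(pn, None)
        if pkt is None:
            return False
        # An ack for a packet we had declared lost means the
        # retransmission we sent for it was spurious.
        return pkt.lost

    def on_lost(self, pn, new_pn):
        """Mark pn lost and record its retransmission new_pn, keeping
        only one past transmission so chains stay bounded."""
        pkt = self.sent[pn]
        pkt.lost = True
        pkt.retransmitted_as = new_pn
        self.on_sent(new_pn, pkt.frames)
        self.sent[new_pn].previous_transmission = pn
        # Drop state more than one hop back in the chain.
        if pkt.previous_transmission is not None:
            self.sent.pop(pkt.previous_transmission, None)
            pkt.previous_transmission = None
```

Keeping the lost entry around is what makes the late ack useful: it identifies the spurious retransmission and prevents resending the same frames yet again.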
In line with our established principles for retransmission, I've reworked the description of packetization. The description now concentrates on the information that is being repaired in response to perceived loss. This should help avoid the confusion about retransmission. Two new subsections are added to the packetization section. One covers the processing of packets and includes the existing text on processing requirements before acknowledgment. The other includes the retransmission logic. The retransmission section still mentions frame types, but I've tried to make that secondary to the description of the information that is being repaired. I also removed mention of "Regular QUIC packets", which only occurred in 3 places. Closes #463, #765.
My proposed text builds off the new ack-of-acks ack block text and says that the same algorithm can also be used to decide when to stop tracking what was in older packets, in order to detect/prevent spurious retransmissions.
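The proposal amounts to reusing one threshold for both jobs. A minimal sketch, under the assumption that the sender already computes some packet number below which it no longer needs to send ack blocks (the function name and threshold are illustrative, not from any spec text):

```python
def prune_tracked_packets(tracked, threshold_pn):
    """Drop retransmission-tracking state for packets at or below the
    same packet-number threshold used to stop sending ack blocks.

    tracked: dict mapping packet number -> frame metadata.
    """
    for pn in [p for p in tracked if p <= threshold_pn]:
        del tracked[pn]
    return tracked
```

The appeal is that no second bookkeeping mechanism is needed: state old enough to fall out of the ack machinery is also old enough that a late ack for it can be safely ignored.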
* What to track and send: first pass at #765 and some of #1724
* Update draft-ietf-quic-recovery.md
* Update draft-ietf-quic-recovery.md
* Update draft-ietf-quic-recovery.md
* Update draft-ietf-quic-recovery.md
* Update draft-ietf-quic-recovery.md
* Update draft-ietf-quic-recovery.md
* Update draft-ietf-quic-recovery.md
* Reference transport
* Update draft-ietf-quic-recovery.md (Co-Authored-By: ianswett <ianswett@users.noreply.github.com>)
* Update draft-ietf-quic-recovery.md
* Update draft-ietf-quic-recovery.md
The basic case for retransmission is pretty straightforward. I detect a packet loss, so I take all the stuff in that packet and send it in a new packet. There has to be some sort of data structure in the PCB to track what's in the packet. I can clear that data structure once the packet is acked.
There are two ambiguities here:
(1) How long do I hold on to the packet data if the packet is lost? I can delete it immediately, hold it for n retransmissions, or hold it until all the acks and data in the packet have themselves been acknowledged.
Deleting it immediately is very simple, but if the ack comes back for that packet later and the first retransmission is lost, I will needlessly resend that data.
Offline, Jana suggested keeping it for 1 retransmission. That is fine with me (minus the second item below) but I think some implementation guidance on when it is acceptable to throw away old packet data would be helpful.
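The "keep it for one retransmission" policy can be sketched as a small state tracker. This is an illustration of the policy only, with invented names; the point is that the original packet's record lives exactly until either it or its retransmission is acked.

```python
class LostPacketRecord:
    """Hypothetical record kept after a packet is declared lost."""
    def __init__(self, pn, frames, rtx_pn):
        self.pn = pn          # original packet number
        self.frames = frames  # what it carried
        self.rtx_pn = rtx_pn  # packet number of the retransmission

class RetentionTracker:
    def __init__(self):
        self.lost = {}     # original pn -> LostPacketRecord
        self.by_rtx = {}   # retransmission pn -> original pn

    def on_retransmitted(self, pn, frames, rtx_pn):
        rec = LostPacketRecord(pn, frames, rtx_pn)
        self.lost[pn] = rec
        self.by_rtx[rtx_pn] = pn

    def on_ack(self, pn):
        # A late ack for the original: the retransmission was spurious,
        # and we only know that because we kept the record around.
        if pn in self.lost:
            rec = self.lost.pop(pn)
            self.by_rtx.pop(rec.rtx_pn, None)
            return "spurious"
        # An ack for the retransmission: the original's state can go too.
        orig = self.by_rtx.pop(pn, None)
        if orig is not None:
            self.lost.pop(orig, None)
            return "delivered"
        # A packet we know nothing about: harmless, per the discussion
        # above, at worst we already resent its frames needlessly.
        return "unknown"
```

Deleting immediately corresponds to never populating `self.lost`, which trades the "spurious" signal away for simplicity.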
(2) More importantly, fitting the whole retransmission into a single packet does not hold in the general case. There might be an MTU change; the updated ack might be bigger; optimizers like me might want to aggregate multiple small packets into a single retransmission; some ack frames might be supersets of other ack frames, so although there is no direct retransmission relationship acking one of those packets might obviate the ack frame of the other.
What all this implies is a many-to-many mapping where every time data or acks are acked, we have to clear that from all other packets that contain it. I started down this road before seeing that it was way too hairy to continue.
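To see why the many-to-many case gets hairy, consider tracking sent stream data as byte ranges per packet: when any packet is acked, every range it covered must be cleared from every other packet that also carries it. A minimal sketch (names and the coverage test are simplified for illustration; real code would need to split partially overlapping ranges, not just drop fully covered ones):

```python
def on_ack_clear_ranges(packets, acked_pn):
    """packets: pn -> list of (stream_id, start, end) byte ranges.
    Clears ranges covered by the acked packet from all other packets."""
    acked = packets.pop(acked_pn, [])
    for ranges in packets.values():
        for (sid, a, b) in acked:
            # Drop any range fully covered by an acked range; a full
            # implementation must also trim partial overlaps.
            ranges[:] = [(s, x, y) for (s, x, y) in ranges
                         if not (s == sid and x >= a and y <= b)]
    return acked
```

Even this simplified version is quadratic in the number of outstanding packets times ranges, and the partial-overlap splitting it omits is where the real complexity lives.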
So we're at a point where the optimal wire performance -- never throw it away until all components are acked, allow mix-and-match of retransmissions, don't resend stuff if acked in some other packet -- is close to unimplementable. Some implementation guidance about what it's OK to not worry about, and perhaps some judicious prohibitions on particularly gnarly stuff, might help some.