-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FORWARD-TSN for unreceived chunk results in ABORT #597
Comments
By looking at the |
Thanks Michael. To help you pinpoint the issue, I have created a minimal reproducing sample, and learned some things while doing it. I'm essentially trying to find out if there is any way I can ensure that the packets I sent out will not provoke this issue. It's not a super-common issue, but if I can avoid it (as upgrading all clients will take some time), I'd prefer to do that.
For this to trigger, the second packet (with a TSN much larger than what was previously seen) has to have at least two DATA chunks in it, with the first having the END bit set, and the last having the BEGINNING bit set. Any other combinations, and the ABORT will not trigger, and a normal SACK will be sent instead. |
Hi again, I'm wondering if you have any idea if it's possible to work around the problem - to ensure that usrsctp doesn't abort? With the packetdrill and the sequence above, I tried to always send FORWARD-TSN as the only chunk in its packet, as bundling it with a DATA chunk seemed to reproduce the issue more frequently, but when that workaround was evaluated in production, it still happened. With the popularity of usrsctp as SCTP library in most browsers - and which will remain so for a long time, I really need to consider working around the issue even if it would be corrected. Thanks! |
Let me see if I can figure out what the bug is. Then it might be able to give a way to work around it. I'm a bit surprised by the plan to work around the bug, even if it is fixed. I'm not familiar with the browser world, but when talking at the beginning about the usrsctp integration into browsers, the statement was that not much effort needs to go into testing, since the browser vendors can roll out bug fixes world wide very fast. So has this changed and code stays much longer in use? |
While it is possible to release emergency fixes quickly or use our experiment framework to disable codepaths, I don't think that a fix for this issue would qualify, so it would have to go through the regular pipeline which takes a few months at least between the time it takes to merge the fix to the fix being promoted to stable. Some other browsers are also only updated with OS releases, which might be postponed indefinitely by the users. There are also a lot of users with fixed older versions (typically enterprise users) and other releases integrating Chromium (such as Electron or CEF) that might take a while to be updated. You also have a lot more users using SaaS products based on older versions of libwebrtc (and thus usrsctp) trying to communicate with all kinds of browser versions. Hence, a workaround is probably needed in dcSCTP and usrsctp might benefit from one as well. |
Thanks Michael - looking forward to your findings. Let me know if I can be of any assistance. Adding to what @Orphis said, while Chrome and Firefox on the consumer desktop are often up-to-date, mobile browsers and also mobile apps that use usrsctp, that are not strictly browsers. also have a lower uptake of new versions. Some mobile phones simply don't have enough storage to update, or are stuck on old OS versions that the new versions of the apps are incompatible with and will never be updated. Firefox says that it takes 4 weeks to reach 70% with the new version, and I don't know if the remaining 30% are ever updated... |
Thanks for the explanation! |
Will report here by the end of this week.
Thanks for the data point. |
Just for reference, and perhaps it helps during troubleshooting, this is a packetdrill that reproduced the issue with FORWARD-TSN in its own packet, not bundled with any DATA. Another difference here (from previous example), is that there is only a single fragmented message in-flight, and the FORWARD-TSN is not moved beyond it - just skipping a few unfragmented messages that were sent prior.
|
Here is a minimal (at least according to my current thinking) reproducer of the issue:
Still contemplating about the best way to fix this... |
Thanks for the update, Michael. I agree - it's very similar to my previous comment which was an actual capture "from the wild", but even shorter. |
Hi again, Just wondering if you have made any progress on this issue? Thanks, and have a great weekend! |
I'm interrupted by other bugs. Need to look into more detail. Planned for next week. |
Thanks, Michael. |
Hi Michael, I'm just checking in to see if you have had time to look more into this. Thanks! |
Haven't nailed down it yet. The semester just started and the IETF meeting is next week... Will look into it. Best regards
|
This proved to be not very efficient unfortunately, so revert it and keep bundling FORWARD-TSN with other packets to be more efficient. sctplab/usrsctp#597 is still unresolved. Note that this is not a clean revert; The logic to rate limit the sending of FORWARD-TSN is kept, as it still makes sense. This partly reverts commit 0ca62e3. Bug: webrtc:12961 Change-Id: I42728434290e7ece19e9c23f24ef6f3d3b171315 Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/259520 Reviewed-by: Harald Alvestrand <hta@webrtc.org> Commit-Queue: Victor Boivie <boivie@webrtc.org> Cr-Commit-Position: refs/heads/main@{#36584}
Upstream commit: https://webrtc.googlesource.com/src/+/ddc2f334c49c5287dae7ade273a530f4817ee0a6 Revert "dcsctp: Avoid bundling FORWARD-TSN and DATA chunks" This proved to be not very efficient unfortunately, so revert it and keep bundling FORWARD-TSN with other packets to be more efficient. sctplab/usrsctp#597 is still unresolved. Note that this is not a clean revert; The logic to rate limit the sending of FORWARD-TSN is kept, as it still makes sense. This partly reverts commit 0ca62e3752149ad37f73bf074db0a5f8fcaf6585. Bug: webrtc:12961 Change-Id: I42728434290e7ece19e9c23f24ef6f3d3b171315 Reviewed-on: https://webrtc-review.googlesource.com/c/src/+/259520 Reviewed-by: Harald Alvestrand <hta@webrtc.org> Commit-Queue: Victor Boivie <boivie@webrtc.org> Cr-Commit-Position: refs/heads/main@{#36584}
@tuexen Any chance of restarting work on this? |
dcSCTP seems to be able to provoke usrsctp to send ABORT in some situations, as described in sctplab/usrsctp#597. Using a packetdrill script, it seems as a contributing factor to this behavior is when a FORWARD-TSN chunk is bundled with a DATA chunk. This is a valid and recommended pattern in RFC3758: "F2) The data sender SHOULD always attempt to bundle an outgoing FORWARD TSN with outbound DATA chunks for efficiency." However, that seems to be a rare event in usrsctp, which generally sends each FORWARD-TSN in a separate packet. By doing the same, the assumption is that this scenario will generally be avoided. With many browsers and other clients using usrsctp, and which will not be upgraded for a long time, there is an advantage of avoiding the issue even if it is according to specification. Before this change, a FORWARD-TSN was bundled with outgoing DATA and due to this, it wasn't rate limited as the overhead was very little. With this change, a rate limiting behavior has been added to avoid sending too many FORWARD-TSN in small packets. It will be sent every RTT, or 200 ms, whichever is smallest. This is also described in the RFC. Bug: webrtc:12961 Change-Id: I3d8036a34f999f405958982534bfa0e99e330da3
This proved to be not very efficient unfortunately, so revert it and keep bundling FORWARD-TSN with other packets to be more efficient. sctplab/usrsctp#597 is still unresolved. Note that this is not a clean revert; The logic to rate limit the sending of FORWARD-TSN is kept, as it still makes sense. This partly reverts commit 0ca62e3. Bug: webrtc:12961 Change-Id: I42728434290e7ece19e9c23f24ef6f3d3b171315
This proved to be not very efficient unfortunately, so revert it and keep bundling FORWARD-TSN with other packets to be more efficient. sctplab/usrsctp#597 is still unresolved. This reverts commit 0ca62e3. Bug: Change-Id: I42728434290e7ece19e9c23f24ef6f3d3b171315
This is similar to issue #592, which was found to be a dcSCTP issue. In this case, I don't exactly know why usrsctp sends an ABORT. I have a packetdrill script that reproduces it (I haven't rebuilt the FreeBSD kernel in 21 years, so that was fun - and it doesn't seem to have changed a bit!)
So: The scenario is as follows:
Peer A is sending a lot of data. Some of these packets are lost (representing TSN ranges [284, 297], and some of these packets had a limited retransmission timeout, which expired before they could be retransmitted. So when the chunks were supposed to be resent, a FORWARD-TSN was sent to "skip" some of these chunks, while also bundling some chunks that hasn't yet expired but that were scheduled for retransmission.
The packetdrill script:
0x120 is 288, which is an end chunk, but the receiver doesn't even know that, as it has never seen that chunk.
I have the full PCAP of the sender side, if it would be of any help. And I have the most highest regard of usrsctp when it comes to protocol conformity, so I look forward to learning where my understanding of RFC3758 is lacking.
Thanks,
Victor
The text was updated successfully, but these errors were encountered: