New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible bug in RDP TX timeout #109
Comments
Thank you for the report. I will look into this before the 1.5 release. |
Hi again, sorry for the wait. I believe this is not a bug, but might have an explanation for your issue: The TX task is stopped at the point when the unacknowledged packet number equals the window size. And the TX task is woken again when the number is half. What needs to happen in order to break out of the wait, is simply to transmit the packets and receive an ACK from the counterpart. However, that's where i have found a related issue. Maybe your issue could be related to window size and queue lengths. But first a bit of reasoning: So we have a similar check in RX mode, where an ACK would only be sent if there is more than half the buffer available. Setting the queue length to 10 and the window size to 4 works fine. But increasing the window size to 5 now gives this problem where the receiver will not automatically ack because there is never room for half the window size or more. Consequently we can do two things. Either way, we should do a check on the RDP options in order to ensure the parameters match the queue length limitations. I am working on a patch that will be released in v1.5 to solve this. |
I think it makes sense, to have the same check "for waking Tx", as used in csp_rdp_send() - seems like a simple fix and why wake up the Tx task if its not allowed to send more messages. |
It's a balance whether to wake the TX task for each available slot, or when the TX queue is at half full. In the event that delayed acks is not used. The first case would lead to more context switches and overhead. Whereas the second (and current) choice could lead to a TX buffer underflow in the case where the wakeup period is longer than it takes to ack half a window_size. If delayed acks are used, where we only ack for > window_size/2, once the TX queue is flushed, there will most likely be more than enough room for window_size/2. So it will not make any difference. What would really make a difference was to put the check into csp_rdp_new_packet. We might as well wake the TX task when an ACK is received right away, rather than just setting the ack number, and waiting for the timeout_check function. |
When deciding wether to ack a packet, we previously would consider the RX queue as available if there was enough room for window_size * 2 packets. However with reduced conn_queue_length (the default setting is now 10 in our setup), this would reduce the maximum allowed window size to 4. Considering the fact that it is wastefull to reserve this much buffer space, we have reduced the number to only a single time the window_size. This would allow the window size to approach the conn_queue_length. The user should be mindfull that using a window_size of 10, with a queue_length of 10 could result in dropped packets, since some packets could be already in the air. So setting window size a bit below is always recommended. This is partially in response to issue libcsp#109
Just read this through again and would like to add a few remarks:
|
When deciding wether to ack a packet, we previously would consider the RX queue as available if there was enough room for window_size * 2 packets. However with reduced conn_queue_length (the default setting is now 10 in our setup), this would reduce the maximum allowed window size to 4. Considering the fact that it is wastefull to reserve this much buffer space, we have reduced the number to only a single time the window_size. This would allow the window size to approach the conn_queue_length. The user should be mindfull that using a window_size of 10, with a queue_length of 10 could result in dropped packets, since some packets could be already in the air. So setting window size a bit below is always recommended. This is partially in response to issue libcsp#109
We have seen the problem Marikka refers to and its easily reproduced by a simple test, where task A sends messages to task B, and task B doesn't do any receiving. |
Aha, thanks for clarifying, now I see the issue. I was blinded by the similar issue with the RX queue length I had just worked around. But I see now that the check on the TX side should have been window size divided by two, not multiplied, to work as I had expected. The current timeout check is completely invalid. I agree on that. This is very very old issue. At least 7 or 8 years. The deadlock seems to have been added 2 years ago when it was changed from an if to a while statement in RDP send function. I agree with the fix. |
Just to add to the confusion. I just did a test like the one you mentioned. It seems i cannot reproduce the problem because of the check in the line above the TX check: libcsp/src/transport/csp_rdp.c Lines 563 to 566 in 6e51733
The check for queue_size < window_size, overrules the bad check in line 565. And therefore it stil works in my example. No deadlock here. I'm trying to work out how you are coming to the deadlock situation, for that to occur the queue must not be full, and snd_nxt must be greater than snd_una + window. The latter might be the case at the very beginning, where the snd_una might be initialized to zero and snd_nxt might be initialized to a random number. Let me check that out. libcsp/src/transport/csp_rdp.c Lines 627 to 630 in 6e51733
Edit: It seems that snd_nxt and snd_una are initialized correctly. So i cannot explain why your TX task would sleep, and why the timeout wakes it? I still agree that this should be cleaned up, and corrected. But i still cannot reproduce the deadlock. |
Sorry for not pushing the fix - will try and do so very soon. |
…ome cases, the connection would switch to CLOSED immediately. RDP: Fixed connection leak, if a RST segment was received on a closed connecton. RDP: Ensure connection is closed from both userspace and protocol, before closing completely (preventing undetermined behaviour). RDP: Added support for fast close of a connection (skipping the CLOSE_WAIT period), but only if both ends agree on close. RDP: Fixed issue "Possible bug in RDP TX timeout" (#109), see issue on github for further details.
- Ensure connection is kept in CLOSE_WAIT for a period of time. In some cases, the connection would switch to CLOSED immediately. (#138) - Fixed connection leak, if a RST segment was received on a closed connection. - Ensure connection is closed from both userspace and protocol, before closing completely (preventing undetermined behaviour). - Added support for fast close of a connection (skipping the CLOSE_WAIT period), but only if both ends agree on close. - Fixed issue "Possible bug in RDP TX timeout" (#109), see issue on github for further details.
This issue has been fixed in #138 . |
The conditions on these two lines seem contradictory:
libcsp/src/transport/csp_rdp.c
Line 939 in 17bbef3
libcsp/src/transport/csp_rdp.c
Line 563 in 17bbef3
In the condition within the
csp_rdp_send
function, the window size is not multiplied by two.In the condition within the
csp_rdp_check_timeouts
function, the window size is multiplied by two.The effect is that the program can end up in a loop where the timeout here never happens:
libcsp/src/transport/csp_rdp.c
Line 944 in 17bbef3
The text was updated successfully, but these errors were encountered: