Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tcp_output might not be necessary after tcp_write #77

Closed
vitotai opened this issue Jan 3, 2018 · 6 comments
Closed

tcp_output might not be necessary after tcp_write #77

vitotai opened this issue Jan 3, 2018 · 6 comments
Labels

Comments

@vitotai
Copy link

vitotai commented Jan 3, 2018

Description:
_tx_unacked_len is used in AsyncClient::_sent() and AsyncClient::send(). The logic seems to assume that tcp_write() is necessary to send the data.
However, the wiki (http://lwip.wikia.com/wiki/Raw/TCP) indicates that

tcp_write() merely enqueues TCP data for later transmission; it does not actually start transmitting. Nevertheless, when tcp_write() is called from within a recv callback as in this example, there is no need to call tcp_output() to start transmission of sent data (indeed, tcp_output() specifically declines to do anything if it is called from within the recv callback). When you have returned from the recv callback, the stack will automatically initiate sending of any data -- and the ACK for the remote client's preceding packet will be combined with the first outgoing data segment. If tcp_write() is called through some other path (perhaps as a result of some event outside of lwIP processing), it may be necessary to call tcp_output to initiate data transmission.

Scenario:
I am using AsyncEventSource of ESPAsycWebServer and sending some configuration information at the establishment of the connection, onConnect() of AsyncEventSource. The connection is broken because of ACK timeout according the logic described above.
Adding some printing, I found that tcp_sent() acknowledges data that is written by tcp_write() without calling tcp_output().
The scenario seems to be
Peer -> EventSrouce request -> onConnect -> send data by calling AsyncEventSourceMessage::send().

AsyncEventSourceMessage::send() seems to fail to call client->send(), so it fails to call tcp_output().
However, the data is sent by only tcp_write(), and ACKed by tcp_sent() later, before _tx_unacked_len can be added by AsyncClient::send().

The printout I got by adding printf() in ESPAsyncTCP.cpp:

add:_tx_unsent_len:0 , will_send:59 size:59
send:_tx_unacked_len:0 _tx_unsent_len:59
add:_tx_unsent_len:0 , will_send:438 size:438
add:_tx_unsent_len:438 , will_send:131 size:131
add:_tx_unsent_len:569 , will_send:50 size:50
add:_tx_unsent_len:619 , will_send:46 size:46
add:_tx_unsent_len:665 , will_send:139 size:139
_sent: _tx_unacked_len:59, len=863
Ack timeout!!!

My temporal workaround is calling AsyncClient::send() in main loop instead of onConnect(). It seems to work fine. However, it's just workaround, not solution.

@zhivko
Copy link

zhivko commented Nov 19, 2018

Is this also true for esp32? @vitotai have you patched your fork of asyncwebserver? I would like to try it (I am on esp32).

@vitotai
Copy link
Author

vitotai commented Nov 20, 2018

I did not modify ESPAsyncWebServer, I just don’t send data in onConnect().

@stale
Copy link

stale bot commented Sep 21, 2019

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 21, 2019
@stale
Copy link

stale bot commented Oct 5, 2019

[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.

@stale stale bot closed this as completed Oct 5, 2019
@philbowles
Copy link

Never been fixed, like so many others, sadly :( This one has just bitten me (again) on top of all ther other issues with SSE. The bug is actually in ESPAsyncTCP as you have found. There's another little "twist" to the bug: if you "overlap" send request, i.e send #2 before #1 has been acked - (which will happen if you send them very rapidly) then the libs inability to understand tcp_write() and tcp_output correct behaviour leads to a similar problem. The Caller (AsyncWebserver) does not get acked, and so cannot clear its queue which builds until the timeout cuts in. The heart of the problem is that #2 and #3 etc do not get their size added to tx_unacked_len (but they still get actually sent!). Thus when the ack containing maybe 3 or packets is received, the lib has only counted the first so when it then subtracts the actual ack value from what it thinks is unacked, it gets a negative result (massive +ve as its unsigned) either way its not zero, so it never calls the caller's onAck as it wrongly thinks there is still data outstanding. This is the same whether setNoDelay is true or false.

@zhivko
Copy link

zhivko commented Mar 5, 2021

This is great find @philbowles ! It would be nice if ou or@me-no-dev could provide patch for this (if this is possible)... Do ou have test scenario to reproduce the issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants