Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

httpd server stops working after a while (IDFGH-1594) #3851

Closed
fshamshirdar opened this issue Jul 30, 2019 · 48 comments
Closed

httpd server stops working after a while (IDFGH-1594) #3851

fshamshirdar opened this issue Jul 30, 2019 · 48 comments

Comments

@fshamshirdar
Copy link

Hi there,

I have developed some modules for my ESP32 device (WiFi softAP, HTTP server, WebSocket, spiffs for local storage and SPI to communicate with a SPI master). Now the issue with this system is that after a few minutes of running, the wifi or webserver stops working properly. By that I mean sometimes the webpage does not load with the following log message, or ping packet goes with "Destination Host Unreachable".

"W (206326) httpd_txrx: httpd_sock_err: error in send : 11"

This problem goes away completely when I disable the SPI module. Even though I applied a much higher priority to the httpd_config_t, this problem persists.

Any thoughts?

@github-actions github-actions bot changed the title httpd server stops working after a while httpd server stops working after a while (IDFGH-1594) Jul 30, 2019
@anurag-kar
Copy link
Contributor

@fshamshirdar Thanks for reporting this. We'll try reproducing this on our side. In the meanwhile, it will be helpful if you could provide the monitor log for this.

@fshamshirdar
Copy link
Author

Thanks for following up, there is actually not much log when it happens. Here is all I have got:

httpd_txrx: httpd_sock_err: error in send : 11
httpd_txrx: httpd_sock_err: error in send : 104
httpd_txrx: httpd_sock_err: error in recv : 104

@anurag-kar
Copy link
Contributor

@fshamshirdar From the looks of it I think this is happening due to Wi-Fi disconnection.

httpd_txrx: httpd_sock_err: error in send : 104
httpd_txrx: httpd_sock_err: error in recv : 104

errno ECONNRESET (104) would mean that the client disconnected from the server abruptly. That could be due to disconnection.

If your ESP32 is running in station mode, could you check if it is visible on the Wi-Fi network right after the error occurs? Also are there any Wi-Fi related messages in the monitor log?

@fshamshirdar
Copy link
Author

No, the WiFi is running on softAP. Error 104 happens since the way I test the system is to refresh the webpage multiple times (some sort of stress testing) and is expectable. What does error 11 means? that's the last error I get from HTTP right before it goes down.

I should mention that the Bluetooth module is running in the BLE mode with WiFi coexistence "balance".

@anurag-kar
Copy link
Contributor

errno 11 = EAGAIN / EWOULDBLOCK, (you can find all errnos here https://github.com/espressif/esp-idf/blob/master/components/newlib/include/sys/errno.h).

That could be happening if the send buffer is full and it times out waiting for space. Could you try increasing send_wait_timeout in httpd_config?

that's the last error I get from HTTP right before it goes down.

Even in case of error, the server should keep functioning. How do you test if it goes down or not?

Also, please enable debug log level on the esp_http_server component. That may provide some more insight.

@no1seman
Copy link

@anurag-kar,
i have the same problems. Here is 3 cases:

  1. Periodicaly server freezes:
D (103975) httpd: httpd_server: processing socket 56�[0m
D (103975) httpd_sess: httpd_sess_process: httpd_req_new�[0m
D (103975) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (103985) httpd_txrx: httpd_recv_with_opt: received length = 128�[0m
D (103995) httpd_parse: read_block: received HTTP request block size = 128�[0m
D (103995) httpd_parse: cb_url: message begin�[0m
D (104005) httpd_parse: cb_url: processing url = /index.css�[0m
D (104005) httpd_parse: verify_url: received URI = /index.css�[0m
D (104015) httpd_parse: cb_header_field: headers begin�[0m
D (104015) httpd_txrx: httpd_unrecv: length = 103�[0m
D (104025) httpd_parse: pause_parsing: paused�[0m
D (104025) httpd_parse: cb_header_field: processing field = Host�[0m
D (104035) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (104035) httpd_txrx: httpd_recv_with_opt: pending length = 103�[0m
D (104045) httpd_parse: read_block: received HTTP request block size = 103�[0m
D (104055) httpd_parse: continue_parsing: skip pre-parsed data of size = 5�[0m
D (104055) httpd_parse: continue_parsing: un-paused�[0m
D (104065) httpd_parse: cb_header_value: processing value = 192.168.3.222�[0m
D (104075) httpd_parse: cb_header_field: processing field = Connection�[0m
D (104075) httpd_parse: cb_header_value: processing value = keep-alive�[0m
D (104085) httpd_parse: cb_header_field: processing field = User-Agent�[0m
D (104095) httpd_parse: cb_header_value: processing value = Mozilla/5.0 (Linux; Android 9; SM-N950F) Apple�[0m
D (104105) httpd_parse: parse_block: parsed block size = 103�[0m
D (104105) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (104115) httpd_txrx: httpd_recv_with_opt: received length = 128�[0m
D (104115) httpd_parse: read_block: received HTTP request block size = 128�[0m
D (104125) httpd_parse: cb_header_value: processing value = WebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36�[0m
D (104135) httpd_parse: cb_header_field: processing field = Save-Data�[0m
D (104145) httpd_parse: cb_header_value: processing value = on�[0m
D (104145) httpd_parse: cb_header_field: processing field = Accept�[0m
D (104155) httpd_parse: cb_header_value: processing value = text/css,*/*;q=0.1�[0m
D (104165) httpd_parse: cb_header_field: processing field = Referer�[0m
D (104165) httpd_parse: parse_block: parsed block size = 231�[0m
D (104175) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (104185) httpd_txrx: httpd_recv_with_opt: received length = 128�[0m
D (104185) httpd_parse: read_block: received HTTP request block size = 128�[0m
D (104195) httpd_parse: cb_header_value: processing value = http://192.168.3.222/�[0m
D (104205) httpd_parse: cb_header_field: processing field = Accept-Encoding�[0m
D (104205) httpd_parse: cb_header_value: processing value = gzip, deflate�[0m
D (104215) httpd_parse: cb_header_field: processing field = Accept-Language�[0m
D (104225) httpd_parse: cb_header_value: processing value = ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,it;q=0.6,de;q=0.5,la�[0m
D (104235) httpd_parse: parse_block: parsed block size = 359�[0m
D (104235) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (104245) httpd_txrx: httpd_recv_with_opt: received length = 10�[0m
D (104255) httpd_parse: read_block: received HTTP request block size = 10�[0m
D (104255) httpd_parse: cb_header_value: processing value = ;q=0.4�[0m
D (104265) httpd_parse: cb_headers_complete: bytes read     = 396�[0m
D (104275) httpd_parse: cb_headers_complete: content length = 0�[0m
D (104275) httpd_parse: pause_parsing: paused�[0m
D (104285) httpd_parse: cb_no_body: message complete�[0m
D (104285) httpd_parse: httpd_parse_req: parsing complete�[0m
D (104295) httpd_uri: httpd_uri: request for /index.css with type 1�[0m
D (104295) httpd_uri: httpd_find_uri_handler: [0] = /�[0m
D (104305) httpd_uri: httpd_find_uri_handler: [1] = /login�[0m
D (104305) httpd_uri: httpd_find_uri_handler: [2] = /login�[0m
D (104315) httpd_uri: httpd_find_uri_handler: [3] = /setconfig�[0m
D (104315) httpd_uri: httpd_find_uri_handler: [4] = /getdata�[0m
D (104325) httpd_uri: httpd_find_uri_handler: [5] = /*�[0m
D (104335) HTTP: GET /index.css�[0m
D (104335) HTTP: Request has key: 'Accept-Encoding' - value: 'gzip, deflate'�[0m
D (104345) HTTP: allocating new session (id: 7) for remote: 192.168.3.192:33986, accept gzip: True�[0m
D (104345) HTTP: Get static file: index.css.gz�[0m
D (104355) httpd_txrx: httpd_resp_set_hdr: new header = Content-Encoding: gzip�[0m
D (104365) httpd_txrx: httpd_send_all: sent = 79�[0m
D (104365) httpd_txrx: httpd_send_all: sent = 16�[0m
D (104375) httpd_txrx: httpd_send_all: sent = 2�[0m
D (104375) httpd_txrx: httpd_send_all: sent = 4�[0m
D (104375) httpd_txrx: httpd_send_all: sent = 2�[0m
D (104385) httpd_txrx: httpd_send_all: sent = 2�[0m
D (108195) httpd_txrx: httpd_send_all: sent = 38099�[0m
D (108195) HTTP: Sending file 'index.css.gz' took 3850 ms�[0m
D (108195) HTTP: Processing file 'index.css.gz' took 3860 ms�[0m
D (108195) httpd_sess: httpd_sess_process: httpd_req_delete�[0m
D (108205) httpd_sess: httpd_sess_process: success�[0m
  1. Ыystematically I also got 'httpd_txrx: httpd_sock_err: error in send : 104':
D (122135) httpd: httpd_accept_conn: newfd = 55�[0m
D (122145) httpd_sess: httpd_sess_new: fd = 55�[0m
D (122145) httpd: httpd_accept_conn: complete�[0m
D (122155) httpd: httpd_server: doing select maxfd+1 = 59�[0m
D (122155) httpd: httpd_server: processing socket 55�[0m
D (122165) httpd_sess: httpd_sess_process: httpd_req_new�[0m
D (122165) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (122175) httpd_txrx: httpd_recv_with_opt: received length = 128�[0m
D (122175) httpd_parse: read_block: received HTTP request block size = 128�[0m
D (122185) httpd_parse: cb_url: message begin�[0m
D (122195) httpd_parse: cb_url: processing url = /index.js�[0m
D (122195) httpd_parse: verify_url: received URI = /index.js�[0m
D (122205) httpd_parse: cb_header_field: headers begin�[0m
D (122205) httpd_txrx: httpd_unrecv: length = 104�[0m
D (122215) httpd_parse: pause_parsing: paused�[0m
D (122215) httpd_parse: cb_header_field: processing field = Host�[0m
D (122225) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (122225) httpd_txrx: httpd_recv_with_opt: pending length = 104�[0m
D (122235) httpd_parse: read_block: received HTTP request block size = 104�[0m
D (122245) httpd_parse: continue_parsing: skip pre-parsed data of size = 5�[0m
D (122245) httpd_parse: continue_parsing: un-paused�[0m
D (122255) httpd_parse: cb_header_value: processing value = 192.168.3.222�[0m
D (122255) httpd_parse: cb_header_field: processing field = Connection�[0m
D (122265) httpd_parse: cb_header_value: processing value = keep-alive�[0m
D (122275) httpd_parse: cb_header_field: processing field = User-Agent�[0m
D (122275) httpd_parse: cb_header_value: processing value = Mozilla/5.0 (Linux; Android 9; SM-N950F) AppleW�[0m
D (122285) httpd_parse: parse_block: parsed block size = 104�[0m
D (122295) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (122305) httpd_txrx: httpd_recv_with_opt: received length = 128�[0m
D (122305) httpd_parse: read_block: received HTTP request block size = 128�[0m
D (122315) httpd_parse: cb_header_value: processing value = ebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.89 Mobile Safari/537.36�[0m
D (122325) httpd_parse: cb_header_field: processing field = Save-Data�[0m
D (122335) httpd_parse: cb_header_value: processing value = on�[0m
D (122335) httpd_parse: cb_header_field: processing field = Accept�[0m
D (122345) httpd_parse: cb_header_value: processing value = */*�[0m
D (122355) httpd_parse: cb_header_field: processing field = Referer�[0m
D (122355) httpd_parse: cb_header_value: processing value = http://192.168.3�[0m
D (122365) httpd_parse: parse_block: parsed block size = 232�[0m
D (122365) httpd_txrx: httpd_recv_with_opt: requested length = 128�[0m
D (122375) httpd_txrx: httpd_recv_with_opt: received length = 122�[0m
D (122385) httpd_parse: read_block: received HTTP request block size = 122�[0m
D (122385) httpd_parse: cb_header_value: processing value = .222/�[0m
D (122395) httpd_parse: cb_header_field: processing field = Accept-Encoding�[0m
D (122405) httpd_parse: cb_header_value: processing value = gzip, deflate�[0m
D (122405) httpd_parse: cb_header_field: processing field = Accept-Language�[0m
D (122415) httpd_parse: cb_header_value: processing value = ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,it;q=0.6,de;q=0.5,la;q=0.4�[0m
D (122425) httpd_parse: cb_headers_complete: bytes read     = 380�[0m
D (122435) httpd_parse: cb_headers_complete: content length = 0�[0m
D (122435) httpd_parse: pause_parsing: paused�[0m
D (122445) httpd_parse: cb_no_body: message complete�[0m
D (122445) httpd_parse: httpd_parse_req: parsing complete�[0m
D (122455) httpd_uri: httpd_uri: request for /index.js with type 1�[0m
D (122455) httpd_uri: httpd_find_uri_handler: [0] = /�[0m
D (122465) httpd_uri: httpd_find_uri_handler: [1] = /login�[0m
D (122475) httpd_uri: httpd_find_uri_handler: [2] = /login�[0m
D (122475) httpd_uri: httpd_find_uri_handler: [3] = /setconfig�[0m
D (122485) httpd_uri: httpd_find_uri_handler: [4] = /getdata�[0m
D (122485) httpd_uri: httpd_find_uri_handler: [5] = /*�[0m
D (122495) HTTP: GET /index.js�[0m
D (122495) HTTP: Request has key: 'Accept-Encoding' - value: 'gzip, deflate'�[0m
D (122505) HTTP: allocating new session (id: 10) for remote: 192.168.3.192:33992, accept gzip: True�[0m
D (122515) HTTP: Get static file: index.js.gz�[0m
D (122515) httpd_txrx: httpd_resp_set_hdr: new header = Content-Encoding: gzip�[0m
D (122525) httpd_txrx: httpd_send_all: sent = 93�[0m
D (122525) httpd_txrx: httpd_send_all: sent = 16�[0m
D (122535) httpd_txrx: httpd_send_all: sent = 2�[0m
D (122535) httpd_txrx: httpd_send_all: sent = 4�[0m
D (122545) httpd_txrx: httpd_send_all: sent = 2�[0m
D (122545) httpd_txrx: httpd_send_all: sent = 2�[0m
�[0;33mW (122615) httpd_txrx: httpd_sock_err: error in send : 104�[0m
D (122615) httpd_txrx: httpd_send_all: error in send_fn�[0m
D (122615) HTTP: Sending file 'index.js.gz' took 100 ms�[0m
D (122615) HTTP: Processing file 'index.js.gz' took 120 ms�[0m
�[0;33mW (122625) httpd_uri: httpd_uri: uri handler execution failed�[0m
D (122635) httpd: httpd_server: closing socket 55�[0m
D (122635) httpd_sess: httpd_sess_delete: fd = 55�[0m
D (122645) HTTP: Deleting session (id: 10) for remote: 192.168.3.192:33992�[0m 
  1. httpd_parse: cb_headers_complete: incomplete termination of headers
D (179955) httpd: httpd_server: processing socket 53
D (179955) httpd_sess: httpd_sess_process: httpd_req_new
D (179955) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (179965) httpd_txrx: httpd_recv_with_opt: received length = 128
D (179965) httpd_parse: read_block: received HTTP request block size = 128
D (179975) httpd_parse: cb_url: message begin
D (179985) httpd_parse: cb_url: processing url = /getdata?res=stats
D (179985) httpd_parse: verify_url: received URI = /getdata?res=stats
D (179995) httpd_parse: cb_header_field: headers begin
D (179995) httpd_txrx: httpd_unrecv: length = 95
D (180005) httpd_parse: pause_parsing: paused
D (180005) httpd_parse: cb_header_field: processing field = Host
D (180015) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (180025) httpd_txrx: httpd_recv_with_opt: pending length = 95
D (180025) httpd_parse: read_block: received HTTP request block size = 95
D (180035) httpd_parse: continue_parsing: skip pre-parsed data of size = 5
D (180045) httpd_parse: continue_parsing: un-paused
D (180045) httpd_parse: cb_header_value: processing value = 192.168.3.222
D (180055) httpd_parse: cb_header_field: processing field = Connection
D (180055) httpd_parse: cb_header_value: processing value = keep-alive
D (180065) httpd_parse: cb_header_field: processing field = User-Agent
D (180075) httpd_parse: cb_header_value: processing value = Mozilla/5.0 (Windows NT 10.0; Win64; x
D (180085) httpd_parse: parse_block: parsed block size = 95
D (180085) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (180095) httpd_txrx: httpd_recv_with_opt: received length = 128
D (180095) httpd_parse: read_block: received HTTP request block size = 128
D (180105) httpd_parse: cb_header_value: processing value = 64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
D (180115) httpd_parse: cb_header_field: processing field = Accept
D (180125) httpd_parse: cb_header_value: processing value = */*
D (180125) httpd_parse: cb_header_field: processing field = x-compress
D (180135) httpd_parse: cb_header_value: processing value = null
D (180145) httpd_parse: cb_header_field: processing field = Referer
D (180145) httpd_parse: cb_header_value: processing value = http://19
D (180155) httpd_parse: parse_block: parsed block size = 223
D (180165) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (180165) httpd_txrx: httpd_recv_with_opt: received length = 128
D (180175) httpd_parse: read_block: received HTTP request block size = 128
D (180185) httpd_parse: cb_header_value: processing value = 2.168.3.222/
D (180185) httpd_parse: cb_header_field: processing field = Accept-Encoding
D (180195) httpd_parse: cb_header_value: processing value = gzip, deflate
D (180205) httpd_parse: cb_header_field: processing field = Accept-Language
D (180205) httpd_parse: cb_header_value: processing value = ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,it;q=0.6,de;q=0.5,la;q=0.4
D (180215) httpd_parse: parse_block: parsed block size = 351
D (180225) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (180235) httpd_txrx: httpd_recv_with_opt: received length = 1
D (180235) httpd_parse: read_block: received HTTP request block size = 1
E (180245) httpd_parse: cb_headers_complete: incomplete termination of headers
W (180255) httpd_parse: parse_block: parsing failed
W (180255) httpd_txrx: httpd_resp_send_err: 400 Bad Request - Server unable to understand request due to invalid syntax
D (180265) httpd_txrx: httpd_send_all: sent = 71
D (180275) httpd_txrx: httpd_send_all: sent = 2
D (180275) httpd_txrx: httpd_send_all: sent = 57
D (180285) httpd: httpd_server: closing socket 53
D (180285) httpd_sess: httpd_sess_delete: fd = 53
D (180295) HTTP: Deleting session (id: 3) for remote: 192.168.3.155:50056

Seems to me that these errors has a common source

Description:
When 'delays' or '104 error' occurs, the CPU core load of the 'httpd' task increases dramatically up to 99%.
'Delays' and '104 error' more often happens on devices connected to network by WiFi.
During 'delay' http_server doesn't respond any other requests.
Third eaxample is very strange because chrome cannot send right headers in one case and bad ne in another. It's my working laptop and many-many-many URLs opens without any problems.

Here is full logs:

beacon.zip - 1st,2nd case
beacon1.zip - 3rd case

http-server settings:
http_srv_config->max_open_sockets = 10;
http_srv_config->max_uri_handlers = 8;
http_srv_config->max_resp_headers = 8;
http_srv_config->backlog_conn = 10;
http_srv_config->lru_purge_enable = true;
http_srv_config->recv_wait_timeout = 20;
http_srv_config->send_wait_timeout = 20;
http_srv_config->global_user_ctx = NULL;
http_srv_config->global_user_ctx_free_fn = NULL;
http_srv_config->global_transport_ctx = NULL;
http_srv_config->global_transport_ctx_free_fn = NULL;
http_srv_config->open_fn = NULL;
http_srv_config->close_fn = NULL;
http_srv_config->stack_size = HTTP_TASK_STACK_SIZE;
http_srv_config->server_port = 80;
http_srv_config->ctrl_port = 32768;
http_srv_config->uri_match_fn = httpd_uri_match_wildcard;
http_srv_config->core_id = 1;
http_srv_config->task_priority = 10;

Environment:

  • Development Kit: ESP32-Wrover-Kit
  • Kit version (for WroverKit/PicoKit/DevKitC): v4.1
  • Module or chip used: ESP32-WROVER-B
  • IDF version:v4.0-dev-1443-g39f090a4f
  • Build System: CMake
  • Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp32-2019r1) 8.2.0
  • Operating System: Linux
  • Power Supply: USB|external 5V&Battery
    192.168.3.222 - ESP32 connected via LAN8720 to router (192.168.3.1)
    192.168.3.155 - Win10 laptop connected by Ethernet to router (192.168.3.1)
    192.168.3.192 - Samsung Galaxy Note 8 connected to my network via WiFi to router (192.168.3.1)
    192.168.3.250 - Samsung Galaxy S8+ connected to my network via WiFi to router (192.168.3.1)

@anurag-kar
Copy link
Contributor

@no1seman Thanks for reporting.

The third case may have to do with parsing only, I'll check that.

The first case seems like the server is waiting to send an HTTP response. Since it runs on a single thread, it cannot start processing the next request until it finishes responding to the current request. Probably, the send buffer is full, so it keeps waiting till there is enough space for the response data. According to the log it is waiting for about 4 seconds :

D (104385) httpd_txrx: httpd_send_all: sent = 2
D (108195) httpd_txrx: httpd_send_all: sent = 38099

The second case is different from what @fshamshirdar is experiencing, because he expects ECONNRESET (104) from the nature of the tests he is performing. But in your case it seems the client is closing the connection unexpectedly, while receiving HTTP response.

D (122545) httpd_txrx: httpd_send_all: sent = 2
W (122615) httpd_txrx: httpd_sock_err: error in send : 104

When 'delays' or '104 error' occurs, the CPU core load of the 'httpd' task increases dramatically up to 99%.

That is expected, because send() is blocking.

@no1seman
Copy link

no1seman commented Aug 12, 2019

@anurag-kar,
"The first case seems like the server is waiting to send an HTTP response." - it is obvious but strange, because as you can see in log esp32 should send only 36Kbytes of data, all data is cached in esp32 external memory and WiFi is working without any problems with other resources. So there is some dalay in http_server or LwIP or WiFi stack that causes such a delay. Can't understand what to do with that.

"But in your case it seems the client is closing the connection unexpectedly, while receiving HTTP response" - yes, you are right, I've got the following error in Chome:
Uncaught (in promise) ChunkLoadError: Loading chunk 1 failed.
(error: http://192.168.3.222/1.index.js)
at Function.f.e (http://192.168.3.222/index.js:1:1471)
at HTMLAnchorElement.x (http://192.168.3.222/index.js:1:114506)

Chrome coundn't load frist chunk and canceled connection, but the question is: why esp failed correctly send that chunk? 100% the is on the esp side.

As I have mentioned in previous post, I have 2 interfaces on esp: ethernet and WiFi STA. On Ethernet such an error occurs rarely, but if I try to connect esp32 http-server via esp32 WiFi STA interface I got these errors:

W (35212) httpd_txrx: httpd_sock_err: error in send : 104
W (35212) httpd_uri: httpd_uri: uri handler execution failed
W (605372) httpd_txrx: httpd_sock_err: error in recv : 104
W (605382) httpd_txrx: httpd_sock_err: error in recv : 104
W (605382) httpd_txrx: httpd_sock_err: error in recv : 104

with a probability of 50%.

Here is the log:

D (61243) httpd: httpd_server: processing socket 55
D (61243) httpd_sess: httpd_sess_process: httpd_req_new
D (61243) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (61253) httpd_txrx: httpd_recv_with_opt: received length = 128
D (61253) httpd_parse: read_block: received HTTP request block size = 128
D (61263) httpd_parse: cb_url: message begin
D (61263) httpd_parse: cb_url: processing url = /index.css
D (61273) httpd_parse: verify_url: received URI = /index.css
D (61273) httpd_parse: cb_header_field: headers begin
D (61283) httpd_txrx: httpd_unrecv: length = 103
D (61283) httpd_parse: pause_parsing: paused
D (61293) httpd_parse: cb_header_field: processing field = Host
D (61293) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (61303) httpd_txrx: httpd_recv_with_opt: pending length = 103
D (61313) httpd_parse: read_block: received HTTP request block size = 103
D (61313) httpd_parse: continue_parsing: skip pre-parsed data of size = 5
D (61323) httpd_parse: continue_parsing: un-paused
D (61333) httpd_parse: cb_header_value: processing value = 192.168.3.130
D (61333) httpd_parse: cb_header_field: processing field = Connection
D (61343) httpd_parse: cb_header_value: processing value = keep-alive
D (61353) httpd_parse: cb_header_field: processing field = Pragma
D (61353) httpd_parse: cb_header_value: processing value = no-cache
D (61363) httpd_parse: cb_header_field: processing field = Cache-Control
D (61363) httpd_parse: cb_header_value: processing value = no-cache
D (61373) httpd_parse: cb_header_field: processing field = User-Agent
D (61383) httpd_parse: cb_header_value: processing value = Moz
D (61383) httpd_parse: parse_block: parsed block size = 103
D (61393) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (61393) httpd_txrx: httpd_recv_with_opt: received length = 128
D (61403) httpd_parse: read_block: received HTTP request block size = 128
D (61413) httpd_parse: cb_header_value: processing value = illa/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
D (61423) httpd_parse: cb_header_field: processing field = Accept
D (61433) httpd_parse: cb_header_value: processing value = text/c
D (61443) httpd_parse: parse_block: parsed block size = 231
D (61443) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (61453) httpd_txrx: httpd_recv_with_opt: received length = 128
D (61453) httpd_parse: read_block: received HTTP request block size = 128
D (61463) httpd_parse: cb_header_value: processing value = ss,*/*;q=0.1
D (61473) httpd_parse: cb_header_field: processing field = x-compress
D (61473) httpd_parse: cb_header_value: processing value = null
D (61483) httpd_parse: cb_header_field: processing field = Referer
D (61483) httpd_parse: cb_header_value: processing value = http://192.168.3.130/
D (61493) httpd_parse: cb_header_field: processing field = Accept-Encoding
D (61503) httpd_parse: cb_header_value: processing value = gzip, deflate
D (61513) httpd_parse: cb_header_field: processing field = Accept-Language
D (61513) httpd_parse: cb_header_value: processing value = ru-RU,ru;q=0.9,
D (61523) httpd_parse: parse_block: parsed block size = 359
D (61533) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (61533) httpd_txrx: httpd_recv_with_opt: received length = 51
D (61543) httpd_parse: read_block: received HTTP request block size = 51
D (61543) httpd_parse: cb_header_value: processing value = en-US;q=0.8,en;q=0.7,it;q=0.6,de;q=0.5,la;q=0.4
D (61553) httpd_parse: cb_headers_complete: bytes read     = 438
D (61563) httpd_parse: cb_headers_complete: content length = 0
D (61573) httpd_parse: pause_parsing: paused
D (61573) httpd_parse: cb_no_body: message complete
D (61573) httpd_parse: httpd_parse_req: parsing complete
D (61583) httpd_uri: httpd_uri: request for /index.css with type 1
D (61593) httpd_uri: httpd_find_uri_handler: [0] = /
D (61593) httpd_uri: httpd_find_uri_handler: [1] = /login
D (61603) httpd_uri: httpd_find_uri_handler: [2] = /login
D (61603) httpd_uri: httpd_find_uri_handler: [3] = /setconfig
D (61613) httpd_uri: httpd_find_uri_handler: [4] = /getdata
D (61613) httpd_uri: httpd_find_uri_handler: [5] = /*
D (61623) HTTP: GET /index.css
D (61623) HTTP: session (id: 3) already allocated for remote: 192.168.3.155:57769, accept gzip: True
D (61633) HTTP: Get static file: index.css.gz
D (61643) httpd_txrx: httpd_resp_set_hdr: new header = Content-Encoding: gzip
D (61643) httpd_txrx: httpd_send_all: sent = 79
D (61653) httpd_txrx: httpd_send_all: sent = 16
D (61653) httpd_txrx: httpd_send_all: sent = 2
D (61663) httpd_txrx: httpd_send_all: sent = 4
D (61663) httpd_txrx: httpd_send_all: sent = 2
D (61673) httpd_txrx: httpd_send_all: sent = 2
W (61753) httpd_txrx: httpd_sock_err: error in send : 104
D (61753) httpd_txrx: httpd_send_all: error in send_fn
D (61753) HTTP: Sending file 'index.css.gz' failed
D (61753) HTTP: Processing request 'index.css.gz' took 130 ms
W (61763) httpd_uri: httpd_uri: uri handler execution failed
D (61763) httpd: httpd_server: closing socket 55
D (61773) httpd_sess: httpd_sess_delete: fd = 55
D (61773) HTTP: Deleting session (id: 3) for remote: 192.168.3.155:57769

full log:
beacon2.zip

So, It comes out that esp connected to WiFi router via WiFi to my laptop connected to the same router via 1Gb/s Ethernet (all of them: router, esp & laptop is on my working table) can not with high probability send or recieve one of chunk of data via WiFi and that hangs an error. It's very strange.

Turnig on LwIP debug:

D (20175) httpd: httpd_server: processing socket 53
D (20175) httpd_sess: httpd_sess_process: httpd_req_new
D (20175) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84b510, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x0
lwip_recv_tcp: netconn_recv err=0, pbuf=0x3f84db30
lwip_recv_tcp: buflen=422 recv_left=128 off=0
lwip_recv_tcp: lastdata now pbuf=0x3f84db30
lwip_recvfrom(53):  addr=192.168.3.155 port=58616 len=128
D (20205) httpd_txrx: httpd_recv_with_opt: received length = 128
D (20215) httpd_parse: read_block: received HTTP request block size = 128
D (20225) httpd_parse: cb_url: message begin
D (20225) httpd_parse: cb_url: processing url = /schema.json
D (20235) httpd_parse: verify_url: received URI = /schema.json
D (20235) httpd_parse: cb_header_field: headers begin
D (20245) httpd_txrx: httpd_unrecv: length = 101
D (20245) httpd_parse: pause_parsing: paused
D (20255) httpd_parse: cb_header_field: processing field = Host
D (20255) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (20265) httpd_txrx: httpd_recv_with_opt: pending length = 101
D (20265) httpd_parse: read_block: received HTTP request block size = 101
D (20275) httpd_parse: continue_parsing: skip pre-parsed data of size = 5
D (20285) httpd_parse: continue_parsing: un-paused
D (20285) httpd_parse: cb_header_value: processing value = 192.168.3.130
D (20295) httpd_parse: cb_header_field: processing field = Connection
D (20305) httpd_parse: cb_header_value: processing value = keep-alive
D (20305) httpd_parse: cb_header_field: processing field = Pragma
D (20315) httpd_parse: cb_header_value: processing value = no-cache
D (20325) httpd_parse: cb_header_field: processing field = Cache-Control
D (20325) httpd_parse: cb_header_value: processing value = no-cache
D (20335) httpd_parse: cb_header_field: processing field = User-Agent
D (20335) httpd_parse: cb_header_value: processing value = M
D (20345) httpd_parse: parse_block: parsed block size = 101
D (20355) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84b575, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3f84db30
lwip_recv_tcp: buflen=294 recv_left=128 off=0
lwip_recv_tcp: lastdata now pbuf=0x3f84db30
lwip_recvfrom(53):  addr=192.168.3.155 port=58616 len=128
D (20375) httpd_txrx: httpd_recv_with_opt: received length = 128
D (20385) httpd_parse: read_block: received HTTP request block size = 128
D (20395) httpd_parse: cb_header_value: processing value = ozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
D (20405) httpd_parse: cb_header_field: processing field = Accept
D (20415) httpd_parse: cb_header_value: processing value = */*
D (20415) httpd_parse: parse_block: parsed block size = 229
D (20425) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84b5f5, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3f84db30
lwip_recv_tcp: buflen=166 recv_left=128 off=0
lwip_recv_tcp: lastdata now pbuf=0x3f84db30
lwip_recvfrom(53):  addr=192.168.3.155 port=58616 len=128
D (20455) httpd_txrx: httpd_recv_with_opt: received length = 128
D (20455) httpd_parse: read_block: received HTTP request block size = 128
D (20465) httpd_parse: cb_header_field: processing field = x-compress
D (20475) httpd_parse: cb_header_value: processing value = null
D (20475) httpd_parse: cb_header_field: processing field = Referer
D (20485) httpd_parse: cb_header_value: processing value = http://192.168.3.130/
D (20495) httpd_parse: cb_header_field: processing field = Accept-Encoding
D (20495) httpd_parse: cb_header_value: processing value = gzip, deflate
D (20505) httpd_parse: cb_header_field: processing field = Accept-Language
D (20515) httpd_parse: cb_header_value: processing value = ru-RU,ru;q=0.9,en-US;q=0.8,e
D (20515) httpd_parse: parse_block: parsed block size = 357
D (20525) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84b675, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3f84db30
lwip_recv_tcp: buflen=38 recv_left=128 off=0
lwip_recv_tcp: deleting pbuf=0x3f84db30
lwip_recv_tcp: top while sock->lastdata=0x0
lwip_recv_tcp: netconn_recv err=-7, pbuf=0x0
lwip_recvfrom(53):  addr=192.168.3.155 port=58616 len=38
D (20555) httpd_txrx: httpd_recv_with_opt: received length = 38
D (20565) httpd_parse: read_block: received HTTP request block size = 38
D (20575) httpd_parse: cb_header_value: processing value = n;q=0.7,it;q=0.6,de;q=0.5,la;q=0.4
D (20585) httpd_parse: cb_headers_complete: bytes read     = 423
D (20585) httpd_parse: cb_headers_complete: content length = 0
D (20595) httpd_parse: pause_parsing: paused
D (20595) httpd_parse: cb_no_body: message complete
D (20605) httpd_parse: httpd_parse_req: parsing complete
D (20605) httpd_uri: httpd_uri: request for /schema.json with type 1
D (20615) httpd_uri: httpd_find_uri_handler: [0] = /
D (20615) httpd_uri: httpd_find_uri_handler: [1] = /login
D (20625) httpd_uri: httpd_find_uri_handler: [2] = /login
D (20625) httpd_uri: httpd_find_uri_handler: [3] = /setconfig
D (20635) httpd_uri: httpd_find_uri_handler: [4] = /getdata
D (20645) httpd_uri: httpd_find_uri_handler: [5] = /*
D (20645) HTTP: GET /schema.json
D (20645) HTTP: session (id: 1) already allocated for remote: 192.168.3.155:58616, accept gzip: True
D (20655) HTTP: Get static file: schema.json.gz
D (20665) httpd_txrx: httpd_resp_set_hdr: new header = Content-Encoding: gzip
lwip_send(53, data=0x3f84b510, size=71, flags=0x0)
lwip_send(53) err=0 written=71
D (20675) httpd_txrx: httpd_send_all: sent = 71
lwip_send(53, data=0x3f40c744, size=16, flags=0x0)
lwip_send(53) err=0 written=16
D (20685) httpd_txrx: httpd_send_all: sent = 16
lwip_send(53, data=0x3f423a88, size=2, flags=0x0)
lwip_send(53) err=0 written=2
D (20705) httpd_txrx: httpd_send_all: sent = 2
lwip_send(53, data=0x3f40c09c, size=4, flags=0x0)
lwip_send(53) err=0 written=4
D (20715) httpd_txrx: httpd_send_all: sent = 4
lwip_send(53, data=0x3f423a8c, size=2, flags=0x0)
lwip_send(53) err=0 written=2
D (20725) httpd_txrx: httpd_send_all: sent = 2
lwip_send(53, data=0x3f423a8c, size=2, flags=0x0)
lwip_send(53) err=0 written=2
D (20735) httpd_txrx: httpd_send_all: sent = 2
lwip_send(53, data=0x3f842274, size=4721, flags=0x0)
lwip_send(53) err=0 written=4721
D (20745) httpd_txrx: httpd_send_all: sent = 4721
D (20755) HTTP: Sending file 'schema.json.gz' took 100 ms
D (20755) HTTP: Processing request 'schema.json.gz' took 110 ms
D (20765) httpd_sess: httpd_sess_process: httpd_req_delete
D (20765) httpd_sess: httpd_sess_process: success
D (20775) httpd: httpd_server: doing select maxfd+1 = 55
lwip_select(55, 0x3fff1268, 0x0, 0x0, tvsec=-1 tvusec=-1)
lwip_selscan: fd=53 ready for reading
lwip_select: nready=1
D (20805) httpd: httpd_server: processing socket 53
D (20805) httpd_sess: httpd_sess_process: httpd_req_new
D (20805) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84b510, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x0
lwip_recv_tcp: netconn_recv err=-14, pbuf=0x0
lwip_recv_tcp: p == NULL, error is "Connection reset."!
lwip_recvfrom(53):  addr=0.0.0.0 port=16383 len=-1
W (20835) httpd_txrx: httpd_sock_err: error in recv : 104
D (20835) httpd_txrx: httpd_recv_with_opt: error in recv_fn
D (20845) httpd_parse: read_block: error in httpd_recv
D (20855) httpd: httpd_server: closing socket 53
lwip_close(53)
D (20855) httpd_sess: httpd_sess_delete: fd = 53
D (20865) HTTP: Deleting session (id: 1) for remote: 192.168.3.155:58616

in browser I've got: http://192.168.3.130/schema.json net::ERR_INVALID_HTTP_RESPONSE error
In that case you are right, client closed connection, but why server sent invalid http response?

If you have any ideas about that error?

@no1seman
Copy link

no1seman commented Aug 12, 2019

@anurag-kar,
regarding the 3rd case, i've got the following packet flow in wireshark:

11096	503.688958	192.168.3.155	192.168.3.130	HTTP	439	GET /getdata?res=stats HTTP/1.1 
11097	503.917774	192.168.3.155	192.168.3.130	TCP	439	[TCP Retransmission] 59092 → 80 [PSH, ACK] Seq=1 Ack=1 Win=64240 Len=385
11098	503.923940	192.168.3.130	192.168.3.155	TCP	60	80 → 59092 [ACK] Seq=1 Ack=386 Win=5359 Len=0
11099	503.925485	192.168.3.130	192.168.3.155	TCP	60	[TCP Dup ACK 11098#1] 80 → 59092 [ACK] Seq=1 Ack=386 Win=5359 Len=0
11100	504.158211	192.168.3.130	192.168.3.155	TCP	125	80 → 59092 [PSH, ACK] Seq=1 Ack=386 Win=5359 Len=71 [TCP segment of a reassembled PDU]
11102	504.187816	192.168.3.130	192.168.3.155	HTTP	113	HTTP/1.1 400 Bad Request  (text/html)

This error:

E (595635) httpd_parse: cb_headers_complete: incomplete termination of headers
W (595645) httpd_parse: parse_block: parsing failed
W (595655) httpd_txrx: httpd_resp_send_err: 400 Bad Request - Server unable to understand request due to invalid syntax

occurs after ACK retrasmission

@no1seman
Copy link

@anurag-kar,

finally catched: '104 error':
esp32 log:

D (22495) httpd: httpd_server: processing socket 53
D (22495) httpd_sess: httpd_sess_process: httpd_req_new
D (22495) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84bdcc, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x0
lwip_recv_tcp: netconn_recv err=0, pbuf=0x3f84b374
lwip_recv_tcp: buflen=392 recv_left=128 off=0
lwip_recv_tcp: lastdata now pbuf=0x3f84b374
lwip_recvfrom(53):  addr=192.168.3.155 port=59549 len=128
D (22525) httpd_txrx: httpd_recv_with_opt: received length = 128
D (22535) httpd_parse: read_block: received HTTP request block size = 128
D (22535) httpd_parse: cb_url: message begin
D (22545) httpd_parse: cb_url: processing url = /index.css
D (22545) httpd_parse: verify_url: received URI = /index.css
D (22555) httpd_parse: cb_header_field: headers begin
D (22555) httpd_txrx: httpd_unrecv: length = 103
D (22565) httpd_parse: pause_parsing: paused
D (22565) httpd_parse: cb_header_field: processing field = Host
D (22575) httpd_txrx: httpd_recv_with_opt: requested length = 128
D (22585) httpd_txrx: httpd_recv_with_opt: pending length = 103
D (22585) httpd_parse: read_block: received HTTP request block size = 103
D (22595) httpd_parse: continue_parsing: skip pre-parsed data of size = 5
D (22595) httpd_parse: continue_parsing: un-paused
D (22605) httpd_parse: cb_header_value: processing value = 192.168.3.130
D (22615) httpd_parse: cb_header_field: processing field = Connection
D (22615) httpd_parse: cb_header_value: processing value = keep-alive
D (22625) httpd_parse: cb_header_field: processing field = User-Agent
D (22635) httpd_parse: cb_header_value: processing value = Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl
D (22645) httpd_parse: parse_block: parsed block size = 103
D (22645) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84be33, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3f84b374
lwip_recv_tcp: buflen=264 recv_left=128 off=0
lwip_select(53, 0x0, 0x3ffccc80, 0x0, tvsec=10 tvusec=0)
lwip_selscan: fd=52 ready for writing
lwip_select: nready=1
lwip_send(52, data=0x3f847a2b, size=85, flags=0x0)
lwip_recv_tcp: lastdata now pbuf=0x3f84b374
lwip_send(52) err=0 written=85
lwip_selscan: fd=52 ready for reading
lwip_select: nready=1
lwip_select(53, 0x3ffbb260, 0x0, 0x0, tvsec=0 tvusec=0)
lwip_selscan: fd=52 ready for reading
lwip_select: nready=1
lwip_recvfrom(52, 0x3f847824, 1, 0x0, ..)
lwip_recvfrom(53):  addr=192.168.3.155 port=59549 len=128
lwip_recv_tcp: top while sock->lastdata=0x0
D (22715) httpd_txrx: httpd_recv_with_opt: received length = 128
lwip_recv_tcp: netconn_recv err=0, pbuf=0x3ffeec18
D (22725) httpd_parse: read_block: received HTTP request block size = 128
lwip_recv_tcp: buflen=4 recv_left=1 off=0
D (22735) httpd_parse: cb_header_value: processing value = eWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36
lwip_recv_tcp: lastdata now pbuf=0x3ffeec18
D (22755) httpd_parse: cb_header_field: processing field = Accept
lwip_recvfrom(52):  addr=192.168.3.211 port=1883 len=1
lwip_select(53, 0x3ffbb260, 0x0, 0x0, tvsec=0 tvusec=0)
D (22765) httpd_parse: cb_header_value: processing value = text/css,*/*;q=0.1
lwip_selscan: fd=52 ready for reading
D (22775) httpd_parse: cb_header_field: processing field = x-compress
lwip_select: nready=1
D (22785) httpd_parse: cb_header_value: processing value = null
lwip_recvfrom(52, 0x3f847825, 1, 0x0, ..)
D (22795) httpd_parse: cb_header_field: processing field = Referer
lwip_recv_tcp: top while sock->lastdata=0x3ffeec18
D (22805) httpd_parse: cb_header_value: processing value = ht
lwip_recv_tcp: buflen=3 recv_left=1 off=0
D (22815) httpd_parse: parse_block: parsed block size = 231
lwip_recv_tcp: lastdata now pbuf=0x3ffeec18
D (22825) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(52):  addr=192.168.3.211 port=1883 len=1
lwip_recvfrom(53, 0x3f84beb3, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3f84b374
lwip_select(53, 0x3ffbb260, 0x0, 0x0, tvsec=0 tvusec=0)
lwip_selscan: fd=52 ready for reading
lwip_select: nready=1
lwip_recv_tcp: buflen=136 recv_left=128 off=0
lwip_recv_tcp: lastdata now pbuf=0x3f84b374
lwip_recvfrom(52, 0x3f847826, 2, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3ffeec18
lwip_recv_tcp: buflen=2 recv_left=2 off=0
lwip_recv_tcp: deleting pbuf=0x3ffeec18
lwip_recvfrom(53):  addr=192.168.3.155 port=59549 len=128
lwip_recvfrom(52):  addr=192.168.3.211 port=1883 len=2
D (22885) httpd_txrx: httpd_recv_with_opt: received length = 128
lwip_select(53, 0x3ffbb320, 0x0, 0x0, tvsec=1 tvusec=0)
D (22895) httpd_parse: read_block: received HTTP request block size = 128
D (22905) httpd_parse: cb_header_value: processing value = tp://192.168.3.130/
D (22915) httpd_parse: cb_header_field: processing field = Accept-Encoding
D (22915) httpd_parse: cb_header_value: processing value = gzip, deflate
D (22925) httpd_parse: cb_header_field: processing field = Accept-Language
D (22935) httpd_parse: cb_header_value: processing value = ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,it;q=0.6,de;q=0.5,la;q
D (22945) httpd_parse: parse_block: parsed block size = 359
D (22945) httpd_txrx: httpd_recv_with_opt: requested length = 128
lwip_recvfrom(53, 0x3f84bf33, 128, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3f84b374
lwip_recv_tcp: buflen=8 recv_left=128 off=0
lwip_recv_tcp: deleting pbuf=0x3f84b374
lwip_recv_tcp: top while sock->lastdata=0x0
lwip_recv_tcp: netconn_recv err=-7, pbuf=0x0
lwip_recvfrom(53):  addr=192.168.3.155 port=59549 len=8
D (22985) httpd_txrx: httpd_recv_with_opt: received length = 8
D (22995) httpd_parse: read_block: received HTTP request block size = 8
D (22995) httpd_parse: cb_header_value: processing value = =0.4
D (23005) httpd_parse: cb_headers_complete: bytes read     = 395
D (23005) httpd_parse: cb_headers_complete: content length = 0
D (23015) httpd_parse: pause_parsing: paused
D (23015) httpd_parse: cb_no_body: message complete
D (23025) httpd_parse: httpd_parse_req: parsing complete
D (23025) httpd_uri: httpd_uri: request for /index.css with type 1
D (23035) httpd_uri: httpd_find_uri_handler: [0] = /
D (23045) httpd_uri: httpd_find_uri_handler: [1] = /login
D (23045) httpd_uri: httpd_find_uri_handler: [2] = /login
D (23055) httpd_uri: httpd_find_uri_handler: [3] = /setconfig
D (23055) httpd_uri: httpd_find_uri_handler: [4] = /getdata
D (23065) httpd_uri: httpd_find_uri_handler: [5] = /*
D (23065) HTTP: GET /index.css
D (23075) HTTP: session (id: 1) already allocated for remote: 192.168.3.155:59549, accept gzip: True
D (23085) HTTP: Get static file: index.css.gz
lwip_select(53, 0x0, 0x3ffccc80, 0x0, tvsec=10 tvusec=0)
lwip_selscan: fd=52 ready for writing
lwip_select: nready=1
lwip_send(52, data=0x3f847a2b, size=82, flags=0x0)
D (23095) httpd_txrx: httpd_resp_set_hdr: new header = Content-Encoding: gzip
lwip_send(52) err=0 written=82
lwip_selscan: fd=52 ready for reading
lwip_send(53, data=0x3f84bdcc, size=79, flags=0x0)
lwip_select: nready=1
lwip_select(53, 0x3ffbb260, 0x0, 0x0, tvsec=0 tvusec=0)
lwip_selscan: fd=52 ready for reading
lwip_select: nready=1
lwip_recvfrom(52, 0x3f847824, 1, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x0
lwip_send(53) err=0 written=79
D (23135) httpd_txrx: httpd_send_all: sent = 79
lwip_recv_tcp: netconn_recv err=0, pbuf=0x3fff2f40
lwip_recv_tcp: buflen=4 recv_left=1 off=0
lwip_recv_tcp: lastdata now pbuf=0x3fff2f40
lwip_send(53, data=0x3f40c744, size=16, flags=0x0)
lwip_recvfrom(52):  addr=192.168.3.211 port=1883 len=1
lwip_select(53, 0x3ffbb260, 0x0, 0x0, tvsec=0 tvusec=0)
lwip_selscan: fd=52 ready for reading
lwip_select: nready=1
lwip_send(53) err=0 written=16
lwip_recvfrom(52, 0x3f847825, 1, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3fff2f40
D (23185) httpd_txrx: httpd_send_all: sent = 16
lwip_recv_tcp: buflen=3 recv_left=1 off=0
lwip_send(53, data=0x3f423a88, size=2, flags=0x0)
lwip_recv_tcp: lastdata now pbuf=0x3fff2f40
lwip_send(53) err=0 written=2
lwip_recvfrom(52):  addr=192.168.3.211 port=1883 len=1
lwip_select(53, 0x3ffbb260, 0x0, 0x0, tvsec=0 tvusec=0)
D (23215) httpd_txrx: httpd_send_all: sent = 2
lwip_selscan: fd=52 ready for reading
lwip_select: nready=1
lwip_send(53, data=0x3f40c09c, size=4, flags=0x0)
lwip_recvfrom(52, 0x3f847826, 2, 0x0, ..)
lwip_recv_tcp: top while sock->lastdata=0x3fff2f40
lwip_send(53) err=0 written=4
lwip_recv_tcp: buflen=2 recv_left=2 off=0
D (23245) httpd_txrx: httpd_send_all: sent = 4
lwip_send(53, data=0x3f423a8c, size=2, flags=0x0)
lwip_recv_tcp: deleting pbuf=0x3fff2f40
lwip_send(53) err=0 written=2
lwip_recvfrom(52):  addr=192.168.3.211 port=1883 len=2
D (23265) httpd_txrx: httpd_send_all: sent = 2
lwip_send(53, data=0x3f423a8c, size=2, flags=0x0)
lwip_select(53, 0x3ffbb320, 0x0, 0x0, tvsec=1 tvusec=0)
lwip_send(53) err=0 written=2
D (23285) httpd_txrx: httpd_send_all: sent = 2
lwip_send(53, data=0x3f80be50, size=38099, flags=0x0)
lwip_send(53) err=-14 written=0
!! buf_len: 38099, ret: -1
W (23525) httpd_txrx: httpd_sock_err: error in send : 104
D (23525) httpd_txrx: httpd_send_all: error in send_fn
D (23525) HTTP: Sending file 'index.css.gz' failed
D (23535) HTTP: Processing request 'index.css.gz' took 470 ms
W (23535) httpd_uri: httpd_uri: uri handler execution failed
D (23545) httpd: httpd_server: closing socket 53
lwip_close(53)
D (23555) httpd_sess: httpd_sess_delete: fd = 53
D (23555) HTTP: Deleting session (id: 1) for remote: 192.168.3.155:59549

Whireshark:

....
497	9.143964	192.168.3.155	192.168.3.130	HTTP	446	GET /index.css HTTP/1.1
498	9.150446	192.168.3.155	192.168.3.130	TCP	66	59551 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 WS=256 SACK_PERM=1
499	9.172242	192.168.3.130	192.168.3.155	TCP	60	80 → 59551 [SYN, ACK] Seq=0 Ack=1 Win=5744 Len=0 MSS=1436
<---- 104 error got here 
501	9.173402	192.168.3.155	192.168.3.130	HTTP	430	GET /index.js HTTP/1.1 
504	9.381737	192.168.3.130	192.168.3.155	TCP	60	80 → 59551 [ACK] Seq=1 Ack=377 Win=5368 Len=0
505	9.386604	192.168.3.130	192.168.3.155	TCP	60	80 → 59549 [ACK] Seq=2884 Ack=874 Win=4871 Len=0
....

Full pcap file is here:
104.zip

Wish that this helps to solve the problem.

@no1seman
Copy link

no1seman commented Aug 13, 2019

@anurag-kar,
tomorrow I've made some tests and got the following:

Single thread on Ethernet IF (10 iterations):
./http_test.sh -s 192.168.3.222 -i 10 -d tmp
TEST TOOK: 39 seconds
TOTAL REQUESTS: 140 - 100%
SUCCESS: 124 - 88%
FAIL TO LOAD: 0 - 0%
FAIL MD5 HASH: 16 - 11%
Full log:
ethernet.zip
All errors on resources larger that 32Kbytes

Single thread on WiFi IF (10 iterations):
./http_test.sh -s 192.168.3.130 -i 10 -d tmp
TEST TOOK: 31 seconds
TOTAL REQUESTS: 140 - 100%
SUCCESS: 127 - 90%
FAIL TO LOAD: 0 - 0%
FAIL MD5 HASH: 13 - 9%
Full log:
wifi.zip
All errors on resources larger that 32Kbytes

Dual thread test on Ethernet (2x10 iterations):
./http_test.sh -s 192.168.3.222 -i 1 -d tmp1 & ./http_test.sh -s 192.168.3.222 -i 1 -d tmp2
Thread1:
TEST TOOK: 46 seconds
TOTAL REQUESTS: 140 - 100%
SUCCESS: 130 - 92%
FAIL TO LOAD: 0 - 0%
FAIL MD5 HASH: 10 - 7%
Thread2:
TEST TOOK: 92 seconds
TOTAL REQUESTS: 140 - 100%
SUCCESS: 127 - 90%
FAIL TO LOAD: 0 - 0%
FAIL MD5 HASH: 13 - 9%
Most but not all failed resources larger than 32Kbytes

Full log:
dual_ethernet.zip

Explanation of test :

  1. I've added md5hash to every response header:
esp_err_t get_handler(httpd_req_t *req)
{
...
unsigned char md5[16];
char md5hex[34];
mbedtls_md5_ret((const unsigned char *)SEND_BUFFER, SEND_BUFFER_LEN, md5);
httpd_resp_set_hdr(req, "MD5HASH", array2hex((const uint8_t *)&md5, (char *)&md5hex, 16, false));
httpd_resp_send(req, SEND_BUFFER, SEND_BUFFER_LEN);
...
}

char atoc(uint8_t value, bool capital)
{
    if (value < 10)
    {
        return (char)(value + 48);
    }
    else
    {
        if (capital)
            return (char)(value + 55);
        else
            return (char)(value + 87);
    }
}

char *array2hex(const uint8_t *hex, char *target, uint8_t len, bool capital)
{
    int i, hop = 0;
    for (i = 0; i < len; i++)
    {
        target[hop] = atoc((hex[i] & 0xf0) >> 4, capital);
        target[hop + 1] = atoc((hex[i] & 0x0f), capital);
        hop += 2;
    }
    target[hop] = '\0';
    return (target);
}
  1. Wrote a little bash script:
#!/bin/bash
if [ $# == 6 ]; then
    while getopts s:i:d: option; do
        case "${option}" in
        s) SERVER=${OPTARG} ;;
        i) ITERATIONS=${OPTARG} ;;
        d) TMPDIR=${OPTARG} ;;
        esac
    done

else
    echo "Error: Insufficient number of arguments"
    echo "Usage: http_test.sh -s 192.168.3.222 -i 10 -d tmp"
    exit 1
fi

# SERVER=192.168.3.222
# ITERATIONS=10
# TMPDIR=tmp

DISTDIR=storage
echo "Running test for server=$SERVER, iterations=$ITERATIONS, tempdir=$TMPDIR"

RESOURCE=([0]="index.html" [1]="favicon.ico" [2]="index.css" [3]="index.js" [4]="getdata?res=config" [5]="schema.json" [6]="1.index.js" [7]="1.index.css" [8]="getdata?res=stats" [9]="getdata?res=ps" [10]="getdata?res=ram" [11]="getdata?res=net" [12]="getdata?res=fs" [13]="getdata?res=climate")
FILENAME=([0]="index.html.gz" [1]="favicon.ico" [2]="index.css.gz" [3]="index.js.gz" [4]="config.json" [5]="schema.json.gz" [6]="1.index.js.gz" [7]="1.index.css.gz" [8]="stats.json" [9]="ps.json" [10]="ram.json" [11]="net.json" [12]="fs.json" [13]="climate.json")
RESTYPE=([0]="binary" [1]="binary" [2]="binary" [3]="binary" [4]="json" [5]="binary" [6]="binary" [7]="binary" [8]="json" [9]="json" [10]="json" [11]="json" [12]="json" [13]="json")
MD5SUM=()

RESCOUNT=14

FAILTOLOAD=0
BADMDSUMM=0
OK=0
TOTAL=0

# Print last build static files md5 summs
for ((M = 0; M < $RESCOUNT; M++)); do
    if [[ ${RESTYPE[$M]} == "binary" ]]; then
        referencemd5hash=$(md5sum ./$DISTDIR/${FILENAME[$M]} | awk '{print $1}')
        filesize=$(stat -c%s "./$DISTDIR/${FILENAME[$M]}")
        echo ${FILENAME[$M]}" FILESIZE: $filesize bytes MD5 HASH: "$referencemd5hash
        MD5SUM+=($referencemd5hash)
    else
        MD5SUM+=("")
    fi
done

mkdir -p $TMPDIR
rm -f $TMPDIR/*

START=$(date +%s)

for ((J = 0; J < $ITERATIONS; J++)); do
    for I in ${!RESOURCE[*]}; do
        echo -e "Requesting ${RESOURCE[$I]}:"
        TOTAL=$((TOTAL + 1))
        md5header=$(curl http://$SERVER/${RESOURCE[$I]} -sD - --connect-timeout 10 --max-time 10 --header "Accept-Encoding: gzip, deflate" -o ./${TMPDIR}/${J}-${FILENAME[$I]} | grep "MD5HASH:" | awk '{print $2}')
        md5header="${md5header:0:-1}"
        echo "Header ${J}-${FILENAME[$I]} MD5 HASH: $md5header"
        if [[ $? -ne 0 ]]; then
            FAILTOLOAD=$((FAILTOLOAD + 1))
            echo -e "\x1b[31m${FILENAME[$I]} FAIL TO LOAD\x1b[0m"
        else
            md5file=$(md5sum ./$TMPDIR/${J}-${FILENAME[$I]} | awk '{print $1}')
            echo "File ./${TMPDIR}/${J}-${FILENAME[$I]} MD5 HASH: $md5file"
            if [[ "$md5file" == "$md5header" ]]; then
                OK=$((OK + 1))
                echo -e "\x1b[32mMD5 OK\x1b[0m"
            else
                BADMDSUMM=$((BADMDSUMM + 1))
                echo -e "\x1b[31m${FILENAME[$I]} FAIL MD5 HASH\x1b[0m"
            fi
        fi
    done
done

END=$(date +%s)
DIFF=$(($END - $START))

echo -e "\n"
echo -e "TEST TOOK: $DIFF seconds"
echo -e "TOTAL REQUESTS: $TOTAL - 100%"
echo -e "\x1b[32mSUCCESS: $OK - $(((OK * 100) / TOTAL))%\x1b[0m"
echo -e "\x1b[31mFAIL TO LOAD: $FAILTOLOAD - $(((FAILTOLOAD * 100) / TOTAL))%\x1b[0m"
echo -e "\x1b[31mFAIL MD5 HASH: $BADMDSUMM - $(((BADMDSUMM * 100) / TOTAL))%\x1b[0m"

#rm -R $TMPDIR

Ths script is simply curl http-resources from esp32 in cycle and checks md5 hash from HTTP response header with md5 hash of downloaded file and aggregate statistics.

  1. Run it in single and dual thread modes

PS Now I'm 99% sure that all mentioned in this issue thread problems has single source - errors in trasmission data between server and client.

@bigbrassbed
Copy link

@ no1seman -> possibly 99% is a little bit too much :-)
I observe the same behaviour, not sending but receiving data - as stated somewhere above transmitting data via softAP interface (STA works without problems).
I got it sometimes working when inserting a 'well tempered' delay between two consecutive calls to 'httpd_req_recv' avoiding: httpd_txrx: httpd_sock_err: error in recv : 11.
Within this delay wifi softAP dis/reconnects and then 'buffer management' seems to work.

@no1seman
Copy link

Well, I solved the problem by turnig on sockets debug in lwip (by setting #define SOCKETS_DEBUG LWIP_DBG_ON in lwipopts.h) but its not a solution, not a solution for production (

Adding 'lwip sockets debug' - I added delays and now everything works really fine. If adding delays helps - that means that there is some locks or logic errors in tcp_transport or lwip components and wish that they will be found and fixed.

@anurag-kar
Copy link
Contributor

@no1seman @bigbrassbed I thank you for your efforts and insights. It is quite a finding. I'll try and reproduce this behavior on my side.

In the meanwhile, @fshamshirdar could you please try introducing delays or use LWIP_DBG_ON to see if that fixes your issue as well?

@bigbrassbed
Copy link

confirmed for my situation receiving data (...when I eventually found the right place for switching on debug-messages and those appeared :-) )
to narrow it down lets see if one (or some) of the messages are essentual...

@bigbrassbed
Copy link

...unfortunately not allways helps enabling debug messages (... possibly not enough total 'delay' added)

@anurag-kar
Copy link
Contributor

anurag-kar commented Aug 15, 2019

@bigbrassbed and @no1seman Just to ensure that this is not due to some recent changes on master, could you please check your tests with IDF release v3.2.2 ? Because otherwise this kind of issue would have showed up at other places as well, not just in the context of the HTTP server.

Ths script is simply curl http-resources from esp32 in cycle and checks md5 hash from HTTP response header with md5 hash of downloaded file and aggregate statistics.

@no1seman, could you also post the difference between the expected and received files in case of checksum mismatch? I believe you already have local copies of the files that have been kept on the server. The comparison could help pinpoint if there is some pattern to the data loss (like only at the end of the file or only the beginning). If these mismatch patterns are not random then it could still help point to some issue in the httpd_resp_send() implementation.

@no1seman
Copy link

@anurag-kar,

  1. It's impossible for me without big changes to make my app run on 3.2.2 because i'm using to much specific 4.0 features.
  2. Here: Reconnection doesn't occur after: Error write data or timeout (IDFGH-1651) esp-mqtt#126 we trying to find out probably the same problem source.
  3. Example of normal and broken files:
    Normal:
    index.css.gz
    Broken:
    6-index.css.gz
    Normal:
    1.index.css.gz
    Broken:
    0-1.index.css.gz

@anurag-kar
Copy link
Contributor

anurag-kar commented Aug 15, 2019

38099

@no1seman With the wireshark logs that you provided, I was able to recreate the TCP stream :
stream.txt

What I notice is that there is no CRLF terminator to indicate the end of the headers section:

HTTP/1.1 200 OK
Content-Type: text/css; charset=UTF-8
Content-Length: 38099
Content-Encoding: gzip
<Data starts here>

The transaction contains two exchanges, so you can compare it to the first response which does have correct termination.

From the monitor log I can clearly make out that the server did send the CRLF to terminate the headers section:

D (23135) httpd_txrx: httpd_send_all: sent = 79 <-- All default headers
D (23185) httpd_txrx: httpd_send_all: sent = 16 <- Custom header "Content-Encoding"
D (23215) httpd_txrx: httpd_send_all: sent = 2 <- Trailing colon and space ": "
D (23245) httpd_txrx: httpd_send_all: sent = 4 <- Custom header value "gzip"
D (23265) httpd_txrx: httpd_send_all: sent = 2 <- CRLF to terminate current header
D (23285) httpd_txrx: httpd_send_all: sent = 2 <- CRLF to terminate header section (this is the one which is missing from the TCP stream sequence)
W (23525) httpd_txrx: httpd_sock_err: error in send : 104

From the index.css.gz that you have provided, comparing it toi the stream, it seems CRLF got replaced by two null characters!

Update: By comparing the expected and received files, I only see that the difference is replacement of two consecutive bytes with null characters. The packet size remains the same, just this starnge replacement happens once in a while. Interesting!

@no1seman
Copy link

no1seman commented Aug 15, 2019

@anurag-kar,
seems to me that exploring http headers problems before fixing overall problem of transmitting data has no any reason. If you will look into attached files from my previous post you will see that there's nothing common inside of them excluding 6-8 first bytes.
Also, the problem is that i'm using batched build and all all project parts: web, spiff fs, app, webpack update and so on always rebuilds from the strach, so [current build index.css.gz] != [previous build index.css.gz]

@anurag-kar
Copy link
Contributor

anurag-kar commented Aug 15, 2019

@no1seman Thanks for testing this! What I can make out is that the headers problem is occurring due to some glitch in transmission (like you said). At least it seems to happen during send().

If possible, could you please provide a wireshark capture during error in recv : 104 / httpd_parse: cb_headers_complete: incomplete termination of headers or any other parser errors? That will help confirm whether the transmission issue is also happening the other way or not.

@no1seman
Copy link

no1seman commented Aug 15, 2019

@anurag-kar,
I've made some simple test app (with logic similar to my main app and also I tried to set sdkconfig setttings similar to my main app) and published it on github:
https://github.com/no1seman/transmit_bug

Prerequisites:

  1. Any ESP32 board with at least 2Mb falsh and 4Mb SPI PSRAM
  2. Esp-idf: v4.0-dev-1443-g39f090a4f
  3. Build method: cmake
  4. Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp32-2019r1) 8.2.0

How to use:

  1. Set WiFi SSID and pasword in menuconfig
  2. Set flash settings if needed in menuconfig
  3. Build & flash & monitor (run_app.sh provided in root dir)
  4. In separate terminal run:
    './http_test.sh -s IP_ADDRESS_OF_ESP -i NUMBER_OF_TEST_ITRATIONS -d NAME_OF_TEMPORARY_DIR'
    (Example: './http_test.sh -s 192.168.3.130 -i 10 -d tmp')

On last test I've got the following result:
TEST 192.168.3.130 IN 'tmp' DIRECTORY TOOK: 34 seconds
TOTAL REQUESTS: 30 - 100%
SUCCESS: 26 - 86%
FAIL TO LOAD: 0 - 0%
FAIL MD5 HASH: 4 - 13%

Full log is here:
http_transmit.zip

Will you please try to test on you site?

@negativekelvin
Copy link
Contributor

#3624 (comment)

@anurag-kar
Copy link
Contributor

@no1seman Thanks! I'll try your test program this weekend.

@no1seman
Copy link

no1seman commented Aug 16, 2019

@negativekelvin,
thank you for idea about the issue, but:

  1. With SPI PSRAM switched on and with vanilla v4.0-dev-1443-g39f090a4f - test NOT passes.
  2. With SPI PSRAM switched off test passing without any erorrs.
  3. With SPI PSRAM switched on and with lib provided by @igrr from that post: mbedTLS memory allocation stragegy SPIRAM/default allocation mode breaks mbedTLS with gcc 8 (IDFGH-1336) #3624 (comment) - test still NOT passes.

I'm a little bit confused, because later I have an idea that there is a problem with SPI PSRAM, but mbedtls_md5_ret() (look into my code) works fine with buffer located in SPI PSRAM (in later versions of my test script I have checked md5 not only calculated by http server but compared it with one made on my host linux and it always was correct) , but when data transmitted by tcp_transport from SPI PSRAM it damages occasionally.

So, seems that the source of the problem found, sure that you will provide a fix soon.

PS When I switching on CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP = y with igrr's patched lib the things goes from bad to worth, test give much more errors that in other cases.

@bigbrassbed
Copy link

@anurag-kar; @no1seman made some more test:
the message:
W (123832) httpd_txrx: httpd_sock_err: error in recv : 11
translates into (whatever httpd_txrx is reporting with '11'):

lwip_recvfrom: netconn_recv err=-3, netbuf=0x0 => sockets.c around line 1038
lwip_recvfrom(61): buf == NULL, error is "Timeout."! => sockets.c around line 1056

basically we are sitting in netconn_recv_data waiting that the timeout occurs, I fiddled around with timeout-values, but 'wifi softAP' resets the connection (no messages are comming up for this event) long before the timeout is triggered. I haven't figured out so far what the triggering moment for wifi softAP is. (to recap: transmission via wifi STA shows no problems)

@no1seman
Copy link

@bigbrassbed,

are you using SPI PSRAM during you tests (CONFIG_ESP32_SPIRAM_SUPPORT==y)?

@bigbrassbed
Copy link

CONFIG_SPIRAM_SUPPORT=y

SPI RAM config

CONFIG_SPIRAM_BOOT_INIT=y
CONFIG_SPIRAM_IGNORE_NOTFOUND=
CONFIG_SPIRAM_USE_MEMMAP=
CONFIG_SPIRAM_USE_CAPS_ALLOC=y
CONFIG_SPIRAM_USE_MALLOC=
CONFIG_SPIRAM_TYPE_AUTO=y
CONFIG_SPIRAM_TYPE_ESPPSRAM32=
CONFIG_SPIRAM_TYPE_ESPPSRAM64=
CONFIG_SPIRAM_SIZE=-1
CONFIG_SPIRAM_SPEED_40M=
CONFIG_SPIRAM_SPEED_80M=y
CONFIG_SPIRAM_MEMTEST=y
CONFIG_SPIRAM_CACHE_WORKAROUND=y
CONFIG_SPIRAM_BANKSWITCH_ENABLE=y
CONFIG_SPIRAM_BANKSWITCH_RESERVE=8
CONFIG_WIFI_LWIP_ALLOCATION_FROM_SPIRAM_FIRST=y
CONFIG_SPIRAM_ALLOW_BSS_SEG_EXTERNAL_MEMORY=
CONFIG_SPIRAM_OCCUPY_HSPI_HOST=
CONFIG_SPIRAM_OCCUPY_VSPI_HOST=y

@bigbrassbed
Copy link

no difference -> wifi softAP still stops working (after a while)

@no1seman
Copy link

no1seman commented Aug 16, 2019

@bigbrassbed,

thank you for answer.

Don't know if it help you, but I modifyed my test app to support SoftAP: https://github.com/no1seman/transmit_bug_ap

I can't reproduce you bug ("W (206326) httpd_txrx: httpd_sock_err: error in send : 11"), but all problems that I have with Ethernet/Wifi STA - is the same: freezings, bad content and so on.

If you will change in provided above app sdkconfig, web-server settings, softAP settings and http headers similar to yours I can try to test on my site and also it would help to find out the bug faster.

Instructions is the same: #3851 (comment)

@bigbrassbed
Copy link

I got it somehow running :-)
I switched to the Master branch and enabled wifi log.
Basically it seems that we got memory pressure and then wifi softAP decides to reset the connection to the client.
Now when I configured on Win10 to connect automatically to the ESP32 softAP, Win10 triggers just after receiving the reset a new connect and then the softAP acknowledges the 'new' connection and continues to work on the task: eg:

first connect to softAP
I (104582) wifi: ap recv assoc/reassoc request
I (104582) wifi: new:<1,1>, old:<1,1>, ap:<1,1>, sta:<1,0>, prof:1
I (104582) wifi: bss=0x3f80604c, resp=16 join
I (104582) wifi: station: b0:35:9f:77:fa:97 join, AID=1, bgn, 40U
I (104592) wifi: join success, ap send assoc response
I (104612) wifi apsta: station connected
I (104942) tcpip_adapter: softAP assign IP to station,IP is: 192.168.4.2
now I launch a post request towards the ESP
I (118632) ota: esp_ota_begin
I (122202) ota: received byte 3863
I (122272) ota: received byte 1436 <== here we hang, softAP resets connection, but client is fast enough to reconnect (possibly sometimes as well triggered by esp-softAP resp=48?)
I (122752) wifi: ap recv assoc/reassoc request
I (122752) wifi: new:<1,1>, old:<1,1>, ap:<1,1>, sta:<1,0>, prof:1
I (122752) wifi: bss=0x3f80604c, resp=16 join
I (122762) wifi: station: b0:35:9f:77:fa:97 join, AID=1, bgn, 40U
I (122762) wifi: join success, ap send assoc response
I (122792) wifi apsta: station connected
I (122962) tcpip_adapter: softAP assign IP to station,IP is: 192.168.4.2
==> and yep we continue with transmission.
I (127982) ota: received byte 1436
I (128022) ota: received byte 4308
I (128092) ota: received byte 1436
I (128112) ota: received byte 4308
I (128192) ota: received byte 1436
.... (total approx 900KB)

so big question: why does softAP reset the connection?
sometimes messages like
E (441714) wifi: esf_buf: t=8 l=1600 max:32, alloc:32 no eb, TXQ_BLOCK=0
come up -> my guess memory pressure - possibly we don't release memory in a timely manner?

@zahednejad203
Copy link

Well, I solved the problem by turnig on sockets debug in lwip (by setting #define SOCKETS_DEBUG LWIP_DBG_ON in lwipopts.h) but its not a solution, not a solution for production (

Adding 'lwip sockets debug' - I added delays and now everything works really fine. If adding delays helps - that means that there is some locks or logic errors in tcp_transport or lwip components and wish that they will be found and fixed.

@no1seman could you please explain your solution a bit? How and where did you add a delay?

@no1seman
Copy link

no1seman commented Oct 18, 2019

@zahednejad203,
short answer: just set #define SOCKETS_DEBUG LWIP_DBG_ON (instead of #define SOCKETS_DEBUG LWIP_DBG_OFF) in $IDF_PATH/components/lwip/port/esp32/include/lwipopts.h and you will get lwip sockets debug info in log output. This will also add some delays in lwip stack because printing log message takes some CPU time.

long answer: my bug was related to PSRAM cache issue. So if you are using external PSRAM on ESP32 rev. 1 - just don't do it, because there is no final solution at that moment and as Expressif's said: #2892 (comment) it will be solved only in rev. 3 silicon. If you are not using external PSRAM - your problem is in smth else.

@ThomasRogg
Copy link

Or work in unicore mode, then SPI-RAM works.

@xenpac
Copy link

xenpac commented Apr 16, 2020

@neoniousTR
Im just having the same issues here.
So, how to switch to unicore mode?

@IgorTrud
Copy link

It seems to be that UniCore does not help.

httpd hangs randomly.

If WDT is disabled - system stay functional (i.e other tasks running),
pings answered, but no new http connections could be established.
Strictly - not HTTP, but TCP.
I looked at sniffer logs - see TCP SYN send to ESP, and no replys.
(ICMP is still there, esp respond to pings)

Same behaviour in AP and STA modes.
Board - ESP32-WROVER-DEVKIT-C ( with PSRAM).

`E (2969827) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (2969827) task_wdt: - IDLE0 (CPU 0)
E (2969827) task_wdt: Tasks currently running:
E (2969827) task_wdt: CPU 0: httpd
E (2969827) task_wdt: Aborting.

abort() was called at PC 0x400d4eef on core 0
0x400d4eef: task_wdt_isr at /Users/igor/esp/esp-idf/components/esp_common/src/task_wdt.c:179 (discriminator 1)

Backtrace:0x400973b2:0x3ffb06a0 0x40097115:0x3ffb06c0 0x4009a476:0x3ffb06e0 0x400d4eef:0x3ffb0750 0x40083802:0x3ffb0770 0x4000bfed:0x3ffbe3f0 0x4009a861:0x3ffbe400 0x4009b54a:0x3ffbe420 0x4009111a:0x3ffbe460 0x40172219:0x3ffbe480 0x4009387b:0x3ffbe4a0 0x40172522:0x3ffbe4c0 0x40094002:0x3ffbe520 0x400d335b:0x3ffbe5c0 0x400fdf20:0x3ffbe630 0x400fe02e:0x3ffbe670 0x4009a6e1:0x3ffbe690
0x400973b2: panic_abort at /Users/igor/esp/esp-idf/components/esp_system/panic.c:330

0x40097115: esp_system_abort at /Users/igor/esp/esp-idf/components/esp_system/system_api.c:68

0x4009a476: abort at /Users/igor/esp/esp-idf/components/newlib/abort.c:46

0x400d4eef: task_wdt_isr at /Users/igor/esp/esp-idf/components/esp_common/src/task_wdt.c:179 (discriminator 1)

0x40083802: _xt_lowint1 at /Users/igor/esp/esp-idf/components/freertos/xtensa/xtensa_vectors.S:1105

0x4009a861: vPortExitCritical at /Users/igor/esp/esp-idf/components/freertos/xtensa/port.c:419

0x4009b54a: xQueueGenericReceive at /Users/igor/esp/esp-idf/components/freertos/queue.c:1541

0x4009111a: sys_mutex_lock at /Users/igor/esp/esp-idf/components/lwip/port/esp32/freertos/sys_arch.c:82

0x40172219: sock_inc_used_locked at /Users/igor/esp/esp-idf/components/lwip/lwip/src/api/sockets.c:378

0x4009387b: tryget_socket_unconn_locked at /Users/igor/esp/esp-idf/components/lwip/lwip/src/api/sockets.c:489

0x40172522: lwip_selscan at /Users/igor/esp/esp-idf/components/lwip/lwip/src/api/sockets.c:1921

0x40094002: lwip_select at /Users/igor/esp/esp-idf/components/lwip/lwip/src/api/sockets.c:2058

0x400d335b: esp_vfs_select at /Users/igor/esp/esp-idf/components/vfs/vfs.c:985

0x400fdf20: httpd_server at /Users/igor/esp/esp-idf/components/esp_http_server/src/httpd_main.c:174

0x400fe02e: httpd_thread at /Users/igor/esp/esp-idf/components/esp_http_server/src/httpd_main.c:227

0x4009a6e1: vPortTaskWrapper at /Users/igor/esp/esp-idf/components/freertos/xtensa/port.c:143`

@IgorTrud
Copy link

This loop consume 100% of CPU,
and causing WDT.

If WDT is off - ESP not accept new connections.
.......
lwip_recvfrom_udp_raw: buflen=12
lwip_recvfrom_udp_raw(49): addr=127.0.0.1 port=63778 len=12
lwip_sendto(50, data=0x3ffbe570, short_size=12, flags=0x0 to=127.0.0.1 port=32768
lwip_select(63, 0x3ffbe640, 0x0, 0x0, tvsec=-1 tvusec=-1)
lwip_selscan: fd=48 ready for reading
lwip_selscan: fd=49 ready for reading
lwip_select: nready=2
lwip_recvfrom(49, 0x3ffbe600, 12, 0x0, ..)
lwip_recvfrom_udp_raw[UDP/RAW]: top sock->lastdata=0x0
lwip_recvfrom_udp_raw[UDP/RAW]: netconn_recv err=0, netbuf=0x3ffc53ec
lwip_recvfrom_udp_raw: buflen=12
lwip_recvfrom_udp_raw(49): addr=127.0.0.1 port=63778 len=12
lwip_sendto(50, data=0x3ffbe570, short_size=12, flags=0x0 to=127.0.0.1 port=32768
lwip_select(63, 0x3ffbe640, 0x0, 0x0, tvsec=-1 tvusec=-1)
lwip_selscan: fd=48 ready for reading
lwip_selscan: fd=49 ready for reading
lwip_select: nready=2
lwip_recvfrom(49, 0x3ffbe600, 12, 0x0, ..)
lwip_recvfrom_udp_raw[UDP/RAW]: top sock->lastdata=0x0
lwip_recvfrom_udp_raw[UDP/RAW]: netconn_recv err=0, netbuf=0x3ffc53ec
lwip_recvfrom_udp_raw: buflen=12
lwip_recvfrom_udp_raw(49): addr=127.0.0.1 port=63778 len=12
lwip_sendto(50, data=0x3ffbe570, short_size=12, flags=0x0 to=127.0.0.1 port=32768
lwip_select(63, 0x3ffbe640, 0x0, 0x0, tvsec=-1 tvusec=-1)
lwip_selscan: fd=48 ready for reading
lwip_selscan: fd=49 ready for reading
lwip_select: nready=2
lwip_recvfrom(49, 0x3ffbe600, 12, 0x0, ..)
lwip_recvfrom_udp_raw[UDP/RAW]: top sock->lastdata=0x0
lwip_recvfrom_udp_raw[UDP/RAW]: netconn_recv err=0, netbuf=0x3ffc53ec
lwip_recvfrom_udp_raw: buflen=12
.......

@shaeberling
Copy link

I am having the same issue with my ESP32 httpd server randomly hanging. I am currently on IDF version 4.2. This is problematic for having the ESP as a home automation note, and suddenly no longer responding.

Have there been any other root-causes or work-around identified?

I am thinking to switch to the mongoose webserver to see if it works more reliably.

@Louis-Riel
Copy link

@shaeberling Two things I tried that gave me some improvements. I am using the camera_web_server example of the esp-who kit.
1- Frank from Scientric pointed me in the right direction for the periodic jams of 30 seconds. NTP was gumming up the works.

Changed the receive timeout to 30 seconds in esp-idf/components/esp_http_server/include/esp_http_server.h

#define HTTPD_DEFAULT_CONFIG() {                        \
        .task_priority      = tskIDLE_PRIORITY+5,       \
        .stack_size         = 4096,                     \
        .core_id            = tskNO_AFFINITY,           \
        .server_port        = 80,                       \
        .ctrl_port          = 32768,                    \
        .max_open_sockets   = 7,                        \
        .max_uri_handlers   = 8,                        \
        .max_resp_headers   = 8,                        \
        .backlog_conn       = 5,                        \
        .lru_purge_enable   = false,                    \
        .recv_wait_timeout  = 30,                       \//<---- This one here
        .send_wait_timeout  = 5,                        \
        .global_user_ctx = NULL,                        \
        .global_user_ctx_free_fn = NULL,                \
        .global_transport_ctx = NULL,                   \
        .global_transport_ctx_free_fn = NULL,           \

https://scientric.com/2019/11/07/esp32-cam-stream-capture/
2- See if that helps, it was a factor for me, even though I was using a different board.
espressif/arduino-esp32#2659
https://www.youtube.com/watch?v=NvmyCBbTGPs&feature=youtu.be
Waiting on my ipex wifi antenna to see if it fixes the problem, but I go from 0.4fps to 10+ while I handle the board in my hand.

@shaeberling
Copy link

Thank you @Louis-Riel . I am not sure this is the problem that I have.

While I haven't reproduced it yet while I had it connected to read the logs, I think it might be due to the core going to 99% and then no longer handling http requests, since I can still ping it.

Since I have no easy way to repro, I decided to with mongoose a try. It was pretty easy for me to add it to the code base and it has been working flawlessly so far.

@shaeberling
Copy link

I finally root-caused the issue and have an easy repro.

Browsers by default keep the connection to the server alive to speed up future requests. However, by disconnecting the client device from the network before it is able to close the connection results in that socket remaining open. Eventually, we run out of the maximum configured socket count and the server stops responding.

Here is an easy repro:
Setup: Connect the ESP32 to the wifi and start-up either httpd or mongoose. Have a phone ready, I used a Pixel phone with Chrome, on the same Wifi network.

  1. Make a request to the webserver running on the ESP32 using Chrome on your phone.
  2. Now disconnect the phone from WIFI by disabling it. (This is crucial, since the socket will now remain open, as the client wasn't able to close it properly before losing connection to the network.
  3. Re-enable WIFI on your phone.
  4. GOTO 1 and repeat

The default max open socket count for httpd is 10, so after about 10 requests you will see that it is no longer responding. When using mongoose, you'll instead get a CONNECTION RESET instead.

My solution for now is to send Connection: close in the HTTP response header so that Chrome and other browsers close the connection immediately. But it bugs me that I am dependent on the client behaving well, and it thus being very easy to DOS the device.

Am I missing a timeout setting that force closes open sockets after they have been idle for a while?

@anurag-kar
Copy link
Contributor

@shaeberling Maybe setting lru_purge_enable to true in the config structure would help. That should close the least recently used socket for any new connection request when max open socket count is reached.

@mahavirj
Copy link
Member

@no1seman @fshamshirdar PSRAM issue has been fixed now, please try with 2020r2 toolchain on IDF master branch. (Ref: aa25b16)

@shaeberling Did above suggestion help you?

@SamyRICHET
Copy link

I reproduce the same issue on my project with lru_purge_enable set to true.

@shubhamkulkarni97
Copy link
Contributor

shubhamkulkarni97 commented Jun 25, 2020

@shaeberling @SamyRICHET I was able to reproduce the issue with steps mentioned here. Can you try the patch attached below and check if it fixes the issue (when lru_purge_enable is set to true).

Please let us know if the patch works for you.

fix_lru_counter.patch.zip

@SamyRICHET
Copy link

@shubhamkulkarni97 For me, this patch work on my project. Thanks you

@sergioaiello
Copy link

sergioaiello commented Sep 24, 2022

Hi.
I'm sorry but the patch does not work to me
The httpd stops ("W (xxxxx) httpd_txrx: httpd_sock_err: error in recv : 11") in case the communication between a client and a file server stops suddenly (eg power down reset of the client) so that the ongoing file transfer cannot be completed.

loganfin pushed a commit to Lumenaries/esp_http_server that referenced this issue Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests