WDT on repetitive connection when distant server is slow to free conn or respond #69

Closed
jc999 opened this issue Oct 2, 2019 · 5 comments

jc999 commented Oct 2, 2019

Hello all,

I am using 16 ESP32 boards that send POST requests to a small HTTP server and receive UDP messages. After some hours of operation, some of the boards get a task watchdog (WDT) abort on the wifi core.

Versions:
PLATFORM: Espressif 32 1.10.0,
arduino-esp32 v1.0.3
AsyncTCP 1.1.0

The situation:

A server sends a UDP broadcast request every 10 seconds to multiple ESP32 boards (Wemos D1 ESP32). When an ESP32 receives the UDP request, it sends a TCP POST to the server using AsyncTCP.
Everything works fine with 3 ESP32.
With 16 ESP32, however, I get a WDT after a few hours on some devices. I guess it happens when the server is not fast enough to handle all the requests at once.
The server is a simple Java HttpServer. The UDP sender is a simple Java app that sends the packet.

On the ESP, the UDP server is a WiFiUDP instance, read in the main loop:

void readUdpControl() {
  int count = udpControl.parsePacket();
  if (count) { 
    String buf = udpControl.readString();
    processNetControl(buf);
  }
}

In the following code, getWDPostInfo returns an ArduinoJson-serialized structure as a string.

(The "watchdog" wording in this code is not related to the Espressif WDT.)

void watchDogAsync() {
  if (tcpClient) { // client already exists
    log("ASYNCTCP-Stopping: client already exists");
    return;
  }

  watchdogContent = getWDPostInfo().c_str();

  tcpClient = new AsyncClient();
  if (!tcpClient) { // could not allocate client
    log("ASYNCTCP-Stopping: could not allocate client");
    return;
  }

  tcpClient->onError([](void *arg, AsyncClient *client, int error) {
    tcpClient = NULL;
    delete client;
  }, NULL);

  tcpClient->onConnect([](void *arg, AsyncClient *client) {

    client->onError(NULL, NULL);

    client->onData([](void *arg, AsyncClient *c, void *data, size_t len) {},
                   NULL);

    client->onDisconnect([](void *arg, AsyncClient *c) {
      tcpClient = NULL;
      delete c;
      Serial.println("Dis");

    }, NULL);

    // Use the client passed to the callback rather than the global pointer.
    client->write(watchdogContent.c_str());
    // tcpClient->stop();
  }, NULL);

  if (!tcpClient->connect(udpOTA.remoteIP().toString().c_str(), 23780)) {
    log("ASYNCTCP-Connect Fail");
    AsyncClient *client = tcpClient;
    tcpClient = NULL;
    delete client;
  }
}
ASYNCTCP-Sending Async...
E (31521402) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (31521402) task_wdt:  - IDLE0 (CPU 0)
E (31521402) task_wdt: Tasks currently running:
E (31521402) task_wdt: CPU 0: wifi
E (31521402) task_wdt: CPU 1: IDLE1
E (31521402) task_wdt: Aborting.
abort() was called at PC 0x400f69e0 on core 0


Backtrace: 0x4008f024:0x3ffbe160 0x4008f255:0x3ffbe180 0x400f69e0:0x3ffbe1a0 0x40083469:0x3ffbe1c0 0x4014aea5:0x3ffafab0 0x4014b441:0x3ffafae0 0x40092171:0x3ffafb10 0x4008af9d:0x3ffafb50

c:\Users\jc\SynologyDrive\Liti>java -jar EspStackTraceDecoder.jar c:/Users/jc/.platformio/packages/toolchain-xtensa32/bin/xtensa-esp32-elf-addr2line.exe c:/Users/jc/SynologyDrive/Liti/vsCode/LitiController/.pio/build/esp32doit-devkit-v1/firmware.elf dump.txt
Exception Cause: Not found

0x400f69e0: task_wdt_isr at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/task_wdt.c:252
0x4008f024: invoke_abort at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/panic.c:707
0x4008f255: abort at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/panic.c:707
0x400f69e0: task_wdt_isr at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/esp32/task_wdt.c:252
0x40083469: _xt_lowint1 at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/xtensa_vectors.S:1154
0x4014aea5: lmacProcessTxRtsError at ??:?
0x4014b441: lmacProcessTxComplete at ??:?
0x40092171: ppTask at ??:?
0x4008af9d: vPortTaskWrapper at /Users/ficeto/Desktop/ESP32/ESP32/esp-idf-public/components/freertos/port.c:355 (discriminator 1)

Since everything works fine with 3 devices, I guess the problem with all my ESP32 running comes from my server not handling requests quickly enough. But AsyncTCP should cope with this and not let the WDT trigger.

How can I be sure that AsyncTCP applies a timeout at every step of the connection?

@jc999 jc999 changed the title WDT on repetitive connection when server is slow to free conn or respond WDT on repetitive connection when distant server is slow to free conn or respond Oct 2, 2019
@me-no-dev
Owner

It's not purely AsyncTCP that does this. Your ESP is too busy to have spare cycles to run the idle task for 5 seconds. Maybe I could introduce an artificial delay in the task somehow...
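
To make that failure mode concrete, here is a toy model in plain C++ of a task watchdog. It is an illustration only, not the ESP-IDF task_wdt code: `TaskWatchdogModel`, `feed`, `expired` and `burstTripsWatchdog` are hypothetical names, and the 5000 ms figure in the note below simply mirrors the 5 seconds mentioned above.

```cpp
#include <cstdint>

// Toy model of a task watchdog: the watched task must call feed()
// at least once every timeoutMs, otherwise expired() reports true.
// Illustration only; this is not the ESP-IDF implementation.
struct TaskWatchdogModel {
    uint32_t timeoutMs;
    uint32_t lastFedMs;

    TaskWatchdogModel(uint32_t timeout, uint32_t now)
        : timeoutMs(timeout), lastFedMs(now) {}

    // Called whenever the watched (idle) task gets CPU time.
    void feed(uint32_t now) { lastFedMs = now; }

    // True once timeoutMs or more has passed since the last feed().
    bool expired(uint32_t now) const { return (now - lastFedMs) >= timeoutMs; }
};

// Returns true if a higher-priority task that keeps the CPU for burstMs
// without yielding would starve the idle task long enough to trip the watchdog.
bool burstTripsWatchdog(uint32_t burstMs, uint32_t timeoutMs) {
    TaskWatchdogModel wdt(timeoutMs, 0);
    wdt.feed(0);                  // idle task ran just before the burst
    return wdt.expired(burstMs);  // idle task only runs again after the burst
}
```

In this model `burstTripsWatchdog(4000, 5000)` is false and `burstTripsWatchdog(6000, 5000)` is true: the watchdog does not care what the busy task is doing, only that the idle task got no CPU time within the window.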

@jc999
Author

jc999 commented Oct 2, 2019

Hmm, but the ESPs run fine if I run only 2 or 3 of them. The problems start when I run all of them (same binary). An ESP does not do more work when it runs alongside many others than when it runs alone (1 UDP broadcast received synchronously in loop(), 1 TCP request sent asynchronously to the broadcaster).
This could mean that some network task takes too long, depending on the external HTTP server. If that is the cause, a timeout that kills the connection could be a solution.
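
The timeout idea could be sketched roughly as follows. This is a minimal sketch in plain C++ under stated assumptions, not AsyncTCP's own mechanism: `ConnPhase` and `ConnDeadline` are hypothetical names, the phases are a simplification of the exchange described above, and the time source is assumed to be a 32-bit millisecond counter such as Arduino's `millis()`.

```cpp
#include <cstdint>

// Simplified phases of one POST exchange (an assumption for illustration).
enum class ConnPhase { Idle, Connecting, AwaitingClose };

// Hypothetical helper: tracks how long the current phase has lasted so the
// caller can force-close a connection that the server never completes.
class ConnDeadline {
public:
    ConnDeadline(uint32_t connectTimeoutMs, uint32_t closeTimeoutMs)
        : connectTimeoutMs_(connectTimeoutMs), closeTimeoutMs_(closeTimeoutMs) {}

    // Record entry into a new phase at time nowMs.
    void enter(ConnPhase phase, uint32_t nowMs) {
        phase_ = phase;
        phaseStartMs_ = nowMs;
    }

    ConnPhase phase() const { return phase_; }

    // True when the current phase has outlived its budget. Idle never
    // times out. Unsigned subtraction tolerates counter wraparound.
    bool timedOut(uint32_t nowMs) const {
        uint32_t elapsed = nowMs - phaseStartMs_;
        switch (phase_) {
            case ConnPhase::Connecting:    return elapsed >= connectTimeoutMs_;
            case ConnPhase::AwaitingClose: return elapsed >= closeTimeoutMs_;
            case ConnPhase::Idle:          return false;
        }
        return false;
    }

private:
    uint32_t connectTimeoutMs_;
    uint32_t closeTimeoutMs_;
    ConnPhase phase_ = ConnPhase::Idle;
    uint32_t phaseStartMs_ = 0;
};
```

On the ESP32 side, such a helper would presumably be polled from loop(), with `enter()` called from the connect and disconnect callbacks and a timed-out client closed with something like `AsyncClient::close(true)`. That wiring is an assumption and is not shown or tested here. AsyncClient also appears to offer `setRxTimeout()` and `setAckTimeout()`; whether either covers the connect phase in AsyncTCP 1.1.0 is not confirmed in this thread.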

@iafilius

iafilius commented Oct 26, 2019

A few questions/tips to look into:

Regarding the web server:
The Java web server isn't running on an ESP32, it's just an ordinary web server?
Are the TCP sessions closed cleanly after each request, or do you keep them open (until a session cache timeout)?
What does netstat -anp say on the web server?
Is the problem somehow predictable?
Doing 16 simultaneous connects to one service may/will hit the TCP backlog queue.
A TCP backlog setting lower than the number of simultaneous requests keeps the remaining clients waiting, because the server sends them no response.
Depending on the TCP stack settings, the SYN retry may take 3 seconds or so.
If your code operates in blocking mode at this point, it is easy to trigger the WDT.
I'm curious to have a look at:
a tcpdump/capture of the traffic, with a pointer to the exact time to look at.
netstat -anp output during the issue, and while everything runs fine.

Can you share the actual code (server + ESP32) so we can have a closer look?
The WDT masks the issue; I would suggest disabling it for debugging purposes to see what your issue really is.

In case you are not familiar with the TCP backlog mechanism, here is a link that I think describes it in depth:
http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html
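
As a rough illustration of the backlog effect, the toy model below estimates how long each of N simultaneous clients might wait. It is not a description of any real TCP stack: it assumes that `backlog` connections are taken per round, that every client left over retries its SYN after a fixed `synRetryMs`, and that the server drains its queue between rounds. Real stacks usually back off with growing retry intervals. The function name and parameters are hypothetical.

```cpp
#include <cstdint>

// Toy estimate of the connect delay for the client at zero-based position
// clientIndex when all clients send their SYN at the same moment.
// Assumptions (not taken from any real stack): fixed retry interval,
// and exactly `backlog` clients served per retry round.
uint32_t estimatedConnectDelayMs(uint32_t clientIndex,
                                 uint32_t backlog,
                                 uint32_t synRetryMs) {
    if (backlog == 0) {
        return UINT32_MAX;  // nothing is ever accepted in this model
    }
    // Integer division: number of full rounds that pass before this client fits.
    uint32_t roundsWaited = clientIndex / backlog;
    return roundsWaited * synRetryMs;
}
```

With 16 devices, a hypothetical backlog of 5 and a 3000 ms retry, the last device (index 15) waits three rounds, i.e. 9000 ms in this model, while the first five connect immediately.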

It is not completely clear to me where watchDogAsync is called from, and whether you use a custom watchdog or AsyncTCP's built-in one.

Regards,

Arjan

@stale

stale bot commented Dec 25, 2019

[STALE_SET] This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 25, 2019
@stale

stale bot commented Jan 8, 2020

[STALE_DEL] This stale issue has been automatically closed. Thank you for your contributions.
