Watchdog reset dependent on delay? #2493

Closed
Lucianovici opened this issue Feb 22, 2019 · 13 comments

@Lucianovici

Is it possible for esp_task_wdt_reset() to fail to reset the WD timer when it is called from a task running on Core 0?

Board: ESP-WROOM-32 WEMOS.

Please check the following example with and without the delay() before the WD reset function.


#include "Arduino.h"
#include "esp_task_wdt.h"

#define TIMEOUT 3

void setup() {
    Serial.begin(9600);

    Serial.printf("esp_task_wdt_init: %d\n", esp_task_wdt_init(TIMEOUT, false));

    enableLoopWDT();

    Serial.printf("Setup finished on Core: %d\n", xPortGetCoreID());
}

void doOnCore1(void *args) {
    Serial.printf("Task on Core: %d\n", xPortGetCoreID());
    Serial.printf("esp_task_wdt_add: %d\n", esp_task_wdt_add(NULL));
    Serial.printf("esp_task_wdt_status: %d\n", esp_task_wdt_status(NULL));

    long increment = 10000000;
    while (increment > 0) {
        increment--;
        delay(1);  // <--- HERE: why is this required for the ESP32 to properly reset the WD timer?
        esp_task_wdt_reset();
    }

    Serial.printf("esp_task_wdt_delete: %d\n\n", esp_task_wdt_delete(NULL));
    vTaskDelete(NULL);
}

void loop() {
    delay(2000);
    Serial.printf("Loop on Core: %d\n\n", xPortGetCoreID());
    xTaskCreatePinnedToCore(&doOnCore1, "doOnCore1", configMINIMAL_STACK_SIZE * 10, NULL, 2 | portPRIVILEGE_BIT, NULL, PRO_CPU_NUM);
}

If I don't use the delay I get this:

Task on Core: 0
esp_task_wdt_add: 0
esp_task_wdt_status: 0
Loop on Core: 1

Task on Core: 0
esp_task_wdt_add: 0
esp_task_wdt_status: 0
E (10183) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (10183) task_wdt:  - IDLE0 (CPU 0)
E (10183) task_wdt: Tasks currently running:
E (10183) task_wdt: CPU 0: doOnCore1
E (10183) task_wdt: CPU 1: IDLE1

When delay() is called, everything works as expected and the WD timer is reset.

Thanks.

@lbernstone
Contributor

lbernstone commented Feb 22, 2019

This is the IDLE task (i.e., the scheduler) complaining that it is not getting any time. If you do not want this behavior, it can be disabled with disableCore0WDT(). Note that it is there for a reason, and real-time behavior can be lost if you lock up the CPU.

@Lucianovici
Author

I do want to use both cores and I do want to have a watchdog on both cores.

Did you check my example? The question is why the watchdog timer is reset only when the delay function is called, and not otherwise.

Thanks!

@lbernstone
Contributor

It is not your task that is tripping the watchdog. It is IDLE0 (read your error carefully). By putting in a delay, the scheduler gives time to IDLE0, so it does not panic.

@Lucianovici
Author

@lbernstone Thanks for your response.

I read it very carefully: it is CPU 0 that is not resetting the WD timer, although CPU 1, where the loop is running, appears idle.

Please understand that by calling enableLoopWDT I enable this if branch of code that helps CPU 1 to reset the WD timer.

If you remove esp_task_wdt_reset you'll get this:

Loop on Core: 1

Task on Core: 0
esp_task_wdt_add: 0
esp_task_wdt_status: 0
E (6206) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (6206) task_wdt:  - loopTask (CPU 1)
E (6206) task_wdt: Tasks currently running:
E (6206) task_wdt: CPU 0: IDLE0
E (6206) task_wdt: CPU 1: IDLE1

Please note the difference.
Thank you.

@atanisoft
Collaborator

atanisoft commented Feb 23, 2019

PRO_CPU_NUM is defined as core zero; to pin to core one you will likely want APP_CPU_NUM.

However, there is a subtle flaw in the code you originally posted. Each loop iteration appears to create another task without checking whether one has already been created. This will very likely crash at some point as it runs out of heap, or lead to other unknown behavior.

Now, expanding on what @lbernstone wrote: the task you are firing off runs on the PRO CPU. If that task has no form of delay, the task scheduler on that core gets no CPU time to run, and that triggers the WDT. The enableLoopWDT method enables the WDT for the APP CPU, where both setup() and loop() are called from. The esp_task_wdt_reset API resets the watchdog only for the "current" task, not for other tasks that may also run on the same core, and not all tasks are tracked by the WDT.

@Lucianovici
Author

Lucianovici commented Feb 23, 2019

@atanisoft Thanks for your fast reply.

Indeed, I acknowledge the flaw. I just put something together quickly to demonstrate that esp_task_wdt_reset is not doing its job in a task pinned to CPU 0 unless I call delay(1).

Please note that I used esp_task_wdt_add to ensure that the task is subscribed to the Task Watchdog Timer.

Could you reproduce this behaviour?
Thank you very much.

@atanisoft
Collaborator

The WDT is doing its job; the problem is in your task. It prevents the task scheduler from running on the PRO core since it monopolizes all CPU cycles. Without delay/vTaskDelay/etc., the task scheduler never runs, so the idle task for the core (the scheduler) cannot reset the WDT for its own task, and that triggers the WDT.

So no, the WDT is not dependent on delay. The WDT is dependent on cooperative scheduling of all running tasks on a core.

And yes, I can easily reproduce this behavior with your code or my own, as there is a critical flaw in the cooperative nature of the task.

@Lucianovici
Author

OK - let me get this right. Isn't the ESP32 able to run simultaneously on both cores?

Well it should as per tech spec. So why do I need to give control back to the CPU 1 where the loop is running, in order for that to reset the WD?

Monopolizing all CPU cycles on CPU 0 is very much intentional; that core should be resetting the WD itself.

I'm pretty sure I didn't have this kind of issue while doing a similar thing in ESP-IDF, but I can provide proof-of-concept code to demonstrate that each core should be able to reset the WD while being heavily used, without "windows" (delay).

Am I missing something here? How can I heavily use both cores in parallel, that are able to reset the WD individually?

Thank you for your detailed explanation so far.

BTW: I moved the creation of the new task on CPU 0 at the end of setup() to avoid the flaw you mentioned, since now I create only one task. Of course same behaviour.

@atanisoft
Collaborator

> OK - let me get this right. Isn't the ESP32 able to run simultaneously on both cores?

Yes, it is very capable of doing just that.

> So why do I need to give control back to the CPU 1 where the loop is running, in order for that to reset the WD?

The usage of delay (or variants) does not shift control between cores. It shifts control between TASKS running on the same core (unless the task floats between cores in which case that is a different story).

> I'm pretty sure I didn't have this kind of issue while doing a similar thing in ESP-IDF

What you are doing here is no different than running in ESP-IDF. The same problem would be easily displayed via ESP-IDF as it is starving the task scheduler which leads to the WDT triggering for that core.

> Am I missing something here? How can I heavily use both cores in parallel, that are able to reset the WD individually?

The implementation on the ESP32 is such that each core operates independently, with a task scheduler running on each core (IDLE0, IDLE1). These schedulers must get a few cycles periodically, otherwise the WDT will be triggered as it assumes the task is hung. The usage of delay (which calls vTaskDelay internally) simply allows the scheduler to do its maintenance activities, shift to another task with higher priority, etc.

You could look at vTaskDelay(0), which is a lightweight way to tell the scheduler that it may shift to another, higher-priority task if there is one; if there isn't one, it will return almost immediately (the WDT will be reset in this case for the scheduler).

@lbernstone
Contributor

lbernstone commented Feb 23, 2019

Here is sample code. Uncommenting either of the commented-out lines will prevent the WDT timeout. The first works with the OS, feeding the scheduler (every 2^12 passes). The other just turns off the scheduler WDT.

void locker(void *args) {
  for(uint32_t x=0; x<UINT32_MAX; x++) {  
    Serial.println(x);
    //if (x == x >> 12 << 12) delay(1);
  }
  vTaskDelete(NULL);
}

void setup() {
  Serial.begin(115200);
  //disableCore0WDT();
  xTaskCreatePinnedToCore(&locker, "locker", 2048, NULL, 5, NULL, 0);
}
void loop() {}
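
A minimal sketch, under stated assumptions, of how the first option in the sample above could be combined with the esp_task_wdt_add / esp_task_wdt_reset calls from the original report. The names isYieldPoint and busyTask are hypothetical and not part of any library; the iteration count, stack size and priority are arbitrary. It assumes the Arduino core has already initialised the task watchdog, so esp_task_wdt_init is not called here. The ESP32-specific part is wrapped in #ifdef ARDUINO only so that the helper can also be compiled with a desktop compiler.

```cpp
#include <cstdint>

// Hypothetical helper (not a library function): true when the low 12 bits
// of x are zero, i.e. once every 4096 (2^12) iterations. This is the same
// condition as `x == x >> 12 << 12` in the sample above.
constexpr bool isYieldPoint(uint32_t x) {
    return (x & 0xFFFu) == 0;
}

#ifdef ARDUINO  // ESP32-only part; skipped when compiling on a desktop
#include "Arduino.h"
#include "esp_task_wdt.h"

void busyTask(void *args) {
    esp_task_wdt_add(NULL);             // subscribe this task to the task WDT
    for (uint32_t x = 0; x < 10000000; x++) {
        esp_task_wdt_reset();           // feeds only this task's own entry
        if (isYieldPoint(x)) delay(1);  // block briefly so IDLE0 can feed its entry
    }
    esp_task_wdt_delete(NULL);          // unsubscribe before the task ends
    vTaskDelete(NULL);
}

void setup() {
    Serial.begin(115200);
    // Created once, pinned to core 0 as in the examples in this thread.
    xTaskCreatePinnedToCore(&busyTask, "busyTask", 4096, NULL, 2, NULL, 0);
}

void loop() {}
#endif
```

The point of the split is that the task feeds its own watchdog entry on every pass, while the occasional delay(1) is what lets the idle task on the same core feed its own entry; neither call substitutes for the other.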

@Lucianovici
Author

@lbernstone Thanks for the demo code.
@atanisoft Excellent explanation. Thanks.

Alright, I understand now. I also made a proof of concept with ESP-IDF.

I have read the docs https://docs.espressif.com/projects/esp-idf/en/latest/api-reference/system/freertos.html#

I know it's not the appropriate place, but would you please be kind enough to answer this question while closing this issue:

What's the best practice when dealing with a timer interrupt that creates a task pinned to the other core, so that the task is entirely focused on sensitive, blocking RS485 IO operations?
Considering that I want both cores to be subscribed to the watchdog timer, how can I reliably do that?

Of course I can just add a delay, but is this the correct approach?

I really appreciate your input here.
Thanks.

@lbernstone
Contributor

If it is an interrupt, use interrupts along with queues. You don't need to micromanage and bitbang when there is a driver built into the system.

@Lucianovici
Author

Thanks.

It is much more complex than that. I see that it is decent practice to give the scheduler some windows to feed the watchdog within heavy blocking tasks, so I'm going to do just that.

I appreciate your support in helping me figure this out.
Cheers!

3 participants