Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TW#16112] nvs_get_str() seams to causes lock (reproducable on arduino-esp32) #1157

Closed
Curclamas opened this issue Oct 23, 2017 · 11 comments
Closed

Comments

@Curclamas
Copy link
Contributor

Description

This error is reproducible on custom boards as well as ESP32-DevBoard:
When nvs_get_str() is called and there is another task that runs on higher priority, the ESP32 can inadvertently lock up.

Reproduction

The error is described and reproduced in arduino-esp32-issue: espressif/arduino-esp32#740
It seams to be rooted in IDF though.

@igrr
Copy link
Member

igrr commented Oct 23, 2017

I think this was fixed in b013f5d, but I am not sure which IDF commit you are using. Can you please mention the commit ID?

@Curclamas
Copy link
Contributor Author

@igrr Now I've tried it with the latest IDF from arduino-esp32 which is 9274814 (should be after b013f5d). The device still freezes...

@Curclamas
Copy link
Contributor Author

One additional piece of Information: after several crashes it appears that the ESP32 does not boot. Only the following can than be seen in the log-output:

preferences.getString:  <<<---- Here it crashes
ets Jun  8 2016 00:22:57

rst:0x7 (TG0WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, ⸮  <<<---- Nothing below here

@FayeY FayeY changed the title nvs_get_str() seams to causes lock (reproducable on arduino-esp32) [TW#16112] nvs_get_str() seams to causes lock (reproducable on arduino-esp32) Oct 30, 2017
@igrr
Copy link
Member

igrr commented Oct 30, 2017

Could you please try the attached patch?

Based on my testing, this solves the issue you have found. There are still periods of time when loop_task doesn't run because coreTask is taking all the CPU time (for a few seconds), but there is no more deadlock, eventually TLS connection fails in coreTask, and loop_task runs again.

From 0e65e67480b5f5a6f452aa785a6ec64764a78866 Mon Sep 17 00:00:00 2001
From: Ivan Grokhotkov <ivan@espressif.com>
Date: Mon, 30 Oct 2017 18:53:39 +0800
Subject: [PATCH] spi_flash: raise priority of the task performing spi_flash
 operation

Fixes https://github.com/espressif/arduino-esp32/issues/740
---
 components/spi_flash/cache_utils.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/components/spi_flash/cache_utils.c b/components/spi_flash/cache_utils.c
index 53a0caf8e..cc1f56929 100644
--- a/components/spi_flash/cache_utils.c
+++ b/components/spi_flash/cache_utils.c
@@ -110,6 +110,11 @@ void IRAM_ATTR spi_flash_disable_interrupts_caches_and_other_cpu()
         assert(other_cpuid == 1);
         spi_flash_disable_cache(other_cpuid, &s_flash_op_cache_state[other_cpuid]);
     } else {
+        // Temporarily raise current task priority to prevent a deadlock while
+        // waiting for IPC task to start on the other CPU
+        TaskHandle_t self = xTaskGetCurrentTaskHandle();
+        int old_prio = uxTaskPriorityGet(self);
+        vTaskPrioritySet(self, configMAX_PRIORITIES - 1);
         // Signal to the spi_flash_op_block_task on the other CPU that we need it to
         // disable cache there and block other tasks from executing.
         s_flash_op_can_start = false;
@@ -121,6 +126,8 @@ void IRAM_ATTR spi_flash_disable_interrupts_caches_and_other_cpu()
         }
         // Disable scheduler on the current CPU
         vTaskSuspendAll();
+        // Can now set the priority back to the normal one
+        vTaskPrioritySet(self, old_prio);
         // This is guaranteed to run on CPU <cpuid> because the other CPU is now
         // occupied by highest priority task
         assert(xPortGetCoreID() == cpuid);
-- 
2.13.5 (Apple Git-94)

@Curclamas
Copy link
Contributor Author

Curclamas commented Oct 30, 2017

@igrr after this patch the board no longer freezes. But it seams that the HTTPS-Task eventually ceases to run, I have yet to find out where.

EDIT: Disabling the scheduler and changing the priority and back seams a little bit harsh/complicated. Wouldn't some kind of lock or a critical section also work here?

@igrr
Copy link
Member

igrr commented Dec 8, 2017

The scheduler needs to be disabled to prevent switching to other tasks during flash operations. Other tasks may be (and probably are) running code from cache, but the cache is disabled during flash operation. So a task switch during flash operation would likely cause an immediate crash.

Effectively, we want the task doing the flash operation to be temporary elevated to highest priority. There is no "normal" way to do this in FreeRTOS, using locks and critical sections only.

@Curclamas
Copy link
Contributor Author

Curclamas commented Dec 8, 2017 via email

@Curclamas
Copy link
Contributor Author

Curclamas commented Jun 22, 2018

@igrr While comparing my branch with master: this still does not seam to be upstream right?
Any news on this issue? Because I'd love to keep the differences between my branch and master small. Especially since I want to move to a long time support / release version soon.

@igrr
Copy link
Member

igrr commented Jun 25, 2018

Sorry for the delay @Curclamas, I've pushed this patch for review now. Should be in IDF master branch soon.

@Curclamas
Copy link
Contributor Author

Curclamas commented Jun 25, 2018 via email

@igrr igrr closed this as completed in f9f2937 Jul 4, 2018
@kealist
Copy link

kealist commented Aug 2, 2018

I know I am resurrecting this, but I don't see this change propageted into esp-idf and I am experiencing the same issue with the above fix resolving the issue.

catalinio pushed a commit to catalinio/pycom-esp-idf that referenced this issue Jun 28, 2019
The fix is for the situation when cache disabling mechanism causes
a deadlock with user tasks. Situation is as follows:

1. spi_flash operation is started from low-priority task on CPU0
2. It uses IPC to wake up high-priority IPC1 task on CPU1, preventing
   all other tasks on CPU1 from running. This is needed to safely
   disable the cache.
3. While the task which started spi_flash operation is waiting for IPC1
   task to acknowledge that CPU1 is not using cache anymore, it is
   preempted by a higher priority application task ("app0").
4. Task app0 busy-waits for some operation on CPU1 to complete. But
   since application tasks are blocked out by IPC1 task, this never
   happens. Since app0 is busy-waiting, the task doing spi flash
   operation never runs.

The more or less logical soltion to the problem would be to also do
cache disabling on CPU0 and the SPI flash operation itself from IPC0
task. However IPC0 task stack would need to be increased to allow doing
SPI flash operation (and IPC1 stack as well). This would waste some
memory. An alternative approach adopted in this fix is to call FreeRTOS
functions to temporary increase the priority of SPI flash operation task
to the same level as the IPC task.

Fixes espressif/arduino-esp32#740
Fixes espressif/esp-idf#1157
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants