pm: Fix atomicity of rtos idle lock acquiring (IDFGH-876) #3110
It's possible for esp_pm_impl_isr_hook(...) to be nested because interrupts can nest on the ESP32. To fix this, the lock acquisition needs to be placed in a critical section so that it cannot be nested. Otherwise, when nesting occurs, the system never releases the idle lock and never enters lower power states.

A sample backtrace encountering this, from commit d7a7a68 (the code was instrumented to enter a while(1) loop when the condition was hit, in order to capture this backtrace):

#0  leave_idle () at esp-idf/components/esp32/pm_esp32.c:444
#1  0x4008143a in esp_pm_impl_isr_hook () at esp-idf/components/esp32/pm_esp32.c:473
#2  0x40082750 in _xt_medint2 () at esp-idf/components/freertos/xtensa_vectors.S:1243
#3  0x4000bff0 in ?? ()
#4  0x40090bb0 in vTaskExitCritical (mux=0x3ffbd230) at esp-idf/components/freertos/tasks.c:4304
#5  0x40081758 in esp_pm_lock_acquire (handle=0x3ffbd218) at esp-idf/components/esp32/pm_locks.c:126
#6  0x40081399 in leave_idle () at esp-idf/components/esp32/pm_esp32.c:440
#7  0x4008143a in esp_pm_impl_isr_hook () at esp-idf/components/esp32/pm_esp32.c:473
#8  0x400826b8 in _xt_lowint1 () at esp-idf/components/freertos/xtensa_vectors.S:1154
#9  0x400d14b0 in esp_pm_impl_waiti () at esp-idf/components/esp32/pm_esp32.c:483
#10 0x400d2c77 in esp_vApplicationIdleHook () at esp-idf/components/esp32/freertos_hooks.c:63
#11 0x40091008 in prvIdleTask (pvParameters=0x0) at esp-idf/components/freertos/tasks.c:3412
#12 0x40090344 in vPortTaskWrapper (pxCode=0x40090ffc <prvIdleTask>, pvParameters=0x0) at esp-idf/components/freertos/port.c:143

Signed-off-by: Tim Nordell <tim.nordell@nimbelink.com>
Thanks for submitting this @tim-nordell-nimbelink! As we were running some stress tests with this change applied, we ran into a rare crash which is still being investigated. The fix itself looks correct, but we want to make sure we understand the cause of this new crash before we merge this. Will update when we have some progress.
Hi @igrr - Thanks for looking it over. Hopefully you can find the root cause of the new crash that you're observing! I can understand that a crash that happens rarely is hard to debug, and testing is always a good thing. Good luck!
@tim-nordell-nimbelink The issues I have previously observed were due to the fact that preemption may happen not only inside
Please check the attached patch; it moves the interrupt lock into
Hi @igrr - It should still maintain the fix for the issue I saw; I'm not running the update_ccompare() code here since I'm running on a single core for power consumption reasons. I do think I see a potential problem within on_freq_update(...) though - is it possible for both cores to run this code simultaneously? If so, then:
Another scenario:
Hitting this race would take enough interrupt activity at just the right time, but I suspect it's possible. I think you need a single lock shared by both cores (instead of each core locking independently) around the setting and clearing of s_ccount_div/s_ccount_mul. I'll never see this race condition myself since I'm running a single core. If you do need the multi-core lock, that might also allow the lock proposed in this pull request to stay where it is, since s_need_update_ccompare[...] is only set within on_freq_update(...).
@tim-nordell-nimbelink Sorry I hadn't noticed your comment before.
I think this situation should not happen due to the check at the top of
The patch attached above has been merged in d31ee80. Please let me know if you see any related issues. And thanks for contributing!
Follows the approach proposed in #3110, but masks the interrupts during the entire ISR hook, not only during leave_idle. Interrupt nesting during update_ccompare may also cause issues.
Closes #3057