Description
My Pico program uses two cores. Here is a sequence of events leading to the bug:
- bootup: core 0: call `flash_safe_execute_core_init`
- core 0: start code on core 1
- ...normal operation for a while...
- core 0: enter long-running interrupt handler
- core 1: call `flash_safe_execute` with short timeout parameter
- core 1: call `multicore_lockout_start_block_until`
- core 1: send `LOCKOUT_MAGIC_START` and wait for an acknowledgment
- core 1: timeout reached
- core 1: `flash_safe_execute` returns `PICO_ERROR_TIMEOUT`
- core 0: leave long-running interrupt handler and enter `multicore_lockout_handler`
Observed behaviour:
- core 1 continues execution as usual
- core 0 stays within `multicore_lockout_handler` forever, in the following loop:
  ```c
  while (multicore_fifo_pop_blocking_inline() != LOCKOUT_MAGIC_END) {
      tight_loop_contents(); // not tight but endless potentially
  }
  ```
- Interrupts are disabled, as core 0 is waiting for `multicore_lockout_end_block_until` to be called.
- Pico stops responding to USB and any other interrupts which are serviced on core 0.
Expected behaviour:
- core 0: `multicore_lockout_handler` exits quickly as the lockout request has been abandoned
Analysis:
The timeout caused flash_safe_execute to abandon the attempt to lock out core 0, so LOCKOUT_MAGIC_END is never sent. Core 0 is not aware that the lockout request was abandoned and waits forever.
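To make the race easier to follow, here is a minimal host-side model of the handshake as I understand it. It is not SDK code: `fifo_t`, `model_t`, `victim_step`, `request_lockout`, and the `MAGIC_*` values are hypothetical stand-ins, and each core is reduced to a step function. It only illustrates that a requester which gives up without sending the end marker leaves the other core parked once that core finally services the request.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for LOCKOUT_MAGIC_START / LOCKOUT_MAGIC_END. */
enum { MAGIC_START = 1, MAGIC_END = 2 };
enum { FIFO_DEPTH = 8 };

/* Tiny queue standing in for one direction of the inter-core FIFO. */
typedef struct { uint32_t buf[FIFO_DEPTH]; int head, tail; } fifo_t;

static void fifo_push(fifo_t *f, uint32_t v) { f->buf[f->tail++ % FIFO_DEPTH] = v; }

static bool fifo_pop(fifo_t *f, uint32_t *v) {
    if (f->head == f->tail) return false;   /* empty */
    *v = f->buf[f->head++ % FIFO_DEPTH];
    return true;
}

typedef enum { VICTIM_RUNNING, VICTIM_LOCKED_OUT } victim_state_t;

typedef struct {
    fifo_t to_victim, to_requester;
    bool victim_irqs_masked;   /* victim is inside a long interrupt handler */
    victim_state_t victim;
} model_t;

/* One time step of the core being locked out (core 0 in the issue). */
static void victim_step(model_t *m) {
    uint32_t v;
    if (m->victim_irqs_masked) return;      /* request stays pending in the FIFO */
    if (!fifo_pop(&m->to_victim, &v)) return;
    if (m->victim == VICTIM_RUNNING && v == MAGIC_START) {
        fifo_push(&m->to_requester, MAGIC_START);   /* acknowledge */
        m->victim = VICTIM_LOCKED_OUT;              /* now waits for MAGIC_END */
    } else if (m->victim == VICTIM_LOCKED_OUT && v == MAGIC_END) {
        fifo_push(&m->to_requester, MAGIC_END);
        m->victim = VICTIM_RUNNING;
    }
}

/* Requester side (core 1): send START, wait up to timeout_steps for the ack.
   On timeout it simply gives up and sends nothing further. */
static bool request_lockout(model_t *m, int timeout_steps) {
    uint32_t v;
    fifo_push(&m->to_victim, MAGIC_START);
    for (int i = 0; i < timeout_steps; i++) {
        victim_step(m);
        if (fifo_pop(&m->to_requester, &v) && v == MAGIC_START) return true;
    }
    return false;
}

/* Replays the sequence from the issue; returns true if the victim ends up
   locked out even though the requester timed out and moved on. */
bool victim_stuck_after_timeout(void) {
    model_t m = {0};
    m.victim_irqs_masked = true;             /* core 0 enters long handler */
    bool locked = request_lockout(&m, 3);    /* core 1 times out */
    m.victim_irqs_masked = false;            /* core 0 leaves handler */
    for (int i = 0; i < 100; i++) victim_step(&m);
    return !locked && m.victim == VICTIM_LOCKED_OUT;
}
```

In this model the stale `MAGIC_START` stays queued after the timeout, so the victim acts on a request nobody is waiting for any more. A fix would presumably need some way to withdraw or invalidate the request, but how that should look in the SDK is an open question.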
Workarounds:
Core 0 can be recovered by a second attempt to use flash_safe_execute. Alternatively, using a very long timeout will avoid the problem.
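The two workarounds can be combined in a small wrapper. The sketch below is an assumption about how one might do that, not something from the SDK: `safe_exec_with_recovery`, `safe_exec_fn`, and `SKETCH_ERROR_TIMEOUT` are hypothetical names. The function pointer has the same shape as `flash_safe_execute(func, param, enter_exit_timeout_ms)`, so on a Pico `flash_safe_execute` itself could be passed in; `fake_exec` is only a host-side test double so the logic can be tried without hardware.

```c
#include <stdint.h>

/* Value reported in this issue for PICO_ERROR_TIMEOUT; on a Pico, use the
   SDK macro instead of this hypothetical constant. */
#define SKETCH_ERROR_TIMEOUT (-2)

/* Same shape as flash_safe_execute(func, param, enter_exit_timeout_ms). */
typedef int (*safe_exec_fn)(void (*func)(void *), void *param, uint32_t timeout_ms);

/* Hypothetical helper: try with the short timeout first; if that times out,
   retry once with a long timeout so the other core gets released again. */
int safe_exec_with_recovery(safe_exec_fn exec, void (*func)(void *), void *param,
                            uint32_t short_timeout_ms, uint32_t long_timeout_ms) {
    int rc = exec(func, param, short_timeout_ms);
    if (rc == SKETCH_ERROR_TIMEOUT) {
        rc = exec(func, param, long_timeout_ms);
    }
    return rc;
}

/* Host-side test double: pretends the other core is busy for 100 ms. */
static int fake_calls;

static int fake_exec(void (*func)(void *), void *param, uint32_t timeout_ms) {
    fake_calls++;
    if (timeout_ms < 100) return SKETCH_ERROR_TIMEOUT;
    func(param);
    return 0;   /* success */
}

/* Example payload: increments the int that param points to. */
static void bump(void *param) { ++*(int *)param; }
```

Note that the retry runs `func` after all, so this only fits callers that still want the flash operation to happen. It also blocks for up to the long timeout, which may not be acceptable everywhere.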
Steps to reproduce
The attached project flash-lockout-timeout-bug.tar.gz contains a small program which reproduces the problem.
The two cores run in loops, with cycle counts in global variables for easier debugging. One core runs flash_safe_execute periodically, while the other disables interrupts periodically. When a call to flash_safe_execute on core Y coincides with disabled interrupts on core X, flash_safe_execute returns PICO_ERROR_TIMEOUT (-2). Unfortunately this also causes core X to get stuck in multicore_lockout_handler. If core X = core 1, then USB continues to work, and the status can be seen on the serial line (the cycle count for core X stops, while the cycle count for core Y continues to increment). If core X = core 0, then USB stops working (no interrupts) but a hardware debugger can still be used to observe the cycle counts and see where core 0 has become stuck.