
When using flash_safe_execute, if the victim core does not respond within the timeout, it will be locked out forever #2454

@jwhitham

Description

My Pico program uses both cores. Here is the sequence of events that leads to the bug:

  • bootup: core 0: call flash_safe_execute_core_init
  • core 0: start code on core 1
  • ...normal operation for a while...
  • core 0: enter long-running interrupt handler
  • core 1: call flash_safe_execute with short timeout parameter
  • core 1: call multicore_lockout_start_block_until
  • core 1: send LOCKOUT_MAGIC_START and wait for an acknowledgment
  • core 1: timeout reached
  • core 1: multicore_lockout_start_block_until returns false and flash_safe_execute returns PICO_ERROR_TIMEOUT
  • core 0: leave long-running interrupt handler and enter multicore_lockout_handler
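The handshake behind this sequence can be replayed in a small host-side C model. This is a simplified, single-threaded sketch, not SDK code: the FIFOs are ring buffers, MODEL_MAGIC_START and MODEL_MAGIC_END are placeholder values standing in for the SDK's internal constants, and the long-running interrupt handler is represented by a hypothetical victim_busy flag, meaning core 0 does not service its FIFO before core 1 gives up. Because the real wait for LOCKOUT_MAGIC_END blocks with interrupts disabled, the model's victim_service_fifo returns early and reports VICTIM_LOCKED_OUT instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for this model only; not the SDK's constants. */
#define MODEL_MAGIC_START 0x1u
#define MODEL_MAGIC_END   0x2u
#define FIFO_DEPTH 8

typedef struct {
    uint32_t data[FIFO_DEPTH];
    int head;
    int count;
} fifo_t;

static void fifo_push(fifo_t *f, uint32_t v) {
    assert(f->count < FIFO_DEPTH); /* the model never overfills a FIFO */
    f->data[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
}

static bool fifo_pop(fifo_t *f, uint32_t *v) {
    if (f->count == 0) return false;
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

typedef enum { VICTIM_RUNNING, VICTIM_LOCKED_OUT } victim_state_t;

/* Victim side, modelled on multicore_lockout_handler: ack START, then wait
   for END. The real wait blocks; here we return while still locked out. */
static victim_state_t victim_service_fifo(victim_state_t s, fifo_t *rx, fifo_t *tx) {
    uint32_t word;
    while (fifo_pop(rx, &word)) {
        if (s == VICTIM_RUNNING && word == MODEL_MAGIC_START) {
            fifo_push(tx, MODEL_MAGIC_START); /* acknowledge the lockout */
            s = VICTIM_LOCKED_OUT;
        } else if (s == VICTIM_LOCKED_OUT && word == MODEL_MAGIC_END) {
            fifo_push(tx, MODEL_MAGIC_END);   /* acknowledge the release */
            s = VICTIM_RUNNING;
        } /* any other word is discarded */
    }
    return s;
}

/* Initiator side: push START and look for the acknowledgment.
   victim_busy == true models core 0 stuck in a long interrupt handler
   until the initiator's timeout has expired. */
static bool initiator_lockout_start(fifo_t *to_victim, fifo_t *from_victim,
                                    victim_state_t *victim, bool victim_busy) {
    fifo_push(to_victim, MODEL_MAGIC_START);
    if (!victim_busy)
        *victim = victim_service_fifo(*victim, to_victim, from_victim);
    uint32_t ack;
    /* On false the request is abandoned and END is never sent. */
    return fifo_pop(from_victim, &ack) && ack == MODEL_MAGIC_START;
}

/* Replays the sequence of events from the report. */
static victim_state_t replay_issue_sequence(void) {
    fifo_t to_core0 = {0}, to_core1 = {0};
    victim_state_t core0 = VICTIM_RUNNING;
    bool ok = initiator_lockout_start(&to_core0, &to_core1, &core0, true);
    assert(!ok); /* flash_safe_execute would report PICO_ERROR_TIMEOUT here */
    /* Core 0 leaves its long handler and finally services the FIFO. */
    core0 = victim_service_fifo(core0, &to_core0, &to_core1);
    return core0;
}

/* For contrast: core 0 answers in time, and END is sent afterwards. */
static victim_state_t replay_normal_sequence(void) {
    fifo_t to_core0 = {0}, to_core1 = {0};
    victim_state_t core0 = VICTIM_RUNNING;
    bool ok = initiator_lockout_start(&to_core0, &to_core1, &core0, false);
    assert(ok && core0 == VICTIM_LOCKED_OUT);
    fifo_push(&to_core0, MODEL_MAGIC_END); /* the "lockout end" step */
    core0 = victim_service_fifo(core0, &to_core0, &to_core1);
    return core0;
}
```

replay_issue_sequence() ends with core 0 in VICTIM_LOCKED_OUT, and nothing in the model will ever send it MODEL_MAGIC_END. replay_normal_sequence() shows the same handshake completing when core 0 answers before the timeout.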

Observed behaviour:

  • core 1 continues execution as usual
  • core 0 stays within multicore_lockout_handler forever, in the following loop:
            while (multicore_fifo_pop_blocking_inline() != LOCKOUT_MAGIC_END) {
                tight_loop_contents(); // not tight but endless potentially
            }
  • Interrupts on core 0 remain disabled, because it is waiting for multicore_lockout_end_block_until to be called on core 1.
  • The Pico stops responding to USB and to any other interrupts serviced on core 0.

Expected behaviour:

  • core 0: multicore_lockout_handler exits quickly as the lockout request has been abandoned

Analysis:
The timeout caused flash_safe_execute to abandon the attempt to lock out core 0, so LOCKOUT_MAGIC_END is never sent. Core 0 is not aware that the lockout request was abandoned and waits forever.
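Following this analysis, one conceivable mitigation is for the initiator to withdraw an abandoned request by sending the END message anyway, so that a victim arriving late is released at once. The sketch below tries that idea in a host-side model; it is a hypothetical illustration, not SDK code and not an actual fix. All names (MODEL_MAGIC_START, MODEL_MAGIC_END, initiator_lockout_start_withdrawing, and so on) are invented for the model. Because the model is single-threaded, it does not cover the real race in which core 0's acknowledgment arrives after the initiator has drained its receive FIFO; a real fix would have to handle that case as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for this model only; not the SDK's constants. */
#define MODEL_MAGIC_START 0x1u
#define MODEL_MAGIC_END   0x2u
#define FIFO_DEPTH 8

typedef struct {
    uint32_t data[FIFO_DEPTH];
    int head;
    int count;
} fifo_t;

static void fifo_push(fifo_t *f, uint32_t v) {
    assert(f->count < FIFO_DEPTH); /* the model never overfills a FIFO */
    f->data[(f->head + f->count) % FIFO_DEPTH] = v;
    f->count++;
}

static bool fifo_pop(fifo_t *f, uint32_t *v) {
    if (f->count == 0) return false;
    *v = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}

typedef enum { VICTIM_RUNNING, VICTIM_LOCKED_OUT } victim_state_t;

/* Victim side is unchanged: ack START, wait for END, ack END. */
static victim_state_t victim_service_fifo(victim_state_t s, fifo_t *rx, fifo_t *tx) {
    uint32_t word;
    while (fifo_pop(rx, &word)) {
        if (s == VICTIM_RUNNING && word == MODEL_MAGIC_START) {
            fifo_push(tx, MODEL_MAGIC_START);
            s = VICTIM_LOCKED_OUT;
        } else if (s == VICTIM_LOCKED_OUT && word == MODEL_MAGIC_END) {
            fifo_push(tx, MODEL_MAGIC_END);
            s = VICTIM_RUNNING;
        } /* any other word is discarded */
    }
    return s;
}

/* Hypothetical initiator that withdraws a request it has abandoned. */
static bool initiator_lockout_start_withdrawing(fifo_t *to_victim, fifo_t *from_victim,
                                                victim_state_t *victim, bool victim_busy) {
    uint32_t word;
    while (fifo_pop(from_victim, &word)) {
        /* drop acknowledgments left over from an earlier abandoned attempt */
    }
    fifo_push(to_victim, MODEL_MAGIC_START);
    if (!victim_busy)
        *victim = victim_service_fifo(*victim, to_victim, from_victim);
    if (fifo_pop(from_victim, &word) && word == MODEL_MAGIC_START)
        return true;
    fifo_push(to_victim, MODEL_MAGIC_END); /* timeout: withdraw the request */
    return false;
}

/* The report's sequence (first attempt times out), then a second attempt.
   Returns the second attempt's result and core 0's final state. */
static bool replay_with_withdrawal(bool second_attempt_busy, victim_state_t *core0_out) {
    fifo_t to_core0 = {0}, to_core1 = {0};
    victim_state_t core0 = VICTIM_RUNNING;
    bool ok = initiator_lockout_start_withdrawing(&to_core0, &to_core1, &core0, true);
    assert(!ok);
    core0 = victim_service_fifo(core0, &to_core0, &to_core1); /* core 0 arrives late */
    assert(core0 == VICTIM_RUNNING); /* released at once instead of hanging */
    ok = initiator_lockout_start_withdrawing(&to_core0, &to_core1, &core0,
                                             second_attempt_busy);
    if (second_attempt_busy)
        core0 = victim_service_fifo(core0, &to_core0, &to_core1);
    *core0_out = core0;
    return ok;
}
```

In this model a late core 0 acknowledges START, immediately receives END and returns to VICTIM_RUNNING. Draining stale acknowledgments before each attempt stops a later attempt from mistaking the leftover START acknowledgment for a fresh one while core 0 is still busy.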

Workarounds:
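Core 0 can be recovered by a second attempt to use flash_safe_execute. Alternatively, a very long timeout avoids the problem, because core 1 then keeps waiting until core 0 leaves its interrupt handler and acknowledges the lockout request.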
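The first workaround can be wrapped in a small helper that retries on timeout. This is a hedged sketch: safe_execute_with_retry and mock_safe_execute are hypothetical names, the function-pointer type only mirrors the shape of flash_safe_execute(func, param, enter_exit_timeout_ms), and the PICO_OK / PICO_ERROR_TIMEOUT fallbacks (0 and -2) exist only so that the block compiles on a host. A retry only helps once core 0 has actually reached multicore_lockout_handler, so every attempt can still time out if core 0 keeps interrupts disabled for the whole retry window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef PICO_OK
#define PICO_OK 0
#endif
#ifndef PICO_ERROR_TIMEOUT
#define PICO_ERROR_TIMEOUT (-2) /* value reported in this issue */
#endif

/* Same shape as flash_safe_execute(func, param, enter_exit_timeout_ms). */
typedef int (*safe_execute_fn)(void (*func)(void *), void *param, uint32_t timeout_ms);

/* Hypothetical helper: repeat the call while it times out, since a further
   attempt is reported to release a core left in multicore_lockout_handler.
   Any other result (success or a different error) is returned at once. */
static int safe_execute_with_retry(safe_execute_fn exec, void (*func)(void *),
                                   void *param, uint32_t timeout_ms, int max_attempts) {
    assert(exec != NULL && func != NULL);
    int rc = PICO_ERROR_TIMEOUT;
    for (int i = 0; i < max_attempts && rc == PICO_ERROR_TIMEOUT; i++)
        rc = exec(func, param, timeout_ms);
    return rc;
}

/* Host-side stand-in for flash_safe_execute, for testing only:
   times out mock_timeouts_left times, then runs func. */
static int mock_timeouts_left;
static int mock_calls;

static int mock_safe_execute(void (*func)(void *), void *param, uint32_t timeout_ms) {
    (void)timeout_ms;
    mock_calls++;
    if (mock_timeouts_left > 0) {
        mock_timeouts_left--;
        return PICO_ERROR_TIMEOUT;
    }
    func(param);
    return PICO_OK;
}

static void increment(void *p) { ++*(int *)p; }
```

On the Pico the call would look like safe_execute_with_retry(flash_safe_execute, my_flash_op, &data, 100, 3), where my_flash_op and data are placeholders for the application's own callback and state.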

Steps to reproduce
The attached project flash-lockout-timeout-bug.tar.gz contains a small program which reproduces the problem.

The two cores run in loops, with cycle counts in global variables for easier debugging. One core runs flash_safe_execute periodically, while the other disables interrupts periodically. When a call to flash_safe_execute on core Y coincides with disabled interrupts on core X, flash_safe_execute returns PICO_ERROR_TIMEOUT (-2). Unfortunately this also causes core X to get stuck in multicore_lockout_handler. If core X = core 1, then USB continues to work, and the status can be seen on the serial line (the cycle count for core X stops, while the cycle count for core Y continues to increment). If core X = core 0, then USB stops working (no interrupts) but a hardware debugger can still be used to observe the cycle counts and see where core 0 has become stuck.
