When using `flash_safe_execute`, if the victim core does not respond within the timeout, it will be locked out forever #2454

My Pico program uses two cores. Here is the sequence of events that leads to the bug:
- bootup: core 0: call `flash_safe_execute_core_init`
- core 0: start code on core 1
- ...normal operation for a while...
- core 0: enter long-running interrupt handler
- core 1: call `flash_safe_execute` with short timeout parameter
- core 1: call `multicore_lockout_start_block_until`
- core 1: send `LOCKOUT_MAGIC_START` and wait for an acknowledgment
- core 1: timeout reached
- core 1: `flash_safe_execute` returns `PICO_ERROR_TIMEOUT`
- core 0: leave long-running interrupt handler and enter `multicore_lockout_handler`

**Observed behaviour**: 
- core 1 continues execution as usual
- core 0 stays within `multicore_lockout_handler` forever, in the following loop:
```c
while (multicore_fifo_pop_blocking_inline() != LOCKOUT_MAGIC_END) {
    tight_loop_contents(); // not tight but endless potentially
}
```
- Interrupts are disabled, as core 0 is waiting for `multicore_lockout_end_block_until` to be called.
- Pico stops responding to USB and any other interrupts which are serviced on core 0.

**Expected behaviour**:
- core 0: `multicore_lockout_handler` exits quickly as the lockout request has been abandoned

**Analysis**: 
The timeout caused `flash_safe_execute` to abandon the attempt to lock out core 0, so `LOCKOUT_MAGIC_END` is never sent. Core 0 is not aware that the lockout request was abandoned and waits forever.
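To make the race easier to reason about, here is a minimal host-side model of the handshake. It is a sketch, not SDK code: `fifo_t`, `model_t`, `victim_service`, `initiator_handshake`, `replay_reported_sequence` and the two magic values are all hypothetical names and placeholders, and the "timeout" is modelled simply by the victim not servicing its FIFO before the initiator gives up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder magic words; the SDK's real values are not reproduced here. */
#define MAGIC_START 0x11111111u
#define MAGIC_END   0x22222222u

/* Tiny one-directional queue standing in for one inter-core FIFO. */
typedef struct {
    uint32_t words[8];
    int count;
} fifo_t;

static bool fifo_push(fifo_t *f, uint32_t w) {
    if (f->count == 8) return false;
    f->words[f->count++] = w;
    return true;
}

static bool fifo_pop(fifo_t *f, uint32_t *w) {
    if (f->count == 0) return false;
    *w = f->words[0];
    for (int i = 1; i < f->count; i++) f->words[i - 1] = f->words[i];
    f->count--;
    return true;
}

typedef struct {
    fifo_t to_victim;     /* initiator -> victim */
    fifo_t to_initiator;  /* victim -> initiator */
    bool victim_locked;   /* true while the victim waits for MAGIC_END */
} model_t;

/* Victim side: called whenever the victim is able to look at its FIFO. */
static void victim_service(model_t *m) {
    uint32_t w;
    while (fifo_pop(&m->to_victim, &w)) {
        if (!m->victim_locked && w == MAGIC_START) {
            m->victim_locked = true;                     /* enter lockout */
            fifo_push(&m->to_initiator, (uint32_t)~MAGIC_START);
        } else if (m->victim_locked && w == MAGIC_END) {
            m->victim_locked = false;                    /* leave lockout */
            fifo_push(&m->to_initiator, (uint32_t)~MAGIC_END);
        }
        /* Any other word is discarded. */
    }
}

/* Initiator side: send a magic word and look for its acknowledgment.
 * victim_busy = true models the victim not responding before the timeout. */
static bool initiator_handshake(model_t *m, uint32_t magic, bool victim_busy) {
    uint32_t w;
    fifo_push(&m->to_victim, magic);
    if (!victim_busy) victim_service(m);
    while (fifo_pop(&m->to_initiator, &w)) {
        if (w == (uint32_t)~magic) return true;
    }
    return false; /* modelled timeout: request abandoned */
}

/* Replays the reported sequence; returns true if the victim ends up stuck. */
static bool replay_reported_sequence(model_t *m) {
    bool acked = initiator_handshake(m, MAGIC_START, true); /* victim in long ISR */
    assert(!acked);           /* initiator times out and gives up */
    victim_service(m);        /* victim leaves its ISR and sees the stale START */
    return m->victim_locked;  /* nobody will ever send MAGIC_END */
}
```

In this model, `replay_reported_sequence` leaves `victim_locked` set with nothing queued to clear it. A later start/end handshake pair releases the victim, which matches the recovery described under the workarounds.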


**Workarounds:**
Core 0 can be recovered by a second attempt to use `flash_safe_execute`. Alternatively, using a very long timeout will avoid the problem.
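The retry workaround could be wrapped in a small helper along these lines. This is only a sketch: `retry_on_timeout`, `attempt_fn`, `fake_attempt` and `ERR_TIMEOUT` are hypothetical names, and `ERR_TIMEOUT` merely mirrors the `PICO_ERROR_TIMEOUT` (-2) value mentioned in this report. On the device, the attempt callback would be a thin wrapper that calls `flash_safe_execute` with the given timeout.

```c
#include <stdint.h>

/* Mirrors PICO_ERROR_TIMEOUT (-2) as reported; hypothetical local name. */
#define ERR_TIMEOUT (-2)

/* One attempt at the operation; returns 0 on success or a negative error.
 * On the device this would wrap a call to flash_safe_execute. */
typedef int (*attempt_fn)(void *ctx, uint32_t timeout_ms);

/* Repeats the attempt while it reports a timeout, up to max_attempts times.
 * Returns the result of the last attempt, or ERR_TIMEOUT if max_attempts <= 0. */
static int retry_on_timeout(attempt_fn attempt, void *ctx,
                            uint32_t timeout_ms, int max_attempts) {
    int rc = ERR_TIMEOUT;
    for (int i = 0; i < max_attempts && rc == ERR_TIMEOUT; i++) {
        rc = attempt(ctx, timeout_ms);
    }
    return rc;
}

/* Host-side stand-in for testing: ctx points to the number of timeouts
 * to simulate before the attempt succeeds. */
static int fake_attempt(void *ctx, uint32_t timeout_ms) {
    int *timeouts_left = (int *)ctx;
    (void)timeout_ms;
    if (*timeouts_left > 0) {
        (*timeouts_left)--;
        return ERR_TIMEOUT;
    }
    return 0;
}
```

Note that this only hides the problem: if every retry also times out, the other core is still left stuck in `multicore_lockout_handler`.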

**Steps to reproduce**
The attached project flash-lockout-timeout-bug.tar.gz contains a small program which reproduces the problem.

The two cores run in loops, with cycle counts in global variables for easier debugging. One core runs `flash_safe_execute` periodically, while the other disables interrupts periodically. When a call to `flash_safe_execute` on core Y coincides with disabled interrupts on core X, `flash_safe_execute` returns `PICO_ERROR_TIMEOUT` (-2). Unfortunately this also causes core X to get stuck in `multicore_lockout_handler`. If core X = core 1, then USB continues to work, and the status can be seen on the serial line (the cycle count for core X stops, while the cycle count for core Y continues to increment). If core X = core 0, then USB stops working (no interrupts) but a hardware debugger can still be used to observe the cycle counts and see where core 0 has become stuck.
