rp2: Fix recursive atomic sections when core1 is active. #15264
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@           Coverage Diff           @@
##           master   #15264   +/- ##
=======================================
  Coverage   98.42%   98.42%
=======================================
  Files         161      161
  Lines       21248    21248
=======================================
  Hits        20914    20914
  Misses        334      334

☔ View full report in Codecov by Sentry.
Code size report:
Force-pushed from fef17c7 to 4c3c51c
Updated so the test is a bit more aggressive about testing the nesting of disable_irq.
Thanks, this looks like a necessary fix.

I'm pretty sure the reason we needed custom mutex+irq functions is still valid, so we can't separate them again. See dc2a4e3
Force-pushed from 4c3c51c to cfa55b4
That deadlock happens from taking the mutex before disabling interrupts, is that right? I don't think there's a similar deadlock from disabling interrupts before trying to take the mutex, because it's no longer possible for that core to be interrupted at the same point.

I think the main limitation of changing it to two calls is that if a core is waiting for the mutex, the current code restores interrupts each time around the loop so it doesn't starve interrupts on that core. If we disable interrupts before trying to take the mutex, then interrupts will remain disabled on that core until the mutex is taken. I think that's probably often a small period of time, but it could be longer in some cases.

However, it's a small enough piece of code that keeping it like it is seems like the best course of action. 👍
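For concreteness, here is a minimal sketch of the loop pattern described above, using pico-sdk's recursive mutex and interrupt primitives. This is an illustration of the idea, not the actual MicroPython source:

```c
#include "pico/mutex.h"
#include "hardware/sync.h"

// Sketch: try to take the mutex with interrupts disabled, but restore
// interrupts on every failed attempt so a core spinning on the mutex
// doesn't starve its own IRQs while it waits.
static uint32_t atomic_section_begin_sketch(recursive_mutex_t *mtx) {
    for (;;) {
        uint32_t irq_state = save_and_disable_interrupts();
        if (recursive_mutex_try_enter(mtx, NULL)) {
            return irq_state; // mutex held, interrupts still disabled
        }
        restore_interrupts(irq_state); // let pending IRQs run, then retry
    }
}
```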
Summary
mp_thread_begin_atomic_section() is expected to be recursive (i.e. for nested machine.disable_irq() calls, or if Python code calls disable_irq() and then the Python runtime calls mp_handle_pending() which also enters an atomic section to check the scheduler state).
On rp2, when not using core1, the atomic sections are recursive. However, when core1 was active (i.e. with _thread in use), a bug caused the core to live-lock if an atomic section recursed.
Adds a test case specifically for mutual exclusion and recursive atomic sections when using two threads. Without this fix the test immediately hangs on rp2.
This was found while testing a fix for micropython/micropython-lib#874 (but it's only a partial fix for that issue).
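For illustration, the nesting requirement looks like this at the C level. This is a sketch only; the uint32_t state-passing signatures are assumed from the rp2 port:

```c
#include <stdint.h>

// Assumed declarations (rp2 port style): begin returns the saved IRQ
// state, end restores it.
uint32_t mp_thread_begin_atomic_section(void);
void mp_thread_end_atomic_section(uint32_t state);

void nested_atomic_sketch(void) {
    uint32_t outer = mp_thread_begin_atomic_section(); // e.g. machine.disable_irq()
    uint32_t inner = mp_thread_begin_atomic_section(); // e.g. via mp_handle_pending()
    // Without this fix, the inner call live-locked when core1 was active.
    mp_thread_end_atomic_section(inner);
    mp_thread_end_atomic_section(outer);
}
```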
Testing

Ran the new thread/disable_irq.py test on the esp32 port and verified correct output (via mpremote run; thread tests are currently disabled on this port).

Trade-offs and Alternatives
recursive_mutex_enter_blocking is also compiled into the firmware, so it might be possible to call save_and_disable_interrupts and then recursive_mutex_enter_blocking in order to save a little code size. Not sure, though.
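A sketch of that alternative (hypothetical, not what this PR implements), with the interrupt-starvation caveat discussed in the review comments above:

```c
#include "pico/mutex.h"
#include "hardware/sync.h"

// Sketch of the alternative: disable interrupts first, then block on the
// mutex. Smaller code, but interrupts stay off on this core for as long
// as the other core holds the mutex.
static uint32_t atomic_section_begin_alt(recursive_mutex_t *mtx) {
    uint32_t irq_state = save_and_disable_interrupts();
    recursive_mutex_enter_blocking(mtx);
    return irq_state;
}
```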
This work was funded through GitHub Sponsors.