rp2/thread: Fix memory corruption when thread is created in core1. #8310
Updated the PR, as the test fails when both threads call `gc.collect()` aggressively: garbage is printed in the main thread.

```python
from machine import UART, Pin
import time
import _thread
import gc

def _test():
    count = 0
    uart = UART(0, 115200, parity=None, stop=1, bits=8, tx=Pin(0), rx=Pin(1), txbuf=32, timeout=10)
    while True:
        m = f'{count} ' * 10 + '\r\n'
        uart.write(m)
        count += 1
        gc.collect()
        if count % 200 == 0:
            pass

def count_count():
    n = 1
    m = None
    while True:
        m = f'{n} ' * 10
        print(m)
        gc.collect()
        n += 1
        # time.sleep(1)
    return m

_thread.start_new_thread(_test, ())
count_count()
```

I put a mutex around the GC to fix this, but I'm still not really sure if it is OK to manipulate core1's stack from core0. But if I put
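The GC mutex mentioned above lives in the C port, but the serialisation idea can be sketched at the Python level with `_thread`'s lock API (this is a hypothetical user-level analogue, not code from the patch; `worker`, `counter`, and `done` are made-up names):

```python
import _thread
import time

# Hypothetical analogue of the GC mutex: a single lock serialises the
# critical section so the two threads never execute it concurrently.
lock = _thread.allocate_lock()
counter = 0
done = [False]

def worker():
    global counter
    for _ in range(10000):
        with lock:          # only one thread at a time in the critical section
            counter += 1
    done[0] = True

_thread.start_new_thread(worker, ())
for _ in range(10000):
    with lock:
        counter += 1

while not done[0]:          # crude join; _thread has no join()
    time.sleep(0.01)

print(counter)              # 20000: no lost updates
```

The same pattern (acquire the lock, do the shared-state work, release) is what the port-level mutex does around collection.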
Sorry, yet another fix. The following test was failing:

```python
from math import sin, cos, radians
import time

def calc():
    product = 1.0
    for counter in range(1, 1000, 1):
        for dex in list(range(1, 360, 1)):
            angle = radians(dex)
            product *= sin(angle)**2 + cos(angle)**2
    return product

def bench(number=10, repeat=5):
    results = []
    for i in range(repeat):
        result = []
        for n in range(number):
            t = time.ticks_ms()
            calc()
            elapsed = time.ticks_diff(time.ticks_ms(), t)
            result.append(elapsed)
        results.append(sum(result)/number/1000)
    results = list(sorted(results))
    return results

def test1(name):
    for n in range(3):
        r = bench(1, 1)
        print(f'{name}: {r[0]:.3f}')

import _thread
_thread.start_new_thread(test1, ('core1',))
test1('core0')
```
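A side note on portability: `time.ticks_ms()` and `time.ticks_diff()` are MicroPython-specific. To run a similar timing loop under CPython, a small shim can stand in for them (the shim below is entirely hypothetical and, unlike the real API, does not handle tick wraparound):

```python
import time

# Hypothetical CPython stand-ins for MicroPython's millisecond tick API.
def ticks_ms():
    return int(time.monotonic() * 1000)

def ticks_diff(end, start):
    # The real MicroPython ticks_diff handles counter wraparound;
    # this shim assumes monotonic time and just subtracts.
    return end - start

t = ticks_ms()
time.sleep(0.05)
elapsed = ticks_diff(ticks_ms(), t)
print(elapsed)  # roughly 50 (milliseconds)
```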
@yjchun thank you so much for helping to resolve this issue. Over in the RPi forums we were starting to lose faith that this issue would ever be taken up.
I think I found the root cause of the problem: it was simply missing
This patch has fixed #7977.
Excellent, that's good news.
After further testing with this patch I suspect there is still something wrong with the memory management. I have a simple test. Can anyone else report success or failure here?
I think there are uncertainties about HW resource sharing among the cores. In the UART case, the UART uses an interrupt, and the UART buffer is not protected against the ISR: core1 writing to the UART buffer and the core0 UART ISR modifying the same buffer can happen at the same time.
Ah, thanks for that. My code protects the UART with a lock. If the maintainers can identify which hardware devices cannot be safely shared, it really needs documenting, as it will trip up numerous users. I wonder if the PIO can safely be shared, with both threads writing to the FIFO.
A little off-topic: @peterhinch, for HW resources, I agree we need to identify which HW devices can be shared. For your test case, it is still not clear whether it is the case I mentioned or a different problem. I'd like to do more serious tests with multicore, but it may take some time.
I've found the problem: a concurrency bug in my code. I now have shared write access to the UART working reliably. For anyone wanting to emulate this, a lock is essential. Further, each thread only writes a single character to the UART; there may well be a concurrency issue with multi-byte writes, given the presumed behaviour of the interrupt. Returning to the topic of this thread, my dual-core test runs indefinitely with no issues with shared memory. Without the patch my script rapidly fails. Apologies for hijacking this thread with a fault which proved to be irrelevant.
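The per-character, lock-guarded write pattern described above might look like the following sketch, with a plain list standing in for the UART TX stream (all names here are illustrative, not from the actual script):

```python
import _thread
import time

lock = _thread.allocate_lock()
out = []                     # stands in for the UART TX stream
done = [False]

def writer(tag, n):
    # Write n copies of tag, one character per lock acquisition,
    # mirroring the single-character-per-write discipline above.
    for _ in range(n):
        for ch in tag:
            with lock:
                out.append(ch)

def thread_entry():
    writer('b', 1000)
    done[0] = True

_thread.start_new_thread(thread_entry, ())
writer('a', 1000)
while not done[0]:           # crude join; _thread has no join()
    time.sleep(0.01)

print(len(out))              # 2000: characters interleave but none are lost
```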
This patch fixed my problem with playing WAV files on a second thread! With this patch I can play WAV files on core1, while on core0 I can print output or wait for user input and start/stop the wave player accordingly. I hope these changes get accepted and merged into the base branch quickly. Thanks!
@yjchun the patch now looks very good, and I understand why it's needed: the stack (and arg) of core1 is itself a root pointer, not just the entries in it. So the bug was that the entire stack was being reclaimed by the GC. With this patch the line
@peterhinch it looks like you don't have any additional issues that came out of the discussion here? If you do, please open other issues to record them.
@dpgeorge OK. I will test it. But if I understand correctly,
You are right, it is not checking
@dpgeorge Do we still need to call
We don't need to collect the stack in that function, because it's already being collected by the call to collect. If you leave that call there, it'll collect the stack twice and be slightly inefficient. If you can remove it, that would be good, but it might be tricky.
Tried a replacement for `gc_helper_collect_regs_and_stack()` on core1:

```c
#if MICROPY_PY_THREAD
// provided by gchelper_m0.s
uintptr_t gc_helper_get_regs_and_sp(uintptr_t *regs);

MP_NOINLINE void gc_collect_regs(void) {
    gc_helper_regs_t regs;
    gc_helper_get_regs_and_sp(regs);
    gc_collect_root((void **)regs, sizeof(regs) / sizeof(uint32_t));
}
#endif
```
That's correct. With the patch my application is rock-solid.
It's really hard to find such test cases. But, yes, that code looks like it'll collect/scan the registers correctly.
Yes, I also realised this. That's hard to do, and so far doesn't seem to be needed... so let's just fix the issue of scanning `&core1_stack` and `&core1_arg`.
So, this may be my final version then. The fix is already pushed. Thanks for the support. For the register GC scanning, let me try that in another PR.
This PR tries to fix #7124 and #7981, among others.
There were many issues reported about multicore in the forums, and some of them are memory corruption. The problem I narrowed down was that the GC was not working correctly on core1. I tried to fix this problem, and so far it is working fine in my own tests, but I am not sure whether I am touching something sensitive correctly.
I hope someone with knowledge of the GC can take a look.