rp2/thread: Fix memory corruption when thread is created in core1. #8310

Closed

Conversation

@yjchun
Contributor

yjchun commented Feb 16, 2022

This PR tries to fix #7124 and #7981, among others.

Many multicore issues have been reported in the forums, and some of them involve memory corruption. The problem I narrowed down is that the GC was not working correctly on core1. I tried to fix it, and so far the fix holds up in my own tests, but I am not sure whether I am handling something sensitive correctly.
I hope someone with knowledge of the GC can take a look.

Test code 1:

from machine import UART, Pin
import time
import _thread
import gc

def _test():
    count = 0
    uart = UART(0, 115200, parity=None, stop=1, bits=8, tx=Pin(0), rx=Pin(1), txbuf=32, timeout=10)
    while True:
        m = f'{count} ' * 10 + '\r\n'
        uart.write(m)
        count += 1
        if count % 200 == 0:
            print(f'count: {count}, mem_free({gc.mem_free()}).')
            #gc.collect()

def count_count():
    n = 1
    while True:
        print(gc.mem_free())
        #gc.collect()
        print("Count Count counts", n)
        n += 1
        time.sleep(1)

_thread.start_new_thread(_test, ())
count_count()

Test code 2:

import gc
import _thread

def test():
    for i in range(0x1800):
        print(gc.mem_free())
        bytes(0)
    print("finished")

_thread.start_new_thread(test, ())

@yjchun
Contributor Author

yjchun commented Feb 16, 2022

Updated the PR, as this test fails when both threads call gc.collect() aggressively: garbage is printed in the main thread.

from machine import UART, Pin
import time
import _thread
import gc

def _test():
    count = 0
    uart = UART(0, 115200, parity=None, stop=1, bits=8, tx=Pin(0), rx=Pin(1), txbuf=32, timeout=10)
    while True:
        m = f'{count} ' * 10 + '\r\n'
        uart.write(m)
        count += 1
        gc.collect()
        if count % 200 == 0:
            pass

def count_count():
    n = 1
    m = None
    while True:
        m = f'{n} ' * 10
        print(m)
        gc.collect()
        n += 1
        #time.sleep(1)
    return m

_thread.start_new_thread(_test, ())
count_count()

I put a mutex around GC to fix this, but I am still not really sure whether it is OK to scan core1's stack from core0. And if I guard mp_thread_gc_others() with if (get_core_num() == 1), it does not work.
Any opinions?
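
To illustrate the mutex idea, here is a rough sketch (not the actual patch) of serializing collections using MicroPython's portable mutex API from py/mpthread.h; the gc_collect() body follows the usual port pattern rather than the exact rp2 source, and gc_mutex is a name invented for this example:

#include "py/gc.h"
#include "py/mpthread.h"
#include "shared/runtime/gchelper.h"

// Sketch only: serialize collections so the two cores never mark/sweep
// the heap concurrently. gc_mutex would be initialised once at startup
// with mp_thread_mutex_init().
static mp_thread_mutex_t gc_mutex;

void gc_collect(void) {
    mp_thread_mutex_lock(&gc_mutex, 1);  // block until the other core finishes
    gc_collect_start();
    gc_helper_collect_regs_and_stack();  // scan this core's registers and stack
    mp_thread_gc_others();               // scan the other thread's roots
    gc_collect_end();
    mp_thread_mutex_unlock(&gc_mutex);
}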

@yjchun
Contributor Author

yjchun commented Feb 16, 2022

Sorry, yet another fix. The following test was failing:

from math import sin, cos, radians
import time

def calc():
    product = 1.0
    for counter in range(1, 1000, 1):
        for dex in list(range(1, 360, 1)):
            angle = radians(dex)
            product *= sin(angle)**2 + cos(angle)**2
    return product

def bench(number=10, repeat=5):
    results = []
    for i in range(repeat):
        result = []
        for n in range(number):
            t = time.ticks_ms()
            calc()
            elapsed = time.ticks_diff(time.ticks_ms(), t)
            result.append(elapsed)
        results.append(sum(result)/number/1000)

    results = sorted(results)
    return results


def test1(name):
    for n in range(3):
        r = bench(1, 1)
        print(f'{name}: {r[0]:.3f}')

import _thread
_thread.start_new_thread(test1, ('core1',))
test1('core0')

@jrullan

jrullan commented Feb 18, 2022

@yjchun thank you so much for helping to resolve this issue. Over in the RPi forums we were starting to lose faith that this issue would ever be taken up.

@yjchun
Contributor Author

yjchun commented Feb 19, 2022

I think I found the root cause of the problem: gc_collect_root() was simply not being called on core1's stack and arg.
I hope this is really the last update to this PR, and sorry for the frequent updates. I should have studied the GC properly before opening this PR.

@peterhinch
Contributor

This patch has fixed #7977.

@fivdi
Contributor

fivdi commented Feb 19, 2022

This patch has fixed #7977.

Excellent, that's good news.

@peterhinch
Contributor

After further testing with this patch I suspect there is still something wrong with the memory management.

I have simple uasyncio code running on core 0 and a minimal loop running on core 1. The two threads share a UART, but it is protected with a _thread.lock instance. If I disable the core 1 routine, everything works. Enabling it leads to non-deterministic behaviour on both cores. Putting a gc.collect() in the loop on core 1, and in a looping coroutine on core 0, radically alters the behaviour but it is still wrong.

Can anyone else report success or failure here?

@yjchun
Contributor Author

yjchun commented Feb 20, 2022

I think there are uncertainties about sharing HW resources between the cores. In the UART case, the UART uses an interrupt, and the UART buffer is not protected against the ISR: core1 writing to the UART buffer while core0's UART ISR modifies the same buffer can happen at the same time.
I suspect most other HW resources are not multicore-safe either.
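
For illustration only: a cross-core critical section on the RP2040 generally needs a hardware spin lock, because disabling interrupts only affects the current core. A minimal sketch using the pico-sdk API, with a hypothetical buffer standing in for the UART driver's internal one:

#include "hardware/sync.h"

// Hypothetical shared TX buffer standing in for the UART driver's own.
static volatile uint8_t tx_buf[64];
static volatile uint32_t tx_head;
static spin_lock_t *uart_lock;

void uart_buf_init(void) {
    // Claim one of the RP2040's hardware spin locks.
    uart_lock = spin_lock_init(spin_lock_claim_unused(true));
}

void uart_buf_put(uint8_t c) {
    // spin_lock_blocking() disables IRQs on this core and then spins for
    // the lock, so the section is safe against both the other core and
    // the UART ISR, provided the ISR takes the same lock.
    uint32_t save = spin_lock_blocking(uart_lock);
    tx_buf[tx_head % sizeof(tx_buf)] = c;
    tx_head++;
    spin_unlock(uart_lock, save);
}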

@peterhinch
Contributor

Ah, thanks for that. My code protects the UART with _thread.lock so hopefully writing to the buffer is safe (I only write, and then only a single character). The interrupt may be unsafe, because it may occur after the lock is released.

If the maintainers can identify which hardware devices cannot be safely shared, it really needs documenting as it will trip up numerous users.

I wonder if the PIO can safely be shared, with both threads writing to the FIFO.

@yjchun
Contributor Author

yjchun commented Feb 21, 2022

A little off-topic:
I haven't done any experiments, so this is only speculation, but not many Python objects are multicore-safe. As long as an object operation spans multiple cycles there can be a race condition unless an expensive thread lock is used. In the pthread world I used to take a lock even for a++, and I guess the same applies to rp2 threads (unlike CPython threads, which have the GIL and run on a single core only).
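
As a concrete example of that pthread habit (the names here are invented for illustration): even a bare increment is a load/modify/store sequence, so two threads can interleave and lose an update without the lock.

#include <pthread.h>

static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int a = 0;

// a++ is a read-modify-write: both threads can read the same old value
// and one increment is lost. The mutex serializes the sequence.
void increment(void) {
    pthread_mutex_lock(&counter_mutex);
    a++;
    pthread_mutex_unlock(&counter_mutex);
}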

@peterhinch, for HW resources, I agree we need to identify which devices can safely be shared. For your test case, it is still not clear whether it is the scenario I mentioned or a different problem. I would like to do more serious multicore testing, but it may take some time.

@peterhinch
Contributor

peterhinch commented Feb 21, 2022

I've found the problem: a concurrency bug in my own code. I now have shared write access to the UART working reliably. For anyone wanting to emulate this, a lock is essential; further, each thread only writes a single character to the UART. There may well be a concurrency issue with multi-byte writes, given the presumed behaviour of the interrupt.

Returning to the topic of this thread, my dual-core test runs indefinitely with no shared-memory issues. Without the patch my script rapidly fails.

Apologies for hijacking this thread with a fault which proved to be irrelevant.

@EliAaron

EliAaron commented Feb 24, 2022

This patch fixed my problem with playing WAV files on a second thread!

Now with this patch I can play WAV files on core1 and on core0 I can print output or wait for a user input and start/stop the wave-player accordingly.

Hope these changes get accepted and merged into the base branch quickly.

Thanks!

Details:
I tried playing WAV files on my Raspberry Pi Pico on a second thread using this project https://github.com/danjperron/PicoAudioPWM, but I could not get it to work properly. The playback got stuck after a second or two and the device became unresponsive. I tried commenting out different parts of the wavePlayer code to find the source of the problem. I found that starting the DMA (dma1.start()) caused a crash after a few seconds. I also found that commenting out most of the code except the WAV file reading (f.readframes(nbFrame)) caused a crash after about 30 seconds. Now with this patch it works smoothly!

@dpgeorge
Member

@yjchun the patch now looks very good, and I understand why it's needed: the stack (and arg) of core1 is itself a root pointer, not just the entries in it. So the bug was that the entire stack was being reclaimed by the GC.

With this patch the line gc_collect_root((void **)core1_stack, core1_stack_num_words); can now be removed because the scanning that it does has already been done by gc_collect_root((void **)&core1_stack, 1). If you agree, can you please remove that part of the code?
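
For readers following along, this is the shape of the fix as described here (a sketch reconstructed from this discussion, not copied from the merged commit):

void mp_thread_gc_others(void) {
    // core1_stack and core1_arg are pointers in the BSS, so the GC does
    // not see them as roots by itself. Passing their addresses marks the
    // heap blocks they point to (the thread's stack and its argument),
    // and marking a heap block also scans its contents recursively.
    gc_collect_root((void **)&core1_stack, 1);
    gc_collect_root((void **)&core1_arg, 1);
}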

@dpgeorge
Member

@peterhinch it looks like you don't have any additional issues that came out of the discussion here? If you do, please open other issues to record them.

@yjchun
Contributor Author

yjchun commented Feb 25, 2022

@dpgeorge OK, I will test it. But if I understand correctly, gc_helper_collect_regs_and_stack() was supposed to check core1_stack when run on core1, but it does not. That's why gc_collect_root((void **)&core1_stack, 1) is outside the if clause. Is there no problem with that?

@dpgeorge
Member

gc_helper_collect_regs_and_stack() was supposed to check core1_stack when run on core1, but it does not.

You are right, it is not checking the core1_stack pointer itself (only the elements it points to). This is because the pointer is stored in the BSS, which is not traced for root pointers by the GC.

@yjchun
Contributor Author

yjchun commented Feb 25, 2022

@dpgeorge Do we still need to call gc_helper_collect_regs_and_stack() on core1 then?
In a quick test, it works fine without calling gc_helper_collect_regs_and_stack() on core1.

@dpgeorge
Member

Do we still need to call gc_helper_collect_regs_and_stack() on core1 then?

We don't need to collect the stack in that function, because it's already being collected by the call to gc_collect_root((void **)&core1_stack, 1). But we do need to collect the registers.

If you leave that call there, it'll collect the stack twice, which is slightly inefficient. If you can remove it, that would be good, but it might be tricky.

@yjchun
Contributor Author

yjchun commented Feb 25, 2022

I tried a replacement for gc_helper_collect_regs_and_stack() on core1.
Will this gather the registers correctly? I tried it on core1, but my test program ran fine both with and without it; I couldn't construct a test case where an object is stored only in registers at the moment the GC runs.
Another problem is that both core0's and core1's registers need to be collected, not just those of the core running the GC.

#if MICROPY_PY_THREAD
// provided by gchelper_m0.s
uintptr_t gc_helper_get_regs_and_sp(uintptr_t *regs);

// Scan the current core's CPU registers for root pointers
// (gc_helper_regs_t comes from shared/runtime/gchelper.h).
MP_NOINLINE void gc_collect_regs(void) {
    gc_helper_regs_t regs;
    gc_helper_get_regs_and_sp(regs);
    gc_collect_root((void **)regs, sizeof(regs) / sizeof(uintptr_t));
}
#endif

@peterhinch
Contributor

@dpgeorge

it looks like you don't have any additional issues that came out of the discussion here?

That's correct. With the patch my application is rock-solid.

@dpgeorge
Member

Will this gather the registers correctly? I tried it on core1, but my test program ran fine both with and without it; I couldn't construct a test case where an object is stored only in registers at the moment the GC runs.

It's really hard to find such test cases. But, yes, that code looks like it'll collect/scan the registers correctly.

Another problem is that both core0's and core1's registers need to be collected, not just those of the core running the GC.

Yes, I also realised this. That's hard to do, and so far it doesn't seem to be needed... so let's just fix the issue of scanning &core1_stack and &core1_arg.

@yjchun
Contributor Author

yjchun commented Feb 25, 2022

So this may be my final version, then. The fix is already pushed; thanks for the support. For GC scanning of the registers, let me try that in another PR.

@dpgeorge
Member

dpgeorge commented Mar 1, 2022

Thanks for updating, it looks very good now. Squashed and merged in 2d47020. I also tested the snippets of code that previously broke, and they now work.

Great work @yjchun !

@dpgeorge closed this Mar 1, 2022
Successfully merging this pull request may close these issues.

rp2: crash and memory corruption with threading