Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serial Print Deadlock with Multicore #7

Closed
savejeff opened this issue Jun 10, 2022 · 21 comments
Closed

Serial Print Deadlock with Multicore #7

savejeff opened this issue Jun 10, 2022 · 21 comments

Comments

@savejeff
Copy link

savejeff commented Jun 10, 2022

Hi,

Ive found a Problem that seems to only happen with the PIO core and not with the Arduino Version.

The original issue (here) I opened in the earlephilhower/arduino-pico was closed.

I tested the code on multiple Windows 10 PCs with the same result

Here the relevant code. I've reduced the Pio project to its minimum
Demo_RP2040_Deadlock_Clean.zip

and this is the ulf that i get and doesn't work:
firmware.zip

As it works with Arduino IDE and not with PIO, might this be caused by optimization?

@maxgerhardt
Copy link
Owner

Thanks, I will check out it later in the debugger on why it locks up / what state it is in.

@savejeff
Copy link
Author

That would be awesome thanks!
I although that about using the Pico Probe to check what actually happens here, but i've not yet used it. The mbed core i used so far did not support it. if the arduino_pico core supports the pico probe well that would be another bonus.

@maxgerhardt
Copy link
Owner

In the new platform integration (platformio#36), PicoProbe debugging works with both ArduinoCore-mbed and Arduino-Pico as the used Arduino core. And I've also tested that this works with both cores, so if one core hits a breakpoint (or I just press the pause button), I see the exact execution state of both cores.

@savejeff
Copy link
Author

That sounds very good. I think I'll look into it over the weekend.

@earlephilhower
Copy link

I can repro the failure locally using PIO and the latest integration (just replaced the platform.ini with the lines given in that PR).

Both the RAM and flash sizes are different from when built in the IDE under any optimization option. I would expect flash sizes to differ w/optimization, but the RAM used normally does not change as those are simply global vars which are invariant to optimization levels.

I'm still learning PIO so I'll try to get the actual compile and link commands and see what differences I find (but no guarantees...only times I've used P.IO is for building Marlin and it was just "hit return")

@savejeff
Copy link
Author

savejeff commented Jun 10, 2022

I can really recommend PIO. I love it more every day I use it.

I'll try to get into step-by-step debugging with SWD and Pico Probe. maybe ill find the spot where it gets stuck.

@maxgerhardt what are the default optimization settings in PIO ? maybe the mutex's are copied inline. but that doesn't really explain the freez

@maxgerhardt
Copy link
Owner

What I see with the debugger: After 2 iterations, the second core gets into a hardfault while trying to de-allocate a String object.

grafik

Not yet sure why.

Maybe the String gets used beyond the time the log function is left (there is no Serial.flush() so it's probably still in its ringbuffer), or something else..

@maxgerhardt
Copy link
Owner

maxgerhardt commented Jun 10, 2022

The faulting instruction is a load-into-register instruction with the source being r7 + 4, but the r7 register is pointing at 0x4db85158, is that even valid RAM?

grafik

Heap corruption when the heap is concurrently used by e.g. the String functions?

Are heap free() and malloc() function guarded by a mutex and threadsafe?

Well GDB definitely can't read from that memory address..

x/1x 0x4db85158
Info : SWD DPIDR 0x0bc12477
Info : SWD DLPIDR 0x10000001
Error: Failed to read memory at 0x4db85158

@earlephilhower
Copy link

Some GOOG and some tr later, I found the difference.

The PIO build is running without any default optimization. The IDE uses -Os by default.

All tested optimization levels work in the IDE except for -O which is, IIRC, the GCC default when none is specified.

So, looks to be not PIO related, just GCC optimization and the core related. No idea what specifically at this point, but a simple workaround: add -Os to the build_flags = line.

@earlephilhower
Copy link

earlephilhower commented Jun 10, 2022

Also, the RAM and FLASH size differences are not real, just PIO accounting for things differently than the IDE's -size does. So no real worries other than the missing optimizaiton level and potential issue in the core itself (i.e. not needed to be tracked here)

@maxgerhardt
Copy link
Owner

I can confirm that with the latest version, by default there is indeed no -Os line. Not sure how I could have overlooked that -- though I think someone told me about it once.

arm-none-eabi-g++ -o .pio\build\pico\FrameworkArduino\main.cpp.o -c -std=gnu++17 -fno-exceptions -fno-rtti -Werror=return-type -march=armv6-m -mcpu=cortex-m0plus -mthumb -ffunction-sections -fdata-sections -DPLATFORMIO=60002 -DARDUINO_RASPBERRY_PI_PICO -DARDUINO_ARCH_RP2040 -DUSBD_MAX_POWER_MA=500 -DARDUINO=10810 -DARDUINO_ARCH_RP2040 -DF_CPU=133000000L -DBOARD_NAME="pico" -DCFG_TUSB_MCU=OPT_MCU_RP2040 -DUSB_VID=0x2e8a -DUSB_PID=0x000a "-DUSB_MANUFACTURER="Raspberry Pi"" -DUSB_PRODUCT="Pico" -DSERIALUSB_PID=0x000a [...] C:\Users\Max.platformio\packages\framework-arduinopico\cores\rp2040\main.cpp

(Cut out includes for brievity)

@savejeff
Copy link
Author

I can confirm that the first tests with build_flags = -O2 do run without a deadlock. it looks promising.

I'll run some more tests later!

@maxgerhardt
Copy link
Owner

maxgerhardt commented Jun 10, 2022

Patched by earlephilhower/arduino-pico#619.

However, it leaves a bad feeling: When compiling with -O0 or -Og (which it btw by default does when you start a debug session in PlatformIO), then there is a possible race in the free routine of the heap?

Default optimization is correct and should be done, but there might be a sleeper cell here that might blow up later. With optimization it might only reduce the chances of it happening, not fully eliminating it.

@maxgerhardt
Copy link
Owner

The String dealloc happens in jtask1, likely after the LOG_FUNC has been exited. The log function accepts the String by reference. Maybe it would make a difference if it accepted it by copy and the delete was mutex protected? Hm.

@savejeff
Copy link
Author

If i understand correctly: The String variable becomes out of scope and is deleted, but the Serial write function is still using the object (and not a copy of it)

Every copying in the logfunction would thus only move the scope around a bit but change not the problem itself

@earlephilhower
Copy link

If i understand correctly: The String variable becomes out of scope and is deleted, but the Serial write function is still using the object (and not a copy of it)

No, that can't be the case here. The Serial.write/etc. calls are all synchronous. For the USB code, it's just copying the string to the USB buffers and returning. You can free/delete the data you sent in any time after the call. A copy of the data is already stuffed in the USB transmit buffers at that point.

I think @maxgerhardt was thinking there might be a multithreaded unsafe malloc/free implementation so that the memory manager could end up giving out bad addresses for the newly allocated Strings.

@earlephilhower
Copy link

In any case, let's move this back to earlephilhower/arduino-pico#614 as it is not platform.io related.

@earlephilhower
Copy link

There really is a race condition that is unprotected. malloc and free are protected with __wrap mutex-protected functions.

realloc is not, though! There is no __wrap_realloc and hence 2 reallocs run at the same time and cause the heap to go crazy.

I need to see if I broke the pico-sdk or if I somehow missed including the wrapper in the release...

earlephilhower added a commit to earlephilhower/pico-sdk that referenced this issue Jun 12, 2022
Protect against heap corruption by mutex-protecting the realloc() call
(like malloc/free are already).

Fixes raspberrypi#863
Fixes maxgerhardt/platform-raspberrypi#7
Fixes earlephilhower/arduino-pico#614
@earlephilhower
Copy link

This can be closed. Latest pico-sdk has the patch applied while the full patch is a PR in the RPi repo.

@savejeff
Copy link
Author

Ah very nice!
Thanks for the thorough search and analysis. I even gained some insight ;]
Didn't think a simple "my print func causes freeze" would lead down this rabbit hole.

When will the latest pico-sdk be updated into the core?

@earlephilhower
Copy link

When will the latest pico-sdk be updated into the core?

The latest master has the patch already, and it'll go out to everyone on 1.3.2 (whenever the RPi team releases and assuming they accept it). Update and you should be good to go.

kilograham pushed a commit to raspberrypi/pico-sdk that referenced this issue Jun 17, 2022
Protect against heap corruption by mutex-protecting the realloc() call
(like malloc/free are already).

Fixes #863
Fixes maxgerhardt/platform-raspberrypi#7
Fixes earlephilhower/arduino-pico#614
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants