
Unclear micro ros FreeRTOS Task stack consumption and consequences for stack monitoring in FreeRTOS #110

Closed
DrMarkusKrug opened this issue Aug 2, 2023 · 41 comments


@DrMarkusKrug

I would like to understand the task stack consumption of micro-ROS for the task that contains the node initialization calls (rclc_support_init() and rclc_node_init_default()).
I observed that regardless of how much stack I configure for this task, uxTaskGetStackHighWaterMark() reports zero bytes left after calling rclc_support_init(). As a consequence, the FreeRTOS stack monitoring mechanism does not work (it requires at least 16 bytes to remain untouched at the end of the stack).
Is this really the expected behaviour of micro-ROS stack allocation? If so, a programmer has to make sure that all variables belonging to this task are declared and allocated before calling rclc_support_init(). Additionally, FreeRTOS stack monitoring cannot be used anywhere in the application, which is a really critical point from my point of view.

Did I miss anything?
Best Regards Markus
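Some background on the measurement itself may help here. When stack filling is enabled (for example when INCLUDE_uxTaskGetStackHighWaterMark is set or configCHECK_FOR_STACK_OVERFLOW is 2), FreeRTOS fills each new task stack with the byte 0xA5. uxTaskGetStackHighWaterMark() then counts how much of that pattern is still intact at the end of the stack the task grows toward, and the method-2 overflow check inspects only the last 16 bytes. The host-side model below is a simplified, hypothetical sketch of that idea (the model_* names, the byte-based counting and the downward-growing layout are assumptions; the kernel itself reports the result in words). It illustrates that a result of zero only means the pattern at the far end was overwritten, whether by genuine stack use or by some other writer, such as a heap block placed on top of the stack.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdbool.h>

#define FILL_BYTE 0xA5u          /* same value FreeRTOS uses (tskSTACK_FILL_BYTE) */
#define OVERFLOW_CHECK_BYTES 16u /* bytes inspected by the method-2 style check */

/* Hypothetical model of a task stack that grows downward:
 * index 0 is the lowest address, i.e. the end the stack grows toward. */
typedef struct {
    uint8_t *mem;
    size_t size;
} model_stack_t;

/* Mark the whole stack with the fill pattern, as the kernel does at task creation. */
static void model_stack_init(model_stack_t *s)
{
    memset(s->mem, FILL_BYTE, s->size);
}

/* Simulate a call chain that touches `depth` bytes from the top of the stack. */
static void model_stack_use(model_stack_t *s, size_t depth)
{
    if (depth > s->size) {
        depth = s->size;
    }
    memset(s->mem + (s->size - depth), 0x00, depth);
}

/* High-water mark in bytes: untouched fill bytes counted from the low end. */
static size_t model_high_water_mark(const model_stack_t *s)
{
    size_t n = 0;
    while (n < s->size && s->mem[n] == FILL_BYTE) {
        n++;
    }
    return n;
}

/* Method-2 style check: true if the last 16 bytes still hold the pattern. */
static bool model_overflow_check_ok(const model_stack_t *s)
{
    for (size_t i = 0; i < OVERFLOW_CHECK_BYTES && i < s->size; i++) {
        if (s->mem[i] != FILL_BYTE) {
            return false;
        }
    }
    return true;
}
```

With a 256-byte model, touching 100 bytes leaves a high-water mark of 156, while a single stray write near the low end drops it to almost zero even though little real stack was used.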

@pablogs9
Member

pablogs9 commented Aug 2, 2023

Usually, micro-ROS uses a considerable amount of stack. How much are you configuring?

@DrMarkusKrug
Author

I configured 700 in CubeMX (that is 2800 bytes). I chose this based on the figures provided on the micro-ROS website and some other discussions here.
However, I cannot really check how much is actually needed in my application, because it always seems to consume the maximum that is configured. If I configure too little, it simply crashes because of stack corruption. So I am trying to find a value that is high enough without wasting stack memory. This trial-and-error approach does not make me happy and always leaves some uncertainty.
Therefore I would like to learn more about how micro-ROS consumes the available stack, or get a patch for micro-ROS that uses only the stack memory it requires rather than the maximum available.

@pablogs9
Member

pablogs9 commented Aug 2, 2023

Try to configure > 20 kB of stack for the micro-ROS task

@DrMarkusKrug
Author

Well, I can do that, but what is the rationale behind it? Why is there no way to check how much stack is really needed?

@pablogs9
Member

pablogs9 commented Aug 2, 2023

You can set a higher stack, and once you have the application properly running, you will be able to check the max stack consumption. Then you will be able to tune to the minimum required.
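A worked sketch of that tuning step, under stated assumptions: the helper name and the 25 % margin below are illustrative, not a micro-ROS or FreeRTOS recommendation. uxTaskGetStackHighWaterMark() reports the minimum free stack in words (StackType_t units, 4 bytes each on a Cortex-M), which is also the unit CubeMX uses, so 700 words correspond to 2800 bytes. A high-water mark of 0 is treated as "unknown", because the task may already have overflowed.

```c
#include <stddef.h>

/* Hypothetical helper: suggest a task stack depth (in words) from a measurement.
 *   configured_words: depth currently passed to CubeMX / xTaskCreate()
 *   hwm_words:        uxTaskGetStackHighWaterMark() after a full run, including init
 *   margin_percent:   headroom added on top of the measured peak
 * Returns 0 when the measurement cannot be trusted. */
static size_t suggest_stack_words(size_t configured_words, size_t hwm_words,
                                  unsigned margin_percent)
{
    if (hwm_words == 0u) {
        return 0u; /* fill pattern fully consumed: possible overflow, enlarge and re-measure */
    }
    if (hwm_words >= configured_words) {
        return configured_words; /* nothing measurable used: keep the current size */
    }
    size_t used_words = configured_words - hwm_words;
    /* Round the margin up so that small stacks still get some headroom. */
    size_t margin_words = (used_words * margin_percent + 99u) / 100u;
    return used_words + margin_words;
}
```

For example, a task configured with 5600 words that reports a high-water mark of 4000 has used 1600 words; with a 25 % margin the helper suggests 2000 words (8000 bytes).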

@DrMarkusKrug
Author

How can I check the actual stack consumption? The function uxTaskGetStackHighWaterMark() always returns the maximum of the available stack.

@pablogs9
Member

pablogs9 commented Aug 2, 2023

uxTaskGetStackHighWaterMark, as stated in the documentation, returns "the minimum amount of remaining stack space that was available to the task since the task started executing": https://www.freertos.org/uxTaskGetStackHighWaterMark.html

So, if it is returning the maximum value, your task is not using stack at all.

@DrMarkusKrug
Author

Sorry, I think I did not explain that well, so I am attaching pictures:

Screenshot from 2023-08-03 13-16-19
This is just before the micro-ROS parts get initialized. The code runs in a task with a stack size of 2800 words. At that point 45 of them are used, which is in line with the variable and task explorer.

Screenshot from 2023-08-03 13-18-30
After calling rclc_support_init(), the available stack drops to 0. That is very likely an overflow, even though the stack size is already 11200 bytes.

This happens even if I allow 20000 bytes for this task. Is this really the intended behaviour?

Best Regards
Markus

@pablogs9
Member

pablogs9 commented Aug 4, 2023

No, this is not the intended behavior. If you set more than 20 kB, is it reproducible?

@DrMarkusKrug
Author

DrMarkusKrug commented Aug 4, 2023 via email

@DrMarkusKrug
Author

Hello,
I did the following:

Preparation:

  1. Recompiled the micro-ROS library with the original meta parameters (which resulted in a 30 MB library file), just to make sure the micro-ROS API calls are in a defined state.
  2. Changed the stack size of the task that does all the micro-ROS initialization to 22400 bytes (about the maximum I can afford on the microcontroller I'm using).

Result:

  • Concerning stack usage, I got the same result as last week. The attached screenshot shows part of my task list after I paused the application in the debugger. You can see that the task ROSSpin (which initializes the publishers/subscribers and the node) consumes all of its stack, which is already a good hint of a stack overflow.
  • I added FreeRTOS timers to my application. Because of the tool used (STM32CubeIDE), they are initialized before the tasks. Now I can prove that even with 22400 bytes of stack the micro-ROS initialization causes a stack overflow: the stack area of the timers, which is adjacent to the ROSSpin task, gets corrupted at the point where the micro-ROS node is initialized.
    This is really weird, because the micro-ROS initialization should fail if there is not enough space left on the stack, indicated by an appropriate return value of rclc_support_init(). Ultimately the FreeRTOS malloc function is used by the micro-ROS API, and that function does provide this service. So either its result is ignored, or something in the micro-ROS library code allocates additional memory that is not covered by the malloc call.
  • My application 'pretends' to run as expected, but only because I rearranged it so that all tasks start using their stack after the micro-ROS library initialization, and because I manually changed the order of FreeRTOS task/timer/queue/event initialization, which I do not think is a good idea. What worries me even more is that I still have no idea about the actual stack consumption of the micro-ROS initialization. During run time it seems to be less than 20 kB (I already tested 2 kB and it looks 'alive'). However, during initialization it is more than 22400 bytes.

I'm even more surprised that this hasn't been reported by other programmers so far. Did I make a mistake somewhere and fall into a pitfall that others avoid, or have I overlooked something?

Screenshot from 2023-08-07 11-50-16
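One way to test the hypothesis in the second bullet, that something uses memory outside what the allocator accounts for, is to compare every block the allocator returns against the address range of the task stack. The sketch below is a hypothetical debug aid (addr_range_t and block_hits_stack() are illustrative names, not part of FreeRTOS or micro-ROS). On target, the stack range would come from wherever the project defines the task's stack buffer, and the check would be called from inside the allocator wrapper, for example to trigger a breakpoint.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Half-open address range [start, end). */
typedef struct {
    uintptr_t start;
    uintptr_t end;
} addr_range_t;

/* True if the two half-open ranges share at least one byte. */
static bool ranges_overlap(addr_range_t a, addr_range_t b)
{
    return a.start < b.end && b.start < a.end;
}

/* Hypothetical debug hook: does the block [ptr, ptr + size) intersect the task stack? */
static bool block_hits_stack(const void *ptr, size_t size, addr_range_t stack)
{
    if (ptr == NULL || size == 0) {
        return false; /* failed or empty allocation cannot overlap anything */
    }
    uintptr_t p = (uintptr_t) ptr;
    addr_range_t block = { p, p + size };
    return ranges_overlap(block, stack);
}
```

Using half-open ranges means a block that ends exactly where the stack begins is correctly reported as not overlapping.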

@pablogs9
Member

pablogs9 commented Aug 7, 2023

Take a look at the profiling results in our testbench: https://github.com/micro-ROS/micro_ros_renesas_testbench/actions/runs/5781727309


It may be a good idea to check the stack consumption when the transport open function is called for the first time. This gives a good overview of the real stack consumption of the micro-ROS stack itself, without the transport.

I'm not sure which transport you are using, and I'm also not sure how much stack the ST USB or UART stack requires to work.

@DrMarkusKrug
Author

Hi,
I'm using the serial transport, so it should be slim in comparison to USB or Ethernet.
I can certainly try to work around the stack overflow issue. However, my main issue remains: something in the rclc_support_init() call temporarily uses more stack than permitted, at least up to the 22400 bytes I tested. Isn't that something the micro-ROS library maintainers should investigate further?

Best Regards
Markus

@pablogs9
Member

pablogs9 commented Aug 7, 2023

We, as maintainers of micro-ROS, can take a look at your issue and fix it if it turns out to be a problem in the micro-ROS stack. In any case, we will need some details about your project (your code, configuration, target hardware, etc.) to try to replicate this, because as far as we have tested, the stack consumption is within the known limits.

@DrMarkusKrug
Author

Let me know where I should upload my project. For testing you need an STM32L4 board (or probably any other STM32 board, because I don't think it is related to that microcontroller).
I investigated the function rclc_support_init() further and found that the stack overflows at line 39 (see attached screenshot). I also did some assembler-level debugging in this function and found that just before the overflow the available stack is almost consumed (only 80 bytes left). Interestingly, if I increase or decrease the task stack, the remaining 80 bytes stay constant. A couple of assembler instructions later, the stack overflow occurs. You can see the corresponding assembler code. It makes no sense to me why this code causes the problem; it looks pretty much OK to me.
Screenshot from 2023-08-07 13-03-45
Screenshot from 2023-08-07 13-24-29

@pablogs9
Member

pablogs9 commented Aug 7, 2023

Are you setting the micro-ROS allocators? You can share your main file right here so we can take a first look.

@DrMarkusKrug
Author

DrMarkusKrug commented Aug 7, 2023

This is the function that does all the micro-ROS initialization. The variable support is defined as a global variable because it is used in other files/tasks as well.

void micro_ros_setup(void)
{
    volatile size_t temp;  /* debug probe, inspected in the debugger */

    /* Set up the micro-ROS custom serial transport on UART2 */
    rmw_uros_set_custom_transport(
      true,
      (void *) &huart2,
      cubemx_transport_open,
      cubemx_transport_close,
      cubemx_transport_write,
      cubemx_transport_read);

    /* Install the custom allocator callbacks as the rcutils default allocator */
    rcl_allocator_t freeRTOS_allocator = rcutils_get_zero_initialized_allocator();
    freeRTOS_allocator.allocate = microros_allocate;
    freeRTOS_allocator.deallocate = microros_deallocate;
    freeRTOS_allocator.reallocate = microros_reallocate;
    freeRTOS_allocator.zero_allocate = microros_zero_allocate;

    if (!rcutils_set_default_allocator(&freeRTOS_allocator))
    {
        printf("Error on default allocators (line %d)\n", __LINE__);
    }

    // micro-ROS app INCLUDE_pxTaskGetStackSize
    /* Stack probes before fetching the allocator */
    temp = uxTaskGetStackHighWaterMark(NULL);
    temp = uxTaskGetStackSize(NULL);
    rcl_allocator_t allocator = rcl_get_default_allocator();

    /* Stack probes immediately before rclc_support_init() */
    temp = uxTaskGetStackHighWaterMark(NULL);
    temp = uxTaskGetStackSize(NULL);

    /* Create init_options and the support structure; report failures */
    rcl_ret_t rc = rclc_support_init(&support, 0, NULL, &allocator);
    if (rc != RCL_RET_OK)
    {
        printf("Error in rclc_support_init (rc = %d, line %d)\n", (int) rc, __LINE__);
    }
}

@pablogs9
Member

pablogs9 commented Aug 8, 2023

Maybe it is a good idea to test another allocator instead of the provided one. A wrapper on top of malloc, free, calloc and so on should work.

Regarding '... it is used in other files/tasks as well': you are not performing micro-ROS calls in other execution threads, right? By default the micro-ROS API is not thread-safe.

@DrMarkusKrug
Author

I commented out the provided freeRTOS_xxx allocators and got a more realistic behaviour of the task stack usage (it now uses the standard malloc/free, not even the FreeRTOS versions). I will do further checks on that. I'm still surprised that this hasn't been reported before.

I'm aware that the rcl API calls are not thread-safe, which for me is a major topic for future enhancements of the micro-ROS library. I don't understand why the micro-ROS library provides a lot of RTOS examples when the underlying functionality is not thread-safe.

@pablogs9
Member

pablogs9 commented Aug 8, 2023

I'm aware of the non thread safety of the rcl API calls - which for me is a major topic for future enhancements of the micro ros libraray. I don't understand why the micro ros library provides a lot of RTOS examples but the underlying functionality is not thread safe.

By default the micro-ROS API is not thread-safe. -> Don't worry, just enable this flag (UCLIENT_PROFILE_MULTITHREAD) and that's it...

I commented out the provided freeRTOS_xxx allocators and got a more realistic behaviour of the task stack usage (that is now using the standard malloc/free - not even the FreeRTOS versions). I will do further checks on that. Still surprised that this hasn't been reported before.

Regarding the allocators: we made custom allocators for this package some time ago, because FreeRTOS does not provide a proper realloc (which micro-ROS requires). As far as I understand, these allocators somehow no longer work, or at least do not work properly on your platform (is the heap smashing into the stack?). In any case, it is good (and common) practice in micro-ROS to create custom allocators that properly handle the heap allocations required during micro-ROS initialization.

You can find information about this here: https://docs.vulcanexus.org/en/latest/rst/tutorials/micro/memory_management/memory_management.html#allocators
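The missing realloc mentioned above is usually worked around by storing each block's size in a small header, so that a reallocation knows how many bytes to copy. The following is a hypothetical sketch, not the allocator shipped in micro_ros_stm32cubemx_utils: the C library malloc()/free() stand in for the port's base allocator (on target, for example, pvPortMalloc()/vPortFree()), and all sized_* names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header stored in front of every block; the union keeps the payload maximally aligned. */
typedef union {
    size_t size;
    max_align_t align;
} block_header_t;

/* Base allocator stand-ins: on target these could wrap pvPortMalloc()/vPortFree(). */
static void *base_alloc(size_t n) { return malloc(n); }
static void base_free(void *p) { free(p); }

static void *sized_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(block_header_t)) {
        return NULL; /* header arithmetic would overflow */
    }
    block_header_t *h = base_alloc(sizeof(block_header_t) + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    return h + 1; /* payload starts right after the header */
}

static void sized_free(void *p)
{
    if (p != NULL) {
        base_free((block_header_t *) p - 1);
    }
}

/* realloc built only from alloc/free: the stored size tells us how much to copy. */
static void *sized_realloc(void *p, size_t new_size)
{
    if (p == NULL) {
        return sized_alloc(new_size);
    }
    if (new_size == 0) {
        sized_free(p);
        return NULL;
    }
    size_t old_size = ((block_header_t *) p - 1)->size;
    void *q = sized_alloc(new_size);
    if (q == NULL) {
        return NULL; /* like realloc: the original block is left untouched */
    }
    memcpy(q, p, old_size < new_size ? old_size : new_size);
    sized_free(p);
    return q;
}
```

The header costs one max_align_t-sized slot per allocation but keeps the payload correctly aligned; a zero_allocate callback could be built the same way from sized_alloc() followed by memset().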

@DrMarkusKrug
Author

Hi,
if I set the flag UCLIENT_PROFILE_MULTITHREAD in colcon.meta and recompile the library, I get:
#error XRCE multithreading not supported for this platform

The provided allocators definitely cause trouble with stack violations/overruns. I replaced them with the FreeRTOS standard versions (plus calloc and realloc in a custom version), and the heap/stack tools start showing reasonable values. I will do further tests and come back afterwards.

Best Regards
Markus

@pablogs9
Member

pablogs9 commented Aug 8, 2023

UCLIENT_PROFILE_MULTITHREAD enables the build of this file at middleware level: https://github.com/eProsima/Micro-XRCE-DDS-Client/blob/master/src/c/profile/multithread/multithread.c

So, add to your micro-ROS build procedure the definition of PLATFORM_NAME_FREERTOS and ensure that it has access to the include paths referenced here: https://github.com/eProsima/Micro-XRCE-DDS-Client/blob/97175304425c5bee87c6fddd99de1ef8d0c394dc/include/uxr/client/profile/multithread/multithread.h#L33-L34

@DrMarkusKrug
Author

Hi,
can you tell me how to give the build access to the include paths?
'and ensure that it has access to the include paths referenced here: https://github.com/eProsima/Micro-XRCE-DDS-Client/blob/97175304425c5bee87c6fddd99de1ef8d0c394dc/include/uxr/client/profile/multithread/multithread.h#L33-L34'

My build procedure is exactly as described at https://github.com/micro-ROS/micro_ros_stm32cubemx_utils. I added the path to the Makefile and toolchain.cmake but still get the error:
FreeRTOS.h: No such file or directory

Best Regards
Markus

@pablogs9
Member

pablogs9 commented Aug 8, 2023

Share a volume with the Docker builder in which the include paths for those required headers are accessible: docker run -it --rm -v $(pwd):/project --env MICROROS_LIBRARY_FOLDER=micro_ros_stm32cubemx_utils/microros_static_library microros/micro_ros_static_library_builder:iron

Then make them accessible via the CMake toolchain using something like include_directories:
https://github.com/micro-ROS/micro_ros_stm32cubemx_utils/blob/iron/microros_static_library_ide/library_generation/toolchain.cmake

@DrMarkusKrug
Author

I finally managed to build a new library with the profile UCLIENT_PROFILE_MULTITHREAD and the symbol PLATFORM_NAME_FREERTOS set. However, the resulting library now blocks rcl calls: even my first call to rclc_support_init() hangs until the timeout lets it continue, of course without success, so I cannot run my application any further.
Is there any semaphore or similar that I need to set up manually for the multithread profile?

@pablogs9
Copy link
Member

Where is your application blocking at the RCL level?
The UCLIENT_PROFILE_MULTITHREAD feature only locks at the middleware and RMW level. You should not need to initialize anything.

@DrMarkusKrug
Author

It blocks during the call to rclc_support_init(). The call returns after 4-5 seconds but does not succeed, so all following API calls run into problems.
I found something that might help to understand the problem. I'm using the provided DMA transport routines. After I switched to the multithread profile, the DMA read data structure (part of the uxrCustomTransport data structure) is no longer set. Maybe rclc_support_init() fails while setting this data structure. That might help in identifying the root cause of the problem.

@pablogs9
Member

Does it work with the non-DMA transport?

@DrMarkusKrug
Author

I tried the interrupt version, also without success. I removed the DMA setting, kept the interrupt and used the interrupt transport routines. The receive ISR is never hit. I guess the initialization first sends a message to the agent, which answers afterwards, so it seems the initial send does not happen. But this is just a guess.

@pablogs9
Member

Does the micro-ROS Agent receive anything? Use -v6 on the agent to set maximum verbosity.

@DrMarkusKrug
Author

Hi,
nothing is received with the library generated with the UCLIENT_PROFILE_MULTITHREAD profile.

@DrMarkusKrug
Author

One more thing about the memory allocation problems:

If the original malloc function (from heap_4.c) is used, the call to rclc_node_init_default() fails with 'bad allocation' as the return value. If I use the provided pvPortMallocMicroROS(), it works. I couldn't find an explanation for this.

@DrMarkusKrug
Author

One question comes to mind: why do the rcl APIs use dynamic memory allocation/deallocation/reallocation at all, at least for the initialization routines? Don't they work only on pointers passed as input parameters? Then the memory for them could be allocated before calling the functions.

@pablogs9
Copy link
Member

We do not maintain RCL; it is a ROS 2 layer. We only maintain the micro-ROS build system, the RMW for XRCE-DDS and the actual middleware. The latter two (the RMW and the middleware) are free of dynamic memory allocation.

@DrMarkusKrug
Author

Hi,
In the meantime I wrapped the standard C memory functions to fit the custom allocator calling scheme, so each time they are called they take memory from the heap instead of the task stack. In a first test that seems to work; at least I got rid of the annoying 100% task stack usage. At first glance I also couldn't confirm the requirement of having >10 kB (or even 20 kB) of stack for the task that runs the micro-ROS initialization (https://github.com/micro-ROS/micro_ros_stm32cubemx_utils/blob/iron/.images/Set_freertos_stack.jpg).
My initialization task uses less than 2 kB of stack, and the heap usage for a node and 3 publishers is around 800 bytes. At least the first number seems to be in line with the numbers published above (although from another MCU, with a core architecture very similar to the one I'm using: M33 vs. M4F, https://github.com/micro-ROS/micro_ros_renesas_testbench/actions/runs/5781727309). So I have no idea what the high numbers mentioned above are based on. The numbers here: https://micro.ros.org/docs/concepts/middleware/memo_prof/ seem to be more realistic.
My application seems to run stably. I will continue testing and also extend my application with 2 more subscribers.
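A heap figure like the ~800 bytes above can be measured by routing the allocator callbacks through a counting layer. The following is a hypothetical sketch (the tracked_* names and the size-header approach are assumptions, not micro-ROS code). The functions use the rcutils allocator callback signatures, so on target they could be assigned to the allocate, deallocate, reallocate and zero_allocate fields in place of plain malloc wrappers. heap_peak then holds the largest number of payload bytes outstanding at any time, excluding allocator overhead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Running totals of payload bytes handed out through the callbacks below.
 * Not thread-safe: assumes allocations happen from a single task. */
static size_t heap_in_use = 0;
static size_t heap_peak = 0;

/* Size header in front of each block, kept maximally aligned. */
typedef union { size_t size; max_align_t align; } track_header_t;

static void track_add(size_t n)
{
    heap_in_use += n;
    if (heap_in_use > heap_peak) {
        heap_peak = heap_in_use;
    }
}

void *tracked_allocate(size_t size, void *state)
{
    (void) state;
    if (size > SIZE_MAX - sizeof(track_header_t)) return NULL;
    track_header_t *h = malloc(sizeof(track_header_t) + size);
    if (h == NULL) return NULL;
    h->size = size;
    track_add(size);
    return h + 1;
}

void tracked_deallocate(void *pointer, void *state)
{
    (void) state;
    if (pointer == NULL) return;
    track_header_t *h = (track_header_t *) pointer - 1;
    heap_in_use -= h->size;
    free(h);
}

void *tracked_reallocate(void *pointer, size_t size, void *state)
{
    if (pointer == NULL) return tracked_allocate(size, state);
    if (size > SIZE_MAX - sizeof(track_header_t)) return NULL;
    track_header_t *old = (track_header_t *) pointer - 1;
    size_t old_size = old->size; /* read before realloc may move the block */
    track_header_t *h = realloc(old, sizeof(track_header_t) + size);
    if (h == NULL) return NULL;  /* old block and counters unchanged */
    heap_in_use -= old_size;
    h->size = size;
    track_add(size);
    return h + 1;
}

void *tracked_zero_allocate(size_t number_of_elements, size_t size_of_element, void *state)
{
    if (size_of_element != 0 && number_of_elements > SIZE_MAX / size_of_element) return NULL;
    size_t total = number_of_elements * size_of_element;
    void *p = tracked_allocate(total, state);
    if (p != NULL) memset(p, 0, total);
    return p;
}
```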

@DrMarkusKrug
Author

We do not maintain RCL, it is a ROS 2 layer. We only maintain the micro-ROS build system, the RMW for XRCE-DDS and the actual middleware. The former two, dynamic memory free.

To me it looks like the FreeRTOS task stack pointer is unfortunately overwritten in some layer. The overwrite does not seem to be accidental; it mistakenly places the stack pointer at the very end of the available area. The subsequent access then leads to the overflow.

@DrMarkusKrug
Author

I tested today with 3 publishers and 2 subscribers. The publishers run at 100 Hz and 20 Hz; the subscribers receive at 1 Hz. The MCU I use is a Cortex-M4F at 80 MHz. So far everything is fine.
So after a week of intense fighting with memory allocation in the ROS 2 stack, I will close the issue. You will find my solution in this comment #110 (comment).
Thanks for all the support and hints.

@mstiehm-NF

I tested today with 3 publisher and 2 subscriber. The publishers have a frequency of 100Hz, 20Hz. The subscribers are received with 1Hz. The uC I use is a Cortex M4F with 80MHz. So far everything fine. So after a week of intensive fight with the memory allocation in the ROS2 stack I will close the comment. You will find my solution in this comment #110 (comment). Thanks for all the support and hints.

Was this also with multithread support? That is, were you able to build with both multithreading AND custom allocators?

@DrMarkusKrug
Author

Hi, no, multithreading does not work for the STM32 series with FreeRTOS. Something in the ROS 2 layers seems to corrupt memory. Because the publish function is implemented as a blocking call, I used a semaphore to protect it (I call publish from 3 different FreeRTOS tasks).
I also ended up putting as many of the micro-ROS data structures as possible on the heap and used the standard malloc/calloc/realloc functions. The provided custom allocators do not work for STM32 and FreeRTOS.
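The workaround described above, serializing publish calls from several tasks behind a single lock, can be sketched as follows. This is a hypothetical host-side illustration: a POSIX mutex stands in for the FreeRTOS mutex (on target, created with xSemaphoreCreateMutex() and used with xSemaphoreTake()/xSemaphoreGive()), fake_publish() stands in for the real publish call, and publisher_task() plays the role of a FreeRTOS task.

```c
#include <pthread.h>
#include <stddef.h>

/* One lock shared by every task that publishes. */
static pthread_mutex_t publish_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the non-thread-safe middleware call: a read-modify-write
 * on shared state that would lose updates if two callers interleaved. */
static long messages_sent = 0;
static int fake_publish(int value)
{
    (void) value;
    long current = messages_sent;
    current += 1;
    messages_sent = current;
    return 0;
}

/* Every task calls this wrapper instead of the raw publish function. */
static int guarded_publish(int value)
{
    pthread_mutex_lock(&publish_lock);
    int rc = fake_publish(value);
    pthread_mutex_unlock(&publish_lock);
    return rc;
}

/* Simulated publisher task: publishes `count` messages. */
static void *publisher_task(void *arg)
{
    int count = *(const int *) arg;
    for (int i = 0; i < count; i++) {
        guarded_publish(i);
    }
    return NULL;
}
```

Because every task goes through guarded_publish(), at most one call is inside the non-thread-safe code at a time; with three threads publishing 10000 messages each, the shared counter ends at exactly 30000.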

@nikitax75

Hi @DrMarkusKrug. I'm encountering the same issue with rclc_support_init on the STM32H743 and wondering which allocation functions exactly you wrapped to solve the issue.

I tried wrapping the allocator around malloc, free, realloc and calloc, but it does not appear to solve the problem. Could you share your allocation code?

Thanks in advance

@DrMarkusKrug
Author

Hi,

I did the following:
1.) In the function micro_ros_setup() I changed the lines concerning the allocators to:

freeRTOS_allocator.allocate = myMalloc;
freeRTOS_allocator.deallocate = myFree;
freeRTOS_allocator.reallocate = myRealloc;
freeRTOS_allocator.zero_allocate =  myCalloc;

Then I simply wrote wrappers around the standard C allocation functions:
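#include <stdlib.h>

/* Thin wrappers matching the rcutils allocator callback signatures.
 * The state argument is unused; memory comes from the C library heap,
 * not from the FreeRTOS heap. */
void * myMalloc(size_t size, void * state)
{
    (void) state;
    return malloc(size);
}

void myFree(void * pointer, void * state)
{
    (void) state;
    free(pointer);
}

void * myCalloc(size_t number_of_elements, size_t size_of_element, void * state)
{
    (void) state;
    return calloc(number_of_elements, size_of_element);
}

void * myRealloc(void * pointer, size_t size, void * state)
{
    (void) state;
    return realloc(pointer, size);
}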

This works because the necessary memory is taken from a heap outside the FreeRTOS heap (so you might need to adapt your linker script to separate the memory into different sections). In my case (STM32L4KC) there are two RAM sections anyway, and I use one of them exclusively for the FreeRTOS heap. Here is the snippet for the linker script:
/* Memories definition */
MEMORY
{
RAM (xrw) : ORIGIN = 0x20000000, LENGTH = 64K
RAM2 (xrw) : ORIGIN = 0x10000000, LENGTH = 16K
FLASH (rx) : ORIGIN = 0x8000000, LENGTH = 256K
}

and
/* placing my named section */
.ram2 (NOLOAD):
{
  KEEP(*(.FreeRTOSHeap)) /* keep my variable even if not referenced */
} > RAM2

and somewhere in the file custom_memory_manager.c:
uint8_t ucHeap[ configTOTAL_HEAP_SIZE ] __attribute__((section(".FreeRTOSHeap")));

I found some important hints on how to get started in this video:
https://www.youtube.com/watch?v=xbWaHARjSmk&ab_channel=RoboticsinaNutshell
But I guess you are beyond that level if you noticed the memory problem (or should I say disaster?). It is not mentioned in the video, but I added a comment there a while ago, at a time when I was not aware of its root cause.

Finally, I have to say I'm quite disappointed with the quality and performance of the entire ROS 2 + micro-ROS code. I expected far more, because so many robot projects seem to use it and little criticism is reported. I guess the reasons for this are:
1.) a lot of projects are at a research or hobbyist level
2.) a lot of projects run on desktop computer resources rather than on embedded systems at a standard industry level
3.) lack of alternatives
4.) limited experience on the programmers' side in how to program robust and safety-critical embedded applications

I only used micro-ROS because one of my customers insisted.
For me it works now, and it seems to be stable after a lot of debugging and analysis. If possible, I will avoid using it in the future unless there is a significant change. However, the support from pablogs9 was great, and without it I would probably have dropped the project.

Best Regards
Markus


4 participants