
[Foxy] Add support for ROS tracing tools #9

Open
carlossvg opened this issue Jan 12, 2020 · 13 comments
Labels
enhancement (New feature or request)

Comments

@carlossvg
Contributor

Currently we track statistics to measure real-time performance using different tools. Some of them are included in the project (pendulum_tools) and others are external (OSRF memory tools). To read these statistics we publish them using a ROS 2 publisher, which can be intrusive because it generates additional traffic, ROS 2 executables, etc.

In Eloquent, the ros2 tracing tools were introduced. These tools would allow us to track most of the statistics we are currently tracking, and more. The advantages of this approach are that we rely on only one tool and can remove all the statistics-specific code from the project. Also, the overhead of the tracing tools is minimal, which makes this a less intrusive solution than the current approach.

These are some of the steps needed to add support for the ROS 2 tracing tools:

  • Update the real-time instructions to explain how to use the tracing tools.
  • Create a launch file to run the demo with tracing enabled (see the example, and the sketch after this list).
  • Add several Jupyter notebooks (see the example) to:
    • Measure callback duration and jitter.
    • Count the number of messages received and lost.
    • Measure message latency.
    • Measure the number of memory allocations.
    • Measure the number of page faults (this requires support enabled when building the kernel).
    • Capture the node's on_activate event to process some statistics only in the active state.
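
As a starting point for the launch-file item above, here is a minimal sketch (not the actual demo launch file; the package, executable, and session names are placeholders) of enabling tracing with the Trace action from tracetools_launch:

```python
# Sketch: launch the demo with a ros2_tracing session enabled.
# 'pendulum_demo' and the session name are placeholders for this project.
from launch import LaunchDescription
from launch_ros.actions import Node
from tracetools_launch.action import Trace


def generate_launch_description():
    return LaunchDescription([
        # Start the tracing session before the nodes.
        Trace(
            session_name='pendulum-tracing',
            # ROS 2 userspace events plus libc memory events; the libc events
            # may also require the LTTng libc wrapper to be preloaded.
            events_ust=['ros2:*', 'lttng_ust_libc:*'],
            # Keep kernel events empty unless needed (e.g. for page faults);
            # they make trace processing much slower.
            events_kernel=[],
        ),
        Node(
            package='pendulum_demo',
            executable='pendulum_demo',
            output='screen',
        ),
    ])
```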

Some disadvantages:

  • For the moment it is only compatible with Linux because it uses LTTng under the hood.
  • For the moment it is not possible to track statistics at runtime (aka live tracing).

Links:

@carlossvg added the enhancement (New feature or request) label Jan 12, 2020
@carlossvg
Contributor Author

Hi @christophebedard. We plan to replace "OSRF memory tools" with ros2_tracing to analyze memory allocations. Could you point us in the right direction?

The goal is to detect any malloc (and similar) calls only when a lifecycle node is in the active state.

  1. Do you think it is possible to do this just by using a Jupyter notebook? Or does it make sense to add some tracepoints in the lifecycle callbacks to make this easier?
  2. How can we classify each memory allocation depending on the layer in which it is created?
  3. Do you have any similar example code to analyze memory allocations?

Thanks.

@christophebedard

  1. Do you think it is possible to do this just by using a Jupyter notebook? Or does it make sense to add some tracepoints in the lifecycle callbacks to make this easier?

Yeah, you can't do this without having a way to know which memory allocations happened during the active state and which didn't, so you will indeed need to add tracepoints to track lifecycle state changes.

I think support for this should be added to tracetools (and instrumentation added to rclcpp_lifecycle + support for it added to the processing tools), so I can take care of that if you want.

  2. How can we classify each memory allocation depending on the layer in which it is created?

So this is unrelated to the first point? You would need to track the execution flow across layers, e.g. by putting a tracepoint at each layer boundary. Then you use those to keep track of the "layer state." Of course this can get ugly if the execution flow is complex, i.e. if it triggers a lot of tracepoints 😆. I'm guessing you mean layers like rclcpp, rcl, rmw, rmw implementation?

Another way could be to gather callstack information along with the memory allocation trace events (with something like this, since this isn't a standard LTTng feature), look at the function that called malloc/free/etc. (and/or the executable/shared library that function comes from), and figure out which layer it belongs to. This is probably overly complex to set up, but since you don't expect to have memory allocations (at least during the "active" state), it might actually have less performance impact, because you wouldn't need to keep track of layers using tracepoints. It's a trade-off: fewer triggered tracepoints/generated events versus collecting more information for each memory allocation event and doing more processing work afterwards.
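
To illustrate just the classification step described above (everything here is hypothetical: the caller_library column assumes callstack information has already been collected and resolved, and the library-to-layer mapping is only an example):

```python
# Hypothetical post-processing sketch: classify allocation events by ROS 2 layer
# based on the shared library of the calling function. None of these columns
# exist in the current trace data; this only shows the classification logic.
import pandas as pd

LAYER_BY_LIBRARY = {  # illustrative mapping only
    'librclcpp.so': 'rclcpp',
    'librcl.so': 'rcl',
    'librmw_implementation.so': 'rmw',
    'libfastrtps.so': 'rmw implementation',
}

allocations = pd.DataFrame({
    'timestamp': [1, 2, 3],          # placeholder timestamps
    'size': [128, 64, 256],          # placeholder allocation sizes (bytes)
    'caller_library': ['librclcpp.so', 'libfastrtps.so', 'librcl.so'],
})

allocations['layer'] = allocations['caller_library'].map(LAYER_BY_LIBRARY).fillna('other')
print(allocations.groupby('layer')['size'].agg(['count', 'sum']))
```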

  3. Do you have any similar example code to analyze memory allocations?

Of course! Here are a few links:

  1. Example Jupyter notebook plotting memory consumption over time: https://gitlab.com/micro-ROS/ros_tracing/tracetools_analysis/-/blob/master/tracetools_analysis/analysis/memory_usage.ipynb
  2. Example launch file to get the data for the notebook above: https://gitlab.com/micro-ROS/ros_tracing/tracetools_analysis/-/blob/master/tracetools_analysis/launch/memory_usage.launch.py
  3. Maybe you can re-use the data that is used in the example notebook, e.g. use the lifecycle state tracepoints to identify "active" time intervals and check those intervals for any changes in consumption (a rough sketch of this flow follows this list). If you only want to check for calls to malloc/free/etc., take a look at the processing code used for the example notebook: https://gitlab.com/micro-ROS/ros_tracing/tracetools_analysis/-/blob/master/tracetools_analysis/tracetools_analysis/processor/memory_usage.py
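
A rough sketch of that flow, assuming the API used in the linked memory_usage notebook (module, class, and column names below are taken from that notebook and should be double-checked against it; the "active interval" filtering is hypothetical until the lifecycle tracepoints exist):

```python
# Sketch: load a converted trace, build per-thread memory usage dataframes,
# then (hypothetically) restrict them to "active" lifecycle intervals.
from tracetools_analysis.loading import load_file
from tracetools_analysis.processor import Processor
from tracetools_analysis.processor.memory_usage import UserspaceMemoryUsageHandler
from tracetools_analysis.utils.memory_usage import MemoryUsageDataModelUtil

events = load_file('converted_trace_file')  # placeholder: path to the converted trace

memory_handler = UserspaceMemoryUsageHandler()
Processor(memory_handler).process(events)

memory_util = MemoryUsageDataModelUtil(userspace=memory_handler.data)
# Assumed: one dataframe per thread, with 'timestamp' and 'memory_usage' columns
usage_dfs = memory_util.get_absolute_userspace_memory_usage_by_tid()

# Hypothetical: once lifecycle state-transition tracepoints are available,
# 'active_intervals' would hold (start, end) timestamps of the active state.
active_intervals = [(1_000_000_000, 2_000_000_000)]  # placeholder nanosecond values
for tid, df in usage_dfs.items():
    for start, end in active_intervals:
        active = df[(df['timestamp'] >= start) & (df['timestamp'] <= end)]
        # Any change in usage within the interval implies allocations/frees happened
        print(tid, active['memory_usage'].max() - active['memory_usage'].min())
```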

I hope that was helpful. Let me know if you need more information!

@carlossvg
Contributor Author

I think support for this should be added to tracetools (and instrumentation added to rclcpp_lifecycle + support for it added to the processing tools), so I can take care of that if you want.

OK, cool. In the meantime I can test checking for memory allocations over the whole tracing session.

Another way could be to gather callstack information along with the memory allocation trace events

Yes, I think this fits better with what I had in mind. Plotting memory consumption over time just for the active state could be interesting too, at least as a POC. I will start with the easiest one and then ask for help setting up the callstack example.

@carlossvg changed the title from "Add support for ROS tracing tools" to "[Foxy] Add support for ROS tracing tools" on Aug 16, 2020
@christophebedard

christophebedard commented Aug 16, 2020

@carlossvg I've started working on this (see: https://gitlab.com/micro-ROS/ros_tracing/ros2_tracing/-/issues/99) and I was wondering if you can think of other things that would be good to instrument for lifecycle nodes (other than state transitions). I can't think of anything myself, but if there is anything, it would be good to do it at the same time.

@carlossvg
Contributor Author

@christophebedard That's great. I think the state transition traces cover most of my use case. After looking at the rclcpp/rcl lifecycle implementation I can't find any other clear places to put additional tracepoints. For the moment, tracing the state transitions seems to be more than enough for me.

@carlossvg
Contributor Author

@christophebedard I started working on the callback duration scripts in this branch. I have some questions you might be able to help me with:

  1. The long time needed to process all the data is probably the biggest pain point for this use case. I traced for a few minutes and the processing took several minutes to complete. Do you know if there is some way to mitigate this (e.g. filtering events, plotting the babeltrace output with another tool, etc.)?
  2. I would like to plot the delta time between callbacks, for example to measure the jitter of a timer callback. I think the code would be very similar to the one in the callback duration notebook. Unfortunately, I'm not familiar with notebooks and pandas. Could you point me to how I could modify the callback duration notebook to implement this?

@christophebedard

christophebedard commented Aug 19, 2020

  1. Yeah, it's a big problem currently. I see you're keeping the default values for most params. If you don't need kernel events (at least for now), you should remove them, i.e. events_kernel=[]. That should reduce the processing time significantly. I just opened an issue to work on improving it. Also, you might want to put the Trace object first in the array for LaunchDescription; I haven't actually tested whether that changes anything, though (but let me know if you find out).
  2. The dataframe returned by get_callback_durations() contains both the start timestamp of each callback and its duration. That dataframe is used to build a ColumnDataSource object (itself used for plotting), where the timestamp is the X axis and the duration is the Y axis. I'm not sure exactly what you want your jitter plot to look like, but I'm guessing you want to remove the duration column, add an "index" column that can act as the X axis, and make the timestamp column the Y axis (I think the X axis column is simply the first column and the Y axis column is the second one). Then you need to transform/process the dataframe to turn the timestamps into time deltas; there are examples online, and a small sketch follows below.
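
A minimal pandas sketch of that last step, assuming a 'timestamp' column of callback start times in nanoseconds (the column name, the sample values, and the 10 ms expected period are assumptions for illustration):

```python
# Sketch: turn callback start timestamps into inter-callback deltas (jitter).
import pandas as pd

# Placeholder data: callback start times in nanoseconds, as in the
# callback-duration dataframe discussed above.
df = pd.DataFrame({'timestamp': [0, 10_100_000, 19_900_000, 30_200_000]})

# Delta between consecutive callback starts, converted to milliseconds.
df['period_ms'] = df['timestamp'].diff() * 1e-6

# Jitter: deviation from the expected timer period (10 ms assumed here).
expected_period_ms = 10.0
df['jitter_ms'] = df['period_ms'] - expected_period_ms

print(df)
```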

@carlossvg
Contributor Author

@christophebedard Thanks for the answers. I added the changes you suggested and created a script for the timestamp differences in PR #39. Feel free to provide your input and drop me a comment there if you think something can be improved. Thanks for your support.

@christophebedard

christophebedard commented Aug 22, 2020

@carlossvg I'll take a look!

I also wanted to mention (just in case) that the modifications (to rcl, tracetools, tracetools_analysis, etc.) won't be merged into Foxy, since they are a new feature and Foxy has already been released. They'll go into Rolling (which will eventually become Galactic). Did you consider targeting Rolling instead of Foxy? Although I'm guessing the pendulum demo is meant to be used with the latest release rather than the latest development state.

@carlossvg
Contributor Author

@christophebedard

Although I'm guessing the pendulum demo is meant to be used with the latest release rather than the latest development state.

That's right, for the moment we prefer to target the latest release. In the meantime, we could create a Rolling branch to test some of the new features.

@LanderU What's your input on this?

@LanderU
Member

LanderU commented Aug 23, 2020

@carlossvg we can create a rolling branch and start testing features as they are integrated.

@christophebedard

@carlossvg @LanderU so then you'll maintain forks/branches for rcl/tracetools/tracetools_analysis with the new instrumentation/modifications backported to Foxy?

@christophebedard

also, just to link to the various related MRs/branches from here:

the MRs are still WIP, but they should be ready soon.
