High CPU use of executor #1637

Open
adamdbrw opened this issue Apr 19, 2021 · 2 comments
adamdbrw commented Apr 19, 2021

Report

Ubuntu 20.04
Rolling
Both packages and my own build
Commit: 61fcc76
Independent of rmw implementation (tested both FastDDS and CycloneDDS).

Steps to reproduce issue

I first reported and confirmed this when recording with rosbag2 (which uses SingleThreadedExecutor). Investigating the CPU load, I noticed that the majority of the resource use is due to the executor function get_next_executable(), meaning that even with empty callbacks (no actual work to execute) the CPU load remains very high (around 70% on my machine). The rosbag2 recording comes from a real (automotive) use case and amounts to about 4k executables (subscriptions) per second.

To reproduce, it should be enough to use the spin() function with enough traffic to ensure a high number of executables. The performance package in rosbag2 could be used to automate running the desired number of publishers.

Expected behavior

The executor should use less CPU for acquiring the next executable. This is important because, e.g. in the case of rosbag2, it affects how the recorded system performs.

Actual behavior

The executor has high CPU consumption even when subscription callbacks are empty (the CPU time is spent just acquiring the next executables).

[Image: rolling_spin_empty_callbacks_cyclonedds]

Additional information

A partial workaround is to use spin_some() or spin_all() followed by a short (e.g. 1 ms) sleep in a while (rclcpp::ok()) loop, instead of spin().
Note that spin_once() with a similar sleep won't work as well, since we want to execute more executables each second than it would permit.
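
The workaround described above can be sketched roughly as follows. This is only a minimal sketch and not rclcpp code: `spin_in_batches` and `FakeExecutor` are hypothetical names introduced here so that the loop compiles without ROS 2. The only assumption about the executor is that it has a `spin_some()` member, as the rclcpp executors do.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical helper (not part of rclcpp): repeatedly drains the ready work
// with spin_some(), then sleeps for `pause` so that more work can accumulate
// before the next round. Returns the number of rounds that were run.
template <typename Executor, typename KeepRunning>
std::size_t spin_in_batches(
  Executor & executor, KeepRunning keep_running, std::chrono::microseconds pause)
{
  assert(pause.count() >= 0);
  std::size_t rounds = 0;
  while (keep_running()) {
    executor.spin_some();                // execute whatever is ready right now
    ++rounds;
    std::this_thread::sleep_for(pause);  // short sleep, e.g. 1 ms
  }
  return rounds;
}

// Hypothetical stand-in for an executor, used only to make the sketch
// runnable without ROS 2. It "executes" all pending items in one call.
struct FakeExecutor
{
  std::size_t pending = 0;
  std::size_t executed = 0;

  void spin_some()
  {
    executed += pending;
    pending = 0;
  }
};
```

With rclcpp, the call would presumably look like `spin_in_batches(executor, [] { return rclcpp::ok(); }, std::chrono::milliseconds(1))`, where `executor` is an `rclcpp::executors::SingleThreadedExecutor` that already has the node added. The sleep length trades latency for CPU use, so 1 ms is only a starting point.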

With ~4k executables per second, wait_for_work makes about 3.3k calls to rcl_wait per second, so mostly only one executable is returned each time (about 1.2 per call on average), which seems quite inefficient. I am not sure if this is by design (perhaps it minimizes latency), but the collections used to gather a batch of executables for each rcl_wait call are certainly underused.

When a 1 ms sleep is introduced after we miss the cache (before/after rcl_wait), only ~600 calls to rcl_wait per second are made while successfully executing the same number of callbacks per second.
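
To illustrate why a short sleep reduces the number of rcl_wait calls, here is a small idealized model. It is a sketch under strong assumptions and not a measurement: events arrive at a fixed period, callbacks cost no time, the sleep is exact, and each wait returns everything that has arrived so far. `simulate_waits` and `WaitStats` are hypothetical names and do not correspond to anything in rclcpp or rcl.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct WaitStats
{
  std::int64_t waits;     // number of simulated rcl_wait-like calls
  std::int64_t executed;  // number of events handled in total
};

// Idealized model: event k (1-based) arrives at time k * arrival_period_us.
// Each loop iteration blocks until at least one unhandled event has arrived,
// handles everything that has arrived by then, and then sleeps for sleep_us.
WaitStats simulate_waits(
  std::int64_t total_events, std::int64_t arrival_period_us, std::int64_t sleep_us)
{
  assert(total_events >= 0 && arrival_period_us > 0 && sleep_us >= 0);
  std::int64_t now_us = 0;
  std::int64_t handled = 0;
  std::int64_t waits = 0;
  while (handled < total_events) {
    // Block until the next unhandled event is available.
    const std::int64_t next_arrival_us = (handled + 1) * arrival_period_us;
    now_us = std::max(now_us, next_arrival_us);
    ++waits;
    // Take every event that has arrived up to now (callbacks cost nothing here).
    handled = std::min(total_events, now_us / arrival_period_us);
    // Optional sleep that lets further events queue up before the next wait.
    now_us += sleep_us;
  }
  return {waits, handled};
}
```

Calling `simulate_waits(4000, 250, 0)` models 4k events in one second without a sleep and needs one wait per event, while `simulate_waits(4000, 250, 1000)` adds a 1 ms sleep and needs roughly a quarter as many waits for the same number of events. The real figures reported here (3.3k and ~600) differ from the model, presumably because callbacks and the wait itself take time and a real sleep usually overshoots.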

The behavior of sleeping (and of the chrono steady clock) is of course platform dependent, so it is hard to suggest this as an executor-level change, but it is certainly a factor to be aware of.

Perhaps another type of executor (events based) would be more suitable for this type of use case, where low CPU consumption is required. There is quite some work ongoing in ros2/design#305.

@fujitatomoya
Collaborator

@adamdbrw

Just FYI, there has been discussion on this: https://discourse.ros.org/t/singlethreadedexecutor-creates-a-high-cpu-overhead-in-ros-2/10077 How about using StaticSingleThreadedExecutor? It cannot collect entities dynamically, but it would be worth seeing how much improvement we could get.

Author

adamdbrw commented Apr 21, 2021

Thank you for the suggestion. It is a good idea, and it was something I checked right away when looking through the available executors. I tried the static one and observed only minimal improvement in my case as described above, probably because it doesn't solve the core issue.

@clalancette clalancette assigned ivanpauno and wjwwood and unassigned ivanpauno May 13, 2021