100% CPU usage in Humble #381

Open
roni-kreinin opened this issue Apr 24, 2023 · 16 comments

Comments

@roni-kreinin

The Create 3 is hitting 100% CPU usage in some cases in Humble. This occurs in both FastDDS and CycloneDDS, though FastDDS does seem to hit 100% more easily. From some testing it seems to occur when there is a lot of ROS2 data being sent around on a connected RPi4, even if the data is not going to the Create 3.

Some TurtleBot4 users are also experiencing this issue when trying to run two namespaced robots. The individual robots can run fine but as soon as two are on the same network they run into issues.

Related issues:

@schenkch

schenkch commented May 4, 2023

In my case, in Galactic with CycloneDDS it is also hitting 100% CPU usage.

@Jayden-F

I am experiencing the same issue on Humble with Cyclone DDS. Due to ongoing issues with services in Fast DDS, I cannot confirm whether the problem persists with Fast DDS. However, it would be great to have a solution. I am going to try to work around it by using separate domain IDs, although this is not ideal.

@lukeopteran

We have the same issue: any heavy ROS 2 traffic with Fast DDS or Cyclone DDS seems to knock out the robot. After a while the robot moves in a stop-start manner when receiving a continuous forward cmd_vel. We have an RPi4 running Galactic, with the robot running either the latest Humble 1.2 or the latest Galactic 4.5 firmware. The only solution is to fully reboot the robot. I have found this is better over the USB-Ethernet than over the Wi-Fi, but it still has the same issue eventually.

I have tried isolating the robot from the bulk of the ROS 2 traffic (different Domain IDs + bridge), but it still appears to become overwhelmed.

I think there could be a network buffer on the robot that isn't getting handled fast enough and is backing up....

@alsora
Collaborator

alsora commented May 17, 2023

Hi, thank you a lot for your reports.

It looks like there are different scenarios being described here:

  1. Create 3 with Humble hits 100% CPU usage when new ROS 2 nodes are created in the same network (even if they don't communicate with the robot). This happens both with Fast-DDS as well as CycloneDDS. (described by @roni-kreinin)
  2. Create 3 with Galactic or Humble hits 100% CPU usage when there's heavy ROS 2 traffic in the network. This happens both with CycloneDDS and Fast-DDS (described by @schenkch, @lukeopteran, @Jayden-F ).

Can you help us replicate a stress test?
I just did some tests on my Create 3 with ROS 2 Galactic and Fast-DDS.
On the laptop I started a rosbag recorder, which I consider a very stressful test, as it subscribes to all the topics published by the robot.
The CPU usage does indeed grow, but it doesn't reach 100% and it doesn't noticeably impact the robot's operations.

We are investigating the problem and we are looking for a way to minimize the impact of ROS 2 traffic on the robot's CPU.
We are adding instrumentation tools to better understand which threads on the robot are responsible for the CPU usage increase and we are also in contact with DDS vendors to find a way to fine-tune the RMW for this use-case.

The CPU overhead of ROS 2 remote communication is a complex problem that doesn't affect only Create 3, but it's more evident on this robot due to its available computational resources.

Please keep posting your experience with as much detail as possible, as that will help us get to a solution as soon as possible.
Thank you
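For anyone who wants to attach numbers to a report, here is a minimal sketch of how overall CPU load could be sampled on a Linux companion computer such as the RPi4. It is not the instrumentation mentioned above, and it does not run on the Create 3 itself. It assumes the standard Linux `/proc/stat` layout; the function names are made up for this example.

```python
from typing import Tuple


def parse_cpu_line(line: str) -> Tuple[int, int]:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    # Columns: user nice system idle iowait irq softirq steal.
    # The guest columns are already counted inside user/nice, so they are skipped.
    ticks = [int(v) for v in fields[1:9]]
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
    return idle, sum(ticks)


def cpu_busy_percent(before: str, after: str) -> float:
    """Percentage of non-idle time between two samples of the 'cpu' line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / elapsed)


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


# Hypothetical usage on a Linux host:
#   import time
#   before = read_cpu_line(); time.sleep(1.0); after = read_cpu_line()
#   print(cpu_busy_percent(before, after))
```

Logging this value while starting and stopping nodes on the network would give a rough timeline to include alongside the robot's own logs.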

@Jayden-F

Thank you, @alsora. I am currently using two TurtleBot 4s, let's call them "walle" and "eve", which I am running under their respective namespaces using Cyclone DDS. I have recently experimented with them a bit more.

Here is what I've found:
Suppose I start using the navigation2 package and the launch file under nav_bringup on my remote desktop in the walle namespace. I can successfully run SLAM and the navigation stack. However, at this point, the other robot, eve, will become unresponsive and show 100% CPU in its logs. Additionally, my dedicated router will show 100% CPU usage due to the quadratic network traffic Cyclone DDS creates.

Next, we ran them in different Domain IDs to reduce the number of processes/nodes on the same network. This method allowed us to successfully run SLAM and the navigation stack on our remote desktop for both robots. With separate Domain IDs, the router's CPU usage dropped to 40%.
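As a rough illustration of why splitting domains helps, the sketch below counts participant pairs under the simplifying assumption that every pair of DDS participants in the same domain exchanges discovery data. The participant counts are invented for the example and are not measurements from these robots.

```python
from typing import Mapping


def discovery_pairs(participants: int) -> int:
    """Number of distinct participant pairs in one domain: n * (n - 1) / 2."""
    return participants * (participants - 1) // 2


def pairs_by_domain(domains: Mapping[str, int]) -> int:
    """Total pairs when participants only discover peers in their own domain."""
    return sum(discovery_pairs(count) for count in domains.values())


# Hypothetical numbers: 40 participants sharing one domain,
# versus the same 40 split evenly across two domains.
single_domain = discovery_pairs(40)                       # 780 pairs
split_domains = pairs_by_domain({"walle": 20, "eve": 20})  # 380 pairs
```

Under this model, halving the participants per domain roughly halves the total number of pairs and cuts the pairs each robot takes part in, which is consistent in direction with the drop in router load reported above.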

Our initial idea was to use fast dds to overcome the issues of cyclone dds. However, currently, ros2 services do not work with the fast-DDS discovery server: eProsima/Fast-DDS#3505

I hope this information is helpful.

@ipa-rar

ipa-rar commented May 22, 2023

@Jayden-F

Suppose I start using the navigation2 package and the launch file under nav_bringup on my remote desktop in the walle namespace.

Do you have a reference implementation of this? I have been looking for how to set up a namespaced nav_bringup but could not find any proper references. I opened an issue in turtlebot4 but did not receive a useful response there. Would you mind sharing some information on how to set this up?

@Jayden-F

Jayden-F commented May 22, 2023

@ipa-rar
I use the code from the following repo, which is based on the navigation module from The Construct.
https://bitbucket.org/shortestpathlab/turtlebot4/src/develop/

Additionally, the navigation library has a default bringup file:
ros2 launch nav2_bringup bringup_launch.py

The above launch file is used by the TurtleBot 4; calling it directly lets you configure its parameters, such as the namespace.
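A minimal sketch of how the namespaced call could be assembled, combined with the per-robot Domain ID workaround mentioned earlier in the thread. It assumes the `namespace` and `use_namespace` launch arguments of the Humble-era `bringup_launch.py`; other arguments (for example `map` or `slam`) may be required depending on the Nav2 version. The helper names and the Domain ID value are hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional


def nav2_launch_command(namespace: str,
                        extra_args: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the argv for launching Nav2 bringup under a namespace."""
    cmd = [
        "ros2", "launch", "nav2_bringup", "bringup_launch.py",
        f"namespace:={namespace}",
        "use_namespace:=True",  # assumption: argument present in this Nav2 release
    ]
    for key, value in (extra_args or {}).items():
        cmd.append(f"{key}:={value}")
    return cmd


def env_with_domain(domain_id: int,
                    base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and set ROS_DOMAIN_ID for the launched process."""
    env = dict(os.environ if base_env is None else base_env)
    env["ROS_DOMAIN_ID"] = str(domain_id)
    return env


# Hypothetical usage (needs a sourced ROS 2 workspace):
#   import subprocess
#   subprocess.run(nav2_launch_command("walle", {"slam": "True"}),
#                  env=env_with_domain(1), check=True)
```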

I hope this helps.

@jmm-slamcore

Has there been any progress, or are there any workarounds, for this? We also see this issue, along with out-of-memory errors, on a Create 3 + Jetson Xavier NX setup running Galactic with FastRTPS/DDS when launching many things at once (SLAM, Nav2, RViz2), which creates heavy ROS 2 traffic in the network. This causes the Create to freeze with the red light. Launching nodes sequentially, one after the other, sometimes helps, but it is not very reliable. The robot, Jetson and visualisation laptop are using a unique Domain ID.

@mdxtinkernick

mdxtinkernick commented Jul 8, 2023

There is new firmware for the base (version 2.1 for Humble). It makes no mention of fixing 100% CPU usage, but it does add the ability to set the NTP servers for the Create 3.

Should we wait for an update to the Pi image to test this, or should we just go ahead?

@royito55

@jmm-slamcore I am experiencing the same problem with CycloneDDS. I was hoping that the new firmware update would address this, but it didn't.

My goal now is to limit the traffic the Create 3 sees from the network with a DDS XML configuration. However, even disabling multicast on all machines does not prevent the robot from crashing.
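For readers unfamiliar with this kind of configuration, the sketch below generates a unicast-only Cyclone DDS XML file with an explicit peer list. It is not the configuration used in the comment above, which reports that disabling multicast did not stop the crashes. The peer addresses and the function name are hypothetical; the element names follow the Cyclone DDS configuration reference as I understand it, so check them against the version you run.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def cyclonedds_unicast_config(peers: Iterable[str]) -> str:
    """Return a Cyclone DDS XML config with multicast off and static peers."""
    root = ET.Element("CycloneDDS")
    domain = ET.SubElement(root, "Domain", {"Id": "any"})

    general = ET.SubElement(domain, "General")
    ET.SubElement(general, "AllowMulticast").text = "false"

    discovery = ET.SubElement(domain, "Discovery")
    # Lets several participants on one host pick distinct unicast ports.
    ET.SubElement(discovery, "ParticipantIndex").text = "auto"
    peers_el = ET.SubElement(discovery, "Peers")
    for address in peers:
        ET.SubElement(peers_el, "Peer", {"Address": address})

    return ET.tostring(root, encoding="unicode")


# Hypothetical addresses for the robot and a companion computer.
xml_text = cyclonedds_unicast_config(["192.168.0.10", "192.168.0.11"])
```

The resulting text can be saved to a file and referenced through the `CYCLONEDDS_URI` environment variable (for example `file:///path/to/cyclonedds.xml`) on each machine.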

@sashag1991

Is a fix for the 100% CPU utilization expected? Is it also the reason that I don't see the local cost map in RViz?

@shamlian
Collaborator

Reduction of processor utilization is a topic of importance to our team and we plan to release some fruit of that work very soon.

@ValkyrX

ValkyrX commented Jul 24, 2023

I also have 100% CPU usage. It makes the robot unusable and it constantly needs reboots. Also, if I don't use the Pi 4 as a discovery server, the robot is not usable at all; simply undocking can take 30 seconds or more. When trying to run navigation as per the TurtleBot 4 instructions, the messages just don't get from the Create to the PC running the navigation. Constantly waiting for messages is not great and makes this platform unusable :-( Come on guys, give us a fix.

@shamlian
Collaborator

We have just released G.5.2 and H.2.2, which reduce core robot loading by about 9%. Hopefully this, in combination with other configuration work that is in progress, will help. From my perspective, you are welcome to update your Create 3 robot in a TurtleBot configuration to H.2.2, but I don't know how dependent TurtleBot configuration is on Create 3 firmware version.

@sashag1991

I tried H.2.2, together with the ntp.conf edit suggested in turtlebot/turtlebot4#216. I still see around 100% CPU utilization, but most importantly I don't see the local cost map updated with obstacles. Is there something I can do to get the local cost map to work?

@sashag1991

Any update?
