
xmlrpc.XmlRpcNode consumes 100%+ cpu after being shutdown #2238

Open
drjsmith opened this issue Apr 26, 2022 · 7 comments · May be fixed by #2368

Comments

@drjsmith

TLDR: calling shutdown() on an XmlRpcNode does not fully shut it down and leaves a thread running.
I am currently on Noetic/Ubuntu 20.04/python 3.8.

What is going on:

  • Calling start() results in a new thread running self.server.serve_forever(), where server is of type ThreadingXMLRPCServer which has a base class of socketserver.BaseServer.
  • That function, defined for BaseServer in socketserver.py:215, will not exit until the loop's condition is satisfied: while not self.__shutdown_request:
  • self.__shutdown_request can be set by calling shutdown() on BaseServer
  • However, XmlRpcNode's shutdown() method never calls self.server.shutdown(); it only calls server.socket.close() and server.server_close(). I'm pretty sure those last two calls do the same thing, since:
```python
class TCPServer(BaseServer):
    ...
    def server_close(self):
        self.socket.close()
```

The thread is then stuck in an infinite loop. Before XmlRpcNode is shut down, the thread does not use a significant amount of cpu; afterwards, however, it consumes 100% of a single core (as shown by top).
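The loop can be reproduced without ROS. The sketch below makes two assumptions: it uses a plain socketserver.TCPServer in place of ThreadingXMLRPCServer (both inherit serve_forever() from BaseServer), and it measures process-wide CPU time with time.process_time() rather than top. run_and_close is a hypothetical helper. On Linux, the close-only run should leave the thread alive and consuming CPU, while the run that calls shutdown() first should leave nothing behind.

```python
import socketserver
import threading
import time


def run_and_close(call_shutdown, window=1.0):
    """Start serve_forever() in a background thread, close the server, and return
    (thread still alive?, CPU seconds used by the whole process during `window`)."""
    # Port 0: let the OS pick a free port. The handler is never invoked here.
    server = socketserver.TCPServer(("127.0.0.1", 0), socketserver.BaseRequestHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    time.sleep(0.1)  # let serve_forever() enter its polling loop

    if call_shutdown:
        server.shutdown()  # sets the shutdown flag and waits for serve_forever() to return
    # What XmlRpcNode.shutdown() does today: close the socket, nothing more.
    server.socket.close()
    server.server_close()

    cpu_before = time.process_time()  # CPU time of all threads in this process
    time.sleep(window)                # the main thread stays idle during the window
    cpu_used = time.process_time() - cpu_before
    alive = thread.is_alive()

    if not call_shutdown:
        server.shutdown()  # stop the orphaned loop so the demo exits cleanly
        thread.join(timeout=2)
    return alive, cpu_used


print("close only:       alive=%s, cpu=%.2fs" % run_and_close(False))
print("shutdown + close: alive=%s, cpu=%.2fs" % run_and_close(True))
```

As far as we can tell, the spin comes from serve_forever() polling a descriptor that has already been closed: the poll returns immediately, accept() fails with OSError, _handle_request_noblock() swallows that error, and the loop goes straight round again. Under Python 2.7, select() on a closed socket raised an error instead, which would have ended serve_forever(); that may be why Kinetic was unaffected, although this has not been verified.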

It seems to me that the solution may be as simple as adding self.server.shutdown() to XmlRpcNode's shutdown() function, though I'm not certain whether it should be before or after closing the socket or if there are other considerations.
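A minimal sketch of that fix, written against a simplified stand-in for XmlRpcNode. XmlRpcNodeSketch is hypothetical and is not the real rosgraph class. Its ThreadingXMLRPCServer is a local approximation that assumes daemon_threads=True. According to the Python documentation, BaseServer.shutdown() must be called while serve_forever() is running in a different thread, otherwise it deadlocks. Calling it first and server_close() afterwards follows the usual socketserver order, so the loop never polls a closed socket.

```python
import socketserver
import threading
from xmlrpc.server import SimpleXMLRPCServer


class ThreadingXMLRPCServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    """Local approximation of the ROS server class (assumption: daemon handler threads)."""
    daemon_threads = True


class XmlRpcNodeSketch(object):
    """Hypothetical, stripped-down stand-in for XmlRpcNode showing the proposed shutdown order."""

    def __init__(self):
        self.server = None
        self._thread = None

    def start(self):
        self.server = ThreadingXMLRPCServer(("127.0.0.1", 0), logRequests=False)
        self._thread = threading.Thread(target=self.server.serve_forever, daemon=True)
        self._thread.start()

    def shutdown(self):
        server, self.server = self.server, None
        if server is None:
            return  # already shut down; a second call is a no-op
        # 1. Ask serve_forever() to exit; this blocks until the loop has finished.
        server.shutdown()
        # 2. Only then release the listening socket (server_close() closes server.socket).
        server.server_close()
        self._thread.join(timeout=2)


node = XmlRpcNodeSketch()
node.start()
node.shutdown()
print(node._thread.is_alive())  # False
```

On Linux, either order should eventually stop the thread, because the shutdown flag is checked on every loop iteration; calling shutdown() first only avoids the window in which the loop spins on a closed socket. The deadlock caveat matters only if XmlRpcNode.shutdown() could ever be reached from the serve_forever() thread itself.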

Why this matters to me:
I'm guessing this isn't a common problem, since generally the process would be ending shortly afterwards. However, I am using roslaunch.parent.ROSLaunchParent to launch and shutdown nodes as part of a python-based robot navigation benchmarking codebase. I wrote the code and used it extensively on Kinetic/Ubuntu 16.04/python 2.7 and did not encounter this issue at that time: starting and stopping hundreds of instances of roslaunch.parent.ROSLaunchParent did not cause any noticeable issues. As soon as I switched to Noetic/Ubuntu 20.04/python 3.8, I noticed that after running a few experiments the python process was at times using ~166% cpu. Usage hits 100% if there is only 1 'stopped' instance; ~166% if there are 2 or more.
The cpu usage of the process drops very low as soon as I start a new instance; I'm not familiar enough with the Selector-related code to understand why this is so.
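One plausible but unverified explanation for that drop: the selector inside serve_forever() registers the socket's file-descriptor number, not the socket object. Once that descriptor is closed, poll() reports it as invalid (POLLNVAL) immediately on every call, so the orphaned loop spins. When a new instance opens its own listening socket, Linux normally reuses the lowest free descriptor number, so the orphaned loop ends up polling the new, idle socket and blocks again. The Linux-only sketch below shows the poll() side of this with plain sockets. poll_once and listening_socket are hypothetical helpers.

```python
import select
import socket
import time


def poll_once(fd, timeout_ms=200):
    """Poll a raw descriptor number once for readability; return (events, seconds taken)."""
    poller = select.poll()  # a poll object only records fd numbers; it opens no fd itself
    poller.register(fd, select.POLLIN)
    start = time.monotonic()
    events = poller.poll(timeout_ms)
    return events, time.monotonic() - start


def listening_socket():
    """An idle TCP listener on a free local port (stand-in for the XML-RPC server socket)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    sock.listen()
    return sock


old = listening_socket()
fd = old.fileno()  # a selector registered on this socket keeps only this number
print("open and idle:", poll_once(fd))  # no pending connections: waits out the timeout

old.close()
print("after close:", poll_once(fd))    # returns at once, reporting POLLNVAL for fd

new = listening_socket()  # Linux normally hands out the lowest free descriptor number
print("number reused:", new.fileno() == fd, poll_once(fd))  # blocks again if reused
new.close()
```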

Minimal example:
The script starts a roscore and then repeatedly creates, starts, and stops roslaunch.parent.ROSLaunchParent instances; run top to see how the process's cpu usage changes.

```python
import os
import time

import roslaunch
import roslaunch.parent
import roslaunch.rlutil


class RoslaunchTest(object):

    def __init__(self, is_core=True):
        # Point this process at the master started on port 11312 below
        os.environ["ROS_MASTER_URI"] = "http://localhost:11312"
        self.is_core = is_core

    def start(self):
        print("Starting...")
        uuid = roslaunch.rlutil.get_or_generate_uuid(None, not self.is_core)

        self.roslaunch_object = roslaunch.parent.ROSLaunchParent(
            run_id=uuid, roslaunch_files=[],
            is_core=self.is_core, port=11312
        )
        self.roslaunch_object.start()
        print("Started!")

    def stop(self):
        print("Stopping...")
        self.roslaunch_object.shutdown()
        print("Stopped!")


def demo():
    core = RoslaunchTest(is_core=True)
    core.start()

    test = RoslaunchTest(is_core=False)

    def cycle():
        test.start()
        time.sleep(5)
        test.stop()  # cpu usage rises after this call
        time.sleep(10)

    for _ in range(4):
        cycle()


if __name__ == '__main__':
    demo()
```
@fgrcar

fgrcar commented May 3, 2022

I've noticed the same issue on ROS Melodic/Ubuntu 18.04, but only with Python 3.6.9; it does not occur with Python 2.7.

@BRNKR

BRNKR commented Aug 17, 2022

Same for me; it's unusable at the moment. We want to switch between two driving modes that use different sets of nodes, and we want to avoid running all of them when it is not necessary. For that we have a launch_handler with a service that starts the corresponding launch file. Each time one is killed, a new instance of my handler appears in my process list and generates 100% cpu load; every on/off cycle adds another one.

@drjsmith
Author

@TobiMiller @fgrcar Here is how I worked around the problem:

```python
import roslaunch.parent


# Class that fixes the problem
class RoslaunchShutdownWrapper(roslaunch.parent.ROSLaunchParent):

    def shutdown(self):
        # Capture the inner XML-RPC server first; the parent's shutdown() appears to
        # reset self.server.server to None while closing the socket.
        server = self.server.server if self.server is not None and self.server.server is not None else None
        super(RoslaunchShutdownWrapper, self).shutdown()
        if server:
            # Stop the serve_forever() loop that is otherwise left running
            server.shutdown()

    def __del__(self):
        print("Shutting down now!!!!!!")
        self.shutdown()


# How to use it:
roslaunch_object = RoslaunchShutdownWrapper(
    run_id=..., roslaunch_files=...,
    is_core=..., port=...,
    sigterm_timeout=...
)
roslaunch_object.start()
```

Let me know if it works for you.

@RodolpheCyber

RodolpheCyber commented Aug 27, 2022

@drjsmith Thanks so much for posting what worked for you!

I'm surprised this hasn't been addressed sooner; it's really a pain. I made a Qt user interface that uses the roslaunch API, basically to launch and shut down the nodes of our stack. But I'm seeing the same CPU problem you mentioned when I shut down the nodes, which makes our Qt interface extremely slow and unusable once some nodes have been shut down.

Your suggestion seems to work for me so far, though! Based on it, I wrote a quick class that the Qt interface calls to launch and shut down nodes:

```python
import roslaunch.parent


class RoslaunchWrapperObject(roslaunch.parent.ROSLaunchParent):

    def start(self):
        super(RoslaunchWrapperObject, self).start()
        print(self.server)

    def stop(self):
        print("Stopping...")
        print(self.server)
        # Capture the inner server before the parent's shutdown() clears it
        server = self.server.server if self.server is not None and self.server.server is not None else None
        super(RoslaunchWrapperObject, self).shutdown()
        if server:
            server.shutdown()
```

To launch a node from the Qt interface:

```python
import roslaunch
import roslaunch.rlutil


def launch(parameter):
    # parameter: [launch file path, 'arg:=value', ...]
    uuid = roslaunch.rlutil.get_or_generate_uuid(None, False)
    roslaunch.configure_logging(uuid)
    cli_args = parameter
    roslaunch_args = cli_args[1:]
    roslaunch_file = [(roslaunch.rlutil.resolve_launch_arguments(cli_args)[0], roslaunch_args)]
    launcher = RoslaunchWrapperObject(run_id=uuid, roslaunch_files=roslaunch_file)
    return launcher
```

and the calls are made like this:

```python
cli_args = ['PATH/online.launch', 'simulation:=true']
self.launch = launch(cli_args)  # inside the Qt interface class
self.launch.start()  # to start it
self.launch.stop()   # to stop it
```

Thank you so much for this, it really helped! I do hope they address the issue and find a proper fix so we can use the API just like on previous ROS versions.

@drjsmith
Author

@RodolpheCyber You're welcome!

l1va added a commit to l1va/ros_comm that referenced this issue Sep 27, 2022
Fixing the issue with 100% CPU usage after the shutdown
ros#2238
@adipotnis

Is this issue still present on Melodic?

@drjsmith
Author

drjsmith commented Nov 1, 2022

@adipotnis I discovered the problem when upgrading from Kinetic to Noetic, so I haven't personally tested it on Melodic. @fgrcar reported encountering the problem on Melodic, but only when using Python 3. I wouldn't be surprised if the root bug has been present for many versions and was only exposed by the transition to Python 3, though I haven't verified this.

jamesdarrenmuir added a commit to jamesdarrenmuir/ros_comm that referenced this issue Dec 15, 2023
stop CPU usage spike when a rospy node receives a signals.SIGINT interrupt but before it terminates the process (e.g. pressing CTRL+C while the node is sleeping due to a rospy.sleep() call no longer sends CPU usage of a core to 100%)

should also fix ros#2238
jamesdarrenmuir linked a pull request on Jan 26, 2024 that will close this issue
Development

Successfully merging a pull request may close this issue.

5 participants