Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple controller_manager spawn calls freeze a controller manager #265

Open
mrjogo opened this issue Feb 25, 2017 · 21 comments
Open

Multiple controller_manager spawn calls freeze a controller manager #265

mrjogo opened this issue Feb 25, 2017 · 21 comments

Comments

@mrjogo
Copy link

mrjogo commented Feb 25, 2017

Using https://github.com/davetcoleman/ros_control_boilerplate/tree/0.4.0/rrbot_control as an example, replace the ros_control_controller_manager node in https://github.com/davetcoleman/ros_control_boilerplate/blob/0.4.0/rrbot_control/launch/rrbot_hardware.launch#L21 with:

<node name="ros_control_controller_manager" pkg="controller_manager" type="controller_manager" respawn="false"
	  output="screen" args="spawn joint_state_controller" />
<node name="ros_control_controller_manager" pkg="controller_manager" type="controller_manager" respawn="false"
	  output="screen" args="spawn position_trajectory_controller" />

Build and run roslaunch rrbot_control rrbot_hardware.launch and the controller manager stops responding, either to service calls (ie, /rrbot/controller_manager/list_controllers) or Ctrl-C (if run outside roslaunch). The same thing happens if using spawner instead of controller_manager. I haven't tested it on a controller manager running outside of ros_control_boilerplate.

If it's intended that only one spawn call is allowed, I didn't see it in the documentation.

@bmagyar
Copy link
Member

bmagyar commented Feb 25, 2017

Your snippet clearly launches 2 controller managers so that's where the problem is.
Starting up 2 controller_manager nodes which lead to a resource and name conflict. The reason it stops responding is probably because either something gets stuck or the first controller manager dies when the second one is started.

Take a look at this example where the spawner script is used to set up controllers on an already running controller_manager. If you intend to start things up separately or in a parametrised way, this is the way to go.

spawner is only a script that calls services load_controller, switch_controller. It has the same effect as you'd get by executing the same commands from the command line, see here.

Dave's boilerplate functionalities allow a quick setup by using the controller manager + spawner at the same time but if you need to launch things in a modular way I'd recommend looking at any of the PAL robots for inspiration.

@mathias-luedtke
Copy link
Contributor

mathias-luedtke commented Feb 25, 2017

Your snippet clearly launches 2 controller managers so that's where the problem is.

I think the problem is that both use the same node name ros_control_controller_manager.
So the first might get killed in the middle of the service call.
Please try it with distinct names.

You can spawn multiple controllers at once, too:

<node name="ros_control_controller_manager" pkg="controller_manager" type="controller_manager" respawn="false"
	  output="screen" args="spawn joint_state_controller position_trajectory_controller" />

(just realized that is this is the default for the boilerplate template)

spawner [...] It has the same effect as you'd get by executing the same commands from the command line

The spawner script is different as it will (try to) stop the controller if it stops (ctrl+c or kill).
This need proper set-up to avoid problems.
I tend to avoid it.

@mathias-luedtke
Copy link
Contributor

Please try it with distinct names.

I have tested it myself, and I get the same bevahior, looks like a bug.

@mrjogo
Copy link
Author

mrjogo commented Feb 27, 2017

@bmagyar

Your snippet clearly launches 2 controller managers so that's where the problem is.

The controller_manager script is just an interface to an already running controller_manager (confusing name). It's fairly similar to the spawner script, which also causes the issue if two are started.

@ipa-mdl Yes, I still see the problem with different node names.

Conceptually, I think multiple spawn nodes should be able to run together (I was using it to start some controllers using --stopped and others started), so it's either a bug or the documentation should be clarified to indicate only one spawn node at a time should be run.

@mathias-luedtke
Copy link
Contributor

Conceptually, I think multiple spawn nodes should be able to run together

Yes that should work!
I have tested it with our Care-O-bot simulation on indigo without problems.
Not sure if this is a regression bug, a bug with the default gazebo plugin or the boilerplate template..

@bmagyar
Copy link
Member

bmagyar commented Feb 27, 2017

@mrjogo

The controller_manager script is just an interface to an already running controller_manager (confusing name). It's fairly similar to the spawner script, which also causes the issue if two are started.

Yeah, sorry for rushing to conclusion, my eyes skipped the fact that it was the python script.

This is actually not that hard to test.
I'd try launching this by hand and testing the idea against it.

@bmagyar
Copy link
Member

bmagyar commented Oct 30, 2017

Any update on this?

@jacknlliu
Copy link

jacknlliu commented Dec 16, 2017

Same behavior on Ubuntu 16.04, ROS kinetic. I think this is a regression bug.

And I find that after combined the controller_manager nodes to only one, rosservice call /controller_manager/list_controllers works, but if we use rosservice call /controller_manager/switch_controllers to stop controllers, then it will have no responds and rosservice call /controller_manager/list_controllers also has no responds again.

@mathias-luedtke
Copy link
Contributor

mathias-luedtke commented Dec 16, 2017

TL;DR: This is no bug in the controller_manager, but an issue with the boilerplate template.

I went through the code of the ros_control_boilerplate and have found out that it is running its control loop with a ROS timer in conjunction with an AsyncSpinner with two threads.

The service handlers in controller_manager are protected by locks.
So if two switch requests come in at the same time, the first one will be run, the second one will block.
In case of the boilerplate template these two service calls block both available spinner threads, so the control loop cannot be run.
This results in a deadlock because the first service handler waits for the control loop to process the switching.

We could improve the situation a little bit by resolving the deadlock with a timed lock (so it will recover after some time), but the boilerplate controller loop will freeze for a certain time.
IMHO it would be best to migrate the boilerplate code to a sleep-based loop.

@mathias-luedtke
Copy link
Contributor

I have proposed a fix: PickNikRobotics/ros_control_boilerplate#20

@alextoind
Copy link

I think I'm facing a similar problem without ros_control_boilerplate installed: if I launch several rosrun controller_manager controller_manager spaw ... commands almost simultaneously, they stuck while spawning controllers, e.g.

Terminal 1:

rosrun controller_manager controller_manager spawn d1_joint_state_controller d1_c1 d1_c2
Loaded d1_joint_state_controller

Terminal 2:

rosrun controller_manager controller_manager spawn d2_joint_state_controller d2_c1 d2_c2
Loaded d2_joint_state_controller
Loaded d2_c1
Loaded d2_c2

Terminal 3: control node started previously (the same if it is started after)

...
[ INFO] [1524818643.691951727]: Is waiting for [controller_manager] to startup...
[ INFO] [1524818643.693721761]: [controller_manager] is ready!
[ INFO] [1524818643.697370788]: Is waiting for controller [d1_joint_state_controller] to startup...
[ INFO] [1524818643.840676782]: Controller [d1_joint_state_controller] is loaded!
[ INFO] [1524818643.840705571]: Is waiting for controller [d1_c1] to startup...

However, if I use load ... commands in the exactly same manner, it works as expected (apart form the needs to start controllers later). And of course it works if I spawn all the controllers in one shot.

The control node calls controller_manager.getControllerByName(c_name) and action_client.waitForServer() respectively to wait for controllers and action server startup (I'm using default joint trajectory controllers at the moment).

I don't know whether it is correct to add these waits or not, but I've added them in the past to be sure that everything was started correctly. Now that I need to add several spawners they seem to mess up. Indeed, if I remove both calls everything start correctly even with the spawn ... commands.

It seems a deadlock problem but it is difficult to figure out exactly where.

Any thoughts? Is it the right behavior?

@miguelprada
Copy link
Contributor

@alextoind, which version of ros_control_boilerplate are you using?

It is possible that PickNikRobotics/ros_control_boilerplate#20 solves your problems? That PR has been very recently merged, so the fix is not yet available in the released packages.

@bmagyar
Copy link
Member

bmagyar commented Apr 27, 2018 via email

@alextoind
Copy link

@miguelprada as said at the beginning I have no ros_control_boilerplate installed in my system. Please, let me know whether it is better to open a new issue or not.

@bmagyar I'm sorry about that, here you are:

  • Ubuntu Xenial 16.04 (kernel 4.13.0-39-generic x86_64)
  • ROS Kinetic
  • ros_control 0.13.2 (0.13.2-0xenial-20180416-120230-0800)
    and all other packages up to date (everything "stock" apart from my control node)

Thank you very much

@mathias-luedtke
Copy link
Contributor

mathias-luedtke commented Apr 27, 2018

everything "stock" apart from my control node

Is it based on something like ros_control_boilerplate (which is meant as a template)?
Do you run the update loop with a ROS Timer?
Any chance of deadlocks in your code?

@miguelprada
Copy link
Contributor

@miguelprada as said at the beginning I have no ros_control_boilerplate installed in my system. Please, let me know whether it is better to open a new issue or not.

Oops I read your post too fast and thought it was a with instead of a without 🤦‍♂️

In any case, the problem might be the same one that was fixed in PickNikRobotics/ros_control_boilerplate#20, as @ipa-mdl seems to suggest when asking whether your control node is based on ros_control_boilerplate. Here's his explanation of what was going on.

@alextoind
Copy link

alextoind commented May 2, 2018

@ipa-mdl these are my first-level dependencies, but I don't think there is nothing like ros_control_boilerplate:

roscpp
actionlib
combined_robot_hw
hardware_interface
joint_limits_interface
transmission_interface
urdf
control_msgs
control_toolbox
controller_manager
message_runtime
std_msgs
std_srvs

And I'm quite sure there aren't deadlocks in my code: it just initializes things up and relies on ROS Action Clients. However I do run the update loop of my control node through a ROS Timer. Is that the problem?

// inside constructor
control_timer_ = node_handle_.createWallTimer(control_duration_, &robotControl::controlCallback, this);
...

void robotControl::controlCallback(const ros::WallTimerEvent &timer_event) {
  update(timer_event.current_real, timer_event.current_real - timer_event.last_real);
}

void robotControl::update(const ros::WallTime& time, const ros::WallDuration& period) {
  // read the state from the hardware interface

  // update the controllers
  controller_manager_.update(ros::Time(time.toSec()), ros::Duration(period.toSec()));

  // write the commands to the hardware interface
}

@miguelprada I had already read that post, but since I use an AsyncSpinner initialized with just one thread (and Actions are initialized with spin_thread forced to false), I think that it can be something like that, but not exactly the same problem.

With my setup I should have only one thread accessing those resources. Isn't it?

Thank you very much for your time

@mathias-luedtke
Copy link
Contributor

mathias-luedtke commented May 2, 2018

I had already read that post, but since I use an AsyncSpinner initialized with just one thread (and Actions are initialized with spin_thread forced to false), I think that it can be something like that, but not exactly the same problem.

Not the same problem, but the same effect.
The service callback of the controller_manager passes data to the real-time system (run with update) and blocks until the request was processed by the loop.
So the update must be independent from the ROS callback system.

Since you are running an AsyncSpinner, you can run the update in the main thread.
Just like I did it in PickNikRobotics/ros_control_boilerplate@7199b3c.

@alextoind
Copy link

Thank you @ipa-mdl!

And what about a dedicated Callback Queue and AsyncSpinner pair to handle the controller_manager update loop? Would it be ok, or it may suffer the same deadlock problem?

An improved doc as suggested in #330 would be great for people like me!

@mathias-luedtke
Copy link
Contributor

mathias-luedtke commented May 2, 2018

And what about a dedicated Callback Queue and AsyncSpinner pair to handle the controller_manager update loop? Would it be ok, or it may suffer the same deadlock problem?

This should work. But I don't see any benefit, especially if the main ends with ros::waitForShutdown().

@alextoind
Copy link

Just to be sure I had got it correctly ;)

Thank you very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants