fuse -> ROS 2 fuse_core : Nodes and Waitables #284

methylDragon · 2022-11-07T23:19:48Z

See: #276
This depends on: #283

When merging this, @methylDragon should check that it doesn't lead to Time regressions...

Description

This PR ports the ROS 1 nodes to ROS 2 node pointers, and also uses Brett's solution for CallbackWrappers and Waitables to support async behavior.

Whether it works remains to be seen; we'll likely need to circle back if we see deadlocks.

Some node handles have been replaced with node pointers or node interfaces. A future PR will introduce the concept of a NodeInterfaceHandle and change the signatures to support it.

PR Layout

Because I was cherry-picking Brett's commits here, I'd recommend reviewing the PR by looking at the merged changes instead of per commit.

Additional Notes

All subordinate plugins (e.g. sensors, motion models, etc.) now start an internal node using the default global context. We need to make sure that configuration works before we try to improve it, so this is a place we'll likely circle back to. Hopefully...

Pinging @svwilliams for visibility.

Signed-off-by: methylDragon <methylDragon@gmail.com> Authored-by: Brett Downing

Signed-off-by: methylDragon <methylDragon@gmail.com>

methylDragon · 2022-11-07T23:47:07Z

fuse_core builds if #283 is merged and this is rebased (with conflicts resolved)
I'm going to mark this PR as ready.

sloretz · 2022-11-09T22:50:09Z

> fuse_core builds if #283 is merged and this is rebased (with conflicts resolved)

🎉 That's great! That means it's a great time to start re-enabling tests. I opened a PR against the ros-params branch to enable the gtest ones in methylDragon#1. Once #283 merges, mind cherry-picking that commit to this branch?

Is enough of fuse_core ported with this PR to enable all of the tests? (the rest of the tests will need to be migrated to launch_pytest).

methylDragon · 2022-11-09T23:16:18Z

> > fuse_core builds if #283 is merged and this is rebased (with conflicts resolved)

> 🎉 That's great! That means it's a great time to start re-enabling tests. I opened a PR against the ros-params branch to enable the gtest ones in methylDragon#1. Once #283 merges, mind cherry-picking that commit to this branch?

> Is enough of fuse_core ported with this PR to enable all of the tests? (the rest of the tests will need to be migrated to launch_pytest).

This PR should migrate everything except ceres options (and their interactions with ROS 2 parameters).
I was gonna get the params first, but that might take a while, so enabling tests first and fixing them probably makes more sense.

fuse_core/CMakeLists.txt

fuse_core/include/fuse_core/async_motion_model.h

sloretz · 2022-11-09T23:12:33Z

fuse_core/include/fuse_core/async_motion_model.h

-  ros::NodeHandle private_node_handle_;  //!< A node handle in the private namespace using the local callback queue
-  ros::AsyncSpinner spinner_;  //!< A single/multi-threaded spinner assigned to the local callback queue
-
+  rclcpp::Node::SharedPtr node_;  //!< The node for this motion model


It looks like the node is unused except for adding the waitable to the executor.

Maybe we could pass the node or node interfaces via MotionModel::initialize()? That eliminates the extra context and node, and it looks doable higher in the stack, where MotionModel::initialize() is called by an Optimizer which does have a Node.

The node here is mostly used to structure parameters in the config files.
It's reasonable to run the graph with dozens of named motion_models, and each one needs a unique name and parameters. Each motion model can be associated with a different physical component that may or may not be included in any given deployment.

The separated node also allows complex motion_models (usually associated with a single robot chassis) to be targeted by ros cli tools as a separate entity in the ROS graph.

@BrettRD would you say that it's preferable then to keep the nodes separated? This was one of the places I planned to circle back on to try to combine it all into a single node (with the iffy bit being what happens when all the callback queues are merged), but the point you raised about CLI tools is pretty valid.

I'm not sure of the implications on lifecycle nodes and node components though.

I don't know what's better. (edit: it's a pretty huge API divergence to change the shape of Fuse on the node graph)

Have a think about sensor_models and fuse_publishers too: they will need to interact with sensor nodes and planning tools that will definitely appear under ROS namespaces, so it would be nice for a plugin to exist in the node graph under the relevant namespace.

One node wouldn't be a disaster.
You can still isolate device-specific configs into separate conf-only packages and load multiple config files into the optimiser node at launch.
However, if it's one node, it's going to make a blob in the node graph so dense it'll become a running joke. (edit: I have this exact problem exposing GStreamer properties on a pipeline node; it's not unmanageable.)

I was expecting to run separate nodes and have to do some fancy footwork to allow fuse_optimiser to side-load its plugins into its host composable node container, and then have a lot of trouble getting access to the parameters.
Either that, or have fuse_optimiser inherit from container, and cause a headache when any other package tries to do the same.
I haven't studied lifecycle nodes enough to comment there.

I don't think the callback queues will be a problem; executors are built to cope with multiple queues, and most of the fuse plugins are sharing the one optimiser queue.

> The node here is mostly used to structure parameters in the config files.
> It's reasonable to run the graph with dozens of named motion_models, and each one needs a unique name and parameters. Each motion model can be associated with a different physical component that may or may not be included in any given deployment.

I think we can still structure parameters in the config file by prepending name_ + '.param_name' to each parameter name.
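
A minimal sketch of that prefixing scheme in plain C++. The helper name `namespacedParam` and the example plugin and parameter names are hypothetical and do not appear in this PR; in a real port the resulting string would be handed to the node's parameter API, which is omitted here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of fuse): builds "<plugin_name>.<param_name>"
// so that several plugins can keep their parameters apart inside one node.
std::string namespacedParam(const std::string & plugin_name, const std::string & param_name)
{
  assert(!param_name.empty());  // a parameter always needs its own name
  if (plugin_name.empty()) {
    return param_name;  // no plugin prefix: leave the name untouched
  }
  return plugin_name + "." + param_name;
}
```

With this, two motion models named `front_chassis` and `rear_chassis` (made-up names) would read `front_chassis.buffer_length` and `rear_chassis.buffer_length` from the same node, and the config file can still group values per plugin.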

> The separated node also allows complex motion_models (usually associated with a single robot chassis) to be targeted by ros cli tools as a separate entity in the ROS graph.

I can think of a few reasons to avoid multiple nodes, but I'm not familiar enough with fuse to judge how useful it would be to have a node per async motion model/publisher/sensor model. @svwilliams what are your thoughts?

Reasons to avoid multiple nodes:

Nodes come with extra overhead like a graph guard condition, a rosout publisher, a parameter events publisher, and services for setting and getting parameters.

Atomically changing parameters is scoped to a single node. A single node would allow someone to change parameters of two motion models atomically, such that if one failed (maybe from an invalid value) none of the changes would go through.

Building the nodes internally prevents choosing whether they use regular or lifecycle nodes.
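
The all-or-nothing behaviour behind the atomic-parameter point can be sketched in plain C++. This models the semantics only and is not the rclcpp API (rclcpp offers the real thing per node, for example through `set_parameters_atomically`). The names `ParamMap`, `Validator` and `setAllOrNothing` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using ParamMap = std::map<std::string, double>;
using Validator = std::function<bool(const std::string &, double)>;

// Hypothetical model of an atomic update: every value is validated first,
// and the map is only modified if the whole batch is acceptable.
bool setAllOrNothing(
  ParamMap & params,
  const std::vector<std::pair<std::string, double>> & updates,
  const Validator & is_valid)
{
  assert(static_cast<bool>(is_valid));  // a validator must be supplied
  for (const auto & update : updates) {
    if (!is_valid(update.first, update.second)) {
      return false;  // one bad value rejects the batch; params is untouched
    }
  }
  for (const auto & update : updates) {
    params[update.first] = update.second;
  }
  return true;
}
```

If the two motion models lived in separate nodes, each node would run this check on its own half of the batch, so one half could be applied while the other is rejected.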

This is where my lack of experience with ROS2 is going to hinder me from offering sound advice.

I can discuss the design intent in ROS1 and see if that helps.

The motion models and sensors models act very node-like. They subscribe to sensor topics, process the data, and "publish" the results for consumption by the optimizer.

The reason the sensor models and motion models are not stand-alone nodes or nodelets in ROS1 is that each model "publishes" a derived class object instead of a message. Each model creates derived Constraints and Variables for use by the Optimizer. New user-derived models may create Constraints and Variables the Optimizer node cannot know about at compile-time. I could not think of an elegant solution to transmit derived objects using the standard pub-sub mechanisms in ROS1.

Using ROS1 plugins, however, I can define an interface to the Optimizer that sends derived object pointers around, so that was what I implemented.

But there is a downside to the plugin system. The Optimizer class creates the sensor and motion model instances as class variables. The Optimizer can call methods of the loaded plugins, so it would be easy to poll each loaded sensor and motion model for new data. But the sensor models and motion models are producing data for the Optimizer. In an ideal world, the models would decide for themselves when they had new data and would push that into the Optimizer. It is difficult for the plugins to call methods of the Optimizer.

To solve the polling/pushing issues, there is a series of callbacks registered between the Optimizer and the Models. This results in rather complex threading implications.

To hide all of that complexity, the AsyncModels each run their own thread(s) and callback spinner. This makes them act a lot like a conventional ROS node or nodelet. Each derived Model can define the number of threads used to service its callbacks, it gets a private node handle in its namespace for parameter reading, etc.
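
The push-via-callback arrangement described above can be sketched in plain C++. All types here (`Constraint`, `WheelOdometryConstraint`, `SensorModel`, `Optimizer`) are simplified, hypothetical stand-ins and not the fuse classes. Threading is left out on purpose; the sketch only shows a plugin handing a derived object to an optimizer that never sees the derived type.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class the optimizer knows at compile time.
struct Constraint
{
  virtual ~Constraint() = default;
  virtual std::string type() const = 0;
};

// Hypothetical user-defined type the optimizer has never heard of.
struct WheelOdometryConstraint : Constraint
{
  std::string type() const override { return "WheelOdometryConstraint"; }
};

// The "plugin": it only holds a callback, not a reference to the optimizer.
class SensorModel
{
public:
  using PushCallback = std::function<void(std::shared_ptr<Constraint>)>;

  void initialize(PushCallback push) { push_ = std::move(push); }

  // Called whenever the model decides it has new data to hand over.
  void onMeasurement()
  {
    assert(static_cast<bool>(push_));  // initialize() must have been called
    push_(std::make_shared<WheelOdometryConstraint>());
  }

private:
  PushCallback push_;
};

// The "optimizer": registers a callback and receives base-class pointers.
class Optimizer
{
public:
  void addModel(SensorModel & model)
  {
    model.initialize(
      [this](std::shared_ptr<Constraint> c) { received_.push_back(std::move(c)); });
  }

  const std::vector<std::shared_ptr<Constraint>> & received() const { return received_; }

private:
  std::vector<std::shared_ptr<Constraint>> received_;
};
```

Because the callback carries a pointer rather than a serialized message, the derived type survives the hand-off, which is the property that standard pub-sub could not provide.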

Does any of that help?

fuse_core/include/fuse_core/callback_wrapper.h

sloretz · 2022-11-09T23:21:19Z

fuse_core/include/fuse_core/callback_wrapper.h

+
+
+private:
+  std::recursive_mutex reentrant_mutex_;  //!< mutex to allow this callback to be added to multiple callback groups simultaneously


Does "multiple callback groups simultaneously" mean adding the same Waitable instance to multiple executors? That sounds undesirable. If two executors are trying to execute the same thing at the same time then one is going to be blocked, when it could have been executing something else.

Can this be removed?

The comment is mistaken (oops).
reentrant_mutex is there to protect rcl_wait_set_add_guard_condition, which inserts a pointer into a list of pointers and is not thread-safe. This allows multiple threads to insert callbacks into one queue without trashing RCL's internal data structures.

queue_mutex is the one you're worried about; it allows multiple executors to pull jobs from the queue.
If someone managed to put the wrapper into multiple executors without the mutex, it would be a nightmare to debug.

The lock is released before the callback is run. An executor will only be blocked by the time it takes for another executor to pull a pointer from a queue.

fuse/fuse_core/src/callback_wrapper.cpp

Lines 102 to 112 in ee39c45

  {
    std::lock_guard<std::recursive_mutex> lock(queue_mutex_);
    if(!callback_queue_.empty()){
      cb_wrapper = callback_queue_.front();
      callback_queue_.pop_front();
    }
  }
  //the lock is released and the callback is no longer associated with the queue, run it.
  if(cb_wrapper) {
    cb_wrapper->call();
  }

Executors won't try to execute the same code twice, and they won't wait for a callback to finish either.
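
The same pop-under-lock, run-outside-lock pattern can be shown as a self-contained sketch. `CallbackQueue` below is a hypothetical, simplified class using a plain `std::mutex`; it is not the rclcpp::Waitable from this PR and has no guard condition.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical, simplified queue illustrating the locking pattern only.
class CallbackQueue
{
public:
  using Callback = std::function<void()>;

  void add(Callback cb)
  {
    assert(static_cast<bool>(cb));  // empty callbacks are not accepted
    std::lock_guard<std::mutex> lock(queue_mutex_);
    queue_.push_back(std::move(cb));
  }

  // Pops at most one callback while holding the lock, then runs it with the
  // lock released. Returns true if a callback was run.
  bool executeOne()
  {
    Callback cb;
    {
      std::lock_guard<std::mutex> lock(queue_mutex_);
      if (queue_.empty()) {
        return false;
      }
      cb = std::move(queue_.front());
      queue_.pop_front();
    }
    cb();  // lock is not held here, so the callback may call add() safely
    return true;
  }

private:
  std::mutex queue_mutex_;
  std::deque<Callback> queue_;
};
```

Since the mutex is only held while a pointer is moved out of the queue, a second caller of `executeOne()` is blocked for that short window and never for the duration of a callback.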

edit: grammar

Updated the comment in fb52543.

fuse_core/src/callback_wrapper.cpp

Signed-off-by: methylDragon <methylDragon@gmail.com>

…lling-waitables Signed-off-by: methylDragon <methylDragon@gmail.com> Co-Authored-by: Brett Downing

sloretz

In-progress review. Github is now showing all the Time changes in this PR. I'll try an old trick of changing the base and back to see if that fixes it.

fuse_core/src/callback_wrapper.cpp

Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>

Signed-off-by: methylDragon <methylDragon@gmail.com>

fuse_core/src/async_motion_model.cpp

sloretz · 2022-11-11T23:09:29Z

fuse_core/src/async_motion_model.cpp

+  );
+  auto result = callback->getFuture();
+  callback_queue_->addCallback(callback);
+  result.wait();


I think the intent of the separate queue and executor is to not wait for the result here.

I removed it everywhere it made sense to in d3b6c94.

There is only one wait() remaining, in bool AsyncMotionModel::apply(Transaction& transaction)

I commented them out and added a TODO though, just in case we need to wait due to race conditions etc. once everything compiles.

The futures and waits are all about thread management.

The AsyncMotionModel is running its own callback spinner thread. So if a derived implementation created a Timer or subscribed to a sensor topic, those callbacks will run in the AsyncMotionModel spinner thread. However, the apply() method here is going to be called by the Optimizer, and hence it will run in the Optimizer's thread. This means any derived motion model must be very careful about thread synchronization. And it is not very obvious what callbacks will run in what threads. To make it easier to write derived motion model classes, I implemented the following:

The apply() method will be called by the Optimizer thread

bool AsyncMotionModel::apply(Transaction& transaction) {

Wrap the user-defined implementation, applyCallback(), inside a promise.

auto callback = boost::make_shared<CallbackWrapper<bool>>( std::bind(&AsyncMotionModel::applyCallback, this, std::ref(transaction)));

Create a future from the wrapper so that the apply() function may receive the results of the applyCallback() function.

auto result = callback->getFuture();

Insert the applyCallback() call into the callback queue of the AsyncMotionModel. This means that the applyCallback() method will be executed by the AsyncMotionModel spinner thread.

callback_queue_.addCallback(callback, reinterpret_cast<uint64_t>(this));

Wait for the AsyncMotionModel spinner thread to complete the applyCallback() call.

result.wait();

Return the results of the applyCallback() method out of the apply() method.

return result.get();

This was done so that the user-implementation of the apply() business logic, implemented in the applyCallback() method, will execute synchronously with any other callbacks (timers, subscriptions, services, etc) defined in the motion model. Typically the author of a derived AsyncMotionModel does not need to implement any mutexes or other thread synchronization mechanisms even though the execution of a motion model involves multiple threads created in different parts of the fuse ecosystem.
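
The hand-off described in these steps can be sketched with only the standard library. `Worker` and `callAndWait` are hypothetical names; the worker thread stands in for the AsyncMotionModel spinner thread, and `std::packaged_task` plays the role of the CallbackWrapper. This is an illustration under those assumptions, not the fuse implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical single-threaded worker standing in for the model's spinner.
class Worker
{
public:
  Worker() : thread_([this] { run(); }) {}

  ~Worker()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }

  // Called from another thread (the "Optimizer" thread): queues `work` for
  // the worker thread and blocks until the worker has run it.
  bool callAndWait(std::function<bool()> work)
  {
    assert(static_cast<bool>(work));
    auto task = std::make_shared<std::packaged_task<bool()>>(std::move(work));
    std::future<bool> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back([task] { (*task)(); });
    }
    cv_.notify_one();
    return result.get();  // get() waits for the worker to finish the task
  }

private:
  void run()
  {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (queue_.empty()) {
          return;  // stop was requested and nothing is left to run
        }
        job = std::move(queue_.front());
        queue_.pop_front();
      }
      job();  // every job runs here, one at a time, on the worker thread
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::thread thread_;  // declared last so the members above exist before it starts
};
```

Because every job runs on the one worker thread, code inside `work` never overlaps with other jobs on that queue, which is why a derived model can usually skip its own mutexes. Dropping the wait, as discussed earlier in this thread, trades that guarantee of a finished result for a non-blocking caller.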

fuse_core/src/async_publisher.cpp

fuse_core/src/async_sensor_model.cpp

fuse_core/src/callback_wrapper.cpp

Signed-off-by: methylDragon <methylDragon@gmail.com>

fuse_core/include/fuse_core/callback_wrapper.h

fuse_core/src/async_motion_model.cpp

fuse_core/src/async_publisher.cpp

fuse_core/src/async_sensor_model.cpp

fuse_core/src/callback_wrapper.cpp

Signed-off-by: methylDragon <methylDragon@gmail.com>

sloretz

Changes LGTM (noting TODO to decide on multiple nodes vs single node).

Builds locally and all tests pass except for the linters.

methylDragon and others added 7 commits November 7, 2022 13:57

build a callback queue as a rclcpp::Waitable

307262a

Signed-off-by: methylDragon <methylDragon@gmail.com> Authored-by: Brett Downing

move optimisers to rclcpp

2acc6c0

Signed-off-by: methylDragon <methylDragon@gmail.com> Authored-by: Brett Downing

compile callback_wrapper

5526ab0

Signed-off-by: methylDragon <methylDragon@gmail.com> Authored-by: Brett Downing

Using new callback wrapper in AsyncMotionModel

81aead6

Signed-off-by: methylDragon <methylDragon@gmail.com> Authored-by: Brett Downing

patches async_motion_model to use CallbackAdapter

21c2134

Signed-off-by: methylDragon <methylDragon@gmail.com> Authored-by: Brett Downing

fixes link-time CallbackWrapper issue

9fca67e

makes async plugins use ros2 nodes

87bc05f

Signed-off-by: methylDragon <methylDragon@gmail.com> Authored-by: Brett Downing

methylDragon changed the title ~~fuse -> ROS 2 : Nodes and Waitables~~ fuse -> ROS 2 fuse_core : Nodes and Waitables Nov 7, 2022

Use global context for async nodes and fix callbacks

ee39c45

Signed-off-by: methylDragon <methylDragon@gmail.com>

methylDragon force-pushed the rolling-waitables branch from 677451b to ee39c45 Compare November 7, 2022 23:46

methylDragon marked this pull request as ready for review November 7, 2022 23:47

methylDragon requested a review from sloretz November 7, 2022 23:47

methylDragon mentioned this pull request Nov 7, 2022

Fuse -> ROS 2 Rolling Porting Progress #276

Closed

54 tasks

sloretz reviewed Nov 9, 2022

Update callback wrappers

fb52543

Signed-off-by: methylDragon <methylDragon@gmail.com>

methylDragon force-pushed the rolling-waitables branch from 1957430 to fb52543 Compare November 10, 2022 23:05

methylDragon added 3 commits November 10, 2022 15:15

Get context from node to check for rclcpp::ok

ba64b63

Signed-off-by: methylDragon <methylDragon@gmail.com>

Merge branch 'rolling-waitables' into rolling-params

54f16b3

Merge remote-tracking branch 'methylDragon/rolling-waitables' into ro…

72fe062

…lling-waitables Signed-off-by: methylDragon <methylDragon@gmail.com> Co-Authored-by: Brett Downing

sloretz reviewed Nov 11, 2022

fuse_core/src/callback_wrapper.cpp (outdated, resolved)

fuse_core/src/callback_wrapper.cpp (outdated, resolved)

fuse_core/src/callback_wrapper.cpp (outdated, resolved)

fuse_core/src/callback_wrapper.cpp (outdated, resolved)

sloretz changed the base branch from rolling to devel November 11, 2022 00:40

sloretz changed the base branch from devel to rolling November 11, 2022 00:40

sloretz and others added 2 commits November 10, 2022 16:40

Add gtest tests

282275c

Signed-off-by: Shane Loretz <sloretz@osrfoundation.org>

Refine mutex use

cdeca18

Signed-off-by: methylDragon <methylDragon@gmail.com>

sloretz reviewed Nov 11, 2022

methylDragon added 2 commits November 11, 2022 15:29

Add TODOs

7f463a9

Signed-off-by: methylDragon <methylDragon@gmail.com>

Fix bugs and add node to executor

b7480f8

Signed-off-by: methylDragon <methylDragon@gmail.com>

sloretz reviewed Nov 11, 2022

Remove wait after callbacks are added for graphCallback

3e38f8d

Signed-off-by: methylDragon <methylDragon@gmail.com>

methylDragon force-pushed the rolling-waitables branch from d3b6c94 to 3e38f8d Compare November 12, 2022 01:07

sloretz approved these changes Nov 12, 2022

methylDragon merged commit 9f9395e into locusrobotics:rolling Nov 12, 2022

methylDragon deleted the rolling-waitables branch November 12, 2022 02:41
