Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in callback_groups_for_each. #1723

Merged
merged 2 commits into from
Jul 23, 2021

Conversation

clalancette
Copy link
Contributor

The main reason to add this method in is to make accesses to the
callback_groups_ vector thread-safe. By having a
callback_groups_for_each that accepts a std::function, we can
just have the callers give us the callback they are interested
in, and we can take care of the locking.

The rest of this fairly large PR is cleaning up all of the places
that use get_callback_groups() to instead use
callback_groups_for_each().

Signed-off-by: Chris Lalancette clalancette@openrobotics.org

This is the solution to the problem identified in ros-simulation/gazebo_ros_pkgs#1296 . In short, when using a thread to deal with use_sim_time, it is the case that the callbacks_group_ vector can be accessed by two threads simultaneously. In certain circumstances (particularly startup time), this can lead to a crash. By doing proper locking, we avoid the crash.

There are two other things that I'd like to discuss in relation to this:

  1. Should we remove the now-unsafe get_callback_groups() method? I have not done that here, but I really think we should. It is not currently possible to access the callback_groups_ vector in a thread-safe way. Alternatively we could deprecate it, but given that there is really no safe way to use it I'd rather just remove it.
  2. When adding the callback_groups_for_each() method, I replicated the previous structure where there is a method on both the NodeBase and Node objects. The latter is a thin wrapper around the former. But it seems redundant; Node has a public method to get the NodeBase, and NodeBase has the public method callback_groups_for_each(). So anyplace that currently does node->callback_groups_for_each() could be trivially changed to node->node_base()->callback_groups_for_each(). I'd opt for removing the method from the Node class, but I'm interested in what others think

Questions, comments, and reviews are all very welcome.

@clalancette
Copy link
Contributor Author

clalancette commented Jul 22, 2021

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Copy link
Member

@ivanpauno ivanpauno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with green CI.

I have only left some minor comments

rclcpp/include/rclcpp/node.hpp Outdated Show resolved Hide resolved
@@ -2733,13 +2733,33 @@ TEST_F(TestNode, get_publishers_subscriptions_info_by_topic) {

TEST_F(TestNode, callback_groups) {
auto node = std::make_shared<rclcpp::Node>("node", "ns");
size_t num_callback_groups_in_basic_node = node->get_callback_groups().size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could still use get_callback_groups().size() here.
If we want to delete get_callback_groups(), which is a good idea IMO, we could add a callback_groups_size() method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could. Right now we only seem to get the size of the callback group in tests, so I'm OK with the more verbose way of doing it. In the future if we decide we need it in more places, we can add it in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it would be a good idea to get rid of get_callback_groups() in its current form, because a) it would force users of it that are outside of our code base to consider if they're using it in a thread-unsafe way and b) because we could provide a const method that returns a copy of the callback groups vector instead, which could be used for size() and other things, which I would prefer over a dedicated get_callback_groups_size().

@ivanpauno
Copy link
Member

(a bit unrelated)

Why are we storing the "callback groups" that we don't need to automatically add to the executor in the node? It seems to be unnecessary.
In that case, it feels that we don't even need to call something like node->create_callback_group(), but we should be able to construct the CallbackGroup directly.

@clalancette clalancette force-pushed the clalancette/thread-safe-callback-groups branch from 1e6770f to 551dbcf Compare July 22, 2021 14:11
@clalancette
Copy link
Contributor Author

Why are we storing the "callback groups" that we don't need to automatically add to the executor in the node? It seems to be unnecessary.

I'm not sure I totally understand the question. As far as I can tell, we need to store each callback group we create so that we can later iterate over them, either via callback_groups_for_each or for callback_group_in_node.

Or did you have a different way in mind to find and iterate over the callback groups?

@ivanpauno
Copy link
Member

ivanpauno commented Jul 22, 2021

Or did you have a different way in mind to find and iterate over the callback groups?

AFAIS, in all places where we iterate over callback groups we're skipping the ones that should not be automatically added to the executor, e.g.

if (shared_group_ptr->automatically_add_to_executor_with_node() &&
.
It seems that the only reason that we had for storing a callback groups in Node was for adding callback groups to the executor, so I don't see why we also need to store the ones we want to handle separately.

edit: The callback_group_in_node() check should be simply deleted in that case.

@clalancette
Copy link
Contributor Author

It seems that the only reason that we had for storing a callback groups in Node was for adding callback groups to the executor, so I don't see why we also need to store the ones we want to handle separately.

edit: The callback_group_in_node() check should be simply deleted in that case.

Ah, I see what you mean.

I agree that we should maybe re-examine how this is done. That said, I don't necessarily want to do it in here; this PR is already large, and if we decide to go ahead and remove get_callback_groups, it is going to get even larger.

What I'll do is to open a separate issue to address that, and we can talk about it there.

rclcpp/include/rclcpp/node.hpp Outdated Show resolved Hide resolved
rclcpp/src/rclcpp/executor.cpp Show resolved Hide resolved
rclcpp/src/rclcpp/node_interfaces/node_base.cpp Outdated Show resolved Hide resolved
@@ -2733,13 +2733,33 @@ TEST_F(TestNode, get_publishers_subscriptions_info_by_topic) {

TEST_F(TestNode, callback_groups) {
auto node = std::make_shared<rclcpp::Node>("node", "ns");
size_t num_callback_groups_in_basic_node = node->get_callback_groups().size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think it would be a good idea to get rid of get_callback_groups() in its current form, because a) it would force users of it that are outside of our code base to consider if they're using it in a thread-unsafe way and b) because we could provide a const method that returns a copy of the callback groups vector instead, which could be used for size() and other things, which I would prefer over a dedicated get_callback_groups_size().

@wjwwood
Copy link
Member

wjwwood commented Jul 23, 2021

I also think we might not need to store the callback groups that aren't automatically added to an executor, and that they could even be constructed directly.

@wjwwood
Copy link
Member

wjwwood commented Jul 23, 2021

The only thing that might be affected is that notifying any executor that the callback group is associated with that something has been added/removed (the executor needs to interrupt itself in this case to rebuild the wait set), sometimes goes through the node, but again that might be only if the automatically_add_to_executor_with_node is true.

The main reason to add this method in is to make accesses to the
callback_groups_ vector thread-safe.  By having a
callback_groups_for_each that accepts a std::function, we can
just have the callers give us the callback they are interested
in, and we can take care of the locking.

The rest of this fairly large PR is cleaning up all of the places
that use get_callback_groups() to instead use
callback_groups_for_each().

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
Callers should change to using for_each_callback_group(), or
store the callback groups they need internally.

Signed-off-by: Chris Lalancette <clalancette@openrobotics.org>
@clalancette
Copy link
Contributor Author

All right. I've responded to all of the inline comments, I believe. I also rebased this onto the latest to get #1726 so windows CI will pass. Finally, I've removed get_callback_groups in a separate commit. That change also requires a small PR to the examples repository, but otherwise requires no further changes in the core.

@clalancette
Copy link
Contributor Author

Full CI (including ros2/examples#320):

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@clalancette
Copy link
Contributor Author

The failing test on Windows is flakey; it also failed on Windows in https://ci.ros2.org/view/nightly/job/nightly_win_rep/2312. So this is otherwise green. Since I also have approvals, I'm going to go ahead and merge this one along with ros2/examples#320 .

@clalancette clalancette merged commit 133088e into master Jul 23, 2021
@delete-merged-branch delete-merged-branch bot deleted the clalancette/thread-safe-callback-groups branch July 23, 2021 20:55
@wjwwood
Copy link
Member

wjwwood commented Jul 23, 2021

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants