Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential resource deadlock of NodeSate while destorying timeSource #1865

Closed

Conversation

Barry-Xu-2018
Copy link
Collaborator

Address issue #1861

@@ -591,6 +591,8 @@ bool TimeSource::clock_thread_is_joinable()

TimeSource::~TimeSource()
{
// Make sure clock subscriber thread is terminated at this time
node_state_->detachNode();
Copy link
Contributor

@ralph-lange ralph-lange Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
node_state_->detachNode();
node_state_.reset();

For consistency with TimeSource::detachNode(), I propose to rely on NodeState's dtor, which will call NodeState::detachNode() and will be called since there is only this one shared pointer to it, or? But even this line is redundant since node_state_ will be reset automatically in ~TimeSource().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing, using node_state_->detachNode() fixes the problem, while node_state_.reset() does not. I think it has to do with the lifetime of objects, specifically when passing the pointers into lambda captures.

That said, you are right, this shouldn't be necessary at all. In point of fact, looking at this code it is unnecessarily complicated. We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups

It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me. @Barry-Xu-2018 can you take a look and let me know what you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ralph-lange

Thanks for your comments.
I find a bug which is relevant to NodeState's dtor. (Refer to #1861 (comment)). At this case, after TimeSource is destroyed (Normally, internal NodeState should be destroyed too), NodeState is alive. So I add node_state_->detachNode() to make sure clock subscriber thread must be terminated at this time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clalancette

We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups

Completely agree. I also think my solution isn't good solution.

It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me.
@Barry-Xu-2018 can you take a look and let me know what you think?

I quickly read your code.
I have one comments in detachNode().
https://github.com/clalancette/rclcpp/blob/a68d7519147f9ae38416d8a61dffc8e39ead0c1a/rclcpp/src/rclcpp/time_source.cpp#L282-L294
I think clocks_state_.disable_ros_time(); should be moved behind destroy_clock_sub();.
It is because we not sure whether clock_cb() is executed at disabling ros time.

  void clock_cb(std::shared_ptr<const rosgraph_msgs::msg::Clock> msg)
  {
    if (!clocks_state_.is_ros_time_active() && SET_TRUE == this->parameter_state_) {
      clocks_state_.enable_ros_time();
    }
    // Cache the last message in case a new clock is attached.
    clocks_state_.cache_last_msg(msg);
   ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Barry-Xu-2018, @clalancette Thank you very much for the additional explanation.
@clalancette I've reviewed your bigger change and like it very much. The structure (object lifetimes, pointers) is much cleaner now. First, I had some doubts whether passing the this pointer of NodeState (instead of a shared_ptr) to the lambda expression for the subscription may cause new problems because of the risk that the clock_executor_ still accesses the the object while it is already being cleaned up. But of course this cannot happen if destroy_clock_sub() is called as first step of clean up. I propose to add a prominent note in detachNode or ~NodeState to ensure that this is not changed accidentally in the future.

@clalancette
Copy link
Contributor

Closing in favor of #1867.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants