-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix potential resource deadlock of NodeSate while destorying timeSource #1865
Fix potential resource deadlock of NodeSate while destorying timeSource #1865
Conversation
Signed-off-by: Barry Xu <barry.xu@sony.com>
@@ -591,6 +591,8 @@ bool TimeSource::clock_thread_is_joinable() | |||
|
|||
TimeSource::~TimeSource() | |||
{ | |||
// Make sure clock subscriber thread is terminated at this time | |||
node_state_->detachNode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node_state_->detachNode(); | |
node_state_.reset(); |
For consistency with TimeSource::detachNode()
, I propose to rely on NodeState's dtor, which will call NodeState::detachNode()
and will be called since there is only this one shared pointer to it, or? But even this line is redundant since node_state_
will be reset automatically in ~TimeSource()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing, using node_state_->detachNode()
fixes the problem, while node_state_.reset()
does not. I think it has to do with the lifetime of objects, specifically when passing the pointers into lambda captures.
That said, you are right, this shouldn't be necessary at all. In point of fact, looking at this code it is unnecessarily complicated. We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups
It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me. @Barry-Xu-2018 can you take a look and let me know what you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your comments.
I find a bug which is relevant to NodeState's dtor. (Refer to #1861 (comment)). At this case, after TimeSource is destroyed (Normally, internal NodeState should be destroyed too), NodeState is alive. So I add node_state_->detachNode()
to make sure clock subscriber thread must be terminated at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really don't need to use shared and weak pointers for most of these things; if we manage the ClocksState as part of the NodeState, then we can just use normal object lifetimes. I made a set of changes that do this on this branch: https://github.com/clalancette/rclcpp/tree/clalancette/time-source-cleanups
Completely agree. I also think my solution isn't good solution.
It's a bigger change than this, but it cleans up the code pretty nicely and it solves the problem for me.
@Barry-Xu-2018 can you take a look and let me know what you think?
I quickly read your code.
I have one comments in detachNode().
https://github.com/clalancette/rclcpp/blob/a68d7519147f9ae38416d8a61dffc8e39ead0c1a/rclcpp/src/rclcpp/time_source.cpp#L282-L294
I think clocks_state_.disable_ros_time();
should be moved behind destroy_clock_sub();
.
It is because we not sure whether clock_cb() is executed at disabling ros time.
void clock_cb(std::shared_ptr<const rosgraph_msgs::msg::Clock> msg)
{
if (!clocks_state_.is_ros_time_active() && SET_TRUE == this->parameter_state_) {
clocks_state_.enable_ros_time();
}
// Cache the last message in case a new clock is attached.
clocks_state_.cache_last_msg(msg);
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Barry-Xu-2018, @clalancette Thank you very much for the additional explanation.
@clalancette I've reviewed your bigger change and like it very much. The structure (object lifetimes, pointers) is much cleaner now. First, I had some doubts whether passing the this pointer of NodeState (instead of a shared_ptr) to the lambda expression for the subscription may cause new problems because of the risk that the clock_executor_
still accesses the the object while it is already being cleaned up. But of course this cannot happen if destroy_clock_sub()
is called as first step of clean up. I propose to add a prominent note in detachNode
or ~NodeState
to ensure that this is not changed accidentally in the future.
Closing in favor of #1867. |
Address issue #1861