Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve startup race condition for sim time #608

Merged
merged 12 commits into from
Dec 12, 2018
Merged

Conversation

tfoote
Copy link
Contributor

@tfoote tfoote commented Dec 11, 2018

Fixes #595

A race condition at startup was identified in #595 where the clock and parameter interfaces needed each other to initialize and since they could not rely on both being initialized the loop was closed through sending and receiving ROS messages. This however meant that initialization was dependent on messages being passed through ROS topics before time would be initialized correctly. The solution to this is to break apart the clock dependency into the time source and clock separately, like we already do for the underlying data and this breaks the cycle.

When reviewing this there's an extension which would be a good improvement to not rely on parameter events being published but use the internal node interface API to get the events in a callback. However as we're late in the release cycle I'm going to hold off on fixing that is this is functional just not optimal and would require a noteably large API change. Along with that change if there was a more fully featured API for registering callbacks for parameter changes locally, the parameter event publishing could also be split out into a separate component that could obviate the need for a parameter to turn that functionality on and off in the constructor of the parameter, and avoid the dependencies on the clock and topics interface for the core parameter API.

Second fix for #595
This was causing it to require /clock topic data or a parameter change becaue the clock was attached after the node was attached.
Doing it in the other order would have worked.

Also homogenizing the behavior to use the last recieved value otherwise zero time when enabling sim time.
@tfoote tfoote added the in progress Actively being worked on (Kanban column) label Dec 11, 2018
@tfoote tfoote self-assigned this Dec 11, 2018
@tfoote
Copy link
Contributor Author

tfoote commented Dec 11, 2018

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@tfoote tfoote added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Dec 11, 2018
@tfoote tfoote changed the title Resolve startup race condition Resolve startup race condition for sim time Dec 11, 2018
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

rclcpp/include/rclcpp/node_interfaces/node_time_source.hpp Outdated Show resolved Hide resolved
rclcpp/src/rclcpp/time_source.cpp Outdated Show resolved Hide resolved
@tfoote
Copy link
Contributor Author

tfoote commented Dec 11, 2018

Resolved comments and rerunning CI for mac and windows. The windows tests looked to be a bunch from other issues.

  • macOS Build Status
  • Windows Build Status

@ros-pull-request-builder retest this please

@chapulina
Copy link
Contributor

The original sim time test in gazebo_ros_pkgs passes all expectations for me with this branch, but it hangs in the end after printing free(): invalid pointer, which doesn't happen with this repo's master branch.

@chapulina
Copy link
Contributor

After debugging a bit with @tfoote, it looks that my problem may be coming from having rclcpp both from debs and from source installed, so they may be conflicting with each other. We ran the exact same test under gazebo_ros_pkgs and under rclcpp, and during the latter nothing bad happens, which suggests the issue is in how gazebo_ros_pkgs is being compiled against rclcpp.

I'll make a PR changing the sim time test to work with this and we can run CI with the combination.

@tfoote
Copy link
Contributor Author

tfoote commented Dec 12, 2018

This PR has been tested as working against the new navigation stack by @crdelsey #595 (comment)

@@ -0,0 +1,50 @@
// Copyright 2017 Open Source Robotics Foundation, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could probably still be fixed

{

/// Pure virtual interface class for the NodeTimeSource part of the Node API.
class NodeTimeSourceInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might need a virtual destructor to make sure ~NodeTimeSource() gets called when the std::shared_ptr<NodeTimeSourceInterface> on the node is destructed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is probably true as well for all the other interface classes too. I went over it with @wjwwood and it looks like virtual default constructors are the best approch. We're clearly not using it right now but in the future not having them will cause warnings or errors.

@tfoote
Copy link
Contributor Author

tfoote commented Dec 12, 2018

Added virtual destructors for all interfaces in a7de819
Rerunning CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status


RCLCPP_PUBLIC
virtual
~NodeBaseInterface() = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The destructor is conventionally at the top of the class just after any constructors.

* move destructors to top of class
@tfoote
Copy link
Contributor Author

tfoote commented Dec 12, 2018

Retrying windows after merging refactor

  • Windows Build Status

@@ -0,0 +1,50 @@
// Copyright 2017 Open Source Robotics Foundation, Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could probably still be fixed

@tfoote
Copy link
Contributor Author

tfoote commented Dec 12, 2018

@ros-pull-request-builder retest this please

@tfoote tfoote merged commit c93beb5 into master Dec 12, 2018
@tfoote tfoote removed the in review Waiting for review (Kanban column) label Dec 12, 2018
@tfoote tfoote deleted the separate_time_source branch December 12, 2018 19:53
cho3 pushed a commit to cho3/rclcpp that referenced this pull request Jun 3, 2019
Resolves ros2#595 

* Separate the Node Time Source from the Node Clock
* Implement initial value checking of use_sim_time parameter parameter
* Be sure to update all newly attached clocks
* Homogenizing the behavior to use the last received value otherwise zero time when enabling sim time.
* Add virtual destructors to interface classes
nnmm pushed a commit to ApexAI/rclcpp that referenced this pull request Jul 9, 2022
Signed-off-by: Mikael Arguedas <mikael.arguedas@gmail.com>
DensoADAS pushed a commit to DensoADAS/rclcpp that referenced this pull request Aug 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'use_sim_time' only works when set dynamically, not through launch parameters
5 participants