New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix race conditions when updating PlanningScene #232
Conversation
res.error_code.val = moveit_msgs::MoveItErrorCodes::TIMED_OUT; | ||
else | ||
res.error_code.val = moveit_msgs::MoveItErrorCodes::CONTROL_FAILED; | ||
res.error_code.val = moveit_msgs::MoveItErrorCodes::CONTROL_FAILED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simply reformatting nested if's into a linear structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes like this should be in a separate PR
@@ -322,6 +322,7 @@ void move_group::MoveGroupPickPlaceAction::executePickupCallback(const moveit_ms | |||
{ | |||
setPickupState(PLANNING); | |||
|
|||
context_->planning_scene_monitor_->waitForCurrentRobotState(ros::Time::now()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we start planning, ensure that we have the latest robot state received...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your Github comment should be an inline comment instead :)
@@ -199,6 +208,7 @@ class CurrentStateMonitor | |||
ros::Time last_tf_update_; | |||
|
|||
mutable boost::mutex state_update_lock_; | |||
mutable boost::condition_variable state_update_condition_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This introduces an ABI incompatibility.
@@ -878,7 +878,7 @@ bool TrajectoryExecutionManager::validate(const TrajectoryExecutionContext &cont | |||
ROS_DEBUG_NAMED("traj_execution", "Validating trajectory with allowed_start_tolerance %g", allowed_start_tolerance_); | |||
|
|||
robot_state::RobotStatePtr current_state; | |||
if (!csm_->waitForCurrentState(1.0) || !(current_state = csm_->getCurrentState())) | |||
if (!csm_->waitForCurrentState(ros::Time::now()) || !(current_state = csm_->getCurrentState())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, calling the new function waitForCurrentState() instead of the old one, which was renamed to waitForCompleteRobotState().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
skimming looks good to me but @v4hn is the one who needs to really review this
res.error_code.val = moveit_msgs::MoveItErrorCodes::TIMED_OUT; | ||
else | ||
res.error_code.val = moveit_msgs::MoveItErrorCodes::CONTROL_FAILED; | ||
res.error_code.val = moveit_msgs::MoveItErrorCodes::CONTROL_FAILED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
||
/** @brief Wait for at most \e wait_time seconds until the complete robot state is known. Return true if the full state is known */ | ||
bool waitForCompleteState(double wait_time) const; | ||
/// replaced by waitForCompleteState, will be removed in L-turtle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment should be /** */
for consistency
{ | ||
ROS_DEBUG_STREAM_NAMED("planning_scene_monitor", "last robot motion: " << (t-last_robot_motion_time_).toSec() << " ago"); | ||
new_scene_update_condition_.wait_for(lock, boost::chrono::nanoseconds(timeout.toNSec())); | ||
timeout -= ros::WallTime::now()-start; // compute remaining wait_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spacing
unlocking needs to be performed in reverse order of locking otherwise deadlocks are risked
…ue_) Due to an upstream bug, it's not possible to start multiple AsyncSpinners from different threads. Filed PR: ros/ros_comm#867 The spinner is now only needed to serve our own callback_queue_ for scene updates, which is only required for syncSceneUpdates() that syncs all kind of scene updates, not only the robot state.
…State() deprecated old functions, which should be removed in L-turtle
... because we might wait up to 1s for a robot state update
Rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very tricky PR to review - you have too many things changes here in a very complex way. please simplify. i also take issue with how you've renamed a function then created a new function with the same name - not good!
@@ -322,6 +322,7 @@ void move_group::MoveGroupPickPlaceAction::executePickupCallback(const moveit_ms | |||
{ | |||
setPickupState(PLANNING); | |||
|
|||
context_->planning_scene_monitor_->waitForCurrentRobotState(ros::Time::now()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your Github comment should be an inline comment instead :)
/** @brief Wait for at most \e wait_time seconds until the complete current state is known. Return true if the full state is known */ | ||
bool waitForCurrentState(double wait_time) const; | ||
/** @brief Wait for at most \e wait_time seconds for a robot state more recent than t */ | ||
bool waitForCurrentState(const ros::Time t=ros::Time::now(), double wait_time=1) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spaces needed around =
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 should be 1.0
or 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in comment add "(defaults to 1 sec)"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in comment explain what the return value means - e.g. @return false indicates the current robot state at time \e t was not found within the \e wait_time
/** @brief Wait for at most \e wait_time seconds until the complete robot state is known. Return true if the full state is known */ | ||
bool waitForCompleteState(double wait_time) const; | ||
/** replaced by waitForCompleteState, will be removed in L-turtle */ | ||
MOVEIT_DEPRECATED bool waitForCurrentState(double wait_time) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you clarify in the comment if this new function behaves differently than this deprecated waitForCurrentState
?
@@ -344,6 +344,13 @@ class PlanningSceneMonitor : private boost::noncopyable | |||
/** @brief This function is called every time there is a change to the planning scene */ | |||
void triggerSceneUpdateEvent(SceneUpdateType update_type); | |||
|
|||
/** \brief Wait for robot state to become more recent than t. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"than time t"
* If there is no state monitor active, there will be no scene updates. | ||
* Hence, you can specify a timeout to wait for those updates. Default is 1s. | ||
*/ | ||
bool waitForCurrentRobotState(const ros::Time &t, double wait_time = 1.); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you deprecate the function named waitForCurrentState
in the current_state_monitor but here add one named that? Should it also be waitForCompleteState
here? I find your varying use of these two functions in current_state_monitor and planning_scene_monitor very confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained by @v4hn, we agreed to correctly rename the old function waitForCurrentRobotState -> waitForCompleteRobotState to reflect its actual semantics.
The new function waitForCurrentRobotState actually does what its name states: waiting for an update of the current robot state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that makes sense, its just hard to review
|
||
/* If the robot doesn't move, we will never receive an update from CSM in planning scene. | ||
As we ensured that an update, if it is triggered by CSM, is directly passed to the scene, | ||
we can early return true here (if we successfully received a CSM update). Otherwise return false. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice comment!
// However, scene updates are only published if the robot actually moves. Hence we need a timeout! | ||
// As publishing planning scene updates is throttled (2Hz by default), a 1s timeout is a suitable default. | ||
boost::shared_lock<boost::shared_mutex> lock(scene_update_mutex_); | ||
ros::Time prev_robot_motion_time_ = last_robot_motion_time_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should not be an underscore at the end of this local variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only member variables should have an underscore in the end, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, and prev_robot_motion_time_
is a local variable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I missed the not in your original comment.
if (octomap_monitor_) | ||
octomap_monitor_->getOcTreePtr()->unlockRead(); | ||
scene_update_mutex_.unlock_shared(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes like this should be in a separate PR
if self.group.execute(self.plan(target)): | ||
actual = np.asarray(self.group.get_current_joint_values()) | ||
self.assertTrue(np.allclose(target, actual, atol=1e-4, rtol=0.0)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this could be in a separate PR also
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this? This is the unittest which tests the behavior fixed by this PR.
res.error_code.val = moveit_msgs::MoveItErrorCodes::TIMED_OUT; | ||
else | ||
res.error_code.val = moveit_msgs::MoveItErrorCodes::CONTROL_FAILED; | ||
res.error_code.val = moveit_msgs::MoveItErrorCodes::CONTROL_FAILED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes like this should be in a separate PR
I really want to get this merged in so we can release kinetic, so have attempted to review, above. Would greatly appreciate anyone who can test this PR on their physical robot setup. @Jntzko? |
Thanks dave!
Yes this is meddling with a core function, but I discussed that with @rhaschke in the old request already. |
I've tested this PR in indigo with the 9 commits from @rhaschke in the simulation. Therefore I made a small test case where the ur5 moves in a loop to four positions.
As @v4hn told me, this error should not appear with the fixes from the PR. To get this test running you need to check out the following repositories: The branch testFix#442 from: The test case: And also this repositories for our lab setup: Start the test with: |
I guess, to reproduce the test, I also would need to have a UR arm, wouldn't I? This test runs without problems for me with a simulation-only robot. |
On Mon, Oct 10, 2016 at 06:02:32AM -0700, Robert Haschke wrote:
No, he ran this in demo mode. That's why I'm pretty sure the problem lies in the synchronization. |
@rhaschke I've tested your code, also just in the simulation and I get the same "start point deviates from current robot state" error. |
This discussion on what tests you are using to check this PR inspired me to write a stub tutorial on the topic: moveit/moveit_tutorials#24 |
@Jntzko Sorry, but I cannot reproduce the error in Kinetic with my proposed unit test. Could you, please, send me the whole output of running the unit test?
|
I addressed most of the additional comments from @davetcoleman. Those, which are not addressed, I commented inline. I didn't separated stuff into individual PRs. They are nicely separated by individual commits. |
Agreed. However, I don't see a reason yet, why it should fail. Do you?
That's why I asked for the full log. |
This is a poor-man's-replacement for rhaschke's work that waits for the current robot state. This can be removed after his work is merged. (See #232) Without the additional sleep the automatic update of the start state will pick a point near the end of the executed trajectory instead of the current state. Let's give the monitor a moment to catch up.
@Jntzko Please send the full log of the unit test to allow me to further look into this. |
@rhaschke I have cherry picked your latest commits in my indigo branch too and ran the test multiple times. The following error does not appear in every run within the 30 movements, but it still appears every now and then(appeared in 2 of 3 runs):
I work with the Here's one of the rostest-log files where the error appeared: |
@rhaschke I ran your test on kinetic now and increased the number of movements to 100.
Here's the rostest-log file: |
I could indeed reproduce the problem in Kinetic too now. The update is missed rarely but sometimes. I think, the core issue is that the We have the same situation with fake controllers too: They publish to Hence, it might happen, that However, this is of course not a feasible solution. The core issue is that we cannot know the delay between the last controller command send by Hence, I suggest to drop this PR for Kinetic. However, I will file some parts of this PR as separate PRs - to be included in the Kinetic release. |
This is a poor-man's-replacement for rhaschke's work that waits for the current robot state. This can be removed after his work is merged. (See #232) Without the additional sleep the automatic update of the start state will pick a point near the end of the executed trajectory instead of the current state. Let's give the monitor a moment to catch up.
This is a poor-man's-replacement for rhaschke's work that waits for the current robot state. This can be removed after his work is merged. (See #232) Without the additional sleep the automatic update of the start state will pick a point near the end of the executed trajectory instead of the current state. Let's give the monitor a moment to catch up.
This is a poor-man's-replacement for rhaschke's work that waits for the current robot state. This can be removed after his work is merged. (See #232) Without the additional sleep the automatic update of the start state will pick a point near the end of the executed trajectory instead of the current state. Let's give the monitor a moment to catch up.
This is a poor-man's-replacement for rhaschke's work that waits for the current robot state. This can be removed after his work is merged. (See moveit#232) Without the additional sleep the automatic update of the start state will pick a point near the end of the executed trajectory instead of the current state. Let's give the monitor a moment to catch up.
This is a poor-man's-replacement for rhaschke's work that waits for the current robot state. This can be removed after his work is merged. (See moveit#232) Without the additional sleep the automatic update of the start state will pick a point near the end of the executed trajectory instead of the current state. Let's give the monitor a moment to catch up.
After having merged #63 we can finally attempt to merge https://github.com/rhaschke/moveit/pull/1 against ros-planning/moveit. This PR attempts to fix moveit/moveit_ros#442 and the flaky test mentioned in #221 (comment). This PR draws on reverted PRs moveit/moveit_ros#716 moveit/moveit_ros#724, and moveit/moveit_ros#728.
AsyncSpinner
andCallBackQueue
inPlanningSceneMonitor
for now. With fix trajectory service blocking callback queue moveit_ros#717 (cherry-picking #713 and #717 #59) this is not required anymore to update the robot state. HoweversyncSceneUpdates()
relies on them, which is why I had to remove them for now too. It's anyway in discussion, whether this function is useful. I will file a separate PR for this.CurrentStateMonitor::waitForCurrentState()
and added a deprecated fallback function calling the new methodwaitForCompleteState()
. Deprecation should be OK in Kinetic, but not Indigo and Jade. This should be considered when back-porting.