Skip to content
This repository has been archived by the owner on Nov 13, 2017. It is now read-only.

Race conditions when updating PlanningScene: fixup #716 #728

Closed
wants to merge 8 commits into from

Conversation

rhaschke
Copy link
Contributor

This continues #716 and #724 which were both merged already. I repeat here my comments to #724:

@v4hn Realizing the simplification we agreed on in the phone call turned out to be rather difficult:

  1. simplification of stateUpdateTimerCallback() (removing the additional timestamp check) caused dead locks. Hence, I reverted these changes.
  2. I successfully moved validatePlan() from MoveGroupInterface (client side) to MoveGroup's ExecutionServiceCapability (server side) - as agreed.
  3. Simplification of syncSceneUpdates(): We definitely need both while loops and the timeout. When the robot doesn't move at all, we will never receive a scene update. The first loop is required to check for a recent robot state update directly in CSM, while the second loop (in case there is no CSM) waits - with a timeout - for a direct scene update. There is no way to omit this timeout. The only chance would be to trigger all input channels of PSM to send updates - which is not possible. However, in move_group, this doesn't pose a problem, because there is a CSM instantiated.
  4. Removing syncSceneUpdates() after trajectory execution makes sense. Doing so, led to failure of the test suggested in Trajectory start doesn't match Current Position, when using plan/execute #442. Now I fixed it. CSM::waitForCurrentState() didn't do what I expected from the name...

Some more thoughts:

  • I suggest some more API changes: syncSceneUpdates() should be renamed to waitForCurrentRobotState().
  • There are some methods waitForCurrentState() in CurrentStateMonitor, which actually should be renamed to waitForCompleteState() as they don't look at recent timestamps.
  • I'm still arguing for a true method syncSceneUpdates() which incorporates all pending scene updates in the callback queue. I agree to @v4hn that this doesn't guarantee that all scene updates up to a given timestamp are incorporated, because - due to network delays - ROS messages might be received delayed. The only guarantee for a synchronous scene update is the new method applyPlanningScene().
  • I added some integration tests in moveit_ros/test. To this end, I imported a Fanuc robot description and moveit_config from ROS industrial. Please check, if that's fine.

from MoveGroupInterface (client side)
to MoveGroup's ExecutionServiceCapability (server side)

This checks validity of plan no matter which client called.
Placing it in MoveGroupContext will allow re-use by other capabilities.
Primary objective of syncSceneUpdates() is to receive a recent robot state.
If a current state monitor is active, this is all to monitor.
If not, scene updates are monitored, in particular the timestamp of their RobotState member.
- new CurrentStateMonitor::waitForCurrentState(ros::Time)
- simply wait for state update to reach scene
- fixed validatePlan(): use new CSM::waitForCurrentState()
If the robot doesn't move, an update in CSM is not forwarded to the planning scene.
Hence, we always wait until timeout for a recent last_robot_motion_time_.
If we ensure that an update, if it is triggered by CSM, is directly
passed to the scene, we can early return true.

This partially reverts commit 907980a392d352f5e2ebb6a0b6906c85d9c7d72c.
@davetcoleman
Copy link
Member

I think the integration tests make this PR way too big - should be separate as discussion on call

new syncSceneUpdates() indeed also waits for all pending scene updates to be processed
@v4hn v4hn mentioned this pull request Jul 28, 2016
1 task
@@ -38,6 +38,7 @@
#define MOVEIT_MOVE_GROUP_CONTEXT_

#include <moveit/macros/class_forward.h>
#include <trajectory_msgs/JointTrajectory.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the new dependency to the package.xml

@rhaschke
Copy link
Contributor Author

I'm afraid that I cannot handle the comments before next week. Please be patient.

@v4hn
Copy link
Contributor

v4hn commented Jul 28, 2016

@rhaschke too bad. Sorry it took me so long :(
@130s is it ok, if this will take another week before a new indigo release?
Releasing without this fixed-up request is not an option.

@davetcoleman
Copy link
Member

Hopefully before August 5th...

@v4hn
Copy link
Contributor

v4hn commented Aug 3, 2016

@davetcoleman , @130s, @mikeferguson / whoever feels responsible. This is a bit critical, so please read and comment.

@davetcoleman this is not going to happen before Friday,
I've been told @rhaschke is on vacation starting this Monday.

He will be really pissed because of that, but at the moment I'm considering a revert of the whole list of commits for #442 up to now. Maybe that's what we should have done directly after the unreviewed merge...
We have at least one open bug with respect to these changes by @dornhege, a list of things that should be improved, ABI breakage w.r.t. the last release and I just noticed a segfault in the RViz plugin in my setup when removing the MotionPlanning display, that gdb blames on the new spinner_ that has been added to the PlanningSceneMonitor. I'm not sure we can have that spinner there, because ros keeps a recursive mutex on spinners, and there must only be one thread that keeps spinners. I'm pretty sure this might have broken API in a totally unintuitive way (definitely not acceptable in indigo), because if you keep a AsyncSpinner around from your main thread and create the monitor from a different thread things probably break.

Overall, this is no state in which we should let people work on "minor" issues and the whole "fix the start state" patch is clearly not ready for a release.

I therefore propose to revert these change-sets and try to add only the part that makes the ExecutionController check whether the trajectory it is about to execute starts somewhere around the current state. This should be possible without most of this code and without breakage.
Later on we can focus on getting this to work without this many issues..

@davetcoleman
Copy link
Member

Yes, after seeing #736 and possibly related moveit/moveit#10 I have been worried about the state of the PlanningSceneMonitor. Considering it was never fully reviewed I do not think we should feel guilty about the revert. +1

@dornhege
Copy link
Contributor

dornhege commented Aug 4, 2016

+1 for the revert. Ideally before the merge and unless it is shown that #736 is not an issue in general or independent from this, we cannot release.
The PR is still there to be worked on, but this seems to be not trivial to get right.

@rhaschke
Copy link
Contributor Author

rhaschke commented Aug 4, 2016

Looks like AsyncSpinner isn't really usable as intended. As I can't work on the issue in the next days, go for the revert as suggested by @v4hn.

@davetcoleman davetcoleman changed the title fixup #716 Race conditions when updating PlanningScene: fixup #716 Aug 4, 2016
@davetcoleman
Copy link
Member

@v4hn can you do the revert today before tomorrow's merge?

@v4hn
Copy link
Contributor

v4hn commented Aug 4, 2016

I opened a request to revert the previous merge commits in #742 .

I'll close this request. We will have to look through the changes again anyway and rebase/reformat them for the merged repo.

@rhaschke
Copy link
Contributor Author

@v4hn @dornhege @davetcoleman
I'm back from vacation and I'm going to look into the AsyncSpinner issue now.

Obviously any use of an AsyncSpinner is currently not safe: If they are started from different threads, only the very first one will actually be started. Hence it depends on the order of their start, which one will be active, which is not acceptable. As there are multiple AsyncSpinners used throughout the source, #736 reported by @dornhege might be only the tip of the iceberg.

I will try to fix that issue upstream in ros_comm and then analyze, why it failed in #736 but not for me.

@v4hn
Copy link
Contributor

v4hn commented Aug 15, 2016

On Mon, Aug 15, 2016 at 12:32:20AM -0700, Robert Haschke wrote:

I will try to fix that issue upstream in ros_comm and then analyze, why it failed in #736 but not for me.

Thanks! This is clearly a bug in ros_comm and the commit that introduced the global mutex there is at fault.
The trouble is, it might be hard to remove it again without understanding why it was introduced in the first place..
Maybe it's worth a try to contact the author? It has been a couple of years, but maybe he remembers the original bug (if there was one)?

@davetcoleman
Copy link
Member

Welcome back @rhaschke, hope vacation was good! Thanks for continuing to work on this

@rhaschke
Copy link
Contributor Author

The other usage of AsyncSpinners in moveit_ros source tree is uncritical. They mostly belong to stand-alone programs. I think, the reason that @dornhege observed the bug, but not me, was that he was running the whole perception pipeline too. The perception/mesh_filter instantiates its own PlanningSceneMonitor and thus has thrown the error.

I prepared ros/ros_comm#867 and hope that they will consider it timely.

@v4hn
Copy link
Contributor

v4hn commented Aug 16, 2016

Wow, given the scope of your patch @rhaschke I'm pretty sure this will not be merged in indigo, not sure about kinetic. But we should probably fix the safety issue in indigo...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants