Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"object" collides with "object": unconditionally allocate unpadded collision environments #1899

Merged
merged 3 commits into from
Mar 9, 2020

Conversation

v4hn
Copy link
Contributor

@v4hn v4hn commented Feb 13, 2020

I just read through #1835 and moveit/moveit_task_constructor#122 which describe a problem with collision checking of attached objects ending up in self-collision.

I am sorry. I debugged this problem in a hurry already back in October for a demo,
but did not get around to investigate further or create a thread for it. ☹️

This patch resolved the problem for me, although it is likely not the best solution.
Without digging deeper into this again, here's what I recall:

  • Collision detectors maintain separate collision structures for padded and unpadded collision checking
  • Apparently the unpadded collision checking is thought to be less popular and is only allocated when it is requested through API that allows to write to it
  • As long as it is not allocated, a standard pattern in the PlanningScene structures is applied and the parents unpadded collision structure is used if required
  • @j-petit 's global unification of CollisionRobot and CollisionWorld into CollisionEnv seems to have exposed the issue to trigger with MTC's execution capability, although I'm not convinced the logic was perfectly sane before
  • Currently, things go wrong the moment an object becomes attached in a diff planning scene
    • it is removed from the collision world
    • it is added to the collision robot structure
    • during unpadded collision checking, the parent's (unpadded) collision world is returned which still contains the object
    • Thus the object suddenly collides with itself

The patch addresses this logic bug simply by always allocating the unpadded collision environment.
I am not convinced it is worth the effort to try to fix the lazy allocation scheme here, but I did not benchmark anything and this only becomes relevant when the benchmarks include attach/detach operations...

On top of that, I am not convinced this double-accounting for padded and unpadded checking makes sense anymore(?), but I did not look into the underlying structures deep enough to understand this.

@codecov-io
Copy link

codecov-io commented Feb 13, 2020

Codecov Report

Merging #1899 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1899      +/-   ##
==========================================
+ Coverage   50.24%   50.24%   +<.01%     
==========================================
  Files         313      313              
  Lines       24663    24662       -1     
==========================================
+ Hits        12391    12392       +1     
+ Misses      12272    12270       -2
Impacted Files Coverage Δ
moveit_core/planning_scene/src/planning_scene.cpp 46.08% <100%> (-0.06%) ⬇️
moveit_core/robot_state/src/robot_state.cpp 44.1% <0%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 643bc57...8cd7d6b. Read the comment docs.

@rhaschke
Copy link
Contributor

@v4hn, I don't see how your explanation and your fix fits my observation of a race condition: When adding a small sleep after a PlanningScene update in MTC's execution capability, the issue is gone.
But, adding a sleep doesn't trigger an extra environment allocation, does it?

@simonschmeisser
Copy link
Contributor

I think the right fix would be to find the place where this copy-on-write is broken (ie making a copy is not triggered) instead of removing it? I would expect your changes to have some impact in cases like multi-threaded planners etc.?

@v4hn
Copy link
Contributor Author

v4hn commented Feb 18, 2020

I think the right fix would be to find the place where this copy-on-write is broken (ie making a copy is not triggered) instead of removing it?

I agree but did not have the time to do so, especially as this whole part was rearranged lately...

@v4hn, I don't see how your explanation and your fix fits my observation of a race condition: When adding a small sleep after a PlanningScene update in MTC's execution capability, the issue is gone.

Did you try the patch?

I am not sure anymore whether an additional sleep fixed my problems, I tried to find the actual problem.
I could imagine the PlanningSceneMonitor pushes its diff if you wait long enough, but sleeping 0.1 seems rather short for that.

I would expect your changes to have some impact in cases like multi-threaded planners etc.?

You are welcome to setup a benchmark to show that. As I said, a-priori I'm not convinced the whole structural overhead there is justified.

@j-petit
Copy link
Contributor

j-petit commented Feb 18, 2020

  • Collision detectors maintain separate collision structures for padded and unpadded collision checking
  • Apparently the unpadded collision checking is thought to be less popular and is only allocated when it is requested through API that allows to write to it
  • As long as it is not allocated, a standard pattern in the PlanningScene structures is applied and the parent's unpadded collision structure is used if required

That is also my understanding of the current status-quo. Maybe this discussion helps?

On top of that, I am not convinced this double-accounting for padded and unpadded checking makes sense anymore(?), but I did not look into the underlying structures deep enough to understand this

When working on this part of MoveIt, I also had the impression that the efficiency gains are not worth the complexity added (anymore?). The code is very difficult to understand.

Copy link
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not convinced that we should merge this workaround.
Somebody should take the time to really understand the race condition.

@v4hn
Copy link
Contributor Author

v4hn commented Feb 19, 2020

I'm undecided whether I want to see this merged (temporarily?) and mainly wanted to share the insights.
If you feel like debugging further @rhaschke , please go ahead.

Incidentally, an alternative workaround for the issue is to remove Unpadded from the plan_execution, thus enforcing padding for the collision-monitoring too.
I feel this would work around the current problem with less of a hypothetical hard-to-benchmark-but-possibly-existing impact, but of course it still leaves the unpadded worlds in an inconsistent state under some conditions. Maybe you would prefer this as a temporary way to fix upstream @rhaschke ?

@v4hn
Copy link
Contributor Author

v4hn commented Feb 19, 2020

As a side note: could someone explain to me how this patch manages to reduce coverage in the checker?
It reduces LOC and even branches.

@felixvd
Copy link
Contributor

felixvd commented Feb 21, 2020

It might be deleting LOC that were covered by tests, so the percentage of uncovered code grows?

@rhaschke
Copy link
Contributor

Incidentally, an alternative workaround for the issue is to remove Unpadded from the plan_execution, thus enforcing padding for the collision-monitoring too.

Yes, this change seems to be less source-intrusive. I would like to avoid workarounds that strongly change the collision code without fully understanding the underlying problem. However, I don't have the resources right now to further investigate. For MTC, introducing a 0.1s sleep "solved" the issue. As there are no other reported failures due to this bug, I prefer this as the least-intrusive workaround. Nevertheless, your analysis will be very helpful.

@v4hn
Copy link
Contributor Author

v4hn commented Feb 27, 2020

Incidentally, an alternative workaround for the issue is to remove Unpadded from the plan_execution, thus enforcing padding for the collision-monitoring too.

Yes, this change seems to be less source-intrusive. I would like to avoid workarounds that strongly change the collision code without fully understanding the underlying problem.

To add a quick note here: This does not seem to be a solution as Padded collision checking might yield errors when picking up things from a table

@v4hn v4hn force-pushed the pr-master-lazy-unpadded-checking branch 4 times, most recently from a46e702 to 7ceae98 Compare March 8, 2020 09:46
@v4hn v4hn force-pushed the pr-master-lazy-unpadded-checking branch 2 times, most recently from b9c53ed to 1afb965 Compare March 8, 2020 11:18
v4hn added 2 commits March 8, 2020 12:23
If a scene contains an object, either itself or its parent,
collision checking should respect the object.

Incidentally, failing to do so can yield "object" collides with
"object" errors in diffs with attach instructions.
@v4hn v4hn force-pushed the pr-master-lazy-unpadded-checking branch from 1afb965 to 7d047dc Compare March 8, 2020 11:24
@v4hn
Copy link
Contributor Author

v4hn commented Mar 9, 2020

I think the right fix would be to find the place where this copy-on-write is broken (ie making a copy is not triggered) instead of removing it?

I spent a few hours on this the other night and it turns out the whole logic of unpadded collision checking on diff() scenes was broken for ages (if it ever worked).
I added a test to the planning scene for this.

Note that this bug affects the standard use of the PlanningSceneMonitor which maintains diffs.

Whereas a source comment mentions copy-on-write, this is actually not implemented.
Unpadded collision checking always uses the parent's collision world.
The only two ways to get correct unpadded collision checking for a diff() scene are
(1) by decoupling it from the parent, which disables the whole concept of the diff.
(2) by pushing the diff to the parent.

To make the scene structure consistent again - objects in the scene must be respected for all types of collision checking - I propose to merge the original patch from this pull-request.
Consistency has to come before performance and as CI demonstrates, the patch fixes the test.

I consider the alternative solution, to implement a lazy allocation when unpadded checking is requested, inferior as it would introduce more fragile logic in an overcomplicated datastructure.

@rhaschke

Copy link
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for deeply analyzing this issue.

@rhaschke rhaschke merged commit ab3b5a8 into moveit:master Mar 9, 2020
@j-petit
Copy link
Contributor

j-petit commented Mar 9, 2020

Great work, thanks. Does this also fix #1835?

v4hn added a commit to moveit/moveit_task_constructor that referenced this pull request Mar 9, 2020
Fixed via moveit/moveit#1899.
Please use *latest* master (if you do not use melodic builds).

This reverts commit c44d0ca.
@simonschmeisser
Copy link
Contributor

@v4hn @rhaschke has this been included in this or the previous melodic release? if so, which one? I just got a call about some race conditions and it could save some of my holidays if this were included in the latest release? (Assuming it's actually this what we're seeing ... )

@v4hn
Copy link
Contributor Author

v4hn commented Jul 13, 2020 via email

@simonschmeisser simonschmeisser added the backport porting back changes from more current branches label Jul 13, 2020
@simonschmeisser
Copy link
Contributor

Ok, I thought so as well.

Unfortunately there are no copies of PlanningScenes involved in where we run into issues so simply backporting this most likely won't fix our issues. Will need more investigations after holidays ...

@v4hn
Copy link
Contributor Author

v4hn commented Jul 13, 2020 via email

sjahr pushed a commit to sjahr/moveit that referenced this pull request Jun 21, 2024
* Fix timeout in waitForCurrentState

* Remove rclcpp dependency
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport porting back changes from more current branches
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants