Skip to content

Conversation

@simonschmeisser
Copy link
Contributor

@simonschmeisser simonschmeisser commented Jul 2, 2022

This should speed up collision checks with FCL by not converting the RobotState to a FclCollisionManager (a BVH for example) up to four times but only once. This "conversion" includes calculating AABBs for each Link which is at least one matrix multiplication.
Also checking two optimized spatial structures against each other might be faster than doing these checks per scene object.

I would be really grateful to @captain-yoshi if you could run benchmarks to test the speed gain. Should be especially visible in case of no collisions. If you provide me with the scripts, I can also run the actual benchmarks myself.

The separation between padded collision checking for robot-scene collision and unpadded self-collisions somewhat limits the benefits of this optimization. To save it I have some more todos:

  • Add parameter to use padded robot for self-collisions
  • document new parameter in MIGRATION/changelog
  • detect if there is actually any difference between padded and unpadded robot

@codecov
Copy link

codecov bot commented Jul 2, 2022

Codecov Report

Merging #3167 (88847cc) into master (7949bd6) will decrease coverage by 0.01%.
The diff coverage is 62.17%.

@@            Coverage Diff             @@
##           master    #3167      +/-   ##
==========================================
- Coverage   62.08%   62.07%   -0.00%     
==========================================
  Files         375      375              
  Lines       33139    33200      +61     
==========================================
+ Hits        20572    20607      +35     
- Misses      12567    12593      +26     
Impacted Files Coverage Δ
...include/moveit/collision_detection/collision_env.h 100.00% <ø> (ø)
...lude/moveit/collision_detection/collision_matrix.h 100.00% <ø> (ø)
...it/planning_scene_monitor/planning_scene_monitor.h 95.46% <ø> (ø)
.../collision_detection_fcl/src/collision_env_fcl.cpp 83.80% <42.43%> (-6.20%) ⬇️
...nning_scene_monitor/src/planning_scene_monitor.cpp 66.35% <66.67%> (+0.01%) ⬆️
...eit_core/collision_detection/src/collision_env.cpp 72.40% <73.34%> (+0.21%) ⬆️
moveit_core/planning_scene/src/planning_scene.cpp 64.94% <100.00%> (+0.06%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@simonschmeisser
Copy link
Contributor Author

Note that PlanningScene needs changes to use this new override as well due to the handling of padding there. So no need to rush with review or benchmarks, will need further work

@simonschmeisser simonschmeisser force-pushed the fcl-combined-collision-check branch from 302822c to a52d835 Compare July 5, 2022 18:31
@simonschmeisser simonschmeisser force-pushed the fcl-combined-collision-check branch from 52ea399 to 3d441b0 Compare July 7, 2022 19:10
@simonschmeisser simonschmeisser force-pushed the fcl-combined-collision-check branch from 3d441b0 to 8f6fd08 Compare August 4, 2022 05:29
@simonschmeisser simonschmeisser force-pushed the fcl-combined-collision-check branch from 8f6fd08 to 2c1bf03 Compare August 23, 2022 19:37
@simonschmeisser
Copy link
Contributor Author

I used moveit_benchmark_suite to monitor my progress and validate my ideas. Here are two sample plots:
fcl-mesh-100-nc
So we can see quite some improvement when there are no collisions (this uses a panda robot surrounded by 100 random meshes but none of them are colliding with the panda)

fcl-mesh-100-4c

Unfortunately in the case where there are some (four) collisions, my PR performs a bit worse than MoveIt master. My hypothesis is that building up the BVH is somewhat expensive and not needed when there are collisions and the check fails fast. Master only builds it in the self-collision check which is skipped when there are collision with the scene.

In our applications we usually have the robotic cell and machinery modeled in URDF as well and I'll try to do benchmarks with such a cell as well.

For us this is already a very valuable improvement as we do way more collision checks without collisions than checks with collisions but I'll still try to figure out if I can improve on the scene collision test case as well.

@simonschmeisser
Copy link
Contributor Author

Looking at the rest of the test cases I noticed that the impact of the expensive BVH creation increases for simpler test cases (only one box) with collisions:
fcl-box-1c

But if not in collision we still see the nice improvement:
fcl-box-1nc

Also somewhat unrelated but very surprising non-colliding collision checks with meshes seem to be more than a factor 2 cheaper than with boxes (60.000/s vs 24.000/s). I'll investigate that further as well
fcl-box-100-nc

@Levi-Armstrong any further ideas and hints on where to improve? I found a comment in tesseract where you concluded that collisions between two BVHs / managers are slower than iterating over a list and colliding each entry with the second BVH/manager. How did you get to that conclusion?

@AndyZe
Copy link
Member

AndyZe commented Aug 25, 2022

To be honest, this looks like a small improvement in most cases. But occasionally it's a very large penalty. I'm not sure it's worth it.

@simonschmeisser
Copy link
Contributor Author

Here is a aggregated flame graph over all the collision checks in the benchmark suite
fcl_check_collision

So the main block is indeed narrow-phase collision checks which could be improved by hpp-fcl

This graph shows the same flame graph with narrow phase collision checks filtered out:

fcl_without_narrowphase

Now we can see (from left to right) some small blocks belonging mostly to string comparisons and similar in AllowedCollisionMatrix checks inside the collisionCallback, then the actual self collision check in the middle (bool fcl::detail::dynamic_AABB_tree::selfCollisionRecurse) and then the construction of the BroadPhase CollisionManager for self-collision (the representation of the robot). (Note that for scene objects the CollisionManager is already recycled and only updated when necessary)

In tesseract a std::unordered_map is used instead of std::map for the ACM, but at least for these simple test cases that wouldn't make a big difference. Another difference is that tesseract recycles the CollisionManager for self-collisions and only updates whichever links changed. I'll try implementing this but it does not yet explain the huge difference in their graph.

The last block to the right is a destructor which sounds wrong, maybe it's doing too much?

@captain-yoshi
Copy link
Contributor

Nice graphs !

In tesseract a std::unordered_map is used instead of std::map for the ACM, but at least for these simple test cases that wouldn't make a big difference

This makes a huge difference for tests that returns after the first collision. The framework that happens to check the object in collision first will obviously be faster. I agree for tests that checks all the collisions.

image

Also I think Tesseract tells FCL that it uses convex meshes and there are some optimisation done. Can we add this flag in MoveIt to tell FCL that the meshes are convex ? @v4hn

Il try adding the collision check with Tesseract in the benchmark suite (probably in October). That way we could check the graphs and @simonschmeisser could compare the flame graphs.

@simonschmeisser
Copy link
Contributor Author

There have been long and recurring discussions about declaring meshes as convex in URDF that never lead anywhere. FCL has a different geometry type for convex meshes, I'll try what happens when I change it in general and if improvements are promising convex decomposition from bullet HACD could be used here as well like tesseract seems to do.

@v4hn
Copy link
Contributor

v4hn commented Aug 26, 2022

There have been long and recurring discussions about declaring meshes as convex in URDF that never lead anywhere.

I believe the main argument for convex meshes, capsules,... was always that urdf must not be changed. I strongly hoped the situation would change at least with ros2.

@simonschmeisser
Copy link
Contributor Author

Tesseract includes it's own URDF parsing library due to the frozen nature of urdfdom.

In unrelated news, I did the switch from std::map to std::unordered_map and got another 4% improvement for the simple AllowedCollisionMatrix in the benchmark:

fcl-mesh-100-nc

fcl-mesh-100-4c

@captain-yoshi
Copy link
Contributor

captain-yoshi commented Aug 29, 2022

While the 4% is nice, would that be a breaking change for the worst time complexity ?

I think a std::unordered_map uses a hash table which have many downsides per this link :

Hash tables suffer from O(n) worst time complexity due to two reasons:

  1. If too many elements were hashed into the same key: looking inside this key may take O(n) time.
  2. Once a hash table has passed its load balance - it has to rehash [create a new bigger table, and re-insert each element to the table].

However, it is said to be O(1) average and amortized case because:

  1. It is very rare that many items will be hashed to the same key [if you chose a good hash function and you don't have too big load balance.
  2. The rehash operation, which is O(n), can at most happen after n/2 ops, which are all assumed O(1): Thus when you sum the average time per op, you get : (n*O(1) + O(n)) / n) = O(1)

Note because of the rehashing issue - a realtime applications and applications that need low latency - should not use a hash table as their data structure.

Annother issue with hash tables: cache
Another issue where you might see a performance loss in large hash tables is due to cache performance. Hash Tables suffer from bad cache performance, and thus for large collection - the access time might take longer, since you need to reload the relevant part of the table from the memory back into the cache.

For real-time applications, I would prefer a worst case of std::map (Red-black tree) which is O(log(n)) instead of a potential O(n) for hash tables. That's just my opinion and I agree that it may not trigger the worst case that often. I would assume that for somebody who has many small objects in it's scene (more then 1000) we would see differences in timing, then again maybe not !

image

@rhaschke rhaschke force-pushed the master branch 3 times, most recently from 1dae0c8 to 5323772 Compare July 28, 2023 10:32
@rhaschke rhaschke force-pushed the master branch 2 times, most recently from e1b0ec1 to fdc3b1c Compare July 28, 2023 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants