Benchmark robot state #2546

marioprats · 2023-11-22T14:19:07Z

Description

This change simplifies memory management in RobotState, at the expense of slower copy/construct operations.
Instead of having a void *memory_ chunk of memory with crafted pointers pointing inside, this change updates the pointer variables to be just std vectors.
Construction is more expensive because now it has to do multiple allocations/initializations, instead of a single one. On the positive side, the code is more readable, maintainable and less error-prone. This change actually fixes at least a couple of memory issues with the existing code.
There is also a path to re-gain performance by not allocating / initializing data that is rarely used, and compute that data only when it's needed (e.g joint transforms, or collision body transforms). This change would make that easier.

On the performance check, these are the RobotState benchmark results for the related test cases:

With this PR:
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
robotStateConstruct/100                     0.020 ms        0.020 ms        35180
robotStateConstruct/1000                    0.197 ms        0.197 ms         3650
robotStateConstruct/10000                    1.95 ms         1.95 ms          348
robotStateCopy/100                          0.024 ms        0.024 ms        32654
robotStateCopy/1000                         0.197 ms        0.197 ms         3160
robotStateCopy/10000                         2.09 ms         2.09 ms          318
robotStateUpdate/10                         0.029 ms        0.029 ms        22723
robotStateUpdate/100                        0.282 ms        0.282 ms         2464
robotStateUpdate/1000                        2.72 ms         2.72 ms          252
robotStateForwardKinematics/10              0.025 ms        0.025 ms        28880
robotStateForwardKinematics/100             0.245 ms        0.245 ms         2902
robotStateForwardKinematics/1000             2.48 ms         2.48 ms          285

Without this change (i.e. at HEAD):
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
robotStateConstruct/100                     0.006 ms        0.006 ms       121103
robotStateConstruct/1000                    0.057 ms        0.057 ms        12037
robotStateConstruct/10000                   0.584 ms        0.584 ms         1240
robotStateCopy/100                          0.007 ms        0.007 ms       107305
robotStateCopy/1000                         0.062 ms        0.062 ms        11115
robotStateCopy/10000                        0.628 ms        0.628 ms         1133
robotStateUpdate/10                         0.028 ms        0.028 ms        25451
robotStateUpdate/100                        0.277 ms        0.277 ms         2450
robotStateUpdate/1000                        2.78 ms         2.78 ms          254
robotStateForwardKinematics/10              0.024 ms        0.024 ms        29117
robotStateForwardKinematics/100             0.242 ms        0.242 ms         2847
robotStateForwardKinematics/1000             2.41 ms         2.41 ms          288

So, state updates and FK are the same. Copy and construction are around 4x slower, e.g. copying 10.000 states changes from 0.6ms to ~2ms.

Regarding trajectories:

With RS Change:
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
robotTrajectoryCreate/10          0.008 ms        0.008 ms        91556
robotTrajectoryCreate/100         0.089 ms        0.089 ms         7889
robotTrajectoryCreate/1000         1.11 ms         1.11 ms          620
robotTrajectoryCreate/10000        22.2 ms         22.2 ms           32
robotTrajectoryCreate/100000        244 ms          244 ms            3
robotTrajectoryTiming/10          0.022 ms        0.022 ms        29042
robotTrajectoryTiming/100         0.070 ms        0.070 ms         9982
robotTrajectoryTiming/1000        0.322 ms        0.322 ms         2161
robotTrajectoryTiming/10000        1.22 ms         1.22 ms          440
robotTrajectoryTiming/20000        1123 ms         1123 ms            1

Without this change (i.e. at HEAD):
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
robotTrajectoryCreate/10          0.005 ms        0.005 ms       100000
robotTrajectoryCreate/100         0.050 ms        0.050 ms        13940
robotTrajectoryCreate/1000         1.10 ms         1.10 ms          609
robotTrajectoryCreate/10000        16.1 ms         16.1 ms           43
robotTrajectoryCreate/100000        190 ms          190 ms            4
robotTrajectoryTiming/10          0.020 ms        0.020 ms        34208
robotTrajectoryTiming/100         0.063 ms        0.063 ms        10910
robotTrajectoryTiming/1000        0.298 ms        0.298 ms         2342
robotTrajectoryTiming/10000       0.975 ms        0.975 ms          554
robotTrajectoryTiming/20000        1057 ms         1057 ms            1

Creating a trajectory with 100.000 states goes from 190ms to 244ms.

Checklist

Required by CI: Code is auto formatted using clang-format
Extend the tutorials / documentation reference
Document API changes relevant to the user in the MIGRATION.md notes
Create tests, which fail without this PR reference
Include a screenshot if changing a GUI
While waiting for someone to review your request, please help review another open pull request to support the maintainers

codecov · 2023-11-22T15:28:43Z

Codecov Report

Attention: 23 lines in your changes are missing coverage. Please review.

Comparison is base (81094a6) 50.92% compared to head (27b842e) 50.87%.

Files	Patch %	Lines
moveit_core/robot_state/src/robot_state.cpp	69.36%	19 Missing ⚠️
...bot_state/include/moveit/robot_state/robot_state.h	75.00%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2546      +/-   ##
==========================================
- Coverage   50.92%   50.87%   -0.04%     
==========================================
  Files         388      388              
  Lines       32274    32253      -21     
==========================================
- Hits        16431    16407      -24     
- Misses      15843    15846       +3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

moveit_core/robot_state/include/moveit/robot_state/robot_state.h

moveit_core/robot_state/src/robot_state.cpp

moveit_core/robot_state/include/moveit/robot_state/robot_state.h

rhaschke · 2023-12-18T07:23:41Z

@marioprats, the optimized memory allocation was there for a reason. Allocating a single chunk of memory for the RobotState not only speeds up allocation but primarily ensures that a state's memory fits into the CPU cache (and is not scattered around). Both, allocation and caching are important for planning, which needs to allocate and update hundreds of states. You mention "at least a couple of memory issues" that are fixed by that change. Could you please name a few?

marioprats · 2023-12-18T09:26:46Z

Hi @rhaschke, thanks for following up on this.
I understand the motivation behind the previous memory allocation approach, but it came with added complexity and increased risk for bugs, for a little gain in performance (~2ms over creation of 10k states) that we can probably recover in different ways (e.g. not always allocating velocities and accelerations or some of the transforms).

One example of an issue is the one referenced above in MTC (ros-planning/moveit_task_constructor#518). That's because of an out-of-bounds memory access in RobotState.

Here's another example, part of this PR:

memset(velocity_, 0, sizeof(double) * 2 * robot_model_->getVariableCount());

That was setting velocity_ to zero, but incorrectly doing it for double the size as I understand.

This PR also fixed an issue in updateLinkTransformsInternal that we found when running the tests and discovering out-of-bounds access.

The previous referenced issue in MTC made me realize there's a few other functions in RobotState that can trigger out-of-bounds access. I'll prepare a change to fix those.

rhaschke · 2023-12-18T09:50:19Z

The added complexity was nicely hidden in the init procedure. A user of RobotState doesn't notice.
You only measured performance only for instantiation and FK. What about cache usage in heavy planning methods with thousands of states randomly accessed?
So far you named issues in edge cases only (robots having zero variables). I'm curious to see the underlying reason for moveit/moveit_task_constructor#518!
The other example you give, initializing velocity_ to zero, simply zeroes both, velocities and accelerations in one call. No issue there.

rhaschke

Some more comments on this PR:

rhaschke · 2023-12-18T09:53:14Z

moveit_core/robot_state/src/robot_state.cpp

  // set velocity & acceleration to 0
-  memset(velocity_, 0, sizeof(double) * 2 * robot_model_->getVariableCount());
-  memset(dirty_joint_transforms_, 1, robot_model_->getJointModelCount() * sizeof(unsigned char));
+  std::fill(velocity_.begin(), velocity_.end(), 0);


This doesn't set acceleration to zero anymore as stated by the comment.

You are right, thanks. I'll send a PR to fix this.

To close the loop on this, here's a fix for this issue: #2617

I don't see the missing initialization of the accelerations in #2617 either.

Sorry, wrong PR. Should be this: #2618

rhaschke · 2023-12-18T10:06:32Z

moveit_core/robot_state/src/robot_state.cpp

+        // The root joint doesn't have any variables: avoid calling getJointTransform() on it.
+        global_link_transforms_[idx_link] = Eigen::Isometry3d::Identity();


Calling getJointTransform() (or actually accessing the joint position of a fixed joint) just became a problem with this PR. Before, the (invalid) address of the position entry was passed but actually not accessed!

See for instance this public method, before this change:

https://github.com/ros-planning/moveit2/blob/6bf33ebfbb577faf2635c9f9be7107177a17c035/moveit_core/robot_state/include/moveit/robot_state/robot_state.h#L562

I believe that joint->getFirstVariableIndex() has an out-of-bounds value if you call this for a fixed joint at the end of the tree, e.g. it will be 10 for a position_ size of 10, therefore pointing out of bounds. Sure, it would have returned (not accessed) an invalid address, but a user can mistakenly access it, since this is in the public API. Similarly, for a fixed joint somewhere in between the tree this would return a valid address, but pointing to data belonging to a different joint.

I consider that an issue and bad API design in general.

We are planning to make deeper changes in the design of core types to make them harder to misuse (we also had issues with tests failing randomly because of uninitialized data members). Having a more hardened memory model is a step towards that. A challenge is that we don't have great test coverage or benchmarks. We have also been improving on that little by little (see RobotState and RobotTrajectory benchmarks) but there's still a lot to do. Please let me know if you are aware of any benchmark that we can run higher in the planning stack to better measure the impact of this change.

Yes, the pointer API can be misused, but I think it wasn't in the existing code base. However, this "risk" came for improved performance. I think the better performance in caching and memory allocation gets more pronounced if you are running short of memory, which nowadays is less than a problem than in the past...
In any case, we are balancing security for performance. With your PR you decided to weigh the former more.
Please also note that changing basic data structures also introduces the risk of new bugs - for example the missing initialization of accelerations in your PR.

marioprats added 2 commits November 22, 2023 15:02

simplify memory management in RobotState

23e93e4

further changes

7ee883b

tylerjw approved these changes Nov 22, 2023

View reviewed changes

uavster self-requested a review November 22, 2023 15:36

uavster approved these changes Nov 22, 2023

View reviewed changes

ChrisThrasher approved these changes Nov 22, 2023

View reviewed changes

moveit_core/robot_state/include/moveit/robot_state/robot_state.h Outdated Show resolved Hide resolved

moveit_core/robot_state/src/robot_state.cpp Outdated Show resolved Hide resolved

marioprats added 2 commits November 28, 2023 04:37

Merge branch 'main' into benchmark_robot_state

31957ac

avoid pointer arithmetic where possible

1e80743

marioprats marked this pull request as ready for review November 28, 2023 12:44

marioprats added 2 commits November 28, 2023 15:40

fix memory access issue on root joint with 0 variables

c6b04f6

Merge branch 'main' into benchmark_robot_state

1748fd6

ChrisThrasher reviewed Nov 28, 2023

View reviewed changes

moveit_core/robot_state/include/moveit/robot_state/robot_state.h Outdated Show resolved Hide resolved

marioprats added 3 commits November 29, 2023 14:49

fix vector size

cb60855

remove unused header

1e94a58

Merge branch 'main' into benchmark_robot_state

27b842e

tylerjw merged commit f7b0c15 into moveit:main Nov 30, 2023
12 checks passed

rhaschke reviewed Dec 18, 2023

View reviewed changes

This was referenced Dec 18, 2023

fix out-of-bounds memory access with zero-variable joints #2617

Merged

reset accelerations on setToDefaultValues #2618

Merged

rhaschke mentioned this pull request Dec 21, 2023

Fix benchmarks #2628

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Benchmark robot state #2546

Benchmark robot state #2546

marioprats commented Nov 22, 2023 •

edited

codecov bot commented Nov 22, 2023 •

edited

rhaschke commented Dec 18, 2023

marioprats commented Dec 18, 2023

rhaschke commented Dec 18, 2023

rhaschke left a comment

rhaschke Dec 18, 2023

marioprats Dec 18, 2023

marioprats Dec 18, 2023

rhaschke Dec 18, 2023

marioprats Dec 18, 2023

rhaschke Dec 18, 2023

marioprats Dec 18, 2023

rhaschke Dec 18, 2023

		// The root joint doesn't have any variables: avoid calling getJointTransform() on it.
		global_link_transforms_[idx_link] = Eigen::Isometry3d::Identity();

Benchmark robot state #2546

Benchmark robot state #2546

Conversation

marioprats commented Nov 22, 2023 • edited

Description

Checklist

codecov bot commented Nov 22, 2023 • edited

Codecov Report

rhaschke commented Dec 18, 2023

marioprats commented Dec 18, 2023

rhaschke commented Dec 18, 2023

rhaschke left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

marioprats commented Nov 22, 2023 •

edited

codecov bot commented Nov 22, 2023 •

edited