Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark robot state #2546

Merged
merged 9 commits into from Nov 30, 2023
Merged

Benchmark robot state #2546

merged 9 commits into from Nov 30, 2023

Conversation

marioprats
Copy link
Contributor

@marioprats marioprats commented Nov 22, 2023

Description

This change simplifies memory management in RobotState, at the expense of slower copy/construct operations.
Instead of having a void *memory_ chunk of memory with crafted pointers pointing inside, this change updates the pointer variables to be just std vectors.
Construction is more expensive because now it has to do multiple allocations/initializations, instead of a single one. On the positive side, the code is more readable, maintainable and less error-prone. This change actually fixes at least a couple of memory issues with the existing code.
There is also a path to re-gain performance by not allocating / initializing data that is rarely used, and compute that data only when it's needed (e.g joint transforms, or collision body transforms). This change would make that easier.

On the performance check, these are the RobotState benchmark results for the related test cases:

With this PR:
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
robotStateConstruct/100                     0.020 ms        0.020 ms        35180
robotStateConstruct/1000                    0.197 ms        0.197 ms         3650
robotStateConstruct/10000                    1.95 ms         1.95 ms          348
robotStateCopy/100                          0.024 ms        0.024 ms        32654
robotStateCopy/1000                         0.197 ms        0.197 ms         3160
robotStateCopy/10000                         2.09 ms         2.09 ms          318
robotStateUpdate/10                         0.029 ms        0.029 ms        22723
robotStateUpdate/100                        0.282 ms        0.282 ms         2464
robotStateUpdate/1000                        2.72 ms         2.72 ms          252
robotStateForwardKinematics/10              0.025 ms        0.025 ms        28880
robotStateForwardKinematics/100             0.245 ms        0.245 ms         2902
robotStateForwardKinematics/1000             2.48 ms         2.48 ms          285

Without this change (i.e. at HEAD):
---------------------------------------------------------------------------------
Benchmark                                       Time             CPU   Iterations
---------------------------------------------------------------------------------
robotStateConstruct/100                     0.006 ms        0.006 ms       121103
robotStateConstruct/1000                    0.057 ms        0.057 ms        12037
robotStateConstruct/10000                   0.584 ms        0.584 ms         1240
robotStateCopy/100                          0.007 ms        0.007 ms       107305
robotStateCopy/1000                         0.062 ms        0.062 ms        11115
robotStateCopy/10000                        0.628 ms        0.628 ms         1133
robotStateUpdate/10                         0.028 ms        0.028 ms        25451
robotStateUpdate/100                        0.277 ms        0.277 ms         2450
robotStateUpdate/1000                        2.78 ms         2.78 ms          254
robotStateForwardKinematics/10              0.024 ms        0.024 ms        29117
robotStateForwardKinematics/100             0.242 ms        0.242 ms         2847
robotStateForwardKinematics/1000             2.41 ms         2.41 ms          288

So, state updates and FK are the same. Copy and construction are around 4x slower, e.g. copying 10.000 states changes from 0.6ms to ~2ms.

Regarding trajectories:

With RS Change:
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
robotTrajectoryCreate/10          0.008 ms        0.008 ms        91556
robotTrajectoryCreate/100         0.089 ms        0.089 ms         7889
robotTrajectoryCreate/1000         1.11 ms         1.11 ms          620
robotTrajectoryCreate/10000        22.2 ms         22.2 ms           32
robotTrajectoryCreate/100000        244 ms          244 ms            3
robotTrajectoryTiming/10          0.022 ms        0.022 ms        29042
robotTrajectoryTiming/100         0.070 ms        0.070 ms         9982
robotTrajectoryTiming/1000        0.322 ms        0.322 ms         2161
robotTrajectoryTiming/10000        1.22 ms         1.22 ms          440
robotTrajectoryTiming/20000        1123 ms         1123 ms            1

Without this change (i.e. at HEAD):
-----------------------------------------------------------------------
Benchmark                             Time             CPU   Iterations
-----------------------------------------------------------------------
robotTrajectoryCreate/10          0.005 ms        0.005 ms       100000
robotTrajectoryCreate/100         0.050 ms        0.050 ms        13940
robotTrajectoryCreate/1000         1.10 ms         1.10 ms          609
robotTrajectoryCreate/10000        16.1 ms         16.1 ms           43
robotTrajectoryCreate/100000        190 ms          190 ms            4
robotTrajectoryTiming/10          0.020 ms        0.020 ms        34208
robotTrajectoryTiming/100         0.063 ms        0.063 ms        10910
robotTrajectoryTiming/1000        0.298 ms        0.298 ms         2342
robotTrajectoryTiming/10000       0.975 ms        0.975 ms          554
robotTrajectoryTiming/20000        1057 ms         1057 ms            1

Creating a trajectory with 100.000 states goes from 190ms to 244ms.

Checklist

  • Required by CI: Code is auto formatted using clang-format
  • Extend the tutorials / documentation reference
  • Document API changes relevant to the user in the MIGRATION.md notes
  • Create tests, which fail without this PR reference
  • Include a screenshot if changing a GUI
  • While waiting for someone to review your request, please help review another open pull request to support the maintainers

Copy link

codecov bot commented Nov 22, 2023

Codecov Report

Attention: 23 lines in your changes are missing coverage. Please review.

Comparison is base (81094a6) 50.92% compared to head (27b842e) 50.87%.

Files Patch % Lines
moveit_core/robot_state/src/robot_state.cpp 69.36% 19 Missing ⚠️
...bot_state/include/moveit/robot_state/robot_state.h 75.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2546      +/-   ##
==========================================
- Coverage   50.92%   50.87%   -0.04%     
==========================================
  Files         388      388              
  Lines       32274    32253      -21     
==========================================
- Hits        16431    16407      -24     
- Misses      15843    15846       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@uavster uavster self-requested a review November 22, 2023 15:36
@marioprats marioprats marked this pull request as ready for review November 28, 2023 12:44
@rhaschke
Copy link
Contributor

@marioprats, the optimized memory allocation was there for a reason. Allocating a single chunk of memory for the RobotState not only speeds up allocation but primarily ensures that a state's memory fits into the CPU cache (and is not scattered around). Both, allocation and caching are important for planning, which needs to allocate and update hundreds of states. You mention "at least a couple of memory issues" that are fixed by that change. Could you please name a few?

@marioprats
Copy link
Contributor Author

Hi @rhaschke, thanks for following up on this.
I understand the motivation behind the previous memory allocation approach, but it came with added complexity and increased risk for bugs, for a little gain in performance (~2ms over creation of 10k states) that we can probably recover in different ways (e.g. not always allocating velocities and accelerations or some of the transforms).

One example of an issue is the one referenced above in MTC (ros-planning/moveit_task_constructor#518). That's because of an out-of-bounds memory access in RobotState.

Here's another example, part of this PR:

memset(velocity_, 0, sizeof(double) * 2 * robot_model_->getVariableCount());

That was setting velocity_ to zero, but incorrectly doing it for double the size as I understand.

This PR also fixed an issue in updateLinkTransformsInternal that we found when running the tests and discovering out-of-bounds access.

The previous referenced issue in MTC made me realize there's a few other functions in RobotState that can trigger out-of-bounds access. I'll prepare a change to fix those.

@rhaschke
Copy link
Contributor

The added complexity was nicely hidden in the init procedure. A user of RobotState doesn't notice.
You only measured performance only for instantiation and FK. What about cache usage in heavy planning methods with thousands of states randomly accessed?
So far you named issues in edge cases only (robots having zero variables). I'm curious to see the underlying reason for moveit/moveit_task_constructor#518!
The other example you give, initializing velocity_ to zero, simply zeroes both, velocities and accelerations in one call. No issue there.

Copy link
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments on this PR:

Comment on lines 309 to +310
// set velocity & acceleration to 0
memset(velocity_, 0, sizeof(double) * 2 * robot_model_->getVariableCount());
memset(dirty_joint_transforms_, 1, robot_model_->getJointModelCount() * sizeof(unsigned char));
std::fill(velocity_.begin(), velocity_.end(), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't set acceleration to zero anymore as stated by the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, thanks. I'll send a PR to fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To close the loop on this, here's a fix for this issue: #2617

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the missing initialization of the accelerations in #2617 either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, wrong PR. Should be this: #2618

Comment on lines +721 to +722
// The root joint doesn't have any variables: avoid calling getJointTransform() on it.
global_link_transforms_[idx_link] = Eigen::Isometry3d::Identity();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling getJointTransform() (or actually accessing the joint position of a fixed joint) just became a problem with this PR. Before, the (invalid) address of the position entry was passed but actually not accessed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See for instance this public method, before this change:

https://github.com/ros-planning/moveit2/blob/6bf33ebfbb577faf2635c9f9be7107177a17c035/moveit_core/robot_state/include/moveit/robot_state/robot_state.h#L562

I believe that joint->getFirstVariableIndex() has an out-of-bounds value if you call this for a fixed joint at the end of the tree, e.g. it will be 10 for a position_ size of 10, therefore pointing out of bounds. Sure, it would have returned (not accessed) an invalid address, but a user can mistakenly access it, since this is in the public API. Similarly, for a fixed joint somewhere in between the tree this would return a valid address, but pointing to data belonging to a different joint.

I consider that an issue and bad API design in general.

We are planning to make deeper changes in the design of core types to make them harder to misuse (we also had issues with tests failing randomly because of uninitialized data members). Having a more hardened memory model is a step towards that. A challenge is that we don't have great test coverage or benchmarks. We have also been improving on that little by little (see RobotState and RobotTrajectory benchmarks) but there's still a lot to do. Please let me know if you are aware of any benchmark that we can run higher in the planning stack to better measure the impact of this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the pointer API can be misused, but I think it wasn't in the existing code base. However, this "risk" came for improved performance. I think the better performance in caching and memory allocation gets more pronounced if you are running short of memory, which nowadays is less than a problem than in the past...
In any case, we are balancing security for performance. With your PR you decided to weigh the former more.
Please also note that changing basic data structures also introduces the risk of new bugs - for example the missing initialization of accelerations in your PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants