fix out-of-bounds memory access with zero-variable joints #2617

marioprats · 2023-12-18T10:34:12Z

This change updates some methods to avoid an out-of-bounds access after the recent changes to use std::vector as the underlying storage in RobotState (#2546).
Before that change, there was nothing stopping us from passing or returning invalid memory addresses. But now that will trigger range_check exceptions.
This change fixes an out-of-bounds access that can happen with fixed joint models with a variable count of 0, by special-handling that case where needed. It also moves the implementation of those functions to the source file, to not clutter the header file further.

codecov · 2023-12-18T11:25:55Z

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (0807cb2) 50.60% compared to head (50b318b) 51.00%.

Files	Patch %	Lines
moveit_core/robot_state/src/robot_state.cpp	76.39%	17 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2617      +/-   ##
==========================================
+ Coverage   50.60%   51.00%   +0.41%     
==========================================
  Files         386      387       +1     
  Lines       32085    32285     +200     
==========================================
+ Hits        16232    16465     +233     
+ Misses      15853    15820      -33

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

rhaschke

The setter changes are all not necessary. memcpy doesn't access any memory if the number of bytes to write is zero. https://godbolt.org/z/9MrnWP4Kh
The getter changes became required just because of your original PR. Originally, forming the address, e.g. position_ + getFirstVariableIndex() was not accessing the memory yet (now it does and fails) and subsequent methods didn't access the memory either for fixed joints.
The RobotState code was highly optimized before and you degrade that more and more. I have not seen a single example of a real range violation.

sea-bass · 2023-12-18T12:55:17Z

The RobotState code was highly optimized before and you degrade that more and more. I have not seen a single example of a real range violation.

We are trying to modernize an old, decaying code base so it's easier to work with and to be able to make more meaningful contributions in the future.

As such, I don't think blaming someone for "degrading" the code in this effort is productive. We won't get it right the first time, but we'll ideally get somewhere better.

If you have any concrete suggestions on how we can more effectively bring these abstractions to use best C++ practices from this decade, we would welcome your input. But keeping things stagnant because "it's how it's always been done" doesn't seem like a great option either for the reasons @marioprats already mentioned.

rhaschke · 2023-12-18T13:35:58Z

I didn't want to come across offending. Sorry, if I did.
I just have seen several code degradations in MoveIt2 in the past, which arose from not thinking deep enough about the intended code changes in advance. In that sense, I was blaming the whole MoveIt2 maintainer team for adopting supposed improvements too fast. Please think twice or more times about changes.

While Mario's initial PR was kind of innocent regarding performance, the fixup here introduces many more additional guards, which will definitely make the code slower - not only for initialization.

The fundamental problem of this "old-style" code as you name it, is the use of raw pointers in the API. However, this problem is not (yet) addressed. I think a modern way to address this issue is using ranges. However, this would require many many API adaptions...

Alternatively, in order to more easily catch invalid memory accesses (if they occur), you could revert all the changes and assign fixed joints a first_variable_index_ such that the resulting pointer will point outside the allocated array. Thus, accessing this location will (hopefully) cause a segfault. Ideally, the resulting pointer would be the nullptr. But that doesn't work for all, positions, velocities, and accelerations. 😞

In any case, my major point is that moving away from the single memory chunk is a bad idea. The security risks you want to address are not directly related to this optimization, but the raw pointer API. There are mitigations, which keep the memory optimization and harden the API at no extra performance costs, however with huge refactorings...

sea-bass · 2023-12-18T13:43:11Z

Thank you, that's very helpful!

I think we are looking at the possibility of big refactorings because of considerations like accounting for dynamics (not just kinematics), and more importantly having a mutable robot model where joints/links can be added/removed after a URDF is loaded. Tools like Pinocchio support this quite well.

So while throwing something with dynamic memory like std::vector at the problem all the way down seems appealing, you're right that we should think about this in more detail so as not to bloat things too much where not needed.

rhaschke · 2023-12-18T13:56:21Z

I think dynamics can be easily handled with the current code base already. Actually, there is already some dynamics code.
Enabling dynamic RobotModel changes would be very nice to have. However, you will also need a mechanism to invalidate associated RobotStates (or vice versa link the RobotState to a fixed-life-time RobotModel). In any case, the RobotState variables will have a fixed size again (for a given RobotModel), so you can still use a single memory chunk.

marioprats · 2023-12-18T14:55:47Z

I added a basic test to check that getJointPositions works on a fixed joint at the end of a chain, and returns nullptr.
As mentioned earlier and in other places, this is part of a longer term effort to harden core types - not only thinking about users but also about us who need to scale and maintain this long term.

tylerjw · 2023-12-18T18:09:08Z

Thus, accessing this location will (hopefully) cause a segfault.

This is wishful thinking, many invalid memory accesses existed in our test with the old model of not-initalizing the memory in the old memory pool implementation. Relying on ub to catch logic errors has not been shown to be reliable enough.

ranges are not the panacea of performance and safety that the modern C++ evangelists would like you to believe they are. They are worse than raw pointers in most cases and are just an illusion of safety. They are part of the group of pointer-wrapping objects that behave like value types. They are worse for the user with the exception of function arguments to generic functions. Many of our uses of raw pointers are return values from methods. Replacing these with range types would encourage more incorrect use of the API. range types are not first-class fat pointers and if that is what we want we should use a language that includes that concept as first-class in the language.

rhaschke · 2023-12-18T19:12:22Z

Many invalid memory accesses existed in our test with the old memory pool implementation.

I am really curious to see them! So far, all the ones @marioprats pointed out turned out to be edge cases that don't occur in practice (like accessing the joint position of a fixed joint).

ranges are worse than raw pointers in most cases and are just an illusion of safety.

I can't follow this reasoning. Can you give some examples to support your claims?

sea-bass

Tested with this change and the original bug I found is resolved.

I think the other comments in this thread should be taken into consideration for further big architecture decisions, but this PR as is definitely fixes the immediate bug and we should get it in.

tylerjw

I'm happy with this change. Thank you for the test. The only real way to avoid all bounds checking and make safe interfaces at runtime is to make sizes compile-time constants. If we don't like the performance impacts of bounds-checking-per-access in loops we need to build interfaces for looping with indexes.

tylerjw · 2023-12-18T19:51:55Z

I am really curious to see them! So far, all the ones @marioprats pointed out turned out to be edge cases that don't occur in practice (like accessing the joint position of a fixed joint).

When working on this change: 8d2eb90
I uncovered many of our tests did not call setToDefaultValues() on RobotState objects before dereferencing values inside the RobotState. This UB went un-noticed for years until my change which caused these failures to show up all over.

tylerjw · 2023-12-18T20:07:46Z

ranges are worse than raw pointers in most cases and are just an illusion of safety.

I can't follow this reasoning. Can you give some examples to support your claims?

There are several reasons why ranges are not first-class parts of the language and are not "more safe".

A handful of range adapters (filter is a good example) create range types with internal state that has to be mutated when you advance the range. As a result of this range types do not support exterior const. This form of const is also counter-intuitive. It is like having a const std::shared_ptr<T> vs std::shared_ptr<const T>. The second is a different type and the first the interior ptr can be mutated through. ranges behave in the same way with regard to const.

As referenced above, interior const-ness can't be done through language features, and therefore, we have const_range adapters. This means that wherever you use range types, you have a coloring issue about const, and that either shows up macros in your library code, copy-pasting, or users having to deal with you only supporting const or non-const interfaces.

Raw pointers have first-class interior and exterior construction support in a way that these library types never will. Another surprising side-effect is if you take a const range and pass in a non-const range a new pointer-wrapping-object has to be created. shared_ptr provides the classic common example. Consdier this function:

void f(const std::shared_ptr<const int>& data);

And this call site:

auto data = std::make_shared<int>(5);
f(data);

Due to implicit conversions, and std::shared_ptr<const int> and std::shared_ptr<int> being different types, a copy of the shared pointer will be created, and then a pointer to that temporary is created.
If instead the function was defined as:

void f(int const * const data);

No extra copies are created when the user calls it.
Ranges exhibit the same behavior of being second-class to raw pointers.

The "safety" argument for ranges is that they combine the pointer with the size. This only matters if you check that size, they do nothing help you avoid use-after-free bugs. Size can just as easily be passed into functions as a separate argument that take a pointer without loosing interior const-ness.

tylerjw · 2023-12-18T20:16:01Z

Here is a primer on some of the issues with ranges:
https://youtu.be/O8HndvYNvQ4?si=dL-ygWTFNTDoR0Ct

Generally, though, I'm coming around to believing that types that wrap pointers with a value type are an anti-pattern and a lousy can of worms being sold to use by "modern C++ evangelists."

rhaschke · 2023-12-19T10:37:32Z

This means that wherever you use range types, you have a coloring issue about const.

I consider this a feature and not an issue: You can select which code to execute depending on the constness of your range argument. You are right, that conversion from (inner) non-const to const range comes at the cost of a new object creation by default. However, I think you could replace the default behavior with a reinterpret_cast, couldn't you? The problem with this latter approach is that the underlying non-const range still could change, while its const "adapter" pretends constness... Probably that's the reason, they decided for making a copy.

henningkayser

Important fix! But in addition to checking for this specific edge case, I think we should generally check for joint model compatibility with the state's robot model. Otherwise, we can never be sure that the variables of the requested joint model match the internal data structure.

rhaschke · 2023-12-19T13:12:10Z

Here is a primer on some of the issues with ranges:
https://youtu.be/O8HndvYNvQ4?si=dL-ygWTFNTDoR0Ct

Thanks for pointing out this talk. Very interesting. But note: in the talk, he doesn't blame ranges, only views!

tylerjw · 2023-12-19T15:25:48Z

Here is a primer on some of the issues with ranges:
https://youtu.be/O8HndvYNvQ4?si=dL-ygWTFNTDoR0Ct

Thanks for pointing out this talk. Very interesting. But note: in the talk, he doesn't blame ranges, only views!

My understanding is that ranges are a type/compile-time idea. They are a restriction on template arguments that apply to iterator and container types. The new concrete types from the ranges library are these view types. My understanding from talking to people at conferences is that there is no intention of view types ever being used as anything but temporary types. If you never store a view past a temporary lifetime then views are safe, ergonomic, and a nice improvement.

For our uses, though, where we need improvements over our API boundaries (function parameters, return values, state in classes), they don't help much. Also, being a C++20 feature, it'll be a while before we can use them in a ROS 2 project.

All that being said, I think this is a good change and we should merge this.

rhaschke · 2023-12-19T15:44:46Z

My understanding is that ranges are a type idea. The new concrete types from the ranges library are these view types.
For our uses, though, where we need improvements over our API boundaries (function parameters, return values, state in classes), they don't help much. Also, being a C++20 feature, it'll be a while before we can use them in a ROS 2 project.

Maybe you are right. Essentially a range just defines an iterable sequence via begin() and end(). According to the talk, using ranges is fine as long as the range type doesn't have an internal state (which is not true for many std::view types). I think you just want to use subrange, which is just fine.

In any case, I still consider #2546 and this PR a degradation of the code base and not an improvement.
However, you are the MoveIt2 maintainers and it's your decision to follow this route. I just don't support it.

tylerjw · 2023-12-19T15:59:17Z

I think a large part of the problem with ranges is that they are based on the iterator concept (begin and end). I've started experimenting with this library: https://github.com/tcbrindle/flux. The cursor approach taken here does not suffer from the iterator invalidation problems that the stl rangest library does.

rhaschke · 2023-12-19T19:26:58Z

The problem with the cursor concept is that it doesn't scale to non-random access containers.
There are always pros and cons ;-)

fix out-of-bounds memory access with zero-variable joints

3fe896e

marioprats marked this pull request as ready for review December 18, 2023 11:29

marioprats mentioned this pull request Dec 18, 2023

avoid mapping memory and comparing joints with zero variables moveit/moveit_task_constructor#519

Closed

sea-bass requested review from corycrean and sea-bass and removed request for corycrean December 18, 2023 12:11

marioprats mentioned this pull request Dec 18, 2023

Benchmark robot state #2546

Merged

6 tasks

marioprats requested review from sjahr and henningkayser December 18, 2023 12:25

rhaschke reviewed Dec 18, 2023

View reviewed changes

add basic test

35e60a0

typo

50b318b

sea-bass approved these changes Dec 18, 2023

View reviewed changes

tylerjw approved these changes Dec 18, 2023

View reviewed changes

henningkayser reviewed Dec 19, 2023

View reviewed changes

tylerjw merged commit 2e3b61c into moveit:main Dec 19, 2023
10 of 12 checks passed

marioprats mentioned this pull request Dec 20, 2023

Error while planning: [move_group.moveit.planning_pipeline]: Exception caught: 'vector::_M_range_check: __n (which is 3) >= this->size() (which is 3)' #2615

Closed

rhaschke mentioned this pull request Dec 21, 2023

Fix benchmarks #2628

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix out-of-bounds memory access with zero-variable joints #2617

fix out-of-bounds memory access with zero-variable joints #2617

marioprats commented Dec 18, 2023 •

edited

codecov bot commented Dec 18, 2023 •

edited

rhaschke left a comment

sea-bass commented Dec 18, 2023

rhaschke commented Dec 18, 2023

sea-bass commented Dec 18, 2023

rhaschke commented Dec 18, 2023

marioprats commented Dec 18, 2023

tylerjw commented Dec 18, 2023 •

edited

rhaschke commented Dec 18, 2023

sea-bass left a comment

tylerjw left a comment

tylerjw commented Dec 18, 2023

tylerjw commented Dec 18, 2023 •

edited

tylerjw commented Dec 18, 2023

rhaschke commented Dec 19, 2023

henningkayser left a comment

rhaschke commented Dec 19, 2023

tylerjw commented Dec 19, 2023

rhaschke commented Dec 19, 2023

tylerjw commented Dec 19, 2023

rhaschke commented Dec 19, 2023

fix out-of-bounds memory access with zero-variable joints #2617

fix out-of-bounds memory access with zero-variable joints #2617

Conversation

marioprats commented Dec 18, 2023 • edited

codecov bot commented Dec 18, 2023 • edited

Codecov Report

rhaschke left a comment

Choose a reason for hiding this comment

sea-bass commented Dec 18, 2023

rhaschke commented Dec 18, 2023

sea-bass commented Dec 18, 2023

rhaschke commented Dec 18, 2023

marioprats commented Dec 18, 2023

tylerjw commented Dec 18, 2023 • edited

rhaschke commented Dec 18, 2023

sea-bass left a comment

Choose a reason for hiding this comment

tylerjw left a comment

Choose a reason for hiding this comment

tylerjw commented Dec 18, 2023

tylerjw commented Dec 18, 2023 • edited

tylerjw commented Dec 18, 2023

rhaschke commented Dec 19, 2023

henningkayser left a comment

Choose a reason for hiding this comment

rhaschke commented Dec 19, 2023

tylerjw commented Dec 19, 2023

rhaschke commented Dec 19, 2023

tylerjw commented Dec 19, 2023

rhaschke commented Dec 19, 2023

marioprats commented Dec 18, 2023 •

edited

codecov bot commented Dec 18, 2023 •

edited

tylerjw commented Dec 18, 2023 •

edited

tylerjw commented Dec 18, 2023 •

edited