Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix out-of-bounds memory access with zero-variable joints #2617

Merged
merged 3 commits into from Dec 19, 2023

Conversation

marioprats
Copy link
Contributor

@marioprats marioprats commented Dec 18, 2023

This change updates some methods to avoid an out-of-bounds access after the recent changes to use std::vector as the underlying storage in RobotState (#2546).
Before that change, there was nothing stopping us from passing or returning invalid memory addresses. But now that will trigger range_check exceptions.
This change fixes an out-of-bounds access that can happen with fixed joint models with a variable count of 0, by special-handling that case where needed. It also moves the implementation of those functions to the source file, to not clutter the header file further.

Copy link

codecov bot commented Dec 18, 2023

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Comparison is base (0807cb2) 50.60% compared to head (50b318b) 51.00%.

Files Patch % Lines
moveit_core/robot_state/src/robot_state.cpp 76.39% 17 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2617      +/-   ##
==========================================
+ Coverage   50.60%   51.00%   +0.41%     
==========================================
  Files         386      387       +1     
  Lines       32085    32285     +200     
==========================================
+ Hits        16232    16465     +233     
+ Misses      15853    15820      -33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@rhaschke rhaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The setter changes are all not necessary. memcpy doesn't access any memory if the number of bytes to write is zero. https://godbolt.org/z/9MrnWP4Kh
The getter changes became required just because of your original PR. Originally, forming the address, e.g. position_ + getFirstVariableIndex() was not accessing the memory yet (now it does and fails) and subsequent methods didn't access the memory either for fixed joints.
The RobotState code was highly optimized before and you degrade that more and more. I have not seen a single example of a real range violation.

@sea-bass
Copy link
Contributor

The RobotState code was highly optimized before and you degrade that more and more. I have not seen a single example of a real range violation.

We are trying to modernize an old, decaying code base so it's easier to work with and to be able to make more meaningful contributions in the future.

As such, I don't think blaming someone for "degrading" the code in this effort is productive. We won't get it right the first time, but we'll ideally get somewhere better.

If you have any concrete suggestions on how we can more effectively bring these abstractions to use best C++ practices from this decade, we would welcome your input. But keeping things stagnant because "it's how it's always been done" doesn't seem like a great option either for the reasons @marioprats already mentioned.

@rhaschke
Copy link
Contributor

I didn't want to come across offending. Sorry, if I did.
I just have seen several code degradations in MoveIt2 in the past, which arose from not thinking deep enough about the intended code changes in advance. In that sense, I was blaming the whole MoveIt2 maintainer team for adopting supposed improvements too fast. Please think twice or more times about changes.

While Mario's initial PR was kind of innocent regarding performance, the fixup here introduces many more additional guards, which will definitely make the code slower - not only for initialization.

The fundamental problem of this "old-style" code as you name it, is the use of raw pointers in the API. However, this problem is not (yet) addressed. I think a modern way to address this issue is using ranges. However, this would require many many API adaptions...

Alternatively, in order to more easily catch invalid memory accesses (if they occur), you could revert all the changes and assign fixed joints a first_variable_index_ such that the resulting pointer will point outside the allocated array. Thus, accessing this location will (hopefully) cause a segfault. Ideally, the resulting pointer would be the nullptr. But that doesn't work for all, positions, velocities, and accelerations. 😞

In any case, my major point is that moving away from the single memory chunk is a bad idea. The security risks you want to address are not directly related to this optimization, but the raw pointer API. There are mitigations, which keep the memory optimization and harden the API at no extra performance costs, however with huge refactorings...

@sea-bass
Copy link
Contributor

Thank you, that's very helpful!

I think we are looking at the possibility of big refactorings because of considerations like accounting for dynamics (not just kinematics), and more importantly having a mutable robot model where joints/links can be added/removed after a URDF is loaded. Tools like Pinocchio support this quite well.

So while throwing something with dynamic memory like std::vector at the problem all the way down seems appealing, you're right that we should think about this in more detail so as not to bloat things too much where not needed.

@rhaschke
Copy link
Contributor

I think dynamics can be easily handled with the current code base already. Actually, there is already some dynamics code.
Enabling dynamic RobotModel changes would be very nice to have. However, you will also need a mechanism to invalidate associated RobotStates (or vice versa link the RobotState to a fixed-life-time RobotModel). In any case, the RobotState variables will have a fixed size again (for a given RobotModel), so you can still use a single memory chunk.

@marioprats
Copy link
Contributor Author

I added a basic test to check that getJointPositions works on a fixed joint at the end of a chain, and returns nullptr.
As mentioned earlier and in other places, this is part of a longer term effort to harden core types - not only thinking about users but also about us who need to scale and maintain this long term.

@tylerjw
Copy link
Contributor

tylerjw commented Dec 18, 2023

Thus, accessing this location will (hopefully) cause a segfault.

This is wishful thinking, many invalid memory accesses existed in our test with the old model of not-initalizing the memory in the old memory pool implementation. Relying on ub to catch logic errors has not been shown to be reliable enough.

ranges are not the panacea of performance and safety that the modern C++ evangelists would like you to believe they are. They are worse than raw pointers in most cases and are just an illusion of safety. They are part of the group of pointer-wrapping objects that behave like value types. They are worse for the user with the exception of function arguments to generic functions. Many of our uses of raw pointers are return values from methods. Replacing these with range types would encourage more incorrect use of the API. range types are not first-class fat pointers and if that is what we want we should use a language that includes that concept as first-class in the language.

@rhaschke
Copy link
Contributor

Many invalid memory accesses existed in our test with the old memory pool implementation.

I am really curious to see them! So far, all the ones @marioprats pointed out turned out to be edge cases that don't occur in practice (like accessing the joint position of a fixed joint).

ranges are worse than raw pointers in most cases and are just an illusion of safety.

I can't follow this reasoning. Can you give some examples to support your claims?

Copy link
Contributor

@sea-bass sea-bass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with this change and the original bug I found is resolved.

I think the other comments in this thread should be taken into consideration for further big architecture decisions, but this PR as is definitely fixes the immediate bug and we should get it in.

Copy link
Contributor

@tylerjw tylerjw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this change. Thank you for the test. The only real way to avoid all bounds checking and make safe interfaces at runtime is to make sizes compile-time constants. If we don't like the performance impacts of bounds-checking-per-access in loops we need to build interfaces for looping with indexes.

@tylerjw
Copy link
Contributor

tylerjw commented Dec 18, 2023

I am really curious to see them! So far, all the ones @marioprats pointed out turned out to be edge cases that don't occur in practice (like accessing the joint position of a fixed joint).

When working on this change: 8d2eb90
I uncovered many of our tests did not call setToDefaultValues() on RobotState objects before dereferencing values inside the RobotState. This UB went un-noticed for years until my change which caused these failures to show up all over.

@tylerjw
Copy link
Contributor

tylerjw commented Dec 18, 2023

ranges are worse than raw pointers in most cases and are just an illusion of safety.

I can't follow this reasoning. Can you give some examples to support your claims?

There are several reasons why ranges are not first-class parts of the language and are not "more safe".

A handful of range adapters (filter is a good example) create range types with internal state that has to be mutated when you advance the range. As a result of this range types do not support exterior const. This form of const is also counter-intuitive. It is like having a const std::shared_ptr<T> vs std::shared_ptr<const T>. The second is a different type and the first the interior ptr can be mutated through. ranges behave in the same way with regard to const.

As referenced above, interior const-ness can't be done through language features, and therefore, we have const_range adapters. This means that wherever you use range types, you have a coloring issue about const, and that either shows up macros in your library code, copy-pasting, or users having to deal with you only supporting const or non-const interfaces.

Raw pointers have first-class interior and exterior construction support in a way that these library types never will. Another surprising side-effect is if you take a const range and pass in a non-const range a new pointer-wrapping-object has to be created. shared_ptr provides the classic common example. Consdier this function:

void f(const std::shared_ptr<const int>& data);

And this call site:

auto data = std::make_shared<int>(5);
f(data);

Due to implicit conversions, and std::shared_ptr<const int> and std::shared_ptr<int> being different types, a copy of the shared pointer will be created, and then a pointer to that temporary is created.
If instead the function was defined as:

void f(int const * const data);

No extra copies are created when the user calls it.
Ranges exhibit the same behavior of being second-class to raw pointers.

The "safety" argument for ranges is that they combine the pointer with the size. This only matters if you check that size, they do nothing help you avoid use-after-free bugs. Size can just as easily be passed into functions as a separate argument that take a pointer without loosing interior const-ness.

@tylerjw
Copy link
Contributor

tylerjw commented Dec 18, 2023

Here is a primer on some of the issues with ranges:
https://youtu.be/O8HndvYNvQ4?si=dL-ygWTFNTDoR0Ct

Generally, though, I'm coming around to believing that types that wrap pointers with a value type are an anti-pattern and a lousy can of worms being sold to use by "modern C++ evangelists."

@rhaschke
Copy link
Contributor

This means that wherever you use range types, you have a coloring issue about const.

I consider this a feature and not an issue: You can select which code to execute depending on the constness of your range argument. You are right, that conversion from (inner) non-const to const range comes at the cost of a new object creation by default. However, I think you could replace the default behavior with a reinterpret_cast, couldn't you? The problem with this latter approach is that the underlying non-const range still could change, while its const "adapter" pretends constness... Probably that's the reason, they decided for making a copy.

Copy link
Member

@henningkayser henningkayser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important fix! But in addition to checking for this specific edge case, I think we should generally check for joint model compatibility with the state's robot model. Otherwise, we can never be sure that the variables of the requested joint model match the internal data structure.

@rhaschke
Copy link
Contributor

Here is a primer on some of the issues with ranges:
https://youtu.be/O8HndvYNvQ4?si=dL-ygWTFNTDoR0Ct

Thanks for pointing out this talk. Very interesting. But note: in the talk, he doesn't blame ranges, only views!

@tylerjw
Copy link
Contributor

tylerjw commented Dec 19, 2023

Here is a primer on some of the issues with ranges:
https://youtu.be/O8HndvYNvQ4?si=dL-ygWTFNTDoR0Ct

Thanks for pointing out this talk. Very interesting. But note: in the talk, he doesn't blame ranges, only views!

My understanding is that ranges are a type/compile-time idea. They are a restriction on template arguments that apply to iterator and container types. The new concrete types from the ranges library are these view types. My understanding from talking to people at conferences is that there is no intention of view types ever being used as anything but temporary types. If you never store a view past a temporary lifetime then views are safe, ergonomic, and a nice improvement.

For our uses, though, where we need improvements over our API boundaries (function parameters, return values, state in classes), they don't help much. Also, being a C++20 feature, it'll be a while before we can use them in a ROS 2 project.


All that being said, I think this is a good change and we should merge this.

@tylerjw tylerjw merged commit 2e3b61c into moveit:main Dec 19, 2023
10 of 12 checks passed
@rhaschke
Copy link
Contributor

My understanding is that ranges are a type idea. The new concrete types from the ranges library are these view types.
For our uses, though, where we need improvements over our API boundaries (function parameters, return values, state in classes), they don't help much. Also, being a C++20 feature, it'll be a while before we can use them in a ROS 2 project.

Maybe you are right. Essentially a range just defines an iterable sequence via begin() and end(). According to the talk, using ranges is fine as long as the range type doesn't have an internal state (which is not true for many std::view types). I think you just want to use subrange, which is just fine.

In any case, I still consider #2546 and this PR a degradation of the code base and not an improvement.
However, you are the MoveIt2 maintainers and it's your decision to follow this route. I just don't support it.

@tylerjw
Copy link
Contributor

tylerjw commented Dec 19, 2023

I think a large part of the problem with ranges is that they are based on the iterator concept (begin and end). I've started experimenting with this library: https://github.com/tcbrindle/flux. The cursor approach taken here does not suffer from the iterator invalidation problems that the stl rangest library does.

@rhaschke
Copy link
Contributor

The problem with the cursor concept is that it doesn't scale to non-random access containers.
There are always pros and cons ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants