Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize thread-safety documentation #1004

Open
hidmic opened this issue Aug 12, 2020 · 13 comments
Open

Standardize thread-safety documentation #1004

hidmic opened this issue Aug 12, 2020 · 13 comments
Assignees

Comments

@hidmic
Copy link
Contributor

hidmic commented Aug 12, 2020

Feature request

Feature description

Often, though not consistently, ROS 2 APIs explicitly state their thread (un)safety. These binary statements, when not qualified by a footnote (and even then), are quite unspecific. There's no mention as to:

  • whether a function can be called concurrently with itself or not (re-entrancy),
  • whether a function can be called concurrently with other functions in a given group or not,
  • whether a function can be called concurrently on the same input and/or output argument or not.
  • whether a function can be called concurrently on different (but potentially shared) input and/or output argument or not.

As a result, we have a vague qualifier for authors, reviewers, and users.

There have been some exchanges in the past, but still it creeps up every now and then. I think we should sit and unambiguously standardize how we document thread-safety.

Implementation considerations

It's worth exploring how other software standards and libraries do this.

For POSIX:

MT-Safe or Thread-Safe functions are safe to call in the presence of other threads.

Being MT-Safe does not imply a function is atomic, nor that it uses any of the memory synchronization mechanisms POSIX exposes to users. It is even possible that calling MT-Safe functions in sequence does not yield an MT-Safe combination.

Functions like memset or free are tagged as MT-Safe -- that is, no guarantees are made when using these functions' arguments concurrently with the same functions or others.

For Qt:

A thread-safe function can be called simultaneously from multiple threads, even when the invocations use shared data, because all references to the shared data are serialized.

A reentrant function can also be called simultaneously from multiple threads, but only if each invocation uses its own data.

Perhaps we should talk about reentrant functions and thread-safe data structures? Maybe we should state if reads or writes are to be expected on any given argument?

@hidmic
Copy link
Contributor Author

hidmic commented Aug 12, 2020

FYI @ivanpauno

@hidmic
Copy link
Contributor Author

hidmic commented Aug 12, 2020

Also, @ros2/team to kickstart the discussion.

@ivanpauno
Copy link
Member

whether a function can be called concurrently with itself or not,

isn't this exactly the definition of re-entrant?
I tend to forget definitions, so maybe I'm wrong.

whether a function can be called concurrently with other functions in a given group or not,

This will usually only matter if the same object is passed to different functions of the group concurrently (you can also have problems if they share global state, but we probably want to avoid that).


I would also like to differentiate the case of "it's safe to call if concurrently" from "concurrent calls have some sort of memory ordering ensured".
The second is a stronger guarantee than the first one.

@ivanpauno
Copy link
Member

We should take an account why we're documenting this:

  • In the case of rmw API, it's a requirement to the rmw implementation to ensure the specified thread safety.
  • In the case of rcl API, it's a hint for specific clients (e.g. rclcpp) so they can protect/synchronize function calls accordingly.

IMO if differentiating some of this cases will help rmw implementations and client libraries to get thread safety right, being more specific does worth it.

@hidmic
Copy link
Contributor Author

hidmic commented Aug 12, 2020

isn't this exactly the definition of re-entrant?

Whoops, yeah. Updated the description.

you can also have problems if they share global state, but we probably want to avoid that

We should do our best to avoid that in rcl for sure, but I will say that every requirement we put in rmw APIs constrains third-party implementations.

"it's safe to call if concurrently" from "concurrent calls have some sort of memory ordering ensured"

That's a good point. We should also define what we mean by safe. Program termination isn't the worst prospect. Races and contentions breaking caller expectations are much harder to track and fix.


IMHO if we want to make thread-safety an API requirement (and I think we have to, to some extent), we have to be specific. Also, the less unnecessary synchronization we do in upper layers the better the overall performance will be.

@gbiggs
Copy link
Member

gbiggs commented Aug 13, 2020

whether a function can be called concurrently with other functions in a given group or not,

This sounds like it could be really hard to track. I guess it would depend on these groups being very well specified?

A better angle might be to talk about thread-safe data structures rather than whether a group of functions can operate concurrently on a single instance.

Do you have an example of where you see this being used? For example the context structure when used in any of the rmw interface functions?

@gbiggs
Copy link
Member

gbiggs commented Aug 13, 2020

if we want to make thread-safety an API requirement (and I think we have to, to some extent)

+1 to making it a requirement.

@ivanpauno
Copy link
Member

ivanpauno commented Aug 18, 2020

A better angle might be to talk about thread-safe data structures rather than whether a group of functions can operate concurrently on a single instance.

I think that @hidmic comment was going in that direction.
A function can be thread safely applied over a data structure, and another function can maybe not be applied thread safely over the same data structure.

@gbiggs
Copy link
Member

gbiggs commented Aug 19, 2020

What I meant was talking about data structures that contain their own locking or similar mechanisms to ensure they are thread safe, rather than putting the onus on the functions to declare that they use the data structure safely.

A function can be thread safely applied over a data structure

I don't understand what this means. Do you mean something like "all functions that are thread-safe over data structure X only ever read it"? That would imply that even though those functions are thread safe over X, you cannot use them at the same time as functions that write X or you would lose that thread safety.

On the other hand, if you mean "this function uses the locks to manage its access to X" then I think that you may as well just talk about whether X has associated locks or not, and make all functions that touch X thread-safe with respect to X by using the locks. Anything that doesn't use the locks would be broken rather than just "not thread safe".

@hidmic
Copy link
Contributor Author

hidmic commented Aug 19, 2020

What I meant was talking about data structures that contain their own locking or similar mechanisms to ensure they are thread safe, rather than putting the onus on the functions to declare that they use the data structure safely.

This makes sense to me. It'd seem to me that if we document (a) data structures' support for concurrent access (or lack thereof), and (b) functions' reentrancy and argument mutability (not always implied by the arguments' type, e.g. allocators), we would be painting a complete picture.

that would imply that even though those functions are thread safe over X, you cannot use them at the same time as functions that write X or you would lose that thread safety.

I will say that knowing when a read-write lock can be used instead of a mutex would be valuable.

Anything that doesn't use the locks would be broken rather than just "not thread safe".

Agreed.

@ivanpauno
Copy link
Member

ivanpauno commented Aug 19, 2020

I don't understand what this means. Do you mean something like "all functions that are thread-safe over data structure X only ever read it"? That would imply that even though those functions are thread safe over X, you cannot use them at the same time as functions that write X or you would lose that thread safety.

👍 Reconsidering my previous comment, I agree with this.
Only specifying if a data structure is thread safe or not is more logical.

@clalancette
Copy link
Contributor

@ros2/team ping on this one for any other ideas or opinions here.

@ivanpauno
Copy link
Member

ros2/rmw#270 introduced extra paragraphs to explain thread safety of some functions, which I think is nice.
I think it would also be nice to have a note together with structures, commenting if they are expected to be thread safe or not.
e.g.: rmw_publisher_t is expected to be a thread safe object.

I think that would be enough to have thread-safety clearly documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants