Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for ON_REQUESTED_INCOMPATIBLE_QOS and ON_OFFERED_INCOMPATIBLE_QOS event callbacks in ROS2. #822

Closed
14 tasks
jaisontj opened this issue Nov 16, 2019 · 37 comments
Assignees
Labels
enhancement New feature or request

Comments

@jaisontj
Copy link

jaisontj commented Nov 16, 2019

Feature description

Currently, it is possible for ROS2 users to create publishers and subscriptions to a topic with incompatible QoS policies; for example: subscription to a topic with a deadline smaller than the deadline for a publisher to the topic. This results in the subscription not receiving any data even though the publisher is publishing to the topic. These scenarios are difficult to debug and hence the user needs to be notified/warned when they create publishers/subscriptions with incompatible QoS Policies.

Thankfully, the DDS spec has callbacks in place which help with this, namely on_requested_incompatible_qos and on_offered_incompatible_qos.

Implementation considerations

The implementation can follow the same procedure as the existing event callbacks (on_offered_deadline_missed, on_requested_deadline_missed, on_liveliness_lost and on_liveliness_changed).

Listing out briefly the work required in each package:

  • ros2/rmw
    • Add the RMW_EVENT_REQUESTED_INCOMPATIBLE_QOS and RMW_EVENT_OFFERED_INCOMPATIBLE_QOS values to the enum rmw_event_type_t
    • Define two new structures to hold data regarding the status of these events: rmw_requested_incompatible_qos_status_t and rmw_offered_incompatible_qos_status_t.

Something like this:

struct  rmw_requested_incompatible_qos_status_t
{
   int32_t total_count;
   int32_t total_count_change;
   int32_t last_policy_id;
} ;

Note: QosPolicyCountSeq policies is something that I have skipped and would like to add as an enhancement once this is completed and merged. This is because we need to figure out how memory allocation for the policies array would work. For now these would alone provide a non zero enhancement to the system as users will be notified when creating incompatible qos policies between publishers and subscribers

  • ros2/rcl

    • Add RCL_PUBLISHER_OFFERED_INCOMPATIBLE_QOS and RCL_SUBSCRIPTION_REQUESTED_INCOMPATIBLE_QOS to enums rcl_publisher_event_type_t and rcl_subscription_event_type_t respectively.
    • Handle event types of RCL_PUBLISHER_OFFERED_INCOMPATIBLE_QOS and RCL_SUBSCRIPTION_REQUESTED_INCOMPATIBLE_QOS inside of rmw_publisher_event_init and rmw_subscriber_event_init
    • Unit tests similar to these
  • ros2/rclpy

    • Add RCL_PUBLISHER_OFFERED_INCOMPATIBLE_QOS and RCL_SUBSCRIPTION_REQUESTED_INCOMPATIBLE_QOS to enums QoSPublisherEventType and QoSSubscriptionEventType
    • Define two new payloads that mirror rmw_requested_incompatible_qos_status_t and rmw_offered_incompatible_qos_status_t that were defined in the rmw package. These are defined as follows:
QoSRequestedIncompatibleQoSInfo = NamedTuple(
 'QoSRequestedIncompatibleQoSInfo', [
 ('total_count', 'int'),
 ('total_count_change', 'int'),
 ('last_policy_id', 'int'),
 ])
 
 QoSOfferedIncompatibleQoSInfo = NamedTuple(
 'QoSOfferedIncompatibleQoSInfo', [
 ('total_count', 'int'),
 ('total_count_change', 'int'),
 ('last_policy_id', 'int'),
 ])
  • Enhance the SubscriptionEventCallbacks and PublisherEventCallbacks classes to support the new incompatible_qos callback.
    • define new member incompatible_qos
    • handle it in the create_event_handlers function
  • In _rclpy_qos_event.c
    • Add two entries of type rmw_requested_incompatible_qos_status_t and rmw_offered_incompatible_qos_status_t to the union _qos_event_callback_data
    • Define and implement two functions which convert rmw_requested_incompatible_qos_status_t and rmw_offered_incompatible_qos_status_t to PyObject and return these in the _get_qos_event_data_filler_function_for function
  • Unit tests similar to the ones here
  • ros2/rclcpp

    • Modify the PublisherEventCallbacks and SubscriptionEventCallbacks structs to include a new incompatible_qos_callback. These callback members should be of type std::function<void (rmw_offered_incompatible_qos_status_t &)> and std::function<void (rmw_requested_incompatible_qos_status_t &)> respectively.
    • Handle the above defined callbacks inside the constructor for Publisher and Subscription
  • ros2/demos

    • Add an example showcasing this feature in the quality_of_service_demo package; using both rclpy and rclcpp

DDS Implementations

  • ros2/rmw_fastrtps: Unfortunately, Fastrtps does not have this implemented. I have raised an issue regarding this here.

  • ros2/rmw_opensplice: Opensplice does have this feature implemented and the work on the rmw_implementation package would be:

    • There is a variable called mask_map which holds the mapping between rmw_event_type_t to its equivalent opensplice type. Add a mapping from RMW_EVENT_REQUESTED_INCOMPATIBLE_QOS -> DDS::REQUESTED_INCOMPATIBLE_QOS_STATUS and RMW_EVENT_OFFERED_INCOMPATIBLE_QOS -> DDS::OFFERED_INCOMPATIBLE_QOS_STATUS
    • Handle cases for DDS::OFFERED_INCOMPATIBLE_QOS_STATUS and DDS::REQUESTED_INCOMPATIBLE_QOS_STATUS in OpenSpliceStaticPublisherInfo::get_status and OpenSpliceStaticSubscriberInfo::get_status respectively.
  • ros2/rmw_connext: Research pending; but from this, I reckon that it is implemented.

List of PRs to be tracked towards this:

@jaisontj
Copy link
Author

jaisontj commented Nov 27, 2019

@ivanpauno Could you please take a look at this? The PRs are also open for review.

@jaisontj
Copy link
Author

Extra feature: Adding incompatible event callbacks to ros2 topic echo and ros2 topic pub
PR: ros2/ros2cli:410

@wjwwood
Copy link
Member

wjwwood commented Jan 25, 2020

I gave an initial review of the connect pull requests. I think we need to do a few things related to rmw_fastrtps_cpp (our default currently) not being able to implement this yet:

  • I'm hesitant to merge it without support in fast-rtps, and at least there should be support in our other tier 1, rmw_connext_cpp.
  • I would like to see more detail on what happens when an rmw implementation does not support this, but the user configures it to be on. How does it fail? Does it fail? Is that an acceptable user experience, etc...

I'm inclined to be flexible on these points, so we can get it in if possible, because for the implementations where it does work it will be very useful.

@mm318
Copy link
Member

mm318 commented Mar 12, 2020

  • I would like to see more detail on what happens when an rmw implementation does not support this, but the user configures it to be on. How does it fail? Does it fail? Is that an acceptable user experience, etc...

@wjwwood, if the user registers incompatible QoS event callbacks and the RMW implementation being used does not support it, how about we log a warning and the user's callbacks will never be triggered? If the user does not register any incompatible QoS event callbacks, we do not log/issue such a warning.

@wjwwood
Copy link
Member

wjwwood commented Mar 12, 2020

I don't feel like a warning is sufficient myself. If the user wanted a callback for this, it should fail programmatically with an exception or return code. If the user chooses to do so, they may ignore this error and log a warning themselves. But allowing it to essentially pass silently when the user asks to be called when this event occurs (warnings from libraries are often overlooked) I think is a bad idea.

Also, I think unless the user defines their own callback it would be nice to have a default callback which logs a warning when this event occurs. It would be ok, in this default case, to ignore such an error as "unimplemented" I think. Though I'd like to see a notice or warning that this is the case, something like (once) "[WARN] the rmw implementation being used does not support ON_REQUESTED_INCOMPATIBLE_QOS events, you will not be notified when publishers and subscribers on the same topic do not have matching QoS".

@mm318
Copy link
Member

mm318 commented Mar 12, 2020

Okay, that's reasonable. Between throwing an exception and returning an error code, this is how I imagine them:

  • Exception: A throw occurs at time of attempting to register a callback in rclcpp/rclpy. rclcpp/rclpy will query the rmw implementation if such an event callback is supported. This behavior can be made universal for all event types, not just for incompatible QoS mismatch events. This will require adding new rcl/rmw APIs.
  • Error code: An rmw implementation that doesn't support this event will fire on startup one occurrence of this event, but the last_policy_kind field will contain a value that indicates an error, as opposed to a value that indicates a type of incompatible QoS policy. It is up to the user how to handle this error code.

I think having a default callback is a good idea as well.

@wjwwood
Copy link
Member

wjwwood commented Mar 12, 2020

Registration of this callback happens at creation of the pub/sub right? In that case I think it would be best to just have the creation fail. The caller can check the reason and retry without the callback if they so chose. That would avoid the need for "is supported" like API's.

@mm318
Copy link
Member

mm318 commented Mar 12, 2020

Correct, the registration of the callbacks happen at the creation of the publisher/subscription. "Is supported"-like APIs would still be needed even if we're returning an error code instead of an exception on creation of the publisher/subscription, or else those callbacks would just simply never fire.

Would returning an error code work? If the creation of a publisher/subscription fails, it will return a NULL shared_ptr. Where will the caller check the reason of failure?

EDIT: Oh okay, I think you're saying fail at the rmw_create_publisher/rmw_create_subscription level. While create_publisher/create_subscription at the rclcpp level does do the registration of callbacks, I am not sure if that's the case at the rmw level.

@wjwwood
Copy link
Member

wjwwood commented Mar 12, 2020

Oh I see, nvm, you're right that the support is not checked below rmw at creation of the pub/sub.

Instead, we could check if you tried to add those to the wait set and error at that point. So you'd get no exception if you created a pub/sub with the callback registered, but when you tried to spin on it, then it would fail due to not supporting that kind of event in the wait set.

@mm318
Copy link
Member

mm318 commented Mar 12, 2020

I think there might be a way. When create_publisher()/create_subscription() from rclcpp is called, it will create rmw_publisher_t/rmw_subscription_t as well as some rmw_event_ts. We can check at the time of creating the rmw_event_ts if the event type is supported or not, and bubble up the rmw_ret_t error to rclcpp's create_publisher()/create_subscription().

EDIT: Oh, it actually looks like __rmw_init_event() can't do any checks for whether or not the rmw_event_type_t is supported, unless it goes into the rmw_implementation.

@mm318
Copy link
Member

mm318 commented Mar 12, 2020

Instead, we could check if you tried to add those to the wait set and error at that point. So you'd get no exception if you created a pub/sub with the callback registered, but when you tried to spin on it, then it would fail due to not supporting that kind of event in the wait set.

That sounds like a bad experience. How would the user code respond at the point? It would need to recreate a bunch of publishers and subscriptions, as we don't support removing callbacks from already created publishers/subscriptions. Even if we did, I think that would make the user's code quite messy (ie. create publishers/subscriptions, do a trial spin, remove some callbacks, start spinning for real).

I think if we're okay with seeing errors after spinning starts (instead of at time of publisher/subscription creation / before spinning starts), having the callback return a last_policy_kind value that indicates unsupported is a better approach.

It looks like in order to see errors before spinning starts, new APIs would be needed.

@wjwwood
Copy link
Member

wjwwood commented Mar 12, 2020

That sounds like a bad experience.

I agree, I was just offering that we could check there too. Sounds like checking when we create the event types is the right way to do it.

@mm318
Copy link
Member

mm318 commented Mar 12, 2020

So we'll add a new is_event_type_supported rmw_implementation API? We won't need to bubble this up to rcl; rmw just needs to communicate with the underlying rmw_implementation and fail at rmw_publisher_event_init()/rmw_subscription_event_init().

@ivanpauno
Copy link
Member

I agree, I was just offering that we could check there too. Sounds like checking when we create the event types is the right way to do it.

👍

So we'll add a new is_event_type_supported rmw_implementation API? We won't need to bubble this up to rcl; rmw just needs to communicate with the underlying rmw_implementation and fail at rmw_publisher_event_init()/rmw_subscription_event_init().

Can't rmw_publisher_event_init just fail when the passed event type isn't supported?

@mm318
Copy link
Member

mm318 commented Mar 12, 2020

Can't rmw_publisher_event_init just fail when the passed event type isn't supported?

That's what I am suggesting we do, but rmw will need to query the rmw_implementation via a new API to see if the passed event type is supported or not.

@ivanpauno
Copy link
Member

That's what I am suggesting we do, but rmw will need to query the rmw_implementation via a new API to see if the passed event type is supported or not.

Oh, I didn't see that that function was implemented in rmw itself. Your proposal SGTM!

@mm318
Copy link
Member

mm318 commented Mar 13, 2020

I think we still haven't settled on how rclcpp/rclpy create_publisher() should respond to a failing rmw_publisher_event_init() call. These are some example ways that I can think of:

  • rclcpp's/rclpy's create_publisher() will throw an exception
  • rclcpp's/rclpy's create_publisher() will return nullptr/None and set an error code somewhere (is this currently being done or possible in rclcpp/rclpy?)
  • rclcpp's/rclpy's create_publisher() will return a valid publisher that does not have the requested callback registered, and set an error code somewhere

What do you think, @wjwwood and @ivanpauno?

@ivanpauno
Copy link
Member

rclcpp's/rclpy's create_publisher() will throw an exception

That's the best option IMO.

@mm318
Copy link
Member

mm318 commented Mar 16, 2020

Is it recommendable for a function in rmw to call a function in rmw_implementation? I am currently running into many instances of undefined reference errors when trying to build unit tests.

@mm318
Copy link
Member

mm318 commented Mar 17, 2020

Functions in rmw cannot call functions implemented in the rmw implementation.

Yeah, that's what I found out.

  • move the event init functions into the middleware implementations and drop the "is supported" functions all together.

I have indeed implemented it this way. The following are the set of pull requests related to this feature:

(rclpy did not need any changes as far as I can tell.)

@ivanpauno
Copy link
Member

@mm318 @wjwwood The proposed solution is ok, though it involves some duplicated code.
Another way to solve the dependency inversion problem is to avoid showing an error in rmw, and instead do it in the next layer (rcl). i.e.: rmw_event_type_is_supported is directly called from rcl_*_event_init.

@mm318
Copy link
Member

mm318 commented Mar 17, 2020

If there are no strong objections, I'd like to keep it the way it is. We're now at least able to properly check the implementation_identifier of the rmw_publisher_t/rmw_subscription_t that was passed in (we used to only check it against nullptr).

@wjwwood
Copy link
Member

wjwwood commented Mar 17, 2020

I don’t care strongly either though I have a slight preference for keeping it as is. It leaves the door open for other optimizations for the middleware impl and it ensures we don’t get in an inconsistent state event wise (if someone were to use rmw directly but not check if it is supported first, for example).

@ivanpauno
Copy link
Member

@mm318 It would be great to open a PR in rclpy too.

@mm318
Copy link
Member

mm318 commented Mar 17, 2020

@mm318 It would be great to open a PR in rclpy too.

Looking at the code in rclpy, I didn't think any changes were needed as a different error message will just be gotten from rcl_get_error_string().

@ivanpauno, perhaps do you think this should be a made into a PyExc_TypeError instead when ret == RCL_RET_UNSUPPORTED?

@ivanpauno
Copy link
Member

@ivanpauno, perhaps do you think this should be a made into a PyExc_TypeError instead when ret == RCL_RET_UNSUPPORTED?

I think that a custom exception (e.g.: EventTypeUnsupported) would be the best, though it's not required. You can left the code as-is, and a custom exception can be added in the future if really needed.

@ivanpauno
Copy link
Member

This one still should be finished with the others: ros2/rclpy#459.
right?

@mm318
Copy link
Member

mm318 commented Mar 17, 2020

Right. There are two sets of pull requests now.

We should merge in this set first:

Then look at this set of pull requests (rmw implementation repos will be added later to this set):

@ivanpauno
Copy link
Member

Right. There are two sets of pull requests now.

First set have been merged!

@mm318
Copy link
Member

mm318 commented Mar 26, 2020

Also, I think unless the user defines their own callback it would be nice to have a default callback which logs a warning when this event occurs. It would be ok, in this default case, to ignore such an error as "unimplemented" I think. Though I'd like to see a notice or warning that this is the case, something like (once) "[WARN] the rmw implementation being used does not support ON_REQUESTED_INCOMPATIBLE_QOS events, you will not be notified when publishers and subscribers on the same topic do not have matching QoS".

@wjwwood can we implement this as a follow-up?

@wjwwood
Copy link
Member

wjwwood commented Mar 26, 2020

Yeah, sure.

@mm318
Copy link
Member

mm318 commented Apr 1, 2020

I have opened issues ros2/rclcpp#1048 and ros2/rclpy#533 for tracking the addition of default incompatible QoS callbacks to rclcpp and rclpy respectively.

@mm318
Copy link
Member

mm318 commented Apr 15, 2020

All implementation work (except for rmw_fastrtps_cpp, which is gated on support by FastRTPS) is now complete. This issue may be closed. (Remaining rmw_fastrtps_cpp work is still being tracked at ros2/rmw_fastrtps#356.)

@ivanpauno
Copy link
Member

Sounds good, thanks for your work @jaisontj @mm318 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants