-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds design document for Deadline, Liveliness, and Lifespan. #212
Conversation
This commit adds a new design document that explores how support should be added to ROS for Deadline, Liveliness, and Lifespan QoS policies. Relates to issue: ros2/rclcpp#572
Please see the developer guide regarding the style of markdown files. |
I think there should be a section considering how this proposal affects RMW implementations which are not DDS based. |
@ros2/rmw_implementations FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I like the direction this is going, but the technical details need to be worked out more.
We had an offline review about this as well with a few people. The following are the notes from that meeting. Responses and updates to the open questions and comments above coming in the next day.
|
I don't agree with treating actions separately for QoS. While it may be like that now, I think the design should be unified. Either that or treat services in the same way. Having an API for checking RMW capabilities would be useful, and I agree that it's a separate discussion. |
A few minor notes: All three of the policies mentioned apply on a per-instance basis in DDS. In the not too distant future, ROS2 will support IDL 4.2, which will allow users to annotate fields as keys. Even though there are not currently any plans (that I know of) to support a concept of keyed fields in RMW, this may still result in the underlying DDS implementation using the keys to create instances. This could lead to some unintended consequences if the implementation in RMW does not take instances into account. One example is that in Connext, there is no guaranteed way to be notified of all instances within a datareader's (or writer's) cache that have violated a deadline:
I've spoken with some folks at RTI, and they agree that there should probably be a way to query or be alerted of all outstanding violations. I think it amounted to this aspect of the deadline API not being explicitly specified in the standard. I think you may also end up running into this issue with Liveliness unless you use a ReadCondition to handle the detection, as LivelinessChangedStatus similarly only contains a handle to the last instance handle that triggered it. All of this to say that there should probably be some footnotes somewhere in the docs if these features are implemented with only single instances in mind! I often wonder if for features like this it might just be better to have a DDS-specific interface if they may not be generally applicable across the entire ROS2 ecosystem. As someone who is working on multiple projects which use native DDS in production and have some hopes of moving completely to ROS2, I would find this very useful and be willing to help out. I think this discussion around RMW "capabilities" could be a good starting point. |
I believe that there was a design discussion that occurred around this topic during the design of Actions. I'm basing my design of not handling Actions explicitly like we do Topics and Services based on the interface that Actions currently implement that explicitly exposes the QoS policy for the underlying topics and also the lack of direct support for Actions in the rmw layer.
I've updated the interface slightly to allow for multiple violations instead of only the latest, but noted that it is dependent on the RMW layers implementation.
This seems like it would be a bigger problem outside even this design. Even if the RMW layer doesn't expose keyed fields it shouldn't break anything if the underlying DDS implementation needs to start using them. That seems like something that will need to be solved in the migration to IDL 4.2
|
Are there any other outstanding concerns with this design? |
I didn't know there was a first virtual meeting. I'd certainly like to have another to talk over any latent concerns and clarify points from the comments if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Thanks for taking the time to write it down. I do think it would be good to have a live conversation about it just to make sure we're all on the same page before merging this, but it's mostly to clarify details. I don't see any major blockers.
For Topic Subscribers it establishes the maximum amount of time allowed to pass between receiving messages. | ||
For Topic Publishers it establishes the maximum amount of time allowed to pass between sending messages. | ||
For Service Owners it establishes the maximum amount of time allowed to pass between receiving a request and when a response for that request is sent. | ||
For Service Clients it establishes the maximum amount of time allowed to pass between sending a request and when a response for that request is received. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a behavior that DDS ensures (like in RTI's request/reply API)? or is it something that we would ensure in rcl
and above? (presumably by emulating the behavior of deadline QoS for topics)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I looked before I did not see anything in the DDS RPC spec about deadline behavior for the request/reply pattern. This is something that would need to be tracked in the ROS layers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wjwwood, @nburek: There is LatencyBudged which is unfortunately just a hint to the DDS implementation and not enforced.
Maybe it makes sense to use separate two semantically different things by calling them differently in ROS 2:
- Deadline: An instance must be updated within a specified duration.
- Latency budget: Maximum acceptable delay from the time the data is written to the time it is received by the subscribing application.
The notation of a latency budget could then implemented enforceable on the ROS 2 level for pub/sub, RPC and actions as an additional QOS. Which could also be made even more useful by allowing the user to select between sender timestamp and data timestamp (stored in ROS 2 stamped types) for discerning if the latency budget was exceeded.
- How does the Deadline policy take into account the additional overhead of ROS (such as deserialization) when determining if a deadline was missed? | ||
- As a simplification it is not going to attempt to take into account any ROS overhead. A deadline will be considered missed if the rmw layer does not receive a message by the deadline and not if the user application on top of ROS does not receive it by the deadline. A new deadline policy could be added later that takes this into account. | ||
- What happens if multiple policy violations occur between the invocation of the callback? | ||
- This is dependent on the rmw implementation. If the rmw layer supports tracking multiple status events then it can return them in subsequent calls to rmw_take_status. If it does not support tracking multiple events at once then it may just return one of the events, such as the first or the last to occur in the sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this needs to be one way or the other and guaranteed by rmw, because I don't think it will acceptable to say "sometimes you'll get all the events, but in other situations you may not". The easy solution is to only ever have the latest event available. Again we can discuss this in more detail in a live meeting.
Let's plan to have another virtual meeting to discuss the remaining concerns. Please let me know if you can't attend and I can find another time. I'll post on discourse as well, but I want to make sure everyone already involved in this review can attend if they want. Currently I'm thinking Wednesday March 6th at 9:00am PST (UTC-08:00) You have been invited to an online meeting, powered by Amazon Chime.
Meeting ID: 8643 56 2590
United States Toll-Free: +1 855-552-4463 One-click Mobile Dial-in (United States (1)): +1 206-462-5569,,8643562590# United States (1): +1 206-462-5569 |
That time doesn't work for me, unfortunately. Due to family commitments, realistically the latest I can start is midnight and the earliest I can start is 7AM (Tokyo time). If you can find a time outside those, I'll be grateful. If not, then please record the meeting so I can listen to it later. |
@nburek since all participants on this thread are either from PST or JST time zone, a time between 3PM-9PM JST would work well: https://savvytime.com/converter/jst-to-pst. |
I've made updates to the design document that we discussed at the virtual meeting a couple weeks back. The biggest update was to move Services to future work. I think the last big open question was if we needed to support buffering events of the same type so that the application wouldn't miss an instance of an event. After a little more research we decided not to add a buffer but to coalesce multiple events for the same instance as that appears to be what the DDS api is tailored towards at the moment. |
I've updated the design doc to match the details of the implementation. It is ready for a final review before being merged. |
|
||
The liveliness policy establishes a contract for how entities report that they are still alive. | ||
For Subscriptions it establishes the level of reporting that they require from the Publishers to which they are subscribed. | ||
For Publishers it establishes the level of reporting that they will provide to Subscribers that they are alive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we merge this as-is, then we need to have an issue which describes the work to be done in order to properly support liveliness with instances.
Right now, we assume the data model is one to one with publishers and subscriptions. If someone were to use a keyed data type with instances, then I'm not sure how this would break. It is already possible for users to create keyed types using DDS style IDL files in ROS.
It may need to be mentioned as a limitation in the API docs as well until we decide what to do differently.
To be clear, I support merging this as-is, but I would like to see one of the authors open up the issue that describes the necessary follow up work and/or the current limitations w.r.t. instances and keyed data.
|
||
Both the Deadline and Liveliness policies generate events from the rmw layer of which the application will need to be informed. | ||
For Deadlines, the Subscriber receives event notifications if it doesn't receive anything within the deadline and the Publisher receives event notifications if it doesn't publish anything within the deadline. | ||
For Liveliness, Subscribers receive events when the Publisher they're listening to is no longer alive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should also be covered in a follow up issue (the same one as for deadline or in a separate one).
Also, lifecycle impacts data buffering
I not sure what @deeplearningrobotics is speaking about here.
Sorry for letting it linger, but after looking once more I see a few small things we should tie up before merging. |
I've opened an issue to track enhancing the design of liveliness and deadline to support instanced topics with keys. See: #252 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for letting it linger, but after looking once more I see a few small things we should tie up before merging.
Thanks for taking a look again, @wjwwood. I've made the requested updates and opened one new issue to track keyed instances. See my comment above about opening the other issue to support data composition.
|
||
Both the Deadline and Liveliness policies generate events from the rmw layer of which the application will need to be informed. | ||
For Deadlines, the Subscriber receives event notifications if it doesn't receive anything within the deadline and the Publisher receives event notifications if it doesn't publish anything within the deadline. | ||
For Liveliness, Subscribers receive events when the Publisher they're listening to is no longer alive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can create a follow up issue here as well, but this sounds like a message composition feature to me. I know that DDS allows you to compose data from multiple publishers into a single subscription, but does ROS have plans to allow that as well? If not, then I don't think it makes sense to open an issue to update the design for it.
@wjwwood could you please review the latest changes? |
Yeah, I'll try to get to them soon. Sorry for the delay. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, I think we just have the one point to clear up which I commented on yesterday. Once we decide what to do there I think this will be ready for merge.
Link to the outstanding thread, since I couldn't find it easily before: |
when a subscriber will receive a liveliness event when there are multiple publishers for a topic.
@wjwwood Thanks for giving it another look. Sorry for the delay in my response as I've been on vacation for the past few weeks. I've made an update for the particular section in question. Take a look and let me know if you think that addresses the concern. |
No worries @nburek, thanks for touching it up. |
This commit adds a new design document that explores how support should
be added to ROS for Deadline, Liveliness, and Lifespan QoS policies.
Design doc for task ros2/rclcpp#572