Skip to content
This repository has been archived by the owner on Oct 7, 2021. It is now read-only.

Fix default QoS settings for services #82

Merged
merged 1 commit into from
Sep 20, 2015
Merged

Conversation

jacquelinekay
Copy link
Contributor

Connects to ros2/rmw_connext#96

This fixes the issue @esteve and @dirk-thomas have observed where the add_two_ints client does not receive the response from the server unless it was started after the server was initialized. You may still see a few seconds between when the server is initialized and when the client receives the message, but the client should receive the response eventually.

I'm willing to wait on the service tests to be stable before others review. Otherwise you can use the add_two_ints example to test it.

Eventually we should consider if we want to expose QoS options for services, but I think this is better default behavior for now.

Connext PR: ros2/rmw_connext/pull/100

@jacquelinekay jacquelinekay added the in progress Actively being worked on (Kanban column) label Sep 9, 2015
@esteve
Copy link
Member

esteve commented Sep 10, 2015

Thanks for tracking down this bug. According to the documentation, Connext already uses the same QoS settings:

http://community.rti.com/rti-doc/500/ndds/doc/html/api_cpp/group__RequestReplyExampleModule.html#RequestReplyExampleModule_qos_profile

In any case, I think these settings should go in the services QoS profile:

https://github.com/ros2/rmw/blob/master/rmw/include/rmw/qos_profiles.h#L46

so we have a centralized place to document/track them. I also imagine that in the future, there may be cases where users want different QoS settings for services (e.g. over lossy links).

@jacquelinekay
Copy link
Contributor Author

Got it. It looks like the rmw_qos_profile_services_default struct isn't
being used anywhere. Shall I change this PR to load
rmw_qos_profile_services_default?

On Thu, Sep 10, 2015 at 9:19 AM, Esteve Fernandez notifications@github.com
wrote:

Thanks for tracking down this bug. According to the documentation, Connext
already uses the same QoS settings:

http://community.rti.com/rti-doc/500/ndds/doc/html/api_cpp/group__RequestReplyExampleModule.html#RequestReplyExampleModule_qos_profile

In any case, I think these settings should go in the services QoS profile:

https://github.com/ros2/rmw/blob/master/rmw/include/rmw/qos_profiles.h#L46

so we have a centralized place to document/track them. I also imagine that
in the future, there may be cases where users want different QoS settings
for services (e.g. over lossy links).


Reply to this email directly or view it on GitHub
#82 (comment).

@esteve
Copy link
Member

esteve commented Sep 10, 2015

@jacquelinekay my bad, I thought it was being used. Anyway, yeah, it'd be great if you could reuse it. Thanks!

@jacquelinekay
Copy link
Contributor Author

The note about Connext is interesting. On rmw_connext/master, I see the services bug (ros2/rmw_connext/#96), and it appears fixed on the fix_service_qos branch, in which the only change is the QoS settings for the topic, DataReader, and DataWriter for the requester and the responder.

I think the link you posted refers to the default QoS settings for the RequestReply Example Module.

It took some digging in the API reference, but it appears that the default History QoS policy for DataReaders and DataWriters is KEEP_LAST:

https://community.rti.com/static/documentation/connext-dds/5.2.0/doc/api/connext_dds/api_cpp/group__DDSHistoryQosModule.html

and, "The default (and most common setting) for depth is 1, indicating that only the most recent value should be delivered."

https://community.rti.com/static/documentation/connext-dds/5.2.0/doc/api/connext_dds/api_cpp/structDDS__HistoryQosPolicy.html

@jacquelinekay jacquelinekay self-assigned this Sep 10, 2015
@jacquelinekay jacquelinekay added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Sep 10, 2015
@esteve
Copy link
Member

esteve commented Sep 10, 2015

I think for services, keeping only one message on the history, instead of the entire history, makes sense. The current QoS profiles in qos_profiles.h are just stubs waiting to be tweaked.

@jacquelinekay
Copy link
Contributor Author

Alright. It seems like the root of the problem is that the default reliability setting for DataReaders is BEST_EFFORT. I'm guessing that the bug originates because the first request from the client does not get read because the DataReader on the server side does not try to re-read the missed request. The reliability needs to be set to RELIABLE for the server to respond to a client that started up before the server did.

I just tried this branch with KEEP_LAST and depth=1, and the bug is still fixed. However, I'm guessing this may cause performance problems when sending multiple client requests, since all request samples besides the newest one will get dropped, and the RELIABLE policy will cause the server block for a while trying to recover the samples.

We can make a PR to rmw setting the default QoS profile for services to KEEP_LAST instead of KEEP_ALL. The depth is currently set to 10, which I think is a reasonable number given the potential for the situation described above. We could do some experimentation to see if that situation actually happens and determine what a good queue size is from that.

@esteve
Copy link
Member

esteve commented Sep 10, 2015

@jacquelinekay sounds like a plan.

@dirk-thomas
Copy link
Member

Should we refactoring this PR similar to Connext to move the qos stuff into a function instead of duplicating the code N times? And also check if the logic can be done in rmw instead of the rosidl type support?

@jacquelinekay
Copy link
Contributor Author

Done.

Because the Publisher and Subscriber for the requester/responder are private in the headers defined in the type support library, I use the macro DATAREADER/WRITER_QOS_DEFAULT if that entity is not available in the rmw implementation (in create_service and create_client). Otherwise the default QoS setting is retrieved from the Publisher or Subscriber. The similar function in Connext is implemented differently because default QoS for datareaders/writers can be retrieved from the participant, which gets passed to get_datareader/writer_qos. However, I expect the behavior to be the same in the two implementations because the parent entity which provides the default QoS never changes the default QoS via the call set_default_datareader/writer_qos. I figured this was preferable to exposing the Publisher/Subscriber pointers in the requester/responder, since I don't know why those classes were designed in that way.

@jacquelinekay
Copy link
Contributor Author

@dirk-thomas
Copy link
Member

+1

…and share code for setting DataReader/Writer QoS.
jacquelinekay added a commit that referenced this pull request Sep 20, 2015
Fix default QoS settings for services
@jacquelinekay jacquelinekay merged commit be9896d into master Sep 20, 2015
@jacquelinekay jacquelinekay removed the in review Waiting for review (Kanban column) label Sep 20, 2015
@dirk-thomas dirk-thomas deleted the fix_service_qos branch November 4, 2015 18:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants