Use Fast-DDS Waitsets instead of listeners #619

Merged (34 commits) on Aug 24, 2022

Conversation

@richiware (Contributor) commented Jun 27, 2022:

Since version 2.4.0, Fast-DDS supports Waitsets. This PR brings new Fast-DDS Waitsets into rmw_fastrtps. The main motivation is to fix bugs related to the current "waitset" mechanism implemented on top of Fast-DDS listeners, like #613.

List of functionalities to be supported:

  • Publication/subscription
  • Request/Response
  • Events
  • Event callback
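For context, here is a minimal sketch (not code from this PR) of the Fast-DDS 2.4+ Waitset API that the new implementation builds on: blocking on a WaitSet instead of reacting inside listener callbacks. Header paths and namespaces correspond to Fast-DDS 2.x and may differ in other versions.

#include <fastdds/dds/core/condition/WaitSet.hpp>
#include <fastdds/dds/subscriber/DataReader.hpp>

using namespace eprosima::fastdds::dds;

// Block until the reader has data available, or until a 5 second timeout.
void wait_for_data(DataReader * reader)
{
  // Enable only the status we care about on the reader's StatusCondition.
  StatusCondition & condition = reader->get_statuscondition();
  condition.set_enabled_statuses(StatusMask::data_available());

  WaitSet wait_set;
  wait_set.attach_condition(condition);

  ConditionSeq active_conditions;
  // Returns when the condition triggers or the timeout expires.
  wait_set.wait(active_conditions, eprosima::fastrtps::Duration_t(5, 0));
  if (!active_conditions.empty()) {
    // The reader's condition triggered: samples can now be take()n.
  }
}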

@audrow audrow changed the base branch from master to rolling June 28, 2022 14:22
@richiware richiware force-pushed the feature/fastdds-waitsets branch 3 times, most recently from d49bf2e to 8b23ff6 on July 1, 2022 06:39
@richiware richiware marked this pull request as ready for review July 1, 2022 06:40
@richiware (Contributor Author):

@clalancette Could you run a CI job against this PR, please?

@@ -191,6 +116,11 @@ class ClientListener : public eprosima::fastdds::dds::DataReaderListener
    info_->response_subscriber_matched_count_.store(publishers_.size());
  }

  size_t get_unread_responses()
  {
    return info_->response_reader_->get_unread_count();
@richiware (Contributor Author):

I don't recommend this. This change could count the same samples repeatedly if they were not read.

Contributor:

@richiware Please see this discussion from which I understand the correct implementation is to return the total number of unread changes.

Contributor:

It would be great to have some input from @alsora and/or @mauropasse on this, BTW

@mauropasse (Contributor):

@MiguelCompany @richiware our previous implementation relied simply on a counter to check unread events, instead of asking the DDS about unread messages. For example:

void on_data_available(eprosima::fastdds::dds::DataReader * reader)
{
  if (on_new_message_cb_) {
    on_new_message_cb_(user_data_, 1);
  } else {
    new_data_unread_count_++;
  }
}

// Then when setting the user callback:
void set_on_new_message_callback(const void * user_data, rmw_event_callback_t callback)
{
  if (new_data_unread_count_) {
    callback(user_data, new_data_unread_count_);
    new_data_unread_count_ = 0;
  }
}

The reason for keeping a separate new_data_unread_count_ instead of querying the DDS queue for unread samples is that this function (set_on_new_message_callback) is sometimes called twice in quick succession. With our approach:
First time: callback(user_data, new_data_unread_count_);
Second time: the callback is not called, since new_data_unread_count_ = 0.

With your approach here, calling info_->response_reader_->get_unread_count(); could lead to problems if set_on_new_message_callback is called many times in quick succession.
In our case, every call of callback(...) pushes events into a queue, which are then processed (messages are read). With your approach, the queue could end up with duplicated/triplicated/... events per unread sample if we didn't have time to process them between calls.
I'd stay with the approach taken in this PR: #625

Contributor:

@mauropasse The thing is that there is no guarantee that on_data_available is called once per sample.

Another thing to consider is the case where you have KEEP_LAST with a depth of 1. As explained in this comment, there is always a possibility that some samples notified to the callback will never actually be read by the application layer because of QoS settings (mainly KEEP_LAST history): the samples could be counted, but then be pushed out of the cache by newer samples before they can be take()n out of it.
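For illustration, a minimal sketch (not from this PR) of the QoS configuration where that overwrite can happen:

// With KEEP_LAST and depth 1, a second sample arriving before take() replaces
// the first one, so the number of samples notified to a callback can exceed
// the number of samples the application actually gets to read.
eprosima::fastdds::dds::DataReaderQos qos;
qos.history().kind = eprosima::fastdds::dds::KEEP_LAST_HISTORY_QOS;
qos.history().depth = 1;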

@mauropasse (Contributor):

Regarding KEEP_LAST, we took that into account and capped new_data_unread_count_ at the QoS history depth.
In the case of multiple events but a single sample on the DDS queue (due to history = 1), we have means to mitigate it with a "bounded queue" on the executor, which checks the QoS depth before pushing new events into the queue.
With the normal queue, which does not perform this check, we simply get "no-op" takes (no messages taken).
Regarding on_data_available not being called once per sample: this issue would also affect the default SingleThreadedExecutor (the waitset is not woken since on_data_available is not called), and I don't have an answer for that right now. It has never happened to me before, and I didn't know there was no guarantee that on_data_available is called once per sample.

Member:

What about using

    return info_->response_reader_->get_unread_count(true);

(i.e. mark_as_read = true) and keeping a count here?

Does marking samples as read affect how take() works?
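A minimal sketch of that suggestion, assuming the get_unread_count(bool mark_as_read) overload of the Fast-DDS DataReader; the counter member and its reset are hypothetical details, not the PR's actual code:

size_t get_unread_responses()
{
  // Samples returned here are marked as read, so a second call right after
  // this one does not report the same samples again. The running total is
  // kept in a member counter, to be reset when the count is consumed.
  unread_count_ += info_->response_reader_->get_unread_count(true);  // mark_as_read = true
  return unread_count_;
}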

Member:

@richiware @MiguelCompany was this issue resolved?

Contributor:

Yes, it was. We did exactly what you mention, see here

RCPPUTILS_TSA_GUARDED_BY(
discovery_m_);

std::mutex discovery_m_;
Contributor:

I would move this up, before the declaration of subscriptions_.
I also think we should make this mutable, in case we need to take it inside a const method in the future.
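A minimal sketch of the suggested layout (illustrative only; which members this mutex actually guards is determined by the surrounding class):

// Declare the mutex before the members it protects, and make it mutable so it
// can also be locked from const member functions.
mutable std::mutex discovery_m_;

std::set<eprosima::fastrtps::rtps::GUID_t> subscriptions_
  RCPPUTILS_TSA_GUARDED_BY(discovery_m_);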

@richiware (Contributor Author):

Done in ec7a576


std::set<eprosima::fastrtps::rtps::GUID_t> subscriptions_
RCPPUTILS_TSA_GUARDED_BY(internalMutex_);
RCPPUTILS_TSA_GUARDED_BY(
Contributor:

Keep this on a single line. Applies to all the RCPPUTILS_TSA_GUARDED_BY( occurrences below.

@richiware (Contributor Author) commented Aug 12, 2022:

Done in ec7a576


std::atomic_bool deadline_changes_;
bool deadline_changes_;
Contributor:

Either keep this as an atomic_bool, or mark it with RCPPUTILS_TSA_GUARDED_BY.
Applies to all booleans below
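For illustration, the two options being suggested (a sketch only; the mutex name is taken from a later suggestion in this thread, not necessarily from this exact class):

// Option 1: keep the flag atomic so it can be accessed safely without a lock.
std::atomic_bool deadline_changes_{false};

// Option 2: a plain bool annotated for thread-safety analysis, so the compiler
// checks that every access happens while the protecting mutex is held.
bool deadline_changes_ RCPPUTILS_TSA_GUARDED_BY(on_new_event_m_);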

@richiware (Contributor Author):

Done in 850eeeb

Comment on lines 180 to 183
subscriptions_set_t subscriptions_ RCPPUTILS_TSA_GUARDED_BY(
mutex_);
clients_endpoints_map_t clients_endpoints_ RCPPUTILS_TSA_GUARDED_BY(
mutex_);
Contributor:

Suggested change
subscriptions_set_t subscriptions_ RCPPUTILS_TSA_GUARDED_BY(
mutex_);
clients_endpoints_map_t clients_endpoints_ RCPPUTILS_TSA_GUARDED_BY(
mutex_);
subscriptions_set_t subscriptions_ RCPPUTILS_TSA_GUARDED_BY(mutex_);
clients_endpoints_map_t clients_endpoints_ RCPPUTILS_TSA_GUARDED_BY(mutex_);

@richiware (Contributor Author):

Done in ec7a576

private:
CustomSubscriberInfo * subscriber_info_ = nullptr;

bool deadline_changes_;
Contributor:

Either keep booleans as atomic_bool or add RCPPUTILS_TSA_GUARDED_BY

@richiware (Contributor Author):

Done in 850eeeb


  if (subscriptions) {
    for (size_t i = 0; i < subscriptions->subscriber_count; ++i) {
      void * data = subscriptions->subscribers[i];
      auto custom_subscriber_info = static_cast<CustomSubscriberInfo *>(data);
      custom_subscriber_info->listener_->attachCondition(conditionMutex, conditionVariable);
      no_has_wait |= (0 < custom_subscriber_info->data_reader_->get_unread_count());
Contributor:

I would use RETCODE_OK == data_reader_->get_first_untaken_info(), which means the reader has something to take.

Same for the rest of get_unread_count below
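A minimal sketch of that suggestion, following the style of the snippet above (the exact ReturnCode_t qualification is assumed to match the surrounding file):

// RETCODE_OK from get_first_untaken_info() means the reader still has at
// least one sample that can be take()n, without counting unread samples.
eprosima::fastdds::dds::SampleInfo info;
no_has_wait |=
  (ReturnCode_t::RETCODE_OK ==
  custom_subscriber_info->data_reader_->get_first_untaken_info(&info));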

@richiware (Contributor Author):

Done in 5ff6c5e

rmw_fastrtps_shared_cpp/src/rmw_wait.cpp (review thread resolved)
        if (!changed_statuses.is_active(eprosima::fastdds::dds::StatusMask::data_available())) {
          subscriptions->subscribers[i] = 0;
        }
      } else if (0 == custom_subscriber_info->data_reader_->get_unread_count()) {
Contributor:

Use get_first_untaken_info instead of get_unread_count

Applies to services and clients as well.

@richiware (Contributor Author) commented Aug 12, 2022:

Done in 5ff6c5e

rmw_fastrtps_shared_cpp/src/rmw_wait.cpp (review thread resolved)
return c == condition;
}))
{
if (!condition->get_trigger_value()) {
Contributor:

No need to check if the condition has been triggered while waiting. Just this check is enough.

@richiware (Contributor Author):

Done in 5ff6c5e

@MiguelCompany (Contributor):

@jsantiago-eProsima Please fix linters

eprosima::fastdds::dds::RequestedIncompatibleQosStatus incompatible_qos_status_
RCPPUTILS_TSA_GUARDED_BY(
discovery_m_);
RCPPUTILS_TSA_GUARDED_BY(discovery_m_);
Contributor:

Suggested change
RCPPUTILS_TSA_GUARDED_BY(discovery_m_);
RCPPUTILS_TSA_GUARDED_BY(on_new_event_m_);

@richiware (Contributor Author):

Done in 5ff6c5e

Comment on lines 129 to 132
    RCUTILS_SAFE_FWRITE_TO_STDERR( \
      RCUTILS_STRINGIFY(__FILE__) ":" RCUTILS_STRINGIFY(__function__) ":" \
      RCUTILS_STRINGIFY(__LINE__) RCUTILS_STRINGIFY("failed to create waitset") \
      ": ros discovery info listener thread will shutdown ...\n");
Contributor:

Would be nice to have a TERMINATE_THREAD_WITH_RETURN, or remove the break; from TERMINATE_THREAD so it can be used inside and outside the loop. In the latter case I would rename it to something like LOG_THREAD_FATAL_ERROR
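A minimal sketch of the second option (the macro body is an assumption based on the snippet above, not the actual rmw_fastrtps macro):

// Hypothetical refactor: the logging part becomes reusable, and the caller
// decides whether to break out of the loop or return from the function.
#define LOG_THREAD_FATAL_ERROR(description) \
  RCUTILS_SAFE_FWRITE_TO_STDERR( \
    RCUTILS_STRINGIFY(__FILE__) ":" RCUTILS_STRINGIFY(__LINE__) ": " description \
    ": ros discovery info listener thread will shutdown ...\n")

// Inside the loop:   LOG_THREAD_FATAL_ERROR("failed to create waitset"); break;
// Outside the loop:  LOG_THREAD_FATAL_ERROR("failed to create waitset"); return;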

@MiguelCompany (Contributor) left a comment:

Changes look good to me.

It would be great if a maintainer took another look and launched a CI.

@EduPonz (Contributor) commented Jul 12, 2022:

Friendly ping for a CI run @clalancette @nuclearsandwich; I think this is highly relevant.

CC: @SteveMacenski

@alsora (Contributor) commented Jul 13, 2022:

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@alsora (Contributor) commented Jul 14, 2022:

New CI

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MiguelCompany (Contributor):

@alsora According to our local testing, this should now be good to go.

@alsora (Contributor) commented Jul 15, 2022:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@MiguelCompany (Contributor) left a comment:

LGTM, but DCO needs to be fixed

Signed-off-by: Ricardo González Moreno <ricardo@richiware.dev>
@richiware (Contributor Author):

@fujitatomoya @ivanpauno I think we've addressed all comments

@ivanpauno (Member) left a comment:

LGTM with green CI

@fujitatomoya (Collaborator):

@richiware thanks!

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@fujitatomoya (Collaborator) commented Aug 17, 2022:

CI:

  • Windows Build Status

@MiguelBarro (Contributor) commented Aug 24, 2022:

The failure on

launch_testing_examples.launch_testing_examples.check_multiple_nodes_launch_test.launch_testing_examples.check_multiple_nodes_launch_test

is unrelated to this pull request (it's flaky). In order to test it, we built ros2 rolling from sources in two Windows docker images (debug & release configurations). Those images are available here: debug and release.

Basically, the test launches three talker nodes and waits until all of them are discovered (oddly, it checks discovery twice).
The test fails only occasionally, and when it does it is due to a failure in node initialization (this can be checked in the trace):
the number of discovered nodes doesn't match the expected three, so the test fails.

The test can be run directly from the docker images with:

docker run --rm -t miguelbarroeprosima/rolling_release launch_test --package-name test_launch_testing C:\ros2-windows\src\ros2\examples\launch_testing\launch_testing_examples\launch_testing_examples\check_multiple_nodes_launch_test.py

I cannot explain why it shows up in two consecutive CI runs.
Maybe we should launch the Windows CI again to see whether it fails again.

@fujitatomoya (Collaborator):

The CI failure is unrelated (see #619 (comment)); I will go ahead and merge this.

@fujitatomoya (Collaborator):

@richiware @MiguelCompany should we backport this to humble?

@MiguelCompany (Contributor):

@richiware @MiguelCompany should we backport this to humble?

Yes. We should. Let's try:

@Mergifyio backport humble

MiguelCompany pushed a commit to eProsima/rmw_fastrtps that referenced this pull request Aug 30, 2022
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
@MiguelCompany MiguelCompany deleted the feature/fastdds-waitsets branch August 30, 2022 08:44
@MiguelCompany (Contributor):

@fujitatomoya I just created the backport to humble on #633

It is basically a cherry-pick of 1dcfe80 into humble

jefferyyjhsu pushed commits to jefferyyjhsu/rmw_fastrtps that referenced this pull request on Aug 30, Sep 1, and Sep 2, 2022
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
audrow pushed a commit that referenced this pull request Sep 7, 2022
Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>

Signed-off-by: Miguel Company <MiguelCompany@eprosima.com>
Co-authored-by: Ricardo González <ricardo@richiware.dev>
Labels: enhancement (New feature or request)
Projects: none yet
Development: successfully merging this pull request may close these issues: none yet
9 participants