
detect participant-local graph changes #168

Closed
wjwwood opened this issue May 28, 2016 · 7 comments

wjwwood commented May 28, 2016

For reference:

Basically, any newly created DataWriters and DataReaders generate an entry on a "builtin" DDS topic. In OpenSplice you get all notifications, but Connext follows a section of the DDS spec that says locally created (created in the same participant) DataWriters and DataReaders don't generate entries (the specifics are in the above links).
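For concreteness, here is a minimal sketch (classic Connext C++ API, with an arbitrary domain id) of where those builtin-topic entries live; remote endpoints show up as samples on these readers, while a Connext participant's own endpoints do not:

```cpp
// Sketch only: locate the builtin discovery readers of a participant.
// Remote DataWriters/DataReaders appear as samples on these readers; on
// Connext, endpoints created by this same participant do not.
#include <cstdio>

#include "ndds/ndds_cpp.h"

int main()
{
  DDSDomainParticipant * participant =
    DDSTheParticipantFactory->create_participant(
      0 /* domain id, arbitrary here */, DDS_PARTICIPANT_QOS_DEFAULT,
      NULL, DDS_STATUS_MASK_NONE);
  if (!participant) {
    std::fprintf(stderr, "failed to create participant\n");
    return 1;
  }

  // Every participant has a builtin subscriber with readers for the
  // discovery topics ("DCPSPublication", "DCPSSubscription", ...).
  DDSSubscriber * builtin_subscriber = participant->get_builtin_subscriber();
  DDSDataReader * publications_reader =
    builtin_subscriber->lookup_datareader(DDS_PUBLICATION_TOPIC_NAME);
  DDSDataReader * subscriptions_reader =
    builtin_subscriber->lookup_datareader(DDS_SUBSCRIPTION_TOPIC_NAME);

  std::printf(
    "builtin readers: %p %p\n",
    (void *)publications_reader, (void *)subscriptions_reader);

  participant->delete_contained_entities();
  DDSTheParticipantFactory->delete_participant(participant);
  return 0;
}
```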

So for us to get notifications of local changes, we'll need to maintain some state ourselves.
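Roughly, that means bookkeeping like the following (a sketch with made-up names, not the actual rmw code): per-topic counts of locally created endpoints, updated from the create/destroy calls, with the node's graph guard condition triggered afterwards so graph waiters wake up.

```cpp
// Hypothetical participant-local graph bookkeeping; names are illustrative
// and not taken from the real rmw implementation.
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

struct LocalGraphCache
{
  mutable std::mutex mutex;
  // topic name -> number of locally created publishers / subscriptions
  std::map<std::string, std::size_t> local_publishers;
  std::map<std::string, std::size_t> local_subscriptions;

  // Called from create_publisher(); after this, the rmw layer would also
  // trigger the node's graph guard condition to wake anything waiting on
  // a graph change.
  void add_local_publisher(const std::string & topic)
  {
    std::lock_guard<std::mutex> lock(mutex);
    ++local_publishers[topic];
  }

  // count_publishers(topic) would then report the remote endpoints seen on
  // the builtin topics plus this local count.
  std::size_t count_local_publishers(const std::string & topic) const
  {
    std::lock_guard<std::mutex> lock(mutex);
    auto it = local_publishers.find(topic);
    return it == local_publishers.end() ? 0 : it->second;
  }
};
```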

Because of this missing support for local graph changes, I disabled the rcl_service_server_is_available function for Connext and Connext Dynamic:

I also disabled the related tests in rcl (which should be re-enabled after they're fixed):
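For context, rcl_service_server_is_available comes from rcl/graph.h; a typical (polling) use looks roughly like the sketch below. The helper name and timing are illustrative, and node/client are assumed to have been initialized elsewhere:

```cpp
// Illustrative helper, not code from rcl: poll rcl_service_server_is_available()
// until the server shows up or a deadline passes.  `node` and `client` are
// assumed to be initialized elsewhere (rcl_node_init() / rcl_client_init()).
#include <chrono>
#include <thread>

#include "rcl/graph.h"

bool wait_for_server(
  const rcl_node_t * node, const rcl_client_t * client,
  std::chrono::milliseconds timeout)
{
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    bool is_available = false;
    rcl_ret_t ret = rcl_service_server_is_available(node, client, &is_available);
    if (ret != RCL_RET_OK) {
      return false;  // error from rcl (or the function is not usable on this rmw)
    }
    if (is_available) {
      return true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
  return false;
}
```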

@wjwwood wjwwood self-assigned this May 31, 2016
@wjwwood wjwwood added the in progress Actively being worked on (Kanban column) label May 31, 2016
@wjwwood wjwwood removed the in progress Actively being worked on (Kanban column) label Jun 13, 2016

dhood commented Jun 24, 2016

@dirk-thomas

Once this is merged for Connext as well as FastRTPS, the conditional checks last updated in ros2/system_tests#145 can be removed.


wjwwood commented Sep 22, 2016

Just an update on this. I have a solution that "works". However, there are still spurious failures which I believe occur due to a race condition.

The current implementation works by updating the graph data when the user calls any of the create/destroy functions for any of the primitives (sub/pub, service/client). Updating the graph data means that functions like count_subscriptions(topic_name) consider locally created subscriptions immediately, and that a graph event is triggered (waking up anything waiting on the graph to change). The race comes in (I believe) when there is latency between the create_* function returning and the underlying middleware being ready to deliver published data. So the timeline that fails is something like this:

  1. thread 1: wait for graph change
  2. thread 2: create_subscription started (for example)
  3. thread 2: call DDS functions to "create" the subscription
  4. thread 2: update graph information
  5. thread 2: notify graph change
  6. thread 1: wakes up
  7. thread 1: publish data
  8. system: ready to deliver data
  9. thread 1: waits forever for the data to be delivered

The assumption is that step 8 should happen before step 5, which amounts to assuming that the "create" and "destroy" functions at the DDS level are synchronous for local pub/sub. I can only guess at the reasons for step 8 being delayed without more research. My guess is that there is some asynchronous discovery work that needs to be done even though the topic is local.

It seems that this is not a problem with FastRTPS, and the workaround is not necessary for OpenSplice because it uses the same notification system for local and remote changes to pubs/subs.


So, I guess what I need to do is either:

  • Change the tests to publish more than once if the first message is not received within a short period of time
  • Or try to find another way to know when a connection is established (trigger somewhere other than in the create/destroy functions)

I'm going to pursue the second. There is a way to get a callback when a new endpoint match is made, but it requires a match to exist (it would not work for the case where you want to know which pubs/subs exist, and get notified when that changes, without publishing or subscribing to the topic yourself). In the case of services you already have a client, so this method might be preferable there.
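For reference, that per-endpoint match callback is the DDS publication/subscription "matched" status. A rough sketch against the classic Connext C++ API (the listener class name and printout are mine):

```cpp
// Rough sketch of the endpoint-match callback (classic Connext C++ API).
// Note that it only fires once at least one matching endpoint exists, which
// is exactly the limitation discussed above.
#include <cstdio>

#include "ndds/ndds_cpp.h"

class MatchCountingListener : public DDSDataWriterListener
{
public:
  virtual void on_publication_matched(
    DDSDataWriter * writer,
    const DDS_PublicationMatchedStatus & status)
  {
    (void)writer;
    // current_count: matched DataReaders right now; current_count_change
    // says whether this callback reports a new match or a lost one.
    std::printf(
      "matched readers: %d (change: %d)\n",
      (int)status.current_count, (int)status.current_count_change);
  }
};
```

The listener would be attached to the DataWriter via set_listener() with the publication-matched status mask; DataReaders have the analogous on_subscription_matched callback.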

For the former option, I could see the stance of "this is an anonymous, asynchronous activity and we shouldn't rely on receiving the first message sent (even though it is local)". That kind of sucks, but it's somewhat realistic for a "non-latched" topic.


wjwwood commented Sep 22, 2016

Another update.

I cannot use the "when the connection is established" event instead of the "create"/"destroy" functions, because the tests that are failing now only use publish and subscribe. In the particular case of the tests I have both a publisher and a subscriber, but in general I cannot use the "when the connection is established" event to increment the local count of publishers or subscribers for a topic, because it requires that at least one matching peer exists. Basically, a publisher that advertises but has no matching subscription (local or remote) would not contribute to the "count publishers" result for that topic (locally).

So now I have to consider modifying the tests to not only retry spin_some when the callback is not called on the first check, but also to republish the message, in case the original one was sent before actual communications were set up.
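A sketch of that retry pattern (node and topic names are made up, and this is not the actual rcl test code):

```cpp
// Sketch of the retry pattern: republish and spin until the callback fires,
// so a message published before the transport is actually ready is not fatal.
#include <chrono>
#include <thread>

#include "rclcpp/rclcpp.hpp"
#include "std_msgs/msg/string.hpp"

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("retry_publish_test");

  bool received = false;
  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10,
    [&received](const std_msgs::msg::String &) {received = true;});
  auto pub = node->create_publisher<std_msgs::msg::String>("chatter", 10);

  std_msgs::msg::String msg;
  msg.data = "hello";

  // Republish on every iteration instead of assuming the first message,
  // sent right after create_*, will actually be delivered.
  for (int i = 0; i < 50 && !received; ++i) {
    pub->publish(msg);
    rclcpp::spin_some(node);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }

  rclcpp::shutdown();
  return received ? 0 : 1;
}
```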


wjwwood commented Oct 28, 2016

Ok, I think these PRs are ready to go. Here is passing CI:

  • Linux: Build Status
  • OS X: Build Status
  • Windows: Build Status

I'm moving these into "in review". 🎉

@wjwwood wjwwood added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Oct 28, 2016

wjwwood commented Oct 29, 2016

Ok, I addressed a lot of comments and there were some non-trivial changes (re-enabling tests I had missed, etc.). So here is new CI:

  • Linux: Build Status
  • OS X: Build Status
  • Windows: Build Status


wjwwood commented Oct 29, 2016

I'm going to go ahead and merge these. I will address additional comments in follow-up PRs.

@wjwwood wjwwood removed the in review Waiting for review (Kanban column) label Oct 29, 2016