Workaround for wait_for_service lasting the full timeout with connext #476

dhood · 2018-05-22T02:48:12Z

We already have a workaround in the implementation of wait_for_service_nanoseconds to accommodate a race condition in connext between when the graph condition is triggered and when the service is reported as being available (#262).

While that workaround works, it can be slow if there are no other graph events coincidentally happening at the same time. Currently, when we are hit by that race condition, we get stuck waiting the full duration of the remaining time_to_wait before the wait set wakes up again and we notice the service is now available.

This PR caps the wait passed to wait_for_graph_change so that, if we are hit by the race condition, we recover from it within a bounded amount of time (I chose 100ms arbitrarily).
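For readers who want the idea without opening the diff, here is a minimal standalone sketch of a capped wait loop. It is not the rclcpp implementation: `wait_for_service_capped` and the two callbacks are hypothetical stand-ins for the real service_is_ready check and wait_for_graph_change call, and it assumes a non-negative timeout.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>

using Nanoseconds = std::chrono::nanoseconds;

// Hypothetical sketch: poll for a service, never blocking longer than
// max_wait on a single graph wait, so a missed graph event costs at most
// max_wait instead of the whole remaining timeout.
bool wait_for_service_capped(
  const std::function<bool()> & service_is_ready,                  // stand-in for the readiness check
  const std::function<void(Nanoseconds)> & wait_for_graph_change,  // stand-in for the graph wait
  Nanoseconds timeout,
  Nanoseconds max_wait = std::chrono::milliseconds(100))
{
  // A zero cap would turn the loop into a busy spin.
  assert(max_wait > Nanoseconds::zero());

  const auto start = std::chrono::steady_clock::now();
  auto remaining = [&]() {
      return timeout - std::chrono::duration_cast<Nanoseconds>(
        std::chrono::steady_clock::now() - start);
    };

  // Check once up front so an already-available service returns immediately.
  if (service_is_ready()) {
    return true;
  }

  Nanoseconds time_to_wait = remaining();
  while (time_to_wait > Nanoseconds::zero()) {
    // Block until a graph event arrives or the capped wait expires,
    // whichever comes first.
    wait_for_graph_change(std::min(time_to_wait, max_wait));
    // Re-check even if no event fired: this is what recovers from the race.
    if (service_is_ready()) {
      return true;
    }
    time_to_wait = remaining();
  }
  return false;  // timed out without seeing the service
}
```

The trade-off in this sketch is the same one noted in this PR: with a 100ms cap, an idle wait wakes up roughly ten times per second and repeats the readiness check each time.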

I haven't yet found where the race condition is coming from. I have an idea, which I'll post on ros2/rmw_connext#201

This adds the overhead of extra graph wakeups and service_is_ready checks. If we think it's worthwhile, we could apply it only when connext is being used.

Standard CI:

Linux
Linux-aarch64
macOS
Windows

CI repeating test_client_scope_cpp 100 times with its timeout lowered from 60s to 10s: (used to be regularly flaky with a 30s timeout)


wjwwood

LGTM

This is fine and should only incur a small performance penalty. However, as you pointed out, it just better masks the original issue. Still, this is a good incremental improvement for dealing with that race until a proper fix can be found.

dhood added 2 commits May 21, 2018 17:37

Limit wait_for_graph_change timeout as alternative workaround for con…

d59f96c

…next

Increase max wait time to 100ms

42305a9

dhood self-assigned this May 22, 2018

dhood added the "in progress" and "in review" labels and removed the "in progress" label May 22, 2018

dhood mentioned this pull request May 22, 2018

wait_for_service not being woken by graph events ros2/rmw_connext#280

Closed

dhood requested a review from wjwwood May 22, 2018 17:45

wjwwood approved these changes May 22, 2018


dhood merged commit f9a78df into master May 23, 2018

dhood deleted the wfs_connext_race branch May 23, 2018 01:01

dhood removed the in review Waiting for review (Kanban column) label May 23, 2018

This was referenced May 23, 2018

Composition test flakiness increase ros2/build_farmer#118

Closed

Build Farmer Handoff 2018-05-23 ros2/build_farmer#119

Closed
