Workaround for wait_for_service lasting the full timeout with connext #476
Fixes ros2/rmw_connext#280
We already have a workaround in the implementation of `wait_for_service_nanoseconds` to accommodate a race condition in Connext between when the graph condition is triggered and when the service is reported as being available (#262). While that workaround works, it can be slow if no other graph events happen to occur at the same time: when we hit the race condition, we currently get stuck waiting the full duration of the remaining `time_to_wait` before the wait set wakes up again and we notice the service is now available.

This PR changes the behaviour to put a maximum wait on each call to `wait_for_graph_change`, so that if we are hit by the race condition we recover from it in a bounded amount of time (I chose 100ms arbitrarily); a sketch of the resulting loop is below.

I haven't found where the race condition comes from yet. I have an idea, which I'll put on ros2/rmw_connext#201.
This adds the overhead of extra graph wakeups and `service_is_ready` checks. We could apply the workaround only when Connext is being used, if we think that's worthwhile; one possible gate is sketched below.
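If we did want to gate this on the middleware, one possibility (an assumption for illustration, not part of this PR) is to check the RMW implementation identifier at runtime:

```cpp
#include <cstring>

#include "rmw/rmw.h"

// Hypothetical gate: apply the 100ms cap only under Connext, so other RMW
// implementations keep a single full-length wait without the extra wakeups.
bool should_cap_graph_wait()
{
  const char * rmw_id = rmw_get_implementation_identifier();
  return rmw_id != nullptr && std::strstr(rmw_id, "connext") != nullptr;
}
```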
Standard CI:

CI repeating `test_client_scope_cpp` 100 times with its timeout lowered from 60s to 10s (it used to be regularly flaky with a 30s timeout):