Workaround for wait_for_service lasting the full timeout with connext #476

Merged (2 commits) on May 23, 2018

@dhood commented May 22, 2018

Fixes ros2/rmw_connext#280

We already have a workaround in the implementation of wait_for_service_nanoseconds to accommodate a race condition in connext between when the graph condition is triggered and when the service is reported as being available (#262).

While that workaround works, it can be slow if there are no other graph events coincidentally happening at the same time. Currently, when we are hit by that race condition, we get stuck waiting the full duration of the remaining time_to_wait before the wait set wakes up again and we notice the service is now available.

This PR caps each wait_for_graph_change call at a maximum wait, so that if we are hit by the race condition, we recover from it in a bounded amount of time (I chose 100ms arbitrarily).
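To illustrate the idea, here is a minimal sketch of a bounded wait loop. It is not the actual diff: `wait_until_ready` and its callback parameters are hypothetical stand-ins (`is_ready` for service_is_ready, `wait_for_event` for wait_for_graph_change), and the only value taken from this PR is the 100ms cap.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>

// Hypothetical helper for illustration only; not part of the rclcpp API.
// is_ready:       stand-in for service_is_ready().
// wait_for_event: stand-in for wait_for_graph_change(); blocks until a graph
//                 event fires or the given duration elapses.
bool wait_until_ready(
  std::chrono::nanoseconds timeout,
  const std::function<bool()> & is_ready,
  const std::function<void(std::chrono::nanoseconds)> & wait_for_event,
  std::chrono::nanoseconds max_single_wait = std::chrono::milliseconds(100))
{
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();

  // Check once up front so a ready service never waits at all.
  if (is_ready()) {
    return true;
  }
  if (timeout == std::chrono::nanoseconds::zero()) {
    return false;
  }

  std::chrono::nanoseconds time_to_wait = timeout;
  while (time_to_wait > std::chrono::nanoseconds::zero()) {
    // Never block longer than max_single_wait, so a missed graph event
    // costs at most one capped wait instead of the whole remaining timeout.
    wait_for_event(std::min(time_to_wait, max_single_wait));
    if (is_ready()) {
      return true;
    }
    const auto elapsed =
      std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
    time_to_wait = timeout - elapsed;
  }
  return false;
}
```

With a cap like this, a missed event delays detection by at most one capped wait (plus the cost of the extra readiness check), rather than by the remaining time_to_wait.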

I haven't found where the race condition is coming from yet. I have an idea, which I'll put on ros2/rmw_connext#201

This adds the overhead of extra graph wakeups and service_is_ready checks. If we think it's worthwhile, we could apply the cap only when connext is being used.

Standard CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

CI repeating test_client_scope_cpp 100 times with its timeout lowered from 60s to 10s: Build Status (used to be regularly flaky with a 30s timeout)

@dhood dhood self-assigned this May 22, 2018
@dhood dhood added the "in progress" and "in review" labels and removed the "in progress" label May 22, 2018
@dhood dhood requested a review from wjwwood May 22, 2018 17:45

@wjwwood wjwwood left a comment

LGTM

This is fine, and should only incur a small performance penalty. However, as you pointed out, it just better masks the original issue. Still, this is a good incremental improvement to deal with that race until a proper fix can be found.

@dhood dhood merged commit f9a78df into master May 23, 2018
@dhood dhood deleted the wfs_connext_race branch May 23, 2018 01:01
@dhood dhood removed the "in review" label May 23, 2018