Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes a race bug #64

Merged
merged 1 commit into from
Feb 14, 2017
Merged

Fixes a race bug #64

merged 1 commit into from
Feb 14, 2017

Conversation

pbeeson
Copy link
Contributor

@pbeeson pbeeson commented Feb 9, 2017

Fixes bug where client state machine never reaches DONE state.

Fix: client needs to send the goal to the server AFTER it sets up its state machine and adds that machine to the list_, which is looped over on callbacks from server messages.

The pre-patched code would send out a goal to the server before adding the state machine instance to the list_ data structure. Occasionally, if the server published back a result message prior to the state machine being added to the list, updateResults() would run, not see the state machine, so never transition the machine into the DONE state. It would stay stuck at ACTIVE forever, despite the server actually being not active/succeeded.

This happens rarely but I was able to make a setup to reproduce reasonably often by having a server setSucceeded() immediately, then max out my CPU and network until the client hung.

…server quickly responds with result (which triggers updateResults to run) prior to the state machine being added to the list of machines. When this happens, the update results CB (which only runs once) per goal doesn't get to move the machine into DONE and it stays ACTIVE forever, despite the server actually sending a result and being SUCEEDED.
@pbeeson
Copy link
Contributor Author

pbeeson commented Feb 9, 2017

@mikaelarguedas This is kind of a blocker for long-term, production level code, so I hope we can get it merged and pushed out into the rosbuild farm quickly.

@pbeeson pbeeson changed the title Fixes a deadlock bug Fixes a race bug Feb 9, 2017
@pbeeson
Copy link
Contributor Author

pbeeson commented Feb 9, 2017

Added Issue #66 that provides simple debug lines to reproduce the bug deterministically using simple test applications.

@mikaelarguedas
Copy link
Member

Thanks for the detailed issue and reproducible example, I'll test this soon.

Copy link
Member

@mikaelarguedas mikaelarguedas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reproduced and validated locally. thanks @pbeeson for the fix!

@mikaelarguedas mikaelarguedas merged commit 2f88f22 into ros:indigo-devel Feb 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants