
Datarace in SimpleActionClient if /use_sim_time is used #58

Closed
1r0b1n0 opened this issue Nov 16, 2016 · 3 comments

@1r0b1n0
Contributor

1r0b1n0 commented Nov 16, 2016

Hello,

I ran into a data race in the ActionClient class when using actionlib in a Gazebo simulation (i.e. with /use_sim_time=true and an external node publishing the /clock topic).

To reproduce the bug, I created two simple roslaunch files here: https://github.com/1r0b1n0/actionlib_sim_time_datarace_test
On my system, starting the fibonacci_client node fails and returns immediately roughly 1 out of 5 times.

I uploaded the fibonacci_client log here: https://github.com/1r0b1n0/actionlib_sim_time_datarace_test/blob/master/fibonacci_client_bug.log

This is what happens:

  1. fibonacci_client calls ros::init(...). Because /use_sim_time is true, ROS subscribes to /clock (in its internal thread).
  2. One second later, fibonacci_client calls ac.waitForResult(ros::Duration(30.0)).
    In SimpleActionClient::waitForResult(ros::Duration &), the deadline is computed with:
    ros::Time timeout_time = ros::Time::now() + timeout; (simple_action_client:551)
    Sometimes this line runs before ROS has received the first /clock message from Gazebo, so ros::Time::isValid() is false and ros::Time::now() returns ros::Time(0,0). timeout_time is therefore 0 + 30 s = 30 s.
  3. A few milliseconds later, ROS receives its first /clock message from Gazebo (here /clock is at 1271.800000000), so ros::Time::now() returns ros::Time(1271.8). Since 1271.8 s is already past the 30 s deadline, SimpleActionClient times out and returns immediately.

A solution would be to call ros::Time::waitForValid() in ActionClient::waitForActionServerToStart(), so that no deadline is computed before simulated time is available.

Thanks,
Robin

@mikaelarguedas
Member

@1r0b1n0 Did #59 fix this issue?
It looks like I can still reproduce it locally about 1 in 15 times.

@1r0b1n0
Contributor Author

1r0b1n0 commented Jan 11, 2017

Indeed, I still saw the problem occasionally (roughly 1 out of 200 times).
Because the application is multi-threaded, we should also wait for ros::Time::now() to become valid when constructing ConnectionManager.
I submitted #62, which fixes those issues for me. Does it work on your side?

@mikaelarguedas
Member

Fixed by #59 and #62. Feel free to reopen or create another issue if the problem reappears.
