Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bad_function_call by replacing rclcpp::spin_some with SingleThreadedExecutor #705

Merged
merged 2 commits into from
Apr 2, 2021

Conversation

emersonknapp
Copy link
Collaborator

Extension of #576 - same fix applied in more places. I have been seeing occasional test flakiness due to this cause. See details from #576 following

Every time we called spin_some, a new executor was constructed. This would happen potentially a few thousand times per test. Given that the global context was not shutdown during this time, it would accumulate on_shutdown_callbacks_, one from each executor that was created. When this list was very large we would occasionally get std::bad_function_call for one of the callbacks in the shutdown.

With this change, the global Context has only 2 items in on_shutdown_callbacks_, and I am not able to reproduce bad_function_call any more.

@emersonknapp emersonknapp force-pushed the emersonknapp/fix-qos-test-shutdown branch from ce4c324 to eb5e7f5 Compare April 1, 2021 09:01
@emersonknapp emersonknapp marked this pull request as ready for review April 1, 2021 09:01
@emersonknapp emersonknapp requested a review from a team as a code owner April 1, 2021 09:01
@emersonknapp emersonknapp requested review from Karsten1987 and zmichaels11 and removed request for a team April 1, 2021 09:01
@emersonknapp emersonknapp force-pushed the emersonknapp/fix-qos-test-shutdown branch 2 times, most recently from 7409653 to a89c100 Compare April 1, 2021 09:36
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is a good thing to fix (it's one of the persistent failures in CI).

There's a problem in the code that I pointed out inline; once that is fixed, hopefully the tests will successfully pass.

Separately, this isn't a great way for the rclcpp::spin_some API to behave. This would be hard for others to debug. I have a running list of APIs that I think are "hard" to use; I'll add this to see what we can do to make it less dangerous.

rosbag2_transport/test/rosbag2_transport/test_record.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This improves things, certainly. The tests are still failing with a timeout in test_play__rmw_cyclonedds_cpp; I'm not sure if that is known or new from this PR. I'm going to approve anyway and let you figure out whether this is an improvement regardless of that failing test.

rosbag2_transport/test/rosbag2_transport/test_record.cpp Outdated Show resolved Hide resolved
@emersonknapp emersonknapp force-pushed the emersonknapp/fix-qos-test-shutdown branch 2 times, most recently from 5286065 to 4c21975 Compare April 1, 2021 20:43
@emersonknapp
Copy link
Collaborator Author

Gist: https://gist.githubusercontent.com/emersonknapp/ac0d6eb824f96d7d6dd8306fdace023b/raw/82faf3109ee0d7fdb8bbc453154b6b7ebd0a5be2/ros2.repos
BUILD args: --packages-up-to rosbag2_test_common rosbag2_transport rosbag2_tests rosbag2
TEST args: --packages-select rosbag2_test_common rosbag2_transport rosbag2_tests rosbag2
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/8074

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

…h SingleThreadedExecutor

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
@emersonknapp emersonknapp force-pushed the emersonknapp/fix-qos-test-shutdown branch from 4c21975 to 7a5e04e Compare April 1, 2021 23:43
Signed-off-by: Emerson Knapp <eknapp@amazon.com>
@emersonknapp
Copy link
Collaborator Author

emersonknapp commented Apr 1, 2021

This improves things, certainly. The tests are still failing with a timeout in test_play__rmw_cyclonedds_cpp; I'm not sure if that is known or new from this PR.

It was new from this PR - after a bit of debugging I have realized the std::async thread from spin_subscriptions had no actual exit condition, and so was hanging the caller indefinitely. The reason that it worked before was that the Executor constructor in rclcpp::spin_some would throw an exception once rclcpp had been shutdown, thereby killing the thread. I think this was a totally accidental flow control - I've fixed it now by extending the loop check to while (rclcpp::ok() && other_condition)

@emersonknapp
Copy link
Collaborator Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@emersonknapp emersonknapp merged commit 0df91ea into master Apr 2, 2021
@delete-merged-branch delete-merged-branch bot deleted the emersonknapp/fix-qos-test-shutdown branch April 2, 2021 04:40
Kettenhoax pushed a commit to Kettenhoax/rosbag2 that referenced this pull request Apr 9, 2021
…dedExecutor (ros2#705)

* Fix bad_function_call in test_play by replacing rclcpp::spin_some with SingleThreadedExecutor
* Replace other instances of rclcpp::spin_some

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants