Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite TimeControllerClockTest.unpaused_sleep_returns_true to be correct #1384

Merged
merged 3 commits into from
Jun 12, 2023

Conversation

emersonknapp
Copy link
Collaborator

Closes #1282
Related to #1012, #1290, #1367

The original test was written such that it requested a 100ns sleep. With such a short time it was either impossible or extremely unlikely to experience a spurious wakeup, so the unlooped condition never failed. When #1012 attempted an improvement, it added a 1s sleep (10 million times longer), making the chance of spurious wakeup much more likely. However, its loop condition checked for rclcpp::ok() which is always false in this test, so the loop was never entered. The outcome was that test failure was made far more likely.

This rewrites the test to have a correct spurious wakeup check loop with reasonable timeout. It should remove the flakiness from the CI builds. If so, it is purely a test change and can be easily backported to live distros.

…rect

Signed-off-by: Emerson Knapp <emerson.b.knapp@gmail.com>
@emersonknapp emersonknapp requested a review from a team as a code owner June 10, 2023 01:56
@emersonknapp emersonknapp requested review from gbiggs, hidmic and MichaelOrlov and removed request for a team June 10, 2023 01:56
@emersonknapp
Copy link
Collaborator Author

emersonknapp commented Jun 10, 2023

Pulls: #1384
Gist: https://gist.githubusercontent.com/emersonknapp/3f3be53971c13a762207805eda247757/raw/7cb44417585d1b87f69ab233ef35d7c4fc87f778/ros2.repos
BUILD args: --packages-up-to rosbag2_cpp
TEST args: --packages-select rosbag2_cpp --retest-until-fail 20 --ctest-args -R test_time_controller_clock
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/12194

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@emersonknapp
Copy link
Collaborator Author

I ran above CI on this single test with a restest-until-fail times 20 for all three platforms. In the past we were seeing it fail within 3 tries pretty reliably so that should be a solid stress test.

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emersonknapp Good catch about the useless check for the rclcpp::ok() and thanks for trying to improve this test. While the core solution looks good to me I have a couple concerns.

  1. The original purpose of this test was to verify that sleep_until(timestamp) works properly and returns true in the expected timeframe +- some tolerance after the explicit call for clock pause() and then resume().
    Although in your current implementation, there is no explicit call for clock->pause() and following resume().
    Please consider returning the implementation with an explicit call for pause-resume in the test.
  rosbag2_cpp::TimeControllerClock clock(ros_start_time);
  clock.resume();
  EXPECT_TRUE(clock.sleep_until(clock.now() + 100));

  clock.pause();
  clock.resume();

  auto ros_time_now = clock.now();
  auto sleep_until_timestamp = ros_time_now + RCUTILS_S_TO_NS(1);
  auto start = clock.ros_to_steady(ros_time_now);
  1. The tolerance for checking that clock->sleep_until(timestamp) was sleeping for the correct amount of time should be in some reasonable range about 10-20% i.e. 100-200 msec. In the current implementation, it is 100% for the upper bound i.e. timestamp * 2 = 2000 msec. With tolerance in 100% for upper bound we can't judge about validity of the clock->sleep_until(timestamp) call.

@emersonknapp
Copy link
Collaborator Author

  1. after the explicit call for clock pause() and then resume()

OK - yeah I see that's how I originally wrote it. Added back a second loop

range about 10-20% i.e. 100-200 msec

Sure, fair point

Signed-off-by: Emerson Knapp <emerson.b.knapp@gmail.com>
@emersonknapp emersonknapp force-pushed the emersonknapp/fix-timecontroller-test branch from 4e4fcdf to 15b7421 Compare June 10, 2023 08:26
@emersonknapp
Copy link
Collaborator Author

https://ci.ros2.org/job/ci_launcher/12195/

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. With green CI.
i.e. need to address new warning on Windows build.

Signed-off-by: Emerson Knapp <emerson.b.knapp@gmail.com>
@emersonknapp
Copy link
Collaborator Author

https://ci.ros2.org/job/ci_launcher/12200

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@emersonknapp emersonknapp merged commit 38e9c96 into rolling Jun 12, 2023
14 checks passed
@delete-merged-branch delete-merged-branch bot deleted the emersonknapp/fix-timecontroller-test branch June 12, 2023 17:54
@emersonknapp
Copy link
Collaborator Author

@Mergifyio backport iron humble

@mergify
Copy link

mergify bot commented Jun 12, 2023

backport iron humble

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jun 12, 2023
…rect (#1384)

Previous fix attempt's loop over spurious wakeups was never entered.

Signed-off-by: Emerson Knapp <emerson.b.knapp@gmail.com>
(cherry picked from commit 38e9c96)
mergify bot pushed a commit that referenced this pull request Jun 12, 2023
…rect (#1384)

Previous fix attempt's loop over spurious wakeups was never entered.

Signed-off-by: Emerson Knapp <emerson.b.knapp@gmail.com>
(cherry picked from commit 38e9c96)

# Conflicts:
#	rosbag2_cpp/test/rosbag2_cpp/test_time_controller_clock.cpp
emersonknapp pushed a commit that referenced this pull request Jun 12, 2023
…rect (backport #1384) (#1390)

* Rewrite TimeControllerClockTest.unpaused_sleep_returns_true to be correct (#1384)

Previous fix attempt's loop over spurious wakeups was never entered.

Signed-off-by: Emerson Knapp <emerson.b.knapp@gmail.com>
(cherry picked from commit 38e9c96)
emersonknapp pushed a commit that referenced this pull request Jun 12, 2023
…rect (#1384) (#1389)

Previous fix attempt's loop over spurious wakeups was never entered.

Signed-off-by: Emerson Knapp <emerson.b.knapp@gmail.com>
(cherry picked from commit 38e9c96)
@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-06-15/32038/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failure rosbag2_cpp/test_time_controller_clock on nightly_win_rep
3 participants