
Disable timing tests in DEBUG mode #3012

Merged: 2 commits merged into moveit:master on Dec 28, 2021

Conversation

rhaschke (Contributor)

Debug builds are significantly slower and thus shouldn't be subject to timing-based tests.
Such tests frequently fail in Debug mode, e.g. https://github.com/ros-planning/moveit/runs/4648164688?check_suite_focus=true

@rhaschke force-pushed the disable-timing-tests-in-debug branch from 3ebeb65 to 350e45a on December 28, 2021 12:29
codecov bot commented Dec 28, 2021

Codecov Report

Merging #3012 (42f5fdd) into master (dffc178) will decrease coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #3012      +/-   ##
==========================================
- Coverage   61.81%   61.79%   -0.02%     
==========================================
  Files         370      370              
  Lines       33178    33171       -7     
==========================================
- Hits        20506    20494      -12     
- Misses      12672    12677       +5     
| Impacted Files | Coverage Δ |
|---|---|
| ...it/collision_detection/test_collision_common_pr2.h | 98.12% <ø> (-0.04%) ⬇️ |
| ...eit_ros/manipulation/pick_place/src/pick_place.cpp | 88.58% <0.00%> (-3.80%) ⬇️ |
| ...ipulation/pick_place/src/manipulation_pipeline.cpp | 71.91% <0.00%> (-2.47%) ⬇️ |
| ...raint_samplers/src/default_constraint_samplers.cpp | 79.89% <0.00%> (+0.29%) ⬆️ |
| ..._interface/src/detail/constrained_goal_sampler.cpp | 76.48% <0.00%> (+1.97%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

v4hn (Contributor) left a comment

That's quite pragmatic. I mostly agree that we should not test timing of debug builds.
I'm not sure plainly disabling them (silently!) is a good idea, though.

From my side, I wanted to migrate all such tests into https://github.com/captain-yoshi/moveit_benchmark_suite, because timing tests should simply not be hard tests at all.
But I can't afford to spend work on improving that project at this time. @captain-yoshi works on it on the side and keeps making incremental improvements. (Thank you, David!)

Lastly, please consider #2107. The main offender for runtime in debug mode is https://github.com/ros-planning/moveit/blob/master/moveit_ros/robot_interaction/test/locked_robot_state_test.cpp, which should probably also be disabled due to its overly long runtime if we go with your proposed approach.

@@ -87,6 +87,12 @@ class CollisionDetectorTest : public ::testing::Test
std::string kinect_dae_resource_;
};

#ifdef NDEBUG // Don't perform timing tests in Debug mode (but evaluate expression)
v4hn (Contributor):

Suggested change:
- #ifdef NDEBUG // Don't perform timing tests in Debug mode (but evaluate expression)
+ #ifndef NDEBUG // Don't perform timing tests in Debug mode (but evaluate expression)

Or do I misunderstand this expression?

rhaschke (Contributor, Author):

You are right. In my original commit, it was the correct expression. Here, it is inverted.
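
For context, a minimal sketch of what such a guard can look like once corrected; the macro name `EXPECT_TIME_LT` and its exact form are an illustration, not necessarily the code this PR merged:

```cpp
// Sketch (assumed macro, not verbatim from the PR): NDEBUG is defined in
// Release builds, so "#ifndef NDEBUG" selects Debug builds. There the timed
// expression is still evaluated for its side effects, but no bound is checked.
#ifndef NDEBUG
#define EXPECT_TIME_LT(EXPR, VAL) (void)(EXPR)  // Debug: run, don't time
#else
#define EXPECT_TIME_LT(EXPR, VAL) EXPECT_LT(EXPR, VAL)  // Release: enforce bound
#endif
```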

rhaschke (Contributor, Author) commented Dec 28, 2021

@v4hn, thanks for pointing out #2107. However, here I'm fighting flaky tests due to timing, while #2107 complains about slow tests in general. Factoring out the test into a benchmark suite isn't appropriate IMO. It actually tests that collision checking

  1. doesn't crash, and
  2. doesn't take too long in release builds

A sketch of this two-aspect pattern is given below.
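
A hedged gtest-style sketch of that pattern; `runCollisionChecks()` is a hypothetical stand-in for the real workload, and the 1.0 s budget is an arbitrary placeholder:

```cpp
#include <chrono>
#include <gtest/gtest.h>

// Hypothetical workload; the real test exercises MoveIt's collision checker.
void runCollisionChecks()
{
}

TEST(CollisionCheckTiming, FastEnoughInRelease)
{
  const auto start = std::chrono::steady_clock::now();
  runCollisionChecks();  // aspect 1: must complete without crashing
  const double elapsed_s =
      std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
#ifdef NDEBUG
  EXPECT_LT(elapsed_s, 1.0);  // aspect 2: time budget, Release builds only
#else
  (void)elapsed_s;  // Debug: timing is not meaningful; only the crash check matters
#endif
}
```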

@rhaschke rhaschke merged commit 872d33d into moveit:master Dec 28, 2021
@rhaschke rhaschke deleted the disable-timing-tests-in-debug branch December 28, 2021 16:05
v4hn (Contributor) commented Dec 29, 2021

Somewhat off-topic for this thread, but anyway:

> Factoring out the test into a benchmark suite isn't appropriate IMO. It actually tests that collision checking doesn't crash and

I fully agree that all functional tests belong in the core tests, and for improved efficiency all tests should also check more functional aspects than just "doesn't crash" (at least one of them currently doesn't O.o).

> doesn't take too long in release builds

I absolutely disagree with this aspect. The threshold is arbitrary, and there is no sane threshold that could account for the machine running the test, system load, compiler, file system, moon phase, ... Tests can be arbitrarily slow on insane configurations, but they should always pass if they compute the correct results. Thus I would prefer to leave any time measurement out of our gtests.

You are right that this is out of scope here, and it makes sense to keep the check until we have something better.
If we had the engineering resources, I would love to see the moveit_benchmark_suite (MBS :)) become a standalone performance-testing suite (which it already is to some degree) that you can run against multiple configurations and versions of the software stack to compare how changes impact any performance metric across a variety of scenarios. That would include pointing out values that differ significantly between software stacks.
But none of these should be hard tests; they could be treated just like our codecov reports.
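
For illustration, timing could be reported softly via gtest's standard RecordProperty (the property key and helper below are made up), so CI can track the numbers across commits, codecov-style, without ever failing the build on them:

```cpp
#include <chrono>
#include <string>
#include <gtest/gtest.h>

// Hypothetical workload, as in the earlier sketch.
void runCollisionChecks()
{
}

TEST(CollisionCheckTiming, ReportOnly)
{
  const auto start = std::chrono::steady_clock::now();
  runCollisionChecks();
  const double elapsed_s =
      std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
  // Attach the measurement to the test's XML/JSON output; a CI job can diff
  // these values between commits instead of asserting an arbitrary bound.
  RecordProperty("collision_check_seconds", std::to_string(elapsed_s));
}
```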

rhaschke (Contributor, Author):
I agree with you, Michael, that a benchmark suite as you describe would be great. However, I'm not aware of any framework to compare benchmarks of different snapshots in a fashion similar to codecov. Is there something?
If not, we should keep some key timing tests as part of our hard tests. I agree that they are highly unreliable, but they ensure that we don't completely break anything by accident.

v4hn (Contributor) commented Dec 29, 2021 via email
