
Disable timing tests in DEBUG mode #3012

Merged: 2 commits merged into moveit:master on Dec 28, 2021

Conversation

rhaschke (Contributor)

Debug builds are significantly slower and thus shouldn't be subject to timing-based tests.
Such tests frequently fail in Debug mode, e.g. https://github.com/ros-planning/moveit/runs/4648164688?check_suite_focus=true

@rhaschke force-pushed the disable-timing-tests-in-debug branch from 3ebeb65 to 350e45a on December 28, 2021 12:29
codecov bot commented Dec 28, 2021

Codecov Report

Merging #3012 (42f5fdd) into master (dffc178) will decrease coverage by 0.02%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #3012      +/-   ##
==========================================
- Coverage   61.81%   61.79%   -0.02%     
==========================================
  Files         370      370              
  Lines       33178    33171       -7     
==========================================
- Hits        20506    20494      -12     
- Misses      12672    12677       +5     
| Impacted Files | Coverage Δ |
|---|---|
| ...it/collision_detection/test_collision_common_pr2.h | 98.12% <ø> (-0.04%) ⬇️ |
| ...eit_ros/manipulation/pick_place/src/pick_place.cpp | 88.58% <0.00%> (-3.80%) ⬇️ |
| ...ipulation/pick_place/src/manipulation_pipeline.cpp | 71.91% <0.00%> (-2.47%) ⬇️ |
| ...raint_samplers/src/default_constraint_samplers.cpp | 79.89% <0.00%> (+0.29%) ⬆️ |
| ..._interface/src/detail/constrained_goal_sampler.cpp | 76.48% <0.00%> (+1.97%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

v4hn (Contributor) left a comment

That's quite pragmatic. I mostly agree that we should not test timing of debug builds.
I'm not sure plainly disabling them (silently!) is a good idea, though.

From my side, I wanted to migrate all such tests into https://github.com/captain-yoshi/moveit_benchmark_suite, because timing tests should simply not be hard tests at all.
But I can't afford to spend work on improving that project at this time. @captain-yoshi works on it on the side and keeps making incremental improvements. (Thank you, David!)

Lastly, please consider #2107. The main offender for runtime in debug mode is https://github.com/ros-planning/moveit/blob/master/moveit_ros/robot_interaction/test/locked_robot_state_test.cpp, which should probably also be disabled due to its overly long runtime if we go with your proposed approach.

@@ -87,6 +87,12 @@ class CollisionDetectorTest : public ::testing::Test
std::string kinect_dae_resource_;
};

#ifdef NDEBUG // Don't perform timing tests in Debug mode (but evaluate expression)
v4hn (Contributor):

Suggested change:
- #ifdef NDEBUG // Don't perform timing tests in Debug mode (but evaluate expression)
+ #ifndef NDEBUG // Don't perform timing tests in Debug mode (but evaluate expression)

Or do I misunderstand this expression?

rhaschke (Contributor, Author):

You are right. In my original commit, it was the correct expression. Here, it is inverted.
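
For context, a minimal sketch of what such a guard can look like once corrected; the macro name `EXPECT_TIME_LT` and its exact form are an illustration, not necessarily the code this PR merged:

```cpp
// Sketch (assumed macro, not verbatim from the PR): NDEBUG is defined in
// Release builds, so "#ifndef NDEBUG" selects Debug builds. There the timed
// expression is still evaluated for its side effects, but no bound is checked.
#ifndef NDEBUG
#define EXPECT_TIME_LT(EXPR, VAL) (void)(EXPR)  // Debug: run, don't time
#else
#define EXPECT_TIME_LT(EXPR, VAL) EXPECT_LT(EXPR, VAL)  // Release: enforce bound
#endif
```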

rhaschke (Contributor, Author) commented Dec 28, 2021

@v4hn, thanks for pointing out #2107. However, here I'm fighting flaky tests due to timing, while #2107 complains about slow tests in general. Factoring out the test into a benchmark suite isn't appropriate IMO. It actually tests that collision checking

  1. doesn't crash, and
  2. doesn't take too long in release builds

A sketch of this two-aspect pattern is given below.
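
A hedged gtest-style sketch of that pattern; `runCollisionChecks()` is a hypothetical stand-in for the real workload, and the 1.0 s budget is an arbitrary placeholder:

```cpp
#include <chrono>
#include <gtest/gtest.h>

// Hypothetical workload; the real test exercises MoveIt's collision checker.
void runCollisionChecks()
{
}

TEST(CollisionCheckTiming, FastEnoughInRelease)
{
  const auto start = std::chrono::steady_clock::now();
  runCollisionChecks();  // aspect 1: must complete without crashing
  const double elapsed_s =
      std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
#ifdef NDEBUG
  EXPECT_LT(elapsed_s, 1.0);  // aspect 2: time budget, Release builds only
#else
  (void)elapsed_s;  // Debug: timing is not meaningful; only the crash check matters
#endif
}
```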

@rhaschke rhaschke merged commit 872d33d into moveit:master Dec 28, 2021
@rhaschke rhaschke deleted the disable-timing-tests-in-debug branch December 28, 2021 16:05
v4hn (Contributor) commented Dec 29, 2021

Somewhat off-topic for this thread, but anyway:

> Factoring out the test into a benchmark suite isn't appropriate IMO. It actually tests that collision checking doesn't crash and

I fully agree that all functional tests belong in the core tests, and for improved efficiency all tests should also check more functional aspects than just "doesn't crash" (at least one of them currently doesn't O.o).

> doesn't take too long in release builds

I absolutely disagree with this aspect. The threshold is arbitrary, and there is no sane threshold that could account for the machine running the test, system load, compiler, file system, moon phase, ... Tests can be arbitrarily slow on insane configurations, but they should always pass if they compute the correct results. Thus I would prefer to leave any time measurement out of our gtests.

You are right that this is out of scope here, and it makes sense to keep the check until we have something better.
If we had the engineering resources, I would love to see the moveit_benchmark_suite (MBS :)) become a standalone performance-testing suite (which it already is to some degree) that you can run against multiple configurations and versions of the software stack to compare how changes impact any performance metric across a variety of scenarios. That would include pointing out values that differ significantly between software stacks.
But none of these should be hard tests; they could be treated just like our codecov reports.
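
For illustration, timing could be reported softly via gtest's standard RecordProperty (the property key and helper below are made up), so CI can track the numbers across commits, codecov-style, without ever failing the build on them:

```cpp
#include <chrono>
#include <string>
#include <gtest/gtest.h>

// Hypothetical workload, as in the earlier sketch.
void runCollisionChecks()
{
}

TEST(CollisionCheckTiming, ReportOnly)
{
  const auto start = std::chrono::steady_clock::now();
  runCollisionChecks();
  const double elapsed_s =
      std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
  // Attach the measurement to the test's XML/JSON output; a CI job can diff
  // these values between commits instead of asserting an arbitrary bound.
  RecordProperty("collision_check_seconds", std::to_string(elapsed_s));
}
```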

rhaschke (Contributor, Author):
I agree with you, Michael, that a benchmark suite as you describe would be great. However, I'm not aware of any framework to compare benchmarks of different snapshots in a fashion similar to codecov. Is there something?
If not, we should keep some key timing tests as part of our hard tests. I agree that they are highly unreliable, but they ensure that we don't completely break anything by accident.

v4hn (Contributor) commented Dec 29, 2021 via email
