Added test cases for missing events & double call to take_data #2371

Closed
wants to merge 7 commits

Conversation

jmachowinski (Contributor)

Added a test that checks whether a trigger event for a waitable gets missed.
Added a test that checks whether take_data is called twice in a row without calling is_ready in between.
This is related to #2250.

Janosch Machowinski added 2 commits November 16, 2023 10:58
@jmachowinski (Contributor, Author)

@mjcarroll here is the merge request for the tests.
Note: one of them currently fails!

@jmachowinski changed the title from "Added test cased for missing evetns & double call to take_data" to "Added test cases for missing events & double call to take_data" Nov 17, 2023
@mjcarroll self-requested a review November 20, 2023 15:19
Comment on lines 663 to 713
TYPED_TEST(TestExecutorsOnlyNode, missing_event)
{
  using ExecutorType = TypeParam;
  ExecutorType executor;

  rclcpp::Node::SharedPtr node(this->node);
  auto callback_group = node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive,
    true);

  auto waitable_interfaces = node->get_node_waitables_interface();
  auto my_waitable = std::make_shared<TestWaitable>();
  auto my_waitable2 = std::make_shared<TestWaitable>();
  waitable_interfaces->add_waitable(my_waitable, callback_group);
  waitable_interfaces->add_waitable(my_waitable2, callback_group);
  executor.add_node(this->node);

  my_waitable->trigger();
  my_waitable2->trigger();

  // a node has some default subscribers that need to get executed first, hence the loop
  for (int i = 0; i < 10; i++) {
    executor.spin_once(std::chrono::milliseconds(10));
    if (my_waitable->get_count() > 0) {
      // stop execution after the first waitable has been executed
      break;
    }
  }

  EXPECT_EQ(1u, my_waitable->get_count());
  EXPECT_EQ(0u, my_waitable2->get_count());

  // block the callback group; this is something that may happen during multithreaded execution.
  // It removes my_waitable2 from the list of ready events and triggers a call to wait_for_work
  callback_group->can_be_taken_from().exchange(false);

  // now there should be no ready event
  executor.spin_once(std::chrono::milliseconds(10));

  EXPECT_EQ(1u, my_waitable->get_count());
  EXPECT_EQ(0u, my_waitable2->get_count());

  // unblock the callback group
  callback_group->can_be_taken_from().exchange(true);

  // now the second waitable should get processed
  executor.spin_once(std::chrono::milliseconds(10));

  EXPECT_EQ(1u, my_waitable->get_count());
  EXPECT_EQ(1u, my_waitable2->get_count());
}
wjwwood (Member)

At least on my machine, this test only fails for the static single threaded executor and the events executor:

% ~/ros2_ws/build/rclcpp/test/rclcpp/test_executors --gtest_filter=\*missing_event
Running main() from /Users/william/ros2_ws/install/src/gtest_vendor/src/gtest_main.cc
Note: Google Test filter = *missing_event
[==========] Running 4 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor, where TypeParam = rclcpp::executors::SingleThreadedExecutor
[ RUN      ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event
[       OK ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event (116 ms)
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor (116 ms total)

[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor, where TypeParam = rclcpp::executors::MultiThreadedExecutor
[ RUN      ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event
[       OK ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event (17 ms)
[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor (17 ms total)

[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[ RUN      ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:692: Failure
Expected equality of these values:
  1u
    Which is: 1
  my_waitable->get_count()
    Which is: 0
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:702: Failure
Expected equality of these values:
  1u
    Which is: 1
  my_waitable->get_count()
    Which is: 0
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:711: Failure
Expected equality of these values:
  1u
    Which is: 1
  my_waitable->get_count()
    Which is: 0
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:712: Failure
Expected equality of these values:
  1u
    Which is: 1
  my_waitable2->get_count()
    Which is: 0
[  FAILED  ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor (107 ms)
[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor (107 ms total)

[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor, where TypeParam = rclcpp::experimental::executors::EventsExecutor
[ RUN      ] TestExecutorsOnlyNode/EventsExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:703: Failure
Expected equality of these values:
  0u
    Which is: 0
  my_waitable2->get_count()
    Which is: 1
[  FAILED  ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor (30 ms)
[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor (30 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 4 test suites ran. (272 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[  FAILED  ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor

 2 FAILED TESTS

Is that your experience too @jmachowinski?

Also, this is a bit more complicated than it needs to be. I'll open a pr (cellumation#1) to make it simpler (without breaking its purpose, I think) by only adding the callback group with the waitables to the executor; you can take or leave/discuss that if you see an issue with it. With that change I get a slightly different result:

% ~/ros2_ws/build/rclcpp/test/rclcpp/test_executors --gtest_filter=\*missing_event
Running main() from /Users/william/ros2_ws/install/src/gtest_vendor/src/gtest_main.cc
Note: Google Test filter = *missing_event
[==========] Running 4 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor, where TypeParam = rclcpp::executors::SingleThreadedExecutor
[ RUN      ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event
[       OK ] TestExecutorsOnlyNode/SingleThreadedExecutor.missing_event (118 ms)
[----------] 1 test from TestExecutorsOnlyNode/SingleThreadedExecutor (118 ms total)

[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor, where TypeParam = rclcpp::executors::MultiThreadedExecutor
[ RUN      ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event
[       OK ] TestExecutorsOnlyNode/MultiThreadedExecutor.missing_event (17 ms)
[----------] 1 test from TestExecutorsOnlyNode/MultiThreadedExecutor (17 ms total)

[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[ RUN      ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:705: Failure
Expected equality of these values:
  1u
    Which is: 1
  my_waitable2->get_count()
    Which is: 0
[  FAILED  ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor (47 ms)
[----------] 1 test from TestExecutorsOnlyNode/StaticSingleThreadedExecutor (47 ms total)

[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor, where TypeParam = rclcpp::experimental::executors::EventsExecutor
[ RUN      ] TestExecutorsOnlyNode/EventsExecutor.missing_event
/Users/william/ros2_ws/src/ros2/rclcpp/rclcpp/test/rclcpp/executors/test_executors.cpp:696: Failure
Expected equality of these values:
  0u
    Which is: 0
  my_waitable2->get_count()
    Which is: 1
[  FAILED  ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor (30 ms)
[----------] 1 test from TestExecutorsOnlyNode/EventsExecutor (30 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 4 test suites ran. (213 ms total)
[  PASSED  ] 2 tests.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] TestExecutorsOnlyNode/StaticSingleThreadedExecutor.missing_event, where TypeParam = rclcpp::executors::StaticSingleThreadedExecutor
[  FAILED  ] TestExecutorsOnlyNode/EventsExecutor.missing_event, where TypeParam = rclcpp::experimental::executors::EventsExecutor

 2 FAILED TESTS

It is still the events executor and the static single threaded executor which fail. Note also that they fail in different ways: the EventsExecutor is a bit too eager and executes the second waitable when the test assumes it should not, while the static single threaded executor never executes the second waitable.

I still need to figure out why that is, but at the same time, I think the assumptions in this test are a bit flawed. Specifically, assuming that spin_once() will execute the event you think it should next is a dangerous assumption, and I don't think it should be a condition we try to enforce on the spin variants. Instead, a variant of this test which uses futures and spins in a loop until they are complete, or until some timeout, is probably better. After all, we care that the events eventually get called, and not double "called", not that they happen in a specific order or within a specific number of spin_once calls.
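For illustration, the suggested pattern might look like this (a sketch only, assuming an executor and a std::future<void> named future are in scope; this is not the code from the upcoming pr):

// Spin in a loop until the future completes or an overall deadline passes,
// rather than asserting what a single spin_once() call will execute.
auto deadline = std::chrono::steady_clock::now() + std::chrono::seconds(10);
while (std::chrono::steady_clock::now() < deadline &&
  future.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
{
  executor.spin_once(std::chrono::milliseconds(10));
}
EXPECT_EQ(std::future_status::ready, future.wait_for(std::chrono::seconds(0)));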

I'll open a pr with the futures alternative soon.

wjwwood (Member)

Here's the pr for using futures: cellumation#2

jmachowinski (Contributor, Author)

> Is that your experience too @jmachowinski?

Yes, I missed this during my testing, as I set a filter...

jmachowinski (Contributor, Author)

Hmm, I had a quick look at the StaticSingleThreadedExecutor case; this looks like a lost wakeup to me.

@wjwwood This brings me back to a question I asked a while ago: whether a lost wakeup would actually be the expected behavior for this test. At some point you mentioned that an object using a guard condition must make sure that the signal does not get lost.

@wjwwood (Member), Jan 4, 2024

> This brings me back to a question I asked a while ago: whether a lost wakeup would actually be the expected behavior for this test. At some point you mentioned that an object using a guard condition must make sure that the signal does not get lost.

Right, that's a good question. I think that the "TestWaitable" in this test file has fallen into the same issue we discussed elsewhere, which is: if a waitable is triggered but not handled, should that waitable re-trigger itself, or should the executor handle the retriggering? I believe it should probably be up to the waitable to ensure that behavior, leaving room for waitables for which it makes sense not to retrigger (that is to say, when the conditions that caused it to be triggered no longer apply, for example if data is no longer available in a subscription because it exceeded its lifespan QoS setting). So part of this test is testing "if you have two things ready, then wait, then execute one, then wait again, will you execute the second?". I think that's a valid test, but whether it passes or not comes down to how we ensure that the second waitable "stays ready" so that the second wait doesn't block, and that could be done either by the executor or by the waitable, which is the point of the question you asked. The pr #2109 tried (perhaps naively) to fix this at the executor level, but only for some of the executors, and as we've been suggesting, perhaps that pr isn't correct.
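As a standalone C++ analogy of the lost-wakeup question (plain C++, not rclcpp code): a notification by itself is lost if it fires before anyone waits, so some "ready" state must persist until it is consumed; a retriggering waitable plays exactly the role of the predicate below.

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

int main()
{
  std::mutex m;
  std::condition_variable cv;
  bool ready = false;  // plays the role of the waitable staying "triggered"

  std::thread trigger([&]() {
      std::lock_guard<std::mutex> lock(m);
      ready = true;     // persistent state, unlike the bare notification
      cv.notify_one();  // fires before anyone waits: a "lost wakeup" on its own
    });
  trigger.join();  // the notification has already come and gone at this point

  // The waiter starts after the notification. Waiting on cv alone would
  // block forever; the predicate on `ready` is what rescues it.
  std::unique_lock<std::mutex> lock(m);
  cv.wait(lock, [&] {return ready;});
  std::cout << "woke up, the event was not lost" << std::endl;
  return 0;
}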

Based on that, I set out to simplify the test further to avoid that particular issue, and test the "other" part of this test only, doing something like this:

  • create a mutually exclusive callback group
  • create a TestWaitable instance and add it to the callback group
  • create the executor and add the callback group
  • trigger the waitable
  • manually set the callback group's can_be_taken_from to false
  • spin until a future is complete, where the future is set when the waitable is executed, expecting a timeout
  • assert it timed out (i.e. the waitable was not executed)
  • manually set the callback group's can_be_taken_from to true
  • spin until a future is complete, where the future is set when the waitable is executed
  • assert the waitable was executed

This is essentially testing whether or not an executor is adhering to the callback group. Another, more contrived, version of this would involve two waitables in the callback group, and ensuring that one is being executed while spinning on the other, but that requires a multi-threaded executor (executing one waitable while spinning on the other implies at least two threads). So it might be "ok" for a single threaded executor to ignore the callback group's can_be_taken_from. Ideally they would not ignore this, even if in normal practice it should never be set while spinning, but it's an under-defined part of the interface. It will, however, become very important when/if we ever get a multi-threaded version of the events executor.
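As a rough sketch, those steps might translate into something like the test below. This is illustrative only: set_on_execute_callback() is a hypothetical hook on TestWaitable for completing a promise when the waitable is executed (the real class may need to be extended for this); the rest uses existing rclcpp API.

TYPED_TEST(TestExecutorsOnlyNode, callback_group_gates_waitable)
{
  using ExecutorType = TypeParam;
  ExecutorType executor;

  // mutually exclusive group, NOT automatically added with the node
  auto callback_group = this->node->create_callback_group(
    rclcpp::CallbackGroupType::MutuallyExclusive,
    false);

  auto my_waitable = std::make_shared<TestWaitable>();
  this->node->get_node_waitables_interface()->add_waitable(my_waitable, callback_group);

  // add only the callback group with the waitable, not the whole node
  executor.add_callback_group(callback_group, this->node->get_node_base_interface());

  std::promise<void> promise;
  std::future<void> future = promise.get_future();
  // hypothetical hook: complete the future when the waitable is executed
  my_waitable->set_on_execute_callback([&promise]() {promise.set_value();});

  my_waitable->trigger();

  // block the callback group and spin: the waitable must not be executed
  callback_group->can_be_taken_from().exchange(false);
  auto ret = executor.spin_until_future_complete(future, std::chrono::milliseconds(100));
  EXPECT_EQ(rclcpp::FutureReturnCode::TIMEOUT, ret);

  // unblock the callback group and spin again: now it must be executed
  callback_group->can_be_taken_from().exchange(true);
  ret = executor.spin_until_future_complete(future, std::chrono::seconds(10));
  EXPECT_EQ(rclcpp::FutureReturnCode::SUCCESS, ret);
}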

@jmachowinski (Contributor, Author)

I got knocked out by a cold; I will take a look at this when I'm recovered.

@jmachowinski (Contributor, Author)

I reworked the double take_data test; it now fails on the 'standard' multithreaded executor, showing the race that was introduced with #2109.

@fujitatomoya (Collaborator) left a comment

@jmachowinski just checking: is this still a go? Or is there something else that replaces this one?

@jmachowinski (Contributor, Author)

I think we can drop this PR. We came to the conclusion that losing events is intended, and it's up to the waitables to deal with this. @wjwwood did a PR for this, so we should be fine on that front.

As for the double take_data test: it was very specific to the old implementation, as the timing needed to be just right for the test to trigger. The bug has also been fixed in jazzy/rolling, so I think it's safe to drop the test as well.
