Improve the reliability of test_get_type_description_service. #1107

clalancette · 2023-10-04T19:52:34Z

We were seeing occasional failures of test_get_type_description_service on all platforms.

It turns out that the removal of a service from the graph cache is an asynchronous operation. In particular, all of our current RMW implementations publish a graph update to the graph topic, and only remove things from the graph once that data has been delivered. This can potentially happen in another thread.

That means that immediately after service_fini(), a call to get_service_names_and_types() may still return the old service name. Since our get_type_description tests relied on that name going away, this made them flaky.
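To make the race concrete, here is a minimal, self-contained sketch. MockGraphCache and its methods are hypothetical stand-ins, not the rcl or RMW API. Removing a service only takes effect once a separate thread has "delivered" the graph update, so a lookup made right after the equivalent of service_fini() can still see the old name.

```cpp
#include <future>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical stand-in for an RMW graph cache (NOT the real rcl/rmw API).
// Removals are applied only after a graph update has been "delivered",
// which happens on a separate worker thread.
class MockGraphCache {
public:
  void add_service(const std::string & name) {
    std::lock_guard<std::mutex> lock(mutex_);
    services_.insert(name);
  }

  // Analogous to service_fini(): schedules the removal and returns at once,
  // before the cache itself has changed. The caller must join the thread.
  std::thread remove_service_async(const std::string & name,
                                   std::shared_future<void> delivered) {
    return std::thread([this, name, delivered]() {
      delivered.wait();  // block until the graph update is "delivered"
      std::lock_guard<std::mutex> lock(mutex_);
      services_.erase(name);
    });
  }

  // Analogous to checking the result of a get_service_names_and_types() call.
  bool has_service(const std::string & name) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return services_.count(name) > 0;
  }

private:
  mutable std::mutex mutex_;
  std::set<std::string> services_;
};
```

In this model the name stays visible until the promise backing `delivered` is fulfilled and the worker thread has erased it; a test that checks the graph exactly once after "fini" therefore depends on thread timing.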

We fix this by adding a new helper function, service_not_exists(). This helper is not simply the inverse of service_exists() (which returns as soon as it finds the service in the graph cache). Instead, service_not_exists() waits until the service has left the cache before returning (or until it times out).

In my testing, this fixes the flakiness locally.
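The idea behind service_not_exists() can be sketched as a deadline-bounded polling loop. This is not the helper from the PR: service_not_exists_sketch and its is_present callback are hypothetical, with the callback standing in for a get_service_names_and_types() lookup. The 10 ms poll interval mirrors the sleep_for() value in the reviewed diff.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch (not the actual rcl test helper): poll until the
// service is no longer reported, or until `timeout` has elapsed.
// `is_present` stands in for a graph query such as get_service_names_and_types().
bool service_not_exists_sketch(const std::function<bool()> & is_present,
                               std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (!is_present()) {
      return true;  // the service has left the graph cache
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // still listed after the full timeout
    }
    // Give the asynchronous graph update time to be delivered.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  }
}
```

Unlike a single `!service_exists()` check, this reports failure only after the whole timeout has passed with the service still listed, so a graph update that arrives a few milliseconds late no longer fails the test.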

Signed-off-by: Chris Lalancette <clalancette@gmail.com>

clalancette · 2023-10-04T19:53:44Z

CI:

Linux
Linux-aarch64
Windows
Linux

claraberendsen · 2023-10-04T21:16:26Z

This would close #1108

fujitatomoya · 2023-10-04T21:25:33Z

rcl/test/rcl/test_get_type_description_service.cpp

-    std::this_thread::sleep_for(std::chrono::milliseconds(100));
+    std::this_thread::sleep_for(std::chrono::milliseconds(10));


(nice to have) Adding a std::chrono::milliseconds period argument would make this more flexible.


Yeah, that's true. Given that this is only used in this test, it would be easy enough to add later if we need the flexibility. So I'm not going to do that for now, but thanks for the idea.
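For reference, the reviewer's suggestion (deferred by the author, not merged) could look something like the hypothetical variant below. wait_for_service_gone and its parameters are illustrative names only. The caller passes the poll period, and the number of graph queries is derived from timeout / period rather than from a wall-clock deadline.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical variant following the review suggestion (not in rcl):
// the caller chooses the polling period. The number of polls is
// timeout / period + 1, so a shorter period means more frequent queries.
bool wait_for_service_gone(
    const std::function<bool()> & is_present,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds period = std::chrono::milliseconds(10)) {
  if (period.count() <= 0) {
    period = std::chrono::milliseconds(1);  // avoid division by zero
  }
  const long long attempts = timeout / period + 1;
  for (long long i = 0; i < attempts; ++i) {
    if (!is_present()) {
      return true;  // service no longer listed in the graph
    }
    if (i + 1 < attempts) {
      std::this_thread::sleep_for(period);  // no sleep after the last poll
    }
  }
  return false;  // still present after every attempt
}
```

Counting attempts makes the number of graph queries predictable, but the total wall time can exceed `timeout` when each query is slow; checking a deadline against `std::chrono::steady_clock` bounds wall time instead.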

In particular, make sure to match every service_description_init with the appropriate fini. While we are in here, also add a "service_not_exists" function that will quit much earlier, speeding up the test. This is essentially a backport of #1107 and parts of #1112, adapted for Iron. Signed-off-by: Chris Lalancette <clalancette@gmail.com>

clalancette requested review from audrow, ivanpauno and wjwwood as code owners October 4, 2023 19:52

claraberendsen linked an issue Oct 4, 2023 that may be closed by this pull request

👩‍🌾 ❄ Flaky test_get_type_description_service__rmw_cyclonedds_cpp & rcl.TestGetTypeDescSrvFixture__rmw_cyclonedds_cpp.test_service_init_and_fini_functions tests #1108

Closed

fujitatomoya approved these changes Oct 4, 2023


clalancette merged commit 0a537d2 into rolling Oct 4, 2023
2 checks passed

clalancette deleted the clalancette/improve-get-type-desc-test-reliability branch October 4, 2023 21:51

tfoote mentioned this pull request Oct 4, 2023

Update SignalHandler get_global_signal_handler to avoid complex types in static memory (backport #2316) ros2/rclcpp#2322

Merged

clalancette mentioned this pull request Jun 25, 2024

Fix up type_description tests. #1160

Merged
