Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sleep before getting publisher info. #646

Closed

Conversation

nuclearsandwich
Copy link
Member

This test appears to be timing/load sensitive. I can reproduce
intermittent failures locally when stressing my host and the more stress
the faster the reproduction. Since adding the sleep I have not seen any
failures to find the expected publisher info locally.

This seems to resolve unreliable test failures such as this one. There still appear to be occasional segfaults when using rmw_fastrtps_cpp or rmw_fastrtps_dynamic_cpp which I don't think are related.

@nuclearsandwich nuclearsandwich self-assigned this May 8, 2020
@nuclearsandwich nuclearsandwich force-pushed the nuclearsandwich/sleep-before-getting-info branch from 0fdf84a to ad44e39 Compare May 8, 2020 19:56
@nuclearsandwich
Copy link
Member Author

@jacobperron I paged you since I know you'd been looking at timing-sensitive tests in other parts of the stack. I don't know whether it's expected that there's a time after the creation of the publisher and subscriber where they may not be added to the count.

@jacobperron
Copy link
Member

jacobperron commented May 8, 2020

don't know whether it's expected that there's a time after the creation of the publisher and subscriber where they may not be added to the count

I wouldn't expect a delay, but I suppose it can vary based on what's happening in the RMW implementation. Does the failure only occur with rmw_cyclonedds_cpp?

@@ -357,6 +359,7 @@ TEST_F(
&subscription_options);
ASSERT_EQ(ret, RCL_RET_OK) << rcl_get_error_string().str;
const std::string fqdn = std::string("/") + this->topic_name;
std::this_thread::sleep_for(std::chrono::milliseconds(250));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this does help, then I'd recommend polling in a loop, because this only requires a minor hiccup in the machine to still flake.

@nuclearsandwich
Copy link
Member Author

I wouldn't expect a delay, but I suppose it can vary based on what's happening in the RMW implementation. Does the failure only occur with rmw_cyclonedds_cpp?

I have to go back and check my test logs. I think I've seen this failure with Fast-RTPS as well but I also get occasional segfaults with Fast-RTPS (as does the build farm).

@nuclearsandwich
Copy link
Member Author

In all my test runs I only ever saw this exact failure with cyclonedds.

38222-[62.103s] 14: [ RUN      ] TestInfoByTopicFixture__rmw_cyclonedds_cpp.test_rcl_get_publishers_subscription_info_by_topic
38223-[62.107s] 14: /home/steven/osrf/bugcrunch/ws/src/ros2/rcl/rcl/test/rcl/test_info_by_topic.cpp:380: Failure
38224:[62.108s] 14: Expected equality of these values:
38225-[62.108s] 14:   topic_endpoint_info_array_sub.size
38226-[62.108s] 14:     Which is: 0
38227-[62.108s] 14:   1u
38228-[62.108s] 14:     Which is: 1

@fujitatomoya
Copy link
Collaborator

@nuclearsandwich

i would use rcl_count_publishers & rcl_count_subscribers to make sure if publication and subscription are ready instead of sleeping. if it can count writer/reader from rmw graph cache, which means that it can get writer/reader info for publication and subscription.

@nuclearsandwich
Copy link
Member Author

i would use rcl_count_publishers & rcl_count_subscribers to make sure if publication and subscription are ready instead of sleeping. if it can count writer/reader from rmw graph cache, which means that it can get writer/reader info for publication and subscription.

Thanks for the suggestion. If I understand correctly this would essentially be creating the publisher and subscriber and polling to determine that they were actually created before trying to use them?

@fujitatomoya
Copy link
Collaborator

@nuclearsandwich

No, rcl_count_publishers & rcl_count_subscribers does not include polling, it is one shot to count the endpoint. so as @wjwwood mentioned, polling would be required in the test code to confirm the publication and subscription are ready.

i would do the following,

  1. create publication and subscription with test topic.
  2. wait until publisher count and subscriber count with test topic becomes greater than 0.
  3. issue rcl_get_publishers_info_by_topic/rcl_get_subscriptions_info_by_topic.

BTW, helper functions are getting organized #631, so this might get along with it.

@nuclearsandwich nuclearsandwich force-pushed the nuclearsandwich/sleep-before-getting-info branch 3 times, most recently from 104654e to 2d1c16b Compare May 12, 2020 14:42
@nuclearsandwich
Copy link
Member Author

I appreciate the guidance @fujitatomoya. I'm stepping out of my usual wheelhouse for this one. I took a look at #631 and rebased this PR onto it. Which means that I'll rebase this once more when #631 actually merges. Using the wait for entity helpers does show improvement.

@fujitatomoya
Copy link
Collaborator

@nuclearsandwich

could you rebase? #631 is merged.

This test appears to be timing/load sensitive. I can reproduce
intermittent failures locally when stressing my host and the more stress
the faster the reproduction. Since adding the sleep I have not seen any
failures to find the expected publisher info locally.

Signed-off-by: Steven! Ragnarök <steven@nuclearsandwich.com>
Signed-off-by: Steven! Ragnarök <steven@nuclearsandwich.com>
Signed-off-by: Steven! Ragnarök <steven@nuclearsandwich.com>
@nuclearsandwich nuclearsandwich force-pushed the nuclearsandwich/sleep-before-getting-info branch from 2d1c16b to 50a5530 Compare May 15, 2020 15:29
Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but one minor comment, I would like to hear from others.

@@ -357,6 +361,8 @@ TEST_F(
&subscription_options);
ASSERT_EQ(ret, RCL_RET_OK) << rcl_get_error_string().str;
const std::string fqdn = std::string("/") + this->topic_name;
ASSERT_TRUE(wait_for_established_subscription(&publisher, 10, 100));
ASSERT_TRUE(wait_for_subscription_to_be_ready(&subscription, &this->context, 10, 100));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this needs to be called here. I believe that we can confirm that graph cache is constructed vi wait_for_established_subscription.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case tests both rcl_get_publishers_info_by_topic and rcl_get_subscriptions_info_by_topic, so I think we need to add a new wait_helper(such as wait_for_established_publisher) which is similar to wait_for_established_subscription. (make sure two directions are established.)

NOTE: wait_for_subscription_to_be_ready is used for a subscriber to wait to get message data from a publisher.
it means there should be a rcl_publish calling before (or at a new thread) and then rcl_take to get message data.
So we don't use it here.

@clalancette
Copy link
Contributor

This looks to be superseded by #859, so I'm going to close it out. Feel free to reopen if you think otherwise.

@clalancette clalancette closed this Nov 5, 2020
@wjwwood wjwwood deleted the nuclearsandwich/sleep-before-getting-info branch April 22, 2021 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants