Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run QoS tests. #441

Merged
merged 9 commits into from Feb 16, 2021
Merged

Run QoS tests. #441

merged 9 commits into from Feb 16, 2021

Conversation

hidmic
Copy link
Contributor

@hidmic hidmic commented Jul 8, 2020

As a result of the investigation for ros2/rmw_fastrtps#384, it was apparent that test_quality_of_service tests were actually not being run. This patch corrects the situation.

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic
Copy link
Contributor Author

hidmic commented Jul 8, 2020

Several tests fail on master for different RMW implementations. We may have to delay merging this PR until that is resolved, considering ongoing efforts to go green. FYI @dirk-thomas.

@dirk-thomas
Copy link
Member

dirk-thomas commented Jul 8, 2020

@dabonnie FYI since you added this package in #347.

@clalancette
Copy link
Contributor

Several tests fail on master for different RMW implementations. We may have to delay merging this PR until that is resolved, considering ongoing efforts to go green. FYI @dirk-thomas.

One suggestion; enable the tests that actually pass right now, then do a follow-up issue/PR to fix and enable the ones that don't. That way we at least will get some of the test coverage here.

@dabonnie
Copy link
Contributor

dabonnie commented Jul 8, 2020

@dabonnie FYI since you added this package in #347.

Thanks for the visibility, I'll follow up.

@dabonnie
Copy link
Contributor

dabonnie commented Jul 9, 2020

Several tests fail on master for different RMW implementations. We may have to delay merging this PR until that is resolved, considering ongoing efforts to go green. FYI @dirk-thomas.

Do you happen to have a link to the failures?

@hidmic
Copy link
Contributor Author

hidmic commented Jul 13, 2020

@dabonnie I've only reproduced locally. Here's a CI run for reference:

  • Linux Build Status

I wonder why there's no Rolling PR job though. @nuclearsandwich ?

@jacobperron
Copy link
Member

jacobperron commented Jul 13, 2020

I wonder why there's no Rolling PR job though.

The job is not enabled in the rosdistro distribution.yaml. I'm not sure if it can be enabled without a binary job though.

@dabonnie
Copy link
Contributor

I wonder why there's no Rolling PR job though.

The job is not enabled in the rosdistro distribution.yaml. I'm not sure if it can be enabled without a binary job though.

I think it's worthwhile to run / test via actions-ros-ci: I'll be adding an issue / PR.

@dirk-thomas
Copy link
Member

The job is not enabled in the rosdistro distribution.yaml. I'm not sure if it can be enabled without a binary job though.

A repository can have dev and PR jobs if all its dependencies are released. It doesn't have to be release itself.

I think it's worthwhile to run / test via actions-ros-ci: I'll be adding an issue / PR.

For these repositories please continue due use the Jenkins-based PR / CI builds since they provide a lot of benefits over the in-development actions atm.

@dabonnie
Copy link
Contributor

For these repositories please continue due use the Jenkins-based PR / CI builds since they provide a lot of benefits over the in-development actions atm.

Can't we do both? What's the downside to having scheduled actions running at a 1h or 1day interval exposing these kinds of issues? It may be that I'm not aware that adding a job on Jenkins is a fairly easy task, so in that case I would agree we only need your approach.

@dirk-thomas
Copy link
Member

Can't we do both?

Sure, we can have both. But not only actions since the Jenkins based jobs provide more functionality. So please also enable pull request jobs for this repository.

@dirk-thomas
Copy link
Member

It may be that I'm not aware that adding a job on Jenkins is a fairly easy task, so in that case I would agree we only need your approach.

The Jenkins jobs in build.ros.org and build.ros2.org are fully automated. All it takes is seeing the key test_pull_requests in the rosdistro for this repository.

@nuclearsandwich
Copy link
Member

PR to enable pull request testing for Rolling on build.ros2.org ros/rosdistro#25806

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic
Copy link
Contributor Author

hidmic commented Jul 16, 2020

Alright, based on ros2/rmw_fastrtps#384 (comment) I believe I got the tests right. @dabonnie PTAL.

@@ -30,57 +30,13 @@

using namespace std::chrono_literals;

///// Test Deadline with a single subscriber node
TEST_F(QosRclcppTestFixture, test_deadline_no_publisher) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why this test was removed, as it checks for spurious events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on its current definition, deadline QoS establishes the maximum inter-arrival time for messages. If no message is ever published, no deadline event will be received. It is a bit counter-intuitive, but that's how it is for DDS and ROS.

@hidmic
Copy link
Contributor Author

hidmic commented Jul 16, 2020

PR job's failing because FastDDS has to be re-released against rolling.

Running CI up to test_quality_of_service:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
To cope with longer discovery times.

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
To cope with longer discovery times.

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic hidmic requested a review from dabonnie July 17, 2020 20:27
@hidmic
Copy link
Contributor Author

hidmic commented Jul 17, 2020

Ready for another look, PTAL! I had to tweak the lifespan test a bit and increase timeouts and period for rmw_connext_cpp's ridiculous discovery latencies.

@@ -33,7 +33,8 @@ using namespace std::chrono_literals;
/// Test Deadline with a single publishing node and single subscriber node
TEST_F(QosRclcppTestFixture, test_deadline) {
int expected_number_of_events = 4;
const std::chrono::milliseconds deadline_duration = 1s;
const std::chrono::milliseconds deadline_duration =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: an explanation why this is different for connext (observed via CI, your testing, etc)

Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
@hidmic
Copy link
Contributor Author

hidmic commented Jul 30, 2020

I'm starting to wonder if it's even possible to write a QoS test like these that's not intrinsically flaky.

@hidmic
Copy link
Contributor Author

hidmic commented Jul 30, 2020

The last few failing tests on Windows are due to Fast-RTPS hanging on subscription destruction.

@richiprosima @MiguelCompany I've tracked it down to ResourceEvent::unregister_timer during ~SubscriberImpl. Can you think of any recent changes that may explain the issue?

@hidmic
Copy link
Contributor Author

hidmic commented Aug 10, 2020

@richiprosima @MiguelCompany friendly ping. Would a backtrace help you guys?

@hidmic
Copy link
Contributor Author

hidmic commented Dec 22, 2020

@EduPonz @MiguelCompany ping

@EduPonz
Copy link

EduPonz commented Dec 23, 2020

Sorry @hidmic , this flew completely under our radar. Some backtrace would indeed be most welcomed. Also, anything you think worth mentioning for replicating would be much appreciated. We did some refactoring on the ResourceEvent in eProsima/Fast-DDS#1597 (backported to 2.0.x in eProsima/Fast-DDS#1651). I do not think that it'll have any effect on this particular failure, but I'd like to try out.

Thanks!

@hidmic
Copy link
Contributor Author

hidmic commented Feb 15, 2021

@ros-pull-request-builder retest this please

@hidmic
Copy link
Contributor Author

hidmic commented Feb 15, 2021

It seems I can no longer reproduce. Re-running CI to confirm:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me and has green CI.

@hidmic
Copy link
Contributor Author

hidmic commented Feb 16, 2021

Alright, let's get this in. Thanks @clalancette !

@hidmic hidmic merged commit e443904 into master Feb 16, 2021
@delete-merged-branch delete-merged-branch bot deleted the hidmic/run-qos-tests branch February 16, 2021 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants