Flaky test #29
Unfortunately, I'm unable to reproduce this. I did this (after getting the latest
After over an hour it had not failed. So I guess I'll have to run jobs on the farm until it fails and then look at the OpenSplice logs.
I tried more this weekend and was unable to reproduce this failure on the VM, on the farm, or on my machine. I'll keep pushing jobs to the farm looking for a failure, but right now I don't know how to reproduce it reliably.
Ok, I've run the test in a loop hundreds of times without failure and done 10+ full builds on the farm (107-117 plus some earlier builds): http://54.183.26.131:8080/view/ros2/job/ros2_batch_ci_osx/ I can't seem to reproduce this. I'd say we should close it and reopen it if it pops up again. If we want to leave it open, I'm fine with that as well, but I don't think it makes sense for me to spend more time trying to reproduce it.
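Running a single test in a loop until it fails, as described above, is easy to script. A minimal Python sketch, assuming the test can be launched as a standalone command; `run_until_failure` is a hypothetical helper, not part of ament or CTest:

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs):
    """Hypothetical helper: run `cmd` up to `max_runs` times.

    Returns the 1-based number of the first failing run, or None if every run passed.
    """
    for i in range(1, max_runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep the output of the failing run; that is the one worth attaching to the issue.
            sys.stderr.write(result.stdout + result.stderr)
            return i
    return None
```

For example, from the build directory one might call `run_until_failure(["ctest", "-R", "test_publisher_subscriber", "--output-on-failure"], max_runs=500)`; the regex and the run count are illustrative. Stopping at the first failure preserves exactly the run whose logs are worth inspecting.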
This looks like the same problem, but on Linux:
+1 for closing as cannot reproduce.
I'll just store the ospl log here so it doesn't get lost:
I don't see anything wrong in the ospl log. I suppose it could be a setup race condition. There's no way of knowing when the ospl subscriber is ready to receive messages, or whether the (I think) 10 s of publishing before giving up is sufficient. If 10 s is not enough, there may be some other issue, but currently the test is pretty much open loop, and that would be my best guess as to why this occurs. I'm open to ideas.
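To make the suspected race concrete, here is a minimal, hypothetical Python model of the open-loop setup described above. It assumes that messages published before the subscriber is ready are simply lost and that the publisher gets no readiness signal; `run_pubsub_test`, its parameters, and the timings are illustrative and are not the actual test code or OpenSplice behavior.

```python
import queue
import threading
import time


def run_pubsub_test(subscriber_setup_s, publish_window_s, period_s=0.05):
    """Return True if the subscriber receives at least one message in time.

    Hypothetical model: the publisher publishes blindly (open loop) for
    `publish_window_s`; anything sent before the subscriber is ready is dropped.
    """
    inbox = queue.Queue()
    subscriber_ready = threading.Event()
    received = threading.Event()

    def transport(msg):
        # Drop the message if nobody is listening yet (no history for late joiners).
        if subscriber_ready.is_set():
            inbox.put(msg)

    def subscriber():
        time.sleep(subscriber_setup_s)  # setup/discovery time, invisible to the publisher
        subscriber_ready.set()
        try:
            inbox.get(timeout=publish_window_s)
            received.set()
        except queue.Empty:
            pass

    threading.Thread(target=subscriber, daemon=True).start()

    deadline = time.monotonic() + publish_window_s
    count = 0
    while time.monotonic() < deadline and not received.is_set():
        transport(count)  # open loop: publish without knowing whether anyone listens
        count += 1
        time.sleep(period_s)
    # Give the subscriber a moment to handle the last delivered message.
    received.wait(timeout=period_s)
    return received.is_set()
```

With a short setup time (`run_pubsub_test(0.1, 1.0)`) the subscriber catches a later message; when setup outlasts the publishing window (`run_pubsub_test(1.5, 0.5)`) the test fails even though nothing is functionally broken. In this model the only knob is the window length, which is why the roughly 10 s figure matters.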
-1 on closing this. We see it happening on at least two platforms. We might want to reach out to the OpenSplice people and ask for their support.
This one also seems to have failed: http://54.183.26.131:8080/job/ros2_batch_ci_osx/118/
Yet another instance: http://54.183.26.131:8080/job/ros2_batch_ci_linux/142/testReport/junit/(root)/test_publisher_subscriber_cpp__builtins__rmw_opensplice_cpp/test_publisher_subscriber/ This is after we added the environment variables which direct warnings to stdout.
Since none of the builds referenced in this ticket are available anymore, I will add links to a recent (hopefully similar) incident: One flaky test (it passes in other builds): http://ci.ros2.org/view/nightly/job/ros2_batch_ci_osx_nightly/14/ But the gtest output shows a successful and complete run (no longer available: http://ci.ros2.org/view/nightly/job/ros2_batch_ci_osx_nightly/ws/workspace/build/test_rclcpp/test_results/test_rclcpp/gtest_subscription__rmw_connext_cpp.xml/*view*/):
Same for the
The conclusion seems to be that the test runner script executes the gtest binary successfully but then fails to exit itself. CTest then kills the test runner script when the timeout is reached and marks the test as failed. Two parts need fixing:
ament/ament_cmake#24 is an attempt to fix the reading from
Duplicate of ros2/ros2#127
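The failure mode concluded above (the gtest binary passes, the runner script then fails to exit, and CTest kills it at the timeout) can be reproduced in miniature. This sketch is hypothetical: the runner is a stand-in that deliberately sleeps after its child exits, `run_like_ctest` only mimics a CTest timeout using `subprocess.run(timeout=...)`, and it says nothing about why the real runner hung.

```python
import subprocess
import sys

# Hypothetical stand-in for the test runner script: it runs a "gtest binary"
# (here a child Python process that prints a pass line and exits 0), reports the
# child's exit code, and then hangs instead of exiting.
RUNNER_SRC = r"""
import subprocess, sys, time
rc = subprocess.run([sys.executable, "-c", "print('[  PASSED  ] 1 test.')"]).returncode
print("inner exit code:", rc, flush=True)
time.sleep(60)  # stands in for whatever keeps the real runner from exiting
sys.exit(rc)
"""


def run_like_ctest(timeout_s):
    """Run the stand-in runner under a timeout, roughly like a CTest TIMEOUT.

    Returns (outcome, captured_stdout).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", RUNNER_SRC],
            stdout=subprocess.PIPE,
            timeout=timeout_s,
        )
        outcome = "passed" if proc.returncode == 0 else "failed"
        return outcome, proc.stdout.decode()
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() kills the child on timeout; keep what it printed so far.
        out = exc.stdout or b""
        if isinstance(out, bytes):
            out = out.decode()
        return "timeout", out
```

Calling `run_like_ctest(5.0)` reports a timeout even though the captured output contains `inner exit code: 0`: the binary passed, yet the test is counted as failed, which matches the symptom described in the thread.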
The following test failed: http://54.183.26.131:8080/view/ros2/job/ros2_batch_ci_osx/97/testReport/(root)/test_publish_subscribe__rmw_opensplice_cpp/test_publish_subscribe/
But the same state passed in the other build: http://54.183.26.131:8080/view/ros2/job/ros2_batch_ci_osx/98/testReport/