Skip to content
This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

LinuxMemoryMeasurementTestFixture.TestPublishMetricsMessage test is flaky #96

Closed
thomas-moulard opened this issue Feb 26, 2020 · 4 comments · Fixed by #100
Closed

LinuxMemoryMeasurementTestFixture.TestPublishMetricsMessage test is flaky #96

thomas-moulard opened this issue Feb 26, 2020 · 4 comments · Fixed by #100
Labels
bug Something isn't working

Comments

@thomas-moulard
Copy link
Member

Description

LinuxMemoryMeasurementTestFixture.TestPublishMetricsMessage is flaky

Expected Behavior

The test should always succeed.

Actual Behavior

The test sometimes fail because it relies on having messages processed "fast enough". This has been reported in an ASAN build, most likely due to the fact that ASAN builds are slower than normal builds.

https://github.com/ros-tooling/system_metrics_collector/runs/470335108?check_suite_focus=true

Failed test output
2020-02-26T17:21:35.8557391Z test 4
2020-02-26T17:21:35.8557460Z       Start  4: test_linux_memory_measurement_node
2020-02-26T17:21:35.8557524Z 
2020-02-26T17:21:35.8558472Z 4: Test command: /usr/bin/python3 "-u" "/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/install/ament_cmake_test/share/ament_cmake_test/cmake/run_test.py" "/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/test_results/system_metrics_collector/test_linux_memory_measurement_node.gtest.xml" "--package-name" "system_metrics_collector" "--output-file" "/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/ament_cmake_gtest/test_linux_memory_measurement_node.txt" "--command" "/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/test_linux_memory_measurement_node" "--gtest_output=xml:/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/test_results/system_metrics_collector/test_linux_memory_measurement_node.gtest.xml"
2020-02-26T17:21:35.8558649Z 4: Test timeout computed to be: 60
2020-02-26T17:21:35.8558978Z 4: -- run_test.py: invoking following command in '/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector':
2020-02-26T17:21:35.8559465Z 4:  - /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/test_linux_memory_measurement_node --gtest_output=xml:/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/test_results/system_metrics_collector/test_linux_memory_measurement_node.gtest.xml
2020-02-26T17:21:35.8559602Z 4: Running main() from /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/install/gtest_vendor/src/gtest_vendor/src/gtest_main.cc
2020-02-26T17:21:35.8559712Z 4: [==========] Running 3 tests from 2 test cases.
2020-02-26T17:21:35.8559945Z 4: [----------] Global test environment set-up.
2020-02-26T17:21:35.8560246Z 4: [----------] 1 test from LinuxMemoryMeasurementTest
2020-02-26T17:21:35.8560353Z 4: [ RUN      ] LinuxMemoryMeasurementTest.TestReadInvalidFile
2020-02-26T17:21:35.8560456Z 4: [ERROR] [1582737685.733303781] [ReadFileToString]: unable to parse file: this_will_fail.txt
2020-02-26T17:21:35.8560813Z 4: [       OK ] LinuxMemoryMeasurementTest.TestReadInvalidFile (0 ms)
2020-02-26T17:21:35.8561153Z 4: [----------] 1 test from LinuxMemoryMeasurementTest (0 ms total)
2020-02-26T17:21:35.8561247Z 4: 
2020-02-26T17:21:35.8561484Z 4: [----------] 2 tests from LinuxMemoryMeasurementTestFixture
2020-02-26T17:21:35.8561583Z 4: [ RUN      ] LinuxMemoryMeasurementTestFixture.testManualMeasurement
2020-02-26T17:21:35.8561776Z 4: [       OK ] LinuxMemoryMeasurementTestFixture.testManualMeasurement (19 ms)
2020-02-26T17:21:35.8561862Z 4: [ RUN      ] LinuxMemoryMeasurementTestFixture.TestPublishMetricsMessage
2020-02-26T17:21:35.8561985Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8562092Z 4: Expected equality of these values:
2020-02-26T17:21:35.8562180Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8562270Z 4:     Which is: 44.148414611816406
2020-02-26T17:21:35.8562355Z 4:   stat.data
2020-02-26T17:21:35.8562421Z 4:     Which is: 84.423248291015625
2020-02-26T17:21:35.8562524Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8562634Z 4: Expected equality of these values:
2020-02-26T17:21:35.8562722Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8562806Z 4:     Which is: 44.148414611816406
2020-02-26T17:21:35.8562891Z 4:   stat.data
2020-02-26T17:21:35.8562957Z 4:     Which is: 98.533164978027344
2020-02-26T17:21:35.8563068Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8563170Z 4: Expected equality of these values:
2020-02-26T17:21:35.8563254Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8563337Z 4:     Which is: 44.148414611816406
2020-02-26T17:21:35.8563422Z 4:   stat.data
2020-02-26T17:21:35.8563487Z 4:     Which is: 70.313331604003906
2020-02-26T17:21:35.8563591Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8563695Z 4: Expected equality of these values:
2020-02-26T17:21:35.8563778Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8563861Z 4:     Which is: 1
2020-02-26T17:21:35.8563928Z 4:   stat.data
2020-02-26T17:21:35.8564011Z 4:     Which is: 2
2020-02-26T17:21:35.8564113Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8564214Z 4: Expected equality of these values:
2020-02-26T17:21:35.8564297Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8564379Z 4:     Which is: 0
2020-02-26T17:21:35.8564444Z 4:   stat.data
2020-02-26T17:21:35.8564524Z 4:     Which is: 14.109916687011719
2020-02-26T17:21:35.8564626Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8564733Z 4: Expected equality of these values:
2020-02-26T17:21:35.8564815Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8564900Z 4:     Which is: 84.423248291015625
2020-02-26T17:21:35.8564967Z 4:   stat.data
2020-02-26T17:21:35.8565046Z 4:     Which is: 70.385238647460938
2020-02-26T17:21:35.8565201Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8565314Z 4: Expected equality of these values:
2020-02-26T17:21:35.8565397Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8565466Z 4:     Which is: 98.533164978027344
2020-02-26T17:21:35.8565550Z 4:   stat.data
2020-02-26T17:21:35.8565630Z 4:     Which is: 70.385238647460938
2020-02-26T17:21:35.8565733Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8565877Z 4: Expected equality of these values:
2020-02-26T17:21:35.8565962Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8566028Z 4:     Which is: 70.313331604003906
2020-02-26T17:21:35.8566114Z 4:   stat.data
2020-02-26T17:21:35.8566195Z 4:     Which is: 70.385238647460938
2020-02-26T17:21:35.8566299Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8566398Z 4: Expected equality of these values:
2020-02-26T17:21:35.8566466Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8566550Z 4:     Which is: 2
2020-02-26T17:21:35.8566632Z 4:   stat.data
2020-02-26T17:21:35.8566713Z 4:     Which is: 1
2020-02-26T17:21:35.8566812Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8566918Z 4: Expected equality of these values:
2020-02-26T17:21:35.8566985Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8567069Z 4:     Which is: 14.109916687011719
2020-02-26T17:21:35.8567154Z 4:   stat.data
2020-02-26T17:21:35.8567234Z 4:     Which is: 0
2020-02-26T17:21:35.8567335Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:331: Failure
2020-02-26T17:21:35.8567419Z 4: Expected equality of these values:
2020-02-26T17:21:35.8567502Z 4:   3
2020-02-26T17:21:35.8568507Z 4:   test_receive_measurements->GetNumReceived()
2020-02-26T17:21:35.8568605Z 4:     Which is: 2
2020-02-26T17:21:35.8568709Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:352: Failure
2020-02-26T17:21:35.8568818Z 4: Expected equality of these values:
2020-02-26T17:21:35.8568885Z 4:   3
2020-02-26T17:21:35.8569119Z 4:   test_receive_measurements->GetNumReceived()
2020-02-26T17:21:35.8569207Z 4:     Which is: 2
2020-02-26T17:21:35.8569312Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8569412Z 4: Expected equality of these values:
2020-02-26T17:21:35.8569496Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8569564Z 4:     Which is: 70.385238647460938
2020-02-26T17:21:35.8569647Z 4:   stat.data
2020-02-26T17:21:35.8569727Z 4:     Which is: 70.330429077148438
2020-02-26T17:21:35.8569828Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8569932Z 4: Expected equality of these values:
2020-02-26T17:21:35.8569999Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8570082Z 4:     Which is: 70.385238647460938
2020-02-26T17:21:35.8570165Z 4:   stat.data
2020-02-26T17:21:35.8570346Z 4:     Which is: 70.330429077148438
2020-02-26T17:21:35.8570463Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8570565Z 4: Expected equality of these values:
2020-02-26T17:21:35.8570633Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8570717Z 4:     Which is: 70.385238647460938
2020-02-26T17:21:35.8570800Z 4:   stat.data
2020-02-26T17:21:35.8570879Z 4:     Which is: 70.330429077148438
2020-02-26T17:21:35.8570984Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8571128Z 4: Expected equality of these values:
2020-02-26T17:21:35.8571212Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8571298Z 4:     Which is: 70.330429077148438
2020-02-26T17:21:35.8571382Z 4:   stat.data
2020-02-26T17:21:35.8571462Z 4:     Which is: 70.328407287597656
2020-02-26T17:21:35.8571564Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8571652Z 4: Expected equality of these values:
2020-02-26T17:21:35.8571735Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8571820Z 4:     Which is: 70.330429077148438
2020-02-26T17:21:35.8571901Z 4:   stat.data
2020-02-26T17:21:35.8571982Z 4:     Which is: 70.33050537109375
2020-02-26T17:21:35.8572074Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8572175Z 4: Expected equality of these values:
2020-02-26T17:21:35.8572259Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8572342Z 4:     Which is: 70.330429077148438
2020-02-26T17:21:35.8572424Z 4:   stat.data
2020-02-26T17:21:35.8572505Z 4:     Which is: 70.326309204101562
2020-02-26T17:21:35.8572592Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8572692Z 4: Expected equality of these values:
2020-02-26T17:21:35.8572776Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8572859Z 4:     Which is: 1
2020-02-26T17:21:35.8572939Z 4:   stat.data
2020-02-26T17:21:35.8573007Z 4:     Which is: 2
2020-02-26T17:21:35.8573105Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8573209Z 4: Expected equality of these values:
2020-02-26T17:21:35.8573294Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8573375Z 4:     Which is: 0
2020-02-26T17:21:35.8573455Z 4:   stat.data
2020-02-26T17:21:35.8573523Z 4:     Which is: 0.0020976180676370859
2020-02-26T17:21:35.8573625Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8573724Z 4: Expected equality of these values:
2020-02-26T17:21:35.8573807Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8573889Z 4:     Which is: 70.328407287597656
2020-02-26T17:21:35.8573959Z 4:   stat.data
2020-02-26T17:21:35.8574040Z 4:     Which is: 70.326553344726562
2020-02-26T17:21:35.8574141Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8574287Z 4: Expected equality of these values:
2020-02-26T17:21:35.8574380Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8574465Z 4:     Which is: 70.33050537109375
2020-02-26T17:21:35.8574532Z 4:   stat.data
2020-02-26T17:21:35.8574614Z 4:     Which is: 70.326553344726562
2020-02-26T17:21:35.8574717Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8574821Z 4: Expected equality of these values:
2020-02-26T17:21:35.8574903Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8575033Z 4:     Which is: 70.326309204101562
2020-02-26T17:21:35.8575101Z 4:   stat.data
2020-02-26T17:21:35.8575369Z 4:     Which is: 70.326553344726562
2020-02-26T17:21:35.8575478Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8575579Z 4: Expected equality of these values:
2020-02-26T17:21:35.8575665Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8575734Z 4:     Which is: 2
2020-02-26T17:21:35.8575817Z 4:   stat.data
2020-02-26T17:21:35.8575898Z 4:     Which is: 1
2020-02-26T17:21:35.8576000Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:275: Failure
2020-02-26T17:21:35.8576104Z 4: Expected equality of these values:
2020-02-26T17:21:35.8576188Z 4:   expected_stat.at(stat.data_type)
2020-02-26T17:21:35.8576263Z 4:     Which is: 0.0020976180676370859
2020-02-26T17:21:35.8576348Z 4:   stat.data
2020-02-26T17:21:35.8576430Z 4:     Which is: 0
2020-02-26T17:21:35.8576536Z 4: /home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/src/system_metrics_collector/system_metrics_collector/test/system_metrics_collector/test_linux_memory_measurement.cpp:371: Failure
2020-02-26T17:21:35.8576636Z 4: Expected equality of these values:
2020-02-26T17:21:35.8576706Z 4:   6
2020-02-26T17:21:35.8576982Z 4:   test_receive_measurements->GetNumReceived()
2020-02-26T17:21:35.8577074Z 4:     Which is: 5
2020-02-26T17:21:35.8577170Z 4: [  FAILED  ] LinuxMemoryMeasurementTestFixture.TestPublishMetricsMessage (766 ms)
2020-02-26T17:21:35.8577443Z 4: [----------] 2 tests from LinuxMemoryMeasurementTestFixture (785 ms total)
2020-02-26T17:21:35.8577523Z 4: 
2020-02-26T17:21:35.8577750Z 4: [----------] Global test environment tear-down
2020-02-26T17:21:35.8577845Z 4: [==========] 3 tests from 2 test cases ran. (785 ms total)
2020-02-26T17:21:35.8577943Z 4: [  PASSED  ] 2 tests.
2020-02-26T17:21:35.8578028Z 4: [  FAILED  ] 1 test, listed below:
2020-02-26T17:21:35.8578123Z 4: [  FAILED  ] LinuxMemoryMeasurementTestFixture.TestPublishMetricsMessage
2020-02-26T17:21:35.8578202Z 4: 
2020-02-26T17:21:35.8578284Z 4:  1 FAILED TEST
2020-02-26T17:21:35.8578686Z 4: -- run_test.py: return code 1
2020-02-26T17:21:35.8579088Z 4: -- run_test.py: inject classname prefix into gtest result file '/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/test_results/system_metrics_collector/test_linux_memory_measurement_node.gtest.xml'
2020-02-26T17:21:35.8579485Z 4: -- run_test.py: verify result file '/home/runner/work/system_metrics_collector/system_metrics_collector/ros_ws/build/system_metrics_collector/test_results/system_metrics_collector/test_linux_memory_measurement_node.gtest.xml'

To Reproduce

Run:

./test_linux_memory_measurement_node --gtest_repeat=99999999

Wait until it starts failing. On fast workstations, buildling with ASAN enabled may be necessary to slow-down the execution enough to exhibit the bug.

System (please complete the following information)

  • OS: Bionic
  • ROS2 Distro: master

Additional context

To make this bug reliable, each step in the test needs to wait until the ROS messages are actually processed. The callback may switch a std::atomic_bool or a std::atomic_flag for instance. Blocking until the lifecycle API starts reporting the new state may also be another possibility.

If waiting for messages to be processed is not enough, we should explore whether cache coherency could play a role in this bug.

@thomas-moulard thomas-moulard added the bug Something isn't working label Feb 26, 2020
@thomas-moulard
Copy link
Member Author

This assertion is failing:

@thomas-moulard
Copy link
Member Author

This test needs to be modified to remove kTestDuration, we should spin until a premise is satisfied instead.

@thomas-moulard
Copy link
Member Author

I'm also a bit concerned w.r.t. cache coherency, making expected_stats_ const would solve this. times_received_ should be protected with a mutex.

@dabonnie
Copy link
Contributor

Rather than add the complexity testing for message publication in the unit tests, see this PR #98 which adds a test to check for multiple received messages.

To fix the failures here, your comments are still valid when attempting to receive a single message and check its contents. I would expect the tests to fail in the LinuxProcessCpuMeasurementTestFixture as well, as they also fail here: #83

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants