Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize test_record by reducing copies of executors and messages #576

Merged
merged 2 commits into from
Dec 4, 2020

Conversation

emersonknapp
Copy link
Collaborator

@emersonknapp emersonknapp commented Dec 3, 2020

Fixes #528

Tested locally with

stress -c 16
# and
colcon test --packages-select rosbag2_transport --retest-until-fail 10

As well as 1400 successful sequential runs (closed at that point, not failed) of test_record__rmw_fastrtps_cpp

Introduces 2 changes which both caused issues:

Executor copies (wait_for.hpp)

Every time we called spin_some, a new executor was constructed. This would happen potentially a few thousand times per test. Given that the global context was not shutdown during this time, it would accumulate on_shutdown_callbacks_, one from each executor that was created. When this list was very large we would occasionally get std::bad_function_call for one of the callbacks in the shutdown.

With this change, the global Context has only 2 items in on_shutdown_callbacks_, and I am not able to reproduce bad_function_call any more.

After this change I still see a segfault, usually after 100-500 iterations of (for example) RecordIntegrationTestFixture.records_sensor_data. I'm still investigating this, but it is a separate error.

STL container copies

The MockSequentialWriter used in test_record would pass its internal containers back to viewers by value. This created many copies of the shared_ptr<SerializedBagMessage>s. Upon running repeatedly, this copy operation would occasionally end in a segfault. Since outside the MockSequentialWriter, the values are only observed, we are able to avoid that situation by never copying the values and returning const references to the existing structures.

Conclusion

Though it's a little unclear exactly what caused either issue at the cause end, both changes are best-practices more correct code and avoid the situations that brought about the problems, which improves performance and stability of these tests.

Recommended followups:

  • find out why rclcpp::Context throws bad_function_call on shutdown when on_shutdown_callbacks_ is very large. A simple repro case should be easy enough to construct

…he global context shutdown

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
@emersonknapp emersonknapp requested a review from a team as a code owner December 3, 2020 21:27
@emersonknapp emersonknapp requested review from Karsten1987 and jaisontj and removed request for a team December 3, 2020 21:27
@emersonknapp emersonknapp force-pushed the emersonknapp/fewer_executors_in_tests branch from f35e06d to ff81785 Compare December 3, 2020 23:39
…eference

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
@emersonknapp emersonknapp changed the title Use a single executor in the wait_for utility to put less strain on the global context shutdown Stabilize test_record by reducing copies of executors and messages Dec 4, 2020
@emersonknapp
Copy link
Collaborator Author

emersonknapp commented Dec 4, 2020

Gist: https://gist.githubusercontent.com/emersonknapp/d47520e678e57235041ab30236612b47/raw/bce3c825c328d028c5604e18798bf07c217c9976/ros2.repos
BUILD args: --packages-up-to rosbag2_transport
TEST args: --packages-select rosbag2_transport
Job: ci_launcher

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@emersonknapp emersonknapp merged commit a5a8638 into master Dec 4, 2020
@delete-merged-branch delete-merged-branch bot deleted the emersonknapp/fewer_executors_in_tests branch December 4, 2020 17:45
@jacobperron
Copy link
Member

@emersonknapp Thanks for the fix!

emersonknapp added a commit that referenced this pull request Feb 2, 2021
)

* Use a single executor in the wait_for utility to put less strain on the global context shutdown
* Don't copy around message pointers vectors, rather observe by const reference

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
emersonknapp added a commit that referenced this pull request Feb 17, 2021
)

* Use a single executor in the wait_for utility to put less strain on the global context shutdown
* Don't copy around message pointers vectors, rather observe by const reference

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_record__rmw_fastrtps_cpp failing
4 participants