Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deadlock race condition on compression shutdown #616

Merged

Conversation

emersonknapp
Copy link
Collaborator

@emersonknapp emersonknapp commented Jan 26, 2021

When running the test test_sequential_compression_writer repeatedly, the test sometimes hangs on shutdown, due to a gap in the threading synchronization between the compressor threads and the main thread.

The order of operations is:

  • Worker thread N: compressor_thread_fn reads while(compression_is_running_) to be true, so it enters the while block
  • Main thread: stop_compressor_threads runs compression_is_running_=false; compressor_condition_.notify_all(), notifying any currently waiting compression threads to wake up, which lets them exit after finding no work to do
    • main thread then calls .join() on all the threads
  • Worker thread N: compressor_thread_fn, having entered the while loop successfully, initializes some variables, acquires the compressor_queue_mutex_, and calls compressor_condition_.wait()
  • Because the compressor_thread has called wait after the final notify_all, it will never receive a notification, and the .join() call waits forever for this waiting thread to exit

This PR fixes the problem by synchronizing the checks of both exit conditions so that the main thread can't interleave the notifications in a way that causes the deadlock.

NOTE: I am not sure how to add a repeatable regression test for this - the way I can reproduce it is by running the following, which reliably makes it happen quite quickly (within a second). After my change it'll run indefinitely without failure.

build/rosbag2_compression/test_sequential_compression_writer --gtest_repeat=-1 --gtest_throw_on_failure

@emersonknapp emersonknapp requested a review from a team as a code owner January 26, 2021 21:10
@emersonknapp emersonknapp requested review from mjeronimo and jaisontj and removed request for a team January 26, 2021 21:10
@emersonknapp emersonknapp force-pushed the emersonknapp/fix-compression-shutdown-race-condition branch from 1c3da2a to 1a6dc0b Compare January 26, 2021 21:19
@emersonknapp emersonknapp changed the title HELP - race condition on shutdown due to timing of condition reading Fix deadlock race condition on compression shutdown Jan 26, 2021
Copy link
Contributor

@zmichaels11 zmichaels11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to wrap sections where compression_is_running_ is assigned with a lock on the same mutex?

@emersonknapp emersonknapp force-pushed the emersonknapp/fix-compression-shutdown-race-condition branch from 9971460 to a9e435c Compare January 26, 2021 22:31
@emersonknapp emersonknapp force-pushed the emersonknapp/fix-compression-shutdown-race-condition branch from f09345a to 32519d6 Compare January 26, 2021 23:46
Emerson Knapp added 4 commits January 26, 2021 15:48
Signed-off-by: Emerson Knapp <eknapp@amazon.com>
Signed-off-by: Emerson Knapp <eknapp@amazon.com>
Signed-off-by: Emerson Knapp <eknapp@amazon.com>
Signed-off-by: Emerson Knapp <eknapp@amazon.com>
@emersonknapp emersonknapp force-pushed the emersonknapp/fix-compression-shutdown-race-condition branch from 32519d6 to 3bf0fc2 Compare January 26, 2021 23:48
@emersonknapp
Copy link
Collaborator Author

Gist: https://gist.githubusercontent.com/emersonknapp/1d3e63e26efdc9f1596e750454471f19/raw/198dbe7ad4255058e89eefccdfb923da7ff52b7e/ros2.repos
BUILD args: --packages-up-to rosbag2_compression rosbag2_tests
TEST args: --packages-select rosbag2_compression rosbag2_tests
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/7528

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@emersonknapp emersonknapp merged commit f0e5744 into master Jan 27, 2021
@delete-merged-branch delete-merged-branch bot deleted the emersonknapp/fix-compression-shutdown-race-condition branch January 27, 2021 01:05
emersonknapp added a commit that referenced this pull request Feb 2, 2021
* Synchronize compression shutdown correctly, avoiding occasional deadlock

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
emersonknapp added a commit that referenced this pull request Feb 17, 2021
* Synchronize compression shutdown correctly, avoiding occasional deadlock

Signed-off-by: Emerson Knapp <eknapp@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants