-
Notifications
You must be signed in to change notification settings - Fork 522
Race condition when shutting down contexts (or ctrl+c) #812
Description
Bug report
Required Info:
- Operating System:
- Ubuntu 18.04 (it should be reproducible in others)
- Installation type:
- from source
- Version or commit hash:
- master (9be3e08)
- DDS implementation:
- fastrtps (it should happen in the others)
- Client library (if applicable):
- rclcpp
Steps to reproduce issue
Modify demo_nodes_cpp talker publish rate:
- timer_ = this->create_wall_timer(1s, publish_message);
+ timer_ = this->create_wall_timer(10ms, publish_message);
Use the following script:
#!/bin/bash
set -e
COUNTER=0
while [ $COUNTER -lt 1000 ]; do
echo "Retrying. Counter: $COUNTER"
exec -a "TALKER${COUNTER}" <demo_nodes_cpp PREFIX>/lib/demo_nodes_cpp/talker > /dev/null &
sleep 0.1
pkill --signal 2 -f "TALKER${COUNTER}"
let COUNTER=COUNTER+1
doneExpected behavior
Script should finish without showing any error message.
Actual behavior
I got once this:
Retrying. Counter: 940
Retrying. Counter: 941
Failed to publish log message to rosout: publisher's context is invalid, at /home/ivanpauno/ros2_ws/src/ros2/rcl/rcl/src/rcl/publisher.c:343
Retrying. Counter: 942
Retrying. Counter: 943
Retrying. Counter: 944
The script continued, so the return code was 0.
The script does one thousands try, and I got just once of this errors.
Additional information
rclcpp shuts down context when handling SIGINT:
rclcpp/rclcpp/src/rclcpp/signal_handler.cpp
Lines 255 to 280 in 9be3e08
| void | |
| SignalHandler::deferred_signal_handler() | |
| { | |
| while (true) { | |
| if (signal_received_.exchange(false)) { | |
| RCLCPP_DEBUG(get_logger(), "deferred_signal_handler(): SIGINT received, shutting down"); | |
| for (auto context_ptr : rclcpp::get_contexts()) { | |
| if (context_ptr->get_init_options().shutdown_on_sigint) { | |
| RCLCPP_DEBUG( | |
| get_logger(), | |
| "deferred_signal_handler(): " | |
| "shutting down rclcpp::Context @ %p, because it had shutdown_on_sigint == true", | |
| context_ptr.get()); | |
| context_ptr->shutdown("signal handler"); | |
| } | |
| } | |
| } | |
| if (!is_installed()) { | |
| RCLCPP_DEBUG(get_logger(), "deferred_signal_handler(): signal handling uninstalled"); | |
| break; | |
| } | |
| RCLCPP_DEBUG(get_logger(), "deferred_signal_handler(): waiting for SIGINT or uninstall"); | |
| wait_for_signal(); | |
| RCLCPP_DEBUG(get_logger(), "deferred_signal_handler(): woken up due to SIGINT or uninstall"); | |
| } | |
| } |
There is no synchronization between this shut down and the executors (which are still running).
It may happen that something that is being executed needs a context that has been shut down, so an error is possible.
Depending on how an invalid context is handled in each case, it could be just an error message or something else (e.g.: throwing an exception).
The error is unlikely as most of the time the executor is waiting for work, and after waiting we check if the context is still valid. But without any synchronization between shutdown and the executors, I think that a variety of different errors are possible.