Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully stop recording upon SIGTERM and SIGINT #2038

Merged
merged 15 commits into from
Sep 23, 2020

Conversation

dabonnie
Copy link
Contributor

@dabonnie dabonnie commented Sep 4, 2020

Resolves #2006

  • Add handling for SIGTERM in recorder
  • Preserve handling for SIGINT in recorder

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
Add unit test for rosbag record SIGTERM handling

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
@dabonnie
Copy link
Contributor Author

dabonnie commented Sep 4, 2020

@dirk-thomas could you please review?

@dirk-thomas
Copy link
Member

The default signal handler calls ros::requestShutdown() since it needs other threads to gracefully end their execution. I don't sync synchronously calling ros::shutdown() in the signal handler is a viable option. Other threads are likely still using the various singleton managers.

@dabonnie
Copy link
Contributor Author

dabonnie commented Sep 4, 2020

The default signal handler calls ros::requestShutdown() since it needs other threads to gracefully end their execution. I don't sync synchronously calling ros::shutdown() in the signal handler is a viable option. Other threads are likely still using the various singleton managers.

Thanks for the info. I will ensure that the default behavior happens for the SIGINT case and will test for the SIGTERM case.

@dabonnie
Copy link
Contributor Author

dabonnie commented Sep 4, 2020

@dirk-thomas is this incorrect? This is where I thought the default behavior was calling shutdown. FWIW using ros::requestShutdown() for both SIGINT and SIGTERM yields a foo.bag.active file.

@thomas-moulard
Copy link

@dabonnie PTAL at man 7 signal-safety : https://man7.org/linux/man-pages/man7/signal-safety.7.html

There are many functions you should not call from a signal handler, and generally signal handlers should not be blocking. You can call requestShutdown in sigterm too if you want, but for the rest, we should patch rosbag so that when rospy.is_shutting_down is true, we properly process the latest active bag file.

@dabonnie
Copy link
Contributor Author

Note: sending kill -2 $(pgrep rosbag) where, per kill -l 2) SIGINT, also results in bad behavior: the bag file is left active and the child record process is left running. So, to reproduce:

  1. Run roscore in terminal window a
  2. Run rosbag record -a in terminal window b
  3. Run kill -2 $(pgrep rosbag) in terminal window c
  4. Run pgrep record in terminal window b or c, notice a returned PID
  5. Run ls *.bag in terminal window b, notice an active bag file

Fix sending SIGINT to main process

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
@dabonnie
Copy link
Contributor Author

@dirk-thomas addressed feedback we discussed earlier today.

tools/rosbag/src/record.cpp Outdated Show resolved Hide resolved
tools/rosbag/src/record.cpp Outdated Show resolved Hide resolved
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
@dabonnie
Copy link
Contributor Author

@dirk-thomas test failure seems unrelated?

test/test_rosbag/test/record_cleanup.py Outdated Show resolved Hide resolved
test/test_rosbag/test/record_cleanup.py Outdated Show resolved Hide resolved
tools/rosbag/src/record.cpp Outdated Show resolved Hide resolved
tools/rosbag/src/rosbag/rosbag_main.py Show resolved Hide resolved
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
@dabonnie dabonnie changed the title Gracefully stop recording upon SIGTERM Gracefully stop recording upon SIGTERM and SIGINT Sep 18, 2020
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
@dabonnie
Copy link
Contributor Author

@dirk-thomas addressed review comments and green CI, please review.

test/test_rosbag/test/record_cleanup.py Outdated Show resolved Hide resolved
tools/rosbag/src/record.cpp Outdated Show resolved Hide resolved
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
Add record cleanup unit test helper

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
dabonnie and others added 2 commits September 23, 2020 13:30
Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>
@dirk-thomas
Copy link
Member

@ros-pull-request-builder retest this please

@dirk-thomas
Copy link
Member

Thanks for the patch and for iterating on it!

@dirk-thomas dirk-thomas merged commit 7b5a0be into ros:noetic-devel Sep 23, 2020
dabonnie added a commit to dabonnie/ros_comm that referenced this pull request Sep 23, 2020
* Add SIGTERM and SIGINT handlers to rosbag record

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add unit test for rosbag record SIGINT handling
Add unit test for rosbag record SIGTERM handling

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Address review comments
Fix sending SIGINT to main process

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert added whitespace

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert SIGINT handler addition: use default

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove unnecessary wait

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Use BSD License

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add test improvements

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Move test helper function

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove redundant test rosbag launch

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add Amazon to new python test copyright

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove unrelated whitespace

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Split record cleanup unit tests
Add record cleanup unit test helper

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert whitespace change

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* revert white space change

Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
jacobperron pushed a commit that referenced this pull request Oct 16, 2020
* Add SIGTERM and SIGINT handlers to rosbag record

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add unit test for rosbag record SIGINT handling
Add unit test for rosbag record SIGTERM handling

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Address review comments
Fix sending SIGINT to main process

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert added whitespace

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert SIGINT handler addition: use default

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove unnecessary wait

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Use BSD License

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add test improvements

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Move test helper function

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove redundant test rosbag launch

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add Amazon to new python test copyright

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove unrelated whitespace

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Split record cleanup unit tests
Add record cleanup unit test helper

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert whitespace change

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* revert white space change

Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
jacobperron pushed a commit that referenced this pull request Oct 22, 2020
* Add SIGTERM and SIGINT handlers to rosbag record

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add unit test for rosbag record SIGINT handling
Add unit test for rosbag record SIGTERM handling

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Address review comments
Fix sending SIGINT to main process

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert added whitespace

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert SIGINT handler addition: use default

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove unnecessary wait

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Use BSD License

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add test improvements

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Move test helper function

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove redundant test rosbag launch

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Add Amazon to new python test copyright

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Remove unrelated whitespace

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Split record cleanup unit tests
Add record cleanup unit test helper

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* Revert whitespace change

Signed-off-by: Devin Bonnie <dbbonnie@amazon.com>

* revert white space change

Co-authored-by: Dirk Thomas <dirk-thomas@users.noreply.github.com>
nim65s added a commit to nim65s/robotpkg that referenced this pull request Mar 12, 2021
Because DEPEND_ABI.ros-comm.noetic?= ros-comm>=1.15

1.15.9 (2020-10-16)
-------------------
* Fix deadlock when service connection is dropped (ros/ros_comm#2074)
* Update maintainers (ros/ros_comm#2075)
* Fix case where accessing cached parameters shuts down another node (ros/ros_comm#2068)
* Fix spelling (ros/ros_comm#2066)
* Fix Lost Wake Bug in ROSOutAppender (ros/ros_comm#2033)
* Fix compatibility issue with boost 1.73 and above (ros/ros_comm#2023)
* Gracefully stop recording upon SIGTERM and SIGINT (ros/ros_comm#2038)
* Use heapq.merge instead of custom merge sort code (ros/ros_comm#2017)
* Fix handling of single quotes in command arguments on Windows (ros/ros_comm#2051)
* Clearer error message (ros/ros_comm#2035)
* Ignore underscores when parsing literal numeric values for Python 3 compatibility (ros/ros_comm#2022)
* Clear cached URI for a node that has gone offline (ros/ros_comm#2010)
* Add skip_cache parameter to rosnode_ping() (ros/ros_comm#2009)
* Install advertisetest (ros/ros_comm#2046)
* Use range instead of xrange for Python 3 compatibility (ros/ros_comm#2013)
* Fix to address CVE-2020-16124 (ros/ros_comm#2065)
* Fix XmlRpcValue::_doubleFormat being unused (ros/ros_comm#2003)

1.15.8 (2020-07-23)
-------------------
* change is_async_connected to use epoll when available (ros/ros_comm#1983)
* allow mixing latched and unlatched publishers (ros/ros_comm#1991)
* remove not existing NodeProxy from rospy __all_\_ (ros/ros_comm#2007)
* fix typo in topics.py (ros/ros_comm#1977)
* fix bad relative import (still Python 2 style) (ros/ros_comm#1973)
* improve shutdown message with duplicate node name (ros/ros_comm#1992)
* remove dependency on rostopic from rostest package (ros/ros_comm#2002)
* fix missing reload() function in Python 3 (ros/ros_comm#1968)
* add latch param to throttle (ros/ros_comm#1944)
* add const versions of XmlRpcValue converting operators (ros/ros_comm#1978)

1.15.7 (2020-05-28)
-------------------
* fix Windows build break (ros/ros_comm#1961)
* fix NameError in launch error handling (ros/ros_comm#1965)

1.15.6 (2020-05-21)
-------------------
* fix a bug that using a destroyed connection object (ros/ros_comm#1950)

1.15.5 (2020-05-15)
-------------------
* check if async socket connect is success or failure before TransportTCP::read() and TransportTCP::write() (ros/ros_comm#1954)
* fix bug that connection drop signal related funtion throw a bad_weak exception (ros/ros_comm#1940)
* multiple latched publishers per process on the same topic (ros/ros_comm#1544)
* fix negative numbers in ros statistics (ros/ros_comm#1531)
* remove extra \n in ROS_DEBUG (ros/ros_comm#1925)
* add option to repeat latched messages at the start of bag splits (ros/ros_comm#1850)
* fix bag migration failures caused by typo in connection_header assignment (ros/ros_comm#1952)
* fix brief description comments after members (ros/ros_comm#1920)
* add --sigint-timeout and --sigterm-timeout parameters (ros/ros_comm#1937)
* roslaunch-check: search dir recursively (ros/ros_comm#1914)
* sort printed nodes by namespace alphabetically (ros/ros_comm#1934)
* remove pycrypto import (not used) (ros/ros_comm#1922)
* avoid infinite recursion in rosrun tab completion when rosbash is not installed (ros/ros_comm#1948)
* fix bare pointer in topic_tools::ShapeShifter (ros/ros_comm#1722)
* clear message queue on simtime jumping back (ros/ros_comm#1518)
* use undefined dynamic_lookup on macOS (ros/ros_comm#1923)
* check if enough FDs are free, instead counting the total free FDs (ros/ros_comm#1929)

1.15.4 (2020-03-19)
-------------------
* restrict boost dependencies to components used (ros/ros_comm#1871)
* add exception for ConnectionAbortedError (ros/ros_comm#1908)
* fix mac trying to use epoll instead of kqueue (ros/ros_comm#1907)
* fix AttributeError: __exit__ (ros/ros_comm#1915, regression from 1.14.4)

1.15.3 (2020-02-28)
-------------------
* remove Boost version check since Noetic only targets platforms with 1.67+ (ros/ros_comm#1903)

1.15.2 (2020-02-25)
-------------------
* export missing Boost dependency (ros/ros_comm#1898)
* add timestamp formatting for rosconsole (ros/ros_comm#1892)

1.15.1 (2020-02-24)
-------------------
* fix missing boost dependencies (ros/ros_comm#1895)
* use setuptools instead of distutils (ros/ros_comm#1870)
* increase time limit of advertisetest/publishtest.test to reduce flakyness (ros/ros_comm#1897)

1.15.0 (2020-02-21)
-------------------
* fix dictionary changed size during iteration (ros/ros_comm#1894)
* update test to pass with old and new yaml (ros/ros_comm#1893)

Packaging changes:
- removed patch-an, as there are no more boost version checks
- updated patch-ao
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rosbag leaves an ".active" file dangling when receiving SIGTERM
3 participants