Restore old signal handler after shutdown #353

Merged
merged 3 commits into from
Aug 11, 2017

@dhood dhood commented Aug 8, 2017

connects to ros2/rmw_implementation#25

This PR makes two main changes:

  1. The original signal handler is restored in an on_shutdown callback. This allows the original signal handler to be called even after rclcpp::shutdown has been called within a process.
  2. During shutdown, SIGINTs are ignored so that they cannot interrupt the shutdown process. Otherwise the deadlocks described in ros2/rmw_implementation#25 ("Deadlock on sigint when multiple rmw impl's available") can occur.

This has been done to fix flaky tests caused by the deadlock described in ros2/rmw_implementation#25. For it to take effect, it has to be combined with a change such as ros2/demos@190d2f5 so the processes call shutdown. If we were to instead call shutdown in the rclcpp signal handler, that change wouldn't be necessary. Is it appropriate to replace this block of code with a call to shutdown?

I have tried to maintain the preference for sigaction where available, following the existing code, but please keep in mind that there may be subtleties of signal handlers that I'm likely to overlook.
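For readers following along, the save-and-restore idea in change 1 can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the namespace `restore_sketch`, the macro `SKETCH_HAS_SIGACTION`, and the functions `init`, `shutdown`, and `on_sigint` are all hypothetical names, and the platform check is an assumption about where sigaction is available.

```cpp
#include <csignal>

// Assumption: sigaction is available on Unix-like platforms; elsewhere fall
// back to std::signal. The macro name is hypothetical.
#if defined(__unix__) || defined(__APPLE__)
#include <signal.h>
#define SKETCH_HAS_SIGACTION 1
#endif

namespace restore_sketch {

// Set by our handler; a real library would trigger a guard condition instead.
volatile std::sig_atomic_t g_interrupted = 0;
bool g_handler_installed = false;

// Whatever was handling SIGINT before init() ran.
#ifdef SKETCH_HAS_SIGACTION
struct sigaction g_old_action;
#else
void (*g_old_handler)(int) = SIG_DFL;
#endif

void on_sigint(int) { g_interrupted = 1; }

// Install our handler and remember the previous one.
void init() {
#ifdef SKETCH_HAS_SIGACTION
  struct sigaction action = {};
  sigemptyset(&action.sa_mask);
  action.sa_handler = on_sigint;
  sigaction(SIGINT, &action, &g_old_action);
#else
  g_old_handler = std::signal(SIGINT, on_sigint);
#endif
  g_handler_installed = true;
}

// Put the previous handler back, symmetric to init().
void shutdown() {
  if (!g_handler_installed) {
    return;
  }
#ifdef SKETCH_HAS_SIGACTION
  sigaction(SIGINT, &g_old_action, nullptr);
#else
  std::signal(SIGINT, g_old_handler);
#endif
  g_handler_installed = false;
}

}  // namespace restore_sketch
```

If an application installs its own SIGINT handler before calling `restore_sketch::init()`, a SIGINT raised between `init()` and `shutdown()` reaches `on_sigint`, and a SIGINT raised after `shutdown()` reaches the application's handler again.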

Standard CI

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

Repeating the list_parameters* tests (usually very flaky): Build Status (passed 60 times, failed on 60th because of a startup issue that I understand to be unrelated)

@dhood dhood self-assigned this Aug 8, 2017
@dhood dhood added the in progress Actively being worked on (Kanban column) label Aug 8, 2017
@dhood
Member Author

dhood commented Aug 9, 2017

@dirk-thomas has clarified that it is appropriate for the demo nodes to call shutdown themselves instead of having it called from rclcpp's signal handler. It doesn't necessarily make sense for the interrupt handler in rclcpp to call shutdown, since users may want to spin again after a SIGINT in some cases.

ros2/system_tests#215 adds some tests

@dhood dhood added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Aug 9, 2017
@dhood
Member Author

dhood commented Aug 10, 2017

Even if it's not appropriate to call shutdown from the signal handler, I still think it's appropriate to ignore interrupts, because the deadlock occurs as a consequence of the guard condition being triggered, which happens in both places. As this PR stands right now, launch_testing sending two interrupts in a row to a node can cause a deadlock.

I'm going to factor out the logic (ignoring interrupts + triggering the guard condition) and call it from both the signal handler and rclcpp::shutdown.

@dhood
Member Author

dhood commented Aug 10, 2017

ec03f5f factors out the guard condition triggering logic, and also changes from manually ignoring SIGINTs to just skipping the response to them if g_is_interrupted is true (though signal_value != g_signal_status might be more appropriate?).

@dirk-thomas
Member

Why should a second SIGINT not notify the guard condition? I would expect a second signal to notify the condition again. The signal handler is restored only in shutdown (symmetric to init). E.g., consider the following use case:

init()
...
spin()  // waiting for sigint to return from wait
// do something else
spin()  // a second SIGINT while waiting here should wake up the wait again
...
shutdown()
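The use case above can be made concrete with a small sketch in which every SIGINT re-arms a wake-up request, so a later spin is woken again. Everything here is hypothetical and only stands in for the real machinery: `wake_sketch`, `spin_for`, and `g_wake_requested` are illustrative names, and a polled atomic flag replaces the guard condition and wait set. It also assumes the handler stays installed after a signal is delivered (as with sigaction, or signal() on glibc).

```cpp
#include <atomic>
#include <chrono>
#include <csignal>
#include <thread>

namespace wake_sketch {

// Stand-in for "notify the guard condition": set by every SIGINT.
std::atomic<bool> g_wake_requested{false};
void (*g_previous_handler)(int) = SIG_DFL;

void on_sigint(int) { g_wake_requested.store(true); }

// Install the handler, remembering the one that was there before.
void init() { g_previous_handler = std::signal(SIGINT, on_sigint); }

// Restore the previous handler, symmetric to init().
void shutdown() { std::signal(SIGINT, g_previous_handler); }

// Hypothetical stand-in for spin(): returns true if woken by a SIGINT,
// false if the timeout elapsed first. Consuming the request means the
// next call needs a new SIGINT to be woken.
bool spin_for(std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (g_wake_requested.exchange(false)) {
      return true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return false;
}

}  // namespace wake_sketch
```

With this shape, one SIGINT wakes exactly one pending spin, a spin with no new signal simply times out, and a second SIGINT wakes the next spin. A handler that stopped responding after the first interrupt would leave the second spin waiting.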

@dhood
Member Author

dhood commented Aug 10, 2017

Thanks for the clear example: I had the ideas of interrupt and shutdown conflated. This is because I thought that it was a double interrupt that was causing another deadlock to occur, since the first interrupt triggered some destruction. Looking closer, the example that I was double interrupting was exiting after the first interrupt (that's where the destruction was coming from), so it was just another instance of the same issue as in ros2/rmw_implementation#25.

Just de-registering our signal handler should be sufficient to fix the deadlocks. I'll post back with CI results to confirm.

@dhood
Member Author

dhood commented Aug 11, 2017

OK, I finally got to 50 test passes without the other parameter flakiness issue interfering: Build Status (this branch only)

The other flakiness issue is fixed in #356, so this job, which includes commits from both branches, was able to get the tests to pass 100 times in a row: Build Status

The conclusion is that ignoring SIGINTs during shutdown is not necessary; restoring the old signal handler is sufficient.

@dhood dhood changed the title Restore old signal handler after shutdown and ignore sigints during shutdown Restore old signal handler after shutdown Aug 11, 2017
@dirk-thomas
Member

It would be good to separate the refactoring parts of the patch into one commit and the functional changes into a second commit.

@dhood dhood force-pushed the restore_old_signal_handler branch from 21c595a to d7b7d74 Compare August 11, 2017 17:32
@dhood
Member Author

dhood commented Aug 11, 2017

Done; the refactoring in d7b7d74 is optional, and we can leave it out if that's easier.

Member

@dirk-thomas dirk-thomas left a comment

LGTM (to be merged without squashing)

@dhood
Member Author

dhood commented Aug 11, 2017

CI after rebase

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status (known flaky tests)

@dhood dhood merged commit 89c43e7 into master Aug 11, 2017
@dhood dhood deleted the restore_old_signal_handler branch August 11, 2017 21:02
@dhood dhood mentioned this pull request Feb 27, 2018
nnmm pushed a commit to ApexAI/rclcpp that referenced this pull request Jul 9, 2022
DensoADAS pushed a commit to DensoADAS/rclcpp that referenced this pull request Aug 5, 2022
* QoS Profile Overrides - Player

Signed-off-by: Anas Abou Allaban <aabouallaban@pm.me>
Labels
in review Waiting for review (Kanban column)
2 participants