Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOOLS-2366: Update signal handler to exit after first signal #41

Closed
wants to merge 2 commits into from

Conversation

sphanse99
Copy link

No description provided.

Copy link
Collaborator

@mattChiaravalloti mattChiaravalloti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, well done! 🎉 🏆 💯

signals/signals.go Outdated Show resolved Hide resolved
signals/signals.go Show resolved Hide resolved
@mattChiaravalloti
Copy link
Collaborator

Oh! Actually one follow up at the commit message. Have you seen this resource: https://chris.beams.io/posts/git-commit/ ? Can you update this PR's commit message (the title) to follow those guidelines? Maybe something like "TOOLS-2366: Update signal handler to exit after first signal" or something

@sphanse99 sphanse99 changed the title TOOLS-2366: ^C isn't handled by mongodump TOOLS-2366: Update signal handler to exit after first signal Jul 8, 2020
// first signal use finalizer to terminate cleanly
switch sig := <-sigChan; sig {
// SIGINT & SIGTERM: use finalizer & terminate cleanly
case syscall.SIGINT, syscall.SIGTERM:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question is for syscall.SIGINT, syscall.SIGTERM, do we need to follow the previous design, i.e. call finalizer to clean up the resources such as doc channel etc on the first signal and then exit on the second signal. With current change, I believe restore will exit with the first ^C attempt.

Copy link
Author

@sphanse99 sphanse99 Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, finalizer() is still called in this implementation (line 47), but now we call os.Exit() after the finalizer instead of waiting for the second signal. We decided this design was more intuitive. Let me know if further clarification would be helpful!

Although, I wonder if we should send a os.Exit(util.ExitSuccess) here since we did shutdown processes through finalizer... thoughts? Not entirely sure what the difference between util.ExitFailure and util.ExitSuccess is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. It's OK then if you are aware this change from past design.
Btw, I don't think we shutdown process in the finalizer, looking from restore.HandleInterrupt, it's just close channels instead of callingos.Exit. So we need to call the exit here.
The difference between util.ExitFailure and util.ExitSuccess is the status code program exits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good correction. In that case, do you think it's more appropriate to use ExitSuccess or ExitFailure here (or does it matter)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since in the previous code it uses ExitFailure, I think it's fine to use ExitFailure here.

Copy link
Collaborator

@mattChiaravalloti mattChiaravalloti Jul 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huan is correct that the existing finalizer functions don't call os.Exit (though it is possible for a caller of this function to do that if they want). I think os.Exit(util.ExitSuccess) makes sense when we shutdown gracefully. Not sure what the convention for this is. (ignore this, I agree with the outcome you guys came up with)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, after a second thought, I believe there is some other content. So in the previous design, if you hit ^C once, it will call finalizer to shutdown channel and after a short time, the main function should exit gracefully, which would return ExitSuccess. Now, since you changed the two step procedure, the program will exit in the first attempt, then the program might not shutdown gracefully. We should return a ExitFailure here.
So the extra content is in the previous design, if you hit ^C once, the program should not immediately, but eventually shutdown gracefully. But now it would immediate exit with a failure status code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and after a short time, the main function should exit gracefully

Yeah, it seems like that is currently happening correctly. I think I originally saw mongodump hanging instead of exiting, but I can't reproduce that right now. @sphanse99, did you observe the hanging behavior in your testing?

Re exit codes, I think exiting with 0 or 1 when catching a signal is incorrect behavior. The POSIX spec says the exit code should be greater than 128 when a program is terminated due to a signal. It seems that most bash-like shells construct this as 128 + n where n is the code of the signal used to terminate the program. So a SIGINT should exit with a code 130 for example. However, it looks like this varies between shells. I read a recommendation to do the following to ensure the proper exit code:

  1. Catch the signal and do any necessary cleanup.
  2. Stop catching the signal.
  3. Send a the signal to the current process and let the OS terminate it up and generate the correct exit code.

The recommendation says there is no way to fake the correct exit code, so sending a signal to yourself is the only way to ensure correct behavior.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I observed the hanging when I started the ticket, but also noticed this morning that I couldn't reproduce it. It is unclear why this behavior changed -- I will investigate further!

Copy link
Member

@tfogo tfogo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great stuff Sanika! I think we need to discuss the desired behavior a bit more, but well done so far!

@sphanse99 sphanse99 closed this Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants