Fix signal handler to have stable shutdown #523

Merged
merged 3 commits into from Jun 24, 2016
@coolo coolo commented Jun 21, 2016

Do not call stop_vm through JSON but just kill the backend process
and leave the shutdown to itself
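
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this pull request: `spawn_backend`, `stop_backend` and `install_signal_handler` are hypothetical names, and the forked child is only a stand-in for the real backend process. The point is that the handler signals the backend and reaps it, instead of sending a stop request over the JSON channel.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for the backend: a child process that performs
# its own shutdown when it receives SIGTERM.
sub spawn_backend {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $SIG{TERM} = sub { POSIX::_exit(0) };    # backend cleans up by itself
        sleep 1 while 1;
    }
    return $pid;
}

# Signal the backend and wait for it; no JSON command is involved.
# Returns true once the child has been reaped.
sub stop_backend {
    my ($pid) = @_;
    kill 'TERM', $pid;
    return waitpid($pid, 0) == $pid;
}

# Install a handler for INT and TERM that only kills the backend
# and then exits.
sub install_signal_handler {
    my ($pid) = @_;
    $SIG{TERM} = $SIG{INT} = sub {
        my ($sig) = @_;
        print STDERR "signalhandler $$: got $sig\n";
        stop_backend($pid);
        exit 1;
    };
    return;
}
```

A caller would run `install_signal_handler(spawn_backend())` once at startup. Keeping the handler this small means it does not depend on the backend still answering requests at the moment the signal arrives.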

my $sig = shift;
diag("signalhandler $$: got $sig");
print STDERR "$$: signalhandler got $sig\n";
Member

why not diag? maybe duplicate these messages to both log and system journal?

Contributor Author

This was just my debugging; diag is fine. And no, diag can't write to the system journal: the worker captures all output of isotovideo into autoinst-log.txt

Member

That's what I am saying: diag writes to autoinst-log.txt, but "print STDERR" might end up in syslog, so we could do:

sub super_important_log {
    my (@args) = @_;
    diag(@args);
    print STDERR "@args\n";
}
…
super_important_log("signalhandler $$: got $sig");

coolo commented Jun 21, 2016

Added tidy and diag

okurz commented Jun 24, 2016

I just caught my s390x worker fooling me: it could not reconnect. I investigated and found a stale worker process to be the culprit, so I will now try your change. Additionally, what do you think about:

  • If SIGINT is received a second time or more, try harder to kill ourselves and our children
  • Spawn an internal "watchdog" on termination that kills the children and the parent process after a grace period if they have not shut down already, i.e. KILL after 30s
  • If a connection cannot be established, e.g. to VNC, die hard. I see error messages like "Connection refused" but nobody cares
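
The grace-period idea from the second bullet could look roughly like the sketch below. It is an assumption-laden illustration, not part of this pull request: `terminate_with_grace` is a hypothetical helper that sends TERM to a list of child PIDs, polls for up to `$grace` seconds, and then sends KILL to whatever is still running. The 30s figure above would simply be passed as `$grace`.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(sleep time);

# Ask all children to terminate, wait up to $grace seconds, then send
# KILL to the survivors. Returns the PIDs that needed the hard kill.
sub terminate_with_grace {
    my ($grace, @pids) = @_;
    my %alive = map { $_ => 1 } @pids;
    kill 'TERM', @pids;

    my $deadline = time() + $grace;
    while (%alive && time() < $deadline) {
        for my $pid (keys %alive) {
            # waitpid returns 0 only while the child is still running
            delete $alive{$pid} if waitpid($pid, WNOHANG) != 0;
        }
        sleep 0.1 if %alive;
    }

    my @stubborn = sort { $a <=> $b } keys %alive;
    for my $pid (@stubborn) {
        kill 'KILL', $pid;
        waitpid($pid, 0);    # reap so no zombie is left behind
    }
    return @stubborn;
}
```

Polling with `WNOHANG` keeps the helper in a single process. A separate watchdog process, as suggested above, would additionally cover the case where the parent itself hangs, at the cost of one more process to manage.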

I did not observe problems when ending the process with SIGQUIT

@coolo coolo merged commit fe19b00 into master Jun 24, 2016
@coolo coolo deleted the fix_shutdown_ng branch June 24, 2016 08:12