exit notebook cleanly on SIGINT, SIGTERM #1609

Merged
merged 3 commits into from Apr 16, 2012

Conversation

Projects
None yet
2 participants
@minrk
Member

minrk commented Apr 15, 2012

makes it a bit more likely security files, etc. will be cleaned up.

We already handled SIGINT, but now SIGTERM results in clean exit as well.

Closes #1601

exit notebook cleanly on SIGINT, SIGTERM
makes it a bit more likely security files, etc. will be cleaned up.
@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Actually, I've been wondering, given that currently the only way to stop the notebook server is to use Ctrl-C, if we shouldn't at least do a 'are you sure you wan to exit? [Y/n]' kind of thing on SIGINT... Since accidentally killing the nb server right now can potentially kill multiple computations for a user, a minimal check might not be a bad idea. Thoughts?

Member

fperez commented Apr 16, 2012

Actually, I've been wondering, given that currently the only way to stop the notebook server is to use Ctrl-C, if we shouldn't at least do a 'are you sure you wan to exit? [Y/n]' kind of thing on SIGINT... Since accidentally killing the nb server right now can potentially kill multiple computations for a user, a minimal check might not be a bad idea. Thoughts?

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

That makes good sense, though there is one problem: asking y/n blocks, so while it is waiting for the response, the notebook server is actually unresponsive. If you do kill -INT <nbserver>, you wouldn't cause a shutdown, but you would render the notebook useless, unless you can get back to the shell where the notebook was started (may not exist anymore). Any idea to get around that?

Member

minrk commented Apr 16, 2012

That makes good sense, though there is one problem: asking y/n blocks, so while it is waiting for the response, the notebook server is actually unresponsive. If you do kill -INT <nbserver>, you wouldn't cause a shutdown, but you would render the notebook useless, unless you can get back to the shell where the notebook was started (may not exist anymore). Any idea to get around that?

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Mmh, we could put the asker in a thread that dies with a timeout of say 5 seconds. The user can then either reply, or send a second SIGINT to force a kill, but if 5 seconds go by, the server will assume it was an accidental SIGINT and resume. How does that sound?

Member

fperez commented Apr 16, 2012

Mmh, we could put the asker in a thread that dies with a timeout of say 5 seconds. The user can then either reply, or send a second SIGINT to force a kill, but if 5 seconds go by, the server will assume it was an accidental SIGINT and resume. How does that sound?

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

that could work. So the SIGINT handler would do:

  1. register secondary SIGINT handler which causes immediate exit
  2. spawns thread with ask y/n
    • if y: exit immediately
    • if n or timeout: restore original ask y/n handler

yes?

I forget how to do timeouts / interrupt a thread.

Member

minrk commented Apr 16, 2012

that could work. So the SIGINT handler would do:

  1. register secondary SIGINT handler which causes immediate exit
  2. spawns thread with ask y/n
    • if y: exit immediately
    • if n or timeout: restore original ask y/n handler

yes?

I forget how to do timeouts / interrupt a thread.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Yup. I don't remember the timeout api either, let me know if you want me to tackle this one and I'll look into it. Otherwise, I'll continue writing the new magics stuff (I finally got to it after my blitz on PRs and issues :)

Member

fperez commented Apr 16, 2012

Yup. I don't remember the timeout api either, let me know if you want me to tackle this one and I'll look into it. Otherwise, I'll continue writing the new magics stuff (I finally got to it after my blitz on PRs and issues :)

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

By all means, work on magics. I'll peek at this one, and see if I can't get close enough.

Member

minrk commented Apr 16, 2012

By all means, work on magics. I'll peek at this one, and see if I can't get close enough.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Got it, thx. I'm on IRC too if you need a quick ping on anything.

Member

fperez commented Apr 16, 2012

Got it, thx. I'm on IRC too if you need a quick ping on anything.

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

confirmation dialog added on SIGINT.

  • confirmation happens in background thread, to avoid blocking app thread
  • confirmation times out after 5s before resuming
  • ^C^C counts as confirmation
Member

minrk commented Apr 16, 2012

confirmation dialog added on SIGINT.

  • confirmation happens in background thread, to avoid blocking app thread
  • confirmation times out after 5s before resuming
  • ^C^C counts as confirmation
@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

This looks great, glad you found a non-thread solution! (and I learned how to do that with select today :) Let me know if it was a typo and we'll wrap it up. Awesome job.

Member

fperez commented Apr 16, 2012

This looks great, glad you found a non-thread solution! (and I learned how to do that with select today :) Let me know if it was a typo and we'll wrap it up. Awesome job.

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

Yes, sorry - the 2 was because I didn't want to wait 5 seconds while testing. It should be back to 5 now.

It is a threaded solution, but by using select instead of raw_input, I could have a timeout without having to do any interrupting or communication between threads - it just always runs to completion, in at most 5s.

Member

minrk commented Apr 16, 2012

Yes, sorry - the 2 was because I didn't want to wait 5 seconds while testing. It should be back to 5 now.

It is a threaded solution, but by using select instead of raw_input, I could have a timeout without having to do any interrupting or communication between threads - it just always runs to completion, in at most 5s.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Thanks for the quick fix. Never mind the thread comment, I didn't see the threading code further up :) Code looks good, but on testing I got this:

[NotebookApp] Use Control-C to stop this server and shut down all kernels.
^CShutdown Notebook Server (y/[n])? WARNING:root:Interrupted system call
Traceback (most recent call last):
  File "/home/fperez/usr/opt/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 293, in start
    event_pairs = self._impl.poll(1000*poll_timeout)
  File "poll.pyx", line 189, in zmq.core.poll.Poller.poll (zmq/core/poll.c:2215)
  File "poll.pyx", line 101, in zmq.core.poll._poll (zmq/core/poll.c:1460)
ZMQError: Interrupted system call
no answer for 5s resuming

Any idea why? It seems to work fine, but the ZMQ errors on-screen are a bit nasty.

ps - I'd adjust the message to: "No answer for 5s: resuming operation...", a bit more readable.

Member

fperez commented Apr 16, 2012

Thanks for the quick fix. Never mind the thread comment, I didn't see the threading code further up :) Code looks good, but on testing I got this:

[NotebookApp] Use Control-C to stop this server and shut down all kernels.
^CShutdown Notebook Server (y/[n])? WARNING:root:Interrupted system call
Traceback (most recent call last):
  File "/home/fperez/usr/opt/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 293, in start
    event_pairs = self._impl.poll(1000*poll_timeout)
  File "poll.pyx", line 189, in zmq.core.poll.Poller.poll (zmq/core/poll.c:2215)
  File "poll.pyx", line 101, in zmq.core.poll._poll (zmq/core/poll.c:1460)
ZMQError: Interrupted system call
no answer for 5s resuming

Any idea why? It seems to work fine, but the ZMQ errors on-screen are a bit nasty.

ps - I'd adjust the message to: "No answer for 5s: resuming operation...", a bit more readable.

confirm notebook shutdown on SIGINT
confirmation in bg thread, to avoid blocking
5s timeout before restoring original state if no response

^C^C == confirmation
@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

Thanks, message updated.

The interrupt message you mentioned is inherited from tornado, which logged interrupts. I discovered that it's actually worse than you mentioned, as ^C actually goes unhandled, and the IOLoop raises if you have pyzmq ≤ 2.1.7, invoking the crash-handler. This one I can fix, though the prompt never shows up, and IPython just exits on ^C with old pyzmq.

In 2.1.9, you will get the message as you see it, but Tornado changed their behavior to be quiet about it (and pyzmq followed suit), so there is no such message in 2.1.11 and above.

If we want to suppress this message, we would have to silence the root logger used by tornado/the IOLoop, or require/recommend pyzmq ≥ 2.1.11.

Member

minrk commented Apr 16, 2012

Thanks, message updated.

The interrupt message you mentioned is inherited from tornado, which logged interrupts. I discovered that it's actually worse than you mentioned, as ^C actually goes unhandled, and the IOLoop raises if you have pyzmq ≤ 2.1.7, invoking the crash-handler. This one I can fix, though the prompt never shows up, and IPython just exits on ^C with old pyzmq.

In 2.1.9, you will get the message as you see it, but Tornado changed their behavior to be quiet about it (and pyzmq followed suit), so there is no such message in 2.1.11 and above.

If we want to suppress this message, we would have to silence the root logger used by tornado/the IOLoop, or require/recommend pyzmq ≥ 2.1.11.

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

Another alternative could be to only enable this confirmation dialog when using pyzmq ≥ 2.1.11, where it works without any ugliness.

Member

minrk commented Apr 16, 2012

Another alternative could be to only enable this confirmation dialog when using pyzmq ≥ 2.1.11, where it works without any ugliness.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Nah, I'd say just apply the fix you can to prevent the crash handler from kicking in, and we'll leave it at that. The message is a bit scary looking but things actually work fine in practice, and as people gradually update the relevance of this will quickly die down, so let's not bog our own code with unnecessary special-casing.

Member

fperez commented Apr 16, 2012

Nah, I'd say just apply the fix you can to prevent the crash handler from kicking in, and we'll leave it at that. The message is a bit scary looking but things actually work fine in practice, and as people gradually update the relevance of this will quickly die down, so let's not bog our own code with unnecessary special-casing.

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

hm, the log message tends to print after the prompt, so it's pretty bad on 2.1.9. Should I add a tiny delay to avoid this, or make 2.1.11 the minimum version for the confirmation?

Member

minrk commented Apr 16, 2012

hm, the log message tends to print after the prompt, so it's pretty bad on 2.1.9. Should I add a tiny delay to avoid this, or make 2.1.11 the minimum version for the confirmation?

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Since this is going for human response, a small delay is fine IMO.

Member

fperez commented Apr 16, 2012

Since this is going for human response, a small delay is fine IMO.

handle old pyzmq in notebook exit confirmation
* 2.1.7 (earliest dep) will simply not work, so skip it
* 2.1.9-10 log the interrupts, so add a short delay to ensure the log
  messages don't come after the prompt.
@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 16, 2012

Member

okay, short delay added - 2.1.7 will get no dialog, 2.1.9-10 will get the log messages, and 2.1.11 will be peachy.

Member

minrk commented Apr 16, 2012

okay, short delay added - 2.1.7 will get no dialog, 2.1.9-10 will get the log messages, and 2.1.11 will be peachy.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 16, 2012

Member

Great, merging now (had been reviewed and tested).

Member

fperez commented Apr 16, 2012

Great, merging now (had been reviewed and tested).

fperez added a commit that referenced this pull request Apr 16, 2012

Merge pull request #1609 from minrk/sigterm
exit notebook cleanly on SIGINT, SIGTERM, and add a safety (text) dialog to ask for exit confirmation.  This prevents an accidental Ctrl-C in the wrong window from destroying a user's potentially large set of notebooks that could have taken a long time to create.

Closes #1601

@fperez fperez merged commit 4f9b234 into ipython:master Apr 16, 2012

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request #1609 from minrk/sigterm
exit notebook cleanly on SIGINT, SIGTERM, and add a safety (text) dialog to ask for exit confirmation.  This prevents an accidental Ctrl-C in the wrong window from destroying a user's potentially large set of notebooks that could have taken a long time to create.

Closes #1601

@stevengj stevengj referenced this pull request in JuliaLang/IJulia.jl Mar 3, 2015

Closed

killing notebook is unreliable #270

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment