misc notebook: connection file cleanup, first heartbeat, startup flush #1187

Merged
merged 6 commits into from

2 participants

@minrk
Owner

Kernels do not linger after the notebook server exits, but the KernelManagers are not garbage-collected on shutdown. As a result, connection files for kernels still running at notebook shutdown were never removed.

Now, kernels are explicitly killed at server shutdown, allowing the KernelManagers to clean up their connection files.
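As a rough illustration of why the explicit kill matters, here is a minimal sketch. `ToyKernelManager`, `serve`, `demo`, and the file naming are hypothetical stand-ins, not IPython's actual classes; the only point is that cleanup tied to an explicit `kill_kernel` call runs reliably from a `finally` block, whereas cleanup tied to garbage collection may never run.

```python
import os
import tempfile
import uuid


class ToyKernelManager:
    """Hypothetical stand-in for a kernel manager (not IPython's class)."""

    def __init__(self, directory):
        self.directory = directory
        self._connection_files = {}  # kernel_id -> connection file path

    @property
    def kernel_ids(self):
        return list(self._connection_files)

    def start_kernel(self):
        kernel_id = str(uuid.uuid4())
        path = os.path.join(self.directory, "kernel-%s.json" % kernel_id)
        with open(path, "w") as f:
            f.write("{}")
        self._connection_files[kernel_id] = path
        return kernel_id

    def kill_kernel(self, kernel_id):
        # Killing a kernel is the one place the connection file is removed.
        path = self._connection_files.pop(kernel_id)
        os.remove(path)


def serve(km, run_loop):
    """Run the (toy) event loop and always clean up kernels afterwards."""
    try:
        run_loop()
    except KeyboardInterrupt:
        print("Interrupted...")
    finally:
        for kernel_id in list(km.kernel_ids):
            km.kill_kernel(kernel_id)


def demo():
    directory = tempfile.mkdtemp()
    km = ToyKernelManager(directory)
    km.start_kernel()
    km.start_kernel()

    def interrupted_loop():
        raise KeyboardInterrupt  # simulate Ctrl-C while serving

    serve(km, interrupted_loop)
    return os.listdir(directory)  # files left behind after shutdown
```

In this toy setup, `demo()` leaves no files behind because the `finally` block runs even when the loop is interrupted.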

Small changes along the way:

  • disable the unnecessary (and actively detrimental) SIGINT handler that was copy/pasted from the qt app.
  • move webapp initialization out of initialize() into init_webapp(), preserving the convention that initialize() contains no unique code.
  • don't warn about serving http on all interfaces when running in 100% read-only mode (read-only with no password), because no login or execution is possible.
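For the last item, the warning condition can be read as a small predicate. This is a sketch only: `should_warn_insecure` is a hypothetical helper name, and its arguments mirror the `ssl_options`, `ip`, `read_only`, and `password` attributes used in the diff.

```python
def should_warn_insecure(ssl_options, ip, read_only, password):
    """Return True when the 'listening on all IP addresses' warning applies."""
    unencrypted = ssl_options is None
    listening_on_all = not ip  # an empty ip means all interfaces
    # "100% read-only": read-only mode with no password, so nobody can log in.
    fully_read_only = bool(read_only and not password)
    return unencrypted and listening_on_all and not fully_read_only
```

Note that read-only mode with a password set still warns, since a login (and therefore execution) is possible in that case.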

(Yay, my first PR for 0.13!)

minrk added some commits
@minrk minrk cleanup connection files on notebook shutdown
Kernels would not linger, but the KernelManagers are not garbage-collected on shutdown.
This means that connection files for kernels still running at notebook shutdown would not be removed.

Also disable the unnecessary (and actively unhelpful) SIGINT handler inherited from the original
copy/paste from the qt app.
1e91756
@minrk minrk flush stdout/err after init_code
prevents startup script/file output from being attached to the first cell of a frontend
8d7b393
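To see why an explicit flush changes which phase the output lands in, here is a small sketch using only the standard `io` module. The function name `outputs_by_phase` and the two "phases" are illustrative assumptions; the real kernel streams are different objects, but the buffering effect is the same in spirit.

```python
import io


def outputs_by_phase(flush_after_startup):
    """Return (bytes seen during startup, bytes seen during the first cell)."""
    sink = io.BytesIO()
    stream = io.TextIOWrapper(
        io.BufferedWriter(sink), encoding="utf-8", newline="\n"
    )

    # Startup phase: a startup script prints something.
    stream.write("startup banner\n")
    if flush_after_startup:
        stream.flush()
    startup_bytes = sink.getvalue()

    # First cell: its output is flushed when the cell finishes.
    stream.write("cell output\n")
    stream.flush()
    cell_bytes = sink.getvalue()[len(startup_bytes):]

    return startup_bytes.decode("utf-8"), cell_bytes.decode("utf-8")
```

Without the flush, the startup text is still sitting in the buffer when the first cell runs, so it shows up alongside the cell's own output.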
@minrk
Owner

includes stdout/err flush fix mentioned in #1191

@minrk minrk add first_beat delay to notebook heartbeats
Heartbeats start immediately, causing false heart failures on slow systems that can take a while to start kernel subprocesses.

Also adds a 'flush' to the heartbeat callback (just like in IPython.parallel), to protect against server load being detected as heart failures.
45a89f9
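A toy model of the first-beat delay, with a fake clock so it runs instantly. Everything here is a simplification for illustration: `ToyLoop` and `kernel_survives` are hypothetical names, the first check fires one interval after the periodic check starts, and a ping is assumed to go unanswered if the kernel is not yet up when it is sent.

```python
import heapq


class ToyLoop:
    """Deterministic timer loop with a fake clock (hypothetical helper)."""

    def __init__(self):
        self.now = 0.0
        self._timers = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def call_at(self, when, callback):
        heapq.heappush(self._timers, (when, self._seq, callback))
        self._seq += 1

    def run_until(self, deadline):
        while self._timers and self._timers[0][0] <= deadline:
            self.now, _, callback = heapq.heappop(self._timers)
            callback()
        self.now = deadline


def kernel_survives(first_beat, time_to_dead, kernel_ready_at, run_for=30.0):
    """Return True if the kernel is never declared dead within run_for."""
    loop = ToyLoop()
    state = {"alive": True, "dead": False}

    def ping_or_dead():
        if state["alive"]:
            state["alive"] = False
            # Simplification: the reply arrives only if the kernel is up.
            if loop.now >= kernel_ready_at:
                state["alive"] = True
            loop.call_at(loop.now + time_to_dead, ping_or_dead)
        else:
            state["dead"] = True  # no reply since the last ping

    # The periodic check only begins after the first_beat delay.
    loop.call_at(first_beat + time_to_dead, ping_or_dead)
    loop.run_until(run_for)
    return not state["dead"]
```

With a kernel that needs 4 seconds to come up and a 3 second interval, `kernel_survives(0.0, 3.0, 4.0)` reports a false failure in this model, while `kernel_survives(5.0, 3.0, 4.0)` does not.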
@minrk
Owner

include heartbeat delay mentioned in #1198.

I still don't like the way heartbeats are done in either the notebook or the base KernelManager, and the base KernelManager also really must be configurable. But that's for another time.

@fperez
Owner

Mmh, I'm getting these messages in the starting consoles:

ERROR:root:Error in periodic callback
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/zmq/eventloop/ioloop.py", line 432, in _run
    self.callback()
  File "/home/fperez/usr/lib/python2.7/site-packages/IPython/frontend/html/notebook/handlers.py", line 451, in ping_or_dead
    self.hb_stream.flush()
  File "/usr/lib/python2.6/dist-packages/zmq/eventloop/zmqstream.py", line 282, in flush
    self._check_closed()
  File "/usr/lib/python2.6/dist-packages/zmq/eventloop/zmqstream.py", line 458, in _check_closed
    raise IOError("Stream is closed")
IOError: Stream is closed
@minrk
Owner

hm, any earlier info?

That's always a side effect of a previous error.

I'll look into it. PyZMQ version?

@fperez fperez referenced this pull request
Closed

Dead kernel loop #1232

@fperez
Owner

No, prior to that it's all normal output. It seems to only happen on FF, but it doesn't happen always. I can't seem to get what triggers it. I'm using pyzmq 2.1.9, the system one from ubuntu 11.10 (64bit).

@minrk
Owner

Ah, I know what this is. Try opening a new notebook, then closing the tab within 5 seconds. You should see it again. I'll push a fix shortly.
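A sketch of the race and one way to guard against it, modeled on the `_really_start_hb` check in this PR's diff. `ToyHeartbeat` and `simulate` are hypothetical; the `pending` list stands in for the event loop's delayed-callback queue, and only the start/stop bookkeeping is modeled.

```python
class ToyHeartbeat:
    """Hypothetical handler fragment; no real sockets or timers involved."""

    def __init__(self):
        self.beating = False
        self.stream_closed = False
        self.periodic_running = False

    def start_hb(self, pending):
        # Queue the real start for later instead of starting immediately.
        self.beating = True
        pending.append(self._really_start_hb)

    def _really_start_hb(self):
        # Only start if we were not stopped or closed during the wait.
        if self.beating and not self.stream_closed:
            self.periodic_running = True

    def stop_hb(self):
        if self.beating:
            self.beating = False
            self.periodic_running = False

    def close(self):
        self.stop_hb()
        self.stream_closed = True


def simulate(close_during_wait):
    """Return True if the periodic heartbeat ends up running."""
    pending = []
    hb = ToyHeartbeat()
    hb.start_hb(pending)
    if close_during_wait:
        hb.close()  # e.g. the browser tab is closed before the delay elapses
    for callback in pending:
        callback()  # the delay has elapsed; run the queued callbacks
    return hb.periodic_running
```

Without the check in `_really_start_hb`, the delayed callback would start heartbeating on a stream that has already been closed.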

IPython/frontend/html/notebook/notebookapp.py
((4 lines not shown))
+
+ @catch_config_error
+ def initialize(self, argv=None):
+ super(NotebookApp, self).initialize(argv)
+ self.init_configurables()
+ self.init_webapp()
+
+ def cleanup_kernels(self):
+ """shutdown all kernels
+
+ The kernels will shutdown themselves when this process no longer exists,
+ but explicit shutdown allows the KernelManagers to cleanup the connection files.
+ """
+ self.log.info('Shutting down kernels')
+ km = self.kernel_manager
+ while km.kernel_ids:
@fperez Owner
fperez added a note

I'm wondering if there should be a safety exit here. While loops whose exit condition might not be triggered are dangerous. If for some reason the kernel_ids list doesn't fully empty out, this guy will hang forever.

Perhaps this logic would be cleaner as

for i in range(3): # try no more than 3 times
  for k in km.kernel_ids:
    km.kill_kernel(k)
  if not km.kernel_ids:
    break
else:
  self.log.warn('Unkillable kernels...')

What do you think?

@minrk Owner
minrk added a note

km.kill_kernel is the explicit method for deleting kernel ids, and cannot fail to do so without actually raising and exiting the loop. Your proposal as it is will not work because kill_kernel changes km.kernel_ids, so a copy would have to be made. But if I do make the copy, I don't think there's any need to try multiple times:

for k in list(km.kernel_ids):
    km.kill_kernel(k)

should do just fine.
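For anyone following along, a self-contained sketch of the copy-before-iterating point. `ToyManager` is hypothetical and deliberately exposes a live view of its keys (as `dict.keys()` does on Python 3), which is an assumption about the container rather than a description of the real KernelManager.

```python
class ToyManager:
    """Hypothetical manager whose kernel_ids is a live view of a dict."""

    def __init__(self, ids):
        self._kernels = dict.fromkeys(ids)

    @property
    def kernel_ids(self):
        return self._kernels.keys()

    def kill_kernel(self, kernel_id):
        del self._kernels[kernel_id]


def kill_all_without_copy(km):
    for kernel_id in km.kernel_ids:  # mutated while being iterated
        km.kill_kernel(kernel_id)


def kill_all_with_copy(km):
    # copy first, since kill_kernel deletes keys
    for kernel_id in list(km.kernel_ids):
        km.kill_kernel(kernel_id)
```

On Python 3, `kill_all_without_copy` raises `RuntimeError` as soon as the dict changes size mid-iteration, while `kill_all_with_copy` empties the manager in a single pass.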

@fperez Owner
fperez added a note

Ah, thanks for the clarification. Then yes, I think that's cleaner-looking code; if nothing else, it's obvious that it can't get stuck forever in a while loop, even to someone who doesn't know how the kill_kernel function behaves internally.

Other than this, I think it's good to go. I checked the behavior and read the rest of the code and see no other issues. Thanks!

@minrk Owner
minrk added a note

Makes sense. Change above has been pushed.

@minrk minrk explicit for-loop in cleanup_kernels
makes single-iteration clearer than while loop, as reviewed by @fperez.
0f2a7ee
@fperez
Owner

I'm glad the function is called kill_kernel and not kill, otherwise that last line would read pretty darkly ;) Thanks!

Owner

Indeed it would. I hadn't seen that.

@fperez fperez merged commit e73fe99 into ipython:master
@fperez fperez referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
Commits on Dec 20, 2011
  1. @minrk

    cleanup connection files on notebook shutdown

    minrk authored
    Kernels would not linger, but the KernelManagers are not garbage-collected on shutdown.
    This means that connection files for kernels still running at notebook shutdown would not be removed.
    
    Also disable the unnecessary (and actively unhelpful) SIGINT handler inherited from the original
    copy/paste from the qt app.
Commits on Dec 21, 2011
  1. @minrk

    flush stdout/err after init_code

    minrk authored
    prevents startup script/file output from being attached to the first cell of a frontend
Commits on Dec 23, 2011
  1. @minrk

    add first_beat delay to notebook heartbeats

    minrk authored
    Heartbeats start immediately, causing false heart failures on slow systems that can take a while to start kernel subprocesses.
    
    Also adds a 'flush' to the heartbeat callback (just like in IPython.parallel), to protect against server load being detected as heart failures.
Commits on Jan 5, 2012
  1. @minrk
Commits on Jan 6, 2012
  1. @minrk
  2. @minrk

    explicit for-loop in cleanup_kernels

    minrk authored
    makes single-iteration clearer than while loop, as reviewed by @fperez.
4 IPython/core/shellapp.py
@@ -184,6 +184,10 @@ def init_code(self):
self._run_exec_files()
self._run_cmd_line_code()
+ # flush output, so it won't be attached to the first cell
+ sys.stdout.flush()
+ sys.stderr.flush()
+
# Hide variables defined here from %who etc.
self.shell.user_ns_hidden.update(self.shell.user_ns)
20 IPython/frontend/html/notebook/handlers.py
@@ -18,6 +18,7 @@
import logging
import Cookie
+import time
import uuid
from tornado import web
@@ -412,6 +413,7 @@ def on_first_message(self, msg):
return
km = self.application.kernel_manager
self.time_to_dead = km.time_to_dead
+ self.first_beat = km.first_beat
kernel_id = self.kernel_id
try:
self.iopub_stream = km.create_iopub_stream(kernel_id)
@@ -446,6 +448,7 @@ def start_hb(self, callback):
self._kernel_alive = True
def ping_or_dead():
+ self.hb_stream.flush()
if self._kernel_alive:
self._kernel_alive = False
self.hb_stream.send(b'ping')
@@ -455,25 +458,36 @@ def ping_or_dead():
except:
pass
finally:
- self._hb_periodic_callback.stop()
+ self.stop_hb()
def beat_received(msg):
self._kernel_alive = True
self.hb_stream.on_recv(beat_received)
- self._hb_periodic_callback = ioloop.PeriodicCallback(ping_or_dead, self.time_to_dead*1000)
- self._hb_periodic_callback.start()
+ loop = ioloop.IOLoop.instance()
+ self._hb_periodic_callback = ioloop.PeriodicCallback(ping_or_dead, self.time_to_dead*1000, loop)
+ loop.add_timeout(time.time()+self.first_beat, self._really_start_hb)
self._beating= True
+
+ def _really_start_hb(self):
+ """callback for delayed heartbeat start
+
+ Only start the hb loop if we haven't been closed during the wait.
+ """
+ if self._beating and not self.hb_stream.closed():
+ self._hb_periodic_callback.start()
def stop_hb(self):
"""Stop the heartbeating and cancel all related callbacks."""
if self._beating:
+ self._beating = False
self._hb_periodic_callback.stop()
if not self.hb_stream.closed():
self.hb_stream.on_recv(None)
def kernel_died(self):
self.application.kernel_manager.delete_mapping_for_kernel(self.kernel_id)
+ self.application.log.error("Kernel %s failed to respond to heartbeat", self.kernel_id)
self.write_message(
{'header': {'msg_type': 'status'},
'parent_header': {},
3  IPython/frontend/html/notebook/kernelmanager.py
@@ -195,7 +195,10 @@ class MappingKernelManager(MultiKernelManager):
kernel_argv = List(Unicode)
kernel_manager = Instance(KernelManager)
+
time_to_dead = Float(3.0, config=True, help="""Kernel heartbeat interval in seconds.""")
+ first_beat = Float(5.0, config=True, help="Delay (in seconds) before sending first heartbeat.")
+
max_msg_size = Integer(65536, config=True, help="""
The max raw message size accepted from the browser
over a WebSocket connection.
42 IPython/frontend/html/notebook/notebookapp.py
@@ -303,9 +303,6 @@ def parse_command_line(self, argv=None):
self.kernel_argv.append("--KernelApp.parent_appname='%s'"%self.name)
def init_configurables(self):
- # Don't let Qt or ZMQ swallow KeyboardInterupts.
- signal.signal(signal.SIGINT, signal.SIG_DFL)
-
# force Session default to be secure
default_secure(self.config)
# Create a KernelManager and start a kernel.
@@ -322,11 +319,9 @@ def init_logging(self):
# self.log is a child of. The logging module dipatches log messages to a log
# and all of its ancenstors until propagate is set to False.
self.log.propagate = False
-
- @catch_config_error
- def initialize(self, argv=None):
- super(NotebookApp, self).initialize(argv)
- self.init_configurables()
+
+ def init_webapp(self):
+ """initialize tornado webapp and httpserver"""
self.web_app = NotebookWebApplication(
self, self.kernel_manager, self.notebook_manager, self.log,
self.webapp_settings
@@ -339,7 +334,7 @@ def initialize(self, argv=None):
ssl_options = None
self.web_app.password = self.password
self.http_server = httpserver.HTTPServer(self.web_app, ssl_options=ssl_options)
- if ssl_options is None and not self.ip:
+ if ssl_options is None and not self.ip and not (self.read_only and not self.password):
self.log.critical('WARNING: the notebook server is listening on all IP addresses '
'but not using any encryption or authentication. This is highly '
'insecure and not recommended.')
@@ -357,6 +352,24 @@ def initialize(self, argv=None):
else:
self.port = port
break
+
+ @catch_config_error
+ def initialize(self, argv=None):
+ super(NotebookApp, self).initialize(argv)
+ self.init_configurables()
+ self.init_webapp()
+
+ def cleanup_kernels(self):
+ """shutdown all kernels
+
+ The kernels will shutdown themselves when this process no longer exists,
+ but explicit shutdown allows the KernelManagers to cleanup the connection files.
+ """
+ self.log.info('Shutting down kernels')
+ km = self.kernel_manager
+ # copy list, since kill_kernel deletes keys
+ for kid in list(km.kernel_ids):
+ km.kill_kernel(kid)
def start(self):
ip = self.ip if self.ip else '[all ip addresses on your system]'
@@ -371,15 +384,20 @@ def start(self):
b = lambda : webbrowser.open("%s://%s:%i" % (proto, ip, self.port),
new=2)
threading.Thread(target=b).start()
-
- ioloop.IOLoop.instance().start()
+ try:
+ ioloop.IOLoop.instance().start()
+ except KeyboardInterrupt:
+ info("Interrupted...")
+ finally:
+ self.cleanup_kernels()
+
#-----------------------------------------------------------------------------
# Main entry point
#-----------------------------------------------------------------------------
def launch_new_instance():
- app = NotebookApp()
+ app = NotebookApp.instance()
app.initialize()
app.start()