Kernel Has Died error in Notebook #1198

anderwm · 2011-12-22T20:29:45Z

I get this message occasionally at seemingly random times during a notebook session. Although the kernel restarts fine, without a more interesting error message there is not a lot I can do. I am sure this was added to recover from errors without much effort from the end user, but is there somewhere I can look to see who is the serial killer that keeps murdering my beloved kernels?

minrk · 2011-12-23T03:32:31Z

The odds are that your kernel has not died, and the bug is a false positive in the heartbeat code.

Can you post:

pyzmq version (zmq.__version__)
libzmq version (zmq.zmq_version())
Python version
OS

Can you describe what you are typically doing when this happens? Are you using pylab mode?

anderwm · 2011-12-23T04:00:50Z

zmq.__version__
Out[6]: '2.1.11'

zmq.zmq_version()
Out[7]: '2.1.11'

Python version 2.7 on Windows XP 32 bit

The most repeatable way it happens is when first loading a notebook from the dashboard: about 5 seconds after I see the cells load, I get the dead kernel message. If I x out of the window (choose nothing), I am not able to execute code. If I restart the kernel it works fine, and if I close the notebook and open it back up it does not happen again. Yes, I am using pylab mode in general, but this happens without it enabled as well.

minrk · 2011-12-23T04:24:49Z

Okay, thanks.

Try increasing the heartbeat period by setting MappingKernelManager.time_to_dead=10, either in config or at the command-line.
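
For reference, a sketch of the config-file form of this setting. It assumes the line goes in the notebook config file of your profile (for example ipython_notebook_config.py; the exact file name and location depend on your IPython version and profile, so treat that as an assumption):

```python
# Config fragment, not a standalone script: get_config() is provided by
# IPython when it loads a config file.
c = get_config()

# Allow 10 seconds between heartbeats before the kernel is treated as dead.
c.MappingKernelManager.time_to_dead = 10.0
```

The command-line equivalent follows the usual --Class.trait=value pattern: ipython notebook --MappingKernelManager.time_to_dead=10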

anderwm · 2011-12-23T04:59:52Z

On first test that seems to work. I'll play around with it some more tomorrow. This computer is a bit old/slow; is the kernel manager tripping because it is taking a while?

minrk · 2011-12-23T06:08:57Z

Without injecting some debugging statements it's a bit hard to tell, but the way the heartbeat works in the notebook is this:

The kernel manager sends a ping every so often (in this case every time_to_dead seconds), and if it doesn't receive a reply by the time it would send out the next ping, it believes the kernel is dead. The kernel remains responsive, even during GIL-holding blocking code, because its responder is actually an extremely tiny pure-C 0MQ thread.

Looking at the code, I can see two problems:

1) (This is most likely the cause of your issue.) The heartbeat mechanism starts right away when the kernel subprocess is initiated. So if the kernel isn't up and running and responsive within time_to_dead, then it will fail to make the first heartbeat, and essentially be treated as DOA.
2) (Unlikely issue, but still a bug.) If the server is slow, it might queue up the heart-failed action while a heartbeat reply is waiting in the queue (hb_stream is not flushed, as it is in the parallel code's more elaborate heartbeat mechanism). But that would require some seriously heavy load on the server.

I was able to replicate (as far as I can tell) your issue by cutting the heartbeat time down to 300 milliseconds, so my notebook keeps seeing dead kernels. But if I delay the first heartbeat for the original 3 seconds, I once again have a perfectly responsive kernel. So the solution here is to allow a user-configurable delay on the first beat, and it should probably start out around 5s, which should cover any normal environment.
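
The startup failure described above can be illustrated with a small timing simulation. This is only a sketch of the logic, not the notebook's actual heartbeat code: simulate_heartbeat, kernel_ready_at, first_beat, and n_beats are hypothetical names, and it assumes that a ping sent before the kernel is up gets answered as soon as the kernel becomes responsive.

```python
def simulate_heartbeat(kernel_ready_at, time_to_dead, first_beat=0.0, n_beats=5):
    """Return the time at which the kernel is declared dead, or None.

    kernel_ready_at: seconds after launch when the kernel starts answering.
    time_to_dead:    seconds between pings (also the reply deadline).
    first_beat:      delay before the first ping (hypothetical parameter).
    """
    ping_time = first_beat
    for _ in range(n_beats):
        # The reply must arrive before the next ping would be sent.
        deadline = ping_time + time_to_dead
        # Assumption: a queued ping is answered once the kernel is up.
        reply_time = max(ping_time, kernel_ready_at)
        if reply_time > deadline:
            return deadline  # missed the beat: kernel treated as dead
        ping_time = deadline
    return None


# A kernel that needs 4 s to start, checked every 3 s from launch.
print(simulate_heartbeat(kernel_ready_at=4.0, time_to_dead=3.0))
# The same kernel with the first beat delayed by 5 s.
print(simulate_heartbeat(kernel_ready_at=4.0, time_to_dead=3.0, first_beat=5.0))
```

In this model the first call reports a death at 3.0 seconds even though the kernel would have been fine a second later, while the second call returns None. Raising time_to_dead to 10 also returns None, which matches the workaround suggested earlier.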

minrk · 2011-12-23T06:50:05Z

I added what should be a fix to my existing notebook PR #1187, if you want to try that out.

ellisonbg · 2011-12-23T07:00:42Z

I too see this occasionally but have no idea what is causing it.
What version of pyzmq are you using?

Cheers,

Brian

Brian E. Granger
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu and ellisonbg@gmail.com

minrk · 2011-12-23T17:44:21Z

@ellisonbg - it's most likely that it's taking more than one heartbeat cycle for the kernel to start. See above comments and link for discussion and a proposed fix.

anderwm · 2011-12-23T18:16:43Z

This is probably not appropriate here, but I am new to Git. Is the best way to try your code:

1) make your fork a remote
2) pull the branch that you committed to (nbShutdown I believe)
3) run ipython from inside my local folder (from the git clone)

or is there a more effective mechanism, or hit me with a link to read

minrk · 2011-12-23T18:21:08Z

Yes, that's exactly right. For more specific steps:

git remote add minrk git://github.com/minrk/ipython.git
git fetch minrk
git checkout -b nbshutdown minrk/nbshutdown
python ipython.py notebook --pylab inline --notebook-dir=/path/to/your/notebooks

anderwm · 2011-12-23T18:25:15Z

And just to make sure the comparison is apt,

If I run

ipython notebook

from any other directory it will be the original code...like from site-packages?

or I should say, it will get the code from site-packages/ipython/whatever

minrk · 2011-12-23T18:30:44Z

Yes, unless you have done a 'dev' install, either with python setupegg.py develop or using symlinks so that site-packages points to the git tree. I recommend doing one of these for any project for which you actually plan to track the development version.
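
One way to check which copy of a package Python actually picks up is to look at the imported module's __file__ attribute. A minimal sketch: module_path is a hypothetical helper, and json is used only as a stand-in so the example runs without IPython installed.

```python
import importlib


def module_path(name):
    """Return the file a module is loaded from, or None if it has no file."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


# Stand-in module; swap in "IPython" to see which install is being used.
print(module_path("json"))
```

If module_path("IPython") points into site-packages you are running the installed release; if it points into the git clone you are running the development tree.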

anderwm · 2011-12-23T23:48:17Z

So far I have not been able to recreate the error with the new code, so that's a good sign. I will continue to test it as I have time.

minrk · 2012-01-06T10:06:48Z

closed by PR #1187

minrk mentioned this issue Dec 23, 2011

misc notebook: connection file cleanup, first heartbeat, startup flush #1187

Merged

minrk closed this as completed Jan 6, 2012

helgammal mentioned this issue Feb 25, 2015

IPython notebook in tutorials report that the 'kernel died' BIDData/BIDMach#31

Closed
