I get this message occasionally at seemingly random times during a notebook session. Although the kernel restarts fine, without a more interesting error message there is not a lot I can do. I am sure this was added to recover from errors without much effort from the end user, but is there somewhere I can look to see who is the serial killer that keeps murdering my beloved kernels?
The odds are that your kernel has not died, and the bug is a false positive in the heartbeat code.
Can you post:
pyzmq version (zmq.__version__)
libzmq version (zmq.zmq_version())
Can you describe what you are typically doing when this happens? Are you using pylab mode?
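For anyone following along, the requested versions can be collected in one go. This is only a small sketch: the helper name zmq_versions is made up for illustration, while zmq.__version__ and zmq.zmq_version() are the two calls named above.

```python
import sys


def zmq_versions():
    """Return (pyzmq version, libzmq version), or None if pyzmq is not importable."""
    try:
        import zmq
    except ImportError:
        return None
    return zmq.__version__, zmq.zmq_version()


print("Python:", sys.version.split()[0])
print("pyzmq / libzmq:", zmq_versions())
```

Run it with the same Python interpreter that launches the notebook, so the versions reported are the ones the kernel actually uses.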
Python version 2.7 on Windows XP 32 bit
The most repeatable way it happens is when first loading a notebook from the dashboard: about 5 seconds after I see the cells load, I get the dead kernel message. If I x out the window (choosing nothing), I am not able to execute code. If I restart the kernel it works fine, and if I close the notebook and open it back up it does not happen again. Yes, I am using pylab mode in general, but this happens without it enabled as well.
Try increasing the heartbeat period by setting MappingKernelManager.time_to_dead=10, either in config or at the command-line.
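As a rough sketch of the config route, the setting could go in the notebook config file of your profile. The file name ipython_notebook_config.py is an assumption about your setup; get_config() is provided by IPython when it loads the file, so this fragment is not meant to run on its own.

```python
# ipython_notebook_config.py (assumed file name, inside your IPython profile directory)
c = get_config()  # injected by IPython's config loader, not importable elsewhere

# Wait up to 10 seconds for a heartbeat reply before treating the kernel as dead.
c.MappingKernelManager.time_to_dead = 10.0
```

The command-line equivalent would be: ipython notebook --MappingKernelManager.time_to_dead=10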
On first test that seems to work. I'll play around with it some more tomorrow. This computer is a bit old/slow; is the kernel manager tripping because the kernel is taking a while?
Without injecting some debugging statements it's a bit hard to tell, but the way the heartbeat works in the notebook is this:
The kernel manager sends a ping every so often (here, every time_to_dead seconds), and if it doesn't receive a reply by the time it would send out the next ping, it believes the kernel is dead. The kernel remains responsive to pings, even during GIL-holding blocking code, because its responder is actually an extremely tiny pure-C 0MQ thread.
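To make the rule concrete, here is a toy model of the check, not the notebook's actual code. The function name and the list of reply latencies are invented for illustration; only time_to_dead comes from the discussion.

```python
def first_missed_beat(reply_latencies, time_to_dead):
    """Return the index of the first ping that counts as missed, or None.

    reply_latencies[i] is the number of seconds the reply to ping i took;
    None means the reply never arrived. A reply is too late if it has not
    arrived by the time the next ping goes out, time_to_dead seconds later.
    """
    for i, latency in enumerate(reply_latencies):
        if latency is None or latency >= time_to_dead:
            return i
    return None


# A kernel that answers quickly, except for one 5-second stall.
latencies = [0.01, 0.02, 5.0, 0.01]
print(first_missed_beat(latencies, time_to_dead=3.0))   # 2
print(first_missed_beat(latencies, time_to_dead=10.0))  # None
```

In this model, raising time_to_dead from 3 to 10 is enough to ride out the stall, which is the effect the suggested setting is after.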
Looking at the code, I can see two problems:
1) (this is most likely the cause of your issue) The heartbeat mechanism starts right away when the kernel subprocess is launched. So if the kernel isn't up, running, and responsive within time_to_dead, it will fail to make the first heartbeat and essentially be treated as DOA.
2) (unlikely issue, but still a bug) If the server is slow, it might queue up the heart-failed action while a heartbeat reply is still waiting in the queue (hb_stream is not flushed, as it is in the parallel code's more elaborate heartbeat mechanism). But that would require some seriously heavy load on the server.
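The second problem can be pictured with a toy monitor. This is a hypothetical model, not IPython's class: the names HeartMonitor, inbox, and on_deadline are made up, and the deque stands in for replies that have arrived on the socket but have not been processed yet.

```python
from collections import deque


class HeartMonitor:
    """Toy model of the monitoring side of a heartbeat (illustration only)."""

    def __init__(self):
        self.inbox = deque()    # replies received but not yet processed
        self.got_reply = False  # set once a reply has been processed

    def process_inbox(self):
        """Handle every reply that is already waiting."""
        while self.inbox:
            self.inbox.popleft()
            self.got_reply = True

    def on_deadline(self, flush=False):
        """Return True if the kernel is considered dead at the deadline."""
        if flush:
            # Drain pending replies first, so a late-processed reply still counts.
            self.process_inbox()
        return not self.got_reply


# A reply is sitting in the queue when the deadline callback fires.
busy = HeartMonitor()
busy.inbox.append(b"ping")
print(busy.on_deadline(flush=False))  # True: false alarm

flushed = HeartMonitor()
flushed.inbox.append(b"ping")
print(flushed.on_deadline(flush=True))  # False: reply was seen in time
```

The kernel answered in both cases; only the order in which the server handles the reply and the deadline differs.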
I was able to replicate your issue (as far as I can tell) by cutting the heartbeat time down to 300 milliseconds, so my notebook keeps seeing dead kernels. But if I delay the first heartbeat for the original 3 seconds, I once again have a perfectly responsive kernel. So the solution here is that we need to allow a user-configurable delay on the first beat, and it should probably start out around 5s, which should cover any normal environment.
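The effect of delaying the first beat can be sketched with a simplified timing model. This is not the code in the PR: the function name, the first_beat_delay parameter, and the 1-second startup time are assumptions chosen to mirror the experiment described above.

```python
def declared_dead(startup_time, time_to_dead, first_beat_delay=0.0):
    """Toy model: is the kernel written off before it answers its first ping?

    The first ping goes out first_beat_delay seconds after launch. The kernel
    can only answer once it has finished starting (startup_time seconds after
    launch), and the answer must arrive before the next ping, which goes out
    time_to_dead seconds after the first one.
    """
    reply_at = max(startup_time, first_beat_delay)
    deadline = first_beat_delay + time_to_dead
    return reply_at >= deadline


# Kernel needs 1 s to start (assumed), heartbeat cut down to 300 ms: treated as DOA.
print(declared_dead(startup_time=1.0, time_to_dead=0.3))  # True
# Same heartbeat, but the first beat is delayed by 3 s: the kernel survives.
print(declared_dead(startup_time=1.0, time_to_dead=0.3, first_beat_delay=3.0))  # False
```

Under this model a slow machine fails whenever startup takes longer than time_to_dead, and either a longer time_to_dead or a delayed first beat removes the false alarm.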
I added what should be a fix to my existing notebook PR #1187, if you want to try that out.
@ellisonbg - it's most likely that it's taking more than one heartbeat cycle for the kernel to start. See above comments and link for discussion and a proposed fix.
This is probably not appropriate here, but I am new to Git. Is the best way to try your code:
1) make your fork a remote
2) pull the branch that you committed to (nbShutdown I believe)
3) run ipython from inside my local folder (from the git clone)
or is there a more effective mechanism, or hit me with a link to read
Yes, that's exactly right. For more specific steps:
git remote add minrk git://github.com/minrk/ipython.git
git fetch minrk
git checkout -b nbshutdown minrk/nbshutdown
python ipython.py notebook --pylab inline --notebook-dir=/path/to/your/notebooks
And just to make sure the comparison is apt,
If I run
from any other directory it will be the original code...like from site_packages?
or I should say, it will get the code from site_packages/ipython/whatever
Yes, unless you have done a 'dev' install, either with python setupegg.py develop or using symlinks so that site-packages points to the git tree. I recommend doing one of these for any project for which you actually plan to track the development version.
python setupegg.py develop
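If you are ever unsure which copy of a package a given directory picks up, one quick check is to ask Python where it imported the module from. The helper name module_location is made up for this sketch, and json is used only so the example runs anywhere; substitute "IPython" to check your own install.

```python
import importlib


def module_location(name):
    """Import the named module and return the file it was loaded from."""
    module = importlib.import_module(name)
    return getattr(module, "__file__", None)


# Prints a path under site-packages for a regular install, or a path inside
# your git clone when running from the clone or after a 'dev' install.
print(module_location("json"))
```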
So far I have not been able to recreate the error with the new code, so that's a good sign. I will continue to test it as I have time.
closed by PR #1187