
Jupyter notebook died #149

Closed
s041629 opened this issue Sep 19, 2016 · 61 comments

Comments

@s041629

s041629 commented Sep 19, 2016

Hi,
I am trying to run one of my Jupyter notebooks, but when I execute any command (even just 1+1), I get the message:

Kernel Restarting
The kernel appears to have died. It will restart automatically.

Looking in the log from the command line, I get this message:

Error in unlockBinding("base_display", displayenv) :
no binding for "base_display"
Calls: ... -> handle_shell -> -> unlockBinding
Execution halted
[I 08:26:03.366 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel 044080e4-d152-4c4b-8bb2-46d2b4b1d9fe restarted

What should I do?

Thanks!

@tatarsky
Collaborator

I do not know. I would make sure all your processes involved in it are dead and then restart from scratch.

@s041629
Author

s041629 commented Sep 19, 2016

I have done that: I tried disconnecting from the server and reconnecting, and I tried killing all my processes (with kill -9 -1), but nothing seems to work.

I have also tried different notebooks, but I always get the same message.

@tatarsky
Collaborator

Then when I have time I will take a look. That won't be for a while.

@tatarsky
Collaborator

Also, you didn't include the hostname of the system you are doing this on. Please confirm where you are running the program, and the path and method you use to start it, for when I look into this.

@s041629
Author

s041629 commented Sep 19, 2016

I am working on flh1 and running the notebook by launching jupyter notebook, as I have always done. Am I the only one to have this problem today?

@tatarsky
Collaborator

Are you quite sure you've killed all instances?

I believe your username is djakubosky and UID=1043 and I show numerous ipython processes as your account:

1043     30722  0.0  0.0 733760 26276 ?        Ssl  Aug11   2:19 /frazer01/home/djakubosky/software/anaconda/bin/python -m ipykernel -f /frazer01/home/djakubosky/.local/share/jupyter/runtime/
1043     30786  0.0  0.1 2526256 300508 ?      Ssl  Aug29   1:29 /frazer01/home/djakubosky/software/anaconda/bin/python -m ipykernel -f /frazer01/home/djakubosky/.local/share/jupyter/runtime/
1043     30925  0.0  0.0 106568   464 pts/5    Ss+  Aug16   0:00 -bash
1043     30943  0.0  0.0 733760 28148 ?        Ssl  Jul13   4:18 /frazer01/home/djakubosky/software/anaconda/bin/python -m ipykernel -f /frazer01/home/djakubosky/.local/share/jupyter/runtime/
1043     32464  0.0  0.0 733760 26188 ?        Ssl  Jul27   3:24 /frazer01/home/djakubosky/software/anaconda/bin/python -m ipykernel -f /frazer01/home/djakubosky/.local/share/jupyter/runtime/

If you would like I can kill all of those but you should be able to as well.
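
For anyone wanting to check this themselves, here is a minimal sketch that lists your own kernel processes together with how long they have been running. It assumes kernel command lines contain `ipykernel` (as in the listing above) or `IRkernel` for R kernels; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: list the current user's Jupyter kernel processes.
# Assumption: kernel command lines contain "ipykernel" (Python) or "IRkernel" (R).

# Keep only kernel processes from a ps-style listing read on stdin.
# The bracketed first letters stop a grep process from matching itself.
filter_kernels() {
    grep -E '[i]pykernel|[I]Rkernel' || true   # no match is not an error
}

# Print PID, elapsed running time, and command line of the caller's kernels.
list_my_kernels() {
    ps -u "$(id -un)" -o pid=,etime=,args= | filter_kernels
}

# Usage:
#   list_my_kernels
```

Once stale PIDs are identified, a plain `kill <pid>` is gentler than `kill -9 -1`, which signals every process you own, including your login shell.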

@s041629
Author

s041629 commented Sep 19, 2016

I am matteo and yes, I killed all jobs

@tatarsky
Collaborator

Sorry, on fl-hn1 I do not show you killed all your jobs. The process table continues to have those items above with your user id. I am going to kill them for you.

@tatarsky
Collaborator

Oh wait, I've got you confused with somebody else. One moment.

@tatarsky
Collaborator

So fl-hn1 does seem to be a bit memory loaded, but I'm not sure whether that is your issue. Please ask the others to consider killing off the NUMEROUS notebooks running on fl-hn1 if they are not in use, and we'll go from there.

I show over 163 processes with notebooks attached. Those can't all be active.
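
A count like this can be broken down per user. The sketch below is one way to do it, again assuming kernel command lines contain `ipykernel` or `IRkernel`; `count_kernels_by_user` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count Jupyter kernel processes per user.
# Assumption: kernel command lines contain "ipykernel" or "IRkernel".

# Read "user command..." lines on stdin; print "<count> <user>", busiest first.
count_kernels_by_user() {
    awk '/[i]pykernel|[I]Rkernel/ { n[$1]++ }
         END { for (u in n) print n[u], u }' | sort -k1,1nr -k2,2
}

# Usage on a shared node:
#   ps -eo user=,args= | count_kernels_by_user
```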

@tatarsky
Collaborator

mdonovan and hel070 have some notebooks as old as May.

It would be nice to kill those off if not active.

@tatarsky
Collaborator

So if I su to you and run the command:

$ jupyter notebook
[I 09:12:24.814 NotebookApp] Serving notebooks from local directory: /frazer01/home/matteo
[I 09:12:24.814 NotebookApp] 0 active kernels 
[I 09:12:24.814 NotebookApp] The IPython Notebook is running at: https://[all ip addresses on your system]:7757/
[I 09:12:24.814 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
^C[I 09:12:32.574 NotebookApp] interrupted
Serving notebooks from local directory: /frazer01/home/matteo
0 active kernels 
The IPython Notebook is running at: https://[all ip addresses on your system]:7757/
Shutdown this notebook server (y/[n])? y
[C 09:12:35.493 NotebookApp] Shutdown confirmed
[I 09:12:35.494 NotebookApp] Shutting down kernels

A. Is that working?

And if not, what are you seeing? Feel free to paste full output.

@s041629
Author

s041629 commented Sep 19, 2016

this is what I get when I start, open a notebook, try to execute a command (the kernel dies) and exit:

-bash-4.1$ jupyter notebook
[I 09:14:34.275 NotebookApp] Serving notebooks from local directory: /frazer01/home/matteo
[I 09:14:34.275 NotebookApp] 0 active kernels
[I 09:14:34.275 NotebookApp] The IPython Notebook is running at: https://[all ip addresses on your system]:7757/
[I 09:14:34.275 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 09:15:11.302 NotebookApp] Kernel started: b950d6a6-b307-421c-b4ef-0549bba812ff
[1] "Got unhandled msg_type:" "comm_open"
Error in unlockBinding("base_display", displayenv) :
no binding for "base_display"
Calls: ... -> handle_shell -> -> unlockBinding
Execution halted
[I 09:15:20.303 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel b950d6a6-b307-421c-b4ef-0549bba812ff restarted
^C[I 09:15:28.590 NotebookApp] interrupted
Serving notebooks from local directory: /frazer01/home/matteo
1 active kernels
The IPython Notebook is running at: https://[all ip addresses on your system]:7757/
Shutdown this notebook server (y/[n])? ^C[C 09:15:28.884 NotebookApp] received signal 2, stopping
[I 09:15:28.885 NotebookApp] Shutting down kernels
[I 09:15:28.986 NotebookApp] Kernel shutdown: b950d6a6-b307-421c-b4ef-0549bba812ff

thanks

@tatarsky
Collaborator

While I attempt to decipher that, please try the exact same thing on fl-hn2 to determine whether it's something in your environment or a system resource problem (which I am suspecting at the moment).

@s041629
Author

s041629 commented Sep 19, 2016

I get the same error message:

-bash-4.1$ jupyter notebook
[I 09:24:10.125 NotebookApp] Serving notebooks from local directory: /frazer01/home/matteo
[I 09:24:10.126 NotebookApp] 0 active kernels
[I 09:24:10.126 NotebookApp] The IPython Notebook is running at: https://[all ip addresses on your system]:7757/
[I 09:24:10.126 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 09:24:39.911 NotebookApp] 302 GET /tree (132.239.163.3) 1.71ms
[I 09:24:44.533 NotebookApp] 302 POST /login?next=%2Ftree (132.239.163.3) 2.09ms
[I 09:24:50.966 NotebookApp] Kernel started: ab1f2416-28e0-4ab6-837f-35da3161a8c3
[1] "Got unhandled msg_type:" "comm_open"
Error in unlockBinding("base_display", displayenv) :
no binding for "base_display"
Calls: ... -> handle_shell -> -> unlockBinding
Execution halted
[I 09:24:56.923 NotebookApp] KernelRestarter: restarting kernel (1/5)
WARNING:root:kernel ab1f2416-28e0-4ab6-837f-35da3161a8c3 restarted
^C[I 09:25:03.763 NotebookApp] interrupted
Serving notebooks from local directory: /frazer01/home/matteo
1 active kernels
The IPython Notebook is running at: https://[all ip addresses on your system]:7757/
Shutdown this notebook server (y/[n])? ^C[C 09:25:03.929 NotebookApp] received signal 2, stopping
[I 09:25:03.930 NotebookApp] Shutting down kernels
[I 09:25:04.032 NotebookApp] Kernel shutdown: ab1f2416-28e0-4ab6-837f-35da3161a8c3
^C
-bash-4.1$

@tatarsky
Collaborator

Did anything change in your software tree? Nothing is known to have changed in the system environment.
Googling that error message has not been overly helpful.

Please ask if others see the same thing when convenient. I am looking for possible system level causes but not finding anything concrete yet.

@s041629
Author

s041629 commented Sep 19, 2016

I haven't changed anything since Friday (and everything was working fine then). I just arrived this morning, created a new notebook and started working on it.

@tatarsky
Collaborator

No idea what's wrong then. If others see the same issue, we can consider rebooting a head node to debug further.

@tatarsky
Collaborator

I'm going to strace the items from your account on fl-hn2. Please stand clear of that system....

@s041629
Author

s041629 commented Sep 19, 2016

it seems to be only my problem: Hurley says his notebooks work fine

@tatarsky
Collaborator

If you could place your notebook password into a file I can use to test, I would appreciate it. Make it readable only by your account. Alternatively, give me a way to contact you to get it. I am stracing your python to see if I spot anything odd.
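
A minimal sketch of creating such an owner-only file follows. `write_private_file` is a hypothetical helper name, and the file name in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# Sketch: store a secret in a file that only its owner can read.

# Write stdin to the given path with permissions 600 (owner read/write only).
write_private_file() {
    local path="$1"
    ( umask 077 && cat > "$path" )   # subshell keeps the umask change local
    chmod 600 "$path"                # also tightens a file that already existed
}

# Usage (the file name is only an example):
#   write_private_file "$HOME/password.txt"   # type the secret, then Ctrl-D
```

Other regular users cannot read a mode-600 file, while root (and therefore an administrator) still can.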

@tatarsky
Collaborator

Also, to separate problems of "the program" vs. "the data fed to the program", I sometimes try the same thing with a fresh (empty) data directory. I don't know how easy that is (sorry, I don't use notebooks), but if you could run your current notebook code against an essentially empty notebook directory, it would be a useful debug step.
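
One way to try this is to point the notebook server at a scratch directory. The sketch below uses the classic notebook server's `--notebook-dir` option; the helper names and the `nbtest` prefix are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: start Jupyter against a fresh, empty notebook directory.

# Create an empty scratch directory and print its path.
make_scratch_dir() {
    mktemp -d "${TMPDIR:-/tmp}/nbtest.XXXXXX"
}

# Launch the notebook server with the scratch directory as its root.
start_clean_notebook() {
    local dir
    dir="$(make_scratch_dir)" || return 1
    echo "Serving notebooks from $dir"
    jupyter notebook --notebook-dir="$dir"
}

# Usage:
#   start_clean_notebook
```

This isolates only the notebook files; kernels and configuration still come from the same account, so a failure that persists here points away from the notebooks themselves.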

@tatarsky
Collaborator

Closest match on the error BTW seems to be this.
conda/conda#3316

@joreynajr

Hey Matteo,

Just thought I would let you know that my notebook is working. Really does seem to be just in your account. Weird. If I can think of something that would fix it I'll let you know. I'm not sure right now. @djakubosky uses notebooks a ton. David, have you seen this at any point?

@s041629
Author

s041629 commented Sep 19, 2016

Hi Paul,
I have created a file password.txt in my home.
I have tried different browsers and creating a new notebook in a completely different folder, but nothing happens.

Thank you

@tatarsky
Collaborator

OK. I'm in as your account to a session running on fl-hn2. But I assume this still isn't what you are seeing as not working. Can you give me a simple test to reproduce your issue? I have so far "logged in" and am looking at list of your files.

@tatarsky
Collaborator

I opened "heatmap_david" and got some nice lists of R commands and a neato graph. But I'm unlikely to just click other items without your advice.

@tatarsky
Collaborator

Oh, and I believe you could also connect to the same instance:

https://fl-hn2.ucsd.edu:7757

I am connecting with a Firefox browser.

@s041629
Author

s041629 commented Sep 19, 2016

I tried both Chrome and Firefox and the problem remains. Python notebooks work fine, though, so it should be an R kernel problem. You can insert a new cell by clicking on the "+" sign and write anything ("1+1" is fine). You will see that the kernel dies when you try to execute it. I have created a new notebook that you can use to test in notebooks/for_Paul.
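
When only one kernel type fails, it can help to see which kernel specs are installed and where. The sketch below lists specs under a Jupyter data directory, assuming the usual `kernels/<name>/kernel.json` layout; `list_kernel_specs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: show which kernel specs exist under a Jupyter data directory.
# Assumption: each kernel lives in <data dir>/kernels/<name>/kernel.json.

# Print "<kernel name> <path to kernel.json>" for every spec under the directory.
list_kernel_specs() {
    local root="$1" spec
    for spec in "$root"/kernels/*/kernel.json; do
        [ -f "$spec" ] || continue            # glob matched nothing
        printf '%s %s\n' "$(basename "$(dirname "$spec")")" "$spec"
    done
}

# Usage (per-user data directory on Linux):
#   list_kernel_specs "$HOME/.local/share/jupyter"
# Jupyter's own view of the same information:
#   jupyter kernelspec list
```

The `argv` entry inside the R kernel's `kernel.json` shows which R binary the notebook actually launches, which is worth comparing with the R you expect to be using.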

@tatarsky
Collaborator

I am out BTW. I've killed the notebook I had on fl-hn2.

@s041629
Author

s041629 commented Sep 19, 2016

OK. I will look into the GitHub issue and will then also try to re-install the R kernel and see if this solves the problem.

Thanks

@tatarsky
Collaborator

If I spot anything else I will let you know. Keep it open.

@tatarsky
Collaborator

So somebody in this Git issue claims it's a version mismatch.

IRkernel/IRkernel#204

There is a small script a person claims on August 4th fixed it for them.

IRkernel/IRkernel#204 (comment)

@tatarsky
Collaborator

Note also the comment down below it "unlockBinding("base_display", displayenv) isn’t part of any code we have anymore. you somehow still have an old version…"

This leans toward some kind of version mismatch again. No clue why it would suddenly appear for your code.
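
One way such a mismatch can arise (an assumption here, not something this thread confirms) is having more than one copy of a package in different R library directories. The sketch below reports the version of each copy it finds by reading the `Version:` field of the package's `DESCRIPTION` file; `find_pkg_copies` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: find every installed copy of an R package and print its version.
# An installed R package keeps its metadata in <library>/<package>/DESCRIPTION.

# find_pkg_copies <package> <library dir>...; prints "<version> <library dir>".
find_pkg_copies() {
    local pkg="$1"; shift
    local lib desc
    for lib in "$@"; do
        desc="$lib/$pkg/DESCRIPTION"
        if [ -f "$desc" ]; then
            printf '%s %s\n' "$(sed -n 's/^Version:[[:space:]]*//p' "$desc")" "$lib"
        fi
    done
}

# Usage (library paths come from R itself; breaks on paths containing spaces):
#   find_pkg_copies IRkernel $(Rscript -e 'cat(.libPaths())')
```

Two lines of output with different versions would suggest that an older copy is shadowing the newer one.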

@s041629
Author

s041629 commented Sep 20, 2016

So, I really don't understand why, but now it works fine.
I installed R in anaconda, then installed the notebook. At first it didn't work. I then disconnected everything and reconnected; the first attempt failed again, but on the second try I finally got 1+1 = 2.

However, I could not install the pbdZMQ package (which is required for the notebook, yet the notebook now works fine). I really have no idea why it works now.

@s041629
Author

s041629 commented Sep 20, 2016

and now it doesn't work again, with a different error:

Error in .External2(C_X11, paste("png::", filename, sep = ""), g$width, :
unable to start device PNG
Calls: ... evaluate -> dev.new -> do.call -> -> ok_device
In addition: Warning message:
In ok_device(filename, ...) : unable to open connection to X11 display ''
Execution halted

@tatarsky
Collaborator

Why would a web application be trying to connect to a remote X11 display?

Check the value of your DISPLAY environment variable and remove it if for some reason it's trying to spawn something back to your remote desktop, which is never going to work unless you've got X11 configured to allow such connections and an X11 server on your desktop.
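
A minimal sketch of that check follows, with a helper that starts a command with DISPLAY removed from its environment only. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report whether DISPLAY is set, and start a command without it.

# Describe the current state of the DISPLAY variable.
report_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to '$DISPLAY'"
    else
        echo "DISPLAY is empty or unset"
    fi
}

# Run any command with DISPLAY removed from that command's environment.
without_display() {
    env -u DISPLAY "$@"
}

# Usage:
#   report_display
#   without_display jupyter notebook
```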

@s041629
Author

s041629 commented Sep 20, 2016

my $DISPLAY is already empty.
I also tried to unset it, but I still get the same error

@tatarsky
Collaborator

OK. Well, somehow the PNG rendering of the server process thinks it's best to do it over X11.

The latency of Git is not really conducive to debugging items like this. Let's see if we can pick a time in the next few days when we could watch things at the same time. I have a few pre-scheduled meetings this week, but if you pick a time I can call or Skype or something. My work hours, however, are in the Central Time Zone (2 hours ahead of yours), so mornings are better.

@s041629
Author

s041629 commented Sep 20, 2016

Ok, let's aim for tomorrow at 11 am your time. I am out sick today.

Thanks!

@s041629
Author

s041629 commented Sep 20, 2016

my skype ID is matteo_dantonio

@tatarsky
Collaborator

11 my time will not work for me as I have a 10:30 switch project with another group at UCSD. I can Skype message when that is done....

@s041629
Author

s041629 commented Sep 20, 2016

that would be perfect. Thanks!

@tatarsky
Collaborator

Sorry, today is not likely to happen. The problems with the project I mentioned continue. I will update the Git issue when done, but that probably means tomorrow.

@s041629
Author

s041629 commented Sep 21, 2016

ok

@tatarsky
Collaborator

You will see a Skype request from devnull22563 in a moment. That is me.

@tatarsky
Collaborator

Also I've reproduced your item but I believe it to be related to this Git issue for your IRkernel settings. I would try these items mentioned:

IRkernel/IRkernel#388

Down at the bottom is a user setting that appears to say "don't try X11 PNG items".

In that case it would be your config, not root's, as you run these under your own UID.

I could attempt that but will wait for you to try it so I don't mess up other notebooks.

@s041629
Author

s041629 commented Sep 22, 2016

I'm working on it

@tatarsky
Collaborator

Sounds good. Am around (eating lunch but around) if a shared screen helps. I will stand clear.

@s041629
Author

s041629 commented Sep 22, 2016

I followed the suggestions in the link you sent and I added options(device = "svg") to my .Rprofile.

Now I get a different error, which seems to be related to cairo:

Error in dev.control(displaylist = "enable") :
dev.control() called without an open graphics device
Calls: ... tryCatch -> tryCatchList -> evaluate -> dev.control
In addition: Warning messages:
1: In (function (filename = if (onefile) "Rplots.svg" else "Rplot%03d.svg", :
unable to load shared object '/frazer01/home/matteo/software/anaconda3/lib/R/library/grDevices/libs//cairo.so':
/frazer01/home/matteo/software/anaconda3/lib/R/library/grDevices/libs//cairo.so: cannot open shared object file: No such file or directory
2: In (function (filename = if (onefile) "Rplots.svg" else "Rplot%03d.svg", :
failed to load cairo DLL
Execution halted

I have found this solution: http://stackoverflow.com/questions/13793763/error-while-using-cairo-devices-in-r-on-ubuntu but it requires sudo

@s041629
Author

s041629 commented Sep 22, 2016

but apparently, if I set options(device = "pdf") everything works fine. I tried also to make a plot and it works
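
For reference, a minimal sketch of making that setting persistent without piling up duplicate lines in the profile. `set_r_device` is a hypothetical helper name; it only rewrites lines of the form `options(device = "...")`.

```shell
#!/usr/bin/env bash
# Sketch: set R's default graphics device in an .Rprofile-style file.

# Ensure the file contains exactly one options(device = "<dev>") line.
set_r_device() {
    local file="$1" dev="$2" tmp
    touch "$file"
    tmp="$(mktemp)" || return 1
    # Drop any earlier device setting, keep everything else.
    grep -v '^options(device *= *"' "$file" > "$tmp" || true
    printf 'options(device = "%s")\n' "$dev" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage:
#   set_r_device "$HOME/.Rprofile" pdf
```

To see which devices a given R build supports before choosing one, `Rscript -e 'capabilities()'` prints flags such as `png`, `cairo`, and `X11`.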

@tatarsky
Collaborator

We don't run Ubuntu. And even if we did, that fix is for a system R library, whereas you are running software you've built yourself.

Can you add cairo support carefully to your conda tree?

Saw second comment. Hooray?

@s041629
Author

s041629 commented Sep 22, 2016

yeah! I would say we are happy. Although I still don't understand what changed between Friday night and Monday morning to start getting this error

@tatarsky
Collaborator

No changes to system software occurred (or if it did, I didn't do it and yum didn't log anything...I don't believe anyone involved in this environment would modify items outside of yum)

But given that you have a self-contained Anaconda world there, it might be worth looking at the concept of conda environments to control versions and save off configs.

Alternately I have seen people once they get everything "clean" in their personal Anaconda tree tar'ing it up periodically as a backup.

Close if desired or can dig around your conda environment. I believe there are logs.
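
A minimal sketch of the periodic tar backup mentioned above. `backup_tree` is a hypothetical helper name and the paths in the usage comment are examples only.

```shell
#!/usr/bin/env bash
# Sketch: save a dated tarball of a directory tree (e.g. a personal Anaconda install).

# backup_tree <source dir> <backup dir>; prints the path of the archive created.
backup_tree() {
    local src="$1" dest="$2" name archive
    name="$(basename "$src")"
    archive="$dest/$name-$(date +%Y%m%d).tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps paths in the archive relative to the parent of the source tree.
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}

# Usage (paths are examples only):
#   backup_tree "$HOME/software/anaconda3" "$HOME/backups"
# A lighter alternative is to record package versions instead of the files:
#   conda env export > environment.yml
```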

@tatarsky
Collaborator

The only thing I see in what I believe is the "history" file is the previously mentioned zeromq update on the morning of the 19th. I believe it pulled in a ton of other updates as well... but I believe that was after you had issues, not before.
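
For anyone repeating this check, here is a minimal sketch of pulling the timestamps and commands out of that file. It assumes the file is `<prefix>/conda-meta/history` and that each transaction starts with a `==> <timestamp> <==` line followed by a `# cmd: ...` line; `summarize_history` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: summarize what conda did to an installation and when.
# Assumption: each transaction in the history file starts with a
# "==> <timestamp> <==" line followed by a "# cmd: ..." line.

# Print only the timestamp and command lines from a history file.
summarize_history() {
    grep -E '^(==> .* <==|# cmd: )' "$1"
}

# Usage (prefix path is an example only):
#   summarize_history "$HOME/software/anaconda3/conda-meta/history"
# conda can print a similar summary itself:
#   conda list --revisions
```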

@s041629
Author

s041629 commented Sep 22, 2016

that was probably when I tried to update the packages

@s041629
Author

s041629 commented Sep 22, 2016

In the end, I deleted anaconda, reinstalled it (with python 2.7), reinstalled the R kernel, and now everything works fine.

@s041629
Author

s041629 commented Sep 22, 2016

with this, I think we can close this issue

@s041629 s041629 closed this as completed Sep 22, 2016