Jupyter Lab stalls when choosing "Restart Kernel and Run All Cells..." #9008

Closed
DevinRouth opened this issue Sep 15, 2020 · 16 comments · Fixed by #9484
Labels
status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion.

Comments

@DevinRouth

Description

I recently needed to do a full reinstall of Jupyter Lab on macOS 10.15. I use Homebrew as my package manager, running Python 3.8.5.

Jupyter Lab installed correctly, as did the R kernel (which I use frequently). However, I noticed that whenever I choose the "Restart Kernel and Run All Cells..." option, the individual notebook within Jupyter Lab stalls until I restart the kernel. Manually selecting and running all cells runs the code without any errors or stalling.

Reproduce

I tried running the same notebook file by launching a classic Jupyter Notebook from Jupyter Lab, and the "Restart Kernel and Run All Cells..." option worked perfectly.

Expected behavior

I expected Jupyter Lab to restart the kernel and run all cells without any lag.

Context

macOS 10.15.2
Python 3.8.5 (installed via Homebrew)
Chrome Version 84.0.4147.135

Output of jupyter --version:

jupyter core     : 4.6.3
jupyter-notebook : 6.1.4
qtconsole        : 4.7.7
ipython          : 7.18.1
ipykernel        : 5.3.4
jupyter client   : 6.1.7
jupyter lab      : 2.2.8
nbconvert        : 6.0.2
ipywidgets       : 7.5.1
nbformat         : 5.0.7
traitlets        : 5.0.4

When I run jupyter lab --debug and attempt the "Restart Kernel and Run All Cells...", the output continuously prints:

Accepting token-authenticated connection from ::1

and a variation of one of:

200 GET /api/sessions?1600160701879 (::1) 1.71ms
200 GET /api/kernels?1600160731926 (::1) 0.89ms
200 GET /api/terminals?1600160759346 (::1) 1.16ms

I attempted to solve the issue by running pip3 install --upgrade ipykernel per this issue, but it did not solve the problem.

Any help would be appreciated, and thanks!

@nighcoder

I'm having the same issue here.
Output of jupyter --version:

jupyter core     : 4.6.3
jupyter-notebook : 6.1.4
qtconsole        : 4.7.5
ipython          : 7.18.1
ipykernel        : 5.3.4
jupyter client   : 6.1.7
jupyter lab      : 2.2.7
nbconvert        : 6.0.2
ipywidgets       : 7.5.1
nbformat         : 5.0.7
traitlets        : 5.0.4

I'm having this issue most reliably with the clojupyter (Clojure) kernel, but sometimes I can get the same problem with the Python kernel.
I did some debugging on this problem and found that JupyterLab actually restarts the kernel, but never follows up by sending the cells for execution. The cells are marked for execution, but the kernel status is actually idle (you can see it in the lower left corner). You can also check that the kernel is available and not blocking on anything by manually sending a cell for execution after you restart the kernel. @DevinRouth, can you reproduce the issue I'm describing?

Note: Running restart kernel from the Kernel menu and Run all cells from the Run menu work as intended.

@DevinRouth
Author

Yes, I confirm the same behavior as @nighcoder. All cells are run when triggered from the "Run" menu, but not from the "Kernel" menu. When selecting "Restart Kernel and Run All Cells", the cells are marked with an asterisk and the kernel remains idle.

@phish108

phish108 commented Sep 30, 2020

I confirm this behaviour for JL 2.2.8, too. As reported, forcefully "shutting down" the kernel from the running kernels and terminals side tab and then restarting it from the menu works. But restarting the kernel works from neither menu option, Run or Kernel.

I run a modified r-notebook stack as a Docker container on Kubernetes.

@sdegrace

>jupyter --version
jupyter core     : 4.6.3
jupyter-notebook : 6.1.4
qtconsole        : 4.7.7
ipython          : 7.18.1
ipykernel        : 5.3.4
jupyter client   : 6.1.7
jupyter lab      : 3.0.0rc4
nbconvert        : 6.0.7
ipywidgets       : 7.5.1
nbformat         : 5.0.7
traitlets        : 5.0.4

Same issue, though to be clear it doesn't hang the kernel - it just appears to. At any point I can select Run -> Run all Cells and it works as expected. It seems as if it gets as far as restarting the kernel and preparing the UI for a run all cells operation, then just forgets to do it.

@phish108

phish108 commented Oct 17, 2020

@karlaspuldaro Not sure if this issue is fixed, but I think it should be backported as mentioned in #9160.

@karlaspuldaro
Contributor

@phish108 The 2.2.9 release has already been published. I don't see a PR for this issue, but once it's fixed, the backport can go into another patch release.

@jamesmyatt

jamesmyatt commented Dec 1, 2020

I also have this problem using IRkernel and JupyterLab 2.2.9, but not with IPython (IRkernel/IRkernel#671).

I can confirm that the problem is with "Restart and run all" specifically, because "Restart kernel" then "Run All" separately is OK.

jupyter core     : 4.7.0
jupyter-notebook : 6.1.5
qtconsole        : not installed
ipython          : 7.19.0
ipykernel        : 5.3.4
jupyter client   : 6.1.7
jupyter lab      : 2.2.9
nbconvert        : 6.0.7
ipywidgets       : 7.5.1
nbformat         : 5.0.8
traitlets        : 5.0.5

@SylvainCorlay
Member

We are looking into this now with @JohanMabille, @jtpio, and @martinRenou.

@jtpio
Member

jtpio commented Dec 9, 2020

xref jupyter/jupyter_client#593 for more info

@minrk
Contributor

minrk commented Dec 16, 2020

In investigating jupyter/jupyter_client#593, I've become fairly confident that something is amiss in the restart & run all logic in JLab 3.0rc13. On restart & run all, after the websocket connection is re-established, JupyterLab sends no messages at all. Manually clicking restart and then clicking run all works just fine. It seems like it's waiting for a condition that's never met before starting the "& run all" part.

Adding debug logging to the zmq handler shows that jupyterlab is not sending the requests after restart at all. So the server-side pub/sub issue we are working on over there doesn't seem to be the only issue, at least.

pip freeze
anyio==2.0.2
appnope==0.1.2
argon2-cffi==20.1.0
async-generator==1.10
attrs==20.3.0
Babel==2.9.0
backcall==0.2.0
bleach==3.2.1
certifi==2020.12.5
cffi==1.14.4
chardet==3.0.4
click==7.1.2
decorator==4.4.2
defusedxml==0.7.0rc1
entrypoints==0.3
idna==2.10
ipykernel==5.4.2
ipython==7.19.0
ipython-genutils==0.2.0
jedi==0.17.2
Jinja2==3.0.0a1
json5==0.9.5
jsonschema==3.2.0
jupyter-client==6.1.7
jupyter-core==4.7.0
-e git+git@github.com:jupyter/jupyter_server.git@a3a3a46b907cabd7be66c639cf161aafbf20b2e5#egg=jupyter_server
jupyterlab==3.0.0rc13
jupyterlab-pygments==0.1.2
jupyterlab-server==2.0.0rc8
MarkupSafe==2.0.0a1
mistune==0.8.4
nbclassic==0.2.5
nbclient==0.5.1
nbconvert==6.0.7
nbformat==5.0.8
nest-asyncio==1.4.3
notebook==6.1.5
packaging==20.8
pandocfilters==1.4.3
parso==0.7.1
pexpect==4.8.0
pickleshare==0.7.5
pip-tools==5.4.0
prometheus-client==0.9.0
prompt-toolkit==3.0.8
ptyprocess==0.6.0
pycparser==2.20
Pygments==2.7.3
pyparsing==3.0.0b1
pyrsistent==0.17.3
python-dateutil==2.8.1
pytz==2020.4
pyzmq==20.0.0
requests==2.25.0
Send2Trash==1.6.0b1
six==1.15.0
sniffio==1.2.0
terminado==0.9.1
testpath==0.4.4
tornado==6.1
traitlets==5.0.5
urllib3==1.26.2
wcwidth==0.2.5
webencodings==0.5.1

@minrk
Contributor

minrk commented Dec 16, 2020

I tracked this down to the separate tracking of 'restarting' and connectionStatus here:

this.connectionStatus === 'connected' &&
this._kernelSession !== RESTARTING_KERNEL_SESSION

_sendPending() is triggered when the connection state is back to 'connected' here:

    if (connectionStatus === 'connected') {
      // Send pending messages, and make sure we send at least one message
      // to get kernel status back.
      if (this._pendingMessages.length > 0) {
        this._sendPending();

but that does nothing if the _kernelSession is still RESTARTING_KERNEL_SESSION:

  private _sendPending(): void {
    // We check to make sure we are still connected each time. For
    // example, if a websocket buffer overflows, it may close, so we should
    // stop sending messages.
    while (
      this.connectionStatus === 'connected' &&
      this._kernelSession !== RESTARTING_KERNEL_SESSION &&
      this._pendingMessages.length > 0

_kernelSession is only reset upon receiving a websocket message:

this._kernelSession = reply.header.session;

so I think there are two problems:

  1. if connection state resolves to 'connected' before the _kernelSession is resolved away from RESTARTING_KERNEL_SESSION, the pending queue is never flushed (it will be flushed on reconnect). It works only if an unprompted iopub message arrives before entering the connected state.
  2. resolving the _kernelSession out of the 'restarting' state assumes an unprompted websocket message will arrive (because all prompting requests are queued while the kernel is restarting), triggering the reset of _kernelSession. It is not a safe assumption that this will happen (this relates to the root cause of jupyter/jupyter_client#593, "SUB sockets take time to subscribe to the IOPub channel and miss important messages"). ipykernel sends a starting status message, but it is quite possible to miss it while subscriptions are established.

I think this should be fixed with two changes:

  1. allow 'restarting' state to resolve after connected state, flushing the queue in both cases
  2. if the kernel info requests that are currently registered after restart are actually sent when the connection state is resolved, rather than added to _pendingMessages, then it all ought to be okay.

I believe the right logic is there to send a kernelInfoRequest on completed connection, but it must actually be sent, not added to the pending queue, which is what happens now.
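
To make change 1 concrete, here is a minimal, self-contained TypeScript sketch of the idea: flush the pending queue both when the connection comes back and when the kernel session resolves out of the restarting sentinel, so the order of the two events no longer matters. This is not the actual KernelConnection code from @jupyterlab/services; the class, field, and method names below are illustrative only.

// Simplified model of the pending-message handling described above.
// NOT the real KernelConnection; names are illustrative only.

const RESTARTING_KERNEL_SESSION = '_RESTARTING_';

type ConnectionStatus = 'connecting' | 'connected' | 'disconnected';

class PendingQueueModel {
  private connectionStatus: ConnectionStatus = 'connecting';
  private kernelSession = RESTARTING_KERNEL_SESSION;
  private pendingMessages: string[] = [];

  // Called by the UI, e.g. for each cell queued by "Restart Kernel and Run All Cells...".
  sendMessage(msg: string): void {
    if (this.canSend()) {
      this.transmit(msg);
    } else {
      this.pendingMessages.push(msg);
    }
  }

  // Called when the websocket reconnects after the restart.
  onConnectionStatusChanged(status: ConnectionStatus): void {
    this.connectionStatus = status;
    this.flushIfReady(); // flush here (covers the case where the session resolved first)...
  }

  // Called when a kernel message arrives carrying the new session id.
  onKernelSession(session: string): void {
    this.kernelSession = session;
    this.flushIfReady(); // ...and also here, so the event order no longer matters
  }

  private canSend(): boolean {
    return (
      this.connectionStatus === 'connected' &&
      this.kernelSession !== RESTARTING_KERNEL_SESSION
    );
  }

  private flushIfReady(): void {
    while (this.canSend() && this.pendingMessages.length > 0) {
      this.transmit(this.pendingMessages.shift()!);
    }
  }

  private transmit(msg: string): void {
    console.log(`sending: ${msg}`);
  }
}

// The sequence that currently stalls: cells queued during restart now get
// flushed regardless of which of the two events arrives last.
const model = new PendingQueueModel();
model.sendMessage('execute_request: cell 1');  // queued: session still restarting
model.onConnectionStatusChanged('connected');  // nothing can be sent yet
model.onKernelSession('abc123');               // queue is flushed here

Change 2 would additionally have the kernel info request sent directly once the connection resolves, rather than pushed onto the pending queue, so at least one message always goes out to carry back the new session id.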

@phish108

@minrk is this the same logic as in 2.2.x releases?

@minrk
Contributor

minrk commented Dec 16, 2020

I'm not sure, but I don't think so. I have never been able to reproduce the issue with 2.2.9, but I can 100% of the time with 3.0rc13.

@phish108

I can reproduce this issue all the time with 2.2.8 and 2.2.9. (See my message above)

@minrk
Contributor

minrk commented Dec 16, 2020

#9484 fixes the issue for me with JLab 3.0. I don't know what to say about 2.x, other than I cannot reproduce it except with 3.0.

@jasongrout
Contributor

jasongrout commented Dec 17, 2020

For reference, the restarting status sentinel for the kernel session id was added in #8562

@jasongrout jasongrout modified the milestones: 3.0, 3.1 Dec 17, 2020
@github-actions github-actions bot added the status:resolved-locked Closed issues are locked after 30 days inactivity. Please open a new issue for related discussion. label Jun 17, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 17, 2021