
jupyter realtime sync is not sufficiently robust #1754

Closed
williamstein opened this issue Mar 8, 2017 · 35 comments

Comments

@williamstein
Contributor

  • On Firefox we disabled Jupyter entirely, temporarily (see "jupyter python2 notebook may truncate in firefox", #1537).

  • We have had regular reports from users of issues over the years, and no amount of rewrites or subtle bug fixes has 100% eliminated these.

  • Upgrading to new versions of Jupyter is terrifying, since the API changes and things randomly break (given our very brittle use of Jupyter).

  • We are massively abusing and monkey patching Jupyter as it is to get sync to work.

The only viable solution I can think of is to completely rewrite Jupyter from scratch, except the backend kernel functionality. Thus this issue is labeled "insanely hard".

@williamstein
Contributor Author

Work happening here: https://github.com/sagemathinc/smc/commits/jupyter

@williamstein
Contributor Author

williamstein commented Mar 21, 2017

[edited] My goal is to finish and launch this by April 12.

@jasongrout
Contributor

I just noticed this. To clarify, is your goal to write a new Jupyter frontend?

@williamstein
Contributor Author

Yes. Similar to what the nteract project has done, but with different constraints, including (1) realtime sync, (2) compatibility with the official Jupyter UI (so don't change anything significantly unless necessary).

@jasongrout
Contributor

Thanks. Did you look at building a notebook component with SMC syncing based on the JupyterLab notebook component? It may be much easier to get sync working in the JupyterLab notebook component - the codebase there is in much better shape, especially for extensibility (still alpha or beta-level, though) - CC @blink1073, @ian-r-rose. See the jlab notebook package: https://github.com/jupyterlab/jupyterlab/tree/master/packages/notebook (this was just factored out of jupyterlab in the last week or so and published as https://www.npmjs.com/package/@jupyterlab/notebook).

Of course, you may still decide (or have already decided) it's worth it to have a react-based in-house Jupyter frontend implementation anyway.

@jasongrout
Contributor

much easier to get sync working in the JupyterLab notebook

(much easier than the classic Jupyter notebook...)

@jasongrout
Contributor

Also, it would be much appreciated if you would please let us know of any warts or other difficulties you find on our end that we can improve.

@haraldschilly
Contributor

Also, it would be much appreciated if you would please let us know of any warts or other difficulties you find on our end that we can improve.

Is there some sort of test suite for the whole range of possible messages in the protocol? The main point would be to check whether the front-end is able to properly interpret what's coming in, etc. I don't know that much about the protocol or what to really look for, though.

My very first naive thought about this is to have an artificial test-kernel and the client asks to evaluate strings like test(1), test(2), ...

@williamstein
Contributor Author

Did you look at building a notebook component with SMC syncing based on the JupyterLab notebook component?

No, I didn't.

It may be much easier to get sync working in the JupyterLab notebook component - the codebase there is in much better shape, especially for extensibility

I'm already pretty far along with this project (after one week of work), and sync works perfectly already in what we have.

(still alpha or beta-level, though)

The key motivation for this project is needing something that doesn't feel alpha/beta level quality as soon as possible.

Harald:

Is there some sort of test suite [...]

Even a sample Jupyter notebook that illustrates a wide range of possible outputs/inputs/cells would be very useful.

@jasongrout
Contributor

I'm already pretty far along with this project (after one week of work), and sync works perfectly already in what we have.

Cool! I look forward to testing it out. Another person that would be very valuable to hammer on it is Matthias (@Carreau) - he knows the current notebook extremely well, and has been helping us in JLab to cover a lot of corner cases.

Is there some sort of test suite for the whole range of possible messages in the protocol? The main point would be to check if the front-end is able to properly interpret what's coming in, etc. I don't know that much about the protocol or what to really look after, though.

Good question. There are a lot of tests in the jupyter_client project (https://github.com/jupyter/jupyter_client/tree/master/jupyter_client/tests). CCing @minrk and @takluyver, who know that code better.

Even a sample Jupyter notebook that illustrates a wide range of possible outputs/inputs/cells would be very useful.

I don't know of a comprehensive test notebook, but there are lots of example notebooks out there, of course. I'll see if I can find or make something that exercises various messages.

Are you implementing comm messages so that widgets can work on top of it? Are you planning on implementing your own widget frontend? Syncing widgets across multiple frontends was something we were just discussing (and is quite a bit easier for us to implement with the limitations of the current jupyter notebook server).

@williamstein
Contributor Author

Are you implementing comm messages so that widgets can work on top of it?
Are you planning on implementing your own widget frontend?
Syncing widgets across multiple frontends was something we were just discussing (and is quite a bit easier for us to implement with the limitations of the current jupyter notebook server).

I haven't decided whether (or to what extent) we will support widgets. Our top priority is something that very solidly solves the majority of our users' problems regarding collaborative Jupyter.

@jasongrout
Contributor

My very first naive thought about this is to have an artificial test-kernel and the client asks to evaluate strings like test(1), test(2), ...

That sounds like a great idea. @minrk, @takluyver, do you know of a sample test kernel that could exercise the messages in the message spec so a frontend could be tested?

@minrk

minrk commented Mar 22, 2017

@jasongrout I don't think so. We have the other way around in https://github.com/jupyter/jupyter_kernel_test. You could base such a thing on https://github.com/jupyter/echo_kernel, I think.

@Carreau

Carreau commented Mar 22, 2017

Is there any beta deployment where I can test this? I can have a go at the UI and list the differences from the current notebook.

@williamstein
Contributor Author

There was zero code two weeks ago -- so, no, it's not really ready for testing. I greatly appreciate your offer to test, though, and will let you know when something is available.

Comment/question: there are three states related to running a cell (beyond the initial blank state):

  0. [ ] New cell
  1. Client requests to execute
  2. [*] Backend receives request and starts running cell
  3. [number] Cell is done executing

Right now I think jupyter equates 1 and 2. However, I would rather distinguish them somehow. Can you clarify the situation here?


Regarding testing, "mocking" a kernel is something that shouldn't involve jupyter at all. What's critically needed is -- for a given kernel (e.g., "python3") -- a document that describes, for a given sequence of inputs to the kernel, what the responses should be. That is, of course, precisely the sort of thing that doctests in sage provide for sage itself. With such a document, one could easily make a mocked kernel for testing. It could also be used to verify that a kernel behaves properly. Sorry if you already have these, and I just missed them. A directory of files like this would be very good to include right next to kernel.json, say called "test/", so it's easy to test that a given installed kernel is working.

William
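The four states above can be sketched as a tiny state machine (an illustrative sketch in Python; the names and the transition table are my own, not anything Jupyter defines):

```python
from enum import Enum

class CellState(Enum):
    """States a notebook cell can be in (numbering follows the list above)."""
    NEW = 0        # [ ]      never executed
    REQUESTED = 1  # (no marker yet) client asked for execution
    RUNNING = 2    # [*]      kernel started handling the request
    DONE = 3       # [number] execution finished; number is the execution count

# Allowed transitions; a finished cell can be re-run, hence DONE -> REQUESTED.
TRANSITIONS = {
    CellState.NEW: {CellState.REQUESTED},
    CellState.REQUESTED: {CellState.RUNNING},
    CellState.RUNNING: {CellState.DONE},
    CellState.DONE: {CellState.REQUESTED},
}

def advance(state, new_state):
    """Move a cell to new_state, rejecting illegal jumps such as NEW -> DONE."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

A frontend would move a cell from REQUESTED to RUNNING when the kernel signals that it has started on that request, and to DONE when the reply arrives.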

@minrk

minrk commented Mar 22, 2017

Right now I think jupyter equates 1 and 2. However, I would rather distinguish them somehow. Can you clarify the situation here?

The current notebook UI doesn't care to distinguish between submitted and started, but there are messages/events that indicate this. Whenever a kernel starts handling a request, the first thing it does is send a status message with execution_state=busy. This can be your signal for state 2.

The best document we have is the message spec, but that doesn't go into detailed examples of "running this code will produce this exact sequence of messages".

A script like this will quickly run through an execution and show which messages are produced. In general, when an execution is submitted, the messages are:

  • iopub: msg_type: status; content.execution_state = busy
  • iopub: msg_type: execute_input; content.code = <executed code>
  • iopub: <any display outputs>
  • iopub: msg_type: status; content.execution_state = idle
  • shell: execute_reply; content.status = ok

The tests we have for this are in the IPython kernel package here. The jupyter_kernel_test package aims to provide the tools to run such tests for any Jupyter kernel.
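A transcript of (channel, msg_type) pairs can be checked mechanically against the shape listed above; a minimal sketch (a hypothetical helper, not part of jupyter_client, and it deliberately ignores message content such as the busy/idle execution_state fields):

```python
def check_execute_sequence(messages):
    """Check that a (channel, msg_type) stream matches the expected shape of a
    successful execute request: status (busy), execute_input, zero or more
    display outputs, status (idle) on iopub, plus execute_reply on shell.

    `messages` is a list of (channel, msg_type) tuples in arrival order.
    """
    iopub = [mtype for chan, mtype in messages if chan == "iopub"]
    shell = [mtype for chan, mtype in messages if chan == "shell"]
    return (
        len(iopub) >= 3
        and iopub[0] == "status"          # execution_state: busy
        and iopub[1] == "execute_input"
        and iopub[-1] == "status"         # execution_state: idle
        and shell == ["execute_reply"]
    )
```

The middle iopub messages (stream, display_data, execute_result, ...) are left unconstrained, matching the "any display outputs" step.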

@williamstein
Contributor Author

williamstein commented Mar 23, 2017 via email

@williamstein
Contributor Author

@minrk, @jasongrout Another very obvious change I want to make to Jupyter is to store the name of the kernel used for execution of each cell. I have the impression you guys don't do that. I'm now doing that with my rewrite, but haven't decided how to make that information available. It's relevant for when the user changes the kernel while working with a notebook. Question: have you thought about this?

@williamstein
Contributor Author

williamstein commented Mar 23, 2017

@minrk @jasongrout -- can you guys clearly answer the questions above about testing infrastructure? Does the jupyter project really have no good answer to the problem of automated testing of kernels? Have you worried about it? For us (SMC), where we aim to provide professional-quality products, this is a no-brainer requirement, and it might be something valuable we could contribute/encourage. It's also something I'm very used to from Sage dev, where our test suite is pretty useful.

@minrk

minrk commented Mar 24, 2017

But I don't really know what character to use for state 1.... right now I'm using a "-". Any suggestions?

Maybe an ellipsis indicating "...not started yet..."? You could also explore icon options, perhaps. I'm not sure how useful a visual distinction between the two states is, though.

Another very obvious change I want to make to Jupyter is to store the name of the kernel used for execution of each cell. I have the impression you guys don't do that.

It's true we don't, because all cells necessarily have the same kernel info as each other and the notebook, so we store this info only at the notebook level. That's at the document format-level, of course. Language info does need to be attached to the runtime model of the cells for CodeMirror's syntax highlighting, etc., which gets updated when the kernelspec changes. This is more efficiently reflected as a model update on the cells in JupyterLab. In the live notebook, a notebook-level set_codemirror_mode is called that updates the cells.

can you guys clearly answer the questions above about testing infrastructure? Does the jupyter project really have no good answer to the automated testing of kernels problem?

I feel like I've answered this a few times. Kernels have their own test suites. jupyter_kernel_test provides the automation for testing spec compliance of kernels, which is derived from ipykernel's own tests for the same thing. You can see examples for testing the R and Python kernels in the repo.

We have been discussing something centralized for browsing the testing/compliance for kernels for some time, which would build on top of jupyter_kernel_test as it is right now. We haven't managed to build that, yet. Ultimately, it would just be gathering the results of each kernel's test suite run with jupyter_kernel_test and reporting on them.

it might be something valuable we could contribute/encourage

That would be awesome. jupyter_kernel_test would be the place to start today. If you have cycles to put into and want to try a different approach, that would be fine, too.

@williamstein
Contributor Author

I'm not sure how useful a visual distinction between the two states is, though.

That might be because you often run your own server yourself, usually locally on the same machine (e.g., your laptop), and all libraries you use (not sage) take little time to import...

It's true we don't, because all cells necessarily have the same kernel info as each other and the notebook, so we store this info only at the notebook level.

Imagine a user who does this:

  1. Execute cell 1 using the python 3 kernel (say it is the default).
  2. Change the kernel to R.
  3. Execute another cell 2 now using the R kernel.

From that user's point of view, cell 1 and cell 2 have different "kernel info". (They have also created a very "non-reproducible" notebook situation.) If you really wanted to enforce "because all cells necessarily have the same kernel info", then the kernel would be immutable and would have to be chosen at creation time; or the filename would have the kernel in it (ugh).

This is more efficiently reflected as a model update on the cells in JupyterLab.

I don't know what those words mean, but maybe you are just saying that JupyterLab does "store the name of the kernel used for execution of each cell", but doesn't save it in the ipynb file? (Maybe I'm a little confused by the different Jupyter notebook client implementations too...)

Testing...

Thanks for explaining your approach to testing. So, tests of the kernels use a Python library and Python unit testing. The data that defines the tests is embedded as code in those Python (or other) files, rather than being in some declarative "data" format that could be easily re-used by other testing systems. It might be useful to have a suite of declarative test data, since that same data could alternatively be used for mocking kernels (for other testing/dev purposes). This could fall under "If you have cycles to put into and want to try a different approach, that would be fine, too." Anyway, over the years, I've found automated testing to be incredibly powerful for improving development speed...
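To make the idea concrete, here is what such declarative test data might look like, together with a mock kernel that replays it (the format is invented here purely for illustration; nothing like it is standardized, and this is the sort of data a "test/" directory next to kernel.json could hold):

```python
# Declarative test cases: for a given input, the expected kernel outputs.
# The same data can drive spec-compliance tests OR a mock kernel for frontend
# development -- the dual use argued for above.
TEST_CASES = [
    {"code": "1 + 1",
     "outputs": [{"msg_type": "execute_result", "text": "2"}]},
    {"code": "print('hi')",
     "outputs": [{"msg_type": "stream", "text": "hi\n"}]},
]

class MockKernel:
    """Replays recorded outputs instead of running a real kernel."""
    def __init__(self, cases):
        # Index the cases by input code for O(1) lookup.
        self.table = {case["code"]: case["outputs"] for case in cases}

    def execute(self, code):
        if code not in self.table:
            raise KeyError(f"no recorded outputs for {code!r}")
        return self.table[code]
```

The same TEST_CASES file could equally be loaded from YAML/TOML and compared against a live kernel's responses to verify that an installed kernel is working.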

@williamstein
Contributor Author

I'm not sure how useful a visual distinction between the two states is, though.
That might be because...

Also, users frequently start one cell running somewhere in a notebook, then start another one somewhere else running, which does nothing until the first cell completes. A visual distinction between "waiting to run" (because another cell is running) and "actually running" is really valuable. I don't know how often you do support for new users or walk around a room full of students using notebooks for the first time... but in my experience this sort of confusion is pretty common, especially in math where some computations can reasonably take a long time.

@minrk

minrk commented Mar 24, 2017

I see what you mean about kernel info associated with cells - I think you are talking about recording historical execution context, as opposed to current state, which is what the notebook stores. Storing records about the historical context of the execution would be new information, and could definitely be persisted in cell metadata via an extension. This could potentially include info like timestamps, etc. as well.

then the kernel would be immutable and would have to be chosen at creation time

Just because the whole notebook is associated with one kernel doesn't mean the user cannot change their mind about which kernel that is. We do enforce that it's the same kernel for the whole notebook, though. Since a notebook is connected to a single kernel, changing the kernel associated with the notebook is changing the kernel associated with every cell in that notebook. So if each cell had 'language' metadata, the application would enforce that all cells have the same language metadata as the notebook. It doesn't matter to the application that a cell was executed with Python. What matters is that if you select and execute that cell now, it will run with R, so it should be highlighted for R syntax, etc.

I don't know what those words mean

Sorry, I just meant that the runtime model of 'CodeCell' in jupyterlab has a language field, which is easily updated. I thought you were asking about how to update the runtime model, as opposed to changing the document format, which I think is what you meant. The persisted data is the same - no language info on the cells, since it is defined as being redundant with notebook-level metadata.

The data that defines the tests is embedded as code in those Python (or other) files, rather than being in some declarative "data" format, which could be easily re-used by other testing systems.

There is this PR to allow specifying test data with a yaml/toml/whatever declarative format. Feel free to express support for the idea there, if you are interested, or would like to suggest a different direction.

I've found automated testing to be incredibly powerful for improving development speed

Me, too. That's why we do it, like I described above :)

A visual distinction between "waiting to run" (because another cell is running) and "actually running" is really valuable.

Good point.

in my experience this sort of confusion is pretty common

This is the absolute best sort of feedback, and your experience is super valuable, thanks!

@williamstein
Contributor Author

I think you are talking about recording historical execution context, as opposed to current state, which is what the notebook stores

Yes, that's it.

cell metadata via an extension

OK; I want to include such metadata in the ipynb file. Has somebody already proposed a spec for something like this? How does that work? It would be nice to do this in a way that is compatible.

It doesn't matter to the application that a cell was executed with Python.

OK, by definition. In any case, what was used to execute the cell does matter to the user. I'm not going to change your model/application for my Jupyter client, where compatibility is a primary design constraint. (I've already implemented a very different execution model with Sage worksheets and will keep such changes there, though I'm thinking of creating a single kernel that can encode that model, where each cell can have a different, explicitly specified way of executing.)

There is this PR

Thanks -- posted there. That is indeed precisely what I was hoping for...

@rgbkrk
Contributor

rgbkrk commented Mar 24, 2017

OK; I want to include such metadata in the ipynb file. Has somebody already proposed a spec for something like this? How does that work? It would be nice to do this in a way that is compatible.

My suggestion: squat on your own namespace within metadata so you can do migrations with less worry of collisions:

"metadata": {
  "smc": {
    "someField": "asdf",
    "version": "1.0.0"
  }
}
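Reading and writing such a namespaced field needs nothing beyond the standard library; a sketch (the "kernel" field under "smc" is hypothetical, following the per-cell-kernel discussion above; "version" follows the example):

```python
import json

def set_cell_kernel(cell, kernel_name, namespace="smc"):
    """Record, under our own metadata namespace, which kernel ran this cell."""
    ns = cell.setdefault("metadata", {}).setdefault(namespace, {})
    ns["kernel"] = kernel_name
    ns["version"] = "1.0.0"
    return cell

# A minimal dict shaped like an ipynb code cell.
cell = {"cell_type": "code", "source": "plot(x^2)", "metadata": {}}
set_cell_kernel(cell, "sagemath")

# Namespaced metadata round-trips through JSON like any other ipynb content,
# and other tools that don't know the namespace simply ignore it.
restored = json.loads(json.dumps(cell))
```

Keeping everything under one key makes later format migrations (keyed off the embedded version) a matter of rewriting a single sub-dict.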

@minrk

minrk commented Mar 24, 2017

👍 to @rgbkrk - extensions can declare their own namespace in metadata, and put whatever they want there.

There may eventually be a top-level spec for certain execution metadata like timestamps, but there isn't yet. If/when that happens, it can be promoted from your extension namespace to the top-level.

@jasongrout
Contributor

A visual distinction between "waiting to run" (because another cell is running) and "actually running" is really valuable.

+1

@DrXyzzy
Contributor

DrXyzzy commented Mar 24, 2017

all cells necessarily have the same kernel info as each other and the notebook

This precludes having a single notebook controlling

  • different language engines
  • multiple instances of the same engine

@jasongrout
Contributor

This precludes having a single notebook controlling

different language engines
multiple instances of the same engine

A kernel can dispatch things - right now the ipython kernel can evaluate things in the shell, R, etc, for example.

@nthiery

nthiery commented Mar 27, 2017

Hi William!
Thanks for your work on re-enabling the Jupyter notebook with Firefox in SMC! I am looking forward to it. An estimated timeline for the feature would be helpful.
Context: we are currently preparing an updated edition of our book Calcul Mathématique avec Sage, together with a translation into English, and one into Spanish in the works. In the new introduction, we advertise SageMathCloud as one way to get easy access to Sage. We also advertise the Jupyter notebook. Of course, it's unfortunate if the first experience our new readers get is that the combination of the two does not currently work. So depending on the timeline, we may have to change the wording a bit.
On our side, Paul hopes to get something out in a couple of weeks.

@williamstein
Contributor Author

An estimated timeline for the feature would be helpful.

Copying from above: "My goal is to finish and launch this by April 12, 2017." Obviously, it's somewhat ambitious to completely rewrite the Jupyter client (and much of the server) this quickly, but it needs to get done right.

@haraldschilly
Contributor

haraldschilly commented Apr 5, 2017

as of today, the following causes a reproducible sync problem:

  1. set kernel to sagemath ("sagemath latest"), which is right now the default one from within sage 7.5
     1.1 it also fails for "python2"
  2. plot(x^2) + evaluate; you see the plot
     2.1 for python2, run:
         import matplotlib.pyplot as plt
         plt.plot([x**2 for x in range(-100, 100)])
  3. copy the URL, open a new window, load the page a second time there, and view the two side by side
  4. in the newly opened window, re-evaluate the cell and wait a few secs ... it seems to only be triggered when the image changes, so you may have to tweak the plot a bit, like x**3
  5. in the other window, first an "image can't be loaded" flashes by and then "Graphics object consisting of 1 graphics primitive"
     5.1 for python2, it says: [<matplotlib.lines.Line2D at 0x7ff324bfc090>] <matplotlib.figure.Figure at 0x7ff327ddb750>

console, probably relevant:

Failed to load resource: net::ERR_INVALID_URL        data:image/png;base...
main.min.js:20089 Invalid type for image/png Object

@williamstein
Contributor Author

as of today, the following causes a reproducible sync problem:

This example works absolutely 100% perfectly in the rewrite, without the slightest problem. In fact, so far as I know, absolutely everything related to sync is just perfect, due to the fundamentally very sync-friendly design.

@williamstein
Contributor Author

Closing an "insanely hard" issue :-)

@williamstein
Contributor Author

That was not easy.
