Packaging - the custom msg, entry points, and cached static assets solution. #116

Closed
jdfreder opened this issue May 26, 2015 · 89 comments

Comments

@jdfreder
Contributor

Since we haven't agreed on this yet, I'm opening an issue instead of writing an IPEP. If this is agreed on, I'll write an IPEP.

The week before last, @ellisonbg and I brainstormed about Jupyter packaging. As I remember it, the best solution we came up with requires a combination of Python packaging entry-points, a new message type, and static asset caching (in the web server). This is my understanding of how this solution would work (my notes from our meeting are at home, and my apartment is being fumigated, so I don't have access to them).

Jupyter level, kernel extensions

[Screenshot, 2015-05-26: diagram of the proposed messages, with the first new message drawn in blue and the second in red]
A new message, the first (in blue), would be added, allowing the server to ask the kernel whether the static assets it knows about for that kernel are correct. The message would be a dict mapping static asset paths to content hashes.

The same message in the opposite direction is the kernel's response. It would be some type of data structure, maybe a binary message, containing static asset paths and their contents, and a list of the static assets that can be deleted from the cache.
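For illustration only, here is a minimal sketch of that exchange. It assumes SHA-256 hex digests and name-to-bytes mappings on both sides. The function names and the reply fields `assets` and `delete` are hypothetical; none of this is an existing Jupyter message.

```python
import hashlib


def digest(content: bytes) -> str:
    """Hash asset contents (SHA-256 is an assumption, not part of the proposal)."""
    return hashlib.sha256(content).hexdigest()


def server_request(cache: dict) -> dict:
    """Blue message, server -> kernel: asset name -> hash of the cached contents."""
    return {name: digest(content) for name, content in cache.items()}


def kernel_reply(kernel_assets: dict, request: dict) -> dict:
    """Blue message, kernel -> server: assets to add or update, names to delete."""
    changed = {
        name: content
        for name, content in kernel_assets.items()
        if request.get(name) != digest(content)
    }
    stale = sorted(name for name in request if name not in kernel_assets)
    return {"assets": changed, "delete": stale}


def apply_reply(cache: dict, reply: dict) -> None:
    """Bring the server-side cache in line with the kernel's reply."""
    for name in reply["delete"]:
        cache.pop(name, None)
    cache.update(reply["assets"])


# Hypothetical state: the server holds one outdated and one obsolete asset.
cache = {"widget.js": b"old", "gone.css": b"x"}
kernel_assets = {"widget.js": b"new", "plot.js": b"p"}
reply = kernel_reply(kernel_assets, server_request(cache))
apply_reply(cache, reply)
```

After the round trip the server cache matches the kernel's view, and only changed or missing assets were sent.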

A second new message (in red), would be added that would allow the kernel to invoke a require.js call in the front-end. This is preferred over standard display(JS) calls, because the notebook contents will remain unaffected.

IPython level, kernel extensions

Python entry points will be used as a registry. Two entry points will be defined:

  1. an entry point for code to run when the kernel is started.
  2. an entry point for a method that returns static assets (paths).
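A rough sketch of how a kernel might consume two such entry points with the standard library's `importlib.metadata`. The group names `ipython.kernel_startup` and `ipython.kernel_static_assets` are made up for this example; the proposal does not name them.

```python
from importlib.metadata import entry_points

# Hypothetical group names; the proposal does not define them.
STARTUP_GROUP = "ipython.kernel_startup"
ASSETS_GROUP = "ipython.kernel_static_assets"


def load_group(group: str) -> dict:
    """Return {entry point name: loaded object} for one entry-point group."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # Python 3.8 / 3.9 return a plain dict of lists
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


def run_startup_hooks() -> None:
    """Entry point 1: call every registered startup hook."""
    for hook in load_group(STARTUP_GROUP).values():
        hook()


def collect_static_assets() -> list:
    """Entry point 2: gather asset paths from every registered provider."""
    paths = []
    for provider in load_group(ASSETS_GROUP).values():
        paths.extend(provider())
    return paths
```

A package would register under these groups in its own packaging metadata, so installing the package is the only step needed for the kernel to discover it.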

Jupyter level, server extensions and notebook extensions

Python entry points will be used as a registry. Three entry points will be defined:

  1. an entry point for code to run when the server is started.
  2. an entry point for a method that returns static assets (paths).
  3. an entry point for paths to be `require`d when the notebook page loads.

EDIT
To help the discussion, issues and specific cases are listed here: https://jupyter.hackpad.com/Packaging-PbIgxnC71or

@rgbkrk
Member

rgbkrk commented May 26, 2015

A new message, the first (in blue), would be added, allowing the server [frontend] to ask the kernel whether the static assets it knows about for that kernel are correct. The message would be a dict mapping static asset paths to content hashes.

The same message in the opposite direction is the kernel's response. It would be some type of data structure, maybe a binary message, containing static asset paths and their contents, and a list of the static assets that can be deleted from the cache.

I certainly like this approach. It makes sure that assets are based on the kernel runtime rather than associated with the overall notebook server (or other frontend).

Can path be remote or local, depending on the author's implementation?

@Carreau
Member

Carreau commented May 26, 2015

It breaks the assumption that the kernel does not know it is in a notebook/js environment, and makes it complicated to map kernel-path to server-path to frontend-path.

The Python packaging registry is not language agnostic. It forces each kernel to reimplement a static webserver; our server only acts as a proxy.

I can see a problem with identical paths in many kernels. Once one is cached, it shadows other kernels' resources. Or you install a new version and restart your kernel, and you get the cached versions.

Kernel authors will never bother to implement delete messages.

@rgbkrk
Member

rgbkrk commented May 26, 2015

It breaks the assumption that the kernel does not know it is in a notebook/js environment, and makes it complicated to map kernel-path to server-path to frontend-path.

If you include the require bits, I'd say that's true. However, treating this as a resource query and response relative to the kernel does not make it coupled to the notebook. We'd want this for any other HTML based frontends, including Hydrogen.

I'm not in agreement about this using a Python packaging registry, as I think resources should be installed per kernel.

Kernel authors will never bother to implement delete messages.

Don't you think that would affect their users negatively enough that eventually they would?

@minrk
Member

minrk commented May 26, 2015

The kernel knowing about static assets and telling the server seems problematic. I think if the kernel is being asked about the assets, it should be responsible for serving them, as well.

@Carreau
Member

Carreau commented May 26, 2015

Don't you think that would affect their users negatively enough that eventually they would?

No, they won't. They are developers; it works for them if they restart the server, which they do every 10 minutes.

We'd want this for any other HTML based frontends, including Hydrogen.

Nothing tells you that resources will be the same for Hydrogen and the notebook.

@minrk
Member

minrk commented May 26, 2015

I can see a problem with identical paths in many kernels.

I don't expect this to be a problem. Any resource fetched from a kernel should necessarily be served from a kernel-specific path. So when kernel K is asked for resource R, the server maps it to /K/R, not /R, so kernels are not capable of collision with each other.

I do think if we are going as far as making the Kernels responsible for static resources via messages, the most logical way to do that is to proxy requests to the Kernels themselves, and expect Kernels to run an HTTP server to serve the files. HTTP already has all the features we are describing here, I think.
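The /K/R mapping described above could look roughly like this. The function name and the rejection of `..` segments are assumptions added for the example, not something specified in this thread.

```python
from urllib.parse import quote


def kernel_resource_url(kernel_name: str, resource: str) -> str:
    """Map resource R of kernel K to the kernel-specific path /K/R."""
    parts = [p for p in resource.split("/") if p not in ("", ".")]
    if ".." in parts:
        # Keep a resource from escaping its kernel's namespace.
        raise ValueError("resource must not contain '..'")
    return "/" + "/".join(quote(p, safe="") for p in [kernel_name, *parts])


# The same resource name under two kernels yields two distinct URLs.
python_url = kernel_resource_url("python3", "widgets/view.js")
r_url = kernel_resource_url("ir", "widgets/view.js")
```

Because the kernel name is always the first path segment, two kernels that ship a resource with the same name cannot shadow each other.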

@minrk
Member

minrk commented May 26, 2015

A second new message (in red) would be added that would allow the kernel to invoke a require.js call in the front-end. This would eliminate the need for a notebook extensions list, and its need to be configured.

This statement isn't true. nbextensions aren't limited to kernel-specific behavior. toc, slideshow, nbgrader, etc. would all not be addressed by the proposal, and continue to require nbextensions as it is.

@jdfreder
Contributor Author

Hey guys, glad we are talking about this. Here are my responses.

@rgbkrk

Can path be remote or local, depending on the author's implementation?

Sorry! I really should have clarified, "path" here means "unique name". It can be whatever string the package author wants!

@Carreau

It forces each kernel to reimplement a static webserver; our server only acts as a proxy.

The only piece of the above that the kernel authors need to implement is the single message, in blue.

It's up to kernel authors to choose a mechanism equivalent to Python's entry points, or something that can be used as an alternative.

It breaks the assumption that the kernel does not know it is in a notebook/js environment,

No.
Webserver says "hey these are assets I know about"
Kernel says "these are assets you are missing, and while you're at it delete these others"
Kernel says "load this asset" (which doesn't have to be JS)
Webserver says to client "load this asset" (which doesn't have to be JS)

I can see a problem with identical paths in many kernels. Once one is cached, it shadows other kernels' resources. Or you install a new version and restart your kernel, and you get the cached versions.

You missed the part where I mentioned caches are associated to specific kernels, by id.

Kernel authors will never bother to implement delete messages.

That means their kernels aren't up to spec.

@minrk

it should be responsible for serving them, as well.

But then if the kernel hangs, or is thinking, the assets are unavailable.

@jdfreder
Contributor Author

This statement isn't true. nbextensions aren't limited to kernel-specific behavior. toc, slideshow, nbgrader, etc. would all not be addressed by the proposal, and continue to require nbextensions as it is.

Thanks for catching that! I'll edit my post.

@rgbkrk
Member

rgbkrk commented May 26, 2015

I think if the kernel is being asked about the assets, it should be responsible for serving them, as well.

That's fair. Wait... How many ports are we talking then? That doesn't seem tractable unless those are proxied to the main notebook server.

@minrk
Member

minrk commented May 26, 2015

But then if the kernel hangs, or is thinking, the assets are unavailable.

That's true, but how else are you going to get the resources from the kernel to the notebook server? It sounds like you have to either:

  1. assume shared filesystem, and make it impossible for kernels to be isolated or remote
  2. reimplement http over zmq, and fetch from the kernel anyway

@jdfreder
Contributor Author

Nothing tells you that resources will be the same for Hydrogen and the notebook.

I don't think we'd need to differentiate. The same way the rich display system works, if a front-end can't load an asset, it won't.

@Carreau
Member

Carreau commented May 26, 2015

I mean the JS could be different in the notebook than in Hydrogen, or Rodeo, or Thebe. Do you introduce a mimetype per frontend?

@rgbkrk
Member

rgbkrk commented May 26, 2015

@minrk

assume shared filesystem, and make it impossible for kernels to be isolated or remote

I'm certainly going to reject that one. Doesn't work right for thebe or any other remote context.

reimplement http over zmq, and fetch from the kernel anyway

At first I thought you were joking, then I assumed someone implemented that. Like this? https://github.com/fanout/zurl

My thinking was that resources can be local paths or fully qualified URLs.

@minrk
Member

minrk commented May 26, 2015

How many ports are we talking then?

One. The notebook server would proxy requests like /kernel/:kernel_name/static/... to kernel_name.

There's also a question of whether these should be per kernel name or per kernel id. If it's per id, it's going to mean roughly 0 cache hits as every kernel instance would get its own URL.

@jdfreder
Contributor Author

That's true, but how else are you going to get the resources from the kernel to the notebook server? It sounds like you have to either:

I may not understand, but this is what the cache is for. The webserver would ask the kernel about the assets once the kernel is started, and wouldn't need to later.

@minrk
Member

minrk commented May 26, 2015

My thinking was that resources can be local paths or fully qualified URLs.

That is forcing knowledge of the notebook server onto the kernels. Do we really want to do that? I assumed not.

@jdfreder
Contributor Author

My thinking was that resources can be local paths or fully qualified URLs.

Yes

@minrk
Member

minrk commented May 26, 2015

I may not understand, but this is what the cache is for.

Cache only helps mitigate future requests, it still needs to get them from the kernel in the first place.

The webserver would ask the kernel about the assets once the kernel is started, and wouldn't need to later.

So all resources are known ahead of time, and no new resources are requested during the lifetime of the kernel?

@takluyver
Member

Webserver says to client "load this asset" (which doesn't have to be JS)

This feels like the wrong way round to do things. The webserver shouldn't be telling the client what to load, the client should be asking the server for the things it determines it needs. Like the way widget display messages can include a require path for a module to load the view from. There are established mechanisms for caching to avoid loading the same thing twice.

@jdfreder
Contributor Author

So all resources are known ahead of time, and no new resources are requested during the lifetime of the kernel?

Yes, that was our thinking. It's totally possible we overlooked a use case where that was incorrect.

Also, you could re-request assets on kernel restart (not just first start).

@minrk
Member

minrk commented May 26, 2015

I'm struggling to see what problems this solves. If we are assuming the kernel knows everything about the server's filesystem in order to tell the server where everything is, then what's the advantage of the kernel managing resources at all, if it can only manage them in a way that the server can understand and access?

@jdfreder
Contributor Author

Like the way widget display messages can include a require path for a module to load the view from. There are established mechanisms for caching to avoid loading the same thing twice.

The widget display message does exactly that, "hey load this"

@minrk
Member

minrk commented May 26, 2015

Does this mechanism provide any benefit over a /kernels/:kernel_name/static directory?

@takluyver
Member

The widget display message does exactly that, "hey load this"

Possibly I misunderstood. It sounds like in your proposal, the server is just telling the frontend to load something, as a separate message from anything that might actually use it. The widget display messages say 'create this class, loading it from X if you need to'. Crucially, loading the resource is tightly tied to using it, which makes it easy to avoid the race conditions where something would try to use the resource just before it was loaded.

@jdfreder
Contributor Author

Does this mechanism provide any benefit over a /kernels/:kernel_name/static directory?

If the client, webserver, and kernel exist on three different machines, it does.

Also, the /kernels/:kernel_name/static directory still has the problem of installation being a two step process (yes this is a problem). This is where the kernel being in control of the asset locating offers a large benefit. Package writers can use methods native to their language for packaging static assets, for IPython & Python this is entry points.

@jdfreder
Contributor Author

Crucially, loading the resource is tightly tied to using it, which makes it easy to avoid the race conditions where something would try to use the resource just before it was loaded.

That's a good point, about the backend not being aware of when the resource is loaded. Unfortunately this problem already exists in our current architecture. A solution would be to make the red message request/response, so in the kernel the API could be implemented using an asynchronous design pattern.

@minrk
Member

minrk commented May 26, 2015

If the client, webserver, and kernel exist on three different machines, it does.

How? I don't see a mechanism for getting the files from the kernel to the webserver, only communicating paths, which require the filesystem to be the same.

the /kernels/:kernel_name/static directory still has the problem of installation being a two step process (yes this is a problem).

It also doesn't solve that problem, it just punts it to the kernel. How does the package communicate this information to the kernel, such that the kernel knows at startup, before any imports, what resources are available?

@minrk
Member

minrk commented May 26, 2015

If we use setuptools entrypoints for this, and communicate files from the kernel to the server at startup and only at startup, this means potentially 100s of MB of file transfer on every kernel startup to the web server. e.g. if a kernel plugin makes MathJax available, there's no mechanism to make the pieces available on request, which proxying http would do, instead it requires all possible resources to be moved at once to the server on every kernel start.

@takluyver
Member

A solution would be to make the red message request/response, so in the kernel the API could be implemented using an asynchronous design pattern.

The bit about request/response makes sense to me, but I'm not sure what you mean about using async patterns in the kernel. I was thinking about race conditions in the frontend: if 'load this resource' and 'do something that needs that resource' are two separate messages, the 'do something' message can arrive before loading has finished, and then things get tricky. If the frontend requests (with caching) the resources as it needs them, you avoid this problem.
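To make the point concrete, here is a small sketch of "request with caching, as part of using the resource". It is written in Python with asyncio purely for illustration (a real frontend would be JavaScript). `ResourceCache`, `handle_display`, and `fake_fetch` are hypothetical names.

```python
import asyncio


class ResourceCache:
    """Load each resource at most once, on demand."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical coroutine function: name -> contents
        self._tasks = {}

    def load(self, name):
        # Concurrent callers share one task, so a fetch is never duplicated.
        if name not in self._tasks:
            self._tasks[name] = asyncio.ensure_future(self._fetch(name))
        return self._tasks[name]


async def handle_display(cache, name):
    """'Do something that needs the resource': the load is awaited as part of the use."""
    content = await cache.load(name)
    return f"rendered with {content}"


async def demo():
    calls = []

    async def fake_fetch(name):
        calls.append(name)
        await asyncio.sleep(0)  # stand-in for a network round trip
        return f"<{name}>"

    cache = ResourceCache(fake_fetch)
    results = await asyncio.gather(
        handle_display(cache, "plot.js"), handle_display(cache, "plot.js")
    )
    return results, calls


results, calls = asyncio.run(demo())
```

Since every use awaits the load it depends on, no handler can run before its resource is available, and the second request reuses the first fetch.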

@JamiesHQ
Member

All: can this issue be closed? If not, what next steps are required? thanks!

@ellisonbg ellisonbg added this to the No Action milestone Apr 24, 2017
@ellisonbg
Contributor

I think we have basically solved this in JupyterLab and we don't have plans on back porting to the classic notebook as it would require a massive amount of work. Closing.

@rgbkrk
Member

rgbkrk commented Apr 25, 2017

Certainly going to agree there, we ran into too much difficulty with needing to continue support of requirejs. It's still a core problem that people want to be able to declare an asset once for the life of a document -- we're not solving this in Jupyter notebooks (spec wise at least).

@JamiesHQ JamiesHQ modified the milestones: JupyterLab, No Action Apr 25, 2017
@jankatins
Contributor

As this is also a topic which is relevant to other clients/kernels implementing the stuff: can someone give a pointer on how this issue should now be handled? At first glance, I couldn't find anything about this in http://jupyter-client.readthedocs.io/en/latest/messaging.html

E.g. how should a javascript library (e.g. for a plot) be sent from an R kernel so that it is cached in the frontend and doesn't need to be resent (or at least not be saved multiple times)?

@ellisonbg
Contributor

ellisonbg commented Apr 25, 2017 via email

@rgbkrk
Member

rgbkrk commented Apr 25, 2017

To add on, my position on javascript (and html) is that outputs should be sandboxed in an iframe. Within that iframe though, we should be able to load assets.

@jankatins
Contributor

@ellisonbg Are there any examples, where a (python/R) package implemented such a thing to display something?

Also, is there an implementation of a "consumer" of such new mime types, e.g. how would nbconvert handle such messages (when converting to docx via pandoc) and how would a package contribute a "handler" to such a consumer?

Building an npm extension to display an R-based plot sounds like a lot of work (judging by my knowledge of npm and such stuff, it's probably something new to learn for most R/Python devs). On the R side, knitr is king and they have a very easy model with a way to display certain js/html only once. It would be unfortunate if we can't match that ease of displaying something. So I would be very interested to see such examples. :-)

@rgbkrk
Member

rgbkrk commented Apr 25, 2017

I'm interested to hear how knitr does it, since they receive such high praise from a lot of folks I work with.

@jankatins
Contributor

jankatins commented Apr 25, 2017

See here https://cran.r-project.org/web/packages/knitr/vignettes/knit_print.html -> the "Metadata" section.

The biggest difference between Jupyter and knitr is that knitr is optimized for converting an object to a single output format (mostly markdown+html/js), while Jupyter aims to display something in as many ways as possible. In contrast to Jupyter, knitr knows all displayed objects, because the complete document is converted and not, as in the notebook, only a single cell. Knitr and the objects which get converted also know the final output format.

To display something you would add a single-dispatch implementation of the knit_print() method for your data structure (similar to IPython's display system for an mpl Figure). In its low-level implementation it returns a structure (`asis_output`) which contains the representation of the object in the current output format, and you can add a metadata object which contains stuff like javascript libs and css.

From the above sections:

library(knitr)
knit_print.foo = function(x, ...) {
  res = paste('**This is a `foo` object**:', x)
  asis_output(res, meta = list(
    js  = system.file('www', 'shared', 'shiny.js',  package = 'shiny'),
    css = system.file('www', 'shared', 'shiny.css', package = 'shiny')
  ))
}

Knitr will then render the whole document, insert the object representation in the document and collect the meta objects. The meta objects will be made unique and then inserted in the head of the document.
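The "collect, make unique, insert in the head" step can be sketched as follows. This is written in Python to stay close to the rest of the thread and is an illustration of the idea, not knitr's actual implementation. The dict layout and function names are assumptions.

```python
def collect_meta(outputs):
    """Merge per-object metadata, keeping the first occurrence of each path."""
    merged = {}
    for output in outputs:
        for kind, path in output.get("meta", {}).items():
            paths = merged.setdefault(kind, [])
            if path not in paths:
                paths.append(path)
    return merged


def render_head(merged):
    """Emit one tag per unique js/css dependency."""
    tags = [f'<script src="{p}"></script>' for p in merged.get("js", [])]
    tags += [f'<link rel="stylesheet" href="{p}">' for p in merged.get("css", [])]
    return "\n".join(tags)


# Hypothetical outputs: two objects depend on the same js file, one has no metadata.
outputs = [
    {"body": "first foo", "meta": {"js": "shiny.js", "css": "shiny.css"}},
    {"body": "second foo", "meta": {"js": "shiny.js"}},
    {"body": "plain text"},
]
merged = collect_meta(outputs)
head = render_head(merged)
```

This only works because the whole document is available at once, which is the difference from the cell-by-cell notebook model described above.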

When we implemented repr (the equivalent of the IPython display system in the IRkernel), one big "problem" was that we couldn't reuse the knit_print implementations (which almost every object in the R world has). You can't be sure what kind of "mimetype" (js, png, etc.) the knit_print call would return, as a) sometimes it's not markdown + js/html and b) sometimes an object can change the output based on the current final output format (it's available in the options argument to knit_print(obj, options, inline)). On the other hand, you can't use the repr_* methods to implement a knit_print method, as we didn't add a way to add meta objects, because the Jupyter messages didn't have a way to handle such "display only once" stuff.

R also has a very nice way to create and display html widgets, which are then handled nicely by knitr (knitr even seems to screenshot the html structure to embed it in non-html formats).

@takluyver
Member

mobilechelonian demonstrates one way to get JS to the notebook interface without re-sending it every time: it copies JS to the nbextensions directory, and then sends code to load it from there. The limitation, of course, is that the JS does not become part of the notebook, so it's harder to share the notebook with all its output.

@jankatins
Contributor

it copies JS to the nbextensions directory,

So this is at best a "workaround" for Python packages, but not for kernels of other languages. It would also not work on nbviewer. Is that right? How would nbviewer actually handle plots from plotting libs which send their plot as a new mimetype?

@rgbkrk
Member

rgbkrk commented Apr 30, 2017

As for the question about support for new mimetypes on nbviewer, support has to be added. Plotly, Vega, geojson, and the new tables are the prime ones to bring in.

@ellisonbg
Contributor

ellisonbg commented Apr 30, 2017 via email

@takluyver
Member

So this is at best a "workaround" for Python packages, but not for kernels of other languages.

I don't think this is specific to Python kernels. It's convenient to reuse the existing Python function to install the nbextension, but all it's really doing is copying some files, and it wouldn't be hard to implement in another language.

It does assume that the kernel is accessing the same filesystem as the server, which doesn't have to be true, but in practice it usually is.

It would also not work on nbviewer. Is that right?

That is right.

@minrk
Member

minrk commented Aug 21, 2018

Re-reading this from the future, and I want to add some clarification to this comment that prompted closing:

I think we have basically solved this in JupyterLab

JupyterLab does not solve this. I think it would be more accurate to say that JupyterLab has instead committed to not solving this issue. The JupyterLab approach to extensions makes solving the target use cases that prompted this issue more difficult, or even impossible:

  • installing a kernel package (not in the server env) wants to deliver the required js (at a minimum, requires runtime-loaded js)
  • two kernels require incompatible versions of the same extension, e.g. myextension@2.0 and myextension@1.0 (requires being able to load different versions of the same library in different notebooks, which is possible via nbextensions if installed with a version in the path, but impossible in JupyterLab, in my understanding, due to the monolithic app bundle)

Instead, I would say that JupyterLab draws a more explicit line, that kernel packages and frontend packages are fundamentally separate, and to install a tool that has both frontend and backend components will always require two separate, explicit installation steps (which may be encapsulated in a single metapackage install in the common case where the kernel and server are in the same env). I think the JupyterLab position is that kernel packages should never be able to deliver javascript to the frontend, and instead choose to communicate with mime-types and protocols. There are plenty of reasons for working this way, but we shouldn't claim to have a solution to this issue.

@jdfreder
Contributor Author

I agree that there's still a problem worth solving here. Today's workarounds require kernel packages to know what notebook client they are installed to. Suboptimal.

I think generically, Packages need to be able to define blobs. These blobs can then be loaded dynamically by the client by hash or by alias. Blobs don't have to be JS.
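One possible shape for this, sketched under the assumption that blobs are addressed by a SHA-256 digest. `BlobRegistry` and its methods are hypothetical names, and where the registry lives is left open.

```python
import hashlib


class BlobRegistry:
    """Content-addressed blobs that can also be looked up by alias."""

    def __init__(self):
        self._blobs = {}    # hash -> bytes
        self._aliases = {}  # alias -> hash

    def register(self, content: bytes, alias=None) -> str:
        """Store a blob and return its hash; optionally bind an alias to it."""
        key = hashlib.sha256(content).hexdigest()
        self._blobs[key] = content
        if alias is not None:
            self._aliases[alias] = key
        return key

    def load(self, ref: str) -> bytes:
        """Resolve an alias if one matches, otherwise treat ref as a hash."""
        return self._blobs[self._aliases.get(ref, ref)]


registry = BlobRegistry()
key = registry.register(b"console.log('hi')", alias="mypackage/hello.js")
same_key = registry.register(b"console.log('hi')")  # identical content, same hash
```

Addressing by hash means identical content registered by several packages is stored once, while aliases give clients a stable name to request.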

Whether the blobs would be stored in the kernel, cached in the notebook server, stored in the notebook server, or a separate service is still unknown. Since I operate at a much lower capacity now, my hope is that you guys can take ownership of this and push it forward.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 9, 2021