Packaging - the custom msg, entry points, and cached static assets solution. #116

Closed
jdfreder opened this Issue May 26, 2015 · 87 comments

jdfreder (Contributor) commented May 26, 2015

Since we haven't agreed on this yet, I'm opening an issue instead of writing an IPEP. If this is agreed on, I'll write an IPEP.

The week before last, @ellisonbg and I brainstormed about Jupyter packaging. As I remember it, the best solution we came up with requires a combination of Python packaging entry-points, a new message type, and static asset caching (in the web server). This is my understanding of how this solution would work (my notes from our meeting are at home, and my apartment is being fumigated, so I don't have access to them).

Jupyter level, kernel extensions

[Screenshot (2015-05-26): diagram of the proposed messages, referenced below as "blue" and "red"]
A new message, the first (in blue), would be added, allowing the server to ask the kernel whether the static assets it knows about, associated with that kernel, are correct. The message would be a dict of static asset paths and content hashes.

The same message in the opposite direction is the kernel's response. It would be some type of data structure, maybe a binary message, containing static asset paths and their contents, and a list of the static assets that can be deleted from the cache.
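
A hypothetical illustration of the payload shapes being described here; the field names, asset names, and hashes are placeholders, not an agreed spec:

# Server -> kernel: asset names mapped to the content hashes the server already has cached.
assets_request = {
    "assets": {
        "mywidget/js/widget.js": "9a0364b9e99bb480dd25e1f0284c8555",    # placeholder hash
        "mywidget/css/widget.css": "4c2a904bafba06591225113ad17b5cec",  # placeholder hash
    }
}

# Kernel -> server: contents for missing or stale assets, plus assets to evict from the cache.
assets_reply = {
    "assets": {
        "mywidget/js/widget.js": "define(['jquery'], function ($) { /* ... */ });",
    },
    "delete": ["oldwidget/js/widget.js"],
}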

A second new message (in red) would be added that would allow the kernel to invoke a require.js call in the front-end. This is preferred over standard display(JS) calls because the notebook contents will remain unaffected.

IPython level, kernel extensions

Python entry points will be used as a registry (a declaration sketch follows this list). Two entry points will be defined:

  1. an entry point for code to run when the kernel is started.
  2. an entry point for a method that returns static assets (paths).
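
A hypothetical setup.py declaration of those two entry points might look like the following; the group names ("jupyter_kernel.startup", "jupyter_kernel.static_assets") and the mywidget package are illustrative only, since no names have been agreed on:

from setuptools import setup

setup(
    name="mywidget",
    version="0.1",
    packages=["mywidget"],
    entry_points={
        # 1. code to run when the kernel is started
        "jupyter_kernel.startup": [
            "mywidget = mywidget.hooks:on_kernel_start",
        ],
        # 2. a method that returns static assets (names -> paths)
        "jupyter_kernel.static_assets": [
            "mywidget = mywidget.hooks:get_static_assets",
        ],
    },
)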

Jupyter level, server extensions and notebook extensions

Python entry points will be used as a registry (a discovery sketch follows this list). Three entry points will be defined:

  1. an entry point for code to run when the server is started.
  2. an entry point for a method that returns static assets (paths).
  3. an entry point for paths to be require'd when the notebook page loads.
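
Discovery on the server side could then be a plain walk over the registered entry points; again, the group names here are illustrative rather than an agreed spec:

from pkg_resources import iter_entry_points

def collect_static_assets():
    """Map asset names to filesystem paths from every installed extension."""
    assets = {}
    for ep in iter_entry_points("jupyter_server.static_assets"):
        get_assets = ep.load()        # the registered callable
        assets.update(get_assets())   # expected to return {name: path}
    return assets

def require_paths():
    """Module paths to require when the notebook page loads."""
    paths = []
    for ep in iter_entry_points("jupyter_notebook.require"):
        paths.extend(ep.load()())     # callable returning a list of paths
    return paths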

EDIT
To help the discussion, issues and specific cases are listed here: https://jupyter.hackpad.com/Packaging-PbIgxnC71or

rgbkrk (Member) commented May 26, 2015

A new message, the first (in blue), would be added, allowing the server [frontend] to ask the kernel whether the static assets it knows about, associated with that kernel, are correct. The message would be a dict of static asset paths and content hashes.

The same message in the opposite direction is the kernel's response. It would be some type of data structure, maybe a binary message, containing static asset paths and their contents, and a list of the static assets that can be deleted from the cache.

I certainly like this approach. It makes sure that assets are based on the kernel runtime rather than associated with the overall notebook server (or other frontend).

Can path be remote or local, depending on the author's implementation?

Carreau (Member) commented May 26, 2015

It breaks the assumption that the kernel does not know it is in a notebook/js environment, and makes it complicated to map kernel paths to server paths to frontend paths.

The Python packaging registry is not language agnostic. It forces each kernel to reimplement a static web server, with our server acting only as a proxy.

I can see a problem with identical paths in many kernels. Once one is cached, it shadows other kernels' resources. Or you install a new version and restart your kernel, and you get the cached versions.

Kernel authors will never bother to implement delete messages.

rgbkrk (Member) commented May 26, 2015

It breaks the assumption that the kernel does not know it is in a notebook/js environment, and makes it complicated to map kernel paths to server paths to frontend paths.

If you include the require bits, I'd say that's true. However, treating this as a resource query and response relative to the kernel does not make it coupled to the notebook. We'd want this for any other HTML based frontends, including Hydrogen.

I'm not in agreement about this using a Python packaging registry, as I think resources should be installed per kernel.

Kernel authors will never bother to implement delete messages.

Don't you think that would affect their users negatively enough that eventually they would?

minrk (Member) commented May 26, 2015

The kernel knowing about static assets and telling the server seems problematic. I think if the kernel is being asked about the assets, it should be responsible for serving them, as well.

Carreau (Member) commented May 26, 2015

Don't you think that would affect their users negatively enough that eventually they would?

No, they won't. They are developers; it works for them if they restart the server, which they do every 10 minutes.

We'd want this for any other HTML based frontends, including Hydrogen.

Nothing tells you that resources will be the same for Hydrogen and the notebook.

minrk (Member) commented May 26, 2015

I can see a problem with identical paths in many kernels.

I don't expect this to be a problem. Any resource fetched from a kernel should necessarily be served from a kernel-specific path. So when kernel K is asked for resource R, the server maps it to /K/R, not /R, so kernels are not capable of collision with each other.

I do think if we are going as far as making the Kernels responsible for static resources via messages, the most logical way to do that is to proxy requests to the Kernels themselves, and expect Kernels to run an HTTP server to serve the files. HTTP already has all the features we are describing here, I think.

minrk (Member) commented May 26, 2015

A second new message (in red) would be added that would allow the kernel to invoke a require.js call in the front-end. This would eliminate the need for a notebook extensions list and its need to be configured.

This statement isn't true. nbextensions aren't limited to kernel-specific behavior. toc, slideshow, nbgrader, etc. would all not be addressed by the proposal, and continue to require nbextensions as it is.

jdfreder (Contributor) commented May 26, 2015

Hey guys, glad we are talking about this. Here are my responses.

@rgbkrk

Can path be remote or local, depending on the author's implementation?

Sorry! I really should have clarified, "path" here means "unique name". It can be whatever string the package author wants!

@Carreau

It forces each kernel to reimplement a static web server, with our server acting only as a proxy.

The only piece of the above that the kernel authors need to implement is the single message, in blue.

It's up to kernel authors to choose a mechanism equivalent to Python's entry points, or something that can be used as an alternative.

It breaks the assumption that the kernel does not know it is in a notebook/js environment,

No.
Webserver says "hey these are assets I know about"
Kernel says "these are assets you are missing, and while you're at it delete these others"
Kernel says "load this asset" (which doesn't have to be JS)
Webserver says to client "load this asset" (which doesn't have to be JS)

I can see a problem with identical paths in many kernels. Once one is cached, it shadows other kernels' resources. Or you install a new version and restart your kernel, and you get the cached versions.

You missed the part where I mentioned caches are associated to specific kernels, by id.

Kernel authors will never bother to implement delete messages.

That means their kernels aren't up to spec.

@minrk

it should be responsible for serving them, as well.

But then if the kernel hangs, or is thinking, the assets are unavailable.

jdfreder (Contributor) commented May 26, 2015

This statement isn't true. nbextensions aren't limited to kernel-specific behavior. toc, slideshow, nbgrader, etc. would all not be addressed by the proposal, and continue to require nbextensions as it is.

Thanks for catching that! I'll edit my post.

rgbkrk (Member) commented May 26, 2015

I think if the kernel is being asked about the assets, it should be responsible for serving them, as well.

That's fair. Wait... How many ports are we talking then? That doesn't seem tractable unless those are proxied to the main notebook server.

minrk (Member) commented May 26, 2015

But then if the kernel hangs, or is thinking, the assets are unavailable.

That's true, but how else are you going to get the resources from the kernel to the notebook server? It sounds like you have to either:

  1. assume shared filesystem, and make it impossible for kernels to be isolated or remote
  2. reimplement http over zmq, and fetch from the kernel anyway

jdfreder (Contributor) commented May 26, 2015

Nothing tells you that resources will be the same for Hydrogen and the notebook.

I don't think we'd need to differentiate. The same way the rich display system works: if a front-end can't load an asset, it won't.

Carreau (Member) commented May 26, 2015

I mean the JS could be different in the notebook than in Hydrogen, or Rodeo, or Thebe. Do you introduce a mimetype per frontend?

rgbkrk (Member) commented May 26, 2015

@minrk

assume shared filesystem, and make it impossible for kernels to be isolated or remote

I'm certainly going to reject that one. Doesn't work right for thebe or any other remote context.

reimplement http over zmq, and fetch from the kernel anyway

At first I thought you were joking, then I assumed someone implemented that. Like this? https://github.com/fanout/zurl

My thinking was that resources can be local paths or fully qualified URLs.

minrk (Member) commented May 26, 2015

How many ports are we talking then?

One. The notebook server would proxy requests like /kernel/:kernel_name/static/... to kernel_name.

There's also a question of whether these should be per kernel name or per kernel id. If it's per id, it's going to mean roughly 0 cache hits as every kernel instance would get its own URL.
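
A minimal sketch of what such a proxy could look like in Tornado, assuming each kernel exposed a local HTTP static server; the port registry and the route are hypothetical, not the notebook's actual handlers:

from tornado import web
from tornado.httpclient import AsyncHTTPClient

KERNEL_STATIC_PORTS = {"ir": 8899}  # hypothetical registry: kernel name -> port

class KernelStaticProxyHandler(web.RequestHandler):
    async def get(self, kernel_name, path):
        port = KERNEL_STATIC_PORTS[kernel_name]
        url = "http://127.0.0.1:%d/static/%s" % (port, path)
        response = await AsyncHTTPClient().fetch(url, raise_error=False)
        self.set_status(response.code)
        ctype = response.headers.get("Content-Type")
        if ctype:
            self.set_header("Content-Type", ctype)
        if response.body:
            self.write(response.body)

# Routed with something like:
# (r"/kernel/([^/]+)/static/(.*)", KernelStaticProxyHandler)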

jdfreder (Contributor) commented May 26, 2015

That's true, but how else are you going to get the resources from the kernel to the notebook server? It sounds like you have to either:

I may not understand, but this is what the cache is for. The webserver would ask the kernel about the assets once the kernel is started, and wouldn't need to later.

minrk (Member) commented May 26, 2015

My thinking was that resources can be local paths or fully qualified URLs.

That is forcing knowledge of the notebook server onto the kernels. Do we really want to do that? I assumed not.

jdfreder (Contributor) commented May 26, 2015

My thinking was that resources can be local paths or fully qualified URLs.

Yes

minrk (Member) commented May 26, 2015

I may not understand, but this is what the cache is for.

Cache only helps mitigate future requests, it still needs to get them from the kernel in the first place.

The webserver would ask the kernel about the assets once the kernel is started, and wouldn't need to later.

So all resources are known ahead of time, and no new resources are requested during the lifetime of the kernel?

takluyver (Member) commented May 26, 2015

Webserver says to client "load this asset" (which doesn't have to be JS)

This feels like the wrong way round to do things. The webserver shouldn't be telling the client what to load, the client should be asking the server for the things it determines it needs. Like the way widget display messages can include a require path for a module to load the view from. There are established mechanisms for caching to avoid loading the same thing twice.

jdfreder (Contributor) commented May 26, 2015

So all resources are known ahead of time, and no new resources are requested during the lifetime of the kernel?

Yes, that was our thinking. It's totally possible we overlooked a use case where that was incorrect.

Also, you could re-request assets on kernel restart (not just first start).

minrk (Member) commented May 26, 2015

I'm struggling to see what problems this solves. If we are assuming the kernel knows everything about the server's filesystem in order to tell the server where everything is, then what's the advantage of the kernel managing resources at all, if it can only manage them in a way that the server can understand and access?

jdfreder (Contributor) commented May 26, 2015

Like the way widget display messages can include a require path for a module to load the view from. There are established mechanisms for caching to avoid loading the same thing twice.

The widget display message does exactly that, "hey load this"

minrk (Member) commented May 26, 2015

Does this mechanism provide any benefit over a /kernels/:kernel_name/static directory?

takluyver (Member) commented May 26, 2015

The widget display message does exactly that, "hey load this"

Possibly I misunderstood. It sounds like in your proposal, the server is just telling the frontend to load something, as a separate message from anything that might actually use it. The widget display messages say 'create this class, loading it from X if you need to'. Crucially, loading the resource is tightly tied to using it, which makes it easy to avoid the race conditions where something would try to use the resource just before it was loaded.

jdfreder (Contributor) commented May 26, 2015

Does this mechanism provide any benefit over a /kernels/:kernel_name/static directory?

If the client, webserver, and kernel exist on three different machines, it does.

Also, the /kernels/:kernel_name/static directory still has the problem of installation being a two-step process (yes, this is a problem). This is where having the kernel in control of locating assets offers a large benefit. Package writers can use methods native to their language for packaging static assets; for IPython & Python this is entry points.

jdfreder (Contributor) commented May 26, 2015

Crucially, loading the resource is tightly tied to using it, which makes it easy to avoid the race conditions where something would try to use the resource just before it was loaded.

That's a good point, about the backend not being aware of when the resource is loaded. Unfortunately this problem already exists in our current architecture. A solution would be to make the red message request/response, so in the kernel the API could be implemented using an asynchronous design pattern.

minrk (Member) commented May 26, 2015

If the client, webserver, and kernel exist on three different machines, it does.

How? I don't see a mechanism for getting the files from the kernel to the webserver, only communicating paths, which require the filesystem to be the same.

the /kernels/:kernel_name/static directory still has the problem of installation being a two step process (yes this is a problem).

It also doesn't solve that problem, it just punts it to the kernel. How does the package communicate this information to the kernel, such that the kernel knows at startup, before any imports, what resources are available?

minrk (Member) commented May 26, 2015

If we use setuptools entry points for this, and communicate files from the kernel to the server at startup and only at startup, this means potentially 100s of MB of file transfer to the web server on every kernel startup. E.g. if a kernel plugin makes MathJax available, there's no mechanism to make the pieces available on request (which proxying HTTP would do); instead it requires all possible resources to be moved to the server at once on every kernel start.

takluyver (Member) commented May 26, 2015

A solution would be to make the red message request/response, so in the kernel the API could be implemented using an asynchronous design pattern.

The bit about request/response makes sense to me, but I'm not sure what you mean about using async patterns in the kernel. I was thinking about race conditions in the frontend: if 'load this resource' and 'do something that needs that resource' are two separate messages, the 'do something' message can arrive before loading has finished, and then things get tricky. If the frontend requests (with caching) the resources as it needs them, you avoid this problem.

minrk (Member) commented May 26, 2015

Even if the caching works well, you will have to hash every resource at startup to validate the cache, rather than at request time. To get a sense of what order of magnitude this might have, try:

time find notebook/static/components -type f -exec md5 "{}" > /dev/null \;

in the notebook repo

jdfreder (Contributor) commented May 26, 2015

I don't see a mechanism for getting the files from the kernel to the webserver, only communicating paths, which require the filesystem to be the same.

This is why I apologized in my first response: "path" really should be "name". What's being communicated in the message to the kernel are "names" & hashes of the corresponding contents. And the other way is "names" and actual file contents. How the file contents make their way over the line is up for discussion, but I was thinking binary messages of some sort.

It also doesn't solve that problem, it just punts it to the kernel. How does the package communicate this information to the kernel, such that the kernel knows at startup, before any imports, what resources are available?

Punting the problem to the kernel is the whole point. Python has a mechanism for this, entry points, which means it's solved for IPython and Jupyter which is all I'm concerned about. The generic messages allow other kernel authors to solve the problem how they want. i.e. IJulia will have to implement their own registry, but as long as they implement the messages, they can do it however they want.

If we use setuptools entrypoints for this, and communicate files from the kernel to the server at startup and only at startup, this means potentially 100s of MB of file transfer on every kernel startup to the web server. e.g. if a kernel plugin makes MathJax available, there's no mechanism to make the pieces available on request, which proxying http would do, instead it requires all possible resources to be moved at once to the server on every kernel start.

The caches stored in the web server would be persisted to the disk. On request, if a resource doesn't exist because blue message #2 hasn't been received yet, the request will be deferred until that message has been received. Once the message is received, if the content still doesn't exist, 404, otherwise respond with the contents.
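
A minimal asyncio sketch of that deferral behaviour (illustrative only, not the notebook server's actual code):

import asyncio

class KernelAssetCache:
    def __init__(self):
        self._assets = {}                  # name -> content bytes
        self._published = asyncio.Event()  # set once the kernel's asset message arrives

    def handle_assets_reply(self, assets):
        """Called when the kernel's asset message (blue #2) is received."""
        self._assets.update(assets)
        self._published.set()

    async def get(self, name, timeout=10.0):
        # Defer the request until the kernel has published, then 404 if still missing.
        await asyncio.wait_for(self._published.wait(), timeout)
        return self._assets[name]  # KeyError would be translated to an HTTP 404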

jdfreder (Contributor) commented May 26, 2015

if 'load this resource' and 'do something that needs that resource' are two separate messages, the 'do something' message can arrive before loading has finished, and then things get tricky.

The 'do something' message, like the 'load this resource' message, comes from the kernel. Hence, if the message is request/response, the 'load this resource' function in the kernel would return a deferred (or something, whatever is best for the language) which, once resolved, would send the 'do something' message.
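
A minimal sketch of that pattern with asyncio; send_request is a hypothetical stand-in for the kernel's real transport:

import asyncio

async def send_request(msg):
    """Hypothetical transport stub: a real kernel would send this over ZMQ
    and resolve when the front-end replies."""
    await asyncio.sleep(0)  # placeholder for the network round-trip
    return {"status": "ok"}

async def load_resource(name):
    """Send the (red) 'load this resource' message and wait for the front-end's ack."""
    reply = await send_request({"msg_type": "resource_load_request", "name": name})
    if reply.get("status") != "ok":
        raise RuntimeError("front-end failed to load %r" % name)

async def do_something_with(name):
    await load_resource(name)  # 'do something' is only sent once loading has resolved
    await send_request({"msg_type": "do_something", "uses": name})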

jdfreder (Contributor) commented May 26, 2015

you will have to hash every resource at startup to validate the cache,

Yes, that could be a problem for the kernel. hmmm. I hope I don't sound ridiculous saying this, but you could cache the hashes in the kernel by the file name and timestamp...?
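
A minimal sketch of that hash cache, keyed by file name and modification time (assuming plain files on a local filesystem):

import hashlib
import os

_hash_cache = {}  # path -> (mtime, hexdigest)

def cached_md5(path):
    mtime = os.path.getmtime(path)
    entry = _hash_cache.get(path)
    if entry and entry[0] == mtime:
        return entry[1]  # file unchanged since it was last hashed
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    _hash_cache[path] = (mtime, digest)
    return digest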

minrk (Member) commented May 26, 2015

@jdfreder I'm not sure what the initial kernel->server publish accomplishes. Why not load on first request for a given resource from the server, and cache that? It wouldn't have the unbounded cost at startup. It would be possible for the kernel to be slow on the first request of a particular resource if the kernel is busy, but I'm not sure that's worse than being slow on every startup.

rgbkrk (Member) commented May 26, 2015

~/code/jupyter/notebook$ time find notebook/static/components -type f -exec md5 "{}" > /dev/null \;

real    0m51.753s
user    0m19.012s
sys 0m27.365s

Carreau (Member) commented May 26, 2015

Kyle's machine is faster than mine:

$ time find notebook/static/components -type f -exec md5 "{}" > /dev/null \;

real    1m22.300s
user    0m27.233s
sys 0m51.760s

minrk (Member) commented May 26, 2015

I hope I don't sound ridiculous saying this, but you could cache the hashes in the kernel by the file name and timestamp.

Not ridiculous, we probably should do that if we require publishing all resources at kernel start time. But now we're caching our cache, so we can cache while we cache :)

Carreau (Member) commented May 26, 2015

I also feel that this thread is a "let's abstract things in a way that will allow us to get an abstraction to abstract what we need to be abstracted to solve it."

takluyver (Member) commented May 26, 2015

The 'do something' message, like the 'load this resource' message, comes from the kernel. Hence, if the message is request/response, the 'load this resource' function in the kernel would return a deferred (or something, whatever is best for the language) which, once resolved, would send the 'do something' message.

But to implement that, you need the frontend to send a receipt right back to the kernel to acknowledge that the resource has been received. It's not enough for the kernel to know that it sent the resource, it has to wait until the frontend has received it. And then it needs to think about what to do if it doesn't get that receipt within a timeout, and so on.

It really seems like it would be much simpler to have the frontend request resources when it's trying to do something that requires them. That's already the way HTTP+HTML works anyway, so it should be easier to implement.

jankatins (Contributor) commented Apr 2, 2016

The R kernel discussed this (or a subset of this) problem recently: IRkernel/IRdisplay#14. Basically: how should JS/CSS libs for visualisations be handled?

The knitr/rmarkdown system in R has this problem solved: the "knit_asis" object (kind of like the display_data message) gets styling in an extra attribute knit_meta which lists dependencies (each time such an object is produced), and rmarkdown (in this case in a role similar to the notebook frontend) manages what is output in the final document. Knitr/rmarkdown has it slightly easier, as it only makes one pass over the document and you can't reevaluate the notebook, so the solution here needs to handle reevaluating cells and removing styling when no cell references it (and this needs to be in the on-disk format as well...). Also, the "producer" of the knit_asis object has information about the final output format (html/latex), so knit_meta only contains stuff for that format.

Since such visualisations need to be available in HTML content which is derived from ipynb files (nbconvert, nbviewer, GitHub, ...), I don't think any "request from kernel" steps can be part of a solution for this problem, as the ipynb file needs to have all such dependencies available. A solution should also handle that different output formats (HTML, LaTeX, ...) need different dependencies (in the message format and as stored in the ipynb).

But this would also make the implementation easier:

  • The messaging format would need an update to handle assets, maybe by using metadata.<mimetype>.dependency.xxx = [] (with xxx = js|html|latex|... -> whatever the mimetype can handle)
  • The frontend would need to implement a deduplicator to include this code only once per document and handle removals if reevaluation of cells is possible:
    • the ipynb/model would include each dependency only once (new "dependencies" section in the json -> <mimetype>.<hash>.(type, content))
    • each "cell" contains only a reference to the (hash of the) dependency
    • when new messages come in, all dependencies are moved to the dependency store and removed from the message, which is then handled "normally". "Moved to the dependency store" means that each dependency is hashed and either newly included in the store and in the document (in a special section, not the output area!) or simply dropped.
    • periodically (or on removal/reevaluation), the dependency store is cleaned of dependencies that are no longer used, and such dependencies are removed from the document. Or this only happens on save, and the user would need to reload to clean up such libraries.

The downside is that each time such a message is sent, the whole dependency chain is sent as well :-( But this is happening right now anyway, and such dependencies are included in the ipynb file each time. This will probably encourage the use of external dependencies/URLs instead of files. Pandoc AFAIK can then include URL content inline :-)
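
A minimal sketch of the deduplication idea above — a dependency store keyed by content hash, with a simple cleanup pass; the message fields are hypothetical:

import hashlib

dependency_store = {}  # hash -> {"mimetype": ..., "content": ...}

def intern_dependencies(message):
    """Move dependencies out of an incoming message into the store,
    leaving only their hashes behind."""
    refs = []
    for dep in message.pop("dependencies", []):
        digest = hashlib.sha256(dep["content"].encode()).hexdigest()
        dependency_store.setdefault(digest, dep)  # include each dependency only once
        refs.append(digest)
    message["dependency_refs"] = refs
    return message

def garbage_collect(live_refs):
    """Drop store entries that no cell references any more."""
    for digest in set(dependency_store) - set(live_refs):
        del dependency_store[digest]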

rgbkrk (Member) commented Apr 2, 2016

There are three (or more!) parts to write a specification for. The backend/filesystem layout, how a frontend requests them (is it direct, is it a url path in the notebook server /kernelspec/ir/static/..., is it per running kernel), as well as how it ends up in the notebook document (per cell, metadata across the notebook).

The frontend would need to implement a deduplicator to only include this code only once per document and handle removals if reevaluation of cells is possible

Definitely. It seems like the approach you outlined for knitr is great for us all to think about in terms of the notebook @janschulz. I prefer a URL based approach to asset requiring up until I'm offline (which is fairly frequent). A big draw to our current format is that it works without having to be connected to the wider internet all the time.

Since I also care about Hydrogen, Thebe, and other frontends beyond the notebook, my primary interest is in getting the backend specification done across kernels. One approach is for kernel spec directories to contain static assets:

├── kernels
│   ├── ir
│   │   ├── kernel.json
│   │   ├── logo-64x64.png
│   │   └── static

What belongs in static I'm unsure of. Let's say we operated with npm packages underneath:

├── kernels
│   ├── ir
│   │   ├── kernel.json
│   │   ├── logo-64x64.png
│   │   └── static
│   │       ├── node_modules
│   │       │   └── d3
│   │       └── package.json

While this would work well for node based frontends (hydrogen, nteract, sidecar), it would not work well on the main notebook or any other remote environment (thebe, dashboards, etc.) without also specifying how we do bundling (webpack, browserify, etc.).
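
A sketch of how a frontend or server could locate per-kernel assets under such a layout, using jupyter_client's kernelspec API; the static subdirectory itself is the proposal here, not an existing convention:

import os
from jupyter_client.kernelspec import KernelSpecManager

def kernel_static_dir(kernel_name):
    """Return the proposed static directory next to the kernelspec."""
    spec = KernelSpecManager().get_kernel_spec(kernel_name)
    return os.path.join(spec.resource_dir, "static")

# e.g. kernel_static_dir("ir") -> .../kernels/ir/static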

jankatins (Contributor) commented Apr 2, 2016

What is actually the problem here?

  • "Too big ipynb files" or "too much in RAM" because every plot includes jquery/... again -> solved by properly labeling dependencies in the over-the-wire messages and deduplicating them in the frontend
  • "too much send over the wire" -> not solved by deduplicating in the frontend and must get a solution in the kernel or in a caching webserver if it acts as a proxy between kernel and frontend

If the latter is a problem (and one assumes that the kernelserver/webserver and the kernel are on the same host and the frontend communicates with the kernel via the webserver), then the above (labeled dependencies in the message) plus a proxy which does caching and replaces dependencies with hashes which are then loaded from the webserver could work:

-> kernel sends message with css/html labeled as dependency
-> proxy/"webserver" replaces dependency with hashes
-> frontend finds hashes -> requests content from webserver
-> webserver sends content for hash or error message
-> frontend includes hash and dependency in json and sends the complete stuff to be saved (or the frontend sends only hashes and the webserver replaces them)

Going through https://jupyter.hackpad.com/Packaging-crate-PbIgxnC71or#:h=Specific-cases, the above can be used to solve the ipywidget case (widgets would add their js/css dependencies on cell execution and these would be included when the notebook is reloaded) but not the other three (e.g. it has nothing to do with packaging extensions for the frontend or backend).
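
A minimal sketch of the proxy step in that flow — the webserver replaces each dependency's content with its hash on the way to the frontend and later serves the content back by hash; the field names are hypothetical:

import hashlib

_content_by_hash = {}  # hash -> raw dependency content

def strip_dependencies(message):
    """Swap dependency contents for hashes before the message reaches the frontend."""
    for dep in message.get("dependencies", []):
        digest = hashlib.sha256(dep["content"].encode()).hexdigest()
        _content_by_hash[digest] = dep.pop("content")
        dep["hash"] = digest
    return message

def content_for_hash(digest):
    """What the webserver returns when the frontend requests a hash;
    a missing hash would become an error response."""
    return _content_by_hash[digest]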

@ellisonbg

Contributor

ellisonbg commented Apr 4, 2016

To summarize the 3 areas we need to solve that @rgbkrk listed:

  1. The backend/filesystem layout of static assets.
  2. How a frontend requests them (is it direct, is it a url path in the notebook server /kernelspec/ir/static/..., is it per running kernel).
  3. How it ends up in the notebook document (per cell, metadata across the notebook).

On 1) my initial thought is that, because different deployment scenarios and frontend architectures will be so different, we shouldn't specify the filesystem layout. If we get into that, I imagine things get really difficult to reason about across all of the different choices: inside/outside Docker, using conda or not, Electron or server, where the kernel is running. By this I mean that a given frontend should be able to use the information from parts 2/3 and translate that into whatever filesystem layout is needed. The other issue is how to deal with a node_modules that is effectively spread out all over the place between the main server and kernels. Do you end up with multiple deployment bundles? How do you deduplicate packages across them?

On 2) is it not sufficient to specify all the things using npm package names and versions? If not, what is missing? I am concerned about making decisions at this level that assume particular bundling tools or path conventions.

On 3) I do think it is pretty important that it is easy to track down all of the static assets for a single notebook, so those assets can be bundled in different contexts such as nbconvert/static, etc. That would seem to point to notebook-level metadata, but it probably also has to be in the cells that use those assets? Maybe both? Not sure.
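
For concreteness, here is one hypothetical shape such notebook-level (and cell-level) metadata could take, with assets named by npm package and semver range. This is a sketch only; the frontend_dependencies key and all values are invented and are not an existing nbformat field.

import json

# Hypothetical layout (sketch): assets are declared once at the notebook level by
# npm package name + version range, and cells that need them refer back to those
# names, so a frontend or nbconvert pipeline can resolve/bundle them however it likes.
notebook = {
    "metadata": {
        "frontend_dependencies": {        # invented key
            "jupyter-js-widgets": "^2.0.0",
            "plotly.js": "^1.16.0",
        }
    },
    "cells": [
        {
            "cell_type": "code",
            "source": "widgets.IntSlider()",
            "metadata": {"frontend_dependencies": ["jupyter-js-widgets"]},  # invented
            "outputs": [],
        }
    ],
}

print(json.dumps(notebook["metadata"], indent=2))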

@parente

Member

parente commented Apr 4, 2016

To summarize the 3 areas we need to solve that @rgbkrk listed:

I'd say there's a 4th: how to deal with the asynchronicity of loading frontend assets with respect to kernel code execution.

For example, how do you ensure the future version of jupyter-js-widgets is done loading on the page before some @interact decorator or the equivalent JS tries to instantiate a view when a user does a Run All?

EDIT: s/emails/tries/ ... thanks a lot autocorrect!
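
One way to picture the ordering problem (Python asyncio used purely as an illustration; the real fix lives in the frontend's module loader, and all names below are made up): output rendering has to wait on an "assets loaded" signal, otherwise a Run All can instantiate views before the library exists.

import asyncio

async def load_jupyter_js_widgets(widgets_loaded):
    await asyncio.sleep(0.1)        # stands in for fetching/evaluating the JS bundle
    widgets_loaded.set()

async def render_widget_output(widgets_loaded, view_id):
    await widgets_loaded.wait()     # gate: no view instantiation before the library is ready
    print(f"rendering widget view {view_id}")

async def main():
    widgets_loaded = asyncio.Event()  # stands in for a promise resolved by the module loader
    # "Run All" fires outputs immediately; without the gate the renders race the load.
    await asyncio.gather(
        render_widget_output(widgets_loaded, "a"),
        render_widget_output(widgets_loaded, "b"),
        load_jupyter_js_widgets(widgets_loaded),
    )

asyncio.run(main())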

@rgbkrk

Member

rgbkrk commented Apr 4, 2016

we don't specify the filesystem layout

We need to specify (or even just explore) the filesystem layout so that:

  • the server in (2) that is publishing assets actually knows how to load/serve them
  • package authors and kernel authors need a place to install them

If the answer continues to be nbextensions, we still run into the problem across multiple kernels.

At least for kernel gateway and the notebook, whether they exist in Docker or not, it's the same local directory structure for that kernel.

@rgbkrk

Member

rgbkrk commented Apr 4, 2016

Ok, now I recall why there was hesitance about having a filesystem layout (as outlined at the top of this issue 😉): we would make the actual kernel serve the assets.

@jankatins

Contributor

jankatins commented Apr 5, 2016

package authors and kernel authors need a place to install them

What "packages" are you talking here (and so I know if this affects the R kernel) : R/Python packages which the user uses in the code cells of the notebook and which implement functions which need to send js/css dependencies? Or things like a nbextension which wants to install something so that the notebook has a new function? My above comments were only for the first case (package which are executed in the code cells) and if this is only about the second case, then I should open a new issue here :-)

@lbustelo

lbustelo commented Apr 5, 2016

I want to echo the point that @parente is adding to this issue, which is maybe the hardest to fix: the loading of dependency libraries and the right timing to render/execute cell output.

As we've been working in declarativewidgets on a way to change how the user initializes the extension on a particular notebook, we've been struggling a lot with the 'chicken or the egg' problem. We've tried many things, and along the way it was surprising to find out that on a page refresh, cell output is rendered before extensions are fully loaded. I guess the limitation is understandable after you think about the implications, but at least for me it was unexpected.

Anyway, I think that to fully understand this issue, we need to think about different scenarios on the client side and the timing of execution as it relates to kernel code and client side extension/library. Here are some of the ones that I can think of.

  1. User creates a new notebook and executes a cell that requires some client side code. (when is that cell really done)
  2. User visits an existing notebook that is cleared of output but performs a Run all (inter-cell dependencies)
  3. User saves a notebook with output and refreshes the browser. (rendering of cell output in relation to dependencies being loaded)
  4. User restarts the kernel and re-run cells (client side code is already loaded, should it be reinit)

For all the above we need to answer:

  • when can cells be executed?
  • when can cell output be rendered?
  • when is the cell output done so that the next cell can execute
@rgbkrk

Member

rgbkrk commented Apr 5, 2016

What "packages" are you talking here (and so I know if this affects the R kernel) : R/Python packages which the user uses in the code cells of the notebook and which implement functions which need to send js/css dependencies?

I'm talking about any kernel, this definitely affects the R kernel. If for some reason the R kernel wants to use the frontend bits of ipywidgets yet is dependent on an older version than is installed (or newer) than the Python side installed into nbextensions, it would have problems. I'd like a way for frontend dependencies to be isolated to the environment they're running with (and to provide a means for fetching them).

@rgbkrk

Member

rgbkrk commented Apr 5, 2016

I want to echo the point that @parente is adding to this issue, which is maybe the hardest to fix: the loading of dependency libraries and the right timing to render/execute cell output.

That is likely the hardest to fix as it very much dictates how a frontend gets built.

@rgbkrk

Member

rgbkrk commented Apr 5, 2016

I've run into all the scenarios you outlined, @lbustelo, in the current notebook in some way, or have assumed I would run into an issue. While developing a custom widget, this forced me to clear all output, restart the kernel, and hard refresh the page.

@JamiesHQ

Member

JamiesHQ commented Apr 24, 2017

All: can this issue be closed? If not, what next steps are required? thanks!

@ellisonbg ellisonbg added this to the No Action milestone Apr 24, 2017

@ellisonbg

Contributor

ellisonbg commented Apr 24, 2017

I think we have basically solved this in JupyterLab, and we don't have plans to backport it to the classic notebook as it would require a massive amount of work. Closing.

@ellisonbg ellisonbg closed this Apr 24, 2017

@rgbkrk

Member

rgbkrk commented Apr 25, 2017

Certainly going to agree there; we ran into too much difficulty with needing to continue support of requirejs. It's still a core problem that people want to be able to declare an asset once for the life of a document -- we're not solving this in Jupyter notebooks (spec-wise at least).

@JamiesHQ JamiesHQ modified the milestones: JupyterLab, No Action Apr 25, 2017

@jankatins

Contributor

jankatins commented Apr 25, 2017

As this is also a topic which is relevant to other clients/kernels implementing this: can someone give a pointer to how this issue should now be handled? At a first glance, I couldn't find anything about it in http://jupyter-client.readthedocs.io/en/latest/messaging.html

E.g. how should a javascript library (e.g. for a plot) be sent from an R kernel so that it is cached in the frontend and doesn't need to be resent (or at least not be saved multiple times)?

@ellisonbg

Contributor

ellisonbg commented Apr 25, 2017

@rgbkrk

Member

rgbkrk commented Apr 25, 2017

To add on, my position on javascript (and html) is that outputs should be sandboxed in an iframe. Within that iframe though, we should be able to load assets.

@jankatins

Contributor

jankatins commented Apr 25, 2017

@ellisonbg Are there any examples where a (Python/R) package has implemented such a thing to display something?

Also, is there an implementation of a "consumer" of such new mime types, e.g. how would nbconvert handle such messages (when converting to docx via pandoc) and how would a package contribute a "handler" to such a consumer?

Building an npm extension to display an R-based plot sounds like a lot of work (judging by my knowledge of npm and such stuff, it's probably something new to learn for most R/Python devs). On the R side, knitr is king and has a very easy model with a way to display certain js/html only once. It would be unfortunate if we can't match that ease of displaying something. So I would be very interested to see such examples. :-)
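
As a minimal kernel-side illustration of the custom-mimetype route (a sketch only: the mimetype and payload below are invented, and rendering them still requires a separately installed frontend extension, which is exactly the npm work being discussed):

from IPython.display import display

# Emit structured data under a custom mimetype and let a frontend renderer
# extension (if one is installed) turn it into a plot. Mimetype and data are
# made up for illustration.
display(
    {"application/vnd.myplot+json": {"points": [[0, 1], [1, 3], [2, 2]]}},
    raw=True,  # the dict is already a mime bundle; don't run it through formatters
)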

@rgbkrk

Member

rgbkrk commented Apr 25, 2017

I'm interested to hear how knitr does it, since they receive such high praise from a lot of folks I work with.

@jankatins

Contributor

jankatins commented Apr 25, 2017

See here https://cran.r-project.org/web/packages/knitr/vignettes/knit_print.html -> the "Metadata" section.

The biggest difference between jupyter and knitr is that knitr is optimized for converting an object to a single output format (mostly markdown+html/js), while jupyter tries to display something in as many ways as possible. In contrast to jupyter, knitr knows all displayed objects because the complete document is converted, not, as in the notebook, only a single cell. Knitr and the objects which get converted also know the final output format.

To display something you would add a single-dispatch implementation of the knit_print() method for your data structure (similar to IPython's display system for mpl Figure). In its low-level implementation it returns a structure (`asis_output`) which contains the representation of the object in the current output format, and you can add a metadata object which contains stuff like javascript libs and css.

From the above sections:

library(knitr)
knit_print.foo = function(x, ...) {
  res = paste('**This is a `foo` object**:', x)
  asis_output(res, meta = list(
    js  = system.file('www', 'shared', 'shiny.js',  package = 'shiny'),
    css = system.file('www', 'shared', 'shiny.css', package = 'shiny')
  ))
}

Knitr will then render the whole document, insert the object representation in the document and collect the meta objects. The meta objects will be made unique and then inserted in the head of the document.

When we implemented repr (the equivalent of the IPython display system in the IRkernel), one big "problem" was that we couldn't reuse the knit_print implementations (which almost every object in the R world has). The problem was that you can't be sure what kind of "mimetype" (js, png, etc.) the knit_print call would return, as a) sometimes it's not markdown + js/html and b) sometimes an object can change the output based on the current final output format (it's available in the options argument to knit_print(obj, options, inline)). On the other hand, you can't use the repr_* methods to implement a knit_print method, as we didn't add a way to attach meta objects, since the jupyter messages didn't have a way to handle such "display only once" stuff.

R also has a very nice way to create and display html widgets, which are then handled nicely by knitr (knitr even seems to screenshot the html structure to embed it in formats other than html).
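
A rough Python analogue of the knitr metadata collection described above, as a sketch only: Jupyter outputs have no such "assets" metadata key today; this just shows the dedup-and-hoist idea an exporter could apply.

def collect_assets(outputs):
    """Collect per-output js/css dependencies, keeping each one once, in first-seen order."""
    seen, head = set(), []
    for out in outputs:
        for dep in out.get("metadata", {}).get("assets", []):  # "assets" key is invented
            if dep not in seen:
                seen.add(dep)
                head.append(dep)
    return head

outputs = [
    {"data": {"text/html": "<div id='plot1'></div>"},
     "metadata": {"assets": ["shiny.js", "shiny.css"]}},
    {"data": {"text/html": "<div id='plot2'></div>"},
     "metadata": {"assets": ["shiny.js"]}},  # duplicate, emitted only once
]

print(collect_assets(outputs))  # -> ['shiny.js', 'shiny.css'], to be hoisted into the document head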

@takluyver

Member

takluyver commented Apr 26, 2017

mobilechelonian demonstrates one way to get JS to the notebook interface without re-sending it every time: it copies JS to the nbextensions directory, and then sends code to load it from there. The limitation, of course, is that the JS does not become part of the notebook, so it's harder to share the notebook with all its output.
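
A hedged sketch of that approach from the package side (install_nbextension and Javascript are real APIs in the notebook/IPython packages, but the path, the require path, and the init() call below are made up):

from notebook.nbextensions import install_nbextension
from IPython.display import display, Javascript

# Copy the package's bundled JS into the user's nbextensions directory once,
# instead of shipping the source in every display_data message.
# (Path and file name are hypothetical; a real package would compute this
# relative to its own install location.)
install_nbextension("/path/to/mypackage/static/mywidget.js", user=True)

# Only this small loader call then goes over the wire / into the notebook
# document; the library itself is served from the nbextensions static route.
display(Javascript("require(['nbextensions/mywidget'], function (w) { w.init(); });"))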

@jankatins

Contributor

jankatins commented Apr 30, 2017

it copies JS to the nbextensions directory,

So this is at best a "workaround" for Python packages, but not for kernels of other languages. It would also not work on nbviewer. Is that right? How would nbviewer actually handle plots from plotting libs which send their plot as a new mimetype?

@rgbkrk

Member

rgbkrk commented Apr 30, 2017

As for the question about support for new mimetypes on nbviewer, support has to be added. Plotly, Vega, geojson, and the new tables are the prime ones to bring in.

@ellisonbg

Contributor

ellisonbg commented Apr 30, 2017

@takluyver

Member

takluyver commented May 2, 2017

So this is at best a "workaround" for Python packages, but not for kernels of other languages.

I don't think this is specific to Python kernels. It's convenient to reuse the existing Python function to install the nbextension, but all it's really doing is copying some files, and it wouldn't be hard to implement in another language.

It does assume that the kernel is accessing the same filesystem as the server, which doesn't have to be true, but in practice it usually is.

It would also not work on nbviewer. Is that right?

That is right.
