Proof point: Show Jupyter Notebook launching kernels remotely #16

parente · 2015-10-15T12:19:52Z

To prove out the concept, try the following in a personal fork somewhere and let's see how it goes. The changes are something like so:

Notebook frontend JavaScript communicates with its backend Jupyter Notebook server to CRUD notebooks on disk and manage notebook sessions
Notebook frontend JS gets the remote kernel gateway URL via a config option that passes through to it somehow. (There must be a related code path already because the frontend is aware of things like the base_url.)
Notebook frontend JavaScript communicates with a remote kernel gateway server to CRUD kernels and communicate with them via Websockets

It sounds straightforward, but there are unknowns around how tightly coupled sessions are to kernels in the local notebook backend. Can the two be divorced so that the sessions are maintained locally in the notebook server and the kernels remotely in the kernel gateway?

/cc @jtyberg

jtyberg · 2015-10-15T13:14:31Z

I'll take a look at this.

jtyberg · 2015-10-15T20:01:27Z

It looks like there would be backend changes necessary to make this work. The sessionmanager.py starts a kernel on the local notebook server using the configured kernel manager. We would have to write our own kernel manager that talks REST to a kernel_gateway to manage kernels remotely.
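A minimal sketch of the REST side of such a kernel manager, assuming the kernel gateway exposes the same `/api/kernels` resource as the notebook server. The class name `RemoteKernelClient`, its `send` parameter, and the default kernel name are hypothetical and are not taken from the branch discussed in this thread.

```python
import json
import urllib.request


class RemoteKernelClient:
    """Hypothetical helper: manages kernels on a kernel gateway over REST."""

    def __init__(self, gateway_url, send=None):
        # gateway_url is an assumed base URL such as "http://gateway:8888".
        self.kernels_url = gateway_url.rstrip("/") + "/api/kernels"
        # `send` performs the HTTP call; injectable so the sketch runs offline.
        self._send = send or self._send_http

    @staticmethod
    def _send_http(method, url, body=None):
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            url, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            payload = response.read()
        return json.loads(payload) if payload else None

    def start_kernel(self, kernel_name="python3"):
        # POST /api/kernels creates a kernel and returns its model.
        model = self._send("POST", self.kernels_url, {"name": kernel_name})
        return model["id"]

    def shutdown_kernel(self, kernel_id):
        # DELETE /api/kernels/<id> removes the remote kernel.
        self._send("DELETE", "%s/%s" % (self.kernels_url, kernel_id))
```

A real kernel manager would also have to plug into the notebook server's kernel manager interface; this only shows the create and delete calls.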

In addition, we would need to make sure that the web socket URL option is set on the frontend, so kernel.js will use that for the web socket connection instead of location.host. We may be able to do this in notebook.js, a la

this.ws_url = options.ws_url || this.config.data.ws_url;

jtyberg · 2015-10-16T19:49:23Z

It turns out that the websocket URL can be set as an option to the Notebook app, so no changes to the frontend JS are needed. I just ran my notebook server using

jupyter notebook \
--NotebookApp.websocket_url=ws://<my_kernel_gateway> \
--NotebookApp.kernel_manager_class=notebook.services.kernels.kernelmanager.RemoteKernelManager

I'm also using a RemoteKernelManager class to make the REST calls to the kernel gateway to create a kernel, etc. I got stuck because the kernel gateway (actually, WebSocket mixin) is checking CORS headers on the websocket connection:

[KernelGatewayApp] WARNING | Blocking Cross Origin WebSocket Attempt.  Origin: http://192.168.99.100:8889, Host: <my_kernel_gateway>
WARNING:tornado.access:403 GET /api/kernels/75680ed5-3a58-41a5-97fc-e0b7829b6dd3/channels?session_id=89550D40320C430FBDBEF2D75DD6C826 (10.122.193.145) 2.74ms

parente · 2015-10-16T20:32:38Z

Turns out the Websocket check is supported by Tornado as an optional way to prevent XSS, even though it's not part of the Websocket standard:

http://tornado.readthedocs.org/en/latest/websocket.html?highlight=check_origin#tornado.websocket.WebSocketHandler.check_origin

The implementation in the notebook handlers doesn't provide security against non-notebook clients, which can set Origin to anything. For cross-domain security we need more than Origin checks anyway, which is one reason for the auth token support that went in. That said, we're still going to need to think security through in the long run for a notebook server requesting a remote kernel. Putting the kernel gateway key in the frontend for the client JS to pass to the remote server is a bad, bad idea. (We may wind up proxying Websockets from the notebook server to the kernel gateway after all for this very reason.)
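For illustration, the kind of decision an origin check makes can be sketched as a plain function. This is a simplified stand-in, not the Tornado or notebook implementation; the function name `origin_allowed` and the `allowed_origins` parameter are assumptions, and rejecting a missing Origin is a choice made for this sketch.

```python
from urllib.parse import urlparse


def origin_allowed(origin, host, allowed_origins=()):
    """Return True if a WebSocket handshake with this Origin should proceed."""
    if not origin or not host:
        # Sketch assumption: an origin that cannot be verified is rejected.
        return False
    origin = origin.lower()
    if origin in allowed_origins:
        # Explicitly trusted cross-origin page.
        return True
    # Same-origin case: the Origin's host:port matches the Host header.
    return urlparse(origin).netloc == host.lower()
```

With the values from the log above, `origin_allowed("http://192.168.99.100:8889", "<my_kernel_gateway>")` is false because the notebook page and the gateway live on different hosts. As noted, a non-browser client can send any Origin it likes, so this only constrains browsers.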

parente · 2015-10-16T21:27:36Z

Another related problem: there's no way in the browser Websocket API to pass through additional headers. So a direct-from-browser websocket connection is not going to be able to take advantage of token auth via headers down the line. It'll need to switch to passing a token through the URL as a query param or some such. But, as noted above, we don't want a single shared token appearing in the HTML / JS sent down to a browser. It's gotta be the equivalent of a one-time CSRF-token.

I'm hesitant to hack in changes for this quick proof of concept. I can push a branch that turns off the origin check on Websocket for now. After this exploration, once we have something working, we need to sit and think through how to do this securely. Options on the table that I see:

Break the tight session+kernel linkage in sessionmanager.py noted above. Let sessions be created against the local server, and all kernel REST + Websocket calls be made cross domain with proper tokens to a kernel gateway.
Continue to walk the hybrid tightrope we're on, but with a scheme for generating short lived tokens on the kernel gateway to be included in the URL upon Websocket connection to the kernel channels endpoint. (If the kernel listing API is inaccessible, does the kernel UUID already serve this purpose?)
Proxy everything through the backend server, REST calls and Websocket connections. Avoids all the CORS mess, but means new support is needed for proxying websockets, something Min wanted to avoid in the original proposal for this repo.
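One possible shape for the short-lived token idea in the second option, sketched with an HMAC over the kernel ID and an expiry time. Everything here is an assumption for discussion: the function names, the `expiry:signature` format, and the 30 second lifetime. It is time-limited rather than strictly one-time; true single use would need server-side state to remember spent tokens.

```python
import hashlib
import hmac
import time


def issue_token(secret, kernel_id, now=None, ttl=30):
    """Return 'expiry:signature' bound to one kernel ID."""
    expires = int(now if now is not None else time.time()) + ttl
    message = ("%s:%d" % (kernel_id, expires)).encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return "%d:%s" % (expires, signature)


def verify_token(secret, kernel_id, token, now=None):
    """Check the signature and expiry of a token taken from the query string."""
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        # Malformed token: missing separator or non-numeric expiry.
        return False
    message = ("%s:%d" % (kernel_id, expires)).encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    # Constant-time comparison avoids leaking how much of the signature matched.
    matches = hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
    return matches and current <= expires
```

The secret stays on the server side; only the derived token would appear in the Websocket URL sent to the browser.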

parente · 2015-10-16T21:28:04Z

/cc @rgbkrk @minrk for thoughts on what we're finding so far. Maybe we're missing some shortcuts.

jtyberg · 2015-10-20T15:18:09Z

Working from @parente's branch to disable WS CORS on the kernel_gateway, I have remote kernels pretty much working. I can launch new kernels when notebooks are opened or created, and delete them on notebook close or nbserver shutdown.

The biggest issue is restarting kernels. The SessionManager creates a kernel when a session is created and stores the kernel_id as part of the session (kernel_id is part of the session SQL schema). A restart DELETEs the old kernel_id and then POSTs to get a new one, so the session's kernel_id gets out of sync unless the kernel manager updates the session with the new kernel_id.
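The sync problem can be reproduced in a few lines. This is a sketch under assumptions: the three-column `session` table only approximates the notebook's schema, and `FakeGateway` and `restart_kernel` are hypothetical stand-ins for the gateway's REST API and the kernel manager hook.

```python
import sqlite3
import uuid


class FakeGateway:
    """Stand-in for the kernel gateway; a restart is DELETE followed by POST."""

    def __init__(self):
        self.kernels = set()

    def create(self):
        kernel_id = str(uuid.uuid4())
        self.kernels.add(kernel_id)
        return kernel_id

    def delete(self, kernel_id):
        self.kernels.discard(kernel_id)


def restart_kernel(db, gateway, old_kernel_id):
    """Replace a remote kernel and repoint its session at the new ID."""
    gateway.delete(old_kernel_id)
    new_kernel_id = gateway.create()
    # Without this UPDATE the session row keeps pointing at a dead kernel.
    db.execute(
        "UPDATE session SET kernel_id = ? WHERE kernel_id = ?",
        (new_kernel_id, old_kernel_id),
    )
    db.commit()
    return new_kernel_id


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE session (session_id TEXT, path TEXT, kernel_id TEXT)")
gateway = FakeGateway()
first_id = gateway.create()
db.execute("INSERT INTO session VALUES (?, ?, ?)", ("s1", "demo.ipynb", first_id))
second_id = restart_kernel(db, gateway, first_id)
stored_id = db.execute("SELECT kernel_id FROM session").fetchone()[0]
```

After the restart, `stored_id` equals `second_id`; dropping the UPDATE leaves the session holding `first_id`, which no longer exists on the gateway.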

rgbkrk · 2015-10-20T15:22:56Z

Sorry I had not followed up on this one yet. How are we doing auth on websockets now?

parente · 2015-10-20T15:31:05Z

Token auth in headers, which doesn't square with the capabilities of the browser WebSocket APIs (but works fine in all other clients).

We need to sit and think through the options in #16 (comment) or other options for how to design this properly for the long haul. At the moment, @jtyberg is just working on a quick hack to flush out these issues.

jtyberg · 2015-10-20T18:19:40Z

With regard to kernel restarts, interrupts, etc. from the notebook browser UI, kernel.js is hitting the kernel REST endpoints directly, and assumes the endpoints are on the same server as the notebook server. So it looks like frontend JS changes will also be necessary to support kernels that are remote from the notebook server.

Also, as expected, we have to set Access-Control-Allow-Origin on the kernel_gateway to enable this to work.

parente · 2015-10-20T18:24:08Z

In prep for a convo about this with others, maybe a small wiki page attached to this project with the three options for how to proceed with a real impl with their pros/cons is in order.

jtyberg · 2015-10-21T02:22:49Z

Well, I got the remote kernel restarts working, but I had to muck with the JS a bit.

There doesn't appear to be a NotebookApp option similar to websocket_url for the notebook REST API. session.js just assumes the host for the kernel REST endpoints is the same as the notebook server. It feels like there ought to be a base kernel url option from which we derive both the websocket and REST URIs, but that's probably the least of our concerns.
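As a rough illustration of the base kernel URL idea, both URIs can be derived from one setting. The function name `kernel_urls` and the notion of a single base URL option are hypothetical; the `/api/kernels/<id>/channels` path follows the request shown in the gateway log earlier in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def kernel_urls(base_kernel_url, kernel_id):
    """Derive the REST and Websocket URLs for one kernel from a single base URL."""
    parts = urlsplit(base_kernel_url)
    # Secure HTTP maps to secure Websockets; anything else maps to plain ws.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    path = parts.path.rstrip("/") + "/api/kernels/" + kernel_id
    rest_url = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    ws_url = urlunsplit((ws_scheme, parts.netloc, path + "/channels", "", ""))
    return rest_url, ws_url
```

The frontend would then need only one configured value instead of separate Websocket and REST hosts.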

I created a branch here:

https://github.com/jtyberg/notebook/tree/remote_kernels

I'll take a stab at a wiki page to summarize decision points.

rgbkrk · 2015-10-21T09:40:23Z

> There doesn't appear to be a NotebookApp option similar to websocket_url for the notebook REST API. session.js just assumes the host for the kernel REST endpoints is the same as the notebook server. It feels like there ought to be a base kernel url option from which we derive both the websocket and REST URIs, but that's probably the least of our concerns.

This seems like something that @zischwartz had to patch in Thebe.

parente · 2015-10-26T20:31:39Z

Aforementioned wiki page: https://github.com/jupyter-incubator/kernel_gateway/wiki/notebook_kernel_gateway

I think the experiment here fleshed out pain points and options for remote kernels in the notebook if we want to go there one day. We're not the only ones looking at this, btw, and ours is certainly not the only approach (e.g., https://github.com/danielballan/remotekernel). I don't think there's anything else to do here for now.

parente · 2016-06-17T15:33:40Z

@jtyberg has implemented a demo flavor of option 3 from the wiki page from way-back-when over in jupyter/kernel_gateway_demos#21

Going to close this issue out. We can continue the convo over in the PR.

parente assigned jtyberg Oct 15, 2015

parente added the example label Dec 15, 2015

parente closed this as completed Jun 17, 2016
