Proof point: Show Jupyter Notebook launching kernels remotely #16

Closed
parente opened this issue Oct 15, 2015 · 15 comments

@parente
Contributor

parente commented Oct 15, 2015

To prove out the concept, try the following in a personal fork somewhere and let's see how it goes. The changes are something like so:

  • Notebook frontend JavaScript communicates with its backend Jupyter Notebook server to CRUD notebooks on disk and manage notebook sessions
  • Notebook frontend JS gets the remote kernel gateway URL via a config option that passes through to it somehow. (There must be a related code path already because the frontend is aware of the things like the base_url.)
  • Notebook frontend JavaScript communicates with a remote kernel gateway server to CRUD kernels and communicate with them via Websockets

It sounds straightforward, but there are unknowns around how tightly coupled sessions are to kernels in the local notebook backend. Can the two be divorced so that the sessions are maintained locally in the notebook server and the kernels remotely in the kernel gateway?

/cc @jtyberg

@jtyberg
Collaborator

jtyberg commented Oct 15, 2015

I'll take a look at this.

@jtyberg
Collaborator

jtyberg commented Oct 15, 2015

It looks like there would be backend changes necessary to make this work. The sessionmanager.py starts a kernel on the local notebook server using the configured kernel manager. We would have to write our own kernel manager that talks REST to a kernel_gateway to manage kernels remotely.
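As a rough illustration of the kind of REST traffic such a kernel manager would generate, here is a minimal sketch that only builds the requests. The class name `GatewayKernelRequests`, the `send` helper, and the gateway address are hypothetical; the `/api/kernels` paths are assumed to follow the standard Jupyter kernels REST API, and none of this is taken from an actual implementation.

```python
import json
import urllib.request


class GatewayKernelRequests:
    """Builds REST requests for kernels hosted on a remote kernel gateway.

    Hypothetical helper for illustration; not a real kernel manager class.
    """

    def __init__(self, gateway_url):
        # gateway_url is an assumed address, e.g. "http://gateway.example:8888"
        self.base = gateway_url.rstrip("/") + "/api/kernels"

    def start(self, kernel_name="python3"):
        # POST /api/kernels asks the gateway to launch a new kernel.
        body = json.dumps({"name": kernel_name}).encode("utf-8")
        return urllib.request.Request(
            self.base,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json"},
        )

    def shutdown(self, kernel_id):
        # DELETE /api/kernels/<id> shuts the remote kernel down.
        return urllib.request.Request(f"{self.base}/{kernel_id}", method="DELETE")


def send(request):
    """Send a request and return the decoded JSON body (None if empty)."""
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    return json.loads(payload) if payload else None
```

A custom kernel manager would call something like `send(requests.start())` where the local one spawns a process, and keep the returned kernel `id` for later calls.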

In addition, we would need to make sure that the web socket URL option is set on the frontend, so kernel.js will use that for the web socket connection instead of location.host. We may be able to do this in notebook.js, a la

this.ws_url = options.ws_url || this.config.data.ws_url;

@jtyberg
Collaborator

jtyberg commented Oct 16, 2015

It turns out that the websocket URL can be set as an option to the Notebook app, so no changes to the frontend JS are needed. I just ran my notebook server using

jupyter notebook \
--NotebookApp.websocket_url=ws://<my_kernel_gateway> \
--NotebookApp.kernel_manager_class=notebook.services.kernels.kernelmanager.RemoteKernelManager

I'm also using a RemoteKernelManager class to make the REST calls to the kernel gateway to create a kernel, etc. I got stuck because the kernel gateway (actually, WebSocket mixin) is checking CORS headers on the websocket connection:

[KernelGatewayApp] WARNING | Blocking Cross Origin WebSocket Attempt.  Origin: http://192.168.99.100:8889, Host: <my_kernel_gateway>
WARNING:tornado.access:403 GET /api/kernels/75680ed5-3a58-41a5-97fc-e0b7829b6dd3/channels?session_id=89550D40320C430FBDBEF2D75DD6C826 (10.122.193.145) 2.74ms

@parente
Contributor Author

parente commented Oct 16, 2015

Turns out the Websocket check is supported by Tornado as an optional way to prevent XSS, even though it's not part of the Websocket standard:

http://tornado.readthedocs.org/en/latest/websocket.html?highlight=check_origin#tornado.websocket.WebSocketHandler.check_origin

The implementation in the notebook handlers doesn't provide security against non-notebook clients, which can set Origin to anything. For cross-domain security we need more than Origin checks anyway, which is one reason for the auth token support that went in. That said, we're still going to need to think security through in the long run for a notebook server requesting a remote kernel. Putting the kernel gateway key in the frontend for the client JS to pass to the remote server is a bad, bad idea. (We may wind up proxying Websockets from the notebook server to the kernel gateway after all for this very reason.)
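To make the Origin check concrete, here is a simplified stand-in for the kind of comparison involved. It is a sketch, not the notebook's or Tornado's code: the function name `origin_allowed` and the `allow_origin` parameter are assumptions for illustration, and letting requests without an Origin header through is a choice made for this sketch.

```python
from urllib.parse import urlparse


def origin_allowed(origin, host, allow_origin=""):
    """Return True if a WebSocket handshake with this Origin should proceed.

    Simplified stand-in for a same-origin check; not the notebook's code.
    """
    if allow_origin == "*":
        # Explicitly configured to accept any origin.
        return True
    if not origin:
        # Non-browser clients often send no Origin header at all.
        return True
    if allow_origin:
        # A single configured origin must match exactly.
        return origin == allow_origin
    # Default: the Origin's host:port must match the Host header.
    return urlparse(origin).netloc.lower() == host.lower()
```

With the values from the log above, `origin_allowed("http://192.168.99.100:8889", "<my_kernel_gateway>")` is false, which corresponds to the 403. A non-browser client can send whatever Origin it likes, so a check of this shape only constrains browsers.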

@parente
Contributor Author

parente commented Oct 16, 2015

Another related problem: there's no way in the browser Websocket API to pass through additional headers. So a direct-from-browser websocket connection is not going to be able to take advantage of token auth via headers down the line. It'll need to switch to passing a token through the URL as a query param or some such. But, as noted above, we don't want a single shared token appearing in the HTML / JS sent down to a browser. It's gotta be the equivalent of a one-time CSRF-token.

I'm hesitant to hack in changes for this quick proof of concept. I can push a branch that turns off the origin check on Websocket for now. After this exploration, once we have something working, we need to sit and think through how to do this securely. Options on the table that I see:

  1. Break the tight session+kernel linkage in sessionmanager.py noted above. Let sessions be created against the local server, and all kernel REST + Websocket calls be made cross domain with proper tokens to a kernel gateway.
  2. Continue to walk the hybrid tightrope we're on, but with a scheme for generating short lived tokens on the kernel gateway to be included in the URL upon Websocket connection to the kernel channels endpoint. (If the kernel listing API is inaccessible, does the kernel UUID already serve this purpose?)
  3. Proxy everything through the backend server, REST calls and Websocket connections. Avoids all the CORS mess, but means new support is needed for proxying websockets, something Min wanted to avoid in the original proposal for this repo.
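For option 2, one possible shape for a short-lived token is an HMAC-signed expiry bound to a kernel ID, passed as a query parameter on the Websocket URL. This is only a sketch under assumptions: `issue_token`, `verify_token`, the token format, and the 30-second default are all hypothetical, and the token is time-limited rather than strictly one-time (single use would need server-side state).

```python
import hashlib
import hmac
import time


def issue_token(secret, kernel_id, ttl=30, now=None):
    """Return 'expiry:signature' bound to one kernel (hypothetical scheme)."""
    expires = int(time.time() if now is None else now) + ttl
    message = f"{kernel_id}:{expires}".encode("utf-8")
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{expires}:{signature}"


def verify_token(secret, kernel_id, token, now=None):
    """Check the signature and that the token has not expired."""
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        # Malformed token: missing separator or non-numeric expiry.
        return False
    message = f"{kernel_id}:{expires}".encode("utf-8")
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    current = time.time() if now is None else now
    # compare_digest avoids leaking information through timing.
    matches = hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
    return matches and current < expires
```

The secret stays on the server side; only the derived token would ever appear in a URL sent to the browser, and it stops working once the expiry passes.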

@parente
Contributor Author

parente commented Oct 16, 2015

/cc @rgbkrk @minrk for thoughts on what we're finding so far. Maybe we're missing some shortcuts.

@jtyberg
Collaborator

jtyberg commented Oct 20, 2015

Working from @parente's branch to disable WS CORS on the kernel_gateway, I have remote kernels pretty much working. I can launch new kernels when notebooks are opened or created, and delete them on notebook close or nbserver shutdown.

The biggest issue is restarting kernels. The SessionManager creates a kernel when a session is created, and stores the kernel_id as part of the session (kernel_id is part of the session SQL schema), which means kernel_id gets out of sync unless the kernel manager updates the session with a new kernel_id on a kernel restart (which DELETEs the old kernel_id, then POSTs to get a new kernel_id).
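The sync problem can be sketched with an in-memory SQLite table. The table layout is only loosely modeled on a session store, and `open_sessions`, `restart_kernel`, and the two callables are hypothetical stand-ins for the real session manager and the gateway's DELETE and POST calls.

```python
import sqlite3


def open_sessions():
    """Create an in-memory table loosely modeled on a session store."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE session (session_id TEXT PRIMARY KEY, path TEXT, kernel_id TEXT)"
    )
    return db


def restart_kernel(db, session_id, delete_kernel, create_kernel):
    """Replace a session's kernel and keep the stored kernel_id in sync.

    delete_kernel and create_kernel are hypothetical callables standing in
    for the DELETE and POST calls made against the gateway.
    """
    row = db.execute(
        "SELECT kernel_id FROM session WHERE session_id = ?", (session_id,)
    ).fetchone()
    if row is None:
        raise KeyError(session_id)
    delete_kernel(row[0])      # DELETE the old remote kernel
    new_id = create_kernel()   # POST to obtain a replacement kernel
    with db:                   # commit the update as one transaction
        db.execute(
            "UPDATE session SET kernel_id = ? WHERE session_id = ?",
            (new_id, session_id),
        )
    return new_id
```

Without the final UPDATE, the session row would keep pointing at the deleted kernel, which is the out-of-sync state described above.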

@rgbkrk
Contributor

rgbkrk commented Oct 20, 2015

Sorry I had not followed up on this one yet. How are we doing auth on websockets now?

@parente
Contributor Author

parente commented Oct 20, 2015

Token auth in headers, which doesn't square with the capabilities of the browser WebSocket APIs (but works fine in all other clients).

We need to sit and think through the options in #16 (comment) or other options for how to design this properly for the long haul. At the moment, @jtyberg is just working on a quick hack to flush out these issues.

@jtyberg
Collaborator

jtyberg commented Oct 20, 2015

With regard to kernel restarts, interrupts, etc. from the notebook browser UI, kernel.js is hitting the kernel REST endpoints directly, and assumes the endpoints are on the same server as the notebook server. So it looks like frontend JS changes will also be necessary to support kernels that are remote from the notebook server.

Also, as expected, we have to set Access-Control-Allow-Origin on the kernel_gateway to enable this to work.

@parente
Contributor Author

parente commented Oct 20, 2015

In prep for a convo about this with others, maybe a small wiki page attached to this project with the three options for how to proceed with a real impl with their pros/cons is in order.

@jtyberg
Collaborator

jtyberg commented Oct 21, 2015

Well, I got the remote kernel restarts working, but I had to muck with the JS a bit.

There doesn't appear to be a NotebookApp option similar to websocket_url for the notebook REST API. session.js just assumes the host for the kernel REST endpoints is the same as the notebook server. It feels like there ought to be a base kernel URL option from which we derive both the websocket and REST URIs, but that's probably the least of our concerns.
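A single base kernel URL could yield both addresses along these lines. This is a sketch of the idea only: `derive_kernel_urls` is a hypothetical helper, no such option exists in the notebook as far as this thread establishes, and the `/api/kernels` suffix is assumed from the standard REST layout.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_kernel_urls(base_kernel_url):
    """Return (rest_url, ws_url) derived from one base kernel URL."""
    parts = urlsplit(base_kernel_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("expected an http or https URL")
    path = parts.path.rstrip("/")
    rest_url = urlunsplit(
        (parts.scheme, parts.netloc, path + "/api/kernels", "", "")
    )
    # https maps to wss so the socket stays encrypted.
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    ws_url = urlunsplit((ws_scheme, parts.netloc, path, "", ""))
    return rest_url, ws_url
```

The second value is what would be handed to the existing websocket URL option, while the first would replace the same-host assumption in the frontend's REST calls.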

I created a branch here:

https://github.com/jtyberg/notebook/tree/remote_kernels

I'll take a stab at a wiki page to summarize decision points.

@rgbkrk
Contributor

rgbkrk commented Oct 21, 2015

There doesn't appear to be a NotebookApp option similar to websocket_url for the notebook REST API. session.js just assumes the host for the kernel REST endpoints is the same as the notebook server. It feels like there ought to be a base kernel URL option from which we derive both the websocket and REST URIs, but that's probably the least of our concerns.

This seems like something that @zischwartz had to patch in Thebe.

@parente
Contributor Author

parente commented Oct 26, 2015

Aforementioned wiki page: https://github.com/jupyter-incubator/kernel_gateway/wiki/notebook_kernel_gateway

I think the experiment here fleshed out pain points and options for remote kernels in the notebook if we want to go there one day. We're not the only ones, btw, and certainly not the only approach (e.g., https://github.com/danielballan/remotekernel). I don't think there's anything else to do here for now.

@parente
Contributor Author

parente commented Jun 17, 2016

@jtyberg has implemented a demo flavor of option 3 from the wiki page from way-back-when over in jupyter/kernel_gateway_demos#21

Going to close this issue out. We can continue the convo over in the PR.

@parente parente closed this as completed Jun 17, 2016