
Explore proxy controller in notebooks #146942

Closed
rebornix opened this issue Apr 6, 2022 · 6 comments
Labels: feature-request (Request for new features or functionality) · on-testplan

Comments


rebornix commented Apr 6, 2022

We want to explore how we can build extensions which contribute a kernel/controller resolver that, when selected, provides functional notebook controllers for execution. The resolver handles the necessary authentication for users and should be responsible for spinning up the real kernels/servers and working with controller extensions (e.g., Jupyter) to register notebook controllers (or registering controllers itself).


@rebornix rebornix added this to the April 2022 milestone Apr 6, 2022
@rebornix rebornix added the feature-request Request for new features or functionality label Apr 6, 2022

rebornix commented Apr 6, 2022

I attempted to register an async notebook controller to represent a lazy controller / controller resolver, and quite a few UX and technical questions occurred to me, which I may or may not have answers for yet. Some notes for myself:

  • When should the kernel resolving happen, on quick pick selection or first cell execution?
    • Based on current design, we will start the resolver when users select it from the quick pick
    • When there is no other available kernel, should the lazy kernel be the selected/suggested kernel?
      • If we show it on the status bar, how would we trigger the "resolver"? We can start the resolver on execution but we might also want to allow users to start the resolver through a single mouse click
  • Lazy kernel becomes the active/selected "kernel" while it's resolving. However it should not be cached as the selected kernel for the notebook.
  • What is the expected result of the lazy kernel "resolution"? Should it silently finish (as it can pass all the remote server info to Jupyter) or should it tell VS Code which kernel should now be used? What should we do if there is more than one available kernel behind that remote server?
    • If it silently finishes and doesn't tell which non-lazy kernel it delegates to, the core wouldn't know how to pick a kernel as there might be more than one "suggested" kernel available.


rebornix commented Apr 13, 2022

Initial connection: select and run

Based on exploration/mockups @misolori put together, here is a prototype of how the kernel resolver would integrate with Jupyter:

[Animation: prototype recording]

In the recording:

  • Users have selected a remote server from GitHub
  • Run code
  • The server initializes, gets the Jupyter server endpoint from the remote, and passes it to the Jupyter extension
  • Jupyter extension resolves kernels from the endpoint and contributes the controllers to VS Code
  • The cell is executed against the resolved kernel

To achieve the above, we would need the following APIs:

  • A kernel resolver API on the VS Code side, which allows the GitHub extension to register a lazy/proxy kernel entry. When this lazy kernel is picked and code execution is requested, it will spin up remote machines (which have a Jupyter server installed and launched), do the proper authentication, and lastly get a remote Jupyter server URL.
export interface NotebookProxyController {
    readonly id: string;
    readonly notebookType: string;
    label: string;
    kind?: string;
    resolveHandler: () => NotebookController | string | Thenable<NotebookController | string>;
    readonly onDidChangeSelectedNotebooks: Event<{ readonly notebook: NotebookDocument; readonly selected: boolean }>;
    dispose(): void;
}

export namespace notebooks {
    export function createNotebookProxyController(id: string, notebookType: string, label: string, resolveHandler: () => NotebookController | string | Thenable<NotebookController | string>): NotebookProxyController;
}
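Note that `resolveHandler` may return either a full `NotebookController` or a controller id string (possibly wrapped in a Thenable), so the core has to normalize the result before selecting a kernel. A minimal sketch of that normalization, using a local stand-in type rather than the real `vscode.NotebookController` (the helper name is illustrative, not part of the proposal):

```typescript
// Stand-in for vscode.NotebookController; only the fields needed here.
interface NotebookControllerLike {
    readonly id: string;
    label: string;
}

// The resolver may hand back a full controller object or just its id.
type ResolveResult = NotebookControllerLike | string;

// Normalize either shape to the controller id the core should select.
function resolvedControllerId(result: ResolveResult): string {
    return typeof result === 'string' ? result : result.id;
}
```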
  • The Jupyter extension will allow extensions to register/contribute remote Jupyter servers with proper authentication support, resolve kernels from the remote Jupyter server, and register notebook controllers with VS Code.
export interface IJupyterServerUri {
    baseUrl: string;
    token: string;
    authorizationHeader: any; // JSON object for authorization header.
    expiration?: Date; // Date/time when header expires and should be refreshed.
    displayName: string;
}

export type JupyterServerUriHandle = string;

export interface IJupyterUriProvider {
    readonly id: string; // Should be a unique string (like a guid)
    onDidChangeHandles: Event<void>;
    getHandles(): Promise<JupyterServerUriHandle[]>;
    getServerUri(handle: JupyterServerUriHandle): Promise<IJupyterServerUri>;
}

export interface IExtensionApi {
    registerRemoteServerProvider(serverProvider: IJupyterUriProvider): void;
    addRemoteJupyterServer(providerId: string, handle: JupyterServerUriHandle): Promise<void>;
}

The lazy kernel resolver extension is responsible for setting up remote Jupyter servers and contributing them to Jupyter (through registerRemoteServerProvider). It tells Jupyter which servers are available, and when a server is picked, the Jupyter extension will request authorization details from the resolver extension (getServerUri), since that knowledge is only available to the resolver extension.
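Since `IJupyterServerUri` carries an optional `expiration`, the Jupyter extension presumably has to decide when to call `getServerUri` again to refresh the authorization header. A small sketch of such a staleness check, with a local stand-in type (the helper name is hypothetical, not part of the proposed API):

```typescript
// Mirrors the relevant fields of IJupyterServerUri above.
interface ServerUriLike {
    baseUrl: string;
    token: string;
    expiration?: Date; // when the authorization header goes stale
}

// True when the cached auth info should be refreshed via getServerUri.
// A missing expiration means the header never goes stale.
function needsRefresh(uri: ServerUriLike, now: Date): boolean {
    return uri.expiration !== undefined && now >= uri.expiration;
}
```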

Todos/notes

Reload and reconnect

Workspace reload/reconnect is more complicated than the initial connection. Let's say the first time users open a notebook, the kernel quick pick has two options:

- Connect to GitHub
- Install kernels from marketplace

After users pick Connect to GitHub, go through authentication, and wait for the server to start up and launch the Jupyter server, the kernel quick pick now has one more entry, Python 3.9, which is the selected remote kernel:

- Python 3.9 (selected)
- Connect to GitHub
- Install kernels from marketplace

The Python 3.9 kernel is offered by a remote service, which means it might not be available all the time. If users leave the notebook editor idle (or even close the workspace) for a day, then when they reopen the notebook the next day, the remote Jupyter kernel might already be paused/killed, or even worse, the VM/server hosting the remote Jupyter server might have been released. Now if users open the kernel quick pick, what kernels are available to them? We have three different options:

  1. It behaves the same as initial connection, users need to connect to a new remote server

    - Connect to GitHub
    
  2. There would be two options in the quick pick: one is the remote server used last time, and the other is connecting to a new remote server

    - GitHub (...)
    - Connect to GitHub
    
  3. Users see two options in the quick pick, but the first one is the kernel they used last time

    - Python 3.9
    - Connect to GitHub
    

All three options are valid, and the third one best meets users' expectations, but it has a high chance of failing, as there is no guarantee the kernel users used last time is still available (e.g., the server has been released). As we discussed offline, we would want to conditionally show options 1 and 3, which would require the GitHub extension and the Jupyter extension to coordinate with each other:

  • The Jupyter extension would cache the last used kernel (kernel id/path and which Jupyter server it's from)
  • The GitHub extension would cache remote servers used for notebook documents and check if they are still available on notebook document open. If they still exist, register the remote server URLs with Jupyter via registerRemoteServerProvider @rebornix
  • The Jupyter extension periodically checks if the remote server for the cached kernel is available (through registered remote server providers); if it exists, create a notebook controller for the cached kernel; if it doesn't exist yet, no-op
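The coordination above boils down to conditionally choosing between option 1 and option 3 when building the quick pick. A rough sketch of that decision, assuming the availability probe has already run (all types and names here are illustrative, not proposed API):

```typescript
// What the Jupyter extension would cache about the last used kernel.
interface CachedKernel {
    label: string;        // e.g. 'Python 3.9'
    serverHandle: string; // which remote Jupyter server it came from
}

// Option 3 when the cached kernel's server is still reachable,
// otherwise fall back to option 1 (initial-connection behavior).
function reloadQuickPickEntries(
    cached: CachedKernel | undefined,
    liveServerHandles: Set<string>
): string[] {
    const entries: string[] = [];
    if (cached && liveServerHandles.has(cached.serverHandle)) {
        entries.push(cached.label);
    }
    entries.push('Connect to GitHub');
    return entries;
}
```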

rebornix (Member Author) commented:

Extension snippet for contributing a lazy kernel which can launch a local Jupyter server. The code demonstrates how the lazy kernel integrates with VS Code and the Jupyter extension:

  • register the lazy kernel with VS Code via createNotebookProxyController
  • register a remote server provider via jupyterApi.registerRemoteServerProvider; the provider will handle authentication
  • call jupyter.selectjupyteruri to ask Jupyter to use the server created by the lazy kernel extension.
import { spawn } from 'child_process';
import { commands, notebooks, window, NotebookController, QuickPickItem } from 'vscode';
// IExtensionContext and IExtensionApi come from the Jupyter extension API.

interface ICachedServer {
    serverProcess: any; // child process handle for the local Jupyter server
    baseUrl: string;
    token: string;
}

function registerLazyKernels(context: IExtensionContext, api: IExtensionApi) {
    const servers = new Map<string, ICachedServer>();

    api.registerRemoteServerProvider({
        get id() {
            return 'github';
        },
        getQuickPickEntryItems: () => {
            return [] as QuickPickItem[];
        },
        handleQuickPick: (_item, _back) => {
            return Promise.resolve(undefined);
        },
        getServerUri: (handle: string) => {
            // github to do the auth
            const server = servers.get(handle);
            if (server) {
                // resume the machine/vm
                const token = server.token;
                return Promise.resolve({
                    authorizationHeader: { Authorization: `token ${token}` },
                    displayName: 'GitHub',
                    baseUrl: server.baseUrl,
                    token: token
                });
            }

            return Promise.reject(new Error('Unknown server handle'));
        }
    });

    const createServer = () => {
        return new Promise<ICachedServer>(resolve => {
            const server = spawn('C:\\Users\\rebor\\.conda\\envs\\testenv\\python.exe', ['-m', 'jupyter', 'lab', '--no-browser']);

            const handleData = (data: string) => {
                const buffer = data.split(/\r|\n|\r\n/g) as string[];
                for (let i = 0; i < buffer.length; i++) {
                    const matches = /(http(s)?:\/\/(.*))\/\?token=(.*)/.exec(buffer[i]);
                    if (matches) {
                        const baseUrl = matches[1];
                        const token = matches[4];
                        resolve({
                            serverProcess: server,
                            baseUrl,
                            token
                        });
                        break;
                    }
                }

            }
            server.stdout.on('data', (data: any) => {
                handleData(data.toString());
            });

            server.stderr.on('data', (data: any) => {
                handleData(data.toString());
            });
        });
    };

    const controller = notebooks.createNotebookProxyController('lazy_kernel', 'jupyter-notebook', 'GitHub Server', async () => {
        // Reuse the existing server if one was already launched.
        const server = servers.size === 0 ? (await createServer()) : [...servers.values()][0];
        servers.set(server.baseUrl, server);
        // Ask the Jupyter extension to use this server; it resolves kernels
        // from the endpoint and returns the controller to delegate to.
        const resolved = await commands.executeCommand(
            'jupyter.selectjupyteruri',
            false,
            { id: 'github', handle: server.baseUrl },
            window.activeNotebookEditor?.document
        );
        if (!resolved) {
            throw new Error('Jupyter extension did not return a controller');
        }
        return resolved as NotebookController;
    });
    controller.kind = 'GitHub';
    context.subscriptions.push(controller);
}

rebornix (Member Author) commented:

@kieferrm and I discussed offline the two common scenarios: initial connection and reload. The initial connection is simple: the proxy kernel spins up the Jupyter server on a remote machine, provides a Jupyter server URL to the Jupyter extension, and lastly a notebook controller is created and used to execute code.

The "Reload" scenario is not as straightforward. When the window/page reloads (e.g., after installing an extension which requires reloading the window), the VS Code workbench restores the same notebook editor opened before, and we need to figure out if we should allow users to run code against the Jupyter server/kernel used before. Of the 3 options described in #146942 (comment), the 3rd one is the most natural:

  • The Jupyter extension caches all kernels from the Jupyter server created by GitHub
  • After the window reloads, the Jupyter extension creates notebook controllers from the cached data. Users would see the kernel used before the window reload.
  • [screenshot: cached kernel]
  • If the Jupyter server has more than one kernel, users will see them all available in the kernel quick pick.
  • [screenshot: kernel list]
  • Execution will work since the Jupyter extension already has the server endpoint and its auth info.

The catch here is that the cached Jupyter server might be deallocated/released. When that happens, Jupyter won't be able to connect to the server anymore and execution will fail. A nicer experience would be for the GitHub extension to launch a new Jupyter server under the hood with the same kernel specs. An alternative is for GitHub to provide the availability of Jupyter servers to the Jupyter extension in advance, so the Jupyter extension can clear the cache if the Jupyter servers are no longer available.
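The cache-clearing alternative can be sketched as a simple filter: drop any cached kernel whose backing server handle is no longer advertised by a registered provider (via getHandles()). Types and names below are illustrative, not proposed API:

```typescript
// What the Jupyter extension might cache per remote kernel.
interface CachedRemoteKernel {
    kernelId: string;
    serverHandle: string; // handle reported by the remote server provider
}

// Keep only kernels whose backing server is still advertised
// by a registered remote server provider.
function pruneStaleKernels(
    cache: CachedRemoteKernel[],
    liveHandles: string[]
): CachedRemoteKernel[] {
    const live = new Set(liveHandles);
    return cache.filter(k => live.has(k.serverHandle));
}
```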


We also found that the same concept/UX could be expanded to remote jupyter server support in Jupyter on VS Code desktop:

  • Currently users need to open the server picker from the status bar, manually type in a Jupyter server URL, or choose Azure ML to create a Jupyter server on the fly.
  • [screenshot: local Jupyter server picker]
  • Once a Jupyter server URL is provided to Jupyter, users can see the list of kernels available on that Jupyter server.
  • This is already some sort of proxy kernel, but the user needs to launch the Jupyter server manually and paste the URL into the server picker themselves, rather than having an extension (e.g., GitHub) handle all of that automatically. If we leverage the proxy kernel concept, we can move the options in the server quick pick into the kernel picker.


DonJayamanne commented Apr 20, 2022

  • Disconnect

    Assume I connect to a GitHub, Azure ML, and/or existing remote server; how would I now be able to disconnect from them?
    Based on the current API there doesn't seem to be a way to disconnect.

An alternative is GitHub provides the availability of jupyter servers to Jupyter extension in advance, and Jupyter extension can clear the cache if the jupyter servers are no longer available.

Based on the latest notes, the only way to get rid of them is to wait for GitHub to kill the remote servers, reload VS Code, and they'll disappear at some point in time. I think there should be a way to disconnect them. It's possible that some of these resources are not free (Azure ML, GitHub, or the like) and I'd like to remove them.
  Another reason to allow disconnecting is security, much like signing out of an account.
  • Display names

    Based on the screenshots, the GitHub lazy kernel is still available. This leads me to believe one can possibly start multiple GitHub servers. If this is the case, then each server would have to have its own display name, e.g., GitHub Remote Server 1, GitHub Remote Server 2, etc.
    The existing API supports this and IJupyterServerUri has a property named displayName that could be used for this purpose. Issue and notes updated accordingly.

  • Change API of selectJupyterUrl(id: string, handle: JupyterServerUriHandle): vscode.NotebookController; to be async.

    The return value needs to be async, as Jupyter extension will need to query the remote kernels (which happens to be an IO operation). Hence changed the API accordingly in the issue.

  • Remove Lazy Controller or Leave it once connected

    Let's assume the lazy controller is removed as soon as the remote kernels have been resolved (after clicking it). Let's assume this to be the case for Azure ML or another lazy kernel contributed by another extension. I'm assuming this would be removed to reduce confusion: why leave it there once you've already connected to the Azure compute?

    Next time VS Code is reloaded, now what? Should the Azure ML extension add the same lazy kernel into the list?
    The 3rd party extension should know that the controllers have been loaded for this remote URI.
    I believe this can be achieved by the 3rd party extension proactively calling the method IExtensionApi.selectJupyterUrl.
    If something is returned, then it knows that the Jupyter extension is aware of the controllers, and it can choose to hide/remove the lazy controller.
    Let's discuss this.


greazer commented May 2, 2022

Be sure to account for this issue.
microsoft/vscode-jupyter#9865
