
Intellisense for notebooks (old way)


Deprecated

This page describes the old way that notebook intellisense worked. It's used if the python.pylanceLspNotebooksEnabled setting is false.

Once this experiment is pushed out to 100%, the code described on this page can be eliminated.


Table of Contents

Intellisense Overview

Intellisense for notebooks

Concatenation of the notebook cells

Finding modules

Document Selectors and Documents

Middleware

Future changes

Intellisense Overview

Intellisense in VS Code works by sending LSP requests to a separate language server process (in most cases; see this for more info).

Something like so:

[diagram: VS Code sending LSP requests to a separate language server process]

Intellisense for notebooks

Intellisense for notebooks works pretty much the same way, but with each cell of a notebook being its own text document:

[diagram: each notebook cell acting as its own text document when sending LSP requests]
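Each cell shows up with its own URI under the vscode-notebook-cell scheme. A minimal sketch (using the standard VS Code API, not the extension's actual code) that lists those per-cell documents:

```typescript
import * as vscode from 'vscode';

// Every cell of every open notebook is its own TextDocument with a
// 'vscode-notebook-cell' URI, so LSP requests can target cells individually.
for (const notebook of vscode.workspace.notebookDocuments) {
    for (const cell of notebook.getCells()) {
        console.log(cell.document.uri.toString(), cell.document.languageId);
    }
}
```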

Concatenation of the notebook cells

This poses a problem for the language server (Pylance) because code from one cell can be referenced in another.

Example:

[example notebook: pandas imported in one cell and used in a later cell]

In that example, the pandas import crosses the cell boundary.

This means Pylance cannot just analyze each cell individually.

The solution was to concatenate the cells in order.

This changes the original architecture to something like so:

[diagram: notebook cells concatenated into a single document before being sent to Pylance]

How does concatenation actually work?

Concatenation is mostly a raw concatenation of each cell's contents, stacked in order. The concat document then provides functions to map positions back and forth between the original cells and the concatenated contents.
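A minimal sketch of the idea, not the actual implementation (which is linked below): remember each cell's starting line in the concatenated text and use that offset to translate positions in both directions.

```typescript
import * as vscode from 'vscode';

// Sketch only: stack cell contents in order and remember where each cell starts,
// so positions can be translated between cell space and concat space.
class ConcatSketch {
    private startLines = new Map<string, number>(); // cell uri -> first line in the concat doc
    private text = '';

    constructor(cells: vscode.TextDocument[]) {
        let line = 0;
        for (const cell of cells) {
            this.startLines.set(cell.uri.toString(), line);
            const cellText = cell.getText().endsWith('\n') ? cell.getText() : cell.getText() + '\n';
            this.text += cellText;
            line += cellText.split('\n').length - 1; // number of lines this cell occupies
        }
    }

    // Cell position -> position in the concatenated document.
    toConcatPosition(cellUri: vscode.Uri, pos: vscode.Position): vscode.Position {
        const offset = this.startLines.get(cellUri.toString()) ?? 0;
        return new vscode.Position(pos.line + offset, pos.character);
    }

    // Concatenated position -> owning cell uri and position inside that cell.
    toCellPosition(pos: vscode.Position): { cellUri: string; position: vscode.Position } | undefined {
        let best: { cellUri: string; start: number } | undefined;
        for (const [cellUri, start] of this.startLines) {
            if (start <= pos.line && (!best || start > best.start)) {
                best = { cellUri, start };
            }
        }
        return best && { cellUri: best.cellUri, position: new vscode.Position(pos.line - best.start, pos.character) };
    }
}
```

The real converter is considerably more involved; this only illustrates the offset bookkeeping.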

Code for this can be found here

Here's an example of using it:

    public async provideReferences(
        document: vscode.TextDocument,
        position: vscode.Position,
        options: {
            includeDeclaration: boolean;
        },
        token: vscode.CancellationToken,
        _next: protocol.ProvideReferencesSignature
    ) {
        const client = this.getClient();
        if (this.shouldProvideIntellisense(document.uri) && client) {
            // Translate the incoming cell uri and position into the concat document.
            const documentId = this.asTextDocumentIdentifier(document);
            const newDoc = this.converter.toConcatDocument(documentId);
            const newPos = this.converter.toConcatPosition(documentId, position);
            const params: protocol.ReferenceParams = {
                textDocument: newDoc,
                position: newPos,
                context: {
                    includeDeclaration: options.includeDeclaration
                }
            };
            // Send the request against the concatenated document.
            const result = await client.sendRequest(protocolNode.ReferencesRequest.type, params, token);
            // Translate the results back into cell uris and positions.
            const notebookResults = this.converter.toNotebookLocations(result);
            return client.protocol2CodeConverter.asReferences(notebookResults);
        }
    }

That is the handler for the references LSP request.

It is:

  • translating the incoming cell uri into a concat document uri
  • translating the incoming cell position into a concat document position
  • sending the request using the concat data
  • translating the results back into cell uris and positions

Finding modules

When Pylance starts up, it is passed an interpreter that defines what modules are installed. In this example, Pylance is running with a Python 3.10 environment that is missing scikit-learn:

[screenshot: Pylance running against a Python 3.10 environment that is missing scikit-learn]

For Python files, this interpreter is set at the bottom left of VS Code:

[screenshot: the interpreter picker in the VS Code status bar]

That interpreter is used by Pylance to determine where to find all of the modules it checks. So in this example, the window's 3.10 64-bit environment does not have the module 'scikit-learn'.

Notebooks complication

Notebooks don't have a 'global' interpreter, but rather a 'kernel' that is used to run the code. This kernel is almost always associated with a Python interpreter.

[screenshot: the kernel picker for a notebook]

This interpreter is what we need to pass to Pylance so it can find the correct modules.

This complicates how Pylance is started.

For a normal python file, this is how things are started:

[diagram: a single Pylance server started with the globally selected interpreter]

For a notebook, we can't use the global interpreter; instead we start a Pylance server per kernel in use:

[diagram: one Pylance server started per kernel in use]

This is necessary because each Pylance instance needs a separate interpreter to use when searching for modules.

This means there are now 4 Pylance servers running:

  • 1 for the Python extension to handle Python files
  • 3 more, one for each notebook opened with a different kernel
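A rough sketch of what starting one client per kernel could look like, using the vscode-languageclient API. The server module path and the shape of the initializationOptions are assumptions for illustration, not the actual Python/Pylance contract:

```typescript
import { LanguageClient, LanguageClientOptions, ServerOptions, TransportKind } from 'vscode-languageclient/node';

const clients = new Map<string, LanguageClient>();

function startClientForKernel(kernelId: string, interpreterPath: string, serverModule: string): LanguageClient {
    const serverOptions: ServerOptions = {
        run: { module: serverModule, transport: TransportKind.ipc },
        debug: { module: serverModule, transport: TransportKind.ipc }
    };
    const clientOptions: LanguageClientOptions = {
        // Only notebook cells should be routed to this client.
        documentSelector: [{ scheme: 'vscode-notebook-cell', language: 'python' }],
        initializationOptions: {
            // Assumed shape: tell the server which interpreter to resolve modules against.
            pythonPath: interpreterPath
        }
    };
    const client = new LanguageClient(`pylance-${kernelId}`, `Pylance (${kernelId})`, serverOptions, clientOptions);
    client.start();
    clients.set(kernelId, client);
    return client;
}
```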

Document Selectors and Documents

Having multiple language servers running would usually mean each server was assigned to a specific document selector; otherwise you'd end up with duplicate results for, say, hover or completion.

However, that's not the case here, because of limitations in how selectors are specified:

  • They can specify a scheme, a language, or a pattern match
  • They cannot run logic (they're static)
  • They cannot exclude things

If the Python extension's selector is basically "language": "python" and the Jupyter extension's selector is basically "scheme": "vscode-notebook-cell", how do we resolve the duplicates?
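Roughly, the two selectors look like this (a simplified sketch; note that a notebook cell matches both, which is exactly the duplication problem):

```typescript
import { DocumentSelector } from 'vscode-languageclient';

// The Python extension matches every python document, including notebook cells.
const pythonSelector: DocumentSelector = [{ language: 'python' }];

// The Jupyter extension matches only notebook cells.
const jupyterSelector: DocumentSelector = [{ scheme: 'vscode-notebook-cell' }];
```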

Both extensions use something called middleware.

Middleware

The VS Code language client npm module is a library for talking to LSP-enabled language servers. Both the Python extension and the Jupyter extension use it to send messages to Pylance. The library allows for the creation of a 'Middleware' object that can intercept any LSP request before it is sent to the server.

This provides an opportunity to filter messages based on the outbound document URI, meaning we can eliminate the duplicates in the example above:

  • The Python extension lets all non-notebook requests go through normally and swallows notebook requests (handling the negative case that selectors can't express)
  • The Jupyter extension has one middleware per kernel. Each middleware swallows all non-notebook requests and only lets through requests for cells that belong to its kernel's notebook (handling the 'function' check that a selector can't do)
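A hedged sketch of that filtering, shown for the hover request only (the real middleware applies the same check to every LSP feature, and its details differ):

```typescript
import * as vscode from 'vscode';
import { Middleware } from 'vscode-languageclient';

// Python extension side: never answer for notebook cells.
const pythonMiddleware: Middleware = {
    provideHover: (document, position, token, next) => {
        if (document.uri.scheme === 'vscode-notebook-cell') {
            return undefined; // a Jupyter-owned server will answer instead
        }
        return next(document, position, token);
    }
};

// Jupyter extension side: one middleware per kernel, answering only for its own cells.
// 'ownsCell' is a hypothetical callback standing in for the real kernel/notebook lookup.
function middlewareForKernel(ownsCell: (uri: vscode.Uri) => boolean): Middleware {
    return {
        provideHover: (document, position, token, next) => {
            if (document.uri.scheme !== 'vscode-notebook-cell' || !ownsCell(document.uri)) {
                return undefined; // not a cell, or a cell belonging to a different kernel
            }
            return next(document, position, token);
        }
    };
}
```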

This diagram shows a request for a specific notebook cell:

[diagram: a request for a specific notebook cell being routed through the middleware to the matching Pylance server]

Actual implementation

The middleware that makes these decisions can be found here.

Jupyter's multiplexing code for picking which Pylance server to run can be found here.

Future changes

Having 4 Pylance servers running at the same time is rather redundant and a waste of CPU, so we'd like to eliminate this need. In order to do that, Pylance would have to support a custom message indicating that certain URIs have different interpreters.

If that were to happen, we wouldn't need any middleware layers at all. Pylance would just handle all requests for all Python files, and the Jupyter extension would just need to pass a message indicating that certain cells use a different interpreter.
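Purely as an illustration of the shape such a message might take (no such notification exists today; the method name and params are invented):

```typescript
import { NotificationType } from 'vscode-languageclient';

// Hypothetical params: which cell URIs should resolve modules against which interpreter.
interface NotebookInterpreterParams {
    cellUris: string[];
    interpreterPath: string;
}

// Hypothetical method name; the real protocol addition would be defined by Pylance.
const NotebookInterpreterNotification = new NotificationType<NotebookInterpreterParams>('notebook/setInterpreter');

// The Jupyter extension could then notify the single shared client, e.g.:
// client.sendNotification(NotebookInterpreterNotification, { cellUris, interpreterPath });
```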
