Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add build-time execution #856

Closed
wants to merge 17 commits into from
Closed

Conversation

agoose77
Copy link
Contributor

@agoose77 agoose77 commented Jan 16, 2024

This PR is a jumping off point for execution work, and should make it possible to scope discussions to just interested parties.

  • Separate existing inline expression result population from rendering
  • Implement local on-disk caching (with a nice abstraction)
  • Implement a CLI for execution w/o rendering
  • Implement CLI options to control caching and whether to start existing an Jupyter server

Closes #839, #550

packages/myst-cli/src/transforms/outputs.ts Show resolved Hide resolved
packages/myst-cli/src/transforms/execute.ts Outdated Show resolved Hide resolved
Comment on lines 116 to 127
type ICellBlockOutput = GenericNode & {
data: IOutput[];
};

type ICellBlock = GenericNode & {
children: (Code | ICellBlockOutput)[];
};

function isCellBlock(node: GenericNode): node is ICellBlock {
return node.type === 'block' && select('code', node) !== null && select('output', node) !== null;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to type this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things look good?!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fwkoch did we have a kind on the block?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are tagging executable code blocks with data.type = 'notebook-code' - From notebooks - https://github.com/executablebooks/mystmd/blob/main/packages/myst-cli/src/process/notebook.ts#L24-L27 and from code cell directives - https://github.com/executablebooks/mystmd/blob/main/packages/myst-directives/src/code.ts#L215

However, I'm not sure we are relying on this field anywhere? In the myst-theme execution code at least, we find executable blocks in a similar way to your implementation: https://github.com/executablebooks/myst-theme/blob/855f29ec42a77deaa145481bfcab25b2d3c53757/packages/jupyter/src/execute/utils.ts#L8 The only difference is that function supports executable figures, i.e. container nodes with code/output children, in addition to blocks.

@agoose77 agoose77 requested a review from fwkoch January 17, 2024 10:39
@agoose77
Copy link
Contributor Author

agoose77 commented Jan 17, 2024

@fwkoch I've pinged you for review to hopefully course correct anything that's not looking like it's going in the right direction.

At a high level, this is my thinking about the transforms / user flow:

  1. We should prefer executed notebook outputs to pre-executed outputs
  2. We should be able to skip execution e.g. if the cache is unchanged / user requests no execution

In getting the current WIP working, I switched from finding Code nodes to finding "notebook cell" blocks. This seems like a poor assumption on my part. Could you perhaps help me to understand why we have an output node rather than output being a property of code nodes?

My plan is to slightly re-work inline expressions too, so that the "rendering" happens at a similar stage as the output rendering — inline expressions should likely be minified.

@agoose77
Copy link
Contributor Author

At this stage, environment variables can be set to point to an existing Jupyter Server instance. In future, these will be configurable and optional (we could spin up our own server).

@agoose77
Copy link
Contributor Author

I've now unified the treatment of outputs and expression results such that the notebook processor just populates these from the notebook itself (e.g. metadata). A later stage would be required to minify these, and then "simplify" them for rendering.

@rowanc1 could you elaborate on the purpose of them minimisation? I assume it's to make manipulations of the AST less memory intensive?

Copy link
Collaborator

@rowanc1 rowanc1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions at this stage about where we are minifying outputs and rendering inline expressions/adding IDs etc.

Not sure if this is fully working yet, but looks like we have a cache, pulling out executable pieces, executing those against a kernel, and providing results.

Another thought as I am looking through this, I wonder if this would be good as a separate package? myst-execute maybe?

packages/myst-cli/src/process/notebook.ts Show resolved Hide resolved
Comment on lines 116 to 127
type ICellBlockOutput = GenericNode & {
data: IOutput[];
};

type ICellBlock = GenericNode & {
children: (Code | ICellBlockOutput)[];
};

function isCellBlock(node: GenericNode): node is ICellBlock {
return node.type === 'block' && select('code', node) !== null && select('output', node) !== null;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things look good?!

packages/myst-cli/src/process/notebook.ts Show resolved Hide resolved
packages/myst-cli/src/process/mdast.ts Outdated Show resolved Hide resolved
@rowanc1
Copy link
Collaborator

rowanc1 commented Jan 18, 2024

The minimize outputs takes the encoded outputs (base64 images, html stuff, raw logs, etc.) and pulls them out of the mdast and writes them to disk. This is needed for writing latex, word, typst, etc. and for html is also more efficient for network requests and lazy loading of images in a long article.

We are also transforming the images into other formats (webp), which further improves load time. All of that needs them on disk.

@agoose77 agoose77 force-pushed the agoose77/feat-build-execution branch from 0f27917 to b18370e Compare January 18, 2024 18:26
Comment on lines 168 to 195
async jupyterSessionManager(): Promise<SessionManager | undefined> {
if (this._jupyterSessionManager !== null) {
return Promise.resolve(this._jupyterSessionManager);
}
try {
const partialServerSettings = await new Promise<JupyterServerSettings>(
async (resolve, reject) => {
if (process.env.JUPYTER_BASE_URL === undefined) {
resolve(findExistingJupyterServer() || (await launchJupyterServer(this.contentPath(), this.log)));
} else {
resolve({
baseUrl: process.env.JUPYTER_BASE_URL,
token: process.env.JUPYTER_TOKEN,
});
}
},
);
const serverSettings = ServerConnection.makeSettings(partialServerSettings);
const kernelManager = new KernelManager({ serverSettings });
const manager = new SessionManager({ kernelManager, serverSettings });
// TODO: this is a race condition, even though we shouldn't hit if if this promise is actually awaited
this._jupyterSessionManager = manager;
return manager;
} catch {
this._jupyterSessionManager = undefined;
return undefined;
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not feel elegant to me, but I want to get "something" in place for us to talk about.

@agoose77
Copy link
Contributor Author

OK, at this point it seems like this is behaving as expected (though with no user configurability besides an ENV var). It will be good to discuss this on Monday :)

@stevejpurves
Copy link
Contributor

@agoose77 re: outputs are separate from code rather than properties I think stemmed from the outputs being embeddable as first-class entities, rather than a code cell that has it's source hidden. That thinking originally was also imprinted on the initial implementation, whether it's strictly essential or not.

@stevejpurves
Copy link
Contributor

I'd also just like to keep this visible: https://github.com/executablebooks/mystmd/blob/aa335d748edd4d636ad117a216ba78c1c1283e4f/packages/myst-cli/src/process/mdast.ts#L232

i.e. text-based notebooks are flagged as executable here and jupytext / myst text-based notebooks should probably be a test case for local execution too

const serverSettings = ServerConnection.makeSettings(partialServerSettings);
const kernelManager = new KernelManager({ serverSettings });
const manager = new SessionManager({ kernelManager, serverSettings });
// TODO: this is a race condition, even though we shouldn't hit if if this promise is actually awaited
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

await manager.ready in turn does await the kernelManager.ready if that's what you mean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

callers of this function would need to know to await (await jupyterSessionManager()).ready unless you await manager.ready in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this instance it's that there's no logic to avoid creating two managers if one doesn't properly await the promise to session.jupyterSessionManager(). i.e., between the first await and finally setting this._jupyterSessionManager.

@agoose77
Copy link
Contributor Author

Closed in favour of #866 and #873.

@agoose77 agoose77 closed this Jan 24, 2024
@rowanc1 rowanc1 deleted the agoose77/feat-build-execution branch February 12, 2024 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for local kernel execution
4 participants