feat: add build-time execution #856

agoose77 · 2024-01-16T18:18:48Z

This PR is a jumping-off point for execution work, and should make it possible to scope discussions to just the interested parties.

Separate existing inline expression result population from rendering
Implement local on-disk caching (with a nice abstraction)
Implement a CLI for execution w/o rendering
Implement CLI options to control caching and whether to start a new Jupyter server or connect to an existing one

Closes #839, #550
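One way the "nice abstraction" for local on-disk caching could look: key the cache on a hash of everything sent to the kernel, and hide storage behind a small get/set interface. This is a minimal sketch, not the PR's implementation; `cacheKey`, `ExecutionCache`, and `DiskCache` are hypothetical names, and hashing the kernel name plus the ordered code sources is an assumed key design.

```typescript
import { createHash } from 'node:crypto';
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical cache inputs: the kernel to run on and the ordered list of
// code sources (cells and inline expressions) that will be sent to it.
interface ExecutionInputs {
  kernelName: string;
  sources: string[];
}

// Stable key: changing the kernel, any source, or the order of sources
// yields a different key. JSON-encoding the array keeps the boundaries
// between sources unambiguous (["ab", "c"] and ["a", "bc"] hash differently).
function cacheKey(inputs: ExecutionInputs): string {
  return createHash('sha256')
    .update(JSON.stringify([inputs.kernelName, inputs.sources]))
    .digest('hex');
}

// Storage-agnostic interface so the executor does not care where results live.
interface ExecutionCache<T> {
  get(key: string): T | undefined;
  set(key: string, value: T): void;
}

// On-disk implementation: one JSON file per key inside a cache directory.
class DiskCache<T> implements ExecutionCache<T> {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
    mkdirSync(dir, { recursive: true });
  }

  private fileFor(key: string): string {
    return join(this.dir, `${key}.json`);
  }

  get(key: string): T | undefined {
    const file = this.fileFor(key);
    return existsSync(file) ? (JSON.parse(readFileSync(file, 'utf8')) as T) : undefined;
  }

  set(key: string, value: T): void {
    writeFileSync(this.fileFor(key), JSON.stringify(value));
  }
}
```

Because the key covers source order, reordering cells forces re-execution. That is the conservative choice, since later cells can depend on state created by earlier ones.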

packages/myst-cli/src/transforms/outputs.ts

packages/myst-cli/src/transforms/inlineExpressions.ts

packages/myst-cli/src/transforms/execute.ts

agoose77 · 2024-01-17T10:39:15Z

packages/myst-cli/src/transforms/execute.ts

+type ICellBlockOutput = GenericNode & {
+  data: IOutput[];
+};
+
+type ICellBlock = GenericNode & {
+  children: (Code | ICellBlockOutput)[];
+};
+
+function isCellBlock(node: GenericNode): node is ICellBlock {
+  return node.type === 'block' && select('code', node) !== null && select('output', node) !== null;
+}
+


Is there a better way to type this?

Things look good?!

@fwkoch did we have a kind on the block?

We tag executable code blocks with data.type = 'notebook-code', both when they come from notebooks (https://github.com/executablebooks/mystmd/blob/main/packages/myst-cli/src/process/notebook.ts#L24-L27) and when they come from code-cell directives (https://github.com/executablebooks/mystmd/blob/main/packages/myst-directives/src/code.ts#L215).

However, I'm not sure we rely on this field anywhere. In the myst-theme execution code, at least, we find executable blocks in a similar way to your implementation (https://github.com/executablebooks/myst-theme/blob/855f29ec42a77deaa145481bfcab25b2d3c53757/packages/jupyter/src/execute/utils.ts#L8). The only difference is that this function also supports executable figures, i.e. container nodes with code/output children, in addition to blocks.
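One possible answer to "is there a better way to type this?": a user-defined type guard that accepts both block and container parents (matching the myst-theme behaviour described above) plus small helpers that narrow the children. This sketch uses local stand-in types; the real GenericNode, Code, and output types live in the mystmd packages, and `isExecutableCell`, `cellCode`, and `cellOutput` are hypothetical names. Unlike `select`, which searches descendants, it only inspects direct children, which is an assumption about how cells are nested.

```typescript
// Minimal local stand-ins for the mdast types used in the PR.
interface GenericNode {
  type: string;
  children?: GenericNode[];
  [key: string]: unknown;
}
interface CodeNode extends GenericNode {
  type: 'code';
  value: string;
}
interface OutputNode extends GenericNode {
  type: 'output';
  data: unknown[];
}

// An executable unit: a notebook-cell `block` or an executable-figure
// `container` whose direct children include both code and output.
interface ExecutableCell extends GenericNode {
  type: 'block' | 'container';
  children: GenericNode[];
}

function isExecutableCell(node: GenericNode): node is ExecutableCell {
  if (node.type !== 'block' && node.type !== 'container') return false;
  const children = node.children ?? [];
  return children.some((c) => c.type === 'code') && children.some((c) => c.type === 'output');
}

// Once the parent is known to be a cell, narrow to the specific children.
function cellCode(cell: ExecutableCell): CodeNode | undefined {
  return cell.children.find((c): c is CodeNode => c.type === 'code');
}

function cellOutput(cell: ExecutableCell): OutputNode | undefined {
  return cell.children.find((c): c is OutputNode => c.type === 'output');
}
```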

agoose77 · 2024-01-17T10:51:20Z

@fwkoch I've pinged you for review to hopefully course correct anything that's not looking like it's going in the right direction.

At a high level, this is my thinking about the transforms / user flow:

We should prefer executed notebook outputs to pre-executed outputs
We should be able to skip execution e.g. if the cache is unchanged / user requests no execution
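The two preferences above can be read as a small decision rule. A sketch, assuming two hypothetical user-facing switches (`execute`, `ignoreCache`) standing in for the CLI options mentioned in the PR description; the names and the three-way result are illustrative, not the PR's API.

```typescript
// Where the outputs for a page should come from.
type OutputSource = 'notebook' | 'cache' | 'execute';

// Hypothetical user-facing switches, e.g. populated from CLI flags.
interface ExecutionPolicy {
  execute: boolean; // false: the user asked for no execution
  ignoreCache: boolean; // true: re-execute even if the inputs are unchanged
}

function chooseOutputSource(policy: ExecutionPolicy, cacheHit: boolean): OutputSource {
  // User opted out: fall back to the outputs already stored in the notebook.
  if (!policy.execute) return 'notebook';
  // Inputs unchanged since the last run: reuse those executed outputs.
  if (cacheHit && !policy.ignoreCache) return 'cache';
  // Otherwise run against the kernel; the fresh outputs take precedence
  // over any pre-executed outputs saved in the notebook.
  return 'execute';
}
```

Both 'cache' and 'execute' yield executed outputs, so pre-executed notebook outputs are only used when execution is switched off.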

In getting the current WIP working, I switched from finding Code nodes to finding "notebook cell" blocks. This seems like a poor assumption on my part. Could you perhaps help me to understand why we have an output node rather than output being a property of code nodes?

My plan is to slightly re-work inline expressions too, so that the "rendering" happens at a similar stage as the output rendering — inline expressions should likely be minified.

agoose77 · 2024-01-17T12:28:24Z

At this stage, environment variables can be set to point to an existing Jupyter Server instance. In future, these will be configurable and optional (we could spin up our own server).
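A sketch of the environment-variable override, using the JUPYTER_BASE_URL and JUPYTER_TOKEN names that appear in the session.ts diff later in this PR. The function name and return shape are hypothetical; passing the environment in as a parameter keeps it testable, and returning `undefined` leaves room for the "spin up our own server" fallback.

```typescript
// Partial connection settings; the full settings object would be built from these.
interface PartialServerSettings {
  baseUrl: string;
  token?: string;
}

// Read an explicit server from the environment. An unset or empty base URL
// means "no override", so the caller can discover or launch a server instead.
function serverSettingsFromEnv(
  env: Record<string, string | undefined> = process.env,
): PartialServerSettings | undefined {
  const baseUrl = env.JUPYTER_BASE_URL;
  if (!baseUrl) return undefined;
  return { baseUrl, token: env.JUPYTER_TOKEN };
}
```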

agoose77 · 2024-01-17T18:44:38Z

I've now unified the treatment of outputs and expression results such that the notebook processor just populates these from the notebook itself (e.g. metadata). A later stage would be required to minify these, and then "simplify" them for rendering.
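A sketch of the populate stage for inline expressions, assuming results are stored per cell under metadata.user_expressions as a list of { expression, result } entries (the layout used by jupyterlab-myst; treat the exact shape as an assumption). `collectExpressionResults` and the types are hypothetical; minification and simplification for rendering would run later on the returned values.

```typescript
// Assumed shape of a stored expression result (a simplified Jupyter payload).
interface ExpressionResult {
  status: 'ok' | 'error';
  data?: Record<string, unknown>;
  [key: string]: unknown;
}

interface UserExpression {
  expression: string;
  result: ExpressionResult;
}

interface NotebookCell {
  cell_type: string;
  metadata?: { user_expressions?: UserExpression[]; [key: string]: unknown };
}

// Populate only: copy stored results out of the notebook, keyed by the
// expression source. No minifying or rendering happens here.
function collectExpressionResults(cell: NotebookCell): Map<string, ExpressionResult> {
  const results = new Map<string, ExpressionResult>();
  for (const { expression, result } of cell.metadata?.user_expressions ?? []) {
    results.set(expression, result);
  }
  return results;
}
```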

@rowanc1 could you elaborate on the purpose of the minimisation? I assume it's to make manipulations of the AST less memory intensive?

rowanc1

A few questions at this stage about where we are minifying outputs and rendering inline expressions/adding IDs etc.

Not sure if this is fully working yet, but looks like we have a cache, pulling out executable pieces, executing those against a kernel, and providing results.

Another thought as I am looking through this, I wonder if this would be good as a separate package? myst-execute maybe?

packages/myst-cli/src/process/notebook.ts

packages/myst-cli/src/transforms/inlineExpressions.ts

rowanc1 · 2024-01-17T21:12:10Z


packages/myst-cli/src/process/notebook.ts

packages/myst-cli/src/process/mdast.ts

rowanc1 · 2024-01-18T00:37:31Z

The minimize-outputs step takes the encoded outputs (base64 images, HTML, raw logs, etc.), pulls them out of the mdast, and writes them to disk. This is needed for writing LaTeX, Word, Typst, etc., and for HTML it is also more efficient for network requests and allows lazy loading of images in a long article.

We are also transforming the images into other formats (webp), which further improves load time. All of that needs them on disk.
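The minify step described above can be sketched as: decode large base64 payloads, write them to disk under a content-hash name, and leave a file reference in the AST. The names (`minifyMimeBundle`, the `{ path }` placeholder, the `maxInlineChars` threshold) are hypothetical, and only binary image types are handled here; the real transform also deals with HTML, text logs, and conversions such as webp.

```typescript
import { createHash } from 'node:crypto';
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// A Jupyter MIME bundle: MIME type -> payload (base64 for binary images),
// or a placeholder pointing at a file once the payload has been moved out.
type MimeBundle = Record<string, string | { path: string }>;

const BINARY_EXTENSIONS: Record<string, string | undefined> = {
  'image/png': 'png',
  'image/jpeg': 'jpg',
  'image/gif': 'gif',
};

// Move large base64 images to disk. Files are named by the SHA-256 of their
// bytes, so identical outputs are written once and names are stable across builds.
function minifyMimeBundle(bundle: MimeBundle, outDir: string, maxInlineChars = 1000): MimeBundle {
  mkdirSync(outDir, { recursive: true });
  const result: MimeBundle = {};
  for (const [mime, value] of Object.entries(bundle)) {
    const ext = BINARY_EXTENSIONS[mime];
    if (typeof value !== 'string' || ext === undefined || value.length <= maxInlineChars) {
      result[mime] = value; // small, non-binary, or already-moved payloads stay as-is
      continue;
    }
    const bytes = Buffer.from(value, 'base64');
    const name = `${createHash('sha256').update(bytes).digest('hex')}.${ext}`;
    writeFileSync(join(outDir, name), bytes);
    result[mime] = { path: name };
  }
  return result;
}
```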

packages/myst-cli/src/process/mdast.ts

agoose77 · 2024-01-18T18:27:54Z

packages/myst-cli/src/session/session.ts

+  async jupyterSessionManager(): Promise<SessionManager | undefined> {
+    if (this._jupyterSessionManager !== null) {
+      return Promise.resolve(this._jupyterSessionManager);
+    }
+    try {
+      const partialServerSettings = await new Promise<JupyterServerSettings>(
+        async (resolve, reject) => {
+          if (process.env.JUPYTER_BASE_URL === undefined) {
+            resolve(findExistingJupyterServer() || (await launchJupyterServer(this.contentPath(), this.log)));
+          } else {
+            resolve({
+              baseUrl: process.env.JUPYTER_BASE_URL,
+              token: process.env.JUPYTER_TOKEN,
+            });
+          }
+        },
+      );
+      const serverSettings = ServerConnection.makeSettings(partialServerSettings);
+      const kernelManager = new KernelManager({ serverSettings });
+      const manager = new SessionManager({ kernelManager, serverSettings });
+      // TODO: this is a race condition, even though we shouldn't hit if if this promise is actually awaited
+      this._jupyterSessionManager = manager;
+      return manager;
+    } catch {
+      this._jupyterSessionManager = undefined;
+      return undefined;
+    }
+  }


This does not feel elegant to me, but I want to get "something" in place for us to talk about.


agoose77 · 2024-01-18T19:34:15Z

OK, at this point it seems like this is behaving as expected (though with no user configurability besides an ENV var). It will be good to discuss this on Monday :)

stevejpurves · 2024-01-22T13:39:25Z

@agoose77 re: why outputs are separate nodes rather than properties of code nodes: I think this stemmed from wanting outputs to be embeddable as first-class entities, rather than as a code cell with its source hidden. That thinking was imprinted on the initial implementation, whether or not it is strictly essential.

stevejpurves · 2024-01-22T13:42:54Z

I'd also just like to keep this visible: https://github.com/executablebooks/mystmd/blob/aa335d748edd4d636ad117a216ba78c1c1283e4f/packages/myst-cli/src/process/mdast.ts#L232

i.e. text-based notebooks are flagged as executable here, so Jupytext / MyST text-based notebooks should probably be a test case for local execution too.

stevejpurves · 2024-01-22T13:49:11Z

packages/myst-cli/src/session/session.ts

+      const serverSettings = ServerConnection.makeSettings(partialServerSettings);
+      const kernelManager = new KernelManager({ serverSettings });
+      const manager = new SessionManager({ kernelManager, serverSettings });
+      // TODO: this is a race condition, even though we shouldn't hit if if this promise is actually awaited


await manager.ready in turn awaits kernelManager.ready, if that's what you mean.

Callers of this function would need to know to await (await jupyterSessionManager()).ready, unless you await manager.ready in here?

In this instance the problem is that there is no logic to avoid creating two managers if a caller doesn't properly await the promise returned by session.jupyterSessionManager(), i.e. if a second call arrives between the first await and the point where this._jupyterSessionManager is finally set.
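The race can be closed by caching the promise instead of the resolved manager. A minimal, library-agnostic sketch; `memoizeAsync` is a hypothetical helper, not part of @jupyterlab/services.

```typescript
// Cache the *promise* rather than the resolved value. The slot is filled
// synchronously on the first call, so a second caller that arrives while the
// first is still awaiting receives the same promise instead of starting a
// second factory run (and creating a second manager).
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = factory();
    }
    return pending;
  };
}
```

In session.ts the factory would hold the body of jupyterSessionManager(); awaiting manager.ready inside it would also spare callers from awaiting .ready themselves. One trade-off: a rejected promise stays cached, so a failed connection is not retried, which matches the current behaviour of storing undefined on failure.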

agoose77 · 2024-01-24T18:46:54Z

Closed in favour of #866 and #873.

agoose77 added 5 commits January 16, 2024 18:18

wip: initial commit

c7089dc

fix: relative import

b15ef1b

chore: loosen services, integrate into loop

02b13b6

fix: take Jupyter connection settings from env

bc48dea

fix: plumb in rendering

b2e6bba

agoose77 commented Jan 17, 2024


agoose77 requested a review from fwkoch January 17, 2024 10:39

wip: simple cache key design

3b8dc42

agoose77 added 3 commits January 17, 2024 13:11

refactor: support reading from not-yet-implemented cache

2962807

refactor: embed expression outputs at the notebook processing level

8b19f2a

chore: remove unused transform

072cb61

rowanc1 reviewed Jan 17, 2024


agoose77 added 4 commits January 18, 2024 11:25

fix: restore rendering of expressions

e87ee85

feat: add simple on-disk cache

fa121d7

feat: add support for error anticipation

0c87b89

feat: add support for auto-detecting the running server

b18370e

agoose77 force-pushed the agoose77/feat-build-execution branch from 0f27917 to b18370e January 18, 2024 18:26

agoose77 commented Jan 18, 2024


packages/myst-cli/src/process/mdast.ts (outdated, resolved)

Update packages/myst-cli/src/process/mdast.ts

f57198c

agoose77 commented Jan 18, 2024


agoose77 added 2 commits January 18, 2024 18:33

refactor: sort by PID

18b0702

Merge remote-tracking branch 'origin/agoose77/feat-build-execution' into agoose77/feat-build-execution

2f121af

chore: lint

d2e38f8

stevejpurves reviewed Jan 22, 2024


fwkoch mentioned this pull request Jan 23, 2024

📦 Add myst-execute package #866

Merged

agoose77 closed this Jan 24, 2024

rowanc1 deleted the agoose77/feat-build-execution branch February 12, 2024 22:45
