
Embedded code outputs abstraction #681

Open
chrisjsewell opened this issue Mar 2, 2022 · 22 comments
Labels
discussion Things that aren't closeable

Comments

@chrisjsewell
Member

Aim

Within jupyter-book, and EBP in general, users want to be able to embed the outputs of Jupyter Notebook code cell execution within the body of their documents.
For example, referencing the value of a calculation:

a = 1 + 2

with some form of "placeholder" syntax:

The result is {{ a }}

As well as simple variable referencing, one would also like to embed "richer" outputs, such as images and HTML.

  1. The abstraction should aim for potential implementations across different editing/rendering platforms (Jupyter Lab, Sphinx, VS Code, Curvenote, etc)
  2. The abstraction should aim to be (Jupyter) kernel agnostic, i.e. work across the range of possible kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
  3. Embedding should allow for both inline and block level components
  4. It is desirable for the process to be as simple as possible for the user
  5. It may be desirable to embed code outputs in a cross-document manner, i.e. one can embed a code output from one document in another
  6. Caching of the

Sphinx recap

Before discussing potential abstractions, and their pros/cons, it will be helpful to recap the basic sphinx build process phases:

  1. For each document:
  • (myst-nb + notebook only) the notebook is executed, if necessary, populating all code cell outputs
  • The source text is parsed to an Abstract Syntax Tree (AST), which is agnostic of the eventual output format (HTML, LaTeX, ...)
  • Transforms are applied to the AST, to apply changes that require knowledge of the full AST
  • Certain variables are extracted to a global "database", known as the environment, such as reference targets
  • The AST is cached to disk, so that re-builds only have to re-parse modified documents
  2. The global environment is cached to disk
  3. For each output format:
  • Post-transforms are applied to each cached AST, to apply changes that require knowledge of the full project, such as inter-document referencing (using the global environment)
  • All ASTs are converted into the output format

One difficulty with the outputs of Jupyter notebook code cells is that they can provide multiple output formats (a.k.a. mime types), which can only be "selected" in phase (3).

Potential abstractions

A number of potential abstractions are discussed below, with their pros and cons

Current myst-nb glue abstraction

In myst-nb v0.13, there is the glue function & roles/directives.
This is implemented for IPython kernels only, whereby one "binds" a variable to a name with the glue function:

from myst_nb import glue
a = "content"
glue("variable_name", a)

and the placeholder syntax looks like:

Inline: {glue:}`variable_name`

Block:

```{glue:} variable_name
```

All mime types for such outputs (such as text/plain, text/html, image/png, ...) are saved to a cache, during phase (1) of the sphinx build.

Then, during phase (3), placeholder roles/directives are replaced with a suitable mime type for that output format, taking the mime type's content and converting it to AST, before injecting it into the document's AST.
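For illustration, that phase (3) selection step can be sketched as a priority lookup over the cached mime bundle (the priority lists below are illustrative, not myst-nb's actual configuration):

```python
# Sketch: pick the best mime type from a cached output bundle,
# given a per-output-format priority list (illustrative values).
HTML_PRIORITY = ["text/html", "image/png", "text/plain"]
LATEX_PRIORITY = ["image/png", "text/plain"]

def select_mime(bundle: dict, priority: list) -> tuple:
    """Return (mime_type, content) for the first mime type present."""
    for mime in priority:
        if mime in bundle:
            return mime, bundle[mime]
    raise ValueError(f"no suitable mime type in {sorted(bundle)}")

bundle = {"text/plain": "'content'", "text/html": "<p>content</p>"}
assert select_mime(bundle, HTML_PRIORITY) == ("text/html", "<p>content</p>")
assert select_mime(bundle, LATEX_PRIORITY) == ("text/plain", "'content'")
```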

Pros:

  • ✅ It is relatively simple for users to use
  • ✅ it provides a one-to-one mapping between variable name and variable output
  • ✅ All required outputs are saved in the Jupyter notebook (i.e. can be parsed without a live kernel)
  • ✅ It works cross-document

Cons:

  • ❌ It is not kernel agnostic
  • ❌ It requires that variable names are unique across the whole project
  • ❌ It would be very difficult to implement outside of sphinx
  • ❌ In sphinx, because the outputs are only converted to AST in phase (3), it means that important AST transformations from phase (1) can be missed, and need to be retroactively applied, making the conversion "brittle"

Refactored myst-nb glue abstraction

The refactor in executablebooks/MyST-NB#380 is not primarily aimed at glue, but it does intrinsically change how it works. It primarily addresses the issue of AST creation in phase (3), moving it to phase (1).
In its current form, the implementation precludes cross-document use; a proposal, though, is to use the form:

```{glue:} variable_name
:doc: docname
```

This would fix the issue of requiring variable names to be unique across the project.
It would require a "bi-modal" approach though: glue without the doc option would proceed by directly converting outputs to AST in phase (1), but with the doc option the AST would still need to be generated in phase (3).

Using code cell IDs (or metadata)

As discussed above, a big issue with the glue abstraction is that it is currently implemented only for Python, and would require different implementations for different kernels.

One way round this is to assign an ID to each code cell, then use this as the reference for embedding code outputs.
This ID could either be assigned within the cell's metadata, or via the recent addition of cell IDs: https://nbformat.readthedocs.io/en/latest/format_description.html#cell-ids

  • ✅ It is kernel agnostic
  • ❌ The metadata/ID fields of a code cell are less accessible to users; editor UIs expose little or no interface for them, for example
  • ❌ A "cell wide" ID does not bind a variable name to a specific output, i.e. is a one-to-many mapping

For example, if one had a code cell like:

id: cell-id
source:
import sys, IPython.display
print("stdout")
print("stderr", file=sys.stderr)
IPython.display.display(1)
2

This cell actually has four outputs, and so this may require additional logic, to specify which output is being referred to (or limiting to only the final output).
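In nbformat terms, that cell's outputs field would be a list of four entries, so an ID-based reference may need an index (the output dicts follow the nbformat schema; the selector itself is hypothetical):

```python
# The four outputs of the example cell, as nbformat-style dicts.
outputs = [
    {"output_type": "stream", "name": "stdout", "text": "stdout\n"},
    {"output_type": "stream", "name": "stderr", "text": "stderr\n"},
    {"output_type": "display_data", "data": {"text/plain": "1"}},
    {"output_type": "execute_result", "data": {"text/plain": "2"}},
]

def resolve_output(outputs: list, index=None) -> dict:
    """Hypothetical selector: return the given output index,
    or default to the final output of the cell."""
    return outputs[index if index is not None else -1]

assert resolve_output(outputs)["data"] == {"text/plain": "2"}
assert resolve_output(outputs, 0)["name"] == "stdout"
```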

Using the user_expressions kernel feature

user_expressions are a feature of the Jupyter client/kernel, which allow expressions to be evaluated after execution of the code cell's main content, and bound to variable names, see: https://jupyter-client.readthedocs.io/en/stable/messaging.html#execute

It would be implemented for example like:

user_expressions:
  variable_name1: a
  variable_name2: b
source:
a = 1
b = 2
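Under stated assumptions (IPython-style eval semantics, text/plain output only), the mechanism can be emulated in a few lines — run the cell, then evaluate each expression into a mime bundle:

```python
def run_cell(source: str, expressions: dict) -> dict:
    """Emulate user_expressions: execute a cell, then evaluate each
    named expression in the resulting namespace, returning
    name -> mime bundle (a sketch; the real kernel also wraps errors
    and applies richer display formatting)."""
    ns = {}
    exec(source, ns)
    return {
        name: {"text/plain": repr(eval(expr, ns))}
        for name, expr in expressions.items()
    }

results = run_cell("a = 1\nb = 2",
                   {"variable_name1": "a", "variable_name2": "b"})
assert results["variable_name1"] == {"text/plain": "1"}
assert results["variable_name2"] == {"text/plain": "2"}
```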

This overcomes an issue with the above cell ID:

  • ✅ it provides a one-to-one mapping between variable name and variable output

However, similar to cell IDs:

  • ❌ user_expressions are not currently implemented in any Notebook editor/renderer

Additional to this limitation, it should be noted that this feature of the client is quite under-documented and appears to be unimplemented in some kernels.

The IPython kernel's implementation is to call https://docs.python.org/3/library/functions.html#eval on each expression: https://github.com/ipython/ipython/blob/d9b5e550b673db900a08d03740ec0ce94e1b8feb/IPython/core/interactiveshell.py#L2606-L2631

This is somewhat problematic, since it means that it is technically possible for the expression to change the "state" of the python interpreter. This makes the order of execution important, and one feels it would have been a better design choice to make the user_expressions format a list rather than a dict.
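The state-mutation concern is easy to demonstrate: because eval can run arbitrary expression code, evaluation order changes the results, yet a dict does not *promise* the order the author intended (Python dicts do preserve insertion order, but the message format makes no such guarantee):

```python
# Two user_expressions that each mutate interpreter state via pop();
# the results depend entirely on which is evaluated first.
ns = {"items": [1, 2, 3]}
expressions = {"first": "items.pop()", "second": "items.pop()"}
results = {name: eval(expr, ns) for name, expr in expressions.items()}
assert results == {"first": 3, "second": 2}  # reverse the order and the values swap
```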

For nbclient, a proof-of-principle implementation can be found at jupyter/nbclient#160

Using dynamic kernel injection

A somewhat radically different approach, would be to allow the Jupyter client to evaluate variables within the Markdown cells, during execution.
For example, as demonstrated in executablebooks/MyST-NB#382

```{code-cell}
a=1
```

First call to {eval}`a` gives us: 1

```{code-cell}
a=2
```

Second call to {eval}`a` gives us: 2

Here, the user does not need to provide any "additional" binding of variables to variable names, it simply utilises the binding already present in the target kernel language.
As shown, the variable's output is also specific to where in the documentation it is evaluated, dependent on the state of the kernel at that point in the execution flow.
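A sketch of the client-side part of this: scan the Markdown for {eval} roles and substitute the expression's current value from the kernel namespace (the role name mirrors the MyST-NB#382 prototype, but the regex and repr() formatting here are assumptions):

```python
import re

# Matches {eval}`expression` roles in Markdown source.
EVAL_ROLE = re.compile(r"\{eval\}`([^`]+)`")

def inject(markdown: str, namespace: dict) -> str:
    """Replace each {eval} role with repr() of its evaluated expression.
    Only expressions are accepted, so kernel state cannot be rebound."""
    return EVAL_ROLE.sub(
        lambda m: repr(eval(m.group(1), dict(namespace))), markdown)

ns = {"a": 1}
assert inject("gives us: {eval}`a`", ns) == "gives us: 1"
ns["a"] = 2  # kernel state changes between cells
assert inject("gives us: {eval}`a`", ns) == "gives us: 2"
```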

Pros

  • ✅ Requires no extra input from the user
  • ✅ it provides a one-to-one mapping between variable name and variable output
  • ✅ It is kernel agnostic

Cons

  • ❌ It would not work cross-document

This is also somewhat similar to https://github.com/agoose77/jupyterlab-imarkdown, which arose from the discussion in https://discourse.jupyter.org/t/inline-variable-insertion-in-markdown/10525/126.
Here, the outputs of such evaluations are stored as attachments, on the markdown cell.

@mmcky
Member

mmcky commented Mar 3, 2022

Thanks @chrisjsewell for this really well thought out summary of the issue. I am still digesting the options and issues here, but here are some initial comments. I suspect the "optimal" approach may be a phased one.

Short Term:

The proposal for delaying glue to phase 3 (for cross-page references) is what I had in mind to retain current cross-document support.

I guess the main cost is:

  • ❌ In sphinx, because the outputs are only converted to AST in phase (3), it means that important AST transformations from phase (1) can be missed, and need to be retroactively applied, making the conversion "brittle"

and the delayed conversion to phase(3) would have limitations on mime-type that phase(1) doesn't have right?

Medium Term:

Assigning some kind of cell-id seems to make a lot of sense, but I agree it needs user interfaces to catch up.

Would it be a valid assumption/limitation that a glue link will always resolve to the output of a cell? Or will glue need to fetch the contents of some variable that may be defined within a cell?

For the dynamic kernel injection -- essentially each {eval} is treated as a quasi inline code-cell and the contents are dispatched to the kernel to execute as the document is run? So, in this approach, there is nothing stopping assignments, for example:

The variable is assigned {eval}`a = "Hi"`

and then:

```{code-cell}
a
```

would print Hi as the output

@rowanc1
Member

rowanc1 commented Mar 3, 2022

My response to this is thinking through three aspects: (1) javascript variables and interactive components; (2) not having the notebooks available when you want to re-render markup (I want to pull someone else’s variables into my new document); and (3) integration of live computation and pulling in kernels from Thebe. A few recent examples that @stevejpurves did are here and here; these show some different prototype implementations of Thebe and how to hook back into Jupyter.


TLDR

Rowan likes using cell IDs, with a target syntax (comment) to make these more accessible to existing interfaces. Rowan is anti “dynamic kernel injection”, as it is more difficult to store results and doesn’t compose well across documents.


I believe (3) Using code cell IDs has a lot of advantages, with an addition of being able to label the code cell in any language with a simple target tag.

# (myPlot)=
plt.plot([1,2,3])

This could be easily glued into {glue}`myPlot` and if we need to get more specific (output[0]) then we could add that through inline role data, or options in a directive. This can also have logic to look at the cellID first, and then the label as it is resolving things.

For naming outputs, we could render a custom mimetype (fancy json) myst.outputs-v1 which could allow you to index into variables using names. This would require a simple renderer in each language, or we could build on top of existing json renderers (or ipywidgets even thinking ahead to thebe) that get known to MyST by the target syntax.

# (myDataForMyST) =
display.JSON({a: 1, b: [1,2,3]})

This could be accessed by {glue}`myDataForMyst.a` or (probably better) through a link to the notebook perhaps [][myNotebook.a], which solves some of the scoping issues across documents.
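The proposed bundle could be modelled as a plain display_data output whose payload maps names to values; resolving `myDataForMyst.a` is then a dotted lookup into that payload (the mime-type name and structure follow the proposal above; the resolver is hypothetical):

```python
# Sketch: a display_data output carrying the proposed "fancy json"
# mime type, mapping names to plain values.
output = {
    "output_type": "display_data",
    "data": {"myst.outputs-v1": {"a": 1, "b": [1, 2, 3]}},
}

def resolve_name(output: dict, dotted: str):
    """Resolve a dotted reference like 'myDataForMyst.a' against the
    cell's named bundle (hypothetical resolver, not an implemented API)."""
    _, _, key = dotted.partition(".")
    return output["data"]["myst.outputs-v1"][key]

assert resolve_name(output, "myDataForMyst.a") == 1
assert resolve_name(output, "myDataForMyst.b") == [1, 2, 3]
```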

Exposing the MyST target comment in a code cell I also think means that we wouldn’t have to work with user_expressions (which is less well known and not accessible to interfaces).


For option (4) dynamic kernel injection, what is presented is a static render, and this is different from reactive/explorable documents where all instances of a are the final value. This is exactly what we are doing with {r:display}`myVariable` in javascript. My opinion of (4) is that it requires a complicated data structure to store intermediate data (a=1 in cell 1, a=2 in cell 3) or for you to have the kernel at runtime (you don’t have the final variable that is injected and stored independently in the notebook execution history). I also think that if an author wants to write this, they can easily get to the desired result by naming the variable differently throughout their docs, and then (3) works well.

@chrisjsewell
Member Author

chrisjsewell commented Mar 3, 2022

thanks @rowanc1

with an addition of being able to label the code cell in any language with a simple target tag

# (myPlot)=
plt.plot([1,2,3])

Are you suggesting here that the target is defined within the source of the code cell?
Does that not mean it would have to be at least slightly language specific, in terms of the syntax for a comment?

@chrisjsewell
Member Author

not having the notebooks available when you want to re-render markup (I want to pull someone else’s variables into my new document);

The other consideration here, that I didn't mention in the initial comment: if you want to reference "cross-document", then how do you cache outputs, and also ensure they are up-to-date?

Let's say I have two notebooks: notebook1 and notebook2.
I start to parse notebook1 and it has a reference like {glue}`notebook2.x` :

  • How do I obtain notebook2.x?
  • What if notebook2 has not been executed yet?
  • What if later I modify notebook2?

@stevejpurves
Member

stevejpurves commented Mar 7, 2022

Thanks @chrisjsewell for this issue and all the information/options added in one place!

Note: most of the following comments is in the context of thebe/live code

I think that a cell id based approach with some mechanism around exposing variables within the cell is a good way to go.

And using the mimetype output mechanism in some way for communicating output variables seems a natural choice. Don't we only need a language/kernel implementation for convenience, or if/where there is no native equivalent to IPython.display.display? Seeing that mimetype-based rendering is fundamental to Jupyter, do many kernels/languages already provide some equivalent?

Cell IDs are essentially how thebe has been working (or at least a definite ID of a single cell), and how thebe-core currently works in terms of allowing a developer to embed jupyter cell outputs both in a plain page and a reactive page (e.g. involving react, or another framework and build process)

Note: the following is maybe a bit out of scope, but related. Regarding reactive controls in a Jupyter Book which can drive computation and fresh outputs post sphinx build, we also have to think about how to handle "inputs" for a given cell -- where we identify variables that can be changed outside the cell but should trigger computation and a new output to be generated.

We're currently experimenting with cell ids plus markup in comments to implement that in the prototypes that we have, which works fine and is the approach that, for example, Colab also takes in its notebook widget hookup. However, "inputs" could also be exposed via the same mechanism as outputs, as the code cell is executed at first with its default value.

The advantage of the comment based markup approach we've taken so far on inputs is that it is definitely kernel agnostic but also more accessible to the majority of users than using the display(...)/mimetype mechanism which is maybe less well known.

@rowanc1
Member

rowanc1 commented Mar 7, 2022

Are you suggesting here that the target is defined within the source of the code cell?
Does that not mean it would have to be at least slightly language specific, in terms of the syntax for a comment?

Yes, keeping the target in the source code can simplify things from a UI perspective. It is only minimally language specific, in that we don't match explicitly on a comment character but on something more like `(?:[^\w(\s]+)\s?\(([\w]*)\)=`, which seems to do the trick.
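The quoted pattern does match target comments across several comment syntaxes, which can be checked directly:

```python
import re

# The target-comment pattern quoted above, verbatim.
TARGET = re.compile(r"(?:[^\w(\s]+)\s?\(([\w]*)\)=")

# Python, C-style, MATLAB and SQL/Haskell comment prefixes all match.
for line in ["# (myPlot)=", "// (myPlot)=", "% (myPlot)=", "-- (myPlot)="]:
    match = TARGET.search(line)
    assert match and match.group(1) == "myPlot"

# Ordinary code does not match.
assert TARGET.search("plt.plot([1,2,3])") is None
```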

If you want to reference "cross-document", then how do you cache outputs, and also ensure they are up-to-date?

For cross document, the pages would need to be re-rendered if the notebook changes. This is the same if I am not mistaken for existing cross references (and TOC, etc.): clean, build. Cross-reference resolution is one of the last steps to run, so that is after executing and caching results from notebooks. The notebooks would need to be executed before a full render of a jupyter-book anyways, so not much of a difference?

@choldgraf
Member

The "use MyST target syntax within code cell comments" is a clever idea, I hadn't thought of that one. I agree with @chrisjsewell's concern re: language-specific, but if you made some strict rules about how that label could be defined, maybe it wouldn't be too bad as @rowanc1 notes above.

What I like about that idea:

  • It does not require implementing UI in a particular interface (which will be a pain)
  • It meshes nicely with patterns that exist within MyST (you are just creating another target to reference later)
  • It works in both text and UI form
  • It feels like an extension of "literate programming" to a degree

Note: this also reminds me a bit of how nbdev uses comments to control rendering behavior, and that has proven popular in that community AFAIK.

So how would it work in implementation? It sounds something like:

  • When you execute a notebook, loop through each code cell
  • First parse the cell for the (foo)= pattern. If one is discovered, then
  • Execute the cell
  • ...what next? Run display on any variables in the cell? On the next expression just after the (foo)= tag? What if the cell itself didn't display anything?

Things I think we should strive for:

  • Do not require any more kernel-specific development beyond what already exists
  • Do not require the user to remember things that aren't easy to remember (NotebookName/foo seems most reasonable to me, though just foo would be best if you could make sure to check for conflicting targets, same as we do with any other target in MyST).
  • Minimize the number of steps between a user's current workflow, and their ability to leverage this feature.

@chrisjsewell
Member Author

chrisjsewell commented Mar 8, 2022

Yes, keeping the target in the source code can simplify things from a UI perspective.

This could certainly be viable 👍

Cross-reference resolution is one of the last steps to run, so that is after executing and caching results from notebooks. The notebooks would need to be executed before a full render of a jupyter-book anyways, so not much of a difference?

This is the primary issue of executablebooks/MyST-NB#380 though:
cross-reference resolution (essentially simple text replacement) is not the same as inserting whole chunks of "new" AST into the cached AST. For example:

  • If the code output is simple text, that's probably fine (well for "plain" sphinx operation)
  • If the code output is an image, that's problematic, because sphinx does a bunch of processing of image nodes, before the AST is cached: https://github.com/sphinx-doc/sphinx/blob/b3812f72a98b01bae4b1158761082edc46cfa87f/sphinx/environment/collectors/asset.py#L32.
    • So now you have to work out how to retroactively apply all that processing on the image you are embedding.
  • If the output is SVG, and you want to output latex/pdf, then there are extensions like sphinx.ext.imgconverter, but you have to make sure that the node is inserted into the AST before this runs
  • If the output is actual Markdown that needs parsing, well that is even more difficult to ensure it has been processed properly

TLDR: it makes life a lot easier to embed the outputs (converted to AST) as early as possible,
or at least you limit what kind of outputs can be embedded cross-document

This will likely be an issue across any implementation, not just sphinx

@choldgraf
Member

One thing that might confuse people, though, is that we would now be mixing metadata about a cell that is embedded as comments (target labels) with other metadata that piggybacks on the cell JSON metadata (tags, and everything else, basically). Would that be confusing to people?

@rowanc1
Member

rowanc1 commented Mar 10, 2022

To @choldgraf's point - putting this in similarly to a tag might be better (though in VS Code that is still hard to do). Especially if that is similar to other existing workflows that are working today. The advantage of that is that you can edit tags easily, whereas the ID is likely something that is less exposed to end-user manipulation (and is still new). I think I only suggested the manual tag because of the initial options proposed; reflecting now: going with a unique tag seems like a really strong approach!

For caching on the AST: I am in the process of writing our own "version of sphinx" in node, and I don't think that the caching challenge you mention @chrisjsewell is really much of an issue. There are just a few different stages where you need to cache, rather than just the end state for a page (i.e. before filling in cross-references, on image manipulations/screenshots, etc.). Ours is a custom implementation that is designed for this though, and I recognize that this is more difficult with sphinx. This is a similar problem to sphinx caching the left-navigation pane on pages, when it is out of date on a cached page. This always annoys me, but is fixed by a clear/re-render of all pages. I think that is exactly the same as this issue, no?

@choldgraf
Member

re: tags, do you mean something like:

glue-<somename> would "glue" the output of that cell and assign it to the variable "somename"?

@rowanc1
Member

rowanc1 commented Mar 11, 2022

Yep! However, the tag maybe should be more like label-<something> as this might have other applications beyond "glue". I.e. can target that to open to in a URL, or be used in a future {hover}`for example <something>`. The tag shouldn't be the behaviour, but the identifier!

@chrisjsewell
Member Author

putting this in similarly to a tag might be better

ughh, I'm not a great fan of "abusing" tags to do everything: they have their place and are a helpful user UI, but a tag is very different semantically to an identifier.
Not ruling it out, but just noting my wariness

I am in the process of writing our own "version of sphinx" in node, and I don't think that the caching challenge you mention @chrisjsewell is really much of an issue

Well, knowing our work on https://github.com/executablebooks/myst-spec and https://github.com/executablebooks/mystjs, I'd say there is still a decent way to go, to reach the complexity of sphinx 😅
So, I wouldn't discard the issue just yet

@stevejpurves
Member

Should we consider transparency of the process of exposing variables over being kernel agnostic?

This thought comes from a conversation with someone who is an RMarkdown user, and specifically:

"At the moment, I'm using RMarkdown because it has a clean version control and no overhead to use the variables that I create from the RMarkdown. "Gluing variables" in Jupyter Books (https://jupyterbook.org/reference/cheatsheet.html#gluing-variables) is a lot of extra work."

This is readily possible in RMarkdown, I guess, because of the single-language implementation, combined with the fact that it is not bound/constrained by the sphinx render process outlined at the top of this issue. Either way, should transparency be a priority requirement in this?

@chrisjsewell
Member Author

Should we consider transparency of the process of exposing variables over being kernel agnostic?

Thanks @stevejpurves, but can you explain a little more what you mean by transparency here?
Perhaps an example of this in RMarkdown?

@stevejpurves
Member

@chrisjsewell what I meant is that no markup or additional code is required in order to expose an output. (The glue'ing, or whatever equivalent process, is not visible to the user.)

Instead, variables computed in a code chunk are by default "available", and their value can be displayed with a role-like inline syntax in the content, e.g. `r total_area`, which would display total_area as evaluated in the code chunk(s) - without the need for a glue() or any additional markup/tags/comments in the code.

The last section of https://www.hzaharchuk.com/rmarkdown-guide/content.html shows the example and as far as I can tell, no declaration is needed in the front matter to achieve that. (Perhaps because referenced variables are identified early and resolved during computation?)

@choldgraf
Member

choldgraf commented Mar 14, 2022

I think it's worth describing two related but different workflows here:

  1. Inserting expressions or variables within a page. In this case, we might be able to assume that a kernel is present, and/or directly execute inline code when we execute the page.
  2. Inserting expressions or variables across pages. In this case, we assume we do not have access to the kernel that was there when a notebook was first executed, and we nonetheless want the value of some variable/expression.

It sounds like the use-case @stevejpurves mentions from RMarkdown is from usecase (1), but how does this pattern work for use-case (2)?

Also regarding @chrisjsewell's point about over-loading tags, I do agree this isn't the "intended use case" for tags either. I'd feel fine just using a dedicated cell metadata key for this, rather than over-loading tags, and use this as an impetus to improve the UI/UX around cell-level metadata in JupyterLab (e.g. maybe doing something similar to how they recently overhauled the settings UI). If people wrote their notebooks as text files, it would be pretty simple, e.g., something like:

```{code-cell}
---
store:
  key: varname
---
key = 2
```

(note maybe we'd just use user_expressions for this, since it's the same pattern, but I worry that this name will be confusing for most users)
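Collecting such bindings from a notebook would then be a plain metadata walk. A minimal sketch, assuming the `store` key suggested above (not an implemented feature) maps a variable/expression to the name it is published under:

```python
# nbformat-style notebook using the suggested `store` metadata key,
# here read as {expression: published_name}.
notebook = {
    "cells": [
        {"cell_type": "code",
         "metadata": {"store": {"key": "varname"}},
         "source": "key = 2"},
        {"cell_type": "markdown", "metadata": {}, "source": "Some text"},
    ]
}

def stored_bindings(nb: dict) -> dict:
    """Collect {published_name: expression} pairs from code cells (sketch)."""
    bindings = {}
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        for expr, name in cell["metadata"].get("store", {}).items():
            bindings[name] = expr
    return bindings

assert stored_bindings(notebook) == {"varname": "key"}
```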

@chrisjsewell
Member Author

two related but different workflows here

yep, that's also what I felt @stevejpurves was referring to, and it relates to my "Using dynamic kernel injection" section in the initial comment.
I would be surprised if this would work cross-document (you would essentially have to cache every variable after every cell execution 🤔)

It may indeed be desirable to support both use cases, as separate concerns, e.g. (as per my initial comment) {glue} and {exec} being two distinctly different things (or whatever syntax you want to use)

@rgaiacs

rgaiacs commented Mar 15, 2022

Hello! @stevejpurves pointed me to this conversation.

There are two types of documents used in scientific communication: lab notebooks and traditional scientific articles. Quoting "The computational notebook of the future" by Konrad Hinsen:

If you look at a traditional scientific article, or technical report, you will notice that its narrative is structured according to a high-level view of the work. It starts by describing the context of the work, then its goals and a very brief summary of the methods, and right after that it presents results and discusses them. Technical details are only discussed afterwards, once the reader understands why they actually matter. With today’s notebooks, the technical details come first: a typical data analysis starts with cleanup and preprocessing steps, and therefore they also come first in the narrative.

Jupyter Notebook is great to write lab notebooks. I don't see a great need to be able to embed the value of variables inside Markdown cells as the user can change the writing style to accommodate the lab notebook structure.

To write "literate programming" scientific articles, it is interesting to be able to embed the value of variables inside Markdown cells. For example, the user calculates a Spearman correlation coefficient with an associated p-value using SciPy and glues/references the p-value inside the Markdown cell. Embedding the value of variables inside Markdown cells has many technical challenges, as mentioned in the first comment. From a user perspective, I see three approaches to writing "literate programming" scientific articles.

Concatenate and Run All

Example: bookdown.

The master document will be created by concatenating each raw individual document. The master document is executed linearly, and references to code output are replaced by the code output itself. After the code output replacement, the master document is converted to the desired output (HTML, PDF, DOCX).

Pros:

  • Code output is guaranteed to be updated with its dependencies (for example, a table built from a CSV file)
  • Little to no code duplication to reuse code output

Cons:

  • Build is slow (all cells need to be executed)
  • Collaboration limited to GitHub

Link and Embed

Example: Curvenote

The user has two independent documents: the lab notebook and the traditional scientific article. All the computation is done in the lab notebook. Users can link the lab notebook to the traditional scientific article and embed any code output of the lab notebook into the article. Both documents can be converted to the desired output (HTML, PDF, DOCX).

Pros:

  • Build is fast
  • Collaboration can happen in online "What You See Is What You Get" (WYSIWYG) environment

Cons:

  • Code output in the article needs to be updated by the user
  • No guarantee that code output is updated with its dependencies (for example, a table built from a CSV file)

Staple when Finished

Example: Jupyter Book

Each individual document is executed independently. All the individual documents with the code output are stapled together and converted to the desired output.

Pros:

  • Code output is updated (if source file changed)
  • Build is fast (only changed files need to be executed)

Cons:

  • Users need to duplicate some code across documents to reuse code output (for example, information extracted from a CSV file)
  • No guarantee that code output is updated with its dependencies (for example, if the Jupyter Notebook does not change but the CSV file read with pandas from the Jupyter Notebook changes)
  • Collaboration limited to GitHub

Final Consideration

The "Link and Embed" approach used by Curvenote has the biggest potential to lower the barrier to new adopters of "literate programming" scientific articles and is worth investing in. The "Concatenate and Run All" approach used by bookdown does not work well for long documents or documents that include costly computational code. The "Staple when Finished" approach used by Jupyter Book is ideal for technical tutorials or books but has big limitations for "literate programming" scientific articles.

@stevejpurves
Member

@chrisjsewell it seems that this would mean executing code on demand when a markdown cell containing a variable is encountered? Wouldn't that lead to unexpected results? i.e. computation is designed, and execution order set, by the notebook's linear flow, yet executing code from myst content (which could equally be outside a notebook, in an .md file, correct?) in that same context when it is encountered could produce a different result, as it injects new computation into that linear flow.

Apart from that, and answering @choldgraf's question: could cross-document support not be provided by first performing a full pass over all markdown content to build a map of all required variables (perhaps scoped to notebooks somehow, via their identifier in the myst expression) ahead of any computation, which can then be harvested at compute time when the kernel is live? i.e. what currently constrains notebook computation to be the first step?

Another point is that I'm a bit confused over:

  1. using any variable defined in the kernel language, versus
  2. using data already exposed as a cell output via the mimetype package

My understanding of the glue mechanism is that it exposes variables as mimetype outputs (correct?), but otherwise are we mixing two different concerns here? And is the aim here to enable (1)?

@chrisjsewell
Member Author

yet but also executing code from myst content ... could produce a different result, as it injects new computation into that linear flow

Not in executablebooks/MyST-NB#382, no, because it "restricts" itself to only running variable evaluation, i.e. nothing that would change the state of the kernel, only "querying" its current state.
I imagine this is exactly how RMarkdown inline variables work.
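In Python, this restriction falls naturally out of the expression/statement split: eval accepts only expressions, so an evaluation role can query kernel state but cannot rebind names (a sketch of the principle, not the prototype's actual guard):

```python
ns = {"a": 1}
assert eval("a", ns) == 1      # querying state is fine
try:
    eval("a = 2", ns)          # assignment is a statement, not an expression
except SyntaxError:
    pass
else:
    raise AssertionError("assignment unexpectedly evaluated")
assert ns["a"] == 1            # kernel state is unchanged
```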

@chrisjsewell
Member Author

chrisjsewell commented Mar 18, 2022

curvenote: cons: No guaranteed that code output is updated with their dependencies

This is definitely a key consideration for me: the trade-off, between execution and caching

I believe the famous quote is

There are only two hard things in Computer Science: cache invalidation and naming things.

-- Phil Karlton

Ideally, users want their rendered pages to show the outputs that correspond to the latest input code. But they also want to minimise the amount of re-executing/re-rendering they have to do to stay up-to-date.
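One common way to strike that trade-off is content hashing, e.g. re-executing a notebook only when its code content changes. A simplified sketch (real implementations, such as jupyter-cache, hash more than just the cell sources):

```python
import hashlib

def code_hash(cell_sources: list) -> str:
    """Hash only the code content, so that editing Markdown prose
    does not invalidate cached execution outputs (sketch)."""
    digest = hashlib.sha256()
    for src in cell_sources:
        digest.update(src.encode("utf-8"))
    return digest.hexdigest()

before = code_hash(["a = 1 + 2"])
assert code_hash(["a = 1 + 2"]) == before   # unchanged code: cache hit
assert code_hash(["a = 1 + 3"]) != before   # changed code: re-execute
```

Note this addresses code-driven invalidation only; external dependencies like a CSV file changing (the Curvenote con above) would still need their own tracking.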
