
WIP: Proposed improvement/alternative to jupyter-sphinx #47

Closed
wants to merge 9 commits

Conversation

@chrisjsewell (Member) commented Feb 16, 2020

@choldgraf @akhmerov following discussion in #42 I've created a proposal here for how I would foresee the execution working.

To recap, the Sphinx build phases and Sphinx core events can be summarised as:

1. event.config-inited(app,config)
2. event.builder-inited(app)
3. event.env-get-outdated(app, env, added, changed, removed)
4. event.env-before-read-docs(app, env, docnames)

for docname in docnames:
    5.  event.env-purge-doc(app, env, docname)
    if doc changed and not removed:
        6. source-read(app, docname, source)
        7. run parser: text -> docutils.document
        8. apply transforms (by priority): docutils.document -> docutils.document
        9. event.doctree-read(app, doctree)

10. event.env-updated(app, env)
11. event.env-check-consistency(app, env)

for docname in docnames:
    12. apply post-transforms (by priority): docutils.document -> docutils.document
    13. event.doctree-resolved(app, doctree, docname)

14. call builder

15. event.build-finished(app, exception)

jupyter_sphinx.main contains the extension setup, and utilises this approach:

In (2), set up a JupyterDB that is stored within the doctrees build folder. This is imported from a separate package, jupyter_db, whose sole purpose is to maintain a database of all the kernels' code cells and their outputs. It doesn't know anything about sphinx, hence having it as a separate package. I have written this as an SQL database (with sqlalchemy to provide an OO layer on top), since I think that's best for fast look-up and asynchronous access. But you could essentially write this in any way you see fit, as long as it has the same interface (i.e. class methods).

During (3) any documents (and their related cells, kernels, outputs) that have been changed/removed are removed from JupyterDB.

During (7), directives save kernels and code cells to the JupyterDB, and the JupyterDB returns primary keys (pks) for those elements, which are saved on the docutils elements for later lookup.

During (10), the JupyterDB is passed to an entry-point defined function: run_execution. This function's job is to act 'in-place' on the JupyterDB to populate it with outputs.
Again, it doesn't know anything about sphinx, so it could be external to jupyter-sphinx.
To decide which kernels need re-running, it can check which kernels/cells no longer have outputs, or could even check the outputs' 'last modified' time.

During (12), the primary keys are used to access the outputs and insert them where necessary.
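For concreteness, the sketch below shows one way this wiring could look in a Sphinx setup() function. It is only a sketch: JupyterDB, run_execution, JupyterKernel and JupyterExec are the objects described in this PR (their imports are omitted), while the handler names, the InsertOutputs post-transform and the JupyterDB methods used here are illustrative assumptions, not the actual code.

# Hypothetical sketch of jupyter_sphinx.main wiring the steps above into
# Sphinx core events (the event names and app methods are real Sphinx API;
# JupyterDB, run_execution, the directives and InsertOutputs are assumed).
from sphinx.application import Sphinx

def setup(app: Sphinx):
    app.connect("builder-inited", init_db)           # step (2): create the JupyterDB
    app.connect("env-get-outdated", purge_outdated)  # step (3): drop changed/removed docs
    app.connect("env-updated", execute_cells)        # step (10): populate outputs
    app.add_directive("jupyter-kernel", JupyterKernel)
    app.add_directive("jupyter-exec", JupyterExec)   # step (7): directives record cells
    app.add_post_transform(InsertOutputs)            # step (12): insert outputs by pk
    return {"version": "0.1"}

def init_db(app):
    # store the database alongside the pickled doctrees (assumed constructor)
    app.jupyter_db = JupyterDB(path=app.doctreedir)

def purge_outdated(app, env, added, changed, removed):
    app.jupyter_db.remove_docs(set(changed) | set(removed))  # assumed method
    return []  # no extra documents need re-reading

def execute_cells(app, env):
    # the entry-point defined function acts 'in-place' on the database
    run_execution(app.jupyter_db)
    return []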

@chrisjsewell (Member Author)

FYI this is what the internal structure of the SQL database is:

[Image: SQL database structure diagram]
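For readers without the image, the sketch below is a hypothetical reconstruction of such a schema as sqlalchemy models. Only OrmMimeBundle and its pk/source/mimetype columns appear elsewhere in this thread; the other class, table and column names are guesses, not the actual structure.

# Hypothetical sqlalchemy models approximating the database structure
# (names other than OrmMimeBundle.pk/mimetype/source are illustrative).
from sqlalchemy import Column, ForeignKey, Integer, String, Text
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

class OrmKernel(Base):
    __tablename__ = "kernel"
    pk = Column(Integer, primary_key=True)
    docname = Column(String, nullable=False)      # document the kernel belongs to
    kernel_name = Column(String, nullable=False)  # e.g. "python3"
    cells = relationship("OrmCodeCell", back_populates="kernel")

class OrmCodeCell(Base):
    __tablename__ = "codecell"
    pk = Column(Integer, primary_key=True)
    kernel_pk = Column(Integer, ForeignKey("kernel.pk"))
    source = Column(Text, nullable=False)
    kernel = relationship("OrmKernel", back_populates="cells")
    outputs = relationship("OrmMimeBundle", back_populates="cell")

class OrmMimeBundle(Base):
    __tablename__ = "mimebundle"
    pk = Column(Integer, primary_key=True)
    cell_pk = Column(Integer, ForeignKey("codecell.pk"))
    mimetype = Column(String, nullable=False)  # e.g. "html", "image/png"
    source = Column(Text, nullable=False)      # the serialised output data
    cell = relationship("OrmCodeCell", back_populates="outputs")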

@akhmerov (Contributor)

This is really cool! After I understand it better, I think it would be great if this could go to jupyter-sphinx, and fix a bunch of issues there!

@akhmerov (Contributor) commented Feb 16, 2020

@chrisjsewell a few questions:

  • In jupyter-sphinx we let the users download the notebooks directly. Isn't that a more natural way to store the inputs and outputs, as opposed to a database? At which step would it negatively influence performance?
  • Can you explain more about the JupyterViewNode: what is its purpose?

@choldgraf (Member)

I'll take a look at this as well, it sounds like quite an interesting proposal!

At first glance, my main concern is with maintainability and using an SQLite database. It may be the best technical solution, but we should weigh it against the likelihood that potential maintainers will have experience with SQLite. I can get a better feel for this after looking through the implementation code. To me the question is always "will contributing to this project be accessible to somebody with only volunteer time, and a PhD's worth of Python experience in a scientific field?"

@chrisjsewell (Member Author) commented Feb 16, 2020

  1. So if you look at JupyterDB, there is a get_notebook() method that will return an nbformat notebook.
  2. JupyterDB is written to be ‘backend agnostic’, so you should be able to switch out the SQL backend with anything you want.
  3. The 'natural' reason to use an SQL backend is that cells and kernels (and mimebundles) utilise integer primary keys to reference them, which are unique across all documents/notebooks (and documents and notebooks are not necessarily one and the same). At the 'post transform' stage you need to be able to return a cell's outputs by its pk (or a synonymous name string). Using a folder of notebooks as a backend, for example, you would have to work out how to cache file reads, so you don't have to re-read a notebook every time you want to access a cell. You would then have to keep an internal database of pks to cells/kernels.
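To make that concrete, here is a sketch of the kind of backend-agnostic interface implied above. Only get_notebook() is actually named in this thread; the other method names and signatures are assumptions about what such an interface might look like.

# Hypothetical JupyterDB interface; any backend (SQL or otherwise)
# providing these class methods would do.
from typing import List, Set

import nbformat

class JupyterDB:
    """Stores kernels, code cells and their outputs, keyed by integer pks."""

    def add_kernel(self, docname: str, kernel_name: str) -> int:
        """Register a kernel for a document and return its primary key."""
        raise NotImplementedError

    def add_code_cell(self, kernel_pk: int, source: str) -> int:
        """Register a code cell under a kernel and return its primary key."""
        raise NotImplementedError

    def get_outputs(self, cell_pk: int) -> List[dict]:
        """Return the stored output mime bundles for a cell, by its pk."""
        raise NotImplementedError

    def get_notebook(self, kernel_pk: int) -> nbformat.NotebookNode:
        """Reassemble one kernel's cells and outputs as an nbformat notebook."""
        raise NotImplementedError

    def remove_docs(self, docnames: Set[str]) -> None:
        """Drop documents and their kernels/cells/outputs (used on rebuilds)."""
        raise NotImplementedError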

@chrisjsewell (Member Author) commented Feb 16, 2020

To me the question is always "will contributing to this project be accessible to somebody with only volunteer time, and a PhD's worth of Python experience in a scientific field?"

Well I think sqlalchemy makes it relatively easy to work with SQL databases, and hides a lot of the complexity. I only learnt it in January and didn't find it too traumatising. But have a look at the code and see what you think.

@chrisjsewell (Member Author)

Can you explain more about the JupyterViewNode: what is its purpose?

The idea of this is that there is no inherent constraint that the outputs of a code cell must have one defined visualisation. Along the lines of the classic Model-View pattern for data visualisation, ideally the data and the views would be split. Granted, for simple use cases the user may want to reduce the syntax and only use the single jupyter-execute directive for both. But to have greater control of the output (e.g. controlling whether to render latex as math), this is a better approach. This also opens up possibilities like having inline role views of the output, and achieving things like #46.

@akhmerov (Contributor)

Thanks for the explanation. I agree that storing all outputs, and referring to them from the JupyterView node, is much better than what we have in jupyter-sphinx right now, at the very least to better support multiple writers.

How would the user control rendering of a single output? By including a separate .. jupyter-view?

I would like to suggest sticking to the constraint that the cell outputs only occur after the cell. This choice was the result of a lengthy discussion when we were working on jupyter-sphinx, and the main motivation was robustness. When working on documentation, it is usually easy to keep in your memory what happens within a single kernel, and to manipulate its outputs explicitly. Allowing outputs to be included elsewhere may lead to subtle errors, because the effects of changes become much more nonlocal. Similarly, I propose to make the inline role also execute a small cell, rather than refer to some code. I expect that inline executable code would mainly yield text or latex output, and would be relatively short. This approach (inline code rather than inline references) is also used by Rmd.

@akhmerov (Contributor) commented Feb 16, 2020

Also I'd like to provide more background about the jupyter-sphinx design: a principle I found really useful was to critically examine the amount of configurability we allow. For example, I see that whether errors are tolerated is controlled on a per-kernel basis in your implementation, whereas jupyter-sphinx only has a global setting or a per-cell setting. This is a result of considering the usage scenarios outlined in jupyter/jupyter-sphinx#73.

Basically, since adding features is much easier than removing them, we opted for implementing a minimal set that we consider definitely useful, and to wait for requests with implementing anything else.


# Document Elements
app.add_directive("jupyter-kernel", JupyterKernel)
app.add_directive("jupyter-exec", JupyterExec)
Contributor

Another question: is there a reason why you use jupyter-exec, and not jupyter-execute? I personally don't have a preference either way, but if possible I'd like to avoid changing conventions. Or is there another package that introduces a jupyter-exec directive?

Member Author

No, it was absolutely an arbitrary choice, and it can be changed to jupyter-execute 👍

@chrisjsewell (Member Author)

Also I'd like to provide more background about the jupyter-sphinx design

Thanks 👍, I will certainly take this into consideration. Actually, the only reason I added the :allow-errors option was because I wanted to check that errors were being correctly rendered in my test case: https://github.com/ExecutableBookProject/myst_parser/blob/jupyter/tests/jupyter_sphinx/sourcedirs/basic/contents.rst

@chrisjsewell (Member Author)

How would the user control rendering of a single output? By including a separate .. jupyter-view?

Yeh that's the idea:

.. jupyter-exec::
    :timeout: 20
    :show-code:

    print("x")
    a = 1
    a

.. jupyter-view::

Although as I mentioned before, for simple use cases this may not be ideal: having to put a .. jupyter-view:: after every code block. So maybe there are some basic options for rendering on the jupyter-execute, but if you want finer output control you use a jupyter-view. Something to think about...

@akhmerov (Contributor)

So maybe there are some basic options for rendering on the jupyter-execute, but if you want finer output control you use a jupyter-view.

jupyter-sphinx now defines some basic options like :hide-output:. Wouldn't enabling more fine-grained control over the representation be a worthy extension of the interface once there's a specific need for it?

@chrisjsewell (Member Author)

jupyter-sphinx now defines some basic options like :hide-output:. Wouldn't enabling more fine-grained control over the representation be a worthy extension of the interface once there's a specific need for it?

Well, you still at least need to make sure that the code/process is structured in such a way that it can be extended like this. Also, to my mind there are already some potential needs for it: adding things like captions and labels to outputs, allowing for complementary latex/PDF output, etc.

@akhmerov (Contributor)

Ah, but I have nothing against the view node or the code structure. Rather, I think the view directive could be omitted at first, since it is user-facing and complicates the interface. Instead, I imagine the view node would be inserted during execution.

@akhmerov (Contributor)

Also: what are output captions and labels? Is that similar to latex figure captions and labels? If that's the case, wouldn't it be better specified outside of the jupyter-specific directives?

@chrisjsewell (Member Author)

Ah, but I have nothing against the view node or the code structure. Rather, I think the view directive could be omitted at first, since it is user-facing and complicates the interface. Instead, I imagine the view node would be inserted during execution.

Yeh no problem, I wouldn't be completely against that lol

@chrisjsewell (Member Author)

Also: what are output captions and labels? Is that similar to latex figure captions and labels? If that's the case, wouldn't it be better specified outside of the jupyter-specific directives?

Well it would be like the figure directive: https://github.com/docutils-mirror/docutils/blob/e88c5fb08d5cdfa8b4ac1020dd6f7177778d5990/docutils/parsers/rst/directives/images.py#L101

You mean wrapping the jupyter-execute? That would make it very difficult to achieve round-trip conversion between the text representation and the notebook.

@akhmerov (Contributor)

I think a round-trip conversion would be nearly impossible to achieve regardless of that, due to the notebook format's limitations. Neither the v2 nor the v2.1 component listings in executablebooks/meta#21 include round-tripping. See also the discussions in #27. Come to think of it, even following this PR, a text representation with more than one kernel cannot round-trip to a notebook.

@chrisjsewell (Member Author) commented Feb 16, 2020

Well it should be round-trippable for a basic subset of the text format. Yes, you won't be able to do a 1:1 mapping for multiple kernels, but generally if you're just using them for separate sections you can split those sections into multiple documents (it would be nice in Sphinx if multiple source documents could end up as a single HTML page).

For single kernel pages particularly, it might be nice (and facilitate this round trip) if you could specify the kernel in the front matter. In RST it would look like this:

:kernel_name: name

In MyST:

---
kernel_name: name
---

This would be available (after parsing the whole document) in env.metadata[docname]["kernel_name"]. The only drawback is that this would necessitate back-populating code cells with the kernel name in an additional transform after the initial parse (step (8)).
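A minimal sketch of what that back-population could look like is below. It is hooked on the doctree-read event (step (9)) rather than a transform, since env.metadata is, at least for RST field lists, only filled once the docinfo has been collected; JupyterExecNode is an assumed node class, not the actual code.

# Hypothetical back-population of the kernel name from the front matter
# onto code-cell nodes that did not specify a kernel themselves.
def propagate_kernel_name(app, doctree):
    kernel_name = app.env.metadata[app.env.docname].get("kernel_name")
    if not kernel_name:
        return
    for node in doctree.traverse(JupyterExecNode):  # assumed node class
        node.setdefault("kernel_name", kernel_name)

# registered in setup() with:
# app.connect("doctree-read", propagate_kernel_name)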

@akhmerov (Contributor) commented Feb 16, 2020

Well it should be round-trippable for a basic subset of the text format.

Sure, but why would figure captions and labels belong to this basic subset?

My concern here is that this way a directive for code execution also takes responsibility for something unrelated to execution, which is also provided by another directive.

EDIT: kernel in the frontmatter sounds like a great idea.

@chrisjsewell (Member Author)

My concern here is that this way a directive for code execution also takes responsibility for something unrelated to execution, which is also provided by another directive.

Well surely this was my whole point with the jupyter-view directive: at present the jupyter-execute directive is not only responsible for code execution, it also takes responsibility for the output rendering.

This is not already provided by another directive; the figure directive only works if you have an image uri, which is not the way that jupyter-execute works.

@akhmerov (Contributor)

@chrisjsewell great point. Since caption itself may include arbitrary rst content, it also seems hard to implement a wrapper that would create figures.

Continuing with the usage of jupyter-view: do you envision that it'd have a reference to the jupyter-execute that generates the view? Or would it only be correct to include the view immediately after the execute?

Also: how would this interact with thebelab? Since its users aren't the code devs, they will be confused if the input code is separated in the document from the result it produces (also, thebelab only inserts the output right after the input). Should the code appear in the figure body after thebelab is activated? What should happen to the original code? The same questions apply when there are multiple references to the execute call, or when the execute and view are separated in the document. Currently in jupyter-sphinx, when thebelab is activated the code only moves if it was initially rendered below the output, and that's not too far.

Further questions: do we expect that the users will need to tweak the rendering of the mime bundle on a case by case basis, or would a global configuration suffice?

mime: OrmMimeBundle = session.query(OrmMimeBundle).filter_by(pk=pk).one()
source = mime.source
if mime.mimetype == "html":
    source = dedent(source)
Contributor

Is this always a safe operation?

Member Author

What do you mean safe? It’s not actually executing the source?

@akhmerov (Contributor) commented Feb 17, 2020

Sorry, that's a noob question: is dedenting HTML guaranteed not to change how it looks?
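For illustration (assuming the dedent here is textwrap.dedent), it is not guaranteed: whitespace inside a <pre> element is significant, so stripping common leading indentation can change how a fragment renders. A minimal example:

from textwrap import dedent

# The common 4-space indent is stripped from every line, so the content of
# the <pre> block goes from 8 to 4 leading spaces and renders differently.
html = (
    "    <pre>\n"
    "        x = 1\n"
    "    </pre>\n"
)
print(dedent(html))
# <pre>
#     x = 1
# </pre>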

@choldgraf (Member)

What's the status of this PR? Is it superseded by the existence of jupyter-cache?

@chrisjsewell (Member Author)

What's the status of this PR? Is it superseded by the existence of jupyter-cache?

Eventually it will be, but I'll leave it open for now since it has some code that differs from, or adds to, what is currently in jupyter-cache, which I may need at a later stage.

@chrisjsewell (Member Author)

Now being implemented in jupyter-cache

@chrisjsewell deleted the jupyter branch July 20, 2020 14:16