Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Section headings in markdown render nodes outside of the InputCell nodes #30

Closed
choldgraf opened this issue Feb 23, 2020 · 13 comments
Closed

Comments

@choldgraf
Copy link
Member

I just noticed that we have a bug in how the rendered docutils nodes are put (or not) inside of the CellInputNode that should contain them. I think I know what the problem is.

For each markdown cell, we create a CellInputNode and set it as the current node. Then we render the markdown content inside. What we want is for whatever nodes are spit out to be put inside the CellInputNode (as children). This usually works for most markdown. However, if there are section headers, or other syntax that should trigger a new section, then the render function creates a new section and starts rendering outside of the CellInputNode. I think this is where that happens for headers:

https://github.com/ExecutableBookProject/myst_parser/blob/develop/myst_parser/docutils_renderer.py#L259

@chrisjsewell can you think of a way that we can ensure the renderer is always placing objects inside the CellinputNode?

@choldgraf
Copy link
Member Author

One option that would be fairly hacky would be to, at the end of the "render" operation, check if there are any non CellNode nodes in the doctree. And if so, assume that they should be appended to the children of the latest CellInputNote...

@chrisjsewell
Copy link
Member

Hmm thats a tricky one. It feels like you have two immiscible hierarchies here; sections and markdown cells, since both can encapsulate each other. Do you actually need to wrap the markdown cells in CellInputNode, what would it be used for? Would it be easier to do what the SphinxRenderer now does and add a 'classed' comment node, to denote a break between two markdown cells, as in render_block_break

@choldgraf
Copy link
Member Author

choldgraf commented Feb 23, 2020

Well, the main trick is that jupyter-book often wants to know about a "block" of markdown to do things to it (e.g. "look for the cell tag div.cell.tag_hide_input and when found, hide the input of that cell"). So keeping things together with their metadata is important to Jupyter Book at least. The other nice part about this from a CSS perspective is that you can use a single selector (e.g. div.cell) and know that everything on the page will behave the same way at the top level.

  1. Special-case the page title field and have some combination of "check ntbk-level metadata, and if no title field is there, find the first markdown heading and treat it as the "title" for the document". Then the title node gets added before any subsequent cell nodes are added.
  • We should probably define a "title" standard in the myst-parser anyway, no? I keep forgetting that Sphinx will treat multiple single # sections as pages in the sidebars etc.
  1. If there is another way to wrap a bundle of markdown as a "cell" with a class in the HTML, then we could also say something like "only put markdown cells in the container node if they have tags in them", and then we could say "you need at least one markdown cell without tags in it and a header inside". That feels edge-casey for the user.

This brings up a third point - since tags are the lightest-weight way to add metadata to cells, what if we adopted the tag1, tag2 syntax for the short-hand markdown breaks. So we accept:

+++ tag1, tag2

as valid, and it'll parse anything after the +++ as a comma-or-space-separated list of tags?

@chrisjsewell
Copy link
Member

This is possible in HTML, because headers are stored in a flat structure:

<div class="cell">
<h1>This is heading 1</h1>
<p>This is some text.</p>
<h2>This is heading 2</h2>
<p>This is some other text.</p>
</div>
<div class="cell">
<h3>This is heading 3</h3>
</div>

But in the doctree format (with heirarchical headers) this is surely impossible to represent

<cell>
    <section h1>
        This is some text
        <section h2>
            This is some other text.

How can you close the cell and start h3 here?
I think it would also not be possible to use such a structure when converting to LaTeX.

I feel if you did want this, it would be more robust to deal with this in the HTML translator, if you had signalled the cell begin/end in the docutils tree, something like:

<comment classes=["markdown_cell_start"]>
<section h1>
    This is some text
    <section h2>
        This is some other text.
        <comment classes=["markdown_cell_end"]>
        <comment classes=["markdown_cell_start"]>
        <section h3>
            <comment classes=["markdown_cell_end"]>

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 24, 2020

what if we adopted the tag1, tag2 syntax for the short-hand markdown breaks

The only thing with this is that it does not allow for strict round-tripping, as opposed to:

+++ {"tags": ["tag1", "tag2"], "other": "data"}

In general though, I feel we should be emphasising the use of in-text directives or syntax, rather than markdown cell metadata.

For example in your Hiding markdown cells, with MyST I feel it would be better to either comment out text you don't want in the document:

% commented out
% text

or using a directive to indicate toggle views, like you can do with the cloud sphinx theme:

```{rst-class} html-toggle
```

# Title

@choldgraf
Copy link
Member Author

So just to clarify, it sounds like you're suggesting we shouldn't be using container nodes at all for this, and instead using a custom html translator along with some kind of metadata that's included w comments in the sphinx AST?

I guess the translator would need to override the comment render function, to check for the tag metadata and render the div blocks as needed?

Maybe @mmcky could advise on the best way to do this?

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 24, 2020

So just to clarify, it sounds like you're suggesting we shouldn't be using container nodes at all for this

For markdown cells yes, because IMO it's theoretically impossible, given the docutils AST, to add a container that is guaranteed to not break the section hierarchy.

For code cells it should be fine. The only possible edge-case I could think of is if a code output had a text/markdown mime type, that also contained headers. I notice actually in CellOutputsToNodes you don't deal with text/markdown.

@choldgraf
Copy link
Member Author

OK - I'll try adding a lightweight HTMLTranslator that overrides comment behavior...I might have some questions since I haven't build a translator before :-)

@choldgraf
Copy link
Member Author

OK quick question for @chrisjsewell:

How do we handle the case when we are leaving a codecell? We want the renderer to start adding nodes as a sibling of the CellNode that was the last codecell, not as a child of the node. But by default if we do something like renderer.current_node = latest_code_cell, then if the next markdown is not a header it will try to render it as a child of the code cell, and not a new section, right?

@choldgraf
Copy link
Member Author

Question 2: is there any way to do this without manually setting a custom HTMLTranslator? I ask this because I know some Sphinx themes also do this, so if we want to use our own HTMLTranslator, those themes would not be able to use this functionality of MyST-NB.

@chrisjsewell
Copy link
Member

How do we handle the case when we are leaving a codecell?

Well you want to revert to the last node that was the current_node before the codecell (i.e. its parent). This is exactly the use of the (now named) DocutilsRenderer.current_node_context method:

        with self.current_node_context(node, append=True):
            self.render_children(token)

which reverts the current_node back to its original state on exit.

@chrisjsewell
Copy link
Member

is there any way to do this without manually setting a custom HTMLTranslator?

I have no idea about HTMLTranslators I'm afraid!? I just know it will be difficult/impossible to not break the docutils AST with the original approach.

@choldgraf
Copy link
Member Author

OK, I thought about your point re: nestedness and the AST, and decided to stop being stubborn and re-work the parser so that it no longer forces the top-level "flat" cell structure that notebooks have.

Check out #31 for the latest state of this...I think it's an approach that more naturally fits with the AST we're working with

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants