Rendering each markdown cell causes incorrect section headers #17

choldgraf · 2020-02-21T06:25:44Z

I just discovered a bug (from a user's perspective) that is technically a feature in rST :-P

I just noticed that the top-level markdown header of each cell in the notebook is being treated as an H1 header for the whole document, even if the header text is technically ##.

e.g. see how Sphinx thinks there are multiple page titles for the single notebook document that is here: https://sphinx-jupyter-notebook.readthedocs.io/en/latest/

I think this is because of Sphinx deciding on page titles etc based on the order in which headers happen, so if we parse a cell with only a single ## header, Sphinx will treat that as a document-level header.

e.g., note how the Sphinx document output is the same whether the two headers have different numbers of # symbols.

So from a docutils perspective, we need to make sure that cells that are sub-sections are nested properly. Even though notebooks have a single top-level hierarchy, docutils doesn't.

@chrisjsewell does that sound correct to you? Any ideas on the right way to do this?

The text was updated successfully, but these errors were encountered:

choldgraf · 2020-02-21T06:30:01Z

Two ideas for how we could handle it:

Keep track of the depth of docutils tree we should be in each time we parse a cell, and make sure that we insert the resulting docutils objects at the correct depth
Convert the ipynb file into a MyST markdown representation and parse that (though then we'd lose things like cell/line numbers?)

chrisjsewell · 2020-02-21T07:01:05Z

So here https://github.com/ExecutableBookProject/sphinx-notebook/blob/80569087613cc09e84e980012e9d4feb94aaf827/sphinx_notebook/parser.py#L50

You initialise a new renderer for every cell. This obviously wipes any internal state, including self._level_to_elem.
You should probably rework the code so that the render is instantiated once, and just change the current_node before parsing new text.

chrisjsewell · 2020-02-21T07:08:25Z

One thing that should be checked/documented at the myst_parser level, is that the docutils.RstParser specifically prohibits transitions from 'non-consecutive' header levels, e.g.

# Title h1

### Title h3

If think with myst, this will probably end up being the same as:

# Title h1

## Title h2

I should check this, and decide how to handle it, e.g. would sphinx be happy with 'virtual' sections being added that don't have a <title> child.

chrisjsewell · 2020-02-21T07:24:40Z

Two other things:

Because you are using myst_ast = Document(cell["source"]) to parse cells, for each cell a front_matter YAML block will be looked for. This probably doesn't make sense at a cell level, as you could end up with multiple front matter blocks that obviously doesn't make sense. You should think about parsing at a lower level (take a look what Document actually does), and having an (optional) key in the notebook level metadata that relates to this front matter, then parse it to render_front_matter().
As I've mentioned somewhere before, currently docutils will report any parsing errors by line number, but obviously won't tell you what cell that line was in. You should think about how to override docutils.utils.Reporter.system_message, to also report the cell number.

choldgraf · 2020-02-21T07:34:08Z

You should probably rework the code so that the render is instantiated once, and just change the current_node before parsing new text.

Is that as straightforward as updating the attribute directly by assigning a new node?

chrisjsewell · 2020-02-21T07:35:23Z

Yeh should be

choldgraf · 2020-02-21T07:50:40Z

hmmm, I just tried doing:

# Before looping cells
renderer = SphinxRenderer(document=document, current_node=None)

and in the cell loop:

# also tried assigning directly
renderer.set_current_node(cell_input)

And I got this error:

Exception occurred:
  File "/c/Users/chold/Dropbox/github/forks/python/ebp/myst_parser/myst_parser/docutils_renderer.py", line 142, in render_paragraph
    para.line = token.range[0]
AttributeError: 'Paragraph' object has no attribute 'range'

chrisjsewell · 2020-02-21T07:56:18Z

Oh no you don't have to use set_current_node, thats a context manager for transiently setting the current node whilst parsing children tokens. It should probably be rename with_current_node.

Just literally do renderer.current_node = cell_input

choldgraf · 2020-02-21T07:57:15Z

I did that, got the same error :-/

I'm doing:

            # If a markdown cell, simply call the Myst parser and append children
            if cell["cell_type"] == "markdown":
                # Initialize the render so that it'll append things to our current cell
                renderer.current_node = cell_input
                with renderer:
                    # TODO: This shouldn't be `Document` because it'll look for yaml frontmatter
                    #       Should be something else but need to see how myst parser does it.
                    myst_ast = Document(cell["source"])
                    renderer.render(myst_ast)

chrisjsewell · 2020-02-21T08:03:32Z

Ah yeh, you need to use the with renderer for the entire notebook parse, because on exit it resets the parsing tokens to the standard ones used by mistletoe (not the myst specific ones). This context manager behaviour is from the mistletoe BaseRenderer, but probably should be improved so that the myst tokens are reinstated on entrance.

choldgraf · 2020-02-21T08:15:04Z

ahhh that got it

chrisjsewell mentioned this issue Feb 21, 2020

Test/document behaviour of non-consecutive header levels executablebooks/MyST-Parser#71

Closed

choldgraf mentioned this issue Feb 21, 2020

fixing parser headings #19

Merged

choldgraf closed this as completed in #19 Feb 21, 2020

chrisjsewell mentioned this issue Feb 21, 2020

Minor API Improvements executablebooks/MyST-Parser#72

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rendering each markdown cell causes incorrect section headers #17

Rendering each markdown cell causes incorrect section headers #17

choldgraf commented Feb 21, 2020

choldgraf commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020 •

edited

choldgraf commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

choldgraf commented Feb 21, 2020 •

edited

chrisjsewell commented Feb 21, 2020

choldgraf commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

choldgraf commented Feb 21, 2020

Rendering each markdown cell causes incorrect section headers #17

Rendering each markdown cell causes incorrect section headers #17

Comments

choldgraf commented Feb 21, 2020

choldgraf commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020 • edited

choldgraf commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

choldgraf commented Feb 21, 2020 • edited

chrisjsewell commented Feb 21, 2020

choldgraf commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020

choldgraf commented Feb 21, 2020

chrisjsewell commented Feb 21, 2020 •

edited

choldgraf commented Feb 21, 2020 •

edited