Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rendering each markdown cell causes incorrect section headers #17

Closed
choldgraf opened this issue Feb 21, 2020 · 11 comments · Fixed by #19
Closed

Rendering each markdown cell causes incorrect section headers #17

choldgraf opened this issue Feb 21, 2020 · 11 comments · Fixed by #19

Comments

@choldgraf
Copy link
Member

I just discovered a bug (from a user's perspective) that is technically a feature in rST :-P

I just noticed that the top-level markdown header of each cell in the notebook is being treated as an H1 header for the whole document, even if the header text is technically ##.

e.g. see how Sphinx thinks there are multiple page titles for the single notebook document that is here: https://sphinx-jupyter-notebook.readthedocs.io/en/latest/

I think this is because of Sphinx deciding on page titles etc based on the order in which headers happen, so if we parse a cell with only a single ## header, Sphinx will treat that as a document-level header.

e.g., note how the Sphinx document output is the same whether the two headers have different numbers of # symbols.

image

So from a docutils perspective, we need to make sure that cells that are sub-sections are nested properly. Even though notebooks have a single top-level hierarchy, docutils doesn't.

@chrisjsewell does that sound correct to you? Any ideas on the right way to do this?

@choldgraf
Copy link
Member Author

Two ideas for how we could handle it:

  • Keep track of the depth of docutils tree we should be in each time we parse a cell, and make sure that we insert the resulting docutils objects at the correct depth
  • Convert the ipynb file into a MyST markdown representation and parse that (though then we'd lose things like cell/line numbers?)

@chrisjsewell
Copy link
Member

So here https://github.com/ExecutableBookProject/sphinx-notebook/blob/80569087613cc09e84e980012e9d4feb94aaf827/sphinx_notebook/parser.py#L50

You initialise a new renderer for every cell. This obviously wipes any internal state, including self._level_to_elem.
You should probably rework the code so that the render is instantiated once, and just change the current_node before parsing new text.

@chrisjsewell
Copy link
Member

One thing that should be checked/documented at the myst_parser level, is that the docutils.RstParser specifically prohibits transitions from 'non-consecutive' header levels, e.g.

# Title h1

### Title h3

If think with myst, this will probably end up being the same as:

# Title h1

## Title h2

I should check this, and decide how to handle it, e.g. would sphinx be happy with 'virtual' sections being added that don't have a <title> child.

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 21, 2020

Two other things:

  1. Because you are using myst_ast = Document(cell["source"]) to parse cells, for each cell a front_matter YAML block will be looked for. This probably doesn't make sense at a cell level, as you could end up with multiple front matter blocks that obviously doesn't make sense. You should think about parsing at a lower level (take a look what Document actually does), and having an (optional) key in the notebook level metadata that relates to this front matter, then parse it to render_front_matter().
  2. As I've mentioned somewhere before, currently docutils will report any parsing errors by line number, but obviously won't tell you what cell that line was in. You should think about how to override docutils.utils.Reporter.system_message, to also report the cell number.

@choldgraf
Copy link
Member Author

You should probably rework the code so that the render is instantiated once, and just change the current_node before parsing new text.

Is that as straightforward as updating the attribute directly by assigning a new node?

@chrisjsewell
Copy link
Member

Yeh should be

@choldgraf
Copy link
Member Author

choldgraf commented Feb 21, 2020

hmmm, I just tried doing:

# Before looping cells
renderer = SphinxRenderer(document=document, current_node=None)

and in the cell loop:

# also tried assigning directly
renderer.set_current_node(cell_input)

And I got this error:

Exception occurred:
  File "/c/Users/chold/Dropbox/github/forks/python/ebp/myst_parser/myst_parser/docutils_renderer.py", line 142, in render_paragraph
    para.line = token.range[0]
AttributeError: 'Paragraph' object has no attribute 'range'

@chrisjsewell
Copy link
Member

Oh no you don't have to use set_current_node, thats a context manager for transiently setting the current node whilst parsing children tokens. It should probably be rename with_current_node.

Just literally do renderer.current_node = cell_input

@choldgraf
Copy link
Member Author

I did that, got the same error :-/

I'm doing:

            # If a markdown cell, simply call the Myst parser and append children
            if cell["cell_type"] == "markdown":
                # Initialize the render so that it'll append things to our current cell
                renderer.current_node = cell_input
                with renderer:
                    # TODO: This shouldn't be `Document` because it'll look for yaml frontmatter
                    #       Should be something else but need to see how myst parser does it.
                    myst_ast = Document(cell["source"])
                    renderer.render(myst_ast)

@chrisjsewell
Copy link
Member

Ah yeh, you need to use the with renderer for the entire notebook parse, because on exit it resets the parsing tokens to the standard ones used by mistletoe (not the myst specific ones). This context manager behaviour is from the mistletoe BaseRenderer, but probably should be improved so that the myst tokens are reinstated on entrance.

@choldgraf
Copy link
Member Author

ahhh that got it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants