Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new parser for text-based notebooks #83

Closed
chrisjsewell opened this issue Mar 17, 2020 · 3 comments · Fixed by #116
Closed

Add new parser for text-based notebooks #83

chrisjsewell opened this issue Mar 17, 2020 · 3 comments · Fixed by #116
Labels
enhancement New feature or request

Comments

@chrisjsewell
Copy link
Member

Once mwouts/jupytext#458 is merged, and a new jupytext version is released, we can progress with the folowing:

Also these files won't work directly with the myst-parser extension (because it won't know what to do with code-cell and raw-cell). You need to add a separate parser to myst-nb that calls jupytext.myst.matches_mystnb, to work out if the markdown file should be read as pure markdown or converted to a notebook. (it should just need to be a small subclass of NotebookParser)

@AakashGfude will also need to use it in some manner in #55 to work out which markdown files to convert/execute/cache

Originally posted by @chrisjsewell in #82 (comment)

@chrisjsewell chrisjsewell added the enhancement New feature or request label Mar 17, 2020
@choldgraf
Copy link
Member

sounds good - that's similar to what jupyter book does currently, if it gets a text file that's not .ipynb, it does a quick check for whether it's a jupytext file and, if so, converts it into a notebook before processing.

One snag here will be same line number / reporting challenge in #71 - in this case, we do have a "natural" line number to report, even though we're working with a notebook, but we'll need to find a way to pass this through the build system to use when needed

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Mar 17, 2020

One snag here will be same line number / reporting challenge in #71 -

Yeh absolutely, thats the 'major' complication of pre-conversion to a notebook, rather than treating these cells as 'pure' sphinx directives (as done by jupyter-sphinx). I think/hope it is solvable though.

To re-iterate why trying to achieve this by adapting jupyter-sphinx is problematic:

  1. You have to maintain two separate parsing flows for notebooks and text-based documents. As opposed to here, where once you do the initial conversion, both continue down the same process. Two flows can lead to inconsistencies, as we have now for the CSS formatting.

  2. It would be difficult/impossible to integrate jupyter-sphinx into the approach being taken in FEAT: Jupyter-cache integration #55 for execution and caching. This is because you have to make a full (sphinx) parse of the document to identify the code cells, which precludes having the execution before any parsing takes place. In this approach, jupytext effectively acts as a 'fast parse', that doesn't need to use the full sphinx machinery, just identify the boundaries (and metadata) between cells.

  3. It would be difficult to see how jupyter-sphinx could utilise the required front-matter metadata, output by jupytext, to extract the kernel information (currently it uses a seperate directive to specify the kernel, which isn't really an option for a jupytext conversion)

  4. Sphinx directives cannot take arbitrary metadata (as may be output by jupytext), you have to define all options up front. For example, this would mean that users could only add specific metadata to their code cell, otherwise it would break the directive. Or you have to make the jupytext converter output only select metadata, in a specific format, which would 'break' true round-trip conversion.

@chrisjsewell
Copy link
Member Author

One snag here will be same line number / reporting challenge in #71 - in this case, we do have a "natural" line number to report, even though we're working with a notebook

Using

nb = jupytext.myst.matches_mystnb(text, store_line_numbers=True)

Now stores the line numbers in cell metadata, under _source_lines.
This can be parsed into: https://github.com/ExecutableBookProject/MyST-NB/blob/ab4ba1d0964a7fe0a6cd516143ccc0a472b63570/myst_nb/parser.py#L71
as

start_line=nb_cell["metadata"].get("_source_lines", [0])[0]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants