Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move to markdown-it-py parser #123

Merged
merged 33 commits into from Apr 1, 2020
Merged

Move to markdown-it-py parser #123

merged 33 commits into from Apr 1, 2020

Conversation

chrisjsewell
Copy link
Member

@chrisjsewell chrisjsewell commented Mar 27, 2020

This PR implements the move from mistletoe to markdown-it-py, as the underlying markdown parser. The reason for this is are discussed in executablebooks/meta#44 (comment) and executablebooks/meta#47.

Basic Implementation

The basic structure of the DocutilsRenderer, SphinxRenderer, and SphinxParser remain very similar. But all the syntax specific elements have actually been removed, as they are included directly in https://github.com/ExecutableBookProject/markdown-it-py/tree/master/markdown_it/extensions (at least for now).

In https://github.com/ExecutableBookProject/MyST-Parser/blob/22eb21809ce1eeddeecc52dfc392125e3e2230d3/myst_parser/main.py#L18
You can see how the parsing class instance is constructed; starting with a base commonmark compliant implementation, then iteratively applying plugins with the extended syntax features (its just soo damn "clean"!). I have also added some control of this within the actual sphinx extension, e.g. see https://github.com/ExecutableBookProject/MyST-Parser/tree/markdown-it/tests/test_sphinx/sourcedirs/conf_values

conf.py

extensions = ["myst_parser"]
myst_config = {"disable_syntax": ["emphasis"], "math_delimiters": "brackets"}

means that this

*disabled*

\[a=1\]

goes to:

<document source="index.md">
    <title>
        Test
    <paragraph>
        *disabled*
    <math_block nowrap="False" number="True" xml:space="preserve">
        a=1

With this control, you can relatively easily port more of the existing markdown-it plugins, or any new ones, and allow them to be optionally enabled (assuming that a complimentary docutils render method could be derived)

Testing

I have also "cleaned" up the testing, to use "text based" test fixture files (as also used in markdown-it), that follow the structure:

Test Description
.
Input Markdown text
.
Expected doctree (pseudo XML)
.

If you look in https://github.com/ExecutableBookProject/MyST-Parser/tree/markdown-it/tests/test_renderers/fixtures, you'll find these files, which give a really good self-documentation of how all the different syntax elements / roles / directives are rendered.

Benchmarking

As with https://github.com/ExecutableBookProject/markdown-it-py#benchmarking

$ myst-benchmark -n 50
Test document: spec.md
Test iterations: 50
Running 6 test(s) ...
=====================
python-markdown:extra   (3.2.1): 66.89 s
commonmark.py           (0.9.1): 35.61 s
mistletoe               (0.10.0): 16.92 s
[mistune                (0.8.4): 5.52 s]*
markdown-it-py          (0.2.3): 15.38 s
myst-parser:sphinx**    (0.8.0): 23.13 s

* not commonmark compliant
** rendering to docutils AST, rather than HTML string

Documentation

I have not made any updates to the documentation thus far, but did make the changes for the pydata-sphinx-theme, so the documentation builds: https://circleci.com/gh/ExecutableBookProject/MyST-Parser/307#artifacts/containers/0. Errors should just be due to autodoc/intersphinx looking for obsolete API elements

Also testing has been moved from travis to GitHub actions, @choldgraf can we/shall we also move the document build to here, rather than circle-ci (from a quick google, something like https://github.com/ammaraskar/sphinx-action)

Additional Features

Some additional syntax features this has also allowed me to initially add:

  1. Block equations can have a preceding label (from tests/test_renderers/fixtures/syntax_elements.md::268):
Math Block With Equation Label:
.
$$foo$$ (abc)
.
<document source="notset">
    <target ids="equation-abc">
    <math_block docname="mock_docname" label="abc" nowrap="False" number="1" xml:space="preserve">
        foo
.
  1. Footnotes can be multi-line (from tests/test_renderers/fixtures/syntax_elements.md::471):
Footnotes nested blocks:
.
[^a]

[^a]: footnote*text*

    abc
xyz

    > a

    - b

finish
.
<document source="notset">
    <paragraph>
        <footnote_reference auto="1" ids="id1" refname="a">
    <paragraph>
        finish
    <transition>
    <footnote auto="1" ids="a" names="a">
        <paragraph>
            footnote
            <emphasis>
                text
        <paragraph>
            abc

            xyz
        <block_quote>
            <paragraph>
                a
        <bullet_list>
            <list_item>
                <paragraph>
                    b
  1. Aligned tables columns are given classes (note, this would require the complimentary CSS to have an effect in the web pages):
| a   | b |
|-----|--:|
| *a* | 2 |
    <table class="colwidths-auto docutils align-default">
     <thead>
      <tr class="row-odd">
       <th class="head">
        a
       </th>
       <th class="text-align:right head">
        b
       </th>
      </tr>
     </thead>
     <tbody>
      <tr class="row-even">
       <td>
        <em>
         a
        </em>
       </td>
       <td class="text-align:right">
        2
       </td>
      </tr>
     </tbody>
    </table>

CC'in @choldgraf @mmcky @rossbar @jstac @mtiley @najuzilu

env:
COVERALLS_REPO_TOKEN: ${{ secrets.COVERALLS_KEY }}

publish:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is a publish vs. a build triggered? this seems like a useful way to automate releases, just not sure how it works :-)


Docutils
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from a dogfooding perspective I feel like these should be written in markdown :-) (not blocking this PR though)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why its not is because, autodoc has to read the python docstrings as rST. I'm sure you could get it to read them as markdown (and rewrite all you doctrings as such), but I doubt its worth the hassle!

from myst_parser.sphinx_renderer import mock_sphinx_env

with mock_sphinx_env(conf=conf, document=md.options["document"]):
return md.render(text, env)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how simple this is 👍

@choldgraf
Copy link
Member

This looks great - markdown-it-py is pretty readable!

Comment on lines +48 to +49
needs: build
if: github.event_name == 'push' && startsWith(github.event.ref, 'refs/tags')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@choldgraf, you can give conditionals to each job. In this case the 'publish' job is triggered conditional on the 'build' job passing, and if the full process was triggered by creating a release tag on the repo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shit that's cool! I'm really liking github-actions (I know I know, you're not allowed to make fun of me every time I praise a microsoft product)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆
yeh well in terms of document building, its certainly something we could consider providing our own workflow/action for; something like https://github.com/ammaraskar/sphinx-action (using a Docker environment), but also making use of caching and artefacts: https://help.github.com/en/actions/configuring-and-managing-workflows/caching-and-storing-workflow-data

Copy link
Member

@choldgraf choldgraf Mar 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I am one step ahead of you on that one ;-) even have a friend that works on Microsoft Azure helping out!

https://github.com/ExecutableBookProject/cli/issues/44

@rossbar
Copy link
Contributor

rossbar commented Mar 27, 2020

Thanks for the heads up, this does indeed (at first glance) seem to be a nice change - the details you summarized in Basic Details and Testing make things seem easier to understand. I plan to take a more detailed look perhaps over the weekend (unless things are changing so rapidly that you think it would be best to hold off).

Love the new features too :)

@chrisjsewell chrisjsewell marked this pull request as ready for review March 28, 2020 19:55
@choldgraf
Copy link
Member

choldgraf commented Mar 30, 2020

A couple quick thoughts:

  1. This looks great
  2. We should document the "labeling math" feature you note above, that is cool
  3. Since we were discussing having myst-nb parse markdown files, instead of myst-parser (for jupyter book anyway), will we need to add these configuration values to myst-nb as well, or can we just "activate" the myst-parser extension and then make sure that myst-nb is the one doing the markdown parsing?
  4. Yeah I'm +1 on moving stuff to github actions...I am less-familiar with it but willing to learn if it can simplify our build processes

@chrisjsewell
Copy link
Member Author

2. We should document the "labeling math" feature you note above, that is cool

Yeh that's already there https://329-240151150-gh.circle-artifacts.com/0/html/using/syntax.html#math-shortcuts

3. Since we were discussing having `myst-nb` parse markdown files, _instead_ of myst-parser (for jupyter book anyway), will we need to add these configuration values to myst-nb as well, or can we just "activate" the myst-parser extension and then make sure that myst-nb is the one doing the markdown parsing?

Will probably have to add them to both (may be able to sub-class, bu they will still need to added to the setup function)

@chrisjsewell
Copy link
Member Author

This should be ready for merge. @mmcky @najuzilu may wish to give it a go in ExecutableBookProject/myst-parser.example-project first though? (pip install myst-parser[sphinx]==0.8.0a3)

@chrisjsewell
Copy link
Member Author

There is also now a complimentary pre-release for executablebooks/MyST-NB#107: pip install myst-nb==0.6.0a1. So I don't know if folks also want to give that a go first as well, to check for any unexpected issues.
Then both can be merged at the same time

@rossbar
Copy link
Contributor

rossbar commented Mar 31, 2020

I just built MyST-flavored Elegant Scipy with v0.8.0a3 and the only obvious difference I noticed was that multiline footnotes work now 🎉 . FWIW I didn't see any new complaints from sphinx during the build process.

@chrisjsewell
Copy link
Member Author

Thanks @rossbar, yep if others also want to let me know when they are happy for this to happen.
(Also notifying @phaustin)

@choldgraf
Copy link
Member

amazing! I'd be a fan of just merging this away and spot-checking bugs as they come up. In my mind the myst-parser is in a perpetual "pre-release" state until we start advertising it as "ready for users". @chrisjsewell are you +1 to merge?

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Apr 1, 2020

Ok with executablebooks/myst-parser.example-project#16 now closed I'm gonna go ahead and merge. Note the only niggling issue is this executablebooks/markdown-it-py#8,
but I think you'll have to be really abusing the documentation to encounter these!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants