Split docutils only functionality into separate package #347

chrisjsewell · 2021-04-16T13:42:02Z

extracted from #342:

> I was surprised to see docutils 0.17 add experimental Markdown support using Recommonmark

ah I completely missed the response on the email thread.

Ah well that's a different proposition: I literally have myst_parser/docutils_renderer.py and myst_parser/sphinx_renderer.py as a sub-class, so that it is certainly possible to use myst-parser with "docutils only" functionality.

If I had known previously about such an intention to ship Markdown in docutils, I would certainly have considered splitting the "docutils only" aspects into a separate "myst-docutils" package (and have that as a dependency here), which I think would probably be a better solution than just moving dependencies to extras.
Indeed, if there was an agreement in principle with the docutils guys to include something like myst-docutils in docutils I would look into this.

cpitclaudel · 2021-07-18T17:57:21Z

> Ah well that's a different proposition: I literally have myst_parser/docutils_renderer.py and myst_parser/sphinx_renderer.py as a sub-class, so that it is certainly possible to use myst-parser with "docutils only" functionality.

I think this would be great. The thing that's missing right now for convenient integration with docutils, I think (besides separating dependencies), is a version of sphinx_parser for docutils. It would allow this kind of code to work:

```python
from docutils.core import publish_string
from myst_parser.sphinx_parser import MystParser

# Raw string so that "\pi" is passed through to the LaTeX unchanged
print(publish_string(source=r"{math}`e^{i\pi} = -1`", parser=MystParser(), writer_name='html5'))
```

(and with that integration into a variety of other docutils-based pipelines; my use case is integrating with Alectryon for Coq proofs.)

As currently written, this code crashes:

```
Traceback (most recent call last):
  File "minimyst.py", line 3, in <module>
    print(publish_string(source="{math}`a^2 + b^2 = c^2`", parser=MystParser(), writer_name='html5'))
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/core.py", line 407, in publish_string
    output, pub = publish_programmatically(
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/core.py", line 665, in publish_programmatically
    output = pub.publish(enable_exit_status=enable_exit_status)
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/core.py", line 217, in publish
    self.document = self.reader.read(self.source, self.parser,
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/readers/__init__.py", line 71, in read
    self.parse()
  File "/home/clement/.local/lib/python3.8/site-packages/docutils/readers/__init__.py", line 77, in parse
    self.parser.parse(self.input, document)
  File "/home/clement/.local/lib/python3.8/site-packages/myst_parser/sphinx_parser.py", line 55, in parse
    config = document.settings.env.myst_config
AttributeError: 'Values' object has no attribute 'env'
```

(in fact I see now that it's almost exactly the same example as in https://sourceforge.net/p/docutils/mailman/docutils-users/thread/rkv4nb%24139g%241%40ciao.gmane.io/#msg37118232)

This crash is due to parse taking an additional argument that docutils doesn't pass:

```python
def parse(self, inputstring: str, document: nodes.document, renderer: str = "sphinx") -> None:
    if renderer == "sphinx":
        # docutils calls parse() without specifying `renderer`, so this branch
        # is taken even outside Sphinx, where document.settings has no `env`
        config = document.settings.env.myst_config
    else:
        config = MdParserConfig()
```

It works fine if I use a subclass of MystParser that passes a dummy renderer string into super().parse instead of the default "sphinx":

```python
from docutils.core import publish_string
from myst_parser.sphinx_parser import MystParser

class NonSphinxMystParser(MystParser):
    def parse(self, inputstring, document) -> None:
        # Any renderer name other than "sphinx" selects the default MdParserConfig()
        return super().parse(inputstring, document, r"¯\_(ツ)_/¯")

print(publish_string(source=r"{math}`e^{i\pi} = -1`", parser=NonSphinxMystParser(), writer_name='html5').decode('utf-8'))
```

… but it's not clear whether that's the right approach (?). So separating things out in a way that makes this work would be very, very nice, especially if setting MyST options worked from the usual docutils.conf, etc.

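Instead of passing a dummy renderer string, the parser could check whether a Sphinx environment is actually attached to the document settings and fall back to a default configuration otherwise. A minimal sketch, assuming a hypothetical helper `select_config`; `SimpleNamespace` objects stand in for docutils' settings object, Sphinx's `env`, and MyST's `MdParserConfig`, so this is not the myst-parser code itself:

```python
from types import SimpleNamespace
from typing import Any, Callable


def select_config(settings: Any, default_factory: Callable[[], Any]) -> Any:
    """Return the Sphinx-provided MyST config if present, else a default one.

    Hypothetical helper: under Sphinx, ``document.settings.env`` is the build
    environment; under plain docutils the attribute does not exist at all,
    which is what raised the AttributeError in the traceback above.
    """
    env = getattr(settings, "env", None)
    if env is not None and hasattr(env, "myst_config"):
        return env.myst_config
    return default_factory()


# Illustrative stand-ins for the two situations (not real docutils/Sphinx objects).
docutils_settings = SimpleNamespace(report_level=2)  # no `env` attribute
sphinx_settings = SimpleNamespace(env=SimpleNamespace(myst_config="from-sphinx"))

print(select_config(docutils_settings, lambda: "default"))  # default
print(select_config(sphinx_settings, lambda: "default"))    # from-sphinx
```

Detecting the environment this way keeps a single `parse(inputstring, document)` signature, which is the one docutils calls.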
Also nice would be some canonical way to register MyST's directives and roles: some function that clients can call to make the appropriate calls to docutils.parsers.rst.directives.register_directive etc.

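For illustration, docutils' public registration calls (`directives.register_directive` and `roles.register_local_role`) can be wrapped behind one function. A minimal sketch: `register_extensions`, `BoxDirective`, `kbd_role`, `demo-box` and `demo-kbd` are all hypothetical names, not an existing MyST API, and the demo uses the rST parser; that MyST's docutils parser would pick up directives registered this way is an assumption.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.parsers.rst import Directive, directives, roles


class BoxDirective(Directive):
    """Example directive: wraps its (parsed) content in a container node."""

    has_content = True

    def run(self):
        node = nodes.container(classes=["box"])
        # Parse the directive body as nested markup inside the container.
        self.state.nested_parse(self.content, self.content_offset, node)
        return [node]


def kbd_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Example role: renders its text as a literal with a "kbd" class."""
    return [nodes.literal(rawtext, text, classes=["kbd"])], []


def register_extensions(directive_map, role_map):
    """Hypothetical one-call helper around docutils' global registries."""
    for name, directive_class in directive_map.items():
        directives.register_directive(name, directive_class)
    for name, role_fn in role_map.items():
        roles.register_local_role(name, role_fn)


register_extensions({"demo-box": BoxDirective}, {"demo-kbd": kbd_role})

SAMPLE = ".. demo-box::\n\n   Hello\n\nPress :demo-kbd:`Ctrl`.\n"
doctree = publish_doctree(SAMPLE)
print(doctree.pformat())
```

Note that both registries are process-global, so a helper like this only needs to be called once before publishing.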
In #342 @astrojuanlu wrote:

> It sounds like docutils could potentially use https://github.com/executablebooks/markdown-it-py for its Markdown support then?

and @choldgraf wrote

> Yep, i think that is the answer.

But am I correct to think that this requires extra work to also parse MyST's config, recreate its custom roles/directives (?) and make a Reader that's compatible with docutils?

In any case it looks like with very small changes (and independently of integrating with the official docutils package) applications like Alectryon that already depend on docutils but not necessarily sphinx could add support for MyST, which would be great!

The global approach in 395 isn't really sustainable: it requires cooperation between all projects that want to customize MathJax. Additionally, when processing a MyST document without Sphinx, the MathJax configuration changes are not performed (part of executablebooks#347). And, of course, this approach of overriding the MathJax object causes issues down the line for projects that need to customize MathJax (the setting in Sphinx isn't sufficient, see sphinx-doc/sphinx#9450).

The following two approaches would not cause these issues:

1. Add a custom script instead of touching the `mathjax3_config` variable; something like this, essentially:

```python
app.add_js_file(None, priority=0, body="""
var MathJax = window.MathJax || MathJax;
MathJax.options = MathJax.options || {};
MathJax.options.processHtmlClass = (MathJax.options.processHtmlClass || "") + "|math";
""")
```

2. Don't touch `MathJax_config` at all; instead, add an explicit `mathjax_process` class on all math nodes, either by changing `docutils_renderer` (this PR) or by adding a Docutils transform to process all math nodes:

```python
from docutils.nodes import math, math_block
from docutils.transforms import Transform


class ActivateMathJaxTransform(Transform):
    default_priority = 800

    @staticmethod
    def is_math(node):
        return isinstance(node, (math, math_block))

    def apply(self, **kwargs):
        # Append the mathjax_process class to every inline and block math node
        for node in self.document.traverse(self.is_math):
            node.attributes.setdefault("classes", []).append("mathjax_process")
```

This PR isn't ready for merging; it's just to start a discussion.

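A transform like the one in the second approach only runs if some component registers it. As a hedged sketch, a docutils parser can contribute transforms through `get_transforms()`; the subclass `MathJaxRstParser` is hypothetical and builds on the rST parser so it runs without MyST installed. The `findall`/`traverse` fallback accounts for `traverse()` being deprecated in newer Docutils releases.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.parsers.rst import Parser as RstParser
from docutils.transforms import Transform


def _is_math(node):
    return isinstance(node, (nodes.math, nodes.math_block))


class ActivateMathJaxTransform(Transform):
    """Add the ``mathjax_process`` class to every inline and block math node."""

    default_priority = 800

    def apply(self, **kwargs):
        # Prefer Node.findall() (Docutils >= 0.18.1); fall back to traverse().
        finder = getattr(self.document, "findall", None) or self.document.traverse
        for node in list(finder(_is_math)):
            node["classes"].append("mathjax_process")


class MathJaxRstParser(RstParser):
    """The rST parser plus the transform above (hypothetical subclass)."""

    def get_transforms(self):
        return super().get_transforms() + [ActivateMathJaxTransform]


doctree = publish_doctree(r"Euler: :math:`e^{i\pi} = -1`", parser=MathJaxRstParser())
print(doctree.pformat())
```

Because the class is added to the document tree itself, it works for any writer and needs no cooperation through a shared MathJax configuration object.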
gmilde · 2021-10-16T09:53:38Z

Given the end-of-life for recommonmark, is there a chance for a Docutils-only version of the MyST parser that can be utilised by Docutils?

chrisjsewell · 2021-10-16T10:02:29Z

Heya, DocutilsRenderer is already a docutils-only renderer:

`MyST-Parser/myst_parser/docutils_renderer.py`, line 72 in a28e9b7:

```python
class DocutilsRenderer(RendererProtocol):
```

@cpitclaudel has kindly made a PR to allow for controlling the configuration via docutils: #426, that I'm just trying to find time to circle round to and finalise the tests.

The only sticking point, perhaps, is myst-parser's pinned dependency on sphinx, which would create a cyclic dependency. I specifically added this pin because changes in docutils/sphinx kept breaking myst-parser for users, so I would have to think about how this could be achieved.

chrisjsewell · 2021-12-11T04:12:45Z

myst-parser v0.16.0 now introduces docutils-only functionality (https://myst-parser.readthedocs.io/en/latest/docutils.html) and a https://pypi.org/project/myst-docutils/ release pipeline, which includes no install dependencies on sphinx/docutils 😄

@gmilde, do you want me to open an issue on docutils, to move from recommonmark to myst-docutils?

gmilde · 2021-12-16T10:46:55Z

On 10.12.21, Chris Sewell wrote:

> myst-parser v0.16.0 now introduces docutils-only functionality

This is good news. Thank you. From [MyST with Docutils](https://myst-parser.readthedocs.io/en/latest/docutils.html#myst-with-docutils) I see that 5 front-end tools shall be installed as well (but in the package download I didn't find the corresponding files). IMV, separate front end tools are not necessary when we can get this working with `docutils-cli.py --parser=myst_parser.docutils_`.

> @gmilde, do you want me to open an issue on docutils, to move from recommonmark to myst-docutils?

If the above works, Docutils can add `'myst': 'myst_parser.docutils_',` to `docutils.parsers._parser_aliases` to provide a shorter parser name. In future, the default "markdown" and "commonmark" parsers may switch to "myst", but this is an API change which requires planning, advance warnings, and tests. Feel free to open an enhancement ticket or start a thread on ***@***.*** to discuss arising issues, a test strategy and possible closer integration.
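The lookup behind a `--parser` value is `docutils.parsers.get_parser_class()`, which accepts both short aliases and importable module paths. A hedged sketch: `render_html` is a hypothetical helper, and the runnable part sticks to the built-in rST parser, because a `myst` alias only resolves with a Docutils release that defines it and with myst-docutils installed.

```python
from docutils.core import publish_string
from docutils.parsers import get_parser_class

# An alias and a full module path resolve to the same Parser class.
by_alias = get_parser_class("restructuredtext")
by_path = get_parser_class("docutils.parsers.rst")
print(by_alias is by_path)


def render_html(source: str, parser_name: str = "rst") -> str:
    """Render `source` to HTML5; `parser_name` is resolved like --parser."""
    return publish_string(
        source, parser_name=parser_name, writer_name="html5"
    ).decode("utf-8")


print(render_html("*hello*"))
# With myst-docutils installed, parser_name="myst_parser.docutils_" (or a
# "myst" alias, where Docutils defines one) would select MyST instead.
```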

chrisjsewell · 2021-12-16T10:59:28Z

> I see that 5 front-end tools shall be installed as well (but in the package download I didn't find the corresponding files).

They don't need separate files, they use python entry points:

`MyST-Parser/setup.cfg`, lines 50 to 57 in 11fb239:

```ini
[options.entry_points]
console_scripts =
    myst-anchors = myst_parser.cli:print_anchors
    myst-docutils-html = myst_parser.docutils_:cli_html
    myst-docutils-html5 = myst_parser.docutils_:cli_html5
    myst-docutils-latex = myst_parser.docutils_:cli_latex
    myst-docutils-xml = myst_parser.docutils_:cli_xml
    myst-docutils-pseudoxml = myst_parser.docutils_:cli_pseudoxml
```

I feel this is the more modern approach for including CLI tools in python package distributions.
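Each `console_scripts` line is a `name = module:function` entry point; the installer (e.g. pip) generates a small launcher named after the entry point that imports the module and calls the function. A minimal sketch with the standard library's `importlib.metadata.EntryPoint` (Python 3.9+ for `.module`/`.attr`); the loading demo uses a stdlib target, `json.tool:main`, so it runs without myst-parser installed.

```python
from importlib.metadata import EntryPoint

# The same "name = module:function" syntax as in the setup.cfg excerpt above.
ep = EntryPoint(
    name="myst-docutils-html5",
    value="myst_parser.docutils_:cli_html5",
    group="console_scripts",
)
print(ep.module, ep.attr)  # myst_parser.docutils_ cli_html5

# Loading imports the module and returns the named attribute; a stdlib
# target stands in here for myst_parser.docutils_:cli_html5.
demo = EntryPoint(name="demo", value="json.tool:main", group="console_scripts")
main = demo.load()
print(callable(main))  # True
```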

> IMV, separate front end tools are not necessary

I felt this was easier for end users, and also in line with the separate rst2xxx.py front-end tools that docutils already ships with.

> Feel free to open an enhancement ticket or start a thread

Yep will do then 👍

chrisjsewell · 2021-12-16T11:02:48Z

> I feel this is the more modern approach for including CLI tools in python package distributions.

See https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/?highlight=console_scripts#scripts

> Although setup() supports a scripts keyword for pointing to pre-made scripts to install, the recommended approach to achieve cross-platform compatibility is to use console_scripts entry points

chrisjsewell · 2021-12-16T11:10:14Z

> a test strategy

Also, just to note, there are now separate test jobs for basic testing of myst-docutils against docutils 0.16, 0.17, and 0.18 (on top of the full test suite): https://github.com/executablebooks/MyST-Parser/runs/4546203130?check_suite_focus=true

gmilde · 2021-12-20T12:33:30Z

On 16.12.21, Chris Sewell wrote:

> > IMV, separate front end tools are not necessary
>
> I felt this was easier for end users, and also in line with the separate rst2xxx.py front end tools that docutils already ships with

Unfortunately, the approach of a separate front-end tool for every reader-parser-writer combination doesn't scale well. This is why Docutils 0.17 introduced the generic "docutils-cli" front-end together with Markdown support.

From the description at [MyST with Docutils](https://myst-parser--426.org.readthedocs.build/en/426/docutils.html), I suppose that `docutils-cli.py --parser=myst_parser.docutils_` should do the trick. (It should also enable the user to get output from writers not in the myst2... set, e.g., ODT, or LaTeX for processing with the LuaTeX or XeTeX engines.)

The generic front-end also has command-line support for all [Docutils configuration settings](https://docutils.sourceforge.io/docs/user/config.html). For end users, a consistent configuration framework would help a lot. If `myst_parser.docutils_.Parser.settings_spec` could provide an interface to the relevant myst configuration options, this would allow configuring reader, parser, and writer from the command line or a common config file. A user would easily get help for all available options with `docutils-cli --parser=myst_parser.docutils_ --help`.

chrisjsewell · 2021-12-20T12:46:32Z

> Unfortunately, the approach of a separate front-end tool for every reader-parser-writer combination doesn't scale well.

Oh indeed, but having to write the full parser path, etc. every time is also not ideal.
Usually, if I am making a complex CLI, I would use click (click.palletsprojects.com/).

> I suppose that `docutils-cli.py --parser=myst_parser.docutils_` should do the trick.

Yes the parser is just a standard docutils parser.
Again here, I would suggest you use entry points to load parsers, rather than module paths
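A sketch of what entry-point-based parser discovery could look like, with the module-path convention kept as a fallback. Assumptions: the entry-point group `docutils.parsers` and the helper `load_parser_class` are hypothetical (Docutils defines no such group in this thread); only `importlib.metadata.entry_points` and the `Parser`-attribute convention of Docutils parser modules are real.

```python
import importlib
import sys
from importlib.metadata import entry_points

# Hypothetical entry-point group; a parser package could register itself
# in its setup.cfg, e.g.:
#   [options.entry_points]
#   docutils.parsers =
#       myst = myst_parser.docutils_:Parser
PARSER_GROUP = "docutils.parsers"


def _registered_parsers():
    if sys.version_info >= (3, 10):
        return entry_points(group=PARSER_GROUP)
    return entry_points().get(PARSER_GROUP, [])  # dict API on Python 3.8/3.9


def load_parser_class(name: str):
    """Prefer a registered entry point; fall back to importing a module path."""
    for ep in _registered_parsers():
        if ep.name == name:
            return ep.load()
    # Fallback: the module-path convention, where the module exposes `Parser`.
    return importlib.import_module(name).Parser


rst_parser_class = load_parser_class("docutils.parsers.rst")
print(rst_parser_class)
```

With entry points, users could type a short registered name while the package owning the parser decides which class it maps to.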

> If myst_parser.docutils_.Parser.settings_spec could provide an interface to the relevant myst configuration options, this would allow configuring reader

Yep, this is already what happens: settings_spec already contains all the myst options, so when you use the CLI you get, e.g. (as shown in the drop-down at https://myst-parser.readthedocs.io/en/latest/docutils.html):

```
Usage
=====
  myst-docutils-<writer> [options] [<source> [<destination>]]

Options
=======
General Docutils Options
------------------------
--title=TITLE           Specify the document title as metadata.
--generator, -g         Include a "Generated by Docutils" credit and link.
--no-generator          Do not include a generator credit.
--date, -d              Include the date at the end of the document (UTC).
--time, -t              Include the time & date (UTC).
--no-datestamp          Do not include a datestamp of any kind.
--source-link, -s       Include a "View document source" link.
--source-url=<URL>      Use <URL> for a source link; implies --source-link.
--no-source-link        Do not include a "View document source" link.
--toc-entry-backlinks   Link from section headers to TOC entries.  (default)
--toc-top-backlinks     Link from section headers to the top of the TOC.
--no-toc-backlinks      Disable backlinks to the table of contents.
--footnote-backlinks    Link from footnotes/citations to references. (default)
--no-footnote-backlinks
                        Disable backlinks from footnotes and citations.
--section-numbering     Enable section numbering by Docutils.  (default)
--no-section-numbering  Disable section numbering by Docutils.
--strip-comments        Remove comment elements from the document tree.
--leave-comments        Leave comment elements in the document tree. (default)
--strip-elements-with-class=<class>
                        Remove all elements with classes="<class>" from the
                        document tree. Warning: potentially dangerous; use
                        with caution. (Multiple-use option.)
--strip-class=<class>   Remove all classes="<class>" attributes from elements
                        in the document tree. Warning: potentially dangerous;
                        use with caution. (Multiple-use option.)
--report=<level>, -r <level>
                        Report system messages at or higher than <level>:
                        "info" or "1", "warning"/"2" (default), "error"/"3",
                        "severe"/"4", "none"/"5"
--verbose, -v           Report all system messages.  (Same as "--report=1".)
--quiet, -q             Report no system messages.  (Same as "--report=5".)
--halt=<level>          Halt execution at system messages at or above <level>.
                        Levels as in --report.  Default: 4 (severe).
--strict                Halt at the slightest problem.  Same as "--halt=info".
--exit-status=<level>   Enable a non-zero exit status for non-halting system
                        messages at or above <level>.  Default: 5 (disabled).
--debug                 Enable debug-level system messages and diagnostics.
--no-debug              Disable debug output.  (default)
--warnings=<file>       Send the output of system messages to <file>.
--traceback             Enable Python tracebacks when Docutils is halted.
--no-traceback          Disable Python tracebacks.  (default)
--input-encoding=<name[:handler]>, -i <name[:handler]>
                        Specify the encoding and optionally the error handler
                        of input text.  Default: <locale-dependent>:strict.
--input-encoding-error-handler=INPUT_ENCODING_ERROR_HANDLER
                        Specify the error handler for undecodable characters.
                        Choices: "strict" (default), "ignore", and "replace".
--output-encoding=<name[:handler]>, -o <name[:handler]>
                        Specify the text encoding and optionally the error
                        handler for output.  Default: UTF-8:strict.
--output-encoding-error-handler=OUTPUT_ENCODING_ERROR_HANDLER
                        Specify error handler for unencodable output
                        characters; "strict" (default), "ignore", "replace",
                        "xmlcharrefreplace", "backslashreplace".
--error-encoding=<name[:handler]>, -e <name[:handler]>
                        Specify text encoding and error handler for error
                        output.  Default: UTF-8:backslashreplace.
--error-encoding-error-handler=ERROR_ENCODING_ERROR_HANDLER
                        Specify the error handler for unencodable characters
                        in error output.  Default: backslashreplace.
--language=<name>, -l <name>
                        Specify the language (as BCP 47 language tag).
                        Default: en.
--record-dependencies=<file>
                        Write output file dependencies to <file>.
--config=<file>         Read configuration settings from <file>, if it exists.
--version, -V           Show this program's version number and exit.
--help, -h              Show this help message and exit.

reStructuredText Parser Options
-------------------------------
--pep-references        Recognize and link to standalone PEP references (like
                        "PEP 258").
--pep-base-url=<URL>    Base URL for PEP references (default
                        "http://www.python.org/dev/peps/").
--pep-file-url-template=<URL>
                        Template for PEP file part of URL. (default
                        "pep-%04d")
--rfc-references        Recognize and link to standalone RFC references (like
                        "RFC 822").
--rfc-base-url=<URL>    Base URL for RFC references (default
                        "http://tools.ietf.org/html/").
--tab-width=<width>     Set number of spaces for tab expansion (default 8).
--trim-footnote-reference-space
                        Remove spaces before footnote references.
--leave-footnote-reference-space
                        Leave spaces before footnote references.
--no-file-insertion     Disable directives that insert the contents of
                        external file ("include" & "raw"); replaced with a
                        "warning" system message.
--file-insertion-enabled
                        Enable directives that insert the contents of external
                        file ("include" & "raw").  Enabled by default.
--no-raw                Disable the "raw" directives; replaced with a
                        "warning" system message.
--raw-enabled           Enable the "raw" directive.  Enabled by default.
--syntax-highlight=<format>
                        Token name set for parsing code with Pygments: one of
                        "long", "short", or "none (no parsing)". Default is
                        "long".
--smart-quotes=<yes/no/alt>
                        Change straight quotation marks to typographic form:
                        one of "yes", "no", "alt[ernative]" (default "no").
--smartquotes-locales=<language:quotes[,language:quotes,...]>
                        Characters to use as "smart quotes" for <language>.
--word-level-inline-markup
                        Inline markup recognized at word boundaries only
                        (adjacent to punctuation or whitespace). Force
                        character-level inline markup recognition with "\ "
                        (backslash + space). Default.
--character-level-inline-markup
                        Inline markup recognized anywhere, regardless of
                        surrounding characters. Backslash-escapes must be used
                        to avoid unwanted markup recognition. Useful for East
                        Asian languages. Experimental.

MyST options
------------
--myst-commonmark-only=MYST_COMMONMARK_ONLY
                        Use strict CommonMark parser (type: bool, default:
                        False)
--myst-enable-extensions=MYST_ENABLE_EXTENSIONS
                        Enable extensions (type: comma-delimited, default:
                        'dollarmath')
--myst-linkify-fuzzy-links=MYST_LINKIFY_FUZZY_LINKS
                        linkify: recognise URLs without schema prefixes (type:
                        bool, default: True)
--myst-dmath-allow-labels=MYST_DMATH_ALLOW_LABELS
                        Parse `$$...$$ (label)` (type: bool, default: True)
--myst-dmath-allow-space=MYST_DMATH_ALLOW_SPACE
                        dollarmath: allow initial/final spaces in `$ ... $`
                        (type: bool, default: True)
--myst-dmath-allow-digits=MYST_DMATH_ALLOW_DIGITS
                        dollarmath: allow initial/final digits `1$ ...$2`
                        (type: bool, default: True)
--myst-dmath-double-inline=MYST_DMATH_DOUBLE_INLINE
                        dollarmath: parse inline `$$ ... $$` (type: bool,
                        default: False)
--myst-disable-syntax=MYST_DISABLE_SYNTAX
                        Disable syntax elements (type: comma-delimited,
                        default: '')
--myst-url-schemes=MYST_URL_SCHEMES
                        URL schemes to allow in links (type: comma-delimited,
                        default: 'http,https,mailto,ftp')
--myst-footnote-transition=MYST_FOOTNOTE_TRANSITION
                        Place a transition before any footnotes (type: bool,
                        default: True)
--myst-words-per-minute=MYST_WORDS_PER_MINUTE
                        For reading speed calculations (type: int, default:
                        200)
```

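The mechanism behind this help output is the parser's `settings_spec`: each option declared there is listed by `--help`, can be set on the command line or in the parser's `docutils.conf` section (named by `config_section`), and arrives on `document.settings` under its `dest` name. A minimal sketch with a hypothetical toy parser (`ShoutParser`, option `--shout-upper`); it is not MyST's code, only the same Docutils mechanism.

```python
from docutils import nodes
from docutils.core import publish_doctree
from docutils.frontend import validate_boolean
from docutils.parsers import Parser


class ShoutParser(Parser):
    """Toy parser (hypothetical) whose one option is declared in settings_spec."""

    supported = ("shout",)
    config_section = "shout parser"  # i.e. a [shout parser] docutils.conf section
    config_section_dependencies = ("parsers",)
    # Keep the generic parser options (an empty tuple in older Docutils).
    settings_spec = Parser.settings_spec + (
        "Shout Parser Options",  # heading shown in --help output
        None,                    # optional description
        (
            (
                "Upper-case the whole document (type: bool, default: False)",
                ["--shout-upper"],
                {"action": "store_true", "default": False,
                 "validator": validate_boolean},
            ),
        ),
    )

    def parse(self, inputstring, document):
        self.setup_parse(inputstring, document)
        text = inputstring.strip()
        # The option arrives on document.settings under its "dest" name.
        if document.settings.shout_upper:
            text = text.upper()
        document += nodes.paragraph(text, text)
        self.finish_parse()


doctree = publish_doctree(
    "hello world", parser=ShoutParser(), settings_overrides={"shout_upper": True}
)
print(doctree.astext())  # HELLO WORLD
```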
gmilde · 2021-12-23T20:05:23Z

On 20.12.21, Chris Sewell wrote:

> > Unfortunately, the approach of a separate front-end tool for every reader-parser-writer combination doesn't scale well.
>
> Oh indeed, but also having to write the full parser path, etc, every time is not ideal.

This would be mitigated by a "myst" parser alias (implemented in [r8916]). Now, `docutils-cli.py --parser=myst` should work just like `myst-docutils-html5`. Different default components can be set in [docutils.conf files](https://docutils.sourceforge.io/docs/user/config.html): system-wide, user-wide and per directory. Another option would be a common `myst-docutils` front-end (with a `--writer` option) instead of a set of 5 (out of 12) front-ends. Again, docutils.conf files would allow setting the preferred writer and reader.

> > I suppose that `docutils-cli.py --parser=myst_parser.docutils_` should do the trick.
>
> Yes the parser is just a standard docutils parser.

I realized that docutils-cli.py restricted the `--parser` values to "rst", "recommonmark" and "markdown". This is fixed in [r8915].

> Again here, I would suggest you use entry points to load parsers, rather than module paths.

I have been experimenting with a `docutils.parsers.myst_wrapper` module (analogue to `docutils.parsers.recommonmark_wrapper`). Currently, the only difference to the parser name alias is a more helpful error message in case the upstream parser is missing. A wrapper module may become necessary, if the import name changes or to implement a "commonmark" parser that selects whichever of "myst" or "recommonmark" is available.
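Both wrapper ideas (a clearer error when the upstream parser is missing, and a parser that picks whichever upstream implementation is installed) can be sketched in a few lines. Assumptions: `load_upstream_parser` and `first_available_parser` are hypothetical helpers, not Docutils code, and the rST parser stands in as the last candidate so the example runs without any Markdown parser installed.

```python
import importlib


def load_upstream_parser(module_name: str, install_hint: str):
    """Import an external parser module, failing with an actionable message."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"Parser module {module_name!r} is not available; "
            f"install it with `pip install {install_hint}`."
        ) from err
    return module.Parser


def first_available_parser(candidates):
    """Return the Parser class of the first importable candidate module."""
    for module_name, install_hint in candidates:
        try:
            return load_upstream_parser(module_name, install_hint)
        except ImportError:
            continue
    raise ImportError("none of the candidate parsers is installed")


ParserClass = first_available_parser([
    ("myst_parser.docutils_", "myst-docutils"),
    ("docutils.parsers.rst", "docutils"),  # stand-in so the demo always runs
])
print(ParserClass)
```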

> > If myst_parser.docutils_.Parser.settings_spec could provide an interface to the relevant myst configuration options, [...]
>
> Yep this is already what happens, [...]

This is good news. From the documentation it was not clear to me that this means the `myst-…` options can also be set from a `docutils.conf` file, nor that the `myst-docutils-…` front-ends also feature the config settings of the relevant writers. ...

> reStructuredText Parser Options

BTW: Why do the `myst-docutils-…` front-ends include rst parser settings? Maybe some settings should move to a generic "parser options" section (and "smart-quotes" to the generic options). But settings may also be defined in several components (see, e.g. the "stylesheet…" settings in HTML and LaTeX).

chrisjsewell · 2022-01-02T13:39:27Z

> Now, docutils-cli.py --parser=myst should work just like myst-docutils-html5.

Yep, that sounds good, although I would note that I do like the "inherent" tab-completion you get in the terminal with the `myst-docutils-` commands:

```console
$ myst-docutils-<tab>
myst-docutils-html       myst-docutils-html5      myst-docutils-latex      myst-docutils-pseudoxml  myst-docutils-xml
```

makes it very easy to use.

> BTW: Why do the myst-docutils-… front-ends include rst parser settings?

myst-parser hooks into the RST directive/role parsing mechanisms; I seem to recall some of these settings were required, and it was breaking without them being present.

> Feel free to open an enhancement ticket

Anyhow, I have now opened https://sourceforge.net/p/docutils/feature-requests/86/ and created #487 for any parallel discussion here, so any further conversation can move there. Cheers

chrisjsewell added the discussion no fixed close condition label Apr 16, 2021

chrisjsewell mentioned this issue Apr 16, 2021

Proposal to sunset recommonmark in favor of MyST-Parser readthedocs/recommonmark#221

Closed

chrisjsewell mentioned this issue May 10, 2021

Support for LaTeX math environments like in Jupyter notebooks? #380

Closed

cpitclaudel mentioned this issue Jul 19, 2021

[draft] Remove option mathjax_classes #406

Closed

cpitclaudel mentioned this issue Aug 23, 2021

Build single pages with docutils (instead of Sphinx) #420

Closed

This was referenced Dec 6, 2021

Convert single md to rst using command #441

Closed

🔧 MAINTAIN: Add publishing job for myst-docutils #456

Merged

chrisjsewell closed this as completed in #456 Dec 11, 2021
