Skip to content

A Saxon extension function that wraps the org.commonmark markdown parser

License

Notifications You must be signed in to change notification settings

ndw/scommonmark

Repository files navigation

Saxon CommonMark Processor

This is a Saxon extension function that wraps the org.commonmark Markdown parser.

In your stylesheet, declare the extension namespace:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                xmlns:ext="http://nwalsh.com/xslt"

                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="ext xs"
                version="3.0">

Then you can call the extension function:

<xsl:sequence select="ext:commonmark('This is *bold*', $options)"/>

You can use function-available() to code more defensively. The options map is a map from QName keys to values. Four keys are recognized:

escape-html
boolean, enables the escapeHtml() option on the HtmlRenderer
sanitize-urls
boolean, enables the sanitizeUrls() option on the HtmlRenderer
ext:parser
“xml”, “html”, or “none”; how to parse the result
ext:extensions
list-of-strings, adds extensions to the Markdown parser

The default value for escape-html and sanitize-urls is “false”. (Neither option seems to have any effect in the 0.21.0 version of the CommonMark parser.)

The default value for ext:parser is “xml”.

The default value for ext:extensions is the empty sequence.

You can omit the second argument entirely if you’re happy with those defaults.

Parser

The output from the Markdown parser is a string containing the derived HTML markup. You can further parse that as XML or HTML, to return an XDM node for further processing, or you can omit parsing and return the string.

XML parsing

If the parser is “xml”, the rendered HTML is parsed as XML (by fn:parse-xml). If the first attempt to parse fails, a second attempt is made with a single <div> wrapped around the html. (In other words, if the CommonMark parse produces more than one top-level element, then they’re all wrapped in a “div” to make a well-formed XML document.)

The resulting HTML will be in the XHTML namespace.

If the Markdown contains markup that’s intrinsically not well-formed, then it cannot be parsed as XML.

HTML parsing

If the parser is “html”, the rendered HTML is parsed as HTML using the nu.validator htmlparser. This always produces well-formed results, by altering the markup if necessary. The top-level element returned is always <html> in the XHTML namespace.

No parsing

If the parser is “none”, the rendered HTML is returned as a string.

Extensions

The org.commonmark parser supports a number of extensions. You can use them by listing the full class name in the extensions property. For example:

<xsl:sequence
    select="ext:commonmark($markdown,
            map{
              xs:QName('ext:extensions'):
                ('org.commonmark.ext.gfm.tables.TablesExtension',
                 'org.commonmark.ext.gfm.strikethrough.StrikethroughExtension')})"/>

You must also make sure that the corresponding class is on your classpath.

Initialization of each extension relies on the class providing a static method create() that returns an instance of Extension. That’s the way all of the standard extensions from org.commonmark work.

If you are trying to use an extension that has to be instantiated in some other way, it won’t work.

About

A Saxon extension function that wraps the org.commonmark markdown parser

Resources

License

Stars

Watchers

Forks

Packages

No packages published