Allow plugins to register inline terminator characters #390

@chrisjsewell

Problem

The inline text rule uses a hardcoded set of terminator characters that stop text consumption. Plugins cannot extend this set, so the text rule consumes every other character before any inline rule gets a chance to run at that position. Inline rules that need to trigger on a non-terminator character (e.g. w for GFM autolink www. scanning) must therefore use workarounds such as core-rule post-processing.

This diverges from the Rust markdown-it architecture, where each InlineRule declares a MARKER char that automatically becomes a terminator.

Proposed change

Move the terminator character set from a module-level constant in rules_inline/text.py onto the ParserInline instance, and expose a method to extend it.

parser_inline.py:

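import re

# _DEFAULT_TERMINATORS: the existing terminator set, moved here from rules_inline/text.py

class ParserInline:
    def __init__(self) -> None:
        self.ruler = Ruler[RuleFuncInlineType]()
        ...
        # Per-instance copy, so one parser's plugins do not affect another parser
        self._terminator_chars: set[str] = set(_DEFAULT_TERMINATORS)
        self._terminator_re: re.Pattern[str] | None = None  # compiled lazily

    def add_terminator_char(self, ch: str) -> None:
        """Register a character that stops the text rule, allowing inline rules to fire."""
        self._terminator_chars.add(ch)
        self._terminator_re = None  # invalidate cached regex

    @property
    def terminator_re(self) -> re.Pattern[str]:
        """Character-class regex matching any terminator; rebuilt only after a change."""
        if self._terminator_re is None:
            # sorted() keeps the pattern deterministic; re.escape handles ], \, ^ and -
            self._terminator_re = re.compile(
                "[" + re.escape("".join(sorted(self._terminator_chars))) + "]"
            )
        return self._terminator_re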
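
To illustrate the caching and invalidation behaviour, here is a minimal standalone sketch. InlineTerminators is a hypothetical stand-in for the terminator-related part of ParserInline, and the default set below is an illustrative subset, not the real one from markdown-it-py.

```python
import re

# Illustrative subset only (assumption); the real default set is larger.
_DEFAULT_TERMINATORS = frozenset("\n*_`[]<&\\")


class InlineTerminators:
    """Hypothetical stand-in for the proposed ParserInline additions."""

    def __init__(self) -> None:
        self._terminator_chars = set(_DEFAULT_TERMINATORS)
        self._terminator_re = None

    def add_terminator_char(self, ch: str) -> None:
        self._terminator_chars.add(ch)
        self._terminator_re = None  # force a rebuild on next access

    @property
    def terminator_re(self):
        if self._terminator_re is None:
            self._terminator_re = re.compile(
                "[" + re.escape("".join(sorted(self._terminator_chars))) + "]"
            )
        return self._terminator_re


inline = InlineTerminators()
before = inline.terminator_re
same_object = inline.terminator_re is before     # cached: no recompilation
no_match = before.search("see www.example.com")  # no default terminator in this text

inline.add_terminator_char("w")
after = inline.terminator_re                     # rebuilt because the set changed
match = after.search("see www.example.com")      # now stops at the first "w"
```

The regex is compiled at most once per change to the set, so a parser that never registers extra characters pays for a single compilation.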

text.py:

Change the text rule to use state.md.inline.terminator_re instead of the module-level cached regex.
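
As a rough sketch of what that change could look like: the text rule searches for the next terminator and consumes everything before it. TextState is a hypothetical stand-in for StateInline, and the regex is passed in directly rather than read from state.md.inline.terminator_re, so the example runs without markdown-it-py installed. The character classes are illustrative, not the real default set.

```python
import re
from dataclasses import dataclass


@dataclass
class TextState:
    """Hypothetical stand-in for StateInline, with only the fields used here."""
    src: str
    terminator_re: re.Pattern  # in the proposal: state.md.inline.terminator_re
    pos: int = 0
    pending: str = ""

    @property
    def posMax(self) -> int:
        return len(self.src)


def text(state: TextState, silent: bool = False) -> bool:
    """Consume plain text up to (not including) the next terminator char."""
    m = state.terminator_re.search(state.src, state.pos)
    end = m.start() if m else state.posMax
    if end == state.pos:
        return False  # already on a terminator: let other inline rules try
    if not silent:
        state.pending += state.src[state.pos:end]
    state.pos = end
    return True


# Illustrative character classes (assumption, not the real default set).
default_re = re.compile(r"[\n*_`]")
extended_re = re.compile(r"[\n*_`w]")

s1 = TextState("see www.example.com", default_re)
text(s1)              # swallows the whole string, so no rule ever sees "www."

s2 = TextState("see www.example.com", extended_re)
text(s2)              # stops in front of the first "w"
stuck = text(s2)      # returns False at "w", handing over to other rules
```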

Scope

  • ~15 lines changed in parser_inline.py
  • ~5 lines changed in rules_inline/text.py
  • Fully backward-compatible (default set is unchanged, existing plugins unaffected)
  • No performance regression for the common case (regex is cached until a new char is added)

Motivation

Enables the GFM autolink www. scanner to be implemented as a proper inline rule:

def gfm_autolink_plugin(md: MarkdownIt) -> None:
    md.inline.add_terminator_char("w")
    md.inline.ruler.push("gfm_autolink_www", _www_rule)
    ...
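
For illustration only, a heavily simplified sketch of what a rule like _www_rule might do once the text rule stops at w. LinkState and www_rule are hypothetical names; the real GFM autolink extension also validates the domain and trims trailing punctuation, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class LinkState:
    """Hypothetical stand-in for StateInline; links replaces real token output."""
    src: str
    pos: int = 0
    links: list = field(default_factory=list)


def www_rule(state: LinkState, silent: bool = False) -> bool:
    """Match 'www.<something>' starting at state.pos (simplified, see lead-in)."""
    src, pos = state.src, state.pos
    if not src.startswith("www.", pos):
        return False
    # Simplification: only accept start of input or preceding whitespace.
    if pos > 0 and not src[pos - 1].isspace():
        return False
    end = pos
    while end < len(src) and not src[end].isspace():
        end += 1
    if end - pos <= len("www."):
        return False  # bare "www." with nothing after it
    if not silent:
        state.links.append(src[pos:end])
    state.pos = end
    return True


hit = LinkState("see www.example.com now", pos=4)
hit_ok = www_rule(hit)    # consumes "www.example.com"

miss = LinkState("snow", pos=3)
miss_ok = www_rule(miss)  # a lone "w" is not a link; position is left unchanged
```

When the rule declines, as in the second case, the inline parser is expected to fall back to emitting the character as plain text, so registering w as a terminator should not change the output for ordinary words.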

This matches how cmark-gfm (GitHub's parser) and the Rust markdown-it handle it: all three autolink scanners (www, protocol, email) are inline rules, with the markers w, :, and @ respectively.
