Problem
The inline text rule uses a hardcoded set of terminator characters that stop text consumption. Plugins cannot extend this set, so an inline rule that needs to trigger on a non-terminator character (e.g. `w` for GFM autolink `www.` scanning) is never tried at that position and must fall back on workarounds such as core-rule post-processing.
This diverges from the Rust markdown-it architecture where each InlineRule declares a MARKER char that automatically becomes a terminator.
Proposed change
Move the terminator character set from a module-level constant in rules_inline/text.py onto the ParserInline instance, and expose a method to extend it.
parser_inline.py:
    class ParserInline:
        def __init__(self) -> None:
            self.ruler = Ruler[RuleFuncInlineType]()
            ...
            self._terminator_chars: set[str] = set(_DEFAULT_TERMINATORS)
            self._terminator_re: re.Pattern[str] | None = None

        def add_terminator_char(self, ch: str) -> None:
            """Register a character that stops the text rule, allowing inline rules to fire."""
            self._terminator_chars.add(ch)
            self._terminator_re = None  # invalidate cached regex

        @property
        def terminator_re(self) -> re.Pattern[str]:
            if self._terminator_re is None:
                self._terminator_re = re.compile(
                    "[" + re.escape("".join(sorted(self._terminator_chars))) + "]"
                )
            return self._terminator_re
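The caching behaviour can be tried in isolation. The following is a minimal standalone sketch, not the actual patch: `TerminatorSet` is a hypothetical stand-in for the state proposed for `ParserInline`, and `_DEFAULT_TERMINATORS` here is a small illustrative set, not the library's real list.

```python
import re

# Assumption: an illustrative default set, NOT the real list from text.py.
_DEFAULT_TERMINATORS = frozenset("\n*_`[]\\<&")


class TerminatorSet:
    """Hypothetical stand-in for the terminator state proposed for ParserInline."""

    def __init__(self) -> None:
        self._chars = set(_DEFAULT_TERMINATORS)
        self._regex = None  # compiled lazily on first access

    def add(self, ch: str) -> None:
        """Register one more terminator and drop the cached regex."""
        if len(ch) != 1:
            raise ValueError("a terminator must be a single character")
        self._chars.add(ch)
        self._regex = None

    @property
    def regex(self):
        """Character-class regex matching any registered terminator."""
        if self._regex is None:
            # re.escape keeps characters such as "]" and "\\" literal inside [...].
            self._regex = re.compile(
                "[" + re.escape("".join(sorted(self._chars))) + "]"
            )
        return self._regex


terminators = TerminatorSet()
cached = terminators.regex
print(terminators.regex is cached)                      # same object until the set changes
print(terminators.regex.search("see www.example.com"))  # no terminator in this input
terminators.add("w")
print(terminators.regex.search("see www.example.com").start())  # index of the first "w"
```

Sorting the characters before joining only makes the compiled pattern deterministic; it does not change what the pattern matches.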
text.py:
Change the text rule to use state.md.inline.terminator_re instead of the module-level cached regex.
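As a rough illustration of that change, here is a self-contained sketch of a text rule that receives the regex instead of reading a module-level one. `FakeState` is a hypothetical, stripped-down stand-in for `StateInline`, and `text_rule` takes the pattern as a parameter only so the example runs without the proposed `state.md.inline.terminator_re` attribute.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeState:
    """Hypothetical, stripped-down stand-in for StateInline."""

    src: str
    pos: int = 0
    pending: str = ""

    @property
    def pos_max(self) -> int:
        return len(self.src)


def text_rule(state, terminator_re, silent=False):
    """Consume plain text up to the next terminator character."""
    match = terminator_re.search(state.src, state.pos)
    end = match.start() if match else state.pos_max
    if end == state.pos:
        # Already sitting on a terminator: let another inline rule try here.
        return False
    if not silent:
        state.pending += state.src[state.pos:end]
    state.pos = end
    return True


state = FakeState("go to www.example.com")
text_rule(state, re.compile("[w*]"))
print(state.pos, repr(state.pending))
```

With `w` in the terminator set, the rule stops right before `www.`, so an inline rule registered for that position gets a chance to run.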
Scope
- ~15 lines changed in parser_inline.py
- ~5 lines changed in rules_inline/text.py
- Fully backward-compatible (default set is unchanged, existing plugins unaffected)
- No performance regression for the common case (regex is cached until a new char is added)
Motivation
Enables the GFM autolink www. scanner to be implemented as a proper inline rule:
    def gfm_autolink_plugin(md: MarkdownIt) -> None:
        md.inline.add_terminator_char("w")
        md.inline.ruler.push("gfm_autolink_www", _www_rule)
        ...
This matches how cmark-gfm (GitHub's parser) and the Rust markdown-it handle it: all three autolink scanners (www, protocol, email) are inline rules with the markers `w`, `:`, and `@`.
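For a sense of what such a rule would do once the text rule stops at `w`, here is a deliberately simplified sketch of the scanning step only. `scan_www` is a hypothetical helper, not the plugin's `_www_rule`: it checks the preceding character and trims trailing punctuation along the lines of the GFM spec, but it skips the spec's domain validation and parenthesis balancing, and it creates no tokens.

```python
import re

# Delimiters that may directly precede an extended autolink (besides whitespace).
_ALLOWED_BEFORE = set("*_~(")
# Simplification: "www." followed by anything up to whitespace or "<".
_WWW_RE = re.compile(r"www\.[^\s<]+")
# Trailing punctuation that is not treated as part of the link.
_TRAILING_PUNCT = "?!.,:*_~"


def scan_www(src, pos):
    """Return the end index of a www autolink starting at pos, or None."""
    if pos > 0:
        prev = src[pos - 1]
        if not (prev.isspace() or prev in _ALLOWED_BEFORE):
            return None  # e.g. the "www" inside "awww.example.com"
    match = _WWW_RE.match(src, pos)
    if match is None:
        return None
    link = match.group().rstrip(_TRAILING_PUNCT)
    if len(link) <= len("www."):
        return None  # nothing usable left after the prefix
    return pos + len(link)


src = "see www.example.com."
end = scan_www(src, 4)
print(src[4:end])
```

A real inline rule would call a scanner like this at `state.pos`, emit the link tokens, and advance `state.pos` to the returned index.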