Hi stsewd, thanks for your awesome rst parser!
I found that this parser does not work well when parsing documentation written in CJK.
For example, :strong:`text`。 (followed by a Chinese full stop 。; the English equivalent is .) is valid inline markup (rst2pseudoxml accepts it), but tree-sitter-rst does not recognize it correctly.
According to the inline markup recognition rules:
Inline markup start-strings must start a text block or be immediately preceded by
whitespace,
one of the ASCII characters - : / ' " < ( [ {
or a similar non-ASCII punctuation character. [18]
Inline markup end-strings must end a text block or be immediately followed by
whitespace,
one of the ASCII characters - . , : ; ! ? \ / ' " ) ] } >
or a similar non-ASCII punctuation character. [19]
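The end-string rule quoted above can be sketched roughly in Python. This is only an approximation for illustration: the function name `may_follow_end_string` is hypothetical, and treating the Unicode categories Pd, Po, Pe, Pf and Pi as the "similar non-ASCII punctuation" is an assumption, not the exact set Docutils uses.

```python
import unicodedata

# ASCII characters that may follow an inline markup end-string (from the rule above).
ASCII_END_FOLLOWERS = set("-.,:;!?\\/'\")]}>")

# Assumption: approximate "similar non-ASCII punctuation" by Unicode category.
PUNCT_CATEGORIES = {"Pd", "Po", "Pe", "Pf", "Pi"}


def may_follow_end_string(ch: str) -> bool:
    """Return True if `ch` may immediately follow an inline markup end-string.

    An empty string stands for the end of the text block.
    """
    if ch == "" or ch.isspace():
        return True
    if ch in ASCII_END_FOLLOWERS:
        return True
    return ord(ch) > 127 and unicodedata.category(ch) in PUNCT_CATEGORIES


if __name__ == "__main__":
    for ch in [".", "。", "）", "文", "a"]:
        print(repr(ch), may_follow_end_string(ch))
```

Under this approximation the Chinese full stop 。 (U+3002, category Po) is accepted just like the ASCII full stop, while an ordinary CJK character such as 文 is not.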
I have made a PR (#10) for this, but it is not a good fix.
Docutils provides some regexes for matching these non-ASCII punctuation characters. As far as I currently understand, matching them in src/tree_sitter_rst/chars.c::is_{start,end}_char should fix this issue.
Just FYI, I am working on this by generating a C character array from docutils.utils.punctuation_chars and using it to replace valid_chars inside the is_{start,end}_char functions.
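A minimal sketch of what such a generator could look like, not the actual script. The helper `c_array` and the array name `non_ascii_end_chars` are hypothetical, and the sample input is a hand-picked string instead of the real Docutils data. As far as I can tell, the strings in docutils.utils.punctuation_chars are regex character-class fragments that may contain ranges, so they would need to be expanded into individual code points before being passed to a helper like this.

```python
def c_array(name, code_points):
    """Render a sorted, de-duplicated C array of Unicode code points.

    `name` is a hypothetical identifier for the generated array.
    """
    cps = sorted(set(code_points))
    body = ", ".join(f"0x{cp:04X}" for cp in cps)
    return (
        f"static const int32_t {name}[] = {{ {body} }};\n"
        f"static const unsigned {name}_len = {len(cps)};"
    )


if __name__ == "__main__":
    # Hand-picked sample (assumption): ideographic full stop, ideographic
    # comma, and fullwidth right parenthesis.
    sample = "。、）"
    print(c_array("non_ascii_end_chars", (ord(ch) for ch in sample)))
```

Keeping the generated array sorted would let the C side use a binary search instead of a linear scan, which may matter if the full punctuation set is large.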