Recognize non-ASCII punctuation chars #54

SilverRainZ · 2024-03-19T13:36:04Z

The punctuation_chars.h header file is auto-generated by gen_punctuation_chars.py.
I also added a test case, "Unicode Punctuation Chars":

before:

  inline_markup:
    ✗ Unicode Punctuation Chars

1 failure:

correct / expected / unexpected

  1. Unicode Punctuation Chars:

    (document
      (paragraph)
      (paragraph)
      (paragraph)
      (paragraph))
      (paragraph
        (emphasis))
      (paragraph
        (emphasis)
        (strong))
      (paragraph
        (emphasis))
      (paragraph
        (emphasis)))

after:

  inline_markup:
    ✓ Unicode Punctuation Chars

Any comments are welcome.

Close #53.
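
For readers unfamiliar with how such a table can be produced, here is a minimal sketch. It assumes the generator derives its tables from Unicode general categories, following the reStructuredText rule that inline markup may be preceded by "similar non-ASCII punctuation" (categories Pd, Po, Pi, Pf, Ps) and followed by categories Pd, Po, Pi, Pf, Pe. The names START_CATEGORIES, END_CATEGORIES, and non_ascii_chars_in are hypothetical; the actual gen_punctuation_chars.py may use a different source, such as the tables shipped with docutils.

```python
import sys
import unicodedata

# Assumption: category sets taken from the reStructuredText inline markup
# recognition rules; the real generator script may differ.
START_CATEGORIES = {"Pd", "Po", "Pi", "Pf", "Ps"}  # may precede a start-string
END_CATEGORIES = {"Pd", "Po", "Pi", "Pf", "Pe"}    # may follow an end-string


def non_ascii_chars_in(categories):
    """Return every non-ASCII code point whose general category is in `categories`."""
    return [
        cp
        for cp in range(0x80, sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in categories
    ]


start_chars = set(non_ascii_chars_in(START_CATEGORIES))
end_chars = set(non_ascii_chars_in(END_CATEGORIES))
```

Under these assumptions, an opening bracket such as U+FF08 lands only in start_chars and its closing counterpart U+FF09 only in end_chars, while quotes and dashes appear in both sets.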

SilverRainZ · 2024-03-20T05:52:07Z

@stsewd Can you please review it?

stsewd · 2024-03-22T03:14:55Z

@SilverRainZ thank you for opening this PR! I'll try to take a look at it this weekend or the next one (sorry, busy weeks). I just noticed that the Windows CI is failing with this change.

SilverRainZ · 2024-03-24T16:03:02Z

The Windows CI failed with a weird error message:

 scanner.c
D:\a\tree-sitter-rst\tree-sitter-rst\src\tree_sitter_rst\punctuation_chars.h(107,41): error C2059: syntax error: '}' [D:\a\tree-sitter-rst\tree-sitter-rst\build\tree_sitter_rst_binding.vcxproj]
  (compiling source file '../src/scanner.c')

I checked L107 and see nothing unusual there, so I have no idea what is causing this for now.

// ...
  L'\u201f',
};
const int32_t start_chars_range[][2] = {}; // <-- L107

const int32_t delim_chars[] = {
// ...

SilverRainZ · 2024-03-31T09:17:24Z

It seems that the WASM binary should be updated after any change, but I think that should be done by the maintainer.

By the way, npm run wasm is currently broken due to tree-sitter/tree-sitter#3202.

stsewd

Thanks for this great contribution! I made some small edits and fixed the Windows build by not generating empty arrays (since the array was empty, trying to access its first element was probably pointing to invalid memory/values, and Windows didn't like that).
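
One likely reason for the C2059 error is that an empty initializer such as `= {}` is not valid ISO C before C23 (GCC and Clang accept it as an extension), so skipping empty arrays in the generator avoids the construct entirely. The sketch below illustrates that idea under stated assumptions: to_ranges and emit_range_array are hypothetical helpers, not the functions in gen_punctuation_chars.py, and the hexadecimal output format is illustrative rather than the format of the real header.

```python
from typing import Iterable, List, Tuple


def to_ranges(code_points: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse code points into sorted, inclusive (first, last) runs."""
    ranges: List[Tuple[int, int]] = []
    for cp in sorted(set(code_points)):
        if ranges and cp == ranges[-1][1] + 1:
            # Extend the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges


def emit_range_array(name: str, ranges: List[Tuple[int, int]]) -> str:
    """Return a C definition for `name`, or "" when there is nothing to emit."""
    if not ranges:
        # Skip the definition instead of writing `= {}`.
        return ""
    rows = "\n".join(f"  {{0x{lo:04X}, 0x{hi:04X}}}," for lo, hi in ranges)
    return f"const int32_t {name}[][2] = {{\n{rows}\n}};\n"
```

With this approach, the C code that consumes the header must not reference an array that was skipped, for example by having the generator also omit the corresponding lookup.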

SilverRainZ added 2 commits March 19, 2024 21:29

Recognize non-ASCII punctuation chars

cb2119b

The punctuation_chars.h header file is auto-generated from gen_punctuation_chars.py

fix: Consider closing delimiters

a537b4e

SilverRainZ force-pushed the bugfix/non-ascii-punct2 branch from 558afc8 to a537b4e March 20, 2024 05:30

Test unicode punctuation chars

e8b63b3

SilverRainZ marked this pull request as ready for review March 20, 2024 05:51

Fix comment format

1cc483d

stsewd added 6 commits March 31, 2024 17:39

Small edits

73d4459

Format with black

a0fc4b8

Regenerate punctuation_chars.h

431f2ed

Update instructions

29cb770

Run make release

d8ad503

Try to fix windows build?

569feb8

stsewd approved these changes Mar 31, 2024

stsewd merged commit c6f7444 into stsewd:master Mar 31, 2024
3 checks passed
