Recognize non-ASCII punctuation chars #54

Merged
10 commits merged on Mar 31, 2024

Contributor

@SilverRainZ SilverRainZ commented Mar 19, 2024

The punctuation_chars.h header file is auto-generated by gen_punctuation_chars.py.
I also added a test case, "Unicode Punctuation Chars":

before:

  inline_markup:
    ✗ Unicode Punctuation Chars

1 failure:

correct / expected / unexpected

  1. Unicode Punctuation Chars:

    (document
      (paragraph)
      (paragraph)
      (paragraph)
      (paragraph))
      (paragraph
        (emphasis))
      (paragraph
        (emphasis)
        (strong))
      (paragraph
        (emphasis))
      (paragraph
        (emphasis)))

after:

  inline_markup:
    ✓ Unicode Punctuation Chars

Any comments are welcome.

Close #53.
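
For readers unfamiliar with the idea, here is a minimal sketch of how such a generator could collect the non-ASCII punctuation characters. It is not the actual gen_punctuation_chars.py: the names START_CATEGORIES, END_CATEGORIES and non_ascii_chars are hypothetical, and it assumes the Unicode general categories that the reStructuredText inline markup rules refer to (Pd, Po, Pi, Pf plus Ps before a start-string, Pe after an end-string). The real script may take its data from docutils instead.

```python
import sys
import unicodedata

# Assumption: categories taken from the reStructuredText inline markup
# recognition rules; the real generator may use a different data source.
START_CATEGORIES = {"Pd", "Po", "Pi", "Pf", "Ps"}  # allowed before a start-string
END_CATEGORIES = {"Pd", "Po", "Pi", "Pf", "Pe"}    # allowed after an end-string


def non_ascii_chars(categories):
    """Return sorted non-ASCII code points whose general category is in `categories`."""
    return [
        cp
        for cp in range(0x80, sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) in categories
    ]


start_chars = non_ascii_chars(START_CATEGORIES)
end_chars = non_ascii_chars(END_CATEGORIES)
```

With these lists, a fullwidth opening parenthesis (U+FF08, category Ps) ends up only in start_chars, while an em dash (U+2014, category Pd) ends up in both.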

The punctuation_chars.h header file is auto-generated from gen_punctuation_chars.py
@SilverRainZ SilverRainZ marked this pull request as ready for review March 20, 2024 05:51
@SilverRainZ
Contributor Author

@stsewd Can you please review it?

@stsewd
Owner

stsewd commented Mar 22, 2024

@SilverRainZ thank you for opening this PR! I'll try to take a look at it this weekend or the next one (sorry, busy weeks). I just noticed that the Windows CI is failing with this change.

@SilverRainZ
Contributor Author

The Windows CI failed with a weird error message:

 scanner.c
D:\a\tree-sitter-rst\tree-sitter-rst\src\tree_sitter_rst\punctuation_chars.h(107,41): error C2059: syntax error: '}' [D:\a\tree-sitter-rst\tree-sitter-rst\build\tree_sitter_rst_binding.vcxproj]
  (compiling source file '../src/scanner.c')

I have checked L107 and there is nothing special there, so I have no idea what is wrong for now.

// ...
  L'\u201f',
};
const int32_t start_chars_range[][2] = {}; // <-- L107

const int32_t delim_chars[] = {
// ...

@SilverRainZ
Contributor Author

It seems that the WASM binary should be updated after any change, but I think that should be done by the maintainer.

By the way, npm run wasm is currently broken due to tree-sitter/tree-sitter#3202.

Owner

@stsewd stsewd left a comment


Thanks for this great contribution! I made some small edits and fixed the Windows build by not generating empty arrays (since the array was empty, trying to access its first element was probably pointing to invalid memory/values, and Windows didn't like that).
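
As a rough illustration of "not generating empty arrays", the sketch below shows one way a generator could skip a definition when it has no entries. The helper emit_array is hypothetical and is not taken from the merged commit. One possible reading of the C2059 error is that MSVC, compiling as C, rejects the empty initializer `{}` itself, since standard C before C23 requires at least one element between the braces; that is an assumption about the cause, not something confirmed in this thread.

```python
def emit_array(name, rows):
    """Return a C array definition for `rows`, or "" when there is nothing to emit.

    `rows` is either a list of code points or a list of (first, last) tuples.
    """
    if not rows:
        # Skip the definition entirely instead of emitting `= {};`.
        return ""
    if isinstance(rows[0], tuple):
        body = "".join(f"  {{0x{lo:04x}, 0x{hi:04x}}},\n" for lo, hi in rows)
        return f"const int32_t {name}[][2] = {{\n{body}}};\n"
    body = "".join(f"  0x{cp:04x},\n" for cp in rows)
    return f"const int32_t {name}[] = {{\n{body}}};\n"


header = "".join(
    [
        emit_array("start_chars", [0x201C, 0x201F]),
        emit_array("start_chars_range", []),  # empty, so nothing is emitted
        emit_array("delim_chars_range", [(0x2010, 0x2015)]),
    ]
)
```

Any scanner code that reads a skipped array would need a matching guard (for example, a generated macro or a length constant), which this sketch leaves out.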

@stsewd stsewd merged commit c6f7444 into stsewd:master Mar 31, 2024
3 checks passed
Development

Successfully merging this pull request may close these issues.

Scanner should recognize non-ASCII punctuation chars