Recognize non-ASCII punctuation chars #54
The punctuation_chars.h header file is auto-generated from gen_punctuation_chars.py
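For readers unfamiliar with this kind of generator, here is a minimal sketch of how punctuation code points could be collected before being written into a header. It is not the actual contents of gen_punctuation_chars.py: the function name `punctuation_by_category`, the chosen set of Unicode general categories, and the `max_codepoint` limit are all assumptions made for illustration.

```python
import unicodedata

# Assumption: these general categories are the ones treated as punctuation.
# The real generator script may use a different set or a different source.
PUNCT_CATEGORIES = {"Ps", "Pe", "Pi", "Pf", "Pd", "Po"}


def punctuation_by_category(max_codepoint=0xFFFF):
    """Group code points up to max_codepoint by Unicode punctuation category."""
    groups = {cat: [] for cat in sorted(PUNCT_CATEGORIES)}
    for cp in range(max_codepoint + 1):
        cat = unicodedata.category(chr(cp))
        if cat in groups:
            groups[cat].append(cp)
    return groups


groups = punctuation_by_category()
# U+201F (the character shown in the snippet in this thread) is an
# initial-quote punctuation character, so it lands in the "Pi" group.
print(hex(groups["Pi"][0]), len(groups["Pi"]))
```

Each group could then be rendered as one C array in the generated header.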
@stsewd Can you please review it?
@SilverRainZ thank you for opening this PR! I'll try to take a look at it this weekend or the next one (sorry, busy weeks). I just noticed that the Windows CI is failing with this change.
The Windows CI failed with a weird error message:
I have checked L107 and found nothing special there; I have no idea what is wrong for now.
// ...
L'\u201f',
};
const int32_t start_chars_range[][2] = {}; // <-- L107
const int32_t delim_chars[] = {
// ...
It seems that we should update the WASM binary after any change, but I think that should be done by the maintainer. B.T.W, the
Thanks for this great contribution! I made some small edits and fixed the Windows build by not generating empty arrays (since the array was empty, trying to access its first element was probably pointing to invalid memory/values, and Windows didn't like that).
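A minimal sketch of what "not generating empty arrays" could look like on the generator side. The helper `emit_c_array` and its parameters are hypothetical and are not taken from gen_punctuation_chars.py; the idea is only that an array with no items is skipped instead of being written out as `= {}`. Note that an array with no elements is a compiler extension rather than standard C, which may be part of why only one platform's build failed.

```python
def emit_c_array(name, ctype, items, dims="[]"):
    """Return C source for a const array, or '' when there is nothing to emit.

    Hypothetical helper for illustration; `items` are already-formatted
    C initializer strings such as "L'\\u201f'" or "{0x2018, 0x2019}".
    """
    if not items:
        # Skip the definition entirely: `T name[] = {};` declares an array
        # with no elements, which is an extension and not standard C.
        return ""
    body = "".join(f"    {item},\n" for item in items)
    return f"const {ctype} {name}{dims} = {{\n{body}}};\n"


# An empty range table produces no output at all.
print(repr(emit_c_array("start_chars_range", "int32_t", [], dims="[][2]")))

# A non-empty table is emitted as usual.
print(emit_c_array("delim_chars", "int32_t", ["L'\\u201f'"]))
```

With this approach, any C code that reads a skipped array also has to be guarded or left out, for example by emitting a length constant alongside each array and checking it before use.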
The punctuation_chars.h header file is auto-generated from gen_punctuation_chars.py.
I also added a test case, "Unicode Punctuation Chars":
before:
after:
Any comments are welcome.
Close #53.