Exceptions for number words #1
Comments
@bittlingmayer Thanks for your feedback!
@jikyo Thanks for your great work! In addition to this issue:

    >>> import suji
    >>> suji.kansuji("40分")
    '零'

As in the code above, all numbers followed by "分" are replaced with "零".
@ToraRuka Thank you so much!
Thanks @jikyo, I appreciate that. Normally in the spirit of open source, I would send a PR. But I don't know Japanese at all, so I'm not very efficient at thinking about what other exceptions, or exceptions to the exceptions, exist.
Actual:
Expected:
`四半期` means quarter, but `四` is getting interpreted as `4`, which is only etymologically correct.

Similar for `2月` / `二月` / `2月` etc.

I'd probably create a list of these exceptions, and mask them out (by replacing with `' '`) before searching for actual numbers.

(Most parsers (including spaCy) have something similar for rules-based sentence segmentation - a list of abbreviations so they know when not to split on `.`.)