Exceptions for number words #1

Open
bittlingmayer opened this issue Feb 15, 2021 · 4 comments

Comments

@bittlingmayer

import suji
suji.value('4四半期')

Actual:

44半期

Expected:

4四半期

四半期 means quarter, but the 四 is being interpreted as 4, which is only etymologically correct.

Similar for 2月 / 二月 / 2 etc.

I'd probably create a list of these exceptions, and mask them out (by replacing with ' ') before searching for actual numbers.

(Most parsers, including spaCy, have something similar for rule-based sentence segmentation: a list of abbreviations so they know when not to split on a period.)
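
The masking idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not suji's implementation: the exception list, the helper names `mask_exceptions` and `find_number_spans`, and the simple character-class number matcher are all hypothetical stand-ins. Each exception is replaced with the same number of spaces, so character offsets found in the masked text still line up with the original string.

```python
import re

# Hypothetical exception list; a real one would need curating by a Japanese speaker.
EXCEPTIONS = ["四半期", "二月"]

# Longest entries first, so a longer exception wins over a shorter overlapping one.
_EXCEPTION_RE = re.compile(
    "|".join(re.escape(w) for w in sorted(EXCEPTIONS, key=len, reverse=True))
)

# Stand-in for a real number parser: runs of ASCII digits or kanji numerals.
KANJI_DIGITS = "〇一二三四五六七八九十百千万億兆"
_NUMBER_RE = re.compile(f"[0-9{KANJI_DIGITS}]+")


def mask_exceptions(text, fill=" "):
    """Replace each exception word with fill characters of the same length."""
    return _EXCEPTION_RE.sub(lambda m: fill * len(m.group()), text)


def find_number_spans(text):
    """Return (start, end) spans of number-like runs outside exception words."""
    masked = mask_exceptions(text)
    return [m.span() for m in _NUMBER_RE.finditer(masked)]


# Only the leading "4" is reported; the 四 inside 四半期 is masked out.
print(find_number_spans("4四半期"))
```

Because the spans refer to positions in the original text, a converter could rewrite only those spans and leave words such as 四半期 untouched.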

@jikyo
Owner

jikyo commented Feb 15, 2021

@bittlingmayer Thanks for your feedback!
I'll add a list of such exceptions that should not be parsed ASAP.

@ToraRuka

@jikyo Thanks for your great work!

In addition to this issue,

>>> import suji
>>> suji.kansuji("40分")
'零'

As in the code above, every number followed by "分" is replaced with "零".
I think this is a bug!

@jikyo
Owner

jikyo commented Feb 17, 2021

@ToraRuka Thank you so much!
I screwed up...
The decimal-number function is also being applied to time expressions.
I'll also fix this bug ASAP.

@bittlingmayer
Author

Thanks @jikyo, I appreciate that.

Normally in the spirit of open source, I would send a PR. But I don't know Japanese at all, so I'm not very efficient at thinking about what other exceptions, or exceptions to the exceptions, exist.


3 participants