Characters 々 〇 cause Exception on conv.do() #46
Comments
OK, what should 〇 be converted to? There is a separate discussion for "々". Both symbols are located in the "CJK Symbols and Punctuation" block (range 3000–303F) of the Unicode standard.
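To make the block membership concrete, here is a small sketch using only the Python standard library. The helper name `in_cjk_symbols_block` is hypothetical and is not part of pykakasi; it only checks the code point range quoted above.

```python
import unicodedata

# Range of the "CJK Symbols and Punctuation" block, as quoted above.
BLOCK_START, BLOCK_END = 0x3000, 0x303F


def in_cjk_symbols_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+3000..U+303F."""
    return BLOCK_START <= ord(ch) <= BLOCK_END


if __name__ == "__main__":
    # Print the code point, Unicode name, and block membership of each symbol.
    for ch in "々〇":
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_cjk_symbols_block(ch))
```

A check like this could help decide which converter should be responsible for a given character, since ordinary kanji such as 三 fall outside this range.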
Definition in sym2.py:
It returns
So we can modify the definition in pykakasi/sym2.py:
Any opinions?
I found that a test case for E2a is missing, and the logic for that case is missing as well!
@10000shiro I've updated the code to fix this. Could you test again on the master branch?
With the updated symbols.py the code snippet works without an exception. The 〇 in the snippet, as part of 三〇, was meant to translate to 30, but I do not know whether a standard interpretation for this character exists. For me, the main problem was that it caused an exception.
Found another faulty symbol: ：, the fullwidth colon (\uff1a)
@10000shiro Could you please file that as a separate issue? Thanks.
@10000shiro Could you propose a dictionary update or a better way of processing this?
In fact, you can register '〇' as '0' in the translation table in pykakasi/sym2.py, but then '三〇' may be converted to 'Mi0'. You can also register '三〇' as '30' in the Kana-Kanji dictionary, and then it could be converted to '30'. You can easily inspect the dictionary using
One solution would be to preprocess the input and convert the partial strings containing numerals and "〇" accordingly, e.g. '三〇二〇' -> '三千二十', before doing the conversion. Here is a prototype implementation for this solution: One issue I can see with this is that "〇" is not only used for 0, but often also as a placeholder to omit other Kanji, which could lead to strange results. All in all, I'm not sure what the right way to handle this character is.
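The preprocessing idea above can be sketched as follows. This is not the prototype originally attached to the comment, only a minimal illustration under stated assumptions: the function names are hypothetical, nothing here is pykakasi API, runs that start with 〇 are left untouched (they are more likely placeholders or leading zeros), and runs longer than 16 digits are skipped.

```python
import re

DIGITS = "〇一二三四五六七八九"
# Runs of two or more kanji digits; only runs containing 〇 are rewritten.
POSITIONAL = re.compile(r"[〇一二三四五六七八九]{2,}")
SMALL_UNITS = ["", "十", "百", "千"]
BIG_UNITS = ["", "万", "億", "兆"]


def to_kanji_numeral(n: int) -> str:
    """Write a positive integer below 10**16 as a conventional kanji numeral."""
    groups = []
    group_index = 0
    while n > 0:
        n, chunk = divmod(n, 10000)  # process four decimal digits at a time
        part = ""
        for power in range(3, -1, -1):
            digit = (chunk // 10 ** power) % 10
            if digit == 0:
                continue
            if digit == 1 and power > 0:
                # 十, 百, 千 conventionally omit a leading 一
                part += SMALL_UNITS[power]
            else:
                part += DIGITS[digit] + SMALL_UNITS[power]
        if part:
            groups.append(part + BIG_UNITS[group_index])
        group_index += 1
    return "".join(reversed(groups))


def normalize_positional_numerals(text: str) -> str:
    """Rewrite positional runs such as 三〇二〇 as 三千二十 (assumed heuristic)."""
    def repl(match: re.Match) -> str:
        run = match.group(0)
        # Assumption: skip runs without 〇, runs starting with 〇, and very long runs.
        if "〇" not in run or run.startswith("〇") or len(run) > 16:
            return run
        value = int("".join(str(DIGITS.index(ch)) for ch in run))
        return to_kanji_numeral(value)
    return POSITIONAL.sub(repl, text)


if __name__ == "__main__":
    print(normalize_positional_numerals("三〇二〇"))
    print(normalize_positional_numerals("〇〇さん"))
```

The result would then be passed to the converter as usual. The heuristic cannot tell a placeholder 〇 in the middle of a digit run from a zero, so the ambiguity described above remains.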
@10000shiro Interesting! But I'm afraid this seems to be out of scope for KAKASI's functionality.
@miurahr Yes, I agree that this goes beyond the scope of a simple kana-kanji converter. Replacing the "〇" with (maru) and having the user do the context analysis on their own is probably for the best.
This error happens when mode "E" is set to "a" and one attempts to convert a string containing "々" or "〇".
Attached you can find a small script demonstrating the issue.
kigou_conversion_issue.txt