KeyError with specific character #68
This error also occurs in Python 3.7.
It should not raise an error when it encounters bad characters; it could ignore them silently.
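A minimal sketch of the "ignore silently" behavior, assuming the converter looks each character up in a plain dict. `convert_chars` and the lookup table are hypothetical names for illustration, not pykakasi's actual internals:

```python
from typing import Mapping


def convert_chars(text: str, lookup_table: Mapping[str, str]) -> str:
    """Transliterate text character by character (hypothetical helper).

    Characters missing from the lookup table are dropped instead of
    raising KeyError.
    """
    out = []
    for ch in text:
        # .get() returns None for unknown characters instead of raising KeyError
        reading = lookup_table.get(ch)
        if reading is not None:
            out.append(reading)
    return "".join(out)


# Hypothetical two-entry table (hiragana "a" and "i") for illustration only.
table = {"\u3042": "a", "\u3044": "i"}
print(convert_chars("\u3042\u3044\ue098", table))  # prints "ai"; U+E098 is skipped
```

Dropping unknown characters is one choice; passing them through unchanged would be an equally simple alternative.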
The character appears to be \uE098, which is in the private use area (PUA) of the Unicode standard.
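To check whether a character from the logs is a private-use code point, Python's standard `unicodedata` module reports general category `Co` for such characters. This is a diagnostic sketch, not part of pykakasi; `is_private_use` is a hypothetical helper name:

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """Return True if ch has Unicode general category 'Co' (private use)."""
    return unicodedata.category(ch) == "Co"


for ch in ("\ue098", "\u57c7"):
    # U+E098 -> Co True; U+57C7 (a CJK Unified Ideograph) -> Lo False
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_private_use(ch))
```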
That is why the character reaches j2.py:80: the method `isRegion` returns true for it, because the check

```python
def isRegion(self, c):
    return 0x3400 <= ord(c[0]) < 0xfa2e
```

also covers the private use area U+E000 to U+F8FF. It should be:

```python
def isRegion(self, c):
    return 0x3400 <= ord(c[0]) < 0xe000 or 0xf900 <= ord(c[0]) < 0xfa2e
```
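The proposed range check can be exercised on its own as a free function. The names `is_region_old` and `is_region` are illustrative stand-ins for the `isRegion` method in j2.py, before and after the change:

```python
def is_region_old(c: str) -> bool:
    """Original check: includes the private use area U+E000..U+F8FF."""
    return 0x3400 <= ord(c[0]) < 0xFA2E


def is_region(c: str) -> bool:
    """Proposed check: [U+3400, U+E000) or [U+F900, U+FA2E), skipping the PUA."""
    cp = ord(c[0])
    return 0x3400 <= cp < 0xE000 or 0xF900 <= cp < 0xFA2E


for ch in ("\ue098", "\u57c7", "\uf900"):
    print(f"U+{ord(ch):04X}", is_region_old(ch), is_region(ch))
# U+E098 True False
# U+57C7 True True
# U+F900 True True
```

Only code points inside the private use area change result; U+57C7 is still accepted by both versions.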
We ran into the same problem with the character \u57C7:
results in
Hi @miurahr,
I extracted a list of characters from our logs. Note that it will definitely not include every character for which pykakasi currently fails, but hopefully it gives you enough information to pinpoint the problem.
Fix #68 Signed-off-by: Hiroshi Miura <miurahr@linux.com>
@miurahr, thank you very much for the quick fix. I just tested the new version with the full dataset I had at hand, and it worked without any problems. I left out one piece of information from the list of failures above: the two characters … Perhaps you could add transliterations for these two characters instead of skipping them? It's not important; I just thought this might interest you. I would open a PR, but I don't speak Japanese, so I can't help with this.
A bad character causes a KeyError in pykakasi.
results
NOTE: This issue comes from a Stack Overflow (Japanese) question: "python - A KeyError occurs when replacing strings with pykakasi".