Tone colors incorrect for numerals (digits) after converting from CEDict u8 format to StarDict format #328
To see how it works, take a look at the CEDict dictionary entry 21三体综合症.
Hi
If tones are wrong, you should either edit it on https://cc-cedict.org/editor/editor.php
Hi!
I get it now.
Please try again.
Thanks, it worked great.
If it doesn't require deep changes to the code: could these 'problem' items, which are now left uncolored, still be colored, just in black? That way a user could later edit these two items (美国51区 and 21三体综合症) by hand and add the correct colors. If the color is a hex value, the StarDict format won't be broken, since the number of characters stays the same. Of course, that's quite a minor thing now :)
Never mind. I think it would be easier to use the old conv.py to convert (to keep placeholders for the color tags) and the new conv.py to check which words have more tones than characters, and then fix these exceptions by hand, coloring them correctly, or just in black where no color applies.
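The check described above (which words have a tone count that doesn't match their character count) can also be done as a standalone script against the CEDict u8 file. A minimal sketch, assuming the usual CC-CEDICT line layout `Traditional Simplified [pin1 yin1] /gloss/`; the function name `tone_mismatches` and the sample lines are made up for illustration and are not copied from the real file or from conv.py:

```python
import re

# Assumed CC-CEDICT layout: "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")


def tone_mismatches(lines):
    """Yield (simplified, syllables) for entries whose number of pinyin
    syllables differs from the number of characters in the headword."""
    for line in lines:
        if line.startswith("#"):  # skip header/comment lines
            continue
        m = CEDICT_LINE.match(line)
        if not m:
            continue
        simplified = m.group(2)
        syllables = m.group(3).split()
        if len(syllables) != len(simplified):
            yield simplified, syllables


# Illustrative lines in CC-CEDICT format (hypothetical, not taken from the real file)
SAMPLE = [
    "# CC-CEDICT style header line",
    "21三體綜合症 21三体综合症 [er4 shi2 yi1 san1 ti3 zong1 he2 zheng4] /trisomy/Down's syndrome/",
    "美國51區 美国51区 [Mei3 guo2 Wu3 shi2 yi1 Qu1] /Area 51/",
    "中國 中国 [Zhong1 guo2] /China/",
]

for word, syllables in tone_mismatches(SAMPLE):
    print(word, len(word), "chars vs", len(syllables), "syllables")
```

Run over the whole u8 file, this would list the headwords that need manual coloring; digits are only one cause, so punctuation or Latin letters in headwords may show up too.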
In order to fix the color with the new code, you just need to know what each Chinese character sounds like. |
Exactly. Or just know them by heart (it's easy to remember the tones of just two phrases).
Please try again. |
Then we'll need 7 more characters inside this as a placeholder, because adding characters makes each entry longer than it originally was, which breaks the StarDict format.
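One way to read the "7 more characters": a `#rrggbb` hex color is exactly 7 characters, so if every character always gets a color, with black `#000000` as the placeholder for unknown tones, a later manual fix only swaps one 7-character value for another. A minimal sketch, assuming `<font color="...">` tags; the palette, `PLACEHOLDER`, `color_char` and `color_headword` are hypothetical and not PyGlossary's actual output or colors:

```python
# Hypothetical palette: tone number -> #rrggbb (PyGlossary's real colors may differ)
TONE_COLORS = {1: "#e30000", 2: "#02b31c", 3: "#1510f0", 4: "#8900bf", 5: "#777777"}
PLACEHOLDER = "#000000"  # black, exactly as wide (7 chars) as any other #rrggbb value


def color_char(char, tone=None):
    """Wrap one character in a font tag; unknown tones get the black placeholder."""
    color = TONE_COLORS.get(tone, PLACEHOLDER)
    return f'<font color="{color}">{char}</font>'


def color_headword(chars, tones):
    """Color each character; tones may contain None (e.g. for digits)."""
    return "".join(color_char(c, t) for c, t in zip(chars, tones))


html = color_headword("21三", [None, None, 1])
# Later, fix the '2' by hand: swap the first placeholder for the 4th-tone color.
fixed = html.replace(PLACEHOLDER, TONE_COLORS[4], 1)
print(len(html.encode("utf-8")) == len(fixed.encode("utf-8")))  # prints True
```

Without the placeholder (an uncolored character with no tag at all), adding a color later would grow the entry, which is exactly what the comment above warns against.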
StarDict format is not meant for editing. |
Yes, I know. Still, as long as an edit doesn't change the size of a dictionary entry in characters, it can even be done directly in the file with no side effects. For example, I mass-substitute colors in hex format, or convert Traditional to Simplified and vice versa, etc. It all works fine. Thanks to you, I now have a full solution to this issue, with only minimal manual adjustment still needed (besides, there's no need to update CEDict daily). I'm not sure it matters enough to other users (2 incorrect entries and ~10 exotic hanzi abbreviations aren't really an issue) to spend effort on a general workaround, whose outcome is uncertain given all the language and software quirks and limitations involved. The current version of PyGlossary produces output without the previous incorrect coloring, and that's OK.
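Why same-length edits are safe: a StarDict `.idx` file stores, for each headword, the offset and size of its definition inside `.dict`, so any edit that changes byte length invalidates every later offset. A minimal sketch, assuming an uncompressed `.dict` (not `.dict.dz`) and 32-bit offsets; `build`, `read_idx` and `patch_same_length` are hypothetical helpers working on in-memory bytes, not part of PyGlossary:

```python
import struct


def build(entries):
    """Build in-memory (.idx, .dict) blobs from (word, definition) pairs.
    Illustrative only: a real .idx must also be sorted, and a .ifo is needed."""
    idx, data = bytearray(), bytearray()
    for word, definition in entries:
        body = definition.encode("utf-8")
        # .idx record: NUL-terminated word, then 32-bit big-endian offset and size
        idx += word.encode("utf-8") + b"\0" + struct.pack(">II", len(data), len(body))
        data += body
    return bytes(idx), bytes(data)


def read_idx(idx):
    """Parse an .idx blob (32-bit offsets) into (word, offset, size) tuples."""
    entries, pos = [], 0
    while pos < len(idx):
        end = idx.index(b"\0", pos)
        word = idx[pos:end].decode("utf-8")
        offset, size = struct.unpack(">II", idx[end + 1:end + 9])
        entries.append((word, offset, size))
        pos = end + 9
    return entries


def patch_same_length(dict_data, old, new):
    """Replace every occurrence of old with new, refusing any change in length,
    so the offsets and sizes stored in .idx stay valid."""
    if len(old) != len(new):
        raise ValueError("replacement must keep the byte length unchanged")
    return dict_data.replace(old, new)


idx, data = build([
    ("21三体综合症", '<font color="#000000">2</font><font color="#000000">1</font>'),
    ("中国", '<font color="#e30000">中</font><font color="#02b31c">国</font>'),
])
# Mass substitution of one hex color for another, as described above.
patched = patch_same_length(data, b"#e30000", b"#ff0000")
for word, offset, size in read_idx(idx):
    print(word, patched[offset:offset + size].decode("utf-8"))
```

The length check is the whole point: a replacement like `b"red"` is rejected rather than silently shifting every entry after it.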
Hello,
I found a small error when converting the CEDict u8 source to StarDict:
tone colors are wrong when the headword contains numerals (digits).
For example, if the headword contains '21', it is pronounced 'ershiyi' (èr shí yī, 3 syllables), while '21' is only two characters.
PyGlossary takes the tones (fourth, second, first) from the pinyin and maps them onto '21' one per character, so '2' gets the fourth tone (and its color), '1' gets the second tone, and the leftover first tone goes to the next character.
That character then passes its own tone on to its neighbor, and so on, so all the remaining tones can end up misplaced (shifted one position to the right).
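To make the shift concrete, here is a minimal sketch of the one-character-one-syllable pairing described above. It is not PyGlossary's actual code, only the assumption it appears to make; the pinyin is written in CC-CEDICT style for illustration, and syllables without a tone digit are assumed to be neutral (5):

```python
def tone_of(syllable):
    """CC-CEDICT marks tones with a trailing digit; treat anything else as neutral (5)."""
    return int(syllable[-1]) if syllable[-1].isdigit() else 5


def naive_tone_alignment(headword, pinyin):
    """Pair characters with syllable tones strictly left to right (the buggy behavior)."""
    tones = [tone_of(s) for s in pinyin.split()]
    return list(zip(headword, tones))  # zip silently drops the extra tone at the end


pairs = naive_tone_alignment("21三体综合症", "er4 shi2 yi1 san1 ti3 zong1 he2 zheng4")
print(pairs)
# 三 happens to be right (yi1 and san1 are both tone 1), but 体 gets 1 instead of 3,
# 综 gets 3 instead of 1, and so on; the final zheng4 is never used.
```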
Possible solution: don't assign tone colors to numerals written as digits, because their pronunciation is unpredictable (in the example above, '21' could also be read as 'eryi', two syllables).
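A sketch of how the proposed solution could still color the rest of the headword, assuming the digits form a single run: characters before the run take syllables from the start of the pinyin, characters after it take them from the end, and the digits stay uncolored. This anchoring strategy and the name `align_tones` are my own illustration, not PyGlossary's implementation; with several digit runs (or too few syllables) it gives up and leaves the whole entry uncolored:

```python
import re

DIGIT_RUN = re.compile(r"[0-9]+")


def tone_of(syllable):
    """CC-CEDICT marks tones with a trailing digit; treat anything else as neutral (5)."""
    return int(syllable[-1]) if syllable[-1].isdigit() else 5


def align_tones(headword, pinyin):
    """Return (char, tone) pairs; tone is None for characters left uncolored."""
    tones = [tone_of(s) for s in pinyin.split()]
    uncolored = [(c, None) for c in headword]
    runs = list(DIGIT_RUN.finditer(headword))
    if not runs:
        # No digits: color only when characters and syllables line up one-to-one.
        return list(zip(headword, tones)) if len(tones) == len(headword) else uncolored
    if len(runs) > 1:
        return uncolored  # several digit runs: too ambiguous, give up
    start, end = runs[0].span()
    prefix, digits, suffix = headword[:start], headword[start:end], headword[end:]
    if len(prefix) + len(suffix) > len(tones):
        return uncolored
    head = list(zip(prefix, tones[:len(prefix)]))                 # anchored at the start
    tail = list(zip(suffix, tones[len(tones) - len(suffix):]))    # anchored at the end
    return head + [(d, None) for d in digits] + tail


print(align_tones("美国51区", "Mei3 guo2 Wu3 shi2 yi1 Qu1"))
```

Both problem entries from this thread (21三体综合症 and 美国51区) have exactly one digit run, so this would color every Chinese character correctly and leave only the digits uncolored.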
Cheers,
Alex