ja: missing kanji reading #366
Japanese support currently covers only the Hiragana and Katakana scripts. I am thinking of deriving the Kanji pronunciations from Unicode's Unihan database (https://www.unicode.org/reports/tr38/#N1019C), as is done for the Mandarin and Cantonese pronunciations. I would also like to split the Mandarin and Cantonese list files into entries derived from the Unihan database and entries that are not, so that the derived ones can be regenerated from the latest Unicode release (picking up readings for newly added characters). There are two tricky parts to handling Kanji:
1. The first is fairly straightforward: I want to map each Kanji to its pronunciation in Hiragana, which can then be handled by the Hiragana pronunciation rules. The problem here is that espeak does not easily support mapping to an intermediate transliterated form. It has a workaround for Chinese, but this is not general enough to support Japanese.
2. The second is harder … One possibility would be to have a … The problem then is creating the disambiguation rules. One possibility here is to do something like the stress placement and part-of-speech rules, but saying "this kanji, in this context, has this reading".
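As a rough sketch of the Unihan idea, the snippet below pulls the Japanese reading fields out of lines in the Unihan tab-separated format (`U+XXXX`, field name, value). It assumes the `kJapaneseKun` and `kJapaneseOn` fields from `Unihan_Readings.txt`; the sample lines are illustrative rather than copied from the database, and `parse_unihan` is a hypothetical helper, not part of espeak. As far as I know these fields are romanized, so a romaji-to-hiragana step would still be needed before the Hiragana rules could take over.

```python
from collections import defaultdict

# Unihan fields assumed to hold Japanese readings (space-separated values).
JA_FIELDS = {"kJapaneseKun", "kJapaneseOn"}

# Illustrative lines in the Unihan tab-separated format (not copied from the database).
SAMPLE = (
    "# comment lines start with '#'\n"
    "U+4EBA\tkJapaneseKun\tHITO\n"
    "U+4EBA\tkJapaneseOn\tJIN NIN\n"
    "U+4EBA\tkMandarin\trén\n"
)


def parse_unihan(lines):
    """Map each character to {field: [readings]}, keeping only the Japanese fields."""
    readings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if field in JA_FIELDS:
            char = chr(int(codepoint[2:], 16))  # "U+4EBA" -> "人"
            readings[char][field] = value.split()
    return dict(readings)


print(parse_unihan(SAMPLE.splitlines()))
```

Filtering on the field name keeps the same parser usable for the Mandarin and Cantonese fields if the list files are split as proposed.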
The pronunciation of Japanese kanji sometimes depends on the hiragana that follow them. For example, 飛 in 飛び, 飛ぶ and 飛ぼう is read "to" (followed by b + vowel), but in 飛行機 it is read "hi".
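One cheap heuristic for this kind of context dependence (an assumption, not something espeak does today) is: use the kun'yomi when a kanji is followed by hiragana (okurigana) or stands alone, and the on'yomi when it is part of a run of kanji. The toy `READINGS` table and the `to_hiragana` function below are hypothetical; the rule has many exceptions, which is why explicit disambiguation rules would still be needed.

```python
# Hypothetical toy table: kanji -> (on'yomi, kun'yomi stem), both in hiragana.
READINGS = {
    "飛": ("ひ", "と"),
    "行": ("こう", "い"),
    "機": ("き", "はた"),
    "人": ("じん", "ひと"),
}


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs basic block only; an empty string is never kanji.
    return "\u4e00" <= ch <= "\u9fff"


def is_hiragana(ch: str) -> bool:
    return "\u3041" <= ch <= "\u309f"


def to_hiragana(text: str) -> str:
    """Replace known kanji with a reading chosen from their neighbours."""
    out = []
    for i, ch in enumerate(text):
        if ch not in READINGS:
            out.append(ch)  # unknown characters pass through unchanged
            continue
        on, kun = READINGS[ch]
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if is_hiragana(next_ch):
            out.append(kun)  # okurigana follows: 飛ぶ -> とぶ
        elif is_kanji(prev_ch) or is_kanji(next_ch):
            out.append(on)  # inside a kanji compound: 飛行機 -> ひこうき
        else:
            out.append(kun)  # isolated kanji: 人 -> ひと
    return "".join(out)


for word in ["飛ぶ", "飛ぼう", "飛行機", "人"]:
    print(word, "->", to_hiragana(word))
```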
As mentioned in #498, Japanese does not currently support readings for English words mixed into Japanese text (they are read letter by letter). It is fairly common in Japanese, however, to have English vocabulary sprinkled throughout; see, for example, the Yomiuri Giants' website, where many section titles are in English. I have been working on a dictionary of English words written as they would be written and pronounced in Japanese. Sample:
Although there are some errors (…)
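A dictionary like that could be applied as a pre-pass that swaps known English words for their katakana form and leaves unknown words alone for the existing letter-by-letter fallback. The `LOANWORDS` entries and the `katakanize` function below are hypothetical examples, not the dictionary described above.

```python
import re

# Hypothetical English -> katakana entries (not the dictionary from this thread).
LOANWORDS = {
    "giants": "ジャイアンツ",
    "news": "ニュース",
}

# A run of ASCII letters is treated as one English word.
ENGLISH_WORD = re.compile(r"[A-Za-z]+")


def katakanize(text: str) -> str:
    """Replace known English words with katakana; keep unknown words as-is."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        return LOANWORDS.get(word.lower(), word)

    return ENGLISH_WORD.sub(replace, text)


print(katakanize("読売Giants News"))
```

Lowercasing before the lookup lets section titles such as "NEWS" and "News" share one entry.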
I think that, when Kanji pronunciation is implemented, regex-based pattern matching should be supported so that kanji with okurigana can be transliterated into hiragana and then spoken. For example, in Perl-like pseudocode, `s/行([かきくけこっ])/ゆ$1/` turns 行きます into ゆきます. It is then much easier to generate speech from the hiragana.
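The substitution approach could be prototyped with ordinary regular expressions before deciding how to express it in espeak's rule files. This Python sketch applies an ordered list of hypothetical rules: the first is the `行` rule from the comment above, and the second is a made-up rule for the `飛` example (`飛` followed by a b-row kana becomes `と`).

```python
import re

# Ordered (pattern, replacement) rules, applied one after another.
# Rule 1 is the Perl-like example from the thread: s/行([かきくけこっ])/ゆ$1/
# Rule 2 is a hypothetical rule for the 飛 example: 飛 + b-row kana -> と
RULES = [
    (re.compile(r"行([かきくけこっ])"), r"ゆ\1"),
    (re.compile(r"飛([ばびぶべぼ])"), r"と\1"),
]


def apply_rules(text: str) -> str:
    """Rewrite kanji + okurigana sequences into hiragana using RULES."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


for word in ["行きます", "飛ぶ", "飛行機"]:
    print(word, "->", apply_rules(word))
```

Kanji that no rule matches, like 飛行機 here, are left untouched, so a pass like this would need a fallback (for example a compound dictionary) behind it.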
That can be implemented in …
Hello,
For Japanese, it seems we are missing dictionaries for kanji readings:
speaks "hito" correctly, but
does not pronounce "hito".
Samuel