
ja: missing kanji reading #366

Open
sthibaul opened this issue Jan 4, 2018 · 5 comments

Comments

@sthibaul
Collaborator

sthibaul commented Jan 4, 2018

Hello,

For Japanese, it seems we are missing dictionaries for kanji readings:

espeak-ng -v ja ひと

correctly speaks "hito", but

espeak-ng -v ja 人

does not pronounce "hito"

Samuel

@rhdunn
Member

rhdunn commented Jan 5, 2018

The Japanese language currently only supports the Hiragana and Katakana scripts.

I am thinking of deriving the pronunciations from Unicode's UniHan database (https://www.unicode.org/reports/tr38/#N1019C), as is done for the Mandarin and Cantonese pronunciations. I would also like to split the Mandarin and Cantonese list files into those derived from the UniHan database and those that are not, so that the UniHan-derived files can be updated to the latest Unicode version (and get readings for the new characters).
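A minimal sketch of the UniHan import step, assuming the tab-separated `U+XXXX<TAB>field<TAB>value` layout of `Unihan_Readings.txt` and its `kJapaneseOn` / `kJapaneseKun` fields. The function name `parse_unihan_japanese` and the sample lines are illustrative only; the values are romanized, so converting them to hiragana would be a further step not shown here.

```python
from collections import defaultdict

# Map the UniHan field names we care about to short reading-type labels.
FIELDS = {"kJapaneseOn": "on", "kJapaneseKun": "kun"}

def parse_unihan_japanese(lines):
    """Return {char: {"on": [...], "kun": [...]}} from Unihan_Readings.txt-style lines."""
    readings = defaultdict(lambda: {"on": [], "kun": []})
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if field not in FIELDS:
            continue  # ignore Mandarin, Cantonese, etc.
        char = chr(int(codepoint[2:], 16))  # "U+4EBA" -> 人
        # Several readings of the same type are space-separated.
        readings[char][FIELDS[field]].extend(value.split())
    return dict(readings)

# Illustrative sample lines in the Unihan_Readings.txt layout.
SAMPLE = [
    "# sample",
    "U+4EBA\tkJapaneseKun\tHITO",
    "U+4EBA\tkJapaneseOn\tJIN NIN",
    "U+4EBA\tkMandarin\trén",
]
```

For example, `parse_unihan_japanese(SAMPLE)["人"]` gives the on readings `JIN`, `NIN` and the kun reading `HITO`, and the Mandarin line is skipped.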

There are two tricky parts to handling Kanji:

  1. specifying the pronunciation;
  2. specifying and disambiguating on (Chinese-derived reading) and kun (Japanese-derived reading).

The first is fairly straightforward: I want to map each kanji to its pronunciation in Hiragana, which can then be handled by the Hiragana pronunciation rules. The problem here is that espeak does not easily support mapping to an intermediate transliterated form. It has a workaround for Chinese, but it is not general enough to support Japanese.
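A minimal sketch of that two-stage idea (kanji to hiragana, then the existing hiragana rules), written in Python rather than espeak-ng's rule format. Both tables are hypothetical toy data, and `hiragana_to_romaji` merely stands in for espeak-ng's real hiragana pronunciation rules.

```python
# Stage 1 (hypothetical data): kanji -> hiragana intermediate form.
KANJI_TO_HIRAGANA = {"人": "ひと", "山": "やま"}

# Stage 2 stand-in: a toy hiragana -> romaji table playing the role of
# the existing hiragana pronunciation rules.
HIRAGANA_TO_ROMAJI = {"ひ": "hi", "と": "to", "や": "ya", "ま": "ma"}

def kanji_to_hiragana(text: str) -> str:
    """Replace known kanji with hiragana; leave everything else untouched."""
    return "".join(KANJI_TO_HIRAGANA.get(ch, ch) for ch in text)

def hiragana_to_romaji(text: str) -> str:
    """Toy replacement for the hiragana rules (one kana -> one syllable)."""
    return "".join(HIRAGANA_TO_ROMAJI.get(ch, ch) for ch in text)

def speak_form(text: str) -> str:
    """Run kanji through the intermediate hiragana form, then the hiragana stage."""
    return hiragana_to_romaji(kanji_to_hiragana(text))
```

With this, `人` and `ひと` both end up as `hito`, which is the behaviour the issue asks for; only the first stage would be new.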

The second is harder. One possibility would be to have $reading=on and $reading=kun selectors (possibly applying to other readings as well). These could easily be imported from the UniHan database.

The problem then is creating the disambiguation rules. One possibility is to do something like the stress placement and part-of-speech rules, but expressing "this kanji, in this context, has this reading".
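One crude way to express "this kanji, in this context, has this reading" is a context test that picks between the on and kun readings. The sketch below assumes a hypothetical rule: a kanji next to another kanji takes its on reading, otherwise its kun reading. It is only a heuristic that misreads many common words, it is not espeak-ng's rule syntax, and the `READINGS` table is toy data.

```python
# Hypothetical toy table: one on and one kun reading per kanji, in hiragana.
READINGS = {
    "人": {"on": "じん", "kun": "ひと"},
    "口": {"on": "こう", "kun": "くち"},
}

def is_kanji(ch: str) -> bool:
    """True for characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return len(ch) == 1 and "\u4e00" <= ch <= "\u9fff"

def choose_reading(text: str, i: int) -> str:
    """Heuristic: a kanji with a kanji neighbour is assumed to be in an on-read compound."""
    prev_ch = text[i - 1] if i > 0 else ""
    next_ch = text[i + 1] if i + 1 < len(text) else ""
    return "on" if is_kanji(prev_ch) or is_kanji(next_ch) else "kun"

def read_kanji(text: str) -> str:
    """Replace each known kanji with the reading chosen by choose_reading."""
    out = []
    for i, ch in enumerate(text):
        entry = READINGS.get(ch)
        out.append(entry[choose_reading(text, i)] if entry else ch)
    return "".join(out)
```

Here `人口` becomes `じんこう` while a standalone `人` becomes `ひと`; real rules would need per-word exceptions on top of anything this simple.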

@Yoxem
Contributor

Yoxem commented Jan 8, 2018

The pronunciation of a Japanese kanji sometimes depends on the hiragana that follows it. For example, 飛 in 飛び, 飛ぶ and 飛ぼう is read "to" (followed by b + vowel). However, in 飛行機 it is read "hi".
The pronunciation of a kanji in a compound may differ from its pronunciation when it appears on its own. Readings are complex enough that a large list of mapping rules is necessary.
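To prefer compound readings such as 飛行機 over single-kanji entries, a longest-match lookup is one option. A minimal sketch, assuming a hypothetical `COMPOUNDS` table whose single-character entry 飛 → と is only meant for the okurigana cases above:

```python
# Hypothetical table: multi-character compounds plus single-kanji fallbacks.
COMPOUNDS = {"飛行機": "ひこうき", "飛": "と"}
MAX_LEN = max(len(key) for key in COMPOUNDS)

def transliterate(text: str) -> str:
    """Greedy longest-match replacement of table entries with hiragana."""
    out, i = [], 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in COMPOUNDS:
                out.append(COMPOUNDS[chunk])
                i += length
                break
        else:
            # No entry starts here: copy the character (e.g. okurigana) as-is.
            out.append(text[i])
            i += 1
    return "".join(out)
```

So `飛行機` gives `ひこうき`, while `飛ぶ` falls back to the single-kanji entry and gives `とぶ`.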

@jamesohortle

As mentioned in #498, Japanese does not currently support readings for English words mixed into Japanese text (they are read letter by letter). It is fairly common in Japanese, however, to have English vocabulary sprinkled throughout; see, for example, the Yomiuri Giants' website, where many section titles are in English.

I have been working on a dictionary of English words mapped to how they would be written/pronounced in Japanese. Sample:

UNDERSTATE | アンダーステイト
EXCLAVE | エクスクレイブ
SUMATRAN | スーマトラン
BUZBEE | バズビー
DELINQUENTLY | ディリンクァントリー
FULLTIME | フルタイム
SIBLEY | シブリー
COURLAND | クールラント
FANTABULOSA | ファンタビューローサ
JOHN'S | ジャンズ
ROMINGER | ローミンアー
HOLDSWORTH | ホールドズワース
NICOTIN | ニコチン
ZAGAZIG | ザガジグ
MORNA | モーナ
IANNAMICO | イーアナミーコー
ANISOPLANATISM | アニソプラナティズム
RANCHERS | ランチャーズ
BARSAMIAN | バーセイミーアン
VISTAVISION | ビスタビジョン

Although there are some errors (JOHN'S should really be ジョンズ, for example), I am happy to contribute the resource if given some guidance on how to integrate it.
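If such a list were used, one possible approach is to parse the `WORD | katakana` lines into a table and substitute ASCII words in mixed text before the Japanese rules run. A minimal sketch with hypothetical function names and a few entries copied from the sample above; words missing from the table are left unchanged (and would still be read letter by letter):

```python
import re

# A few entries from the sample list, in the same "WORD | katakana" layout.
SAMPLE_DICT = """\
UNDERSTATE | アンダーステイト
FULLTIME | フルタイム
VISTAVISION | ビスタビジョン
"""

def parse_dictionary(text: str) -> dict:
    """Build {UPPERCASE_WORD: katakana} from 'WORD | katakana' lines."""
    table = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        word, kana = (part.strip() for part in line.split("|", 1))
        table[word.upper()] = kana
    return table

# ASCII words, allowing apostrophes as in JOHN'S.
ASCII_WORD = re.compile(r"[A-Za-z][A-Za-z']*")

def replace_english(text: str, table: dict) -> str:
    """Swap known English words for katakana; keep unknown words as-is."""
    return ASCII_WORD.sub(lambda m: table.get(m.group(0).upper(), m.group(0)), text)
```

For example, `replace_english("今日はFULLTIMEです", parse_dictionary(SAMPLE_DICT))` yields `今日はフルタイムです`, which the existing katakana rules can then read.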

@Yoxem
Contributor

Yoxem commented Nov 8, 2022

I think that, for implementing kanji pronunciation, regex-based pattern matching should be enabled to transliterate kanji followed by okurigana into hiragana, which can then be spoken. For example, in Perl-like pseudocode:

use utf8;                      # the patterns contain literal kanji and kana
s/行([かきくけこっ])/ゆ$1/g;   # 行きます -> ゆきます
s/有([らりるれろっ])/あ$1/g;   # 有った -> あった
# ...

Speech can then be generated much more easily from the resulting hiragana.
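The same two rules can be sketched in Python with `re.sub`, applying each pattern in order and keeping the captured okurigana via `\1`. The `OKURIGANA_RULES` list and `apply_rules` name are hypothetical, and real rules would need many more entries plus care with ambiguity (for example, 行った can also be read おこなった):

```python
import re

# Each rule: (compiled pattern, replacement). The group captures the okurigana
# so that it is kept after the kanji is replaced by its hiragana reading.
OKURIGANA_RULES = [
    (re.compile(r"行([かきくけこっ])"), r"ゆ\1"),  # 行きます -> ゆきます
    (re.compile(r"有([らりるれろっ])"), r"あ\1"),  # 有った -> あった
]

def apply_rules(text: str) -> str:
    """Apply every okurigana rule in order, returning the transliterated text."""
    for pattern, repl in OKURIGANA_RULES:
        text = pattern.sub(repl, text)
    return text
```

Text that matches no rule, such as `行動`, passes through unchanged and would need a compound entry instead.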

@sthibaul
Collaborator Author

sthibaul commented Nov 9, 2022

That could be implemented in locale/ja/symbols.dic. However, that file comes from NVDA (https://github.com/nvaccess/nvda), so please submit it there instead: https://github.com/nvaccess/nvda/blob/master/source/locale/ja/symbols.dic
