ja: missing kanji reading #366

sthibaul · 2018-01-04T01:24:53Z

Hello,

For the Japanese language, it seems we are missing dictionaries for kanji readings:

espeak-ng -v ja ひと

speaks fine "hito", but

espeak-ng -v ja 人

does not pronounce "hito"

Samuel


rhdunn · 2018-01-05T13:12:41Z

The Japanese language currently only supports the Hiragana and Katakana scripts.

I am thinking of deriving the pronunciations from Unicode's UniHan database (https://www.unicode.org/reports/tr38/#N1019C), as is done for the Mandarin and Cantonese pronunciations. I would also like to split the Mandarin and Cantonese list files into those derived from the UniHan database and those not, so they can be updated to the latest Unicode (and get readings for the new characters).
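As a rough illustration of the import step, here is a minimal Python sketch that collects on and kun readings from lines in the tab-separated layout used by Unihan_Readings.txt (code point, field name, value). The field names kJapaneseOn and kJapaneseKun come from UAX #38; the sample lines and the function name `parse_japanese_readings` are made up for illustration, and the exact spelling and case of the values may differ between Unicode versions.

```python
from collections import defaultdict

# Illustrative sample lines only (an assumption, not copied from a specific
# Unihan release): code point, field name, value, separated by tabs.
SAMPLE = """\
# comment lines start with '#'
U+4EBA\tkJapaneseKun\tHITO
U+4EBA\tkJapaneseOn\tJIN NIN
U+4EBA\tkMandarin\trén
"""

def parse_japanese_readings(lines):
    """Return {kanji: {"on": [...], "kun": [...]}} for the Japanese fields."""
    readings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        codepoint, field, value = line.split("\t", 2)
        if field not in ("kJapaneseOn", "kJapaneseKun"):
            continue  # ignore Mandarin, Cantonese, and other fields
        char = chr(int(codepoint[2:], 16))  # "U+4EBA" -> "人"
        kind = "on" if field == "kJapaneseOn" else "kun"
        readings[char][kind] = value.split()  # values are space-separated
    return dict(readings)

readings = parse_japanese_readings(SAMPLE.splitlines())
```

The romanized values would still need converting to Hiragana (or to phonemes) before they could feed the existing pronunciation rules; that step is not shown here.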

There are two tricky parts to handling Kanji:

specifying the pronunciation;
specifying and disambiguating on (Chinese-derived reading) and kun (Japanese-derived reading).

The first is fairly straightforward -- I want to map the Kanji to their pronunciation in Hiragana, which can then use the Hiragana pronunciation rules. The problem here is that espeak does not easily support mapping to an intermediate transliterated form. It has a workaround for Chinese, but this is not general enough to support Japanese.

The second is harder. One possibility would be to have $reading=on and $reading=kun selectors (possibly applying to other readings as well). These could be easily imported from the UniHan database.

The problem then is creating the disambiguation rules. One possibility here is to do something like the stress placement and part of speech rules, but saying "this kanji, in this context, has this reading".
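To make the "this kanji, in this context, has this reading" idea concrete, here is a small Python sketch of a two-stage approach: pick a reading type per kanji from its context, then emit Hiragana that the existing Hiragana rules could consume. Everything here is hypothetical: the `READINGS` table, the function names, and especially the toy rule (a kanji adjacent to another kanji takes the on reading, otherwise kun), which is a common tendency but far from a reliable rule. It is not how espeak-ng works today.

```python
import re

# Hypothetical reading table in Hiragana, keyed by reading type.
READINGS = {
    "人": {"on": "じん", "kun": "ひと"},
}

# Basic CJK Unified Ideographs block only; enough for a sketch.
KANJI = re.compile(r"[\u4e00-\u9fff]")

def pick_reading(text, i):
    """Toy context rule (an assumption): on next to another kanji, else kun."""
    prev_is_kanji = i > 0 and KANJI.match(text[i - 1]) is not None
    next_is_kanji = i + 1 < len(text) and KANJI.match(text[i + 1]) is not None
    return "on" if (prev_is_kanji or next_is_kanji) else "kun"

def to_hiragana(text):
    """Replace kanji that have a table entry; leave everything else as-is."""
    out = []
    for i, ch in enumerate(text):
        entry = READINGS.get(ch)
        out.append(entry[pick_reading(text, i)] if entry else ch)
    return "".join(out)
```

With this table, `to_hiragana("人")` gives the kun reading, while 人 at the end of 日本人 gets the on reading (the other two kanji are left untouched because the sketch has no entries for them). Real disambiguation would need per-word exceptions on top of any such default.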

Yoxem · 2018-01-08T14:03:51Z

Pronunciation of Japanese kanji sometimes depends on the following hiragana. For example, 飛 in 飛び, 飛ぶ, and 飛ぼう sounds like "to" (followed by b + vowel). However, in 飛行機 it sounds like "hi".
The pronunciation of a kanji in a compound may differ from its pronunciation when it appears alone. The pronunciation is so complex that a large list of such mapping rules is necessary.

jamesohortle · 2019-10-18T10:05:14Z

As mentioned in #498, Japanese does not currently support readings for English words mixed into Japanese text (it will read letter by letter). It is fairly common in Japanese, however, to have English vocabulary sprinkled throughout. See, for example, the Yomiuri Giants' website where many section titles are in English.

I have been working on a dictionary of English words written as they would be written/pronounced in Japanese. Sample:

UNDERSTATE | アンダーステイト
EXCLAVE | エクスクレイブ
SUMATRAN | スーマトラン
BUZBEE | バズビー
DELINQUENTLY | ディリンクァントリー
FULLTIME | フルタイム
SIBLEY | シブリー
COURLAND | クールラント
FANTABULOSA | ファンタビューローサ
JOHN'S | ジャンズ
ROMINGER | ローミンアー
HOLDSWORTH | ホールドズワース
NICOTIN | ニコチン
ZAGAZIG | ザガジグ
MORNA | モーナ
IANNAMICO | イーアナミーコー
ANISOPLANATISM | アニソプラナティズム
RANCHERS | ランチャーズ
BARSAMIAN | バーセイミーアン
VISTAVISION | ビスタビジョン

Although there are some errors (JOHN'S should really be ジョンズ, for example), I am happy to contribute the resource if given some guidance on how to put it in.
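For reference, a list in the `WORD | カタカナ` layout shown above is easy to load into a lookup table. The following Python sketch assumes exactly that layout; the function names `load_dictionary` and `transliterate` are hypothetical, and how such a table would be converted into espeak-ng's own dictionary files is not addressed here.

```python
# A few entries from the sample above, in the same "WORD | KATAKANA" layout.
SAMPLE = """\
UNDERSTATE | アンダーステイト
FULLTIME | フルタイム
NICOTIN | ニコチン
"""

def load_dictionary(text):
    """Parse 'WORD | KATAKANA' lines into a dict keyed by upper-case word."""
    entries = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip blank or malformed lines
        word, kana = (part.strip() for part in line.split("|", 1))
        entries[word.upper()] = kana
    return entries

def transliterate(word, entries):
    """Return the katakana form of an English word, or None if unknown."""
    return entries.get(word.upper())
```

A caller could fall back to letter-by-letter reading when `transliterate` returns `None`, which matches the current behaviour for English words.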

Yoxem · 2022-11-08T14:50:42Z

Pronunciation of Japanese kanji sometimes depends on the following hiragana. For example, 飛 in 飛び, 飛ぶ, and 飛ぼう sounds like "to" (followed by b + vowel). However, in 飛行機 it sounds like "hi". The pronunciation of a kanji in a compound may differ from its pronunciation when it appears alone. The pronunciation is so complex that a large list of such mapping rules is necessary.

I think that, to implement kanji pronunciation, regex-based pattern matching should be enabled so that kanji followed by okurigana can be transliterated to hiragana before speech is generated. For example, in Perl-like pseudocode:

s/行([かきくけこっ])/ゆ$1/;  # 行きます -> ゆきます
s/有([らりるれろっ])/あ$1/;  # 有った -> あった
...

Then it is much easier to generate speech from the hiragana.
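The same idea can be written as a runnable Python sketch. The first two rules are taken from the pseudocode above; the third encodes the 飛 example from earlier in this thread ("to" before b + vowel). The rule list and the function name `kanji_to_hiragana` are illustrative assumptions, not an existing espeak-ng mechanism, and a real rule set would be far larger.

```python
import re

# Each rule: (pattern matching kanji + following hiragana, replacement).
RULES = [
    (re.compile(r"行([かきくけこっ])"), r"ゆ\1"),  # 行きます -> ゆきます
    (re.compile(r"有([らりるれろっ])"), r"あ\1"),  # 有った -> あった
    (re.compile(r"飛([ばびぶべぼ])"), r"と\1"),    # 飛び -> とび
]

def kanji_to_hiragana(text):
    """Apply each okurigana-based rule in order; unmatched kanji are kept."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Because the rules only fire when the expected hiragana follows, 飛行機 is left unchanged by this pass and would need a separate compound-word rule for its "hi" reading.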

sthibaul · 2022-11-09T14:34:42Z

That can be implemented in locale/ja/symbols.dic. That file, however, comes from NVDA (https://github.com/nvaccess/nvda), so it would be better to submit it there: https://github.com/nvaccess/nvda/blob/master/source/locale/ja/symbols.dic

rhdunn added the languages label Jan 5, 2018

jaacoppi mentioned this issue Jun 2, 2018

Japanese language cannot read Japanese characters and numbers #498

Closed

valdisvi mentioned this issue May 9, 2020

Fixing sequences of ? and ! #747

Closed

BenTalagan mentioned this issue May 16, 2020

Sequence of punctuation signs handling #583

Closed
