Show word frequency information in the JMDict entry detail screen #539

Closed
andrejrenard opened this Issue Jul 17, 2015 · 18 comments

andrejrenard commented Jul 17, 2015

Hi Martin,

A problem I face when trying to read in a new language is finding easy but interesting texts that are well integrated with a dictionary, a grammar, and a learning tool (including word frequencies). Aedict is going in the right direction, but it works with phrases only, not with texts. There are also no word frequencies, and the kanji frequencies are too general and not well integrated into the learning tools.

As a source of good texts I am using News Web Easy (http://www3.nhk.or.jp/news/easy/index.html). I also recently discovered Kanji Web Easy (http://www.kanjiwebeasy.com/), which helps me focus on what is really useful to learn (at least for News Web Easy).

Would it be possible to integrate these tools into Aedict (hyperlinks might be enough), to offer a complete learning environment for all levels? I am pretty sure Sebastien from Kanji Web Easy would be open to a collaboration. I have no idea about the people behind News Web Easy (http://www.aovill.com/).

Of course, other computed frequencies available on the web could be used as well.

Andre

andrejrenard commented Jul 25, 2015

Hi Martin,

I played around a bit with the word and kanji frequencies computed by Tatsuhiko Matsushita (http://www17408ui.sakura.ne.jp/tatsum/English_top_Tatsu.html) to see how such information could be useful.

Here is an example to make this clear. I found "ぜいたく" in a text and discovered its meaning, "luxury", using Aedict. The hiragana form is underlined, but the word also has a kanji form, "贅沢", which is underlined as well.
Now the questions: Is the word worth learning? In both forms? What about its two kanji?
These are quite easy to answer with the frequency lists:

  • ぜいたく : 3939
  • 贅沢: not present
  • 贅: 1927
  • 沢: 658

My current targets are 1000 kanji and 10000 words, with a core target of 500 kanji and 5000 words. So I could safely memorize ぜいたく right away and add 沢 to my list of kanji to learn later.

In conclusion, it would be nice to be able to set two frequency thresholds for words, and to use them to display words in three shades in all word screens (black, dark grey, light grey). The same goes for kanji in the kanji-specific screens.
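The two-threshold idea can be sketched in a few lines of Python. This is only an illustration, not Aedict code: `grey_level`, the two limit tuples, and the decision to treat an entry missing from the list as light grey are all assumptions. The ranks are the ones quoted in this comment, and the limits are the targets mentioned above.

```python
from typing import Optional

# Hypothetical thresholds taken from the targets mentioned in the comment:
# (core limit, full limit).
WORD_LIMITS = (5000, 10000)
KANJI_LIMITS = (500, 1000)


def grey_level(rank: Optional[int], core_limit: int, full_limit: int) -> str:
    """Map a frequency rank (1 = most frequent) to one of three shades.

    A rank of None means the entry is absent from the frequency list;
    showing it as light grey is an assumption of this sketch.
    """
    if rank is None or rank > full_limit:
        return "light grey"
    if rank <= core_limit:
        return "black"
    return "dark grey"


# Ranks quoted in the comment; None stands for "not present".
word_ranks = {"ぜいたく": 3939, "贅沢": None}
kanji_ranks = {"贅": 1927, "沢": 658}

for word, rank in word_ranks.items():
    print(word, grey_level(rank, *WORD_LIMITS))
for kanji, rank in kanji_ranks.items():
    print(kanji, grey_level(rank, *KANJI_LIMITS))
```

With these limits, ぜいたく comes out black (learn now), 沢 dark grey (learn later), and 贅 and 贅沢 light grey, which matches the decision described above.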

Another use would be to start with a list of kanji ordered by frequency: take one and show its most frequent words, then learn those words and the most frequent kanji in them, using the sample sentences to help memorize words and kanji at the same time.

Andre

Owner

mvysny commented Sep 18, 2015

Regarding the kanji frequency list: I am currently using the following list of the 1000 most frequent kanji. The problem is that I cannot remember where I got this list ;) Please use it as-is ;) 日 is the most frequent kanji, followed by 一, etc.

日一国会人年大十二本中長出三同時政事自行社見月分議後前民生連五発間対上部東者党地合市業内相方四定今回新場金員九入選立開手米力学問高代明実円関決子動京全目表戦経通外最言氏現理調体化田当八六約主題下首意法不来作性的要用制治度務強気小七成期公持野協取都和統以機平総加山思家話世受区領多県続進正安設保改数記院女初北午指権心界支第産結百派点教報済書府活原先共得解名交資予川向際査勝面委告軍文反元重近千考判認画海参売利組知案道信策集在件団別物側任引使求所次水半品昨論計死官増係感特情投示変打男基私各始島直両朝革価式確村提運終挙果西勢減台広容必応演電歳住争談能無再位置企真流格有疑口過局少放税検藤町常校料沢裁状工建語球営空職証土与急止送援供可役構木割聞身費付施切由説転食比難防補車優夫研収断井何南石足違消境神番規術護展態導鮮備宅害配副算視条幹独警宮究育席輸訪楽起万着乗店述残想線率病農州武声質念待試族象銀域助労例衛然早張映限親額監環験追審商葉義伝働形景落欧担好退準賞訴辺造英被株頭技低毎医復仕去姿味負閣韓渡失移差衆個門写評課末守若脳極種美岡影命含福蔵量望松非撃佐核観察整段横融型白深字答夜製票況音申様財港識注呼渉達良響阪帰針専推谷古候史天階程満敗管値歌買突兵接請器士光討路悪科攻崎督授催細効図週積丸他及湾録処省旧室憲太橋歩離岸客風紙激否周師摘材登系批郎母易健黒火戸速存花春飛殺央券赤号単盟座青破編捜竹除完降超責並療従右修捕隊危採織森競拡故館振給屋介読弁根色友苦就迎走販園具左異歴辞将秋因献厳馬愛幅休維富浜父遺彼般未塁貿講邦舞林装諸夏素亡劇河遣航抗冷模雄適婦鉄寄益込顔緊類児余禁印逆王返標換久短油妻暴輪占宣背昭廃植熱宿薬伊江清習険頼僚覚吉盛船倍均億途圧芸許皇臨踏駅署抜壊債便伸留罪停興爆陸玉源儀波創障継筋狙帯延羽努固闘精則葬乱避普散司康測豊洋静善逮婚厚喜齢囲卒迫略承浮惑崩順紀聴脱旅絶級幸岩練押軽倒了庁博城患締等救執層版老令角絡損房募曲撤裏払削密庭徒措仏績築貨志混載昇池陣我勤為血遅抑幕居染温雑招奈季困星傷永択秀著徴誌庫弾償刊像功拠香欠更秘拒刑坂刻底賛塚致抱繰服犯尾描布恐寺鈴盤息宇項喪伴遠養懸戻街巨震願絵希越契掲躍棄欲痛触邸依籍汚縮還枚属笑互複慮郵束仲栄札枠似夕恵板列露沖探逃借緩節需骨射傾届曜遊迷夢巻購揮君燃充雨閉緒跡包駐貢鹿弱却端賃折紹獲郡併草徹飲貴埼衝焦奪雇災浦暮替析預焼簡譲称肉納樹挑章臓律誘紛貸至宗促慎控
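Since the list is simply a string ordered by frequency, a kanji's rank is its position in that string. A minimal sketch, assuming only the first ten kanji of the list above (the names `FREQUENT_KANJI`, `KANJI_RANK`, and `kanji_rank` are made up for illustration):

```python
from typing import Dict, Optional

# First ten kanji of the list above; the full string would be used in practice.
FREQUENT_KANJI = "日一国会人年大十二本"

# Precompute rank lookups: 1 = most frequent.
KANJI_RANK: Dict[str, int] = {
    kanji: position for position, kanji in enumerate(FREQUENT_KANJI, start=1)
}


def kanji_rank(kanji: str) -> Optional[int]:
    """Return the 1-based frequency rank, or None if the kanji is not listed."""
    return KANJI_RANK.get(kanji)


print(kanji_rank("日"))  # most frequent kanji in the list
print(kanji_rank("贅"))  # not among the ten kanji used here
```

Building the dictionary once keeps each lookup cheap, which matters if ranks are needed for every kanji on a screen.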

Owner

mvysny commented Sep 18, 2015

Regarding the word frequency list, I am currently using Michiel Kamermans' word occurrence data from http://ftp.monash.edu.au/pub/nihongo/00INDEX.html. Mainichi Shimbun's frequency list is not as good, as it reflects only a specific part of the Japanese language.
I can also use Matsushita's list and show both commonality values next to the kanji/reading.

Owner

mvysny commented Sep 18, 2015

A kanji commonality information item has been added to Aedict 3.37: just click the (i) button next to the kanji to show the commonality information.

@mvysny mvysny added the enhancement label Sep 18, 2015

andrejrenard commented Sep 18, 2015

On 2015-09-18 08:04, Martin Vysny wrote:

Kanji's commonality information item has been added to Aedict 3.37 -
just click the (i) button next to the kanji, to show the commonality
information.

OK, thanks.

andrejrenard commented Sep 18, 2015

On 2015-09-18 07:54, Martin Vysny wrote:

Regarding the word frequency list, I am currently using Michiel
Kamermans' word occurrence data from
http://ftp.monash.edu.au/pub/nihongo/00INDEX.html. Mainichi Shimbun's
frequency list is not as good, as it reflects only a specific part of
the Japanese language.
I can also use Matsushita's list and show both commonality values next
to the kanji/reading.

That would be useful.

andrejrenard commented Sep 18, 2015

On 2015-09-18 07:54, Martin Vysny wrote:

Regarding the word frequency list, I am currently using Michiel
Kamermans' word occurrence data from
http://ftp.monash.edu.au/pub/nihongo/00INDEX.html. Mainichi Shimbun's
frequency list is not as good, as it reflects only a specific part of
the Japanese language.
I can also use Matsushita's list and show both commonality values next
to the kanji/reading.

Also, when asking for the words that contain a kanji, it would be useful
to be able to rank them by frequency.

Owner

mvysny commented Sep 18, 2015

Yes, every word-based search should automatically be sorted, most frequent words first. This includes the Kanji Detail screen's "WORDS" tab.
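A rough sketch of such a "most frequent first" ordering, with unlisted words placed last. This is an assumption about how the sort could work, not the actual Aedict implementation; `sort_by_frequency` is a hypothetical helper, and the rank for 沢山 is invented for the example (only the ぜいたく rank comes from this thread).

```python
from typing import Dict, List


def sort_by_frequency(words: List[str], ranks: Dict[str, int]) -> List[str]:
    """Sort words by frequency rank (1 = most frequent).

    Words missing from the rank table sort after all ranked words;
    sorted() is stable, so they keep their original relative order.
    """
    return sorted(words, key=lambda w: (w not in ranks, ranks.get(w, 0)))


# 3939 is the rank quoted earlier in the thread; 1200 is a made-up value.
ranks = {"ぜいたく": 3939, "沢山": 1200}
results = ["贅沢", "ぜいたく", "沢山"]

print(sort_by_frequency(results, ranks))
```

The tuple key sorts first on "is the word unranked" and then on the rank itself, so no sentinel rank value is needed for missing entries.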

Owner

mvysny commented Sep 18, 2015

[Screenshot: device-2015-09-18-193454]

Max. 6 stars: 6 stars is most common, 0 stars is least common. Matsushita: 6 stars is roughly index 1..3333, 5 stars is roughly index 3334..6666, etc. Please let me know if this is okay.
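For illustration, the proposal above can be read as equal-width bands of 3333 entries. The sketch below assumes that reading, and also assumes that the "etc." continues with the same width down to 1 star and that anything beyond the last band, or absent from the list, gets 0 stars. `stars_for_index` and its parameters are hypothetical names, not Aedict's code.

```python
from typing import Optional


def stars_for_index(index: Optional[int], band_width: int = 3333,
                    max_stars: int = 6) -> int:
    """Convert a 1-based frequency index into a 0..max_stars rating.

    index 1..band_width gets max_stars, the next band one star fewer,
    and so on. None (entry not in the list) is treated as 0 stars,
    which is an assumption of this sketch.
    """
    if index is None or index < 1:
        return 0
    band = (index - 1) // band_width  # 0 for the most frequent band
    return max(0, max_stars - band)


for index in (1, 3333, 3334, 3939, 6667, 19999, None):
    print(index, stars_for_index(index))
```

Under these assumptions ぜいたく (index 3939) would get 5 stars. Keeping the band width as a parameter makes it easy to try other splits.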

andrejrenard commented Sep 18, 2015

perfect

Owner

mvysny commented Sep 19, 2015

Changed the star pattern a bit: for Matsushita, 6 stars roughly correspond to index 1..4000, 5 stars to 4001..8000, ..., and 1 star to index 16000..21000; zero stars mean that the entry is not present in the Matsushita list at all.
The Kamermans index has roughly the same meaning.
Closing as fixed; please feel free to reopen if you have additional requests regarding this issue.

@mvysny mvysny closed this Sep 19, 2015

Owner

mvysny commented Sep 19, 2015

Fixed in Aedict 3.37

@mvysny mvysny changed the title from Integration with News Web Easy and Kanji Web Easy to Show word frequency information in the JMDict entry detail screen Sep 19, 2015

denyeo commented Aug 3, 2016

@mvysny Hi Martin, I'm interested in purchasing Aedict3, but I would like to easily see the word frequencies from Matsushita (I want to add only common words to Anki). The current screenshots on Google Play don't show the frequency stars or numbers for words. Can you confirm that the frequencies are still being shown in the app?

Owner

mvysny commented Aug 3, 2016

@denyeo: Hi denyeo, I apologize, the screenshots are quite old ;) Yes, the Matsushita+Kamermans occurrence index (the number of stars) is shown in the newest Aedict. Just make sure that you have the newest dictionaries installed, and you should be good to go. I have added an Aedict screenshot showing the Matsushita occurrence index; please allow a couple of hours for it to appear on Google Play.

denyeo commented Aug 4, 2016

@mvysny Thanks for adding the screenshot! It shows a frequency for 母 (a single kanji), so I'd like to confirm that:

  1. frequencies are shown not just for kanji but for all words, such as 地震, 駆け付ける, ズボン, etc.
  2. the Matsushita frequency list you're using is VDLJ-GL, right? [VDLJ-GL = The Vocabulary Database for Learners of Japanese Ver. 1.0 (for General Learners)] There are a few other lists on that webpage so I just wanted to be sure.

(Matsushita in fact has a 60,000-word frequency list, the Vocabulary Database for Reading Japanese (for Teachers) Ver. 1.0, available at http://tatsuma2010.web.fc2.com/, which you might consider using in the future for lower frequencies such as 20,000-30,000. But this is a small and nonessential thing.)

Appreciate it!

Owner

mvysny commented Aug 4, 2016

Yes, the Matsushita frequency list is in fact VDLJ-GL 1.0 taken from http://www17408ui.sakura.ne.jp/tatsum/English_top_Tatsu.html

Sure, the frequency list applies to words, not to kanji per se. Still, the screenshot of a single hon kanji could indeed be confusing, so I updated it to show the word 書斎. I am attaching the screenshot, and I have also updated the one on Google Play.
[Screenshot: screenshot-matsushita]

denyeo commented Aug 4, 2016

@mvysny You're the best. I've bought the app. Hope many others do!

Owner

mvysny commented Aug 5, 2016

Thanks man for your support, I hope Aedict will serve you well ;)
