Reading: Common can be confusing (暇) #754

Closed
atagunov opened this Issue Apr 30, 2017 · 2 comments

atagunov commented Apr 30, 2017

Hi,

[1]

another case where raw JMDict wins by being more informative: entry for 暇

http://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&q=1577280
seems to suggest that

  • only ひま reading is common (ichi1,news1,nf14)
  • いとま reading is less common (no frequency flags on it)

however AEdict entry seems to suggest both readings are common
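A minimal sketch of how commonness could be tracked per reading rather than per entry. The XML is an abbreviated, hand-written fragment modelled on the flags quoted above, not the full JMDict entry, and the set of tags treated as "common" is an assumption for illustration; `common_by_reading` is a hypothetical helper, not part of Aedict.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written fragment modelled on the 暇 entry described above;
# it is not the full JMDict entry.
ENTRY_XML = """
<entry>
  <r_ele>
    <reb>ひま</reb>
    <re_pri>ichi1</re_pri>
    <re_pri>news1</re_pri>
    <re_pri>nf14</re_pri>
  </r_ele>
  <r_ele>
    <reb>いとま</reb>
  </r_ele>
</entry>
"""

# Assumption: priority tags that mark a reading as "common".
COMMON_TAGS = {"news1", "ichi1", "spec1", "spec2", "gai1"}


def common_by_reading(entry_xml):
    """Map each reading (reb) to True if it carries a 'common' re_pri tag."""
    entry = ET.fromstring(entry_xml)
    result = {}
    for r_ele in entry.findall("r_ele"):
        reb = r_ele.findtext("reb")
        # Collect the priority tags attached to this reading only.
        tags = {pri.text for pri in r_ele.findall("re_pri")}
        result[reb] = bool(tags & COMMON_TAGS)
    return result


flags = common_by_reading(ENTRY_XML)
print(flags)
```

With a per-reading flag like this, a UI could label each reading individually instead of showing a single "Common" marker for the whole entry.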

[2]

As stated before, it might make sense to display the Matsushita/Kamermans frequency rating separately from the JMDict frequency rating. In this particular case the two sources agree, but I would still like to be able to toggle one or the other on, or have both displayed by different means (stars vs. words, etc.).

[3]

handling

<r_ele>
<reb>ヒマ</reb>
<re_nokanji/>
</r_ele>

from the same entry for 暇 in JMDict is another story altogether. Perhaps a 'nokanji' reading could become a separate spelling in its own right, on par with 暇, 閑, and 遑? In fact it is not a reading but a different way to spell the word.

Thx,
Anton

P.S. I do love raw JMDict :) I would love to have it on my mobile in some way. AEDict is lovely, but in its transformation to a different data model it does lose some info :-( I think it is really hard not to lose info when converting from such a flexible format as JMDict, which is often driven by convention (like nokanji).

@mvysny mvysny self-assigned this May 3, 2017

@mvysny mvysny added the bug label May 3, 2017

mvysny May 3, 2017

Owner

いとま reading is less common (no frequency flags on it)

True. Aedict does not underline this reading, which is meant to signal that the reading is not common. However, the underline may not be understood, and the "Reading:" line below states that the entry is Common, which misleads the user into thinking that all readings are common. What would you suggest in this case? I'm thinking of changing the color of the less-common readings to a greyer shade, though that may not communicate the information clearly enough either.

re_nokanji

This is new. Quoting from JMDict: "This element, which will usually have a null value, indicates that the reb, while associated with the keb, cannot be regarded as a true reading of the kanji. It is typically used for words such as foreign place names, gairaigo which can be in kanji or katakana, etc.".

Currently Aedict ignores re_nokanji, and since the kanji reference is missing, it assumes that the ヒマ reading applies to all kanji. That is half-true: the reading is "associated" but not a "true" reading. I wonder how to mark that in the UI. Should it have its own category with no kanji? This is doable.
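A rough sketch of the "own category with no kanji" idea, assuming readings flagged with re_nokanji are simply kept apart from true readings. The XML is an abbreviated, hand-written fragment modelled on the 暇 entry, and `split_readings` is a hypothetical helper, not Aedict's actual code. The re_restr handling is included because JMDict uses that element to restrict a reading to particular kanji; the sample fragment does not exercise it.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written fragment modelled on the 暇 entry; not the full entry.
ENTRY_XML = """
<entry>
  <k_ele><keb>暇</keb></k_ele>
  <k_ele><keb>閑</keb></k_ele>
  <k_ele><keb>遑</keb></k_ele>
  <r_ele><reb>ひま</reb></r_ele>
  <r_ele><reb>ヒマ</reb><re_nokanji/></r_ele>
</entry>
"""


def split_readings(entry_xml):
    """Return (true_readings, nokanji_readings).

    true_readings maps each reb to the kanji spellings it applies to;
    nokanji_readings lists the rebs flagged with <re_nokanji/>.
    """
    entry = ET.fromstring(entry_xml)
    all_kanji = [k.findtext("keb") for k in entry.findall("k_ele")]
    true_readings = {}
    nokanji_readings = []
    for r_ele in entry.findall("r_ele"):
        reb = r_ele.findtext("reb")
        # An empty <re_nokanji/> element is still "present", so test for None.
        if r_ele.find("re_nokanji") is not None:
            nokanji_readings.append(reb)
            continue
        # Without re_restr, the reading applies to every kanji spelling.
        restricted = [r.text for r in r_ele.findall("re_restr")]
        true_readings[reb] = restricted or all_kanji
    return true_readings, nokanji_readings


true_readings, nokanji_readings = split_readings(ENTRY_XML)
print(true_readings)
print(nokanji_readings)
```

Here ヒマ ends up in its own list instead of being attached to 暇, 閑, and 遑, so a UI could show it as a separate kana spelling.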

Would love to have it on my mobile in some way. AEDict is lovely, but in its transformation to a different data model it does lose some info :-( I think it is really hard not to lose info when converting from such a flexible format as JMDict, which is often driven by convention (like nokanji).

Think of Aedict as a visualizer for JMDict data :-) It is indeed true that some information may be lost, but we can always add those bits.

mvysny May 13, 2017

Owner

Adding support for re_nokanji requires changes both in Aedict and in the data files. I'll release the updated Aedict now and update the data files later, so that everyone has enough time to upgrade Aedict. Marking as fixed, but this will really be fixed only after the updated data files are released.

@mvysny mvysny closed this May 13, 2017
