Description
In the kanji view, aedict could have a "not to be confused with" section to help people learn kanji and spot the differences between similar ones: for example, 副 should not be confused with 福.
Finding similar kanji is difficult, but someone wrote a thesis about it and compiled a database of similar kanji.
The important files are strokeEditDistance.csv and the yehAndLiRadical file, which use two different methods to determine a distance between kanji.
Here is an example of a line from strokeEditDistance.csv:
天 夫 1 矢 0.8 末 0.8 未 0.8 失 0.8 丈 0.75 文 0.75 井 0.75 木 0.75 大 0.75
This means that 天 is very similar to 夫, with a "distance" of 1. The value is really a similarity score rather than a distance: the higher it is, the more alike the two kanji are. On the same line, we can see that 天 is also similar to 大, but with a lower score of 0.75.
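To make the line format concrete, here is a minimal sketch that splits one line into the target kanji and its (kanji, score) pairs. It assumes the fields are whitespace-separated, as in the excerpt above; the real CSV may use a different delimiter, in which case the `split()` call would need adjusting. `parse_similarity_line` is a hypothetical helper name.

```python
def parse_similarity_line(line: str) -> tuple[str, list[tuple[str, float]]]:
    """Split one line into the target kanji and its (similar kanji, score) pairs.

    Assumes whitespace-separated fields: target, then alternating kanji/score.
    """
    fields = line.split()
    target, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"unpaired field in line: {line!r}")
    # Pair each similar kanji with the score that follows it.
    pairs = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest), 2)]
    return target, pairs


line = "天 夫 1 矢 0.8 末 0.8 未 0.8 失 0.8 丈 0.75 文 0.75 井 0.75 木 0.75 大 0.75"
target, pairs = parse_similarity_line(line)
print(target, pairs[0], pairs[-1])  # 天 ('夫', 1.0) ('大', 0.75)
```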
I don't think aedict needs to show the distance to the user, as it is a more or less arbitrary number, but it should list, for each kanji, the similar kanji in the same order (from most similar to least similar).
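A minimal sketch of that lookup: map each kanji to its similar kanji, most similar first, with the scores dropped. It again assumes whitespace-separated fields; `build_similar_index` is a hypothetical name, not existing aedict code. The lines already appear to be in descending order, but sorting defensively costs little, and Python's `sorted()` is stable, so kanji with equal scores keep their file order.

```python
def build_similar_index(lines):
    """Return {kanji: [similar kanji, most similar first]} with scores dropped."""
    index = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        target, rest = fields[0], fields[1:]
        # Alternating kanji/score fields; a trailing unpaired field is ignored.
        pairs = [(rest[i], float(rest[i + 1])) for i in range(0, len(rest) - 1, 2)]
        # Highest score first; the sort is stable, so ties keep file order.
        pairs = sorted(pairs, key=lambda p: p[1], reverse=True)
        index[target] = [kanji for kanji, _ in pairs]
    return index


index = build_similar_index([
    "天 夫 1 矢 0.8 末 0.8 未 0.8 失 0.8 丈 0.75 文 0.75 井 0.75 木 0.75 大 0.75",
])
print(index["天"][:5])  # ['夫', '矢', '末', '未', '失']
```

Only the kanji list would then need to be shown in the "not to be confused with" section.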
I am not sure you need both files. The yehAndLiRadical file is based on radicals rather than strokes; the two often overlap, but strokeEditDistance really gives kanji that look alike, even when they don't share a radical. Here is an example from yehAndLiRadical and strokeEditDistance, respectively:
則 測 0.894 側 0.894 財 0.750 敗 0.750 賊 0.671 損 0.671 慣 0.671 販 0.612 漬 0.612 債 0.612
則 側 0.818182 財 0.8 貝 0.777778 測 0.75 貯 0.666667 昇 0.666667 見 0.666667 販 0.636364 敗 0.636364 眺 0.636364
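To see how much the two lists for 則 overlap, here is a small sketch that compares them, assuming the same whitespace-separated layout (`neighbours` is a hypothetical helper). It keeps the strokeEditDistance order and separates the kanji found in both lists from those found only by stroke distance.

```python
def neighbours(line):
    """Return (target kanji, similar kanji in file order); scores are skipped."""
    fields = line.split()
    # fields[1::2] picks the kanji at positions 1, 3, 5, ... and skips the scores.
    return fields[0], fields[1::2]


radical = "則 測 0.894 側 0.894 財 0.750 敗 0.750 賊 0.671 損 0.671 慣 0.671 販 0.612 漬 0.612 債 0.612"
stroke = "則 側 0.818182 財 0.8 貝 0.777778 測 0.75 貯 0.666667 昇 0.666667 見 0.666667 販 0.636364 敗 0.636364 眺 0.636364"

_, by_radical = neighbours(radical)
_, by_stroke = neighbours(stroke)
radical_set = set(by_radical)

shared = [k for k in by_stroke if k in radical_set]
stroke_only = [k for k in by_stroke if k not in radical_set]
print(shared)       # ['側', '財', '測', '販', '敗']
print(stroke_only)  # ['貝', '貯', '昇', '見', '眺']
```

Half of the stroke-based neighbours also appear in the radical-based list, while entries such as 貝 and 見 show up only in strokeEditDistance.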
So I would recommend only adding strokeEditDistance, but it's up to you!
And don't forget to add a link to that page from aedict :)
Keep up the good work!