Searching a word doesn't show it is added to notepad #488

Closed
eevman opened this Issue May 2, 2015 · 14 comments

eevman commented May 2, 2015

I use Aedict version 3.16, and I have noticed a strange thing. I was searching for a word. When the dictionary listed it in the results screen, it was not marked with the notepad icon. However, I was sure that it had already been added to the notepad. Then I found this word in the notepad. When I returned to the start screen with recent words, I could see two entries for the same word: one added to the notepad, and the other without the notepad icon (see picture below). Looking at them, it is obvious that they are not completely the same, although they both come from the same dictionary. It looks like the word stored in the notepad was copied from the dictionary, and in the meantime the dictionary has been updated. Now I can't know for sure whether the word I'm searching for is in the notepad or not...

Is it possible to update the notepad entry when the dictionary is updated? Or do you have some other suggestion?

[Picture: 2015-05-01_19-26-53]

@mvysny mvysny added the bug label May 2, 2015

Owner

mvysny commented May 15, 2015

Fixed in Aedict 3.19

@mvysny mvysny closed this May 15, 2015

eevman commented May 16, 2015

I'm afraid this fix creates a problem in the opposite direction...
For example, search for "hou" and open it, then search for "kata" and open it. Both use the same kanji 方, but they are two separate entries in JMDict. Now you should have both entries in the recently viewed screen. If you now add one of the two to the notepad, both entries will be marked with a notepad icon, even though only one of them has been added to the notepad... Do you think it is possible to handle this case? If not, maybe it is better to revert to the old behaviour from 3.18? :-)

Owner

mvysny commented May 16, 2015

You're right, thanks for letting me know. Perhaps I need to find a matching rule which is neither as strict as in Aedict 3.17, nor as relaxed as in Aedict 3.18 :-)
Currently the matching rule is very relaxed, as I only compare by kanji. This is apparently too relaxed, as seen with the kanji 方. I can make this rule stricter so that entries match only if the kanji+reading set is exactly the same, but if Jim adds e.g. a new reading to an entry, this will again be problematic, as the entries will cease to match.
Perhaps the following rule may be enough: entries match if they share at least one common kanji AND one common reading? This will apparently fix the issue for 方, but it may still not be enough for other entries. Can you please try to name such entries off the top of your head? If you can, we need to find an even stricter rule. Please let me know if this explanation is clear, as I am not as skilled in English as I would like to be :)
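
To make the three candidate rules easier to compare, here is a minimal Python sketch. It is not Aedict's code: the `Entry` type, the function names, and the extra reading on `kata_updated` are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified dictionary entry: kanji forms and readings only."""
    kanji: frozenset
    readings: frozenset


def entry(kanji, readings):
    return Entry(frozenset(kanji), frozenset(readings))


def match_kanji_only(a, b):
    """Relaxed rule: the entries share at least one kanji form."""
    return bool(a.kanji & b.kanji)


def match_exact(a, b):
    """Strict rule: the kanji set and the reading set are both identical."""
    return a.kanji == b.kanji and a.readings == b.readings


def match_common(a, b):
    """Proposed rule: at least one common kanji AND one common reading."""
    return bool(a.kanji & b.kanji) and bool(a.readings & b.readings)


hou = entry(["方"], ["ほう"])
kata = entry(["方"], ["かた"])
# Hypothetical later revision of the "kata" entry with one reading added.
kata_updated = entry(["方"], ["かた", "がた"])

print(match_kanji_only(hou, kata))        # True: the false match reported above
print(match_exact(kata, kata_updated))    # False: breaks when a reading is added
print(match_common(kata, kata_updated))   # True: survives the added reading
print(match_common(hou, kata))            # False: the two 方 entries stay separate
```

Under these assumptions the proposed rule sits between the other two: it tolerates an added reading but still separates the two 方 entries.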

eevman commented May 16, 2015

I can't name entries off the top of my head, but I'll think about it. The explanation is OK.

Now I have noticed that the matching is really relaxed, as it also matches entries from other dictionaries :-). In my opinion, the dictionary should also be matched, not just kanji and readings.

eevman commented May 16, 2015

Maybe you can use the "ent_seq" field from the JMDict file to match entries added to the notepad?

The following is from this page

  1. The JMDict DTD states that the ent_seq entry is a unique identifier
    for a particular entry. Is there a chance that a particular entry's ent_seq
    value could change in later revisions of the file?

It won't change for a particular entry, but entries get merged, in which case
one of the original pair will be deleted.

So, even though this doesn't cover merges/deletions, it is probably a better solution than matching only kanji/readings. As for other dictionaries, I don't know if they have something like this...
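
For reference, a small sketch of reading `ent_seq` together with the kanji and readings of each entry. The element names (`entry`, `ent_seq`, `k_ele/keb`, `r_ele/reb`) follow the JMdict XML format; the sample is trimmed (no senses) and its `ent_seq` values are made up. The real file also declares entities in its DTD, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written sample in the JMdict layout.
# The ent_seq numbers are invented for this example.
SAMPLE = """<JMdict>
<entry>
<ent_seq>9000001</ent_seq>
<k_ele><keb>方</keb></k_ele>
<r_ele><reb>ほう</reb></r_ele>
</entry>
<entry>
<ent_seq>9000002</ent_seq>
<k_ele><keb>方</keb></k_ele>
<r_ele><reb>かた</reb></r_ele>
</entry>
</JMdict>"""


def index_by_ent_seq(xml_text):
    """Map each entry's ent_seq to its kanji forms and readings."""
    root = ET.fromstring(xml_text)
    index = {}
    for e in root.iter("entry"):
        seq = int(e.findtext("ent_seq"))
        index[seq] = {
            "kanji": [k.text for k in e.findall("k_ele/keb")],
            "readings": [r.text for r in e.findall("r_ele/reb")],
        }
    return index


index = index_by_ent_seq(SAMPLE)
for seq, data in index.items():
    print(seq, data["kanji"], data["readings"])
```

With an identifier like this, the two 方 entries are distinguished by their sequence numbers even though they share a kanji form.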

Owner

mvysny commented May 16, 2015

There are lots of dictionaries in the old EDICT format (e.g. Buddha Term Dictionary, Car dictionary, etc.) which do not contain the 'ent_seq' parameter. Also, current notepad entries do not contain such an ID, so you would have to clear the notepad and populate it again. So unfortunately, although the ent_seq idea is very good, I believe it cannot be used.

eevman commented May 16, 2015

What about the following: if the dictionary doesn't have the 'ent_seq' parameter, compare kanji and pronunciation. If the dictionary does have 'ent_seq', use it if it is stored in the notepad entry (for newly added entries); otherwise fall back to comparing kanji and pronunciation.

Since notepad entries currently don't contain 'ent_seq', there could be several options to provide this field. There could be an option to start an automatic process that compares the data stored in notepad entries with the data from the dictionary and, only if everything matches (I think like the method used in version 3.16), saves 'ent_seq' to each notepad entry. This would probably fill in quite a lot of entries at the moment. Entries which are not matched could be marked somehow (an icon, if enabled in options?), and there could be another option to provide this information manually for people who need it.

This might be too complicated for a regular user, but I suppose there would be some people who would use this (well, at least one :-) ).
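
A rough Python sketch of this two-tier idea, under stated assumptions. The `Entry` type, `same_entry`, and `backfill_ent_seq` are hypothetical names, the `ent_seq` values are made up, and the extra reading on the dictionary's "kata" entry is invented to simulate a notepad copy that has gone stale. "Everything matches" is reduced here to dictionary, kanji and readings.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry shape; ent_seq is None when it was never stored."""
    dictionary: str
    kanji: frozenset
    readings: frozenset
    ent_seq: Optional[int] = None


def same_entry(saved, found):
    """Use ent_seq when both sides have it, else fall back to kanji + reading."""
    if saved.dictionary != found.dictionary:
        return False
    if saved.ent_seq is not None and found.ent_seq is not None:
        return saved.ent_seq == found.ent_seq
    return bool(saved.kanji & found.kanji) and bool(saved.readings & found.readings)


def backfill_ent_seq(notepad, dictionary_entries):
    """Copy ent_seq into notepad entries that match exactly one dictionary entry.

    Returns (resolved, unmatched); unmatched entries are left for manual review.
    """
    resolved, unmatched = [], []
    for saved in notepad:
        if saved.ent_seq is not None:
            resolved.append(saved)
            continue
        exact = [
            d for d in dictionary_entries
            if d.ent_seq is not None
            and d.dictionary == saved.dictionary
            and d.kanji == saved.kanji
            and d.readings == saved.readings
        ]
        if len(exact) == 1:
            resolved.append(replace(saved, ent_seq=exact[0].ent_seq))
        else:
            unmatched.append(saved)
    return resolved, unmatched


hou_dict = Entry("JMDict", frozenset({"方"}), frozenset({"ほう"}), 9000001)
# Hypothetical: the dictionary gained a reading after the word was saved.
kata_dict = Entry("JMDict", frozenset({"方"}), frozenset({"かた", "がた"}), 9000002)

saved_hou = Entry("JMDict", frozenset({"方"}), frozenset({"ほう"}))
saved_kata = Entry("JMDict", frozenset({"方"}), frozenset({"かた"}))

resolved, unmatched = backfill_ent_seq([saved_hou, saved_kata], [hou_dict, kata_dict])
print([e.ent_seq for e in resolved])  # entries that received an ent_seq
print(unmatched)                      # entries that would need the manual option
```

In this sketch the stale "kata" copy stays unmatched by the automatic pass, yet `same_entry` still recognises it through the kanji + reading fallback.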

Owner

mvysny commented May 17, 2015

Hmm, that sounds complicated ;) I am going to release a quick-fix version, Aedict 3.20. How about I implement the 'easy' solution (entries match if they share at least one common kanji AND one common reading) and we see how this works in real life?

eevman commented May 17, 2015

Well, that's certainly better than the current solution... Don't forget to also match the dictionary.

Owner

mvysny commented May 18, 2015

Thanks, I will do that. Closing this bug; please feel free to post if you find more incorrectly matched entries.

@mvysny mvysny closed this May 18, 2015

eevman commented May 18, 2015

In version 3.20 there is no longer a problem with 方. However, there is still cross-dictionary matching, so words from different dictionaries are shown with the notepad icon even though only one entry has been added to the notepad.
Additionally, search for "akagire" (this has an additional reading of "hibi"). You should get one entry in JMDict. Add it to the notepad. Then search for "hibi" (in both JMDict and the Life Sciences/Bio-Medical Dictionary). You should see 5 results, two of them marked with the notepad icon. The two marked items don't have the same "kanji" (well, one has no kanji at all). If you do the opposite (so instead of adding akagire to the notepad, you add the "hibi - crack" entry from Life Sciences) and search for "hibi" again, you will see all 5 entries marked with the notepad icon.

Owner

mvysny commented May 19, 2015

Thanks, you're right - I will additionally match the dictionaries.

@mvysny mvysny reopened this May 19, 2015

eevman commented May 19, 2015

Just to avoid misunderstandings: apart from matching the dictionaries, there is the additional problem of falsely matching entries which don't have kanji (only kana), e.g. the above-mentioned "akagire" - "hibi".

Owner

mvysny commented May 20, 2015

Thanks! I have implemented a stricter version of the matching algorithm which correctly no longer matches akagire from JMDict and hibi from Life Sciences. (I have implemented the dictionary origin matching, but the algorithm also got stricter and would not match akagire/hibi even if they originated from the same dictionary.) Please wait for Aedict 3.21 to be released, then please let me know if there are still more entries which match incorrectly.
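
The thread does not show the actual 3.21 algorithm. The following Python sketch is only one possible rule that is consistent with the behaviour described: same dictionary required, a shared reading required, and an entry with kanji never matching a kana-only entry. The `Entry` type, the function name, and the sample data are assumptions for illustration, not copied from the dictionary files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry shape used only for this sketch."""
    dictionary: str
    kanji: frozenset
    readings: frozenset


def same_entry_strict(a, b):
    """One possible stricter rule; not necessarily what Aedict 3.21 does."""
    if a.dictionary != b.dictionary:
        return False                      # never match across dictionaries
    if not (a.readings & b.readings):
        return False                      # need at least one common reading
    if not a.kanji and not b.kanji:
        return True                       # both kana-only: readings decide
    return bool(a.kanji & b.kanji)        # otherwise need a common kanji


# Illustrative data for the akagire / hibi case.
akagire = Entry("JMDict", frozenset({"皸"}), frozenset({"あかぎれ", "ひび"}))
hibi_life = Entry("Life Sciences", frozenset(), frozenset({"ひび"}))
# Hypothetical kana-only "hibi" entry placed in the same dictionary as akagire.
hibi_same_dict = Entry("JMDict", frozenset(), frozenset({"ひび"}))

print(same_entry_strict(akagire, hibi_life))       # False: different dictionaries
print(same_entry_strict(akagire, hibi_same_dict))  # False: kanji vs. kana-only
```

The second check is the one that matters for kana-only entries: because an empty kanji set shares nothing with a non-empty one, the match fails even within one dictionary.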

@mvysny mvysny closed this May 20, 2015
