Example sentences showing all occurrences of kanji when looking for single-kanji expressions #809
Thanks for letting me know, this definitely looks like a bug. Let me try to find some ways to improve accuracy. |
Fixed. It will now find the following sentences (and 1000 others): 悪ふざけはほどほどにしろ。: わるふざけはほどほど に しろ。 Fixed in Aedict 3.46 |
I really appreciate your attention to this. However, I don't think the update gets at the root of the matter. I think the main issue is that it's pulling examples from every occurrence of the characters ほど. It doesn't look like 3.46 is available yet, but the issue I mentioned is apparent in the fix you posted as well. ほどほどに and ほど use the same characters, but their meaning and usage are quite different. They each have their own dictionary entry, yet the example sentences for one are displayed when searching for the other.
Also, I didn't mention this in my original post, but I noticed that the example sentences that appear under the actual full dictionary entry/definition appear to pull from every instance of the kanji 程 (even if I search for the reading ほど), making the issue much more apparent. If I take the time to do a new Word Search for ほど and check Examples, it displays only occurrences of ほど (not displaying words like 程度, but including words like ほどほどに since they share the same kana). Usually when I'm looking for examples, it's after I've searched for the definition and I'm already looking at the dictionary entry, so the example sentences I'm seeing usually include any occurrence of the kanji if it's a single-kanji expression.
If there's a way to make the example sentences under the dictionary entry show only occurrences of a kanji's actual reading (ほど in this case) rather than every example sentence associated with the kanji 程 (程度, 日程, 過程, etc.), that would help alleviate the issue. It would still display examples that share the same kana (in this case, searching for ほど would still display examples for ほどほどに), though those instances would be far less common.
I'm sure the solution (if there even is one) is very complex and would probably require more than one overworked programmer to find, so please don't consider this a complaint of any kind. It's just a slight hiccup I noticed, and I think resolving it would greatly improve an already awesome app :) |
Yup, 3.46 is not out yet; I plan to fix some other things as well before releasing that. I humbly disagree: if you look up 悪ふざけはほどほどにしろ on Aedict Online, you can see in the sentence breakdown that the kanji forms of ほどほど are indeed 程々, 程ほど, 程程, so from my point of view this sentence is eligible to be included as an example sentence for 程/ほど. You're right, it's ほどほど, not just ほど - should I try to filter this out as well?
Yup, the search was very simple and only looked up 程; that was incorrect. Now the search looks for 程+ほど, so it should filter out 程度, 日程, 過程 and others. It will still include 程々, 程ほど, 程程, so please let me know whether that is a problem. |
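The difference between looking up 程 alone and looking up 程+ほど can be sketched roughly as follows. This is only an illustration, not Aedict's actual code: the `Token` structure, the function names, and the substring-based matching are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Token:
    """One word of a sentence breakdown (hypothetical structure)."""
    writings: Tuple[str, ...]  # kanji writings of the matched dictionary entry
    reading: str               # kana reading of that entry


def matches_kanji_only(tokens: Sequence[Token], kanji: str) -> bool:
    """Old behaviour: any word written with the kanji counts as a hit."""
    return any(kanji in w for t in tokens for w in t.writings)


def matches_kanji_and_reading(
    tokens: Sequence[Token], kanji: str, reading: str
) -> bool:
    """New behaviour: one word must carry both the kanji and the reading."""
    return any(
        reading in t.reading and any(kanji in w for w in t.writings)
        for t in tokens
    )


# Hand-written breakdowns for illustration (only the relevant word is listed).
teido = [Token(("程度",), "ていど")]
hodohodo = [Token(("程々", "程ほど", "程程"), "ほどほど")]
hodo = [Token(("程",), "ほど")]
```

Under these assumptions, `matches_kanji_only` accepts all three breakdowns, while `matches_kanji_and_reading(..., "程", "ほど")` rejects 程度 (read ていど) and still accepts both ほど and ほどほど, which is the remaining case discussed in this thread.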
Well, I'd argue that even though both expressions share the same kanji that is read as ほど, the meaning and usage are still very different. 程 is categorized as a common adverbial noun, and ほどほど is a rare no-adjective. It should certainly be included in the "Buddies" tab, which it is, but if someone is looking for example sentences to clarify the appropriate usage of ほど, I think including ほどほど muddles things up a bit. It'd be like an English dictionary including example sentences for "graduate" under the entry for "graduated cylinder"...they share the same root word (English "kanji" for all intents and purposes), which is useful reference information, but including example sentences for "graduate" under the entry for "graduated cylinder" wouldn't make much sense.
That's great news! If it's applied to search results beyond this specific case, as I think you're saying, it would solve 90% of the issue for me (100% of the problem initially mentioned in my post). If there were a solution or option that would filter out entries like ほどほど as well, I would opt for that for the reasons I mentioned earlier, though I think your update is a huge improvement and I'd be totally happy with that alone. Thanks a ton! お疲れ様でした。 |
Happy to help! The "90%" solution is implemented in Aedict 3.46; please upgrade and let me know if it helped.
Regarding the 100% solution: I wonder if I can filter out example sentences where the word is not used in that exact form (unfortunately I have no information about a word in a sentence being, say, an adverbial noun. I can auto-reconstruct this information by doing a JMDict lookup, but it's not 100% accurate). But I wonder whether that would also filter out useful examples where the word is used in a slightly modified form (say, irregularly inflected or otherwise). Hard to say.
Your example of "graduate" vs "graduated cylinder" is very illustrative, thanks! The difference between "graduate" and "graduated" is more of a difference in semantics, and that's something a simple automated algorithm can't differentiate. It would require either a hand-selected set of examples (which we don't have) or a proper analysis of all example sentences by a human (which is definitely missing from Tatoeba - the analysis is done by Aedict, using a very simple matching algorithm, and is thus often inaccurate). As you can see, we're dealing with imperfect data here. I believe that requiring a 100% match may do more harm than good. |
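The trade-off of an exact-form filter can be illustrated with a small sketch. It is a hedged example only: the function name `strict_match`, the hand-made word segmentation, and the 食べる sentence are assumptions introduced here, not anything taken from Aedict or Tatoeba.

```python
from typing import Sequence


def strict_match(surfaces: Sequence[str], writing: str, reading: str) -> bool:
    """True only if some word appears exactly as the searched writing or reading."""
    return any(s in (writing, reading) for s in surfaces)


# Hypothetical, hand-segmented sentences (word surfaces as written).
hodohodo_sentence = ["悪ふざけ", "は", "ほどほど", "に", "しろ"]
hodo_sentence = ["これ", "ほど", "難しい"]
inflected_sentence = ["寿司", "を", "食べた"]  # 食べた is an inflected form of 食べる
```

With this strict rule, a search for 程/ほど rejects the ほどほど sentence and keeps the plain ほど one, which is the desired "100%" behaviour. The same rule also rejects 食べた when searching for 食べる, which is the kind of useful example an exact match would lose.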
I've run into this several times and tried looking for a fix in the settings but can't seem to find anything. Inclusion of the Tatoeba Examples is a great feature, but it really backfires when looking for example sentences of a one-kanji expression. For example, I was searching for example sentences using 程(ほど), but the vast majority of example sentences listed are for 程度(ていど). In the past when I had this issue, the word I was looking for usually outnumbered the unrelated ones, but sometimes, like this case, scrolling through example sentences unrelated to my search is a bit tedious. An option to display example sentences only with occurrences identical to the search would be really useful. Apologies if there's already a workaround for this, but I haven't found it yet. Thanks!