Would it be possible to highlight words that you already "know" (i.e. they exist in the sort field of an Anki card)? The program already recognizes whether a word exists when you open the dictionary, but I suspect the hard part would be deciding which letter/mora to "hover over" in order to check. Maybe it could check all the different combinations and pick the longest result (in letters/mora), with the user able to correct the selection?
I don't know how useful this would be, though for someone at the early stages of language learning, being able to recognize i+1 sentences at a glance could make the process easier.
The major problem to solve is subtitle tokenization.
This can be done quickly and easily with MeCab. The issue with relying only on MeCab's results is that it tokenizes based solely on the data in ipadic. That isn't necessarily going to line up with what is actually available in a user's dictionaries. For example, JMdict contains a lot of definitions for phrases that MeCab likely won't consider a single token.
The alternative to MeCab would be writing a tokenizer that's aware of the user's dictionaries. A simple algorithm would be: for each character in the subtitle, create a token for every possible substring starting at that character, then highlight all the matches. This is O(n^2) in searches alone, which is expensive since each search goes out to disk and to Anki to get a result. If subtitles are on screen for only a second or two, there's no guarantee you'd even get a result back in time unless you're preloading results.
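To make the cost concrete, here is a minimal Python sketch of that naive approach. It is only an illustration, not Memento's code: `find_known_spans` and `longest_non_overlapping` are hypothetical names, and the in-memory `known_terms` set stands in for what would really be a disk or Anki lookup per candidate. The second function is one possible reading of the "decide on the longest result" idea from the original question.

```python
def find_known_spans(subtitle, known_terms):
    """Check every substring of the subtitle against the known terms.

    Returns (spans, lookups), where spans is a list of (start, end, term)
    and lookups counts how many searches were performed.
    """
    spans = []
    lookups = 0
    n = len(subtitle)
    for start in range(n):
        for end in range(start + 1, n + 1):
            candidate = subtitle[start:end]
            lookups += 1  # in practice, each of these would hit disk or Anki
            if candidate in known_terms:
                spans.append((start, end, candidate))
    return spans, lookups


def longest_non_overlapping(spans):
    """Greedily keep the longest match at each position, skipping overlaps."""
    chosen = []
    cursor = 0
    # Sort by start position, then by descending length.
    for start, end, term in sorted(spans, key=lambda s: (s[0], s[0] - s[1])):
        if start >= cursor:
            chosen.append((start, end, term))
            cursor = end
    return chosen


# Assumed sample data: a set standing in for the Anki sort-field contents.
known_terms = {"日本", "日本語", "語", "勉強", "勉強する"}
subtitle = "日本語を勉強する"

spans, lookups = find_known_spans(subtitle, known_terms)
highlights = longest_non_overlapping(spans)
```

For this 8-character subtitle the loop performs n(n+1)/2 = 36 lookups, which is where the quadratic cost comes from. Capping the maximum candidate length would reduce the number of lookups, at the risk of missing long phrases.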
The other question I have is: what is the utility of all this? If you search a word, it's likely because you didn't know it or didn't remember it. Knowing you have a card for the term before you even search doesn't really move the needle, in my opinion, since Memento is not an SRS program.
Sorry for the half-posted comment originally. I accidentally pressed Ctrl+Enter, which GitHub takes as "publish my in-progress comment".
I see, thank you for the very thoughtful answer. Sounds like too much work without a whole lot of reward. I'm still at the start of my language learning journey so indeed, the utility might be a lot lower than what I expected.