Underline known words "migaku-style" using anki as a database #169

AxillV · 2023-07-11T16:29:36Z

Would highlighting words that you already "know" (they exist in the sort field of an Anki card) be possible? The program already recognizes whether a word exists when you open the dictionary, but I suspect the hard part would be deciding which letter/mora to "hover over" in order to check. Maybe it could check all the different combinations and pick the longest result (in letters/mora), with the user having the ability to correct the selection?

I don't know how useful this would be, but at the early stages of language learning, being able to recognize i+1 sentences at a glance could make the process easier.

ripose-jp · 2023-07-12T03:58:14Z

The major problem to solve is subtitle tokenization.

This can be done quickly and easily with MeCab. The issue with relying only on MeCab's results is that it tokenizes based solely on the data in ipadic. This isn't necessarily going to line up with what is actually available in a user's dictionary. For example, JMdict contains a lot of definitions for phrases which MeCab likely won't consider a single token.

The alternative to MeCab would be writing a tokenizer that's aware of the user's dictionaries. A simple algorithm would be: for each character in the subtitle, create a token for every possible substring starting from that character, then highlight all the matches. This is O(n^2) in searches alone, which is expensive since each search goes out to disk and to Anki in order to get a result. If subtitles are on the screen for only a second or two, there's no guarantee that you even get a result back in time unless you're preloading results.
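To make the cost concrete, here is a minimal sketch of that substring approach in Python. It is not Memento's code: it assumes the known words have already been preloaded into an in-memory set (the hypothetical `known_words`), so each lookup is a set membership test rather than a round trip to disk or Anki, and it ignores deinflection entirely. `all_matches` and `longest_matches` are names made up for this illustration; the second one follows the "pick the longest result" idea from the original question.

```python
def all_matches(subtitle: str, known_words: set[str]) -> list[tuple[int, int]]:
    """Return (start, end) spans for every substring found in known_words.

    This performs one lookup per substring, i.e. O(n^2) lookups for a
    subtitle of n characters.
    """
    matches = []
    n = len(subtitle)
    for start in range(n):
        for end in range(start + 1, n + 1):
            if subtitle[start:end] in known_words:
                matches.append((start, end))
    return matches


def longest_matches(subtitle: str, known_words: set[str]) -> list[tuple[int, int]]:
    """Keep only the longest match per start position, skipping overlaps."""
    # Longest end index seen for each start index.
    best: dict[int, int] = {}
    for start, end in all_matches(subtitle, known_words):
        best[start] = max(end, best.get(start, 0))

    spans = []
    cursor = 0  # first character not yet covered by a chosen span
    for start in sorted(best):
        if start >= cursor:
            spans.append((start, best[start]))
            cursor = best[start]
    return spans


# Hypothetical data: in practice known_words would come from the sort
# field of the user's Anki cards.
known_words = {"猫", "好", "好き"}
subtitle = "猫が好きです"
spans = longest_matches(subtitle, known_words)
highlighted = [subtitle[s:e] for s, e in spans]
```

With a preloaded set the lookups themselves are cheap, so the quadratic number of substrings matters much less than it would if every lookup were a separate query.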

The other question I have is what the utility of all this is. If you search a word, it's likely because you didn't know it or didn't remember it. Knowing you have a card for the term before you even search doesn't really move the needle in my opinion, since Memento is not an SRS program.

Sorry for the half-posted comment originally. I accidentally pressed Ctrl+Enter, which GitHub takes as "publish my in-progress comment".

AxillV · 2023-07-12T15:03:02Z

I see, thank you for the very thoughtful answer. Sounds like too much work without a whole lot of reward. I'm still at the start of my language learning journey so indeed, the utility might be a lot lower than what I expected.

(Sorry for the (re)opening spam.)

AxillV closed this as completed Jul 12, 2023

AxillV reopened this Jul 12, 2023

AxillV closed this as completed Jul 12, 2023

ripose-jp added the enhancement and wontfix labels Jul 13, 2023
