Rework kanji/reading association using RegEx to address 息抜き bug #24

ahlec · 2022-10-20T04:10:40Z

This will close #23.

There was a bug with generating readings for 息抜き under the current kanji/reading association algorithm. This is because the reading is いきぬき, and when it goes character by character and arrives at the first き, it uses kanji.index and detects the き at the end of the string and skips ahead to there, believing that 息抜 together receives the reading of い. It then runs out of characters in the kanji and crashes.

For this PR, I've rewritten the association code using regular expressions. What was important was to have an algorithm that had a full view of the entire string — one that would realize there's a second き in the reading.

The idea here is that we take the kanji (息抜き) and convert this into a regular expression. We want the plugin to only generate furigana for kanji and not kana, so this regular expression helps us detect what "holes" should have furigana and which ones should.

kanji (息抜き) becomes → ^(.+?)き$

We then apply this regular expression to the reading (いきぬき), which results a groups match of [ "いきぬ" ]. We then use the Kanji to piece it all back together in the original order, reading from the regular expression match whenever we're replacing a (.+?).

I've added the example sentence from the bug report as a unit test, and ensured that all existing unit tests continue to pass. I've also run it through more cards in my personal deck and found no issues with this algorithm yet.

I've tested in both Anki 2.1.54 and Anki 2.1.49.

obynio · 2022-10-20T15:17:20Z

Cool work, I never stumbled upon this bug during my learning lessons for the moment but I can reproduce it using your examples ! I'll merge it and deploy everything ! Thanks 👍

Rework kanji/reading association using RegEx to address 息抜き bug

5e44fe4

obynio merged commit 07fbea6 into master Oct 20, 2022

obynio deleted the ikinuki branch October 20, 2022 15:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework kanji/reading association using RegEx to address 息抜き bug #24

Rework kanji/reading association using RegEx to address 息抜き bug #24

ahlec commented Oct 20, 2022

obynio commented Oct 20, 2022

Rework kanji/reading association using RegEx to address 息抜き bug #24

Rework kanji/reading association using RegEx to address 息抜き bug #24

Conversation

ahlec commented Oct 20, 2022

obynio commented Oct 20, 2022