Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework kanji/reading association using RegEx to address 息抜き bug #24

Merged
merged 1 commit into from
Oct 20, 2022

Conversation

ahlec
Copy link
Collaborator

@ahlec ahlec commented Oct 20, 2022

This will close #23.

There was a bug with generating readings for 息抜き under the current kanji/reading association algorithm. This is because the reading is いきぬき, and when it goes character by character and arrives at the first き, it uses kanji.index and detects the き at the end of the string and skips ahead to there, believing that 息抜 together receives the reading of . It then runs out of characters in the kanji and crashes.

For this PR, I've rewritten the association code using regular expressions. What was important was to have an algorithm that had a full view of the entire string — one that would realize there's a second き in the reading.

The idea here is that we take the kanji (息抜き) and convert this into a regular expression. We want the plugin to only generate furigana for kanji and not kana, so this regular expression helps us detect what "holes" should have furigana and which ones should.

kanji (息抜き) becomes → ^(.+?)き$

We then apply this regular expression to the reading (いきぬき), which results a groups match of [ "いきぬ" ]. We then use the Kanji to piece it all back together in the original order, reading from the regular expression match whenever we're replacing a (.+?).

I've added the example sentence from the bug report as a unit test, and ensured that all existing unit tests continue to pass. I've also run it through more cards in my personal deck and found no issues with this algorithm yet.

I've tested in both Anki 2.1.54 and Anki 2.1.49.

@obynio
Copy link
Owner

obynio commented Oct 20, 2022

Cool work, I never stumbled upon this bug during my learning lessons for the moment but I can reproduce it using your examples ! I'll merge it and deploy everything ! Thanks 👍

@obynio obynio merged commit 07fbea6 into master Oct 20, 2022
@obynio obynio deleted the ikinuki branch October 20, 2022 15:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error while adding readings involving mixed kana/kanji
2 participants