Rework kanji/reading association using RegEx to address 息抜き bug #24
This will close #23.
There was a bug with generating readings for 息抜き under the current kanji/reading association algorithm. The reading is `いきぬき`, and when the algorithm goes character by character and arrives at the first き, it uses `kanji.index`, finds the き at the end of the kanji string, and skips ahead to there, believing that `息抜` together receives the reading `い`. It then runs out of characters in the kanji and crashes.

For this PR, I've rewritten the association code using regular expressions. What was important was to have an algorithm with a full view of the entire string, one that would realize there's a second き in the reading.
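To make the failure concrete, here is a hypothetical, simplified reconstruction of an index-based walk in Python. It is not the plugin's original code: `naive_associate` and its return shape (a list of `(text, reading)` pairs, with `None` for kana) are assumptions made for illustration. Because each lookup only considers the current reading character, the first き in `いきぬき` gets paired with the only き in `息抜き`, and the leftover `ぬき` has no kanji left to attach to.

```python
from typing import List, Optional, Tuple


def naive_associate(kanji: str, reading: str) -> List[Tuple[str, Optional[str]]]:
    """Hypothetical index-based association, shown only to illustrate the bug."""
    segments: List[Tuple[str, Optional[str]]] = []
    k = 0  # current position in the kanji string
    pending = ""  # reading characters not yet assigned to any kanji
    for ch in reading:
        if ch in kanji[k:]:
            # Jump to the first matching kana in the rest of the kanji string;
            # everything skipped over receives the pending reading.
            j = kanji.index(ch, k)
            if j > k:
                segments.append((kanji[k:j], pending))
            segments.append((ch, None))
            k = j + 1
            pending = ""
        else:
            pending += ch
    if pending:
        if k >= len(kanji):
            # No kanji left to receive the leftover reading: this is the crash.
            raise IndexError(f"no kanji left for reading {pending!r}")
        segments.append((kanji[k:], pending))
    return segments
```

In this sketch an ordinary word such as 食べる / たべる still works (`[("食", "た"), ("べ", None), ("る", None)]`); it only fails when a kanji's reading itself contains a kana that also appears later as okurigana, as in `いきぬき`.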
The idea here is that we take the `kanji` (息抜き) and convert it into a regular expression. We want the plugin to generate furigana only for kanji and not for kana, so this regular expression tells us which "holes" should get furigana and which ones shouldn't. `kanji` (息抜き) becomes `^(.+?)き$`.
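A minimal sketch of that conversion in Python, assuming that "kanji" means the CJK Unified Ideographs block (U+4E00 to U+9FFF) plus the iteration mark 々. The names `is_kanji` and `kanji_to_regex` are hypothetical, not the plugin's. Each run of consecutive kanji collapses into one lazy `(.+?)` hole, and every other character is matched literally.

```python
import re


def is_kanji(ch: str) -> bool:
    # Assumption: CJK Unified Ideographs plus the iteration mark 々 count as kanji.
    return "\u4e00" <= ch <= "\u9fff" or ch == "\u3005"


def kanji_to_regex(kanji: str) -> str:
    """Turn e.g. '息抜き' into '^(.+?)き$': one lazy group per kanji run."""
    parts = ["^"]
    in_run = False
    for ch in kanji:
        if is_kanji(ch):
            if not in_run:
                # Start of a new kanji run: open exactly one hole for it.
                parts.append("(.+?)")
                in_run = True
        else:
            # Kana (or anything else) must appear verbatim in the reading.
            parts.append(re.escape(ch))
            in_run = False
    parts.append("$")
    return "".join(parts)
```

With Python's `re` module, `re.match(kanji_to_regex("息抜き"), "いきぬき").groups()` returns `('いきぬ',)`, which corresponds to the `[ "いきぬ" ]` match for the `reading`.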
We then apply this regular expression to the `reading` (いきぬき), which results in a group match of `[ "いきぬ" ]`. We then use the `kanji` to piece it all back together in the original order, reading from the regular expression match whenever we're replacing a `(.+?)`.

I've added the example sentence from the bug report as a unit test, and ensured that all existing unit tests continue to pass. I've also run it through more cards in my personal deck and haven't found any issues with this algorithm yet.
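For reference, the match-and-reassemble step can be sketched end to end in Python, checked against the 息抜き case from the bug report. This is a hypothetical illustration rather than the plugin's actual code: `split_runs`, `associate`, the kanji range in `is_kanji`, and the list-of-segments return shape are all assumptions.

```python
import re
from typing import List, Optional, Tuple

Segment = Tuple[str, Optional[str]]  # (surface text, reading, or None for kana)


def is_kanji(ch: str) -> bool:
    # Assumption: CJK Unified Ideographs plus the iteration mark 々 count as kanji.
    return "\u4e00" <= ch <= "\u9fff" or ch == "\u3005"


def split_runs(kanji: str) -> List[Tuple[str, bool]]:
    """Group the string into runs of consecutive kanji / non-kanji characters."""
    runs: List[Tuple[str, bool]] = []
    for ch in kanji:
        flag = is_kanji(ch)
        if runs and runs[-1][1] == flag:
            runs[-1] = (runs[-1][0] + ch, flag)
        else:
            runs.append((ch, flag))
    return runs


def associate(kanji: str, reading: str) -> List[Segment]:
    runs = split_runs(kanji)
    # One lazy hole per kanji run; kana are matched literally.
    pattern = "".join("(.+?)" if flag else re.escape(text) for text, flag in runs)
    # fullmatch anchors both ends (same effect as ^...$), so the regex engine
    # must account for the entire reading, including the second き.
    match = re.fullmatch(pattern, reading)
    if match is None:
        raise ValueError(f"reading {reading!r} does not fit {kanji!r}")
    holes = iter(match.groups())
    # Rebuild in the original order, taking the next group for each kanji run.
    return [(text, next(holes) if flag else None) for text, flag in runs]
```

Because the pattern is anchored at both ends, each lazy hole takes the shortest reading that still lets the rest of the pattern match, and backtracking extends it when needed. That is what lets `(.+?)き` skip past the first き: `associate("息抜き", "いきぬき")` returns `[("息抜", "いきぬ"), ("き", None)]`.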
I've tested in both Anki 2.1.54 and Anki 2.1.49.