Add support for small ヵ/ヶ being read as large か in words #28

ahlec · 2023-02-20T20:03:42Z

This closes #27.

Currently, if you try to generate furigana for a word like 一ヵ月 or 二ヶ国, the plugin throws an exception about not being able to match the Regex group. The reason is that the readings for both of these words have "か" (full-sized) where the text has ヵ or ヶ (small). Even after we convert these to hiragana (ゕ and ゖ), they don't match the full-sized character. When we process these two characters specifically, we need to register that they should match both their own hiragana forms (ゕ and ゖ) and full-sized か.

In this PR, I rework part of the kanjiToRegex function for the case where we're processing a regular kana character. If the single hiragana/katakana character we're looking at has additional possible readings beyond its own, then instead of producing a string literal within the Regex, we'll now produce a Regex capture group.

Given kanjiToRegex("ヶ月"):
- Before: ^ゖ(.+?)$
- After: ^(ゖ|か)(.+?)$
- In both cases, the regular expression is then matched against reading = "かげつ" (from MeCab)
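
To make the difference concrete, here is a minimal sketch that runs the two patterns listed above against the MeCab reading using Python's standard `re` module. The variable names are only illustrative and are not taken from the plugin.

```python
import re

# Reading reported for ヶ月, as quoted above.
reading = "かげつ"

# Pattern produced before this PR: the small kana is emitted as a literal.
before = re.compile(r"^ゖ(.+?)$")
# Pattern produced after this PR: the small kana becomes a group of alternatives.
after = re.compile(r"^(ゖ|か)(.+?)$")

before_match = before.match(reading)  # None: ゖ never appears in the reading
after_match = after.match(reading)    # matches via the か alternative

if after_match is not None:
    kana_reading, kanji_reading = after_match.groups()
    print(kana_reading, kanji_reading)
```

The first group captures how the small kana was actually read, which is what allows furigana to be attached to it.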

If we don't have additional readings, we'll continue to go down the regular pathway, where we just output the hiragana directly.

Given kanjiToRegex("ローマ字"):
- Before: ^ろーま(.+?)$
- After: ^ろーま(.+?)$ (no change)
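
The two pathways can be sketched roughly as follows. This is not the plugin's actual code: `ALTERNATE_READINGS`, `katakana_to_hiragana`, `is_kana`, and `sketch_to_regex` are hypothetical names, and collapsing consecutive kanji into a single `(.+?)` group is an assumption made for illustration.

```python
# Hypothetical table: hiragana form of a small kana -> extra readings it may take.
ALTERNATE_READINGS = {"ゕ": ["か"], "ゖ": ["か"]}


def katakana_to_hiragana(text):
    # Katakana ァ..ヶ (U+30A1..U+30F6) sit 0x60 above their hiragana counterparts.
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)


def is_kana(ch):
    # Hiragana block plus the long-vowel mark, which is kept as-is.
    return "ぁ" <= ch <= "ゖ" or ch == "ー"


def sketch_to_regex(text):
    parts = []
    prev_was_kanji = False
    for ch in katakana_to_hiragana(text):
        if is_kana(ch):
            extras = ALTERNATE_READINGS.get(ch)
            if extras:
                # Kana with additional readings: emit a group of alternatives.
                parts.append("(" + "|".join([ch] + extras) + ")")
            else:
                # Regular pathway: emit the hiragana as a literal.
                parts.append(ch)
            prev_was_kanji = False
        elif not prev_was_kanji:
            # Assumption: a run of kanji becomes one lazy capture group.
            parts.append("(.+?)")
            prev_was_kanji = True
    return "^" + "".join(parts) + "$"


print(sketch_to_regex("ヶ月"))
print(sketch_to_regex("ローマ字"))
```

Keeping the alternatives in a lookup table means further special cases could be added without touching the loop, though whether the plugin is structured this way is not stated here.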

In the case where we have ヵ and ヶ, I've chosen to include furigana readings for them. I did this because Jisho includes readings in this situation, and because one character is being read as a different character; this can easily trip up beginner learners of Japanese, as it's very non-standard.

I've added unit tests to track all of this and prevent regressions.

I've tested this change in both Anki 2.1.54 and Anki 2.1.49 (the version prior to the Python 2.10 bump).

obynio · 2023-02-21T11:43:16Z

Seems good to me, I'll deploy this change asap

obynio · 2023-02-21T11:49:44Z

This change has been released in version 1.4.2

Add support for small ヵ/ヶ being read as large か in words

455114c

obynio merged commit 545631b into master Feb 21, 2023

obynio deleted the ahlec/kagetsu branch February 21, 2023 11:43
