Fix same Kanji character not being replaced twice #10

inoueakimitsu · 2022-06-16T16:25:28Z

Some kanji characters that have already been furiganized are matched again by the current regular expression. This fix ensures that Kanji characters inside <ruby> tags are not replaced again. To do so, the regular expression used for the search is modified.

This relates to #9.

Sort the tagged array so that furigana is added to longer Kanji sequences first.

Allow the function that gives priority to consecutive Kanji characters to be toggled ON/OFF from options_ui.

inoueakimitsu · 2022-06-19T07:00:18Z

To add an option that avoids splitting consecutive Kanji characters, the following two changes have been implemented:

- Pre-sorting of the tagger output so that longer kanji candidates are processed first
- A new option, prevent_splitting_consecutive_kanjis, added to options_ui
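The pre-sorting step can be sketched roughly as follows. This is only an illustration: the `tagged` array, its `surface`/`reading` fields, and the `sortLongestFirst` helper are assumptions made for the example and may not match the shape of the actual tagger output or the code in this PR.

```javascript
// Hypothetical tagger output; the real result may have a different shape.
const tagged = [
  { surface: "作", reading: "つく" },
  { surface: "飲食店", reading: "いんしょくてん" },
  { surface: "作成", reading: "さくせい" },
  { surface: "食", reading: "た" },
];

// Return a copy ordered by surface length, longest first, so that "作成" is
// furiganized before the shorter "作" gets a chance to split it.
// Spreading the string counts code points rather than UTF-16 code units.
function sortLongestFirst(entries) {
  return [...entries].sort(
    (a, b) => [...b.surface].length - [...a.surface].length
  );
}

const ordered = sortLongestFirst(tagged).map((entry) => entry.surface);
console.log(ordered);
```

Sorting a copy keeps the original tagger output untouched, which makes it easy to skip the step when prevent_splitting_consecutive_kanjis is turned off.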

The example sentences used for testing are as follows:

作成します。作ります。
作ります。作成します。
食べる。飲食店。
飲食店。食べる。
飲食店飲食店飲食店飲食店飲食店食べる。飲食店。食べる。飲食店。食べる。飲食店。食べる。飲食店。
お茶について問う問題。「とうもんだい」と読んでください。
問題を問う。「もんだいをとう」と読んでください。

This relates to #9.

kuanyui · 2022-06-19T11:22:06Z

Oh my God, thanks for your PR and your hard work! I'm currently investigating and testing it.

I'm also adding more example sentences for testing:

I looked on NHK's Twitter for sentences that cause "multiple furigana in the same kanji/tango".
~~but I don't exactly know why, after I copied and pasted them into a GitHub Issue, Furiganaize works correctly on most of them...~~
=> I guess it's because the syntax analyzer (igo.js) may give multiple potentially correct analysis results for the same long sentence.

栃木那須町の養豚場で豚舎3棟ほぼ全焼豚2000頭が死んだか
水泳授業 3年ぶりの本格再開水着も泳ぐ場所も変化が…
体に帯状に出る発疹「帯状疱疹」 (the readings differ)
小中学生の子どもたちも春休み中でしたが外出できず、ずっと自宅で子育てしなくてはいけません。
司法取引が適用無罪主張の元社長業務上横領の罪で実刑確定へ
ロシア海軍の7隻伊豆諸島付近を通過し南西方向に海自が確認
【円安加速の懸念に…】欧米の中央銀行が相次いで金融引き締めに動く中、日銀は大規模な緩和策の維持を決定
夕方のラッシュ時間帯、列車内の方もいらっしゃると思います。
強風による運転見合わせや徐行が相次いでいることから、ＪＲ西日本は最新の強風予測システムを活用して運転見合わせなどの回数を減らすための検証を行うことになりました。
自動車の安全評価で自転車への自動ブレーキ機能試験を導入
自転車が車にはねられる事故があとを絶たない中、国は車の安全性を評価する「自動車アセスメント」に、自転車に対する自動ブレーキ機能を確かめる項目を追加し試験を始めました。歩行者に比べスピードがある自転車の検知は技術的に難しいということです。
「選挙ポスターの掲示板」立候補者が多くなって、その枠が足りなくなる可能性があるとして、参院選の東京選挙区では、急きょ、掲示板の枠を増やす作業が進められています。
新検事総長に東京高検検事長の甲斐行夫氏を起用
痴漢についてそんな調査結果があります(2011年警察庁調べ)。

kuanyui · 2022-06-19T12:25:51Z

The result is much better than I expected; I even want to enable this option by default... But I guess I should calm down and be patient, and test it for a while before enabling it by default.

Actually, I'm still a little worried about the reliability of the RegExp `new RegExp(kanji + "(?![^<]*<\/rb>)", 'g');`, but I don't know how to prove it. Anyway, let's try it in the real world :P
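To make the behavior of that pattern easier to reason about, here is a minimal sketch. The `wrapWithRuby` helper and the markup it emits are hypothetical and not the extension's actual code; only the lookahead `(?![^<]*<\/rb>)` is taken from the comment above.

```javascript
// Sketch only: wrap every occurrence of `kanji` that is not already inside
// an <rb>...</rb> element. Helper name and markup are assumptions.
function wrapWithRuby(html, kanji, reading) {
  // Negative lookahead: reject a match when the text after it reaches
  // "</rb>" before any other "<", i.e. when the match sits inside <rb>.
  const pattern = new RegExp(kanji + "(?![^<]*<\/rb>)", "g");
  const ruby = "<ruby><rb>" + kanji + "</rb><rt>" + reading + "</rt></ruby>";
  // A replacer function avoids special handling of "$" in the replacement.
  return html.replace(pattern, () => ruby);
}

// "作成" was furiganized in an earlier pass; only the standalone "作" is new.
const before = "<ruby><rb>作成</rb><rt>さくせい</rt></ruby>します。作ります。";
const after = wrapWithRuby(before, "作", "つく");
console.log(after);

// Running the same replacement again should change nothing.
console.log(wrapWithRuby(after, "作", "つく") === after);
```

Note that the lookahead only inspects text up to the next `<`, so this sketch relies on `<rb>` containing plain text with no nested tags.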

Thanks for your contribution again!

kuanyui · 2022-06-19T12:53:03Z

Your PR has been included in the latest release, v0.6.6, which is now available as an update.

BTW, I added more information to the option.

inoueakimitsu · 2022-06-20T01:58:48Z

@kuanyui Thank you for:

- adding sentences for testing
- investigating igo.js's multiple-output behavior
- adding more information to options_ui

If you encounter any problems with the regular expression in question, please let me know.

inoueakimitsu added 4 commits June 17, 2022 00:53

Sort tagged array

61bc209

Sort tagged array in order to add furigana for the longer Kanji series first.

Add option to options_ui

9d37925

Enable toggle ON/OFF of the function to give priority to consecutive Kanji characters from options_ui.

Fixed regular expression for Kanji characters

d4cae4e

inoueakimitsu mentioned this pull request Jun 19, 2022

kanji characters that has been furiganaed once may be furiganaed more than once #9

Closed

kuanyui merged commit af1cecd into kuanyui:master Jun 19, 2022

kuanyui mentioned this pull request Aug 23, 2022

Incorrect furigana entry #6

Closed

kuanyui mentioned this pull request Jul 7, 2023

Added ruby twice #13

Closed
