Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NormalizerNFKC*: Add tests for mixing katakana options #1539

Merged
merged 8 commits into from
Mar 8, 2023

Conversation

HashidaTKS
Copy link
Contributor

@HashidaTKS HashidaTKS commented Mar 7, 2023

Added tests for mixing katakana options.
And fixed unify_katakana_trailing_o in order to work fine for the following cases.

  1. ョー -> ヨー -> ヨオ -> ヨウ

    ョー was normalized to ヨオ (should be ヨウ) because unify_katakana_trailing_o was applied before unify_kana_case and unify_katakana_trailing_o is not normalized after small letters.
    I have added after or as a target of unify_katakana_trailing_o.

  2. ヺー -> ヺオ -> ヺウ

    ヺー was normalized to ヺオ because unify_katakana_trailing_o didn't target .
    I have added as a target of unify_katakana_trailing_o.

6,
0,
0,
32616,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd, this should be 3 but 32616.
The value 32626 is not stable, changes each time I run it.
I will check it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

あぁ、昨日私がいじったところが悪いかも。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

「ズァ」と「ヅァ」のときだけ発生していて、「ヴァ」などでは起きないので、昨日の須藤さんの修正とは関係なさそうな感じもします。(調べています。)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

最新のmasterにrebaseしたところ直ったので、関係あったようです。

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

さーせん。。。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

いえ、ご対応ありがとうございます。

@HashidaTKS HashidaTKS force-pushed the add-multiple-normalizer-option-test branch from 0533db3 to 4ea7c23 Compare March 7, 2023 08:51
@HashidaTKS
Copy link
Contributor Author

@kou

Would you review this when you have time?

@HashidaTKS HashidaTKS marked this pull request as ready for review March 7, 2023 09:21
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Adding similar tests to all test/command/suite/normalizers/nfkc*/ is annoying... We may be able to unify them by improving grntest in the future...

lib/normalizer.c Outdated
grn_nfkc_normalize_unify_katakana_trailing_o,
&need_trailing_check,
"[unify][katakana-trailing-o]");
grn_nfkc_normalize_unify_stateless(ctx, data, &unify, false);
Copy link
Member

@kou kou Mar 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to change the current following order:

  1. grn_nfkc_normalize_unify_stateless(before)
  2. grn_nfkc_normalize_unify_statefull()s
  3. grn_nfkc_normalize_unify_stateless(after)

For keeping unifying order simple.

How about adding and small letters to katakana_trailing_o targets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will fix so as to normalize after small letters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@HashidaTKS
Copy link
Contributor Author

Hmm. Adding similar tests to all test/command/suite/normalizers/nfkc*/ is annoying... We may be able to unify them by improving grntest in the future...

Agree.

@HashidaTKS
Copy link
Contributor Author

@kou

Thank you, I have addressed your comments.

@@ -0,0 +1,14 @@
normalize \
'NormalizerNFKC100("unify_katakana_v_sounds", true, \
"unify_katakana_gu_small_sounds", true, \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broken indent but I'll merge this as-is. I can't review all tests...

Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't review all tests but I merge this.

@kou kou changed the title [WIP] NormalizerNFKC*: Add tests for mixing katakana options NormalizerNFKC*: Add tests for mixing katakana options Mar 8, 2023
@kou kou merged commit 34e6cc1 into master Mar 8, 2023
@kou kou deleted the add-multiple-normalizer-option-test branch March 8, 2023 05:45
kou pushed a commit that referenced this pull request Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants