Japanese character set regexp? #4
I don't speak either Chinese or Japanese, but yes, as far as I know they
reuse the same Unicode characters, so you're right that you'd have to
remove Chinese in order to be able to add Japanese.
I did a quick search and here is what I found:
https://gist.github.com/terrancesnyder/1345094
https://gist.github.com/ryanmcgrath/982242
Are these links helpful?
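To illustrate the point about shared characters, here is a minimal Python sketch. It assumes the kanji range `[一-龯]` from the original message quoted below (U+4E00 to U+9FAF); the helper name `is_han` is made up for this example. A range like this matches by code point only, so it cannot tell whether a character appears in Chinese or Japanese text.

```python
import re

# Kanji range taken from the original message: 一 (U+4E00) to 龯 (U+9FAF).
HAN = re.compile(r"[一-龯]")


def is_han(ch: str) -> bool:
    """Return True if the single character ch falls inside the range above."""
    return HAN.fullmatch(ch) is not None


# 学 (U+5B66) is written the same way in Japanese and in Simplified Chinese,
# so one code point serves both languages and the range matches it either way.
print(is_han("学"), hex(ord("学")))
# Hiragana lives in a different block and is not covered by this range.
print(is_han("あ"))
```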
------ Original Message ------
From: "Florian Beijers" <notifications@github.com>
To: "mltony/nvda-tonys-enhancements" <nvda-tonys-enhancements@noreply.github.com>
Cc: "Subscribed" <subscribed@noreply.github.com>
Sent: 12/29/2020 7:47:29 AM
Subject: [mltony/nvda-tonys-enhancements] Japanese character set regexp? (#4)
Hello,
I see there is a character set for Chinese, as well as Russian, but I
am missing one for Japanese. I know that Chinese and Japanese share a
lot of characters in Unicode, so I will most likely have to
remove the one for Chinese and add the Japanese one. However, I am
having a bit of trouble getting the regexp right for all three
scripts. My attempt:
ja_ja:([一-龯])
This gives us (most) kanji, and works for those. However, ideally, I
would also like to add the following regexp:
([ぁ-んァ-ン])
which should add hiragana and katakana. However, adding that line below
the previous one makes the entire thing not work anymore. My regexp is
a bit rusty and I'm not sure how to combine these two. Am I reinventing
the wheel here?
Somewhat :) That first link is where I got the first regexp from. Like I said, I think it will probably work if those two regular expressions can be combined somehow; I just don't know how to do that. Do you know how they can be combined?
That is an affirmative. For people stumbling across this issue as well, the regexp you want for Japanese, so that it reads both kanji and kana, is the following:
I am not entirely sure whether this will work with half-width characters and other edge cases, but it should help in the majority of cases.
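As a sketch of how the two expressions from the question could be combined: ranges inside one pair of square brackets are unioned, so the kanji, hiragana, and katakana ranges can sit in a single character class. The merged pattern below is an assumption built from the two expressions in the question, not necessarily the exact regexp posted in this thread, and the function name `japanese_chars` is made up for this example.

```python
import re

# Assumed combination: the kanji range and the two kana ranges from the
# question, merged into one character class inside a single capture group.
JAPANESE = re.compile(r"([一-龯ぁ-んァ-ン])")


def japanese_chars(text: str) -> list[str]:
    """Return every character in text that matches the merged class."""
    return JAPANESE.findall(text)


# Kanji, hiragana, and full-width katakana match; Latin letters do not.
print(japanese_chars("日本語のテキスト abc"))
# Half-width katakana (U+FF66 to U+FF9F) lies outside all three ranges.
print(japanese_chars("\uff76\uff80\uff76\uff85"))
```

Besides half-width katakana, the prolonged sound mark ー (U+30FC) also falls outside ァ-ン, so words such as コーヒー would only match in part with this class.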