Commit

update (#1364)
Signed-off-by: XuhuiRen <xuhui.ren@intel.com>
XuhuiRen committed Mar 12, 2024
1 parent 395bc54 commit 0da63fe
Showing 1 changed file with 11 additions and 0 deletions.
@@ -34,6 +34,17 @@ def uni_pro(text):
    for char in normalized_text:
        if ord(char) < 128 or unicodedata.category(char) == 'Mn':
            filtered_text += char
        elif '\u4E00' <= char <= '\u9FFF':  # CJK Unified Ideographs
            filtered_text += char
        elif ('\u3400' <= char <= '\u4DBF'  # CJK Unified Ideographs Extension A
              or '\U00020000' <= char <= '\U0002A6DF'  # CJK Unified Ideographs Extension B
              or '\U0002A700' <= char <= '\U0002B73F'  # CJK Unified Ideographs Extension C
              or '\U0002B740' <= char <= '\U0002B81F'  # CJK Unified Ideographs Extension D
              or '\U0002B820' <= char <= '\U0002CEAF'  # CJK Unified Ideographs Extension E
              or '\uF900' <= char <= '\uFAFF'  # CJK Compatibility Ideographs
              or '\U0002F800' <= char <= '\U0002FA1F'):  # CJK Compatibility Ideographs Supplement
            filtered_text += char

    return filtered_text
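
Note on the range checks: Python's `\u` escape consumes exactly four hex digits, so a five-digit code point written as `'\u20000'` is a two-character string (U+2000 followed by `'0'`), and comparing it against a single character is a lexicographic string comparison. Code points above U+FFFF need the eight-digit `\U` escape or an integer comparison on `ord(char)`. A minimal sketch of the integer form follows. `CJK_RANGES`, `is_cjk`, and `uni_pro_sketch` are hypothetical names that are not part of this commit, and the NFKD normalization step is an assumption, since the line that builds `normalized_text` lies outside the hunk.

```python
import unicodedata

# Hypothetical lookup table (not in the commit): inclusive code point ranges.
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0x2A700, 0x2B73F),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81F),  # CJK Unified Ideographs Extension D
    (0x2B820, 0x2CEAF),  # CJK Unified Ideographs Extension E
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x2F800, 0x2FA1F),  # CJK Compatibility Ideographs Supplement
)


def is_cjk(char):
    """Return True if a single character falls in one of the CJK ranges."""
    code_point = ord(char)
    return any(first <= code_point <= last for first, last in CJK_RANGES)


def uni_pro_sketch(text):
    """Keep ASCII, non-spacing marks, and CJK ideographs; drop the rest."""
    # Assumption: the original function normalizes with NFKD before filtering.
    normalized_text = unicodedata.normalize('NFKD', text)
    return ''.join(
        char for char in normalized_text
        if ord(char) < 128
        or unicodedata.category(char) == 'Mn'
        or is_cjk(char)
    )


# The four-digit escape pitfall: these two literals differ in length.
four_digit_escape = '\u20000'      # U+2000 followed by the character '0'
eight_digit_escape = '\U00020000'  # the single code point U+20000
```

With the integer form, `uni_pro_sketch('abc\u2014\u4f60')` drops the dash (U+2014) and keeps the ideograph, whereas the string comparison `'\u20000' <= '\u2014' <= '\u2A6DF'` evaluates to true and would let the dash through.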

