
Incorrect Chinese tokenization for Australian city name "Brisbane" #6

Closed
sean1975 opened this issue Aug 29, 2021 · 2 comments

Comments

@sean1975
Owner

Chinese terms for Australian city names are incorrectly tokenized. For example, the Chinese term "布里斯本" is the Australian city name "Brisbane". However, it is tokenized as "布", "里斯", "里斯本".

@sean1975
Owner Author

This problem is no longer visible in the UI because of the fix for #7, but there is still room for improvement. Keeping this issue open.

@sean1975
Owner Author

sean1975 commented Sep 1, 2021

Added the Chinese term "布里斯本" to the new customized dictionary file for the Chinese tokenizer Jieba.
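Why a dictionary entry fixes this can be seen with a stripped-down version of the lookup Jieba describes in its README. It builds a DAG of every dictionary word starting at each character, then picks the path with the highest product of word probabilities. The sketch below is a simplification, not Jieba's code. It omits Jieba's HMM fallback for unknown words. The helper names (`build_prefix_dict`, `build_dag`, `segment`) and all word frequencies are hypothetical, chosen only for illustration, and are not taken from Jieba's dictionary. In Jieba itself, the entry would go in a UTF-8 user dictionary file with one `word [freq] [tag]` per line, loaded with `jieba.load_userdict(path)`.

```python
import math


def build_prefix_dict(word_freqs):
    """Map every word to its frequency and every proper prefix to 0
    (unless the prefix is itself a word), in the style of a prefix dictionary."""
    freq = {}
    for word, count in word_freqs.items():
        freq[word] = count
        for end in range(1, len(word)):
            freq.setdefault(word[:end], 0)
    return freq, sum(word_freqs.values())


def build_dag(sentence, freq):
    """For each start index, list the end indices of dictionary words."""
    n = len(sentence)
    dag = {}
    for start in range(n):
        ends = []
        end = start
        fragment = sentence[start]
        # Extend the fragment while it is still a known word or word prefix.
        while end < n and fragment in freq:
            if freq[fragment] > 0:
                ends.append(end)
            end += 1
            fragment = sentence[start:end + 1]
        # A character with no dictionary word becomes a one-character word.
        dag[start] = ends or [start]
    return dag


def segment(sentence, word_freqs):
    """Hypothetical max-probability segmentation over the word DAG."""
    freq, total = build_prefix_dict(word_freqs)
    log_total = math.log(total)
    dag = build_dag(sentence, freq)
    n = len(sentence)
    # route[i] = (best log-probability of sentence[i:], end index of first word)
    route = {n: (0.0, 0)}
    for start in range(n - 1, -1, -1):
        route[start] = max(
            (math.log(freq.get(sentence[start:end + 1]) or 1) - log_total
             + route[end + 1][0], end)
            for end in dag[start]
        )
    words, start = [], 0
    while start < n:
        end = route[start][1] + 1
        words.append(sentence[start:end])
        start = end
    return words


# Hypothetical toy frequencies, for illustration only.
BASE = {"布": 200, "里斯": 50, "里斯本": 100}
USER = {**BASE, "布里斯本": 1000}  # the "custom dictionary" entry

print(segment("布里斯本", BASE))  # ['布', '里斯本']
print(segment("布里斯本", USER))  # ['布里斯本']
```

With these toy numbers, the frequency matters too. An entry of `"布里斯本": 10` would still lose to "布" + "里斯本". That is why Jieba also offers `jieba.suggest_freq` for tuning how strongly a user word is preferred.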

@sean1975 sean1975 closed this as completed Sep 1, 2021