
[QUESTION] how to use as segmenter/tokenizer #32

Closed
loretoparisi opened this issue Dec 10, 2021 · 1 comment

Comments

@loretoparisi

Hello, thanks for this interesting project!
Currently, in my NLP pipelines I'm using Jieba as the Chinese segmenter and MeCab as the Japanese tokenizer.
Is it safe to use FastHan as a replacement for MeCab as a han tokenizer?

Thank you!
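For context, a minimal sketch of the kind of language-routed pipeline described above, assuming the jieba and mecab-python3 packages; the language codes and MeCab options are illustrative choices, not part of the original question:

```python
import jieba   # Chinese word segmentation
import MeCab   # Japanese tokenization (mecab-python3)


def segment(text: str, lang: str) -> list[str]:
    """Route text to the per-language segmenter used in the pipeline."""
    if lang == "zh":
        # Jieba returns a plain list of Chinese word tokens.
        return jieba.lcut(text)
    if lang == "ja":
        # -Owakati makes MeCab emit space-separated surface forms.
        tagger = MeCab.Tagger("-Owakati")
        return tagger.parse(text).split()
    raise ValueError(f"unsupported language: {lang}")


print(segment("我来到北京清华大学", "zh"))
print(segment("すもももももももものうち", "ja"))
```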

@fdugzc
Member

fdugzc commented Dec 10, 2021

I'm not familiar with MeCab, but I'm certain that fastHan cannot be used as a Japanese tokenizer. Because it was trained only on Chinese data, it cannot even recognize Japanese characters.
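So, per the reply, fastHan could at most replace the Chinese branch of such a pipeline (e.g. Jieba), while a Japanese tokenizer like MeCab stays in place. A minimal sketch, assuming the fastHan package exposes a FastHan model that is called with a target task name such as "CWS" for Chinese word segmentation (check the project README for the exact API):

```python
from fastHan import FastHan

# Load the multi-task Chinese model; "CWS" selects Chinese word segmentation.
model = FastHan()

chinese_text = "郭靖是金庸笔下的一名男主角。"
tokens = model(chinese_text, target="CWS")
print(tokens)  # word tokens for the input sentence(s)

# Japanese input would still need MeCab (or similar): fastHan was trained
# only on Chinese data and does not handle Japanese text.
```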

@fdugzc fdugzc closed this as completed Nov 11, 2023