
[QUESTION] how to use as segmenter/tokenizer #32

Closed
loretoparisi opened this issue Dec 10, 2021 · 1 comment

Comments

@loretoparisi

Hello, thanks for this interesting project!
Currently, in my NLP pipelines I'm using Jieba as the Chinese segmenter and MeCab as the Japanese tokenizer.
Is it safe to use FastHan as a replacement for MeCab as a han tokenizer?

Thank you!
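For context, a minimal sketch of the kind of language-routed pipeline described above, assuming the jieba and mecab-python3 packages; the language codes and MeCab options are illustrative choices, not part of the original question:

```python
import jieba   # Chinese word segmentation
import MeCab   # Japanese tokenization (mecab-python3)


def segment(text: str, lang: str) -> list[str]:
    """Route text to the per-language segmenter used in the pipeline."""
    if lang == "zh":
        # Jieba returns a plain list of Chinese word tokens.
        return jieba.lcut(text)
    if lang == "ja":
        # -Owakati makes MeCab emit space-separated surface forms.
        tagger = MeCab.Tagger("-Owakati")
        return tagger.parse(text).split()
    raise ValueError(f"unsupported language: {lang}")


print(segment("我来到北京清华大学", "zh"))
print(segment("すもももももももものうち", "ja"))
```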

@fdugzc
Member

fdugzc commented Dec 10, 2021

I'm not familiar with MeCab, but I'm certain that fastHan cannot be used as a Japanese tokenizer. Because it was trained only on Chinese data, it cannot even recognize Japanese characters.
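So, per the reply, fastHan could at most replace the Chinese branch of such a pipeline (e.g. Jieba), while a Japanese tokenizer like MeCab stays in place. A minimal sketch, assuming the fastHan package exposes a FastHan model that is called with a target task name such as "CWS" for Chinese word segmentation (check the project README for the exact API):

```python
from fastHan import FastHan

# Load the multi-task Chinese model; "CWS" selects Chinese word segmentation.
model = FastHan()

chinese_text = "郭靖是金庸笔下的一名男主角。"
tokens = model(chinese_text, target="CWS")
print(tokens)  # word tokens for the input sentence(s)

# Japanese input would still need MeCab (or similar): fastHan was trained
# only on Chinese data and does not handle Japanese text.
```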

@fdugzc fdugzc closed this as completed Nov 11, 2023