How to preprocess the Chinese dataset #3
Comments
No word segmentation is needed; just use individual characters as the unit.
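What about text that mixes Chinese and English and also contains digits, for example “扎克伯格是facebook的ceo,今年37岁” ("Zuckerberg is the CEO of facebook and is 37 this year")? Are facebook and 37 also split into single-character array elements?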
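Yes. The Chinese data we obtained was already split by character, and the English words in it had no spaces between them, so for convenience we used the character sequence directly as input. That said, if the input were preprocessed with proper word segmentation, the results should be better.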
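For anyone who wants to reproduce the character-level input described above, here is a minimal sketch, not the project's actual preprocessing code. The function name `to_char_tokens` is hypothetical, and dropping whitespace is an assumption based on the remark that the data has no spaces between English words.

```python
from typing import List


def to_char_tokens(text: str) -> List[str]:
    """Split text into single-character tokens.

    Chinese characters, Latin letters, digits and punctuation each
    become one token. Whitespace is dropped (an assumption, see above).
    """
    return [ch for ch in text if not ch.isspace()]


# The example sentence from the question above.
tokens = to_char_tokens("扎克伯格是facebook的ceo,今年37岁")
print(tokens)
```

With this approach "facebook" becomes eight separate tokens and "37" becomes two, so any per-token labels have to be given per character as well.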
How are words handled in the Chinese dataset? Are they matched against a dictionary? If so, is the longest matching word chosen?
Hello, could you share the preprocessing code for the Chinese dataset? My email is 2674053421@qq.com
If I feed in word-segmented input, how do I align my labels to the individual characters?
Should the text be word-segmented?