Words that should be split are merged into a single token #503

Open
jiangchao123 opened this issue Aug 7, 2017 · 3 comments

Comments

@jiangchao123

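Hi, I'm doing emotion analysis based on a sentiment lexicon, and I found that jieba merges many words together. For example, for 你真笨 ("you are really stupid"):
The result I want is: 你, 真, 笨
But what comes back is: 你, 真笨

Likewise, 你太笨 ("you are too stupid") comes back as: 你, 太笨
This seems wrong to me. 真 and 太 are clearly degree adverbs, so why are they merged with 笨 into a single word?
I also evaluated several other open-source segmenters, and jieba was the only one that merged these.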
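
For a sentiment-lexicon pipeline, one possible workaround is to post-process the token list rather than change the segmenter. The sketch below is only an illustration: `split_degree_prefix`, `DEGREE_WORDS`, and `SENTIMENT_WORDS` are hypothetical names that are not part of jieba, and the two sets contain just the words from the examples above.

```python
# Hypothetical lexicons for illustration; a real pipeline would load its own.
DEGREE_WORDS = {"真", "太"}   # degree adverbs from the examples above
SENTIMENT_WORDS = {"笨"}      # entries from a sentiment lexicon


def split_degree_prefix(tokens, degree_words=DEGREE_WORDS, lexicon=SENTIMENT_WORDS):
    """Split a token into (degree adverb, sentiment word) when both parts are known."""
    out = []
    for tok in tokens:
        for deg in degree_words:
            rest = tok[len(deg):]
            # Only split when the remainder is itself a lexicon entry.
            if tok.startswith(deg) and rest in lexicon:
                out.extend([deg, rest])
                break
        else:
            # No degree prefix matched, so keep the token unchanged.
            out.append(tok)
    return out


print(split_degree_prefix(["你", "真笨"]))  # ['你', '真', '笨']
print(split_degree_prefix(["你", "太笨"]))  # ['你', '太', '笨']
```

Because the split only happens when the remainder is in the lexicon, tokens that merely begin with 真 or 太 are left alone.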

@jiangchao123
Author

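Is it the case that a custom word only takes effect when it is longer than a word already in your dictionary, and has no effect when it is shorter?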
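
As far as the jieba README describes it, the segmenter picks the maximum-probability path based on word frequencies, so what matters is relative frequency rather than length as such. The toy model below is not jieba's code: `segment`, `toy_freq`, the counts, and `total` are all made up for illustration, and it only shows how lowering the count of the merged entry flips the result (which is roughly what `jieba.suggest_freq(('真', '笨'), True)` is meant to achieve).

```python
import math


def segment(text, freq, total):
    """Max-probability segmentation over a toy frequency dictionary."""
    n = len(text)
    # best[i] = (log-probability of the best split of text[i:], end of its first word)
    best = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        candidates = []
        for j in range(i + 1, n + 1):
            word = text[i:j]
            # Words need a positive count; single characters are always allowed.
            if freq.get(word, 0) > 0 or j == i + 1:
                logp = math.log(max(freq.get(word, 0), 1) / total)
                candidates.append((logp + best[j][0], j))
        best[i] = max(candidates)
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


# Hypothetical counts, chosen only to reproduce the behaviour in this issue.
toy_freq = {"你": 5000, "真": 3000, "笨": 200, "真笨": 50}
print(segment("你真笨", toy_freq, total=1_000_000))  # ['你', '真笨']

toy_freq["真笨"] = 0  # drop the merged entry's count
print(segment("你真笨", toy_freq, total=1_000_000))  # ['你', '真', '笨']
```

In this model the shorter words win as soon as the merged entry is no longer more probable than the product of its parts.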

@Brentbin

Brentbin commented Aug 7, 2017 via email

@jiangchao123
Author

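I tried that. It works for some words but not for others, and with HMM set to False a lot of personal names get split apart. For example, 小明 is cut into 小, 明, and 李小福 is cut into 李, 小, 福.
For 找代驾 I want 找, 代驾; but with HMM=True I get 找代驾, and with HMM=False I get 找, 代, 驾.
So it seems I can't have it both ways?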
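
One way to approximate "both" might be to keep HMM enabled, so names stay intact, and then re-split only the tokens that contain a word you want protected. The sketch below assumes the token list comes from an HMM-enabled run (for instance `jieba.lcut(text, HMM=True)`, which is not called here); `split_around` and `user_words` are hypothetical names, not jieba API.

```python
def split_around(tokens, user_words):
    """Re-split any token that strictly contains one of the user's words."""
    # Try longer user words first so the most specific match wins.
    ordered = sorted(user_words, key=len, reverse=True)
    out = []
    for tok in tokens:
        hit = next((w for w in ordered if w in tok and w != tok), None)
        if hit is None:
            # Nothing to protect inside this token (e.g. a personal name).
            out.append(tok)
            continue
        start = tok.index(hit)
        before, after = tok[:start], tok[start + len(hit):]
        # Emit the surrounding pieces and the protected word, skipping empty parts.
        out.extend(part for part in (before, hit, after) if part)
    return out


print(split_around(["找代驾"], {"代驾"}))          # ['找', '代驾']
print(split_around(["小明", "李小福"], {"代驾"}))   # ['小明', '李小福']
```

This only splits once per token and does not re-segment the leftover pieces, so it is a rough patch rather than a replacement for tuning the dictionary.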
