Question about word frequency #13

Closed
hainuo opened this issue Aug 11, 2016 · 3 comments

Comments

@hainuo

hainuo commented Aug 11, 2016

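I'm currently using jieba segmentation to process keywords in my system, but many of the results aren't what I want. I think that if I could build my own word-frequency dictionary, keyword extraction might work better. So I'd like to ask: how should I train jieba to generate my own dictionary?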

@fukuball
Owner

fukuball commented Sep 5, 2016

@hainuo Which part do you want to train? The dictionary, the HMM model, or the IDF word frequencies?

@hainuo
Author

hainuo commented Sep 5, 2016

Right now even the basic domain-specific vocabulary is missing, so let's train the dictionary first.

@fukuball
Owner

fukuball commented Sep 6, 2016

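@hainuo I don't quite understand your requirement. The way jieba currently works is to segment with the dictionary first and then use the HMM to find new words. To some extent the dictionary determines jieba's accuracy, and how the dictionary's word frequencies are produced is, for jieba, the training process. That said, I don't currently have a corpus that could be used for training.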
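
For readers wondering what "producing the dictionary's word frequencies" could look like in practice, here is a minimal sketch, not the maintainer's method. It assumes you already have a domain corpus that has been segmented by hand (words separated by spaces), and it simply counts occurrences and renders `word freq` lines. The space-separated `word freq` layout is, as far as I know, what jieba's dictionary files use, but check the project's README before loading the result. The helper names and the tiny corpus are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_word_frequencies(segmented_lines: Iterable[str]) -> Counter:
    """Count words in a corpus whose lines are already split by whitespace."""
    counts: Counter = Counter()
    for line in segmented_lines:
        counts.update(line.split())
    return counts


def format_dict_lines(counts: Counter, min_freq: int = 1) -> List[str]:
    """Render "word freq" lines, most frequent first, ties broken by word."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word} {freq}" for word, freq in ordered if freq >= min_freq]


# Hypothetical hand-segmented corpus, one sentence per line.
corpus = [
    "机器 学习 模型",
    "深度 学习 模型 训练",
    "模型 训练",
]

dict_lines = format_dict_lines(count_word_frequencies(corpus))
print("\n".join(dict_lines))
```

Writing `dict_lines` to a text file gives a candidate user dictionary. Raising `min_freq` drops rare words, which may help when the corpus is noisy, at the cost of losing uncommon domain terms.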
