A question about word frequency #13

hainuo · 2016-08-11T07:57:13Z

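I am currently using jieba word segmentation to extract keywords for my system, but many of the results are not what I want. I think that if I could generate my own word-frequency dictionary, keyword extraction might work better. So my question is: how should I train jieba to generate my own dictionary?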

fukuball · 2016-09-05T12:12:44Z

@hainuo Which part do you want to train? The dictionary, the HMM model, or the IDF word frequencies?

hainuo · 2016-09-05T12:49:10Z

Right now the basic domain-specific vocabulary is missing, so I suppose the dictionary should be trained first.

fukuball · 2016-09-06T08:32:01Z

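@hainuo I don't quite understand your requirement. Jieba currently works by segmenting text with the dictionary first and then using the HMM to find new words, so to some extent the dictionary determines jieba's accuracy. How the dictionary's word frequencies are produced is, for jieba, the training process. That said, I don't currently have a corpus that could be used for training.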
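
Since the dictionary's word frequencies are described above as the training step, here is a minimal sketch of how such frequencies could be counted from a corpus. It assumes the corpus has already been segmented by hand into space-separated tokens. The sample sentences and the function names `build_frequency_dict` and `format_dict_lines` are hypothetical, and this is not how jieba's bundled dictionary was actually produced.

```python
from collections import Counter
from typing import Iterable, List


def build_frequency_dict(segmented_lines: Iterable[str]) -> Counter:
    """Count how often each token appears in an already-segmented corpus."""
    counts: Counter = Counter()
    for line in segmented_lines:
        # Tokens are assumed to be separated by whitespace.
        counts.update(line.split())
    return counts


def format_dict_lines(counts: Counter, min_freq: int = 1) -> List[str]:
    """Render counts as "word freq" lines, most frequent first."""
    # Sort by descending frequency, then by word for a stable order.
    items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{word} {freq}" for word, freq in items if freq >= min_freq]


# Hypothetical hand-segmented sample corpus (an assumption for illustration).
corpus = [
    "机器 学习 模型 训练",
    "深度 学习 模型",
    "模型 训练 语料库",
]

dict_lines = format_dict_lines(build_frequency_dict(corpus))
print("\n".join(dict_lines))
```

In the Python jieba package, a file with one `word freq` entry per line can be loaded with `jieba.load_userdict`. Whether the jieba implementation discussed in this thread accepts the same file is an assumption to verify.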

fukuball closed this as completed Jun 23, 2017
