Does this have word-frequency analysis and data output? #41

Open
wolf8210137 opened this issue Apr 26, 2018 · 3 comments

Comments

@wolf8210137

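I want to pull out the three most frequently used nouns in a sentence, or put another way, its three most important words. Plain word segmentation alone can't do that. Does jieba-php have a feature for this?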

@fukuball
Owner

@wolf8210137 Yes, there is a keyword extraction feature. It uses the TF-IDF algorithm; see Feature 3) "Keyword Extraction" in the Readme.
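For readers unfamiliar with TF-IDF ranking, here is a minimal language-neutral sketch in Python of picking the top three words from an already-segmented sentence. It is not jieba-php's implementation: the function name `top_keywords`, the IDF table, and the fallback value are all hypothetical, and a real segmenter would supply its own IDF data.

```python
from collections import Counter

# Assumed fallback IDF for words missing from the table (hypothetical value).
DEFAULT_IDF = 5.0


def top_keywords(tokens, idf, k=3, default_idf=DEFAULT_IDF):
    """Rank already-segmented tokens by TF-IDF and return the top k words."""
    if not tokens:
        return []
    counts = Counter(tokens)
    total = len(tokens)
    # TF is the relative frequency in this sentence; IDF comes from the table.
    scores = {
        word: (count / total) * idf.get(word, default_idf)
        for word, count in counts.items()
    }
    # Sort by score (descending); break ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


# Illustrative input only: tokens and IDF values are made up.
example_tokens = ["weather", "today", "weather", "is", "nice"]
example_idf = {"weather": 3.0, "today": 2.0, "is": 0.1, "nice": 1.0}
print(top_keywords(example_tokens, example_idf))
```

Restricting the result to nouns, as the question asks, would additionally need part-of-speech tags so that non-noun tokens can be filtered out before ranking.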

@kency

kency commented May 23, 2018

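Could cut get an option that returns the complete segmentation as an array that includes each word's IDF and part-of-speech tag? The result would look something like this: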
array(21) {
  [0]=>
  array(3) {
    ["word"]=>
    string(3) "这"
    ["idf"]=>
    float(1.22223333)
    ["tag"]=>
    string(1) "r"
  }....
}
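I use Google's simhash algorithm to compare articles for similarity, which needs the weight of every segmented word in the article, and sentiment analysis needs each word's part of speech as well.
In other words, it would be enough for cut's return value to carry the IDF and the part-of-speech tag along with each word. @fukuball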
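To make the use case concrete, here is a rough Python sketch of how per-word records in the requested shape could feed a weighted SimHash and a part-of-speech filter. Everything here is an assumption for illustration: the record list is hand-written rather than produced by cut, all values except the first record are made up, and the helper names (`simhash`, `hamming_distance`) are hypothetical.

```python
import hashlib


def simhash(weighted_tokens, bits=64):
    """Weighted SimHash over (word, weight) pairs.

    bits must be a multiple of 8 and at most 128 (the MD5 digest size).
    """
    acc = [0.0] * bits
    for word, weight in weighted_tokens:
        digest = hashlib.md5(word.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each word votes for or against bit i with its weight.
            acc[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(bits):
        if acc[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Hand-written records in the shape requested above. Only the first
# record comes from the issue; the others are made-up examples.
records = [
    {"word": "这", "idf": 1.22223333, "tag": "r"},
    {"word": "电影", "idf": 4.5, "tag": "n"},
    {"word": "好看", "idf": 6.0, "tag": "a"},
]

# Similarity: every word contributes, weighted by its IDF.
fingerprint = simhash([(r["word"], r["idf"]) for r in records])

# Sentiment analysis: keep only words with a chosen tag, here "a".
adjectives = [r["word"] for r in records if r["tag"] == "a"]
```

Two articles would then be compared with `hamming_distance` on their fingerprints, a smaller distance suggesting more similar content. Because each occurrence of a word adds its weight again, repeated words naturally count more.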

@fukuball
Owner

A feature like this could be added given some time. Let's see whether anyone wants to help, or wait until I'm free XD
