A Go implementation of Tomas Mikolov's word/document embedding. For the basic idea, see Mikolov's two original papers, word2vec and doc2vec. More recently, Andrew M. Dai et al. from Google reported its power in more detail.
[@bjsjs_11_83 doc2vec-golang]$ ./control build
traning Exec build ok
build ok
# The training data (data/zhihu_data.1w) has one document per line, in two tab-separated columns:
# the first column is the document id, and the second is the segmented document, with words separated by spaces.
[@bjsjs_11_83 doc2vec-golang]$ ./train data/zhihu_data.1w
Skip-Gram Iter:48 Alpha: 0.000796 Progress: 96.81% Words/sec: 24.27k
2018-03-30 14:53:00.218536235 +0800 CST training end, 1342521 26861
[@bjsjs_11_83 doc2vec-golang]$ ./knn 2.model
please select operation type:
0:word2words
1:doc_likelihood
2:leave one out key words
3:sen2words
4:sen2docs
5:word2docs
6:doc2docs
7:doc2words
0
Enter text:网页
1 网页
0.7823723719117796 不让
0.7651260773728028 浏览
0.7642516944020028 邮件
0.7601415883811553 近
0.7517607921006224 迷恋
0.7492900066365179 等同
0.7485966355448261 传说
0.7463299535930537 基于
0.7447865182221745 版
please select operation type:
0:word2words
1:doc_likelihood
2:leave one out key words
3:sen2words
4:sen2docs
5:word2docs
6:doc2docs
7:doc2words
- doc2vec supports both the CBOW and Skip-Gram models; the Negative Sampling and Hierarchical Softmax optimizations are both implemented
- online document inference
- document likelihood
- doc2words
- doc2docs
- word2words
- word2docs
- wmd
- add synonym semantic constraints to doc2vec
- extract key words from a sentence
- Google's word2vec implementation
- hiyijian/doc2vec
- semantic constraints for word2vec
- adding synonym semantic constraints to doc2vec