Is the target of the word vectors always word, with only the context varying among word, word+char, word+ngram, and word+char+ngram? #40

Closed
PuddingCoder opened this issue Sep 11, 2018 · 1 comment

Comments


PuddingCoder commented Sep 11, 2018

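Hello, and thank you for open-sourcing your work. Inspired by your group's paper, I want to train SGNS embeddings on my own corpus with word+char+ngram as the context. I then read the ngram2vec paper and found that it distinguishes settings by the type of target and context: uni_uni, uni_bi, bi_bi, and so on. Does CA8 use only uni_bi, where the target is a unigram, and then add char to the context? If I want to train SGNS embeddings whose context is word+char+ngram, how do I add char to the context? Do I need to write code myself in the ngram2vec toolkit to add <word, char> pairs?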

@shenshen-hungry
Collaborator

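Both target and context can use word+char+ngram. Most of the word vectors provided by this project use word as the target; the ones under "Various Co-occurrence Information" below include char.
If you train with ngram2vec, you can customize both target and context. Just modify the code as needed; it is fairly easy.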
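
For readers wondering what "adding <word, char> pairs" could look like, here is a minimal standalone sketch of pair extraction with word targets and word+ngram+char contexts. It is not taken from the ngram2vec code base. The function name `extract_pairs`, the `_` joiner for n-grams, the `char:` prefix, and the choice to take characters from the neighbouring words are all assumptions made for illustration; the toolkit's own conventions may differ.

```python
from typing import Iterator, List, Tuple


def extract_pairs(tokens: List[str], window: int = 2, max_order: int = 2,
                  use_chars: bool = True) -> Iterator[Tuple[str, str]]:
    """Yield (target, context) pairs; the target is always a single word.

    Hypothetical helper, not part of ngram2vec.
    """
    n = len(tokens)
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(n, i + window + 1)

        # Word (order 1) and n-gram (order > 1) contexts: every n-gram that
        # lies fully inside the window and does not cover the target itself.
        for order in range(1, max_order + 1):
            for start in range(lo, hi - order + 1):
                if start <= i < start + order:
                    continue
                yield target, "_".join(tokens[start:start + order])

        # Character contexts (assumption): the characters of each neighbouring
        # word, prefixed so they do not collide with single-character words.
        if use_chars:
            for j in range(lo, hi):
                if j == i:
                    continue
                for ch in tokens[j]:
                    yield target, "char:" + ch


if __name__ == "__main__":
    sentence = ["我", "喜欢", "自然", "语言"]
    for target, context in extract_pairs(sentence, window=2, max_order=2):
        print(target, context)
```

The resulting pairs could then be fed to whatever SGNS trainer consumes a pair file. Keeping the context vocabulary separate from the target vocabulary (here via the `char:` prefix) is one simple way to let words, n-grams, and characters coexist as contexts.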
