PV-DBOW or PV-DM? #11

Open
jwijffels opened this issue Nov 10, 2020 · 5 comments

@jwijffels

Is this implementation the distributed bag of words ('PV-DBOW') model or the distributed memory ('PV-DM') model?

@jwijffels
Author

@hiyijian it would be great to have an answer on this.

@rekola

rekola commented Oct 17, 2021

Both models are implemented.

@jwijffels
Author

@rekola where in the C++ code can you select PV-DM and PV-DBOW?

@rekola

rekola commented Oct 18, 2021

It seems my information was based on your work in https://github.com/bnosac/doc2vec/blob/master/R/paragraph2vec.R, which says:

# cbow = 0 = skip-gram                                             = PV-DBOW
# cbow = 1 = continuous bag of words including vector of paragraph = PV-DM

Is this not true?

I've been working on a fork of doc2vec to remove the word and sentence length limits. The original version also crashes if you have more than 30 million documents in your dataset.

@jwijffels
Author

jwijffels commented Oct 18, 2021

Yes, that is indeed my interpretation:

# cbow = 0 = skip-gram                                             = PV-DBOW
# cbow = 1 = continuous bag of words including vector of paragraph = PV-DM

I would still prefer validation from @hiyijian. The R wrapper calls https://github.com/bnosac/doc2vec/blob/3e947562a0a69e11eb292283116a4fdc9cf5c0f4/src/rcpp_doc2vec.cpp#L14, which in turn calls the train functionality from this repository at https://github.com/hiyijian/doc2vec/blob/master/cpp/Doc2Vec.cpp#L65, and the wrapper relies on the assumption above.
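
To make the assumption explicit, here is a minimal C++ sketch of the mapping as interpreted in this thread. The names `ParagraphModel`, `model_from_cbow` and `model_name` are hypothetical and do not exist in hiyijian/doc2vec or in the R wrapper. The mapping itself is the unconfirmed interpretation quoted above, not something the maintainer has validated.

```cpp
#include <cassert>
#include <string>

// Hypothetical type for illustration only; not part of hiyijian/doc2vec.
enum class ParagraphModel { PV_DBOW, PV_DM };

// Maps the word2vec-style `cbow` flag to a paragraph-vector variant.
// ASSUMPTION (from the thread, not confirmed by the maintainer):
//   cbow = 0 -> skip-gram                       -> PV-DBOW
//   cbow = 1 -> CBOW including paragraph vector -> PV-DM
inline ParagraphModel model_from_cbow(int cbow) {
    assert(cbow == 0 || cbow == 1);  // the flag is treated as a boolean here
    return cbow == 0 ? ParagraphModel::PV_DBOW : ParagraphModel::PV_DM;
}

// Human-readable label for the chosen variant.
inline std::string model_name(ParagraphModel m) {
    return m == ParagraphModel::PV_DBOW ? "PV-DBOW" : "PV-DM";
}
```

Under this assumption, `model_name(model_from_cbow(0))` yields the PV-DBOW label and `model_name(model_from_cbow(1))` yields the PV-DM label. A wrapper could log this next to the `cbow` value it passes to the training call, so the interpretation is visible to users.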
