doc2vec/word2vec/fasttext models do not appear to improve if similarities checked mid-training epochs #2260
Comments
It turns out
has to be called before each call |
If that's the case, then that's definitely a bug! Are you saying you have to call |
Well, yes and no. |
|
This is an excellent idea imo. I implemented it in this way. |
Is it closed? I want to work on this @menshikh-iv |
@naba7 see status on top |
Sorry for my recent absence. I pushed new changes to the PR branch, but the PR is still closed. I hope it is reopened in the next few days so we can finish working on it. |
@timbicker done, see #2273 |
Also an issue for FastText: #2260 |
I believe this issue is moot given changes that eliminated so much normed-vector caching in Gensim-4.0. |
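The stale-cache effect discussed in this thread can be illustrated outside Gensim. The following is a minimal sketch in plain Python, assuming a store that builds unit-normalized copies of its vectors once and then reuses them. `ToyVectors`, `clear_cache`, and the sample vectors are hypothetical names invented for illustration; this is not Gensim's API or its internals.

```python
import math


def unit(vec):
    """Return a unit-length copy of vec."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


class ToyVectors:
    """Hypothetical vector store that lazily caches normalized vectors."""

    def __init__(self, vectors):
        self.vectors = vectors   # raw vectors, modified by "training"
        self._normed = None      # cache of unit-length copies, built on demand

    def clear_cache(self):
        # Drop the cache so the next query re-normalizes the current vectors.
        self._normed = None

    def most_similar(self, key):
        # The cache is built only if missing; later updates to self.vectors
        # are NOT reflected until clear_cache() is called.
        if self._normed is None:
            self._normed = {k: unit(v) for k, v in self.vectors.items()}
        query = self._normed[key]
        scores = {
            other: sum(a * b for a, b in zip(query, vec))
            for other, vec in self._normed.items()
            if other != key
        }
        return max(scores, key=scores.get)


store = ToyVectors({"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]})
before = store.most_similar("a")   # builds the cache

# Simulate a training update that moves "a" close to "c".
store.vectors["a"] = [0.1, 0.9]
stale = store.most_similar("a")    # answered from the old cache

store.clear_cache()
fresh = store.most_similar("a")    # recomputed from the updated vectors

print(before, stale, fresh)
```

In this toy, the raw vectors change after the simulated update, yet the similarity query keeps returning the old neighbour until the cache is dropped. That matches the symptom reported in this issue, where the stored vectors are updated but the similarity results stay the same.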
Description
I am training a doc2vec model on a large corpus. I need to observe the model for more detailed statistics for my supervisor/boss.
The problem is similar to the one below, where I only slightly modified the Doc2Vec Tutorial on the Lee Dataset. The model does not improve its recommendations from the most_similar method.
Steps/Code/Corpus to Reproduce
Expected Results
I expect to see many improvements in either recommendation or distance.
Actual Results
Console output with four workers:
It surprises me that only the first sample in the training_corpus receives some updates. I don't understand it.
So I debugged the model, and there are no further improvements:
I tried it with only 1 worker:
What's happening here, and how can I see during training how my doc2vec model improves? It is also not possible to see the training_error for doc2vec (#999).
Further experimenting reveals that docvecs.vectors_docs are of course updated between each call of batch_end. But most_similar always returns the same suggestion.
Versions
Darwin-17.5.0-x86_64-i386-64bit
Python 3.7.0 (default, Jun 29 2018, 20:13:13)
[Clang 9.1.0 (clang-902.0.39.2)]
NumPy 1.15.0
SciPy 1.1.0
gensim 3.5.0
FAST_VERSION 0