I installed gensim using pip install gensim. The version I have is the latest, 0.12.4.
I have created a word2vec model from the same data and with the same parameters, but every time I create the model it gives different results. I tried passing a fixed int as the seed when creating the model, but it still behaves the same way.
As soon as you use more than one worker thread, scheduling jitter from the OS means examples are trained in a slightly different order on each run. Sources of randomness in the algorithm – like frequent-word downsampling – are then applied to different examples, so slightly different words are chosen each run. (Further, in Python 3, PYTHONHASHSEED-controlled hash randomization on each interpreter launch affects the iteration order of keys in the discovered vocabulary dictionary, which can again affect their sampling or ordering inside the model.)
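To make the PYTHONHASHSEED point concrete, here is a minimal standard-library sketch (not gensim code). It launches fresh Python 3 interpreters and prints the hash of a string in each. The helper name `string_hash_in_fresh_interpreter` and the seed value 42 are just illustrative choices.

```python
import os
import subprocess
import sys

# One-liner run in each child interpreter: print the hash of a fixed string.
SNIPPET = "print(hash('word2vec'))"


def string_hash_in_fresh_interpreter(hashseed=None):
    """Start a new interpreter and return hash('word2vec') as seen there.

    hashseed=None leaves PYTHONHASHSEED unset (randomized string hashing);
    an integer pins it for that launch.
    """
    env = dict(os.environ)
    if hashseed is None:
        env.pop("PYTHONHASHSEED", None)
    else:
        env["PYTHONHASHSEED"] = str(hashseed)
    out = subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)
    return int(out.decode().strip())


# With a pinned seed, every launch sees the same string hashes.
pinned = [string_hash_in_fresh_interpreter(42) for _ in range(3)]
print("pinned:  ", pinned)

# Without it, the hashes will usually differ from launch to launch.
unpinned = [string_hash_in_fresh_interpreter(None) for _ in range(3)]
print("unpinned:", unpinned)
```

Anything whose ordering depends on string hashes can vary between launches in the same way unless the variable is pinned before the interpreter starts.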
So: you can't expect identical results without taking extra steps, including limiting yourself to just a single worker thread. A pending PR (#642) will make this clearer in the doc-comment.
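As a rough sketch of those extra steps, assuming gensim is installed: train twice with a fixed `seed` and `workers=1`, then compare one word's vector. The toy corpus and the helper names `train_once` and `vectors_match` are made up for illustration. The `getattr(..., "wv", ...)` lookup is an assumption to cover both 0.12.x, where the model is indexed directly, and later releases that keep vectors on `model.wv`.

```python
def train_once(sentences, seed=42):
    """Train a Word2Vec model with one worker thread and a fixed seed."""
    # Imported inside the function so this sketch can be loaded without gensim.
    from gensim.models import Word2Vec

    # workers=1 removes the thread-scheduling jitter; seed fixes the RNG.
    return Word2Vec(sentences, seed=seed, workers=1, min_count=1)


def vectors_match(sentences, word):
    """Train two models the same way and check that `word` gets the same vector."""
    first = train_once(sentences)
    second = train_once(sentences)
    # Later gensim releases expose vectors on model.wv; 0.12.x indexes the model itself.
    vec_a = getattr(first, "wv", first)[word]
    vec_b = getattr(second, "wv", second)[word]
    return bool((vec_a == vec_b).all())


# Hypothetical toy corpus: a list of tokenized sentences.
toy_corpus = [["the", "quick", "brown", "fox"], ["the", "lazy", "dog"]] * 50

if __name__ == "__main__":
    print(vectors_match(toy_corpus, "fox"))
```

To get the same result across separate interpreter launches on Python 3, also start the script with PYTHONHASHSEED set to a fixed value, e.g. `PYTHONHASHSEED=0 python script.py`.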
See also this thread on the discussion forum – https://groups.google.com/d/msg/gensim/7eiwqfhAbhs/qC0pmbw5HwAJ – the same considerations apply to Word2Vec. (If you have other questions that are not likely to be bugs, it's better to discuss at that forum than in this issue-tracker.)
Here is some example code:
`>>> from nltk.corpus import brown