New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rebase of "online word2vec" by rutum #615
Conversation
5285757
to
780a504
Compare
fb3145f
to
4167d5e
Compare
I'm going to take a look at the test failure, but may need some help fixing it: ======================================================================
ERROR: testOnlineLearning (gensim.test.test_word2vec.TestWord2VecModel)
Test that the algorithm is able to add new words to the
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/travis/build/piskvorky/gensim/gensim/test/test_word2vec.py", line 104, in testOnlineLearning
model.build_vocab(new_sentences, update=True)
File "/home/travis/build/piskvorky/gensim/gensim/models/word2vec.py", line 510, in build_vocab
self.finalize_vocab(update) # build tables & arrays
File "/home/travis/build/piskvorky/gensim/gensim/models/word2vec.py", line 649, in finalize_vocab
self.sort_vocab()
File "/home/travis/build/piskvorky/gensim/gensim/models/word2vec.py", line 672, in sort_vocab
raise RuntimeError("must sort before initializing vectors/weights")
RuntimeError: must sort before initializing vectors/weights |
Ahh, I see. I need to add the update parameter to |
So I think the easy solution is to say you can't update models with sorted vocab. The other solution might be to skip the sorting step when updating a sorted model. |
I added 2 commits, one that fixed the test to use non-sorted vocab, and a second commit to skip sorting when updating a sorted model. Let me know which way makes sense. |
Travis passes now, but appveyor fails |
Can someone help me with the appveyor failure? I'm not sure what went wrong. I could also use some help deciding how to handle |
appveyor error is Collecting gensim
Could not find a version that satisfies the requirement gensim (from versions: )
No matching distribution found for gensim
Command exited with code 1 |
CCing @tmylk for AppVeyor. If it constantly fails, isn't it better to just drop it? It's confusing for everybody. |
@piskvorky How do I re-run the appveyor checks? |
Not sure -- that's a question for @tmylk. |
@piskvorky regarding AppVeyor: We could also just try https://github.com/The-Compiler/pytest-vw |
38dee6c
to
c7684c8
Compare
@piskvorky @phdowling I rebased against the develop branch again, which will re-trigger the travis tests. It looks like other PRs are passing now, so I expect this one will too (but we'll see). |
@piskvorky @phdowling tests passing now. |
Hey...in case you got comment notifications, please disregard the previous one. Embarrassingly enough, I screwed up find and replace #fail Thanks for all your work on this! |
@gushecht I force pushed to this branch, which probably caused the error |
@zachmayer Thanks a lot for your work! This functionality is in high demand. Do you think it is ready to be reviewed and merged after the conflicts are resolved? |
Definitly ready for review! I'll rebase and resolve merge conflicts (I already did it once lol) |
Rebased it here: #700 |
I rebased #435 off the develop branch, and fixed merge conflicts. I think this branch has all the new features from the "online word2vec" and I've resolved all the merge conflicts and removed the commits from master.
@rutum Let me know if I missed anything. You could pull this branch, and then work from here to respond to the issues on your pull request.
Full list of commits from original PR below:
recovering lost work
updating bug in sentences iterator
vector freeze after each training iteration
cbow update
clean up code
update summarization tutorial image
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
update_weights function change
updating the numpy copy code
updating the numpy copy code