Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase of "online word2vec" by rutum #615

Closed
wants to merge 0 commits into from

Conversation

zachmayer
Copy link

I rebased #435 off the develop branch, and fixed merge conflicts. I think this branch has all the new features from the "online word2vec" and I've resolved all the merge conflicts and removed the commits from master.

@rutum Let me know if I missed anything. You could pull this branch, and then work from here to respond to the issues on your pull request.

Full list of commits from original PR below:
recovering lost work
updating bug in sentences iterator
vector freeze after each training iteration
cbow update
clean up code
update summarization tutorial image
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
resolving merge conflicts
update_weights function change
updating the numpy copy code
updating the numpy copy code

@zachmayer zachmayer force-pushed the develop branch 4 times, most recently from 5285757 to 780a504 Compare February 22, 2016 20:12
@zachmayer zachmayer mentioned this pull request Feb 22, 2016
@zachmayer zachmayer force-pushed the develop branch 3 times, most recently from fb3145f to 4167d5e Compare February 22, 2016 20:50
@zachmayer
Copy link
Author

I'm going to take a look at the test failure, but may need some help fixing it:

======================================================================
ERROR: testOnlineLearning (gensim.test.test_word2vec.TestWord2VecModel)
Test that the algorithm is able to add new words to the
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/piskvorky/gensim/gensim/test/test_word2vec.py", line 104, in testOnlineLearning
    model.build_vocab(new_sentences, update=True)
  File "/home/travis/build/piskvorky/gensim/gensim/models/word2vec.py", line 510, in build_vocab
    self.finalize_vocab(update)  # build tables & arrays
  File "/home/travis/build/piskvorky/gensim/gensim/models/word2vec.py", line 649, in finalize_vocab
    self.sort_vocab()
  File "/home/travis/build/piskvorky/gensim/gensim/models/word2vec.py", line 672, in sort_vocab
    raise RuntimeError("must sort before initializing vectors/weights")
RuntimeError: must sort before initializing vectors/weights

@zachmayer
Copy link
Author

Ahh, I see. I need to add the update parameter to sort_vocab.

@zachmayer
Copy link
Author

So I think the easy solution is to say you can't update models with sorted vocab.

The other solution might be to skip the sorting step when updating a sorted model.

@zachmayer
Copy link
Author

I added 2 commits, one that fixed the test to use non-sorted vocab, and a second commit to skip sorting when updating a sorted model.

Let me know which way makes sense.

@zachmayer
Copy link
Author

Travis passes now, but appveyor fails

@zachmayer
Copy link
Author

Can someone help me with the appveyor failure? I'm not sure what went wrong.

I could also use some help deciding how to handle sort_vocab when updating models.

@zachmayer
Copy link
Author

appveyor error is

Collecting gensim
  Could not find a version that satisfies the requirement gensim (from versions: )
No matching distribution found for gensim
Command exited with code 1

@piskvorky
Copy link
Owner

CCing @tmylk for AppVeyor. If it constantly fails, isn't it better to just drop it? It's confusing for everybody.

@zachmayer
Copy link
Author

@piskvorky How do I re-run the appveyor checks?

@piskvorky
Copy link
Owner

Not sure -- that's a question for @tmylk.

@phdowling
Copy link
Contributor

@piskvorky regarding AppVeyor: We could also just try https://github.com/The-Compiler/pytest-vw

@zachmayer zachmayer force-pushed the develop branch 2 times, most recently from 38dee6c to c7684c8 Compare March 7, 2016 12:55
@zachmayer
Copy link
Author

@piskvorky @phdowling I rebased against the develop branch again, which will re-trigger the travis tests. It looks like other PRs are passing now, so I expect this one will too (but we'll see).

@zachmayer
Copy link
Author

@piskvorky @phdowling tests passing now.

@gushecht
Copy link

gushecht commented Mar 8, 2016

Hey...in case you got comment notifications, please disregard the previous one. Embarrassingly enough, I screwed up find and replace #fail

Thanks for all your work on this!

@zachmayer
Copy link
Author

@gushecht I force pushed to this branch, which probably caused the error

@tmylk
Copy link
Contributor

tmylk commented May 17, 2016

@zachmayer Thanks a lot for your work! This functionality is in high demand. Do you think it is ready to be reviewed and merged after the conflicts are resolved?

@zachmayer
Copy link
Author

Definitly ready for review! I'll rebase and resolve merge conflicts (I already did it once lol)

@zachmayer
Copy link
Author

Rebased it here: #700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants