Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue #743, Updated word2vec.n_similarities and test_word2vec.testSimilarities methods #883

Merged
merged 17 commits into from
Sep 26, 2016
Merged
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
Changes
=======

0.13.3, 2016-09-26
* Fixed issue #743 , In word2vec's n_similarity method if atleast one empty list is passed ZeroDivisionError is raised, added test cases in test/test_word2vec.py(@pranay360, #883)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't delete from changelog of phrases fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't delete it, I had a previous build in which that entry was not in the changelog, anyway I added it. Thank You for marking out the changes.

0.13.2, 2016-08-19
* export_phrases in Phrases model changed. Fixed issue #794 and added test cases in test/test_phrases.py(@AadityaJ,
[#879](https://github.com/RaRe-Technologies/gensim/pull/879))
Expand Down
6 changes: 5 additions & 1 deletion gensim/models/word2vec.py
Original file line number Diff line number Diff line change
Expand Up @@ -1539,9 +1539,13 @@ def n_similarity(self, ws1, ws2):
True

"""
if not(len(ws1) and len(ws2)):
raise ZeroDivisionError('Atleast one of the passed list is empty.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If input is a list, idiomatic Python is: if not ws1 or not ws2:.

v1 = [self[word] for word in ws1]
v2 = [self[word] for word in ws2]
return dot(matutils.unitvec(array(v1).mean(axis=0)), matutils.unitvec(array(v2).mean(axis=0)))
return dot(matutils.unitvec(array(v1).mean(axis=0)),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use vertical indent in gensim -- change to normal hanging indent (see PEP8: all statements on new lines, one level of indent).

matutils.unitvec(array(v2).mean(axis=0)))


def init_sims(self, replace=False):
"""
Expand Down
5 changes: 4 additions & 1 deletion gensim/test/test_word2vec.py
Original file line number Diff line number Diff line change
Expand Up @@ -370,9 +370,12 @@ def testSimilarities(self):
model = word2vec.Word2Vec(size=2, min_count=1, sg=0, hs=0, negative=2)
model.build_vocab(sentences)
model.train(sentences)

self.assertTrue(model.n_similarity(['graph', 'trees'], ['trees', 'graph']))
self.assertTrue(model.n_similarity(['graph'], ['trees']) == model.similarity('graph', 'trees'))
self.assertRaises(ZeroDivisionError, model.n_similarity, ['graph', 'trees'], [])
self.assertRaises(ZeroDivisionError, model.n_similarity, [], ['graph', 'trees'])
self.assertRaises(ZeroDivisionError, model.n_similarity, [], [])

def testSimilarBy(self):
"""Test word2vec similar_by_word and similar_by_vector."""
Expand Down