Fix backwards compatibility bug in Word2Vec #3415

Merged: 2 commits, Dec 16, 2022
3 changes: 0 additions & 3 deletions gensim/models/word2vec.py
@@ -1986,9 +1986,6 @@ def _load_specials(self, *args, **kwargs):
             for a in ('hashfxn', 'layer1_size', 'seed', 'syn1neg', 'syn1'):
                 if hasattr(self.trainables, a):
                     setattr(self, a, getattr(self.trainables, a))
-            if hasattr(self, 'syn1'):
-                self.syn1 = self.syn1
-                del self.syn1
             del self.trainables
         if not hasattr(self, 'shrink_windows'):
             self.shrink_windows = True
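To see what the three deleted lines did, here is a minimal sketch of the same attribute migration. It assumes nothing about gensim itself: `LegacyModel` and `migrate` are hypothetical stand-ins written for this illustration, and the string `"hs-weights"` is a placeholder for the real array. Only the loop and the removed lines mirror the hunk above.

```python
from types import SimpleNamespace


class LegacyModel:
    """Hypothetical stand-in for a model unpickled from an older release."""

    def __init__(self):
        # Assumption for this sketch: the old layout kept these attributes
        # on a separate `trainables` object.
        self.trainables = SimpleNamespace(syn1="hs-weights", seed=1)


def migrate(model, apply_removed_lines):
    """Mimic the attribute migration shown in the hunk above."""
    for a in ('hashfxn', 'layer1_size', 'seed', 'syn1neg', 'syn1'):
        if hasattr(model.trainables, a):
            setattr(model, a, getattr(model.trainables, a))
    if apply_removed_lines and hasattr(model, 'syn1'):
        model.syn1 = model.syn1  # self-assignment, changes nothing
        del model.syn1           # discards the attribute that was just copied
    del model.trainables
    return model


before_fix = migrate(LegacyModel(), apply_removed_lines=True)
after_fix = migrate(LegacyModel(), apply_removed_lines=False)

print(hasattr(before_fix, 'syn1'))  # False: syn1 is lost during loading
print(hasattr(after_fix, 'syn1'))   # True: syn1 survives the migration
```

With the removed lines in place, `syn1` is copied from `trainables` and then immediately deleted, so any later code that reads `syn1` on a migrated model would fail with an `AttributeError`.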
Binary file added gensim/test/test_data/model-from-gensim-3.8.0.w2v
Binary file not shown.
7 changes: 7 additions & 0 deletions gensim/test/test_word2vec.py
@@ -275,6 +275,13 @@ def test_persistence(self):
         self.assertTrue(np.allclose(wv.vectors, loaded_wv.vectors))
         self.assertEqual(len(wv), len(loaded_wv))

+    def test_persistence_backwards_compatible(self):
+        """Can we still load a model created with an older gensim version?"""
+        path = datapath('model-from-gensim-3.8.0.w2v')
+        model = word2vec.Word2Vec.load(path)
+        x = model.score(['test'])
+        assert x is not None
+
Review comment (Collaborator):

This should probably be a self.assertSomething() call rather than a bare assert statement, both for consistency with the other tests and for the rare case where someone has disabled assert statements from the command line (for example with python -O). Also, the test relies on score() to probe model completeness, although the method name does not say that score() is what is being tested, and it may not be possible to support score() indefinitely. That could create maintenance headaches at some later date.
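The point about disabled asserts can be checked with the standard library alone. The sketch below is an illustration, not part of the PR: `run_snippet` is a hypothetical helper that runs a short program in a fresh interpreter, with or without `-O`, and returns its exit status. A bare `assert` on a `None` value is compared with `unittest`'s `assertIsNotNone`.

```python
import os
import subprocess
import sys


def run_snippet(code, optimize):
    """Run `code` in a fresh interpreter, optionally with -O; return the exit status."""
    args = [sys.executable]
    if optimize:
        args.append("-O")
    args += ["-c", code]
    # Drop PYTHONOPTIMIZE so that only the explicit -O flag controls optimization.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONOPTIMIZE"}
    return subprocess.run(args, capture_output=True, env=env).returncode


# Both snippets check a value that is None, so both checks should fail.
BARE_ASSERT = "x = None\nassert x is not None"
UNITTEST_ASSERT = (
    "import unittest\n"
    "x = None\n"
    "unittest.TestCase().assertIsNotNone(x)"
)

bare_normal = run_snippet(BARE_ASSERT, optimize=False)
bare_optimized = run_snippet(BARE_ASSERT, optimize=True)
unittest_normal = run_snippet(UNITTEST_ASSERT, optimize=False)
unittest_optimized = run_snippet(UNITTEST_ASSERT, optimize=True)

print("bare assert:      ", bare_normal, "without -O,", bare_optimized, "with -O")
print("assertIsNotNone:  ", unittest_normal, "without -O,", unittest_optimized, "with -O")
```

Under `-O` the bare `assert` is compiled away, so the failing check exits with status 0, while `assertIsNotNone` raises explicitly and still fails in both modes.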


     def test_persistence_from_file(self):
         """Test storing/loading the entire model trained with corpus_file argument."""
         with temporary_file(get_tmpfile('gensim_word2vec.tst')) as corpus_file: