fastText models from 2.3.0 can't be loaded in 3.0.0 #1642

Closed
Liebeck opened this issue Oct 22, 2017 · 6 comments
Labels
bug Issue described a bug difficulty medium Medium issue: required good gensim understanding & python skills

Comments

@Liebeck
Liebeck commented Oct 22, 2017

Description

I have a compatibility issue with fastText models in gensim 3.0.0. In version 2.3.0, I used the fastText C++ wrapper to train a model, based on the code available at that time from
https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/FastText_Tutorial.ipynb

This code works in 2.3.0:

from gensim.models.wrappers.fasttext import FastText as FT_wrapper
model = FT_wrapper.load(model_path)
if key in model:
    character_embedding = model[key]

In 3.0.0 it fails with:

  File "scripts/foo.py", line 43, in reduce_fasttext_embedding
    character_embedding = model[key]
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 1345, in __getitem__
    return self.wv.__getitem__(words)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/keyedvectors.py", line 602, in __getitem__
    return self.word_vec(words)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/wrappers/fasttext.py", line 94, in word_vec
    word_vec = np.zeros(self.syn0_ngrams.shape[1])
AttributeError: 'FastTextKeyedVectors' object has no attribute 'syn0_ngrams'

Expected Results

I expected the model from 2.3.0 to be loadable in 3.0.0. I was able to get my code working by downgrading to 2.3.0. I ran some evaluations with the trained models and would like to keep using them. Otherwise, I'm stuck at gensim 2.3.0.

@menshikh-iv
I guess this has something to do with commit 6e51156#diff-cd6e655ec64f5b3927aa96ce5d006207, which split 'syn0_all' into 'syn0_vocab' and 'syn0_ngrams'. It looks like models trained with 2.3.0 aren't compatible with version 3. Could the load method check whether the model was trained in 2.3.0, load it the 2.3.0 way, and then make the same split internally?
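
To illustrate the kind of split I mean, here is a minimal sketch on plain NumPy arrays. It is not gensim code: the helper name `split_syn0_all` is hypothetical, and the row layout (word vectors first, n-gram vectors after) is only my assumption. I have not checked whether a saved 2.3.0 model actually stores 'syn0_all' this way.

```python
import numpy as np


def split_syn0_all(syn0_all, vocab_size):
    """Split a combined matrix into a word part and an n-gram part.

    Assumption (not verified against gensim): the first `vocab_size` rows
    hold the in-vocabulary word vectors and the remaining rows hold the
    n-gram vectors.
    """
    syn0_all = np.asarray(syn0_all)
    if syn0_all.ndim != 2:
        raise ValueError("expected a 2-D matrix of shape (rows, dim)")
    if not 0 <= vocab_size <= syn0_all.shape[0]:
        raise ValueError("vocab_size must be between 0 and the number of rows")
    # Copy so the two parts do not share memory with the original matrix.
    syn0_vocab = syn0_all[:vocab_size].copy()
    syn0_ngrams = syn0_all[vocab_size:].copy()
    return syn0_vocab, syn0_ngrams


# Toy data: 3 "words" and 2 "n-gram" rows with 4 dimensions each.
combined = np.arange(20, dtype=np.float32).reshape(5, 4)
vocab_part, ngram_part = split_syn0_all(combined, vocab_size=3)
print(vocab_part.shape, ngram_part.shape)
```

If the real layout differs, only the two slicing lines would need to change.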

@Liebeck
Author

Liebeck commented Oct 22, 2017

Or another idea to solve this: could you provide a utility script that transforms a 2.3.0 model into a 3.0.0 model?
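
Such a script would first need to tell the two formats apart. A rough sketch of that check, based only on the attribute names mentioned above ('syn0_all' before the commit, 'syn0_ngrams' after it): `describe_layout` is a hypothetical helper, and the stand-in objects below are not real gensim models.

```python
from types import SimpleNamespace


def describe_layout(kv):
    """Guess which weight layout an object uses from its attribute names."""
    if hasattr(kv, "syn0_ngrams"):
        return "3.0.0-style"
    if hasattr(kv, "syn0_all"):
        return "2.3.0-style"
    return "unknown"


# Stand-ins for loaded keyed-vector objects (assumption: only the
# attribute names matter for this check, not the actual weights).
old_like = SimpleNamespace(syn0_all=[[0.0, 0.0]])
new_like = SimpleNamespace(syn0_vocab=[[0.0, 0.0]], syn0_ngrams=[[0.0, 0.0]])

print(describe_layout(old_like))
print(describe_layout(new_like))
```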

@piskvorky piskvorky added the bug Issue described a bug label Oct 22, 2017
@menshikh-iv
Contributor

@Liebeck Thanks for the report.

I think it's possible to check this in the load method, wdyt @chinmayapancholi13?

Can you fix this bug and create a PR @Liebeck @chinmayapancholi13?

@menshikh-iv menshikh-iv added the difficulty medium Medium issue: required good gensim understanding & python skills label Oct 23, 2017
@Liebeck
Author

Liebeck commented Oct 25, 2017

I'm not sure I understand enough of gensim's architecture to contribute a quick fix. I might be able to take a closer look at it in January 😐

@chinmayapancholi13
Contributor

@Liebeck Thanks for reporting this issue! Seems to be a problem in the load function.

@menshikh-iv Hey Ivan! I am a little occupied this week, so I can take a look at this and try to get it resolved next week. I hope this is fine. I'll post an update on my progress here. :)

@menshikh-iv
Contributor

That would be great @chinmayapancholi13, I'm glad to see you here again :)

@menshikh-iv
Contributor

Fixed in #1723
