_rollback_optimization is not performed on some old FastText models #2453

Closed
generall opened this issue Apr 19, 2019 · 5 comments · Fixed by #2454
Labels
bug (Issue described a bug), fasttext (Issues related to the FastText model)

Comments

generall (Author)

Problem description

The model araneum_none_fasttextskipgram_300_5_2018 from https://rusvectores.org/ru/models/
loads incorrectly with gensim.models.fasttext.FastText.load.
The loaded model has compatible_hash = True and still carries the hash2index parameter, so n-gram indexes are computed incorrectly.
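To make the `compatible_hash` distinction concrete, here is a minimal sketch of the two n-gram hash variants as we understand them: 32-bit FNV-1a over the UTF-8 bytes of an n-gram with each byte sign-extended (our reading of Facebook's fastText), versus FNV-1a over Unicode code points (our reading of the older, non-compatible gensim hash). The function names, the `<`/`>` boundary markers, the n-gram range 3 to 6 and the 2,000,000 bucket count are assumptions for illustration, not gensim's API. For ASCII n-grams the two variants agree; for Cyrillic n-grams such as those in `слово123` they generally map to different buckets, so vectors trained with one hash are looked up in the wrong rows by the other.

```python
BUCKET = 2_000_000  # assumption: fastText's default number of n-gram buckets


def hash_bytes(ngram: str) -> int:
    """FNV-1a over UTF-8 bytes, each byte sign-extended to 32 bits.

    Assumption: this mirrors the Facebook-compatible hash.
    """
    h = 2166136261
    for b in ngram.encode("utf-8"):
        signed = b - 256 if b > 127 else b          # emulate an int8 cast
        h = (h ^ (signed & 0xFFFFFFFF)) * 16777619  # xor, then FNV prime
        h &= 0xFFFFFFFF                             # keep 32 bits
    return h


def hash_code_points(ngram: str) -> int:
    """FNV-1a over Unicode code points.

    Assumption: this mirrors the older, non-compatible hash.
    """
    h = 2166136261
    for ch in ngram:
        h = ((h ^ ord(ch)) * 16777619) & 0xFFFFFFFF
    return h


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list:
    """Character n-grams of the word wrapped in '<' and '>' markers."""
    extended = f"<{word}>"
    return [extended[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(extended) - n + 1)]


# Compare bucket ids under both variants for a few Cyrillic n-grams.
for ng in char_ngrams("слово123")[:3]:
    print(ng, hash_bytes(ng) % BUCKET, hash_code_points(ng) % BUCKET)
```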

Steps/code/corpus to reproduce

from gensim.models.fasttext import FastText

ft2 = FastText.load('./data/araneum_none_fasttextskipgram_300_5_2018.model')
ft2.wv.word_vec('слово123')  # out-of-vocabulary word: its vector is built from character n-grams

This gives the following traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-14-5fac1e42e24a> in <module>()
----> 1 ft2.wv.word_vec('слово123')

~/anaconda3/lib/python3.6/site-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
   2111                 return word_vec
   2112             for nh in ngram_hashes:
-> 2113                 word_vec += ngram_weights[nh]
   2114             return word_vec / len(ngram_hashes)
   2115 

IndexError: index 1369364 is out of bounds for axis 0 with size 27355
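The traceback shows the failure mode: `word_vec` averages rows of `ngram_weights` indexed by n-gram bucket ids, but this model's n-gram matrix has only 27,355 rows, while 1,369,364 is an id from the full bucket space. Presumably the matrix is still in its compacted form, which can only be addressed through `hash2index`. The sketch below reproduces that mismatch with plain Python lists; `compact_rows`, `hash2index`, `oov_vector` and the toy bucket ids are hypothetical stand-ins, not gensim internals.

```python
# Toy compacted layout: only some buckets keep a row.
# All names and numbers are illustrative, not gensim internals.
hash2index = {17: 0, 1369364: 1, 42: 2}  # bucket id -> row in compact_rows
compact_rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def oov_vector(bucket_ids, rows, mapping=None):
    """Average the n-gram rows for an out-of-vocabulary word.

    With mapping=None the bucket ids are used as row indexes directly,
    which is only valid for a full (uncompacted) matrix.
    """
    total = [0.0] * len(rows[0])
    for b in bucket_ids:
        row = rows[mapping[b]] if mapping is not None else rows[b]
        total = [t + x for t, x in zip(total, row)]
    return [t / len(bucket_ids) for t in total]


ids = [17, 1369364]
print(oov_vector(ids, compact_rows, hash2index))  # lookup via hash2index works
try:
    oov_vector(ids, compact_rows)                 # raw bucket ids overflow
except IndexError as exc:
    print("IndexError:", exc)
```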

Versions

Linux-4.2.0-27-generic-x86_64-with-debian-jessie-sid
Python 3.6.8 |Anaconda 4.4.0 (64-bit)| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
NumPy 1.16.0
SciPy 1.1.0
gensim 3.7.2
FAST_VERSION 1
generall (Author)

A possible workaround is to apply these changes manually to the loaded model:

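from gensim.models.keyedvectors import _rollback_optimization

# use the legacy (non-compatible) n-gram hash this old model was trained with
ft2.wv.compatible_hash = False
# undo the hash2index optimization on the n-gram vectors
_rollback_optimization(ft2.wv)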
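Conceptually, undoing the optimization means scattering each compacted row back to the slot of its bucket id, so that n-gram lookups can index the matrix by bucket id again. The sketch below shows only that idea with plain lists; `expand_ngram_rows` is a hypothetical helper, not gensim's `_rollback_optimization`, and zero-filling unseen buckets is an assumption. Assuming the default of 2,000,000 buckets and 300 dimensions, the full matrix holds 600 million float32 values, about 2.4 GB.

```python
def expand_ngram_rows(compact_rows, hash2index, bucket):
    """Scatter compacted rows into a full bucket-indexed matrix.

    Hypothetical helper: buckets absent from hash2index are zero-filled
    (an assumption; the real library may initialize them differently).
    """
    dim = len(compact_rows[0])
    full = [[0.0] * dim for _ in range(bucket)]
    for bucket_id, row in hash2index.items():
        full[bucket_id] = list(compact_rows[row])
    return full


# Toy example with a small bucket count so it runs instantly.
hash2index = {3: 0, 7: 1}
compact_rows = [[1.0, 2.0], [3.0, 4.0]]
full = expand_ngram_rows(compact_rows, hash2index, bucket=10)
print(len(full), full[3], full[7], full[0])
```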

mpenkov self-assigned this Apr 20, 2019
mpenkov (Collaborator) commented Apr 20, 2019

Good catch. We'll add logic to handle such models.

mpenkov added the bug and fasttext labels Apr 20, 2019
akutuzov (Contributor) commented Apr 23, 2019

This should also handle models loaded via KeyedVectors.load(), not only FastText.load().
FastText models can be (and in practice are) stored as FastTextKeyedVectors objects to save space.
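One way to cover both load paths is to run the same check on the keyed-vectors object, whether it came from `FastText.load(...).wv` or from `KeyedVectors.load(...)`. The sketch below keys off the `hash2index` attribute mentioned in this issue; `needs_legacy_upgrade` is a hypothetical helper, not the fix that was merged, and `SimpleNamespace` objects stand in for real loaded vectors.

```python
from types import SimpleNamespace


def needs_legacy_upgrade(wv) -> bool:
    """Hypothetical check: a non-empty hash2index mapping suggests the
    n-gram matrix is still in the compacted legacy layout."""
    return bool(getattr(wv, "hash2index", None))


# Stand-ins for vectors from FastText.load(...).wv or KeyedVectors.load(...)
legacy = SimpleNamespace(hash2index={17: 0}, compatible_hash=True)
current = SimpleNamespace(compatible_hash=True)

for name, wv in [("legacy", legacy), ("current", current)]:
    print(name, needs_legacy_upgrade(wv))
```

Because the check only inspects the vectors object, it does not matter which loader produced it.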

akutuzov (Contributor) commented Apr 23, 2019

@generall FYI: all the RusVectores fastText models have now been updated to support the latest Gensim version. Also note that you should use KeyedVectors.load() to load these models (as described in the RusVectores tutorial).

mpenkov (Collaborator) commented Apr 23, 2019

@akutuzov I think the KeyedVectors stuff already gets handled correctly - could you please confirm?

mpenkov added a commit that referenced this issue May 4, 2019
* update legacy model loading, fix #2453

* extract _try_upgrade function

3 participants