_rollback_optimization is not performed on some old FastText models #2453

generall · 2019-04-19T12:52:05Z

Problem description

Model araneum_none_fasttextskipgram_300_5_2018 from https://rusvectores.org/ru/models/
is loaded incorrectly by gensim.models.fasttext.FastText.load.
This method creates a model with compatible_hash = True and a populated hash2index attribute, so n-gram indexes are computed incorrectly.
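To make the failure mode concrete, here is a toy model of a pruned n-gram table. It is an illustration only, not gensim's implementation: a full table with `bucket` rows is cut down to the rows actually used, with a `hash2index` map from raw hash to compact row. Code that indexes that compact table directly by hash raises the same kind of IndexError as the traceback below, and a rollback that re-expands the table to `bucket` rows fixes the lookup. The bucket size, hashes, vector size, and zero-filling of unused rows are all assumptions of the sketch.

```python
import numpy as np

# Toy model of a pruned n-gram table; NOT gensim's actual code.
BUCKET = 1000  # hypothetical number of hash buckets in the full table
DIM = 4        # hypothetical vector size


def build_compact(used_hashes, dim=DIM, seed=0):
    """Keep one row per n-gram hash actually seen, plus a hash -> row map."""
    rng = np.random.default_rng(seed)
    compact = rng.standard_normal((len(used_hashes), dim))
    hash2index = {h: row for row, h in enumerate(used_hashes)}
    return compact, hash2index


def lookup_raw(table, h, bucket=BUCKET):
    """Index the table directly by hash, as code expecting a full table does."""
    return table[h % bucket]


def rollback(compact, hash2index, bucket=BUCKET):
    """Expand the compact table back to `bucket` rows (unused rows zero-filled)."""
    full = np.zeros((bucket, compact.shape[1]), dtype=compact.dtype)
    for h, row in hash2index.items():
        full[h % bucket] = compact[row]
    return full


compact, hash2index = build_compact([7, 512, 903])

# Raw-hash indexing into the 3-row compact table raises IndexError.
try:
    lookup_raw(compact, 512)
    raw_lookup_failed = False
except IndexError:
    raw_lookup_failed = True

# After rollback, raw-hash indexing returns the row the map pointed to.
full = rollback(compact, hash2index)
```

In the report, the table being indexed has 27355 rows while the n-gram hash is 1369364, which is why `ngram_weights[nh]` goes out of bounds.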

Steps/code/corpus to reproduce

from gensim.models.fasttext import FastText

ft2 = FastText.load('./data/araneum_none_fasttextskipgram_300_5_2018.model')
ft2.wv.word_vec('слово123')

This gives the following stack trace:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-14-5fac1e42e24a> in <module>()
----> 1 ft2.wv.word_vec('слово123')

~/anaconda3/lib/python3.6/site-packages/gensim/models/keyedvectors.py in word_vec(self, word, use_norm)
   2111                 return word_vec
   2112             for nh in ngram_hashes:
-> 2113                 word_vec += ngram_weights[nh]
   2114             return word_vec / len(ngram_hashes)
   2115 

IndexError: index 1369364 is out of bounds for axis 0 with size 27355

Versions

Linux-4.2.0-27-generic-x86_64-with-debian-jessie-sid
Python 3.6.8 |Anaconda 4.4.0 (64-bit)| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
NumPy 1.16.0
SciPy 1.1.0
gensim 3.7.2
FAST_VERSION 1


generall · 2019-04-19T13:00:36Z

A possible workaround is to apply these changes manually to the loaded model:

from gensim.models.keyedvectors import _rollback_optimization

ft2.wv.compatible_hash = False
_rollback_optimization(ft2.wv)
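The workaround can be wrapped in a small guard so it is only applied to models that still carry a populated `hash2index` mapping. This is a sketch: `apply_legacy_fix` is a hypothetical helper, and treating a non-empty `hash2index` as the marker of an affected model is an assumption, not something confirmed in this thread. The rollback function is passed in as a parameter so the helper does not hard-code a private gensim import.

```python
def apply_legacy_fix(wv, rollback):
    """Apply the workaround from this thread to a loaded keyed-vectors object.

    `wv` is e.g. ft2.wv; `rollback` is e.g.
    gensim.models.keyedvectors._rollback_optimization (a private function).
    Returns True if the fix was applied, False if it was skipped.
    """
    # Assumption: affected legacy models still have a non-empty hash2index.
    if not getattr(wv, "hash2index", None):
        return False
    # Same two steps as the manual workaround above.
    wv.compatible_hash = False
    rollback(wv)
    return True
```

With gensim 3.7.2, usage would be `apply_legacy_fix(ft2.wv, _rollback_optimization)` after the import shown above. Whether a second call is skipped depends on what the rollback leaves in `hash2index`.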

mpenkov · 2019-04-20T06:41:22Z

Good catch. We'll add logic to handle such models.

akutuzov · 2019-04-23T16:04:10Z

This should also handle models loaded via KeyedVectors.load(), not only FastText.load().
FastText models can be (and in practice are) stored as FastTextKeyedVectors objects to save space.
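Because the same legacy data can arrive either as a full model or as a bare vectors object, any fix has to be applied to the vectors part in both cases. A minimal sketch of that normalization step follows; `keyed_vectors_of` is a hypothetical helper, and it relies on duck typing (a `.wv` attribute on the full model) rather than isinstance checks, which is an assumption of this sketch.

```python
def keyed_vectors_of(loaded):
    """Return the keyed-vectors part of an object loaded from disk.

    FastText.load() returns a full model whose vectors live in `.wv`;
    KeyedVectors.load() on a saved FastTextKeyedVectors returns the
    vectors object itself. Legacy fixes should target this object.
    """
    # Duck typing: use `.wv` if present, otherwise assume it is the vectors.
    return getattr(loaded, "wv", loaded)
```

This keeps a single code path for the fix regardless of which loader produced the object.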

akutuzov · 2019-04-23T16:06:51Z

@generall FYI: all the RusVectores fastText models have now been updated to support the latest Gensim version. Also note that you should use KeyedVectors.load() to load these models (as described in the RusVectores tutorial).

mpenkov · 2019-04-23T23:21:50Z

@akutuzov I think the KeyedVectors stuff already gets handled correctly - could you please confirm?


mpenkov self-assigned this Apr 20, 2019

mpenkov added the bug and fasttext labels Apr 20, 2019

mpenkov mentioned this issue Apr 20, 2019

update legacy model loading, fix #2453 #2454

Merged

mpenkov closed this as completed in #2454 May 4, 2019

mpenkov added a commit that referenced this issue May 4, 2019

update legacy model loading, fix #2453 (#2454)

1ceb7a4

* update legacy model loading, fix #2453 * extract _try_upgrade function
