
FastText RAM usage #2790

Closed
menshikh-iv opened this issue Apr 12, 2020 · 0 comments · Fixed by #2791

menshikh-iv (Contributor) commented Apr 12, 2020

Problem description

The FastText model takes too much RAM. We have seen this issue many times in our CI systems; it typically looks like this:

____________________ TestFastTextModel.test_cbow_hs_online _____________________
self = <gensim.test.test_fasttext.TestFastTextModel testMethod=test_cbow_hs_online>
    @unittest.skipIf(IS_WIN32, "avoid memory error with Appveyor x32")
    def test_cbow_hs_online(self):
        model = FT_gensim(
>           sg=0, cbow_mean=1, alpha=0.05, window=2, hs=1, negative=0, min_count=3, iter=1, seed=42, workers=1
        )
self       = <gensim.test.test_fasttext.TestFastTextModel testMethod=test_cbow_hs_online>
/venv/lib/python3.7/site-packages/gensim/test/test_fasttext.py:733: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/venv/lib/python3.7/site-packages/gensim/models/fasttext.py:595: in __init__
    self.trainables.prepare_weights(hs, negative, self.wv, update=False, vocabulary=self.vocabulary)
...
/venv/lib/python3.7/site-packages/gensim/models/fasttext.py:1130: in prepare_weights
    self.init_ngrams_weights(wv, update=update, vocabulary=vocabulary)
 ...
mtrand.pyx:1307: in mtrand.RandomState.uniform
    ???
        ...
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
>   ???
E   MemoryError
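
For context, a rough back-of-the-envelope estimate (mine, not part of the original report): if I recall the defaults correctly, FastText uses bucket=2,000,000 hash buckets and size=100 dimensions, so the n-gram matrix alone holds 200 million floats. The traceback above points at RandomState.uniform, which returns float64, so the temporary array drawn during init is about twice the size of the final float32 matrix:

import numpy as np  # only used for the dtype sizes below

# Rough size estimate assuming gensim's FastText defaults (bucket=2_000_000, size=100).
bucket, vector_size = 2_000_000, 100
n_values = bucket * vector_size
print("float32 matrix: %.0f MB" % (n_values * np.dtype(np.float32).itemsize / 1024 ** 2))  # ~763 MB
print("float64 draw:   %.0f MB" % (n_values * np.dtype(np.float64).itemsize / 1024 ** 2))  # ~1526 MB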

Steps/code/corpus to reproduce

from gensim.models import FastText

m = FastText()  # and measure RAM after that

It will "eat" around 1.6 GB after __init__. I guess the issue is the "bucket" matrix (it's too large).
By itself this isn't a problem, but it is an issue in our tests, because we almost never pin this parameter (see the sketch below).
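
A minimal sketch of what I mean (psutil is just one way to measure RSS; the small bucket value is arbitrary):

import psutil
from gensim.models import FastText

proc = psutil.Process()
rss_before = proc.memory_info().rss

m = FastText()  # defaults allocate the full-size bucket matrix

rss_after = proc.memory_info().rss
print("RSS grew by %.0f MB after __init__" % ((rss_after - rss_before) / 1024 ** 2))

# In tests, pinning the bucket parameter keeps __init__ cheap;
# the value 100 here is arbitrary, just small.
small = FastText(bucket=100)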

Versions

Please provide the output of:

>>> import platform; print(platform.platform())
Linux-5.3.0-46-generic-x86_64-with-Ubuntu-19.10-eoan
>>> import sys; print("Python", sys.version)
('Python', '2.7.17 (default, Nov  7 2019, 10:07:09) \n[GCC 9.2.1 20191008]')
>>> import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.16.1')
>>> import scipy; print("SciPy", scipy.__version__)
('SciPy', '1.2.3')
>>> import gensim; print("gensim", gensim.__version__)
('gensim', '3.8.1')
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
('FAST_VERSION', 1)

I'm sure other Python versions and earlier gensim versions are affected in the same way.
