Memory Error - Word2Vec #293

Closed
dav009 opened this issue Feb 11, 2015 · 9 comments

@dav009

dav009 commented Feb 11, 2015

I'm running a fairly simple script [1] that calls word2vec on a 15 GB corpus which is already tokenized. I have tried running it on a 30 GB machine and then on a 60 GB machine.
Both attempts lead to the following error:

Traceback (most recent call last):
  File "word2vec.py", line 13, in <module>
    read_corpus("/mnt/data/corpus")
  File "word2vec.py", line 9, in read_corpus
    model = gensim.models.Word2Vec(sentences, min_count=10, size=500, window=10, sg=1, workers=4)
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 312, in __init__
    self.build_vocab(sentences)
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 414, in build_vocab
    self.reset_weights()
  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 524, in reset_weights
    self.syn1 = zeros((len(self.vocab), self.layer1_size), dtype=REAL)
MemoryError

Any clues? Does the script contain something that is not meant to be done?

[1] https://gist.github.com/dav009/fb9a42890d3048b3b745

@sebastien-j
Contributor

How big is your vocabulary?

If I recall correctly, with a vocabulary of size |V|, the memory usage should be approximately 8 * size * |V| bytes (plus some overhead).

For |V|=10^7 and size=500, this is 40 GB.

The simplest solution is probably to increase min_count.
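
For a rough sanity check, here is a minimal sketch of that estimate (the 8 * size * |V| figure covers the two float32 weight matrices, syn0 and syn1, at 4 bytes per element each):

```python
# Back-of-the-envelope estimate of Word2Vec weight-matrix memory,
# following the 8 * size * |V| rule of thumb above: syn0 and syn1
# are each a |V| x size float32 array (4 bytes per element).
vocab_size = 10**7  # |V|, from the example above
size = 500          # vector dimensionality

bytes_needed = 8 * size * vocab_size
print("approx. %.1f GiB" % (bytes_needed / 1024.0 ** 3))  # ~37.3 GiB
```

Raising min_count prunes rare words from the vocabulary, shrinking |V| and therefore both matrices proportionally.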

@seelikat

seelikat commented Oct 9, 2015

@dav009, were you able to find the problem? gensim is also failing for me with a MemoryError although plenty of memory is available.

@piskvorky
Owner

As @sebastien-j says, it's best to give a more detailed report. A link (gist) to the log of your run (at INFO level) would be ideal.

@seelikat

The actual problem was that a 32-bit Python (in an Anaconda distribution) was installed on my cluster node, so it had nothing to do with gensim.

@piskvorky
Owner

@dav009 did you figure out the cause in your case? Was it 32bit Python as well?

Let's try to conclude & close this ticket.

@tmylk
Contributor

tmylk commented Jan 23, 2016

Closing as abandoned.

@tmylk tmylk closed this as completed Jan 23, 2016
@shirish93

Hello, I seem to have encountered a similar issue. I'm using gensim with WinPython in one of my virtual machines with 16 GB of memory. A model of mine loads fine and works perfectly on a different system with 8 GB of memory under similar conditions, but when I run it in the VM, I get this:

word2vec.py", line 1266, in init_sims self.syn0norm = (self.syn0 / sqrt((self.syn0 ** 2).sum(-1))[..., newaxis]).astype(REAL) MemoryError
when trying to use the 'most_similar' function of Word2Vec. I can load the model fine, and can retrieve word vectors fine, but it seems to explode just when I access the similarity-related functions (including 'doesnt_match'). This is the closest existing issue I could find. Any ideas?
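
For context, init_sims precomputes a second, unit-normalized copy of all vectors (syn0norm) the first time a similarity query runs, which roughly doubles the model's memory footprint. If the model itself loads, one possible workaround in the Word2Vec API of that era is to normalize in place (a sketch; the model path is hypothetical):

```python
# Sketch, assuming a gensim 0.12-era Word2Vec model saved at "my_model"
# (illustrative path). With replace=True, init_sims overwrites syn0 with
# its L2-normalized version instead of keeping both arrays, so
# most_similar() needs no second |V| x size float32 allocation.
# Note: the model can no longer be trained after this call.
import gensim

model = gensim.models.Word2Vec.load("my_model")
model.init_sims(replace=True)
print(model.most_similar("word"))
```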

@gojomo
Collaborator

gojomo commented Jan 26, 2016

@shirish93 – verify that it's the exact same Python version (both installed and specifically in use at the time of the error) on the system that works and the one that doesn't. (That seems to have been the issue above.)
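
A quick way to check whether the interpreter in use is 32-bit or 64-bit, using only the standard library (a minimal sketch):

```python
# Print the pointer width and version of the running interpreter.
# A 32-bit Python caps the process at roughly 2-4 GB of address space,
# which raises MemoryError long before physical RAM runs out.
import platform
import struct
import sys

print(struct.calcsize("P") * 8, "bit")  # 32 or 64
print(platform.architecture()[0])       # e.g. '64bit'
print(sys.version)                      # exact interpreter version
```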

@shirish93

This was right: I installed 64-bit Python, and the issue was resolved. Apologies for raising a non-issue!
