
Numpy.core._exceptions.MemoryError: Unable to allocate array with shape (51, 6, 64, 2) and data type float32 #9690

Closed
mihirpatel7 opened this issue Nov 18, 2021 · 5 comments
Labels: perf / memory (Performance: memory use), resolved (The issue was addressed / answered), v2 (spaCy v2.x), windows (Issues related to Windows)

Comments


mihirpatel7 commented Nov 18, 2021

Your Environment

  • Operating System: Windows Server
  • Python Version Used: 3.6.2
  • spaCy Version Used: 2.1.3
  • Environment Information: 8 GB RAM

spaCy keeps throwing errors like these:

numpy.core._exceptions.MemoryError: Unable to allocate array with shape (51, 6, 64, 2) and data type float32
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (75999, 96) and data type float32
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (82, 768) and data type float32
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (80, 768) and data type float32

and many more like this.

Example:

self.nlp = spacy.load('en_core_web_lg', disable=['tagger', 'parser'])
self.nlp.max_length = 1200000

# Processing the document line by line:
doc = self.nlp(line)
 File "C:\python36\lib\site-packages\spacy\language.py", line 385, in __call__
 doc = proc(doc, **component_cfg.get(name, {}))
 File "nn_parser.pyx", line 205, in spacy.syntax.nn_parser.Parser.__call__
 File "nn_parser.pyx", line 244, in spacy.syntax.nn_parser.Parser.predict
 File "nn_parser.pyx", line 257, in spacy.syntax.nn_parser.Parser.greedy_parse
 File "C:\python36\lib\site-packages\thinc\neural\_classes\model.py", line 165, in __call__
 return self.predict(x)
 File "C:\python36\lib\site-packages\thinc\neural\_classes\model.py", line 129, in predict
 y, _ = self.begin_update(X, drop=None)
 File "_parser_model.pyx", line 214, in spacy.syntax._parser_model.ParserModel.begin_update
 File "_parser_model.pyx", line 262, in spacy.syntax._parser_model.ParserStepModel.__init__
 File "C:\python36\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
 X, inc_layer_grad = layer.begin_update(X, drop=drop)
 File "C:\python36\lib\site-packages\thinc\api.py", line 264, in begin_update
 X, bp_layer = layer.begin_update(layer.ops.flatten(seqs_in, pad=pad), drop=drop)
 File "C:\python36\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
 X, inc_layer_grad = layer.begin_update(X, drop=drop)
 File "C:\python36\lib\site-packages\thinc\neural\_classes\resnet.py", line 21, in begin_update
 y, bp_y = self._layers[0].begin_update(X, drop=drop)
 File "C:\python36\lib\site-packages\thinc\neural\_classes\feed_forward.py", line 46, in begin_update
 X, inc_layer_grad = layer.begin_update(X, drop=drop)
 File "C:\python36\lib\site-packages\thinc\neural\_classes\layernorm.py", line 55, in begin_update
 X, backprop_child = self.child.begin_update(X, drop=0.0)
 File "C:\python36\lib\site-packages\thinc\neural\_classes\maxout.py", line 79, in begin_update
 best__bo, which__bo = self.ops.maxout(output__boc)
 File "ops.pyx", line 500, in thinc.neural.ops.NumpyOps.maxout
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (80622, 96) and data type int32
polm added the perf / memory, v2, and windows labels on Nov 18, 2021

polm commented Nov 18, 2021

It sounds like you ran out of memory. Is there any reason to believe that this is something besides running out of memory?

self.nlp.max_length = 1200000

This is higher than the default limit of 1M characters, but you only have 8 GB of RAM, which is not a lot. Are you doing this because you have very long documents? If so, you may want to look at dividing them up.
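Something along these lines would keep each call well under the limit. This is just a rough, untested sketch (chunk_text and the file name are illustrative placeholders, not anything built into spaCy):

import spacy

nlp = spacy.load("en_core_web_lg", disable=["tagger", "parser"])

def chunk_text(text, max_chars=100_000):
    # Yield pieces of text at most ~max_chars long, splitting on line breaks.
    chunk, size = [], 0
    for line in text.splitlines(keepends=True):
        if chunk and size + len(line) > max_chars:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "".join(chunk)

# "big_document.txt" is just a stand-in for however the text is loaded.
with open("big_document.txt", encoding="utf8") as f:
    long_document = f.read()

entities = []
for piece in chunk_text(long_document):
    doc = nlp(piece)
    # Keep only the extracted entities so each Doc can be garbage-collected.
    entities.extend((ent.text, ent.label_) for ent in doc.ents)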

mihirpatel7 (Author) commented

@polm Thank you for your response.
I set nlp.max_length = 1200000 because some documents are longer than 1M characters, but I only pass individual lines shorter than 1M characters to doc = self.nlp(line).

Yes, I have very long documents, which is why I pass them line by line. When a line is too long, I use a condition so that only text under 1M characters is passed through the model nlp = spacy.load('en_core_web_lg', disable=['tagger', 'parser']), and I extract entities from the text.

Now look at these exceptions: they fail even for very small allocations.
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (82, 768) and data type float32
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (80, 768) and data type float32

polm commented Nov 18, 2021

Even if an individual document isn't enough to exhaust your memory on its own, any previous Docs that haven't been deallocated yet will still fill up your memory. How do you know that's not the problem?
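For instance, holding on to every Doc keeps all of them (and their tensors) alive at once, whereas extracting just the entities lets each one be freed. This is purely illustrative; lines stands in for however you iterate over your text:

# Grows without bound: every Doc (and its tensors) stays referenced.
docs = [self.nlp(line) for line in lines]

# Keeps only small tuples, so each Doc can be garbage-collected right away.
entities = [(ent.text, ent.label_)
            for line in lines
            for ent in self.nlp(line).ents]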

Also, you're using a pretty old version of spaCy. Did this start happening suddenly or something?

One thing you should be aware of is that there's little benefit to processing very long documents: the NER model doesn't use context much beyond a paragraph. You will probably have an easier time if you split your documents into paragraphs before passing them to spaCy.
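As a rough sketch, assuming paragraphs are separated by blank lines (nlp.pipe also batches the texts, which is usually faster than calling nlp on each one):

paragraphs = [p for p in text.split("\n\n") if p.strip()]

entities = []
for doc in nlp.pipe(paragraphs, batch_size=50):
    entities.extend((ent.text, ent.label_) for ent in doc.ents)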

Based on the information you've given here it still looks like you just have documents that are too large for the amount of RAM you have, whether individually or in combination with other data in your program.

adrianeboyd added the resolved label on Dec 1, 2021

github-actions bot commented Dec 1, 2021

This issue has been automatically closed because it was answered and there was no follow-up discussion.

github-actions bot closed this as completed on Dec 1, 2021

github-actions bot commented Jan 1, 2022

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked as resolved and limited conversation to collaborators on Jan 1, 2022