
Avoid MemoryError on large queries #132

Closed
wants to merge 1 commit into from
Conversation

@tompko (Contributor) commented Apr 19, 2012

When receiving data from large queries we were running into a MemoryError. From investigating, sock_info.sock.recv returns a buffer of the requested size, which is then appended to the chunks list. Unfortunately we were only receiving a small number of bytes per iteration, so chunks was filling up with items of (approximately) the full requested length and quickly running out of memory. In total our query looked like it would try to allocate about 4 TB of memory.

I've rewritten the function to behave more like a previous version, which fixes the memory issue: each chunk's memory is freed once its data has been concatenated onto the message.
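
For illustration, here is a minimal sketch (not the actual patch) of the pattern described above, with placeholder names: the message is built up by concatenation so that each chunk buffer can be dropped as soon as its bytes have been copied in.

EMPTY = b""

def receive_message(sock, length):
    # `length` is the number of bytes still expected for this reply.
    message = EMPTY
    while length:
        chunk = sock.recv(length)       # may return far fewer than `length` bytes
        if chunk == EMPTY:
            raise IOError("connection closed")
        length -= len(chunk)
        message += chunk                # the chunk object can now be garbage collected
    return message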

@behackett (Member)

What Python interpreter and version are you using?

@tompko (Contributor, Author) commented Apr 19, 2012

Python 2.6.2 and Python 2.6.7 with the CPython interpreter.

@behackett (Member)

Can you open a ticket at jira.mongodb.org under the "python driver" project with reproduction steps and possibly an example document? I want to do a little more research on this issue before merging your pull request.

@tompko (Contributor, Author) commented Apr 19, 2012

Sure, I'll do it once I'm back in the office tomorrow.

@behackett (Member)

Thanks a lot. I'm going to do some more research in the meantime.

@behackett (Member)

Hi Chris, are you using a large batch_size setting? By default MongoDB will only return 4MB or 101 documents (whichever comes first) in a single batch. I'm not sure I understand how PyMongo could use TBs of memory in a query just by doing "".join(chunks) for each batch.
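
For reference, a rough sketch of how a caller can cap the batch size on a query (the database and collection names and the value 50 are just placeholders; Connection is the PyMongo 2.x-era client class):

from pymongo import Connection

db = Connection()["test_db"]                       # placeholder database name
cursor = db.my_collection.find().batch_size(50)    # ask the server for at most 50 docs per batch
for doc in cursor:
    pass                                           # each getmore fetches at most 50 documents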

@ajdavis (Member) commented Apr 23, 2012

@tompko I can't reproduce this behavior in my own tests. What OS are you on? Is your application multi-threaded, and if so roughly how many concurrent threads are running? At the bottom of Connection.__receive_data_on_socket, before the return statement, could you add

print len(message)

... and let me know what range the lengths fall in, in the scenario that caused the out-of-memory error?

@behackett (Member)

We haven't been able to reproduce this behavior. If you have a test case please open a ticket under the python project at jira.mongodb.org.

@behackett closed this May 21, 2012
@TomasB commented Sep 21, 2012

I'm using PyMongo 2.3 and had the same issue. This patch seems to fix it.

I added some length reporting after the exception is caught:

while length:
    try:
        chunk = sock_info.sock.recv(length)
    except:
        # recv was interrupted
        print str(length)
        print str(len(chunks))
        print str(len(chunk))
        self.__pool.discard_socket(sock_info)
        raise
    if chunk == EMPTY:
        raise ConnectionFailure("connection closed")
    length -= len(chunk)
    chunks.append(chunk)
return EMPTY.join(chunks)

print output:
568622   # length
895      # len(chunks)
512      # len(chunk)

The following fixes the issue:

from cStringIO import StringIO
.....
out = StringIO()
while length:
    try:
        chunk = sock_info.sock.recv(length)
    except:
        self.__pool.discard_socket(sock_info)
        raise
    if chunk == EMPTY:
        raise ConnectionFailure("connection closed")
    length -= len(chunk)
    out.write(chunk)
return out.getvalue()
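
As a side note, here is a minimal, self-contained sketch (not PyMongo code, all names made up) of the pattern used in the patch above: the reply is accumulated in a single cStringIO buffer, so each small chunk returned by recv() can be dropped as soon as it has been written out. FakeSocket stands in for sock_info.sock and returns at most 512 bytes per call, mirroring the chunk sizes in the debug output above.

from cStringIO import StringIO

class FakeSocket(object):
    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def recv(self, length):
        # Return at most 512 bytes, like the small reads reported above.
        chunk = self._payload[self._pos:self._pos + min(length, 512)]
        self._pos += len(chunk)
        return chunk

def receive(sock, length):
    out = StringIO()
    while length:
        chunk = sock.recv(length)
        if chunk == "":
            raise IOError("connection closed")
        length -= len(chunk)
        out.write(chunk)
    return out.getvalue()

data = "x" * 568622                       # same total size as the reply above
assert receive(FakeSocket(data), len(data)) == data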

@behackett (Member)

I still haven't been able to reproduce this, but it's been reported a few times. This change just reverts to our pre-2.2 behavior, so I'm going to merge it. Thanks for the patch and your patience with this issue.

@behackett closed this Nov 8, 2012