
Avoid MemoryError on large queries #132

Closed
wants to merge 1 commit into from
Conversation

@tompko (Contributor) commented Apr 19, 2012

When receiving data from large queries we were running into a MemoryError. From investigating, sock_info.sock.recv returns a buffer of the requested size, which is then appended to the chunks list. Unfortunately we were only receiving a small number of bytes per iteration, so chunks was filling up with items of (approximately) the full requested length and quickly running out of memory. In total our query looked like it would try to allocate about 4 TB of memory.

I've rewritten the function to behave more like a previous version, which fixes the memory issue: each chunk's memory is freed once its data has been concatenated onto the message.
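
For illustration, here is a minimal sketch (not the actual patch) of the pattern described above, with placeholder names: the message is built up by concatenation so that each chunk buffer can be dropped as soon as its bytes have been copied in.

EMPTY = b""

def receive_message(sock, length):
    # `length` is the number of bytes still expected for this reply.
    message = EMPTY
    while length:
        chunk = sock.recv(length)       # may return far fewer than `length` bytes
        if chunk == EMPTY:
            raise IOError("connection closed")
        length -= len(chunk)
        message += chunk                # the chunk object can now be garbage collected
    return message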

@behackett (Member)

What Python interpreter and version are you using?

@tompko (Contributor, Author) commented Apr 19, 2012

Python 2.6.2 and Python 2.6.7 with the CPython interpreter.

@behackett (Member)

Can you open a ticket at jira.mongodb.org under the "python driver" project with reproduction steps and possibly an example document? I want to do a little more research on this issue before merging your pull request.

@tompko (Contributor, Author) commented Apr 19, 2012

Sure, I'll do it once I'm back in the office tomorrow.

@behackett (Member)

Thanks a lot. I'm going to do some more research in the meantime.

@behackett (Member)

Hi Chris, are you using a large batch_size setting? By default MongoDB will only return 4MB or 101 documents (whichever comes first) in a single batch. I'm not sure I understand how PyMongo could use TBs of memory in a query just by doing "".join(chunks) for each batch.
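
For reference, a rough sketch of how a caller can cap the batch size on a query (the database and collection names and the value 50 are just placeholders; Connection is the PyMongo 2.x-era client class):

from pymongo import Connection

db = Connection()["test_db"]                       # placeholder database name
cursor = db.my_collection.find().batch_size(50)    # ask the server for at most 50 docs per batch
for doc in cursor:
    pass                                           # each getmore fetches at most 50 documents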

@ajdavis (Member) commented Apr 23, 2012

@tompko I can't reproduce this behavior in my own tests. What OS are you on? Is your application multi-threaded, and if so roughly how many concurrent threads are running? At the bottom of Connection.__receive_data_on_socket, before the return statement, could you add

print len(message)

... and let me know what range the lengths fall in, in the scenario that caused the out-of-memory error?

@behackett (Member)

We haven't been able to reproduce this behavior. If you have a test case please open a ticket under the python project at jira.mongodb.org.

@behackett closed this May 21, 2012
@TomasB commented Sep 21, 2012

I'm using PyMongo 2.3 and had the same issue. This patch seems to fix it.

I added some length reporting after the exception is caught:

while length:
    try:
        chunk = sock_info.sock.recv(length)
    except:
        # recv was interrupted
        print str(length)
        print str(len(chunks))
        print str(len(chunk))
        self.__pool.discard_socket(sock_info)
        raise
    if chunk == EMPTY:
        raise ConnectionFailure("connection closed")
    length -= len(chunk)
    chunks.append(chunk)
return EMPTY.join(chunks)

print output:
568622   # length
895      # len(chunks)
512      # len(chunk)

The following fixes the issue:

from cStringIO import StringIO
.....
out = StringIO()
while length:
    try:
        chunk = sock_info.sock.recv(length)
    except:
        self.__pool.discard_socket(sock_info)
        raise
    if chunk == EMPTY:
        raise ConnectionFailure("connection closed")
    length -= len(chunk)
    out.write(chunk)
return out.getvalue()
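
As a side note, here is a minimal, self-contained sketch (not PyMongo code, all names made up) of the pattern used in the patch above: the reply is accumulated in a single cStringIO buffer, so each small chunk returned by recv() can be dropped as soon as it has been written out. FakeSocket stands in for sock_info.sock and returns at most 512 bytes per call, mirroring the chunk sizes in the debug output above.

from cStringIO import StringIO

class FakeSocket(object):
    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def recv(self, length):
        # Return at most 512 bytes, like the small reads reported above.
        chunk = self._payload[self._pos:self._pos + min(length, 512)]
        self._pos += len(chunk)
        return chunk

def receive(sock, length):
    out = StringIO()
    while length:
        chunk = sock.recv(length)
        if chunk == "":
            raise IOError("connection closed")
        length -= len(chunk)
        out.write(chunk)
    return out.getvalue()

data = "x" * 568622                       # same total size as the reply above
assert receive(FakeSocket(data), len(data)) == data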

@behackett (Member)

I still haven't been able to reproduce this, but it's been reported a few times. This change just reverts to our pre-2.2 behavior, so I'm going to merge it. Thanks for the patch and your patience with this issue.

@behackett closed this Nov 8, 2012