Memory consumption issues #18

Closed
mattseh opened this Issue Aug 5, 2012 · 7 comments


@stanim
stanim commented Aug 19, 2012

There seems to be a circular reference between corresponding response and request objects:

https://github.com/kennethreitz/requests/blob/develop/requests/models.py#L309

def _build_response(self, resp):
    (...)
    self.response = r
    self.response.request = self

As a quick fix, I propose resetting response.request, and probably also response.history, to None.

A better solution might be to refactor response and request classes to use weak references: http://docs.python.org/library/weakref.html
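
A minimal sketch of the weak-reference idea, assuming simplified stand-in classes (the `Request` and `Response` below are hypothetical and are not the real requests classes): the request holds the response strongly, while the response only holds a weak reference back.

```python
import weakref


class Request(object):
    """Hypothetical stand-in for a request object; not the real requests class."""

    def __init__(self, url):
        self.url = url
        self.response = None

    def build_response(self):
        # The request owns the response with a normal (strong) reference.
        self.response = Response(self)
        return self.response


class Response(object):
    """Hypothetical stand-in for a response object."""

    def __init__(self, request):
        # Store only a weak reference back to the request, so no cycle forms.
        self._request_ref = weakref.ref(request)

    @property
    def request(self):
        # Returns the request, or None once it has been garbage collected.
        return self._request_ref()


req = Request('http://example.com')
resp = req.build_response()
same_object = resp.request is req  # the back-reference still resolves here

del req  # drop the only strong reference to the request
```

One trade-off worth noting: once the caller drops the request, `resp.request` returns None, so code that keeps only the response can no longer reach the request through it.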

@mattseh
mattseh commented Aug 19, 2012

Thanks for the input, stanim. The solution you gave on SO still leads to constantly increasing memory usage. Did it work for you?

@kennethreitz
Owner

Circular references are cleaned up quite easily by Python itself, and are nothing to be concerned about.

@stanim
stanim commented Aug 21, 2012

@kennethreitz
You're right. Circular references only introduce memory leaks if an object in the cycle defines a custom __del__ method.
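
The point about cycles can be checked with a small experiment. This is only a sketch: `Node` is a hypothetical class, and a weak reference is used as a probe to see when the object is reclaimed. The __del__ caveat applies to the Python versions current at the time of this thread; since Python 3.4 (PEP 442) cycles containing objects with __del__ are collected as well.

```python
import gc
import weakref


class Node(object):
    """Hypothetical object used only to build a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other = b
    b.other = a  # a and b now reference each other
    # Return a weak reference so we can observe when `a` is reclaimed.
    return weakref.ref(a)


gc.disable()  # keep the automatic collector out of the way for the demo
try:
    probe = make_cycle()
    # Reference counting alone cannot free the cycle.
    alive_before = probe() is not None
    # An explicit pass of the cycle collector can.
    gc.collect()
    alive_after = probe() is not None
finally:
    gc.enable()
```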

@bcoughlan

Hi, I noticed very high memory usage, which my profiler attributes to grequests.imap(). Could this have anything to do with line 49, self.session = Session()? As there are no other references to self.session outside of AsyncRequest, a new Session is created for each request, which I think means that every AsyncRequest is managing a new connection pool to the server?

I don't fully understand how requests manages connection pools, but there is a close() method on Session that never gets called, which could be the source of the problem.

@reclosedev

a new Session is created for each request, which I think means that every AsyncRequest is managing a new connection pool to the server?

Right, but you can pass a session kwarg to get/post/etc. to make them reuse the same session:

import grequests
import requests

s = requests.Session()
reqs = (grequests.get(..., session=s) for ...)
...

Can you profile memory usage with a single session?

If the session is the cause of the leak, I think the solution will be to move session creation into send() and not store it as an attribute:

class AsyncRequest(object):
    def __init__(self, method, url, **kwargs):
        ...
        self.session = kwargs.pop('session', None)
        ...
    def send(self, **kwargs):
        ...
        session = self.session or Session()
        self.response = session.request(...)

That way session.close() would be called when send() exits, but only if the session was newly created there.
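
A runnable sketch of that proposal, under stated assumptions: `FakeSession` is a hypothetical offline stand-in for requests.Session (its `instances` list exists only so the behaviour can be observed), and the `owns_session` flag is an illustrative name, not part of grequests.

```python
class FakeSession(object):
    """Hypothetical stand-in for requests.Session, so the sketch runs offline."""

    instances = []  # records every session created, for inspection only

    def __init__(self):
        self.closed = False
        FakeSession.instances.append(self)

    def request(self, method, url, **kwargs):
        # Pretend to perform the request and return a response.
        return '%s %s' % (method, url)

    def close(self):
        self.closed = True


class AsyncRequest(object):
    def __init__(self, method, url, **kwargs):
        self.method = method
        self.url = url
        # Caller-supplied session, if any; it is never closed here.
        self.session = kwargs.pop('session', None)
        self.kwargs = kwargs
        self.response = None

    def send(self, **kwargs):
        merged = dict(self.kwargs, **kwargs)
        owns_session = self.session is None
        session = FakeSession() if owns_session else self.session
        try:
            self.response = session.request(self.method, self.url, **merged)
        finally:
            # Close only a session that was created inside this call.
            if owns_session:
                session.close()
        return self
```

With this shape a shared session passed by the caller stays open across requests, while a throwaway session never outlives the send() call that created it.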

@bcoughlan

I figured out the problem.

I am using the callback functions of requests, so I don't need the imap result. I converted it to a list to force evaluation, figuring Python would be smart enough not to store it, but memory usage exploded because the list holds on to the Response objects.

python -m memory_profiler test.py

Line #    Mem usage    Increment   Line Contents
================================================
     4                             @profile
     5    13.316 MB     0.000 MB   def test():
     6    13.316 MB     0.000 MB       r = (grequests.get('http://localhost:8002') for i in xrange(3000))
     7    93.062 MB    79.746 MB       list(grequests.imap( r, size=100))

When I iterate over the generator instead of building a list, memory usage is much lower:

Line #    Mem usage    Increment   Line Contents
================================================
     4                             @profile
     5    13.312 MB     0.000 MB   def test():
     6    13.312 MB     0.000 MB       r = (grequests.get('http://localhost:8002') for i in xrange(3000))
     7    22.426 MB     9.113 MB       for _ in grequests.imap( r, size=100):
     8                                     pass
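
The difference between the two profiles comes down to whether the consumed results are retained. A minimal offline sketch, assuming a hypothetical `fake_imap` generator in place of grequests.imap:

```python
from collections import deque


def fake_imap(n):
    """Hypothetical stand-in for grequests.imap: yields results one at a time."""
    for _ in range(n):
        yield bytearray(1024)  # pretend each response holds 1 KiB


# list() keeps every yielded object alive until the list itself is dropped.
kept = list(fake_imap(1000))

# A zero-length deque consumes the iterator but stores nothing,
# which has the same effect as `for _ in ...: pass`.
drained = deque(fake_imap(1000), maxlen=0)
```

Either the explicit for loop or the zero-length deque drains the iterator so callbacks still run, without accumulating the results.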

After reading more of the requests/urllib3 docs, I think the grequests code is fine and doesn't need to call close().
