ThreadPoolExecutor resource cleanup? #20

boboli · 2014-12-11T00:41:03Z

When using FuturesSession in a long-running web scraper script, I noticed a memory leak: I wasn't cleaning up the ThreadPoolExecutors created by the many FuturesSession(max_workers=blah) calls I was making.

I fixed the issue by writing a contextmanager that cleaned up my executor when exiting:

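from contextlib import contextmanager

from requests_futures.sessions import FuturesSession

@contextmanager
def clean_futures_session_when_done(session):
    try:
        yield session  # hand the session to the with block
    finally:
        # shutdown() blocks until pending futures complete
        if session.executor:
            session.executor.shutdown()

with clean_futures_session_when_done(FuturesSession(max_workers=2)) as session:
    do_stuff()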

This feels a bit slimy since I'm using the internal(?) self.executor reference. I also realize that the shutdown() will block until all Futures are done, but I feel this is acceptable for many use cases.
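
The blocking behaviour mentioned above can be seen with the standard library alone. This is only a minimal sketch: a plain ThreadPoolExecutor stands in for the one a FuturesSession holds, and slow_task is a hypothetical placeholder for a slow request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    # Hypothetical stand-in for a slow HTTP request.
    time.sleep(0.05)
    return n * 2


executor = ThreadPoolExecutor(max_workers=2)
futures = [executor.submit(slow_task, n) for n in range(4)]

# shutdown() defaults to wait=True, so this call returns only after
# every pending future has finished and the worker threads have exited.
executor.shutdown()

all_done = all(f.done() for f in futures)
results = [f.result() for f in futures]
```

Passing wait=False to shutdown() would return immediately instead, which may suit callers that cannot afford to block.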

An alternative I've considered is having FuturesSession implement the context manager protocol with __enter__() and __exit__() so we can directly use it in a with statement. This would be similar to how open() works:

class FuturesSessionWithCleanup(FuturesSession):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the worker threads; blocks until pending Futures finish.
        self.executor.shutdown()

with FuturesSessionWithCleanup(max_workers=2) as session:
    do_stuff()
# execution blocks here until all Futures are done

Does this sound reasonable?


ross · 2014-12-11T19:58:11Z

interesting. that's not a use-case i've come across myself so hadn't thought about addressing.

is there a specific reason you're creating a bunch of sessions rather than a single long-lived FuturesSession shared across the whole run? unless each one needs to be its own distinct session (cookies etc.), you would probably be better off creating a single FuturesSession with a larger max_workers and just letting it live for the life of the script.

i'm not completely opposed to FuturesSession implementing the context manager protocol, just want to make sure that it's needed first.

boboli · 2014-12-12T14:21:26Z

I think cleaning up the thread pool resources is akin to calling .close() on file objects when we're done with them. And open() follows the context manager pattern to give you a convenient wrapper that automatically calls the .close(), so that's where I got the idea from.

I agree that for my script it's better to use just a single FuturesSession, but I feel it's good practice to clean up resources regardless.

ross · 2014-12-15T15:21:15Z

feel free to pr that change to FuturesSession and ideally provide an example in the README. i assume the example should bind the session with "as" and use it.

with FuturesSession(max_workers=2) as session:
    session...

ideally there'd be some sort of unit testing of the functionality. perhaps there's a way to tell if the executor has been shut down correctly.

perpetual-hydrofoil · 2015-02-01T21:57:02Z

+1. Seems useful when chunking large numbers of requests (I'm doing 5000 per FuturesSession)

ross · 2015-02-02T16:02:05Z

happy to accept patches w/tests. otherwise i'll try and get to it in an upcoming weekend.

boboli · 2015-02-03T21:16:28Z

Heh I was dragging my feet on the PR because of the difficulty of writing a proper unit test. I've investigated the concurrent.futures module, and there are only 2 ways I can think of to determine if the executor has been shut down:

1. Inspect executor._shutdown, which is a private field on ThreadPoolExecutor (https://hg.python.org/cpython/file/3.2/Lib/concurrent/futures/thread.py#l125). Feels really icky to rely on private API.
2. Rely on the documented fact that a RuntimeError will be raised if we try to use the ThreadPoolExecutor again (https://docs.python.org/3.2/library/concurrent.futures.html#concurrent.futures.Executor.shutdown): "Calls to Executor.submit() and Executor.map() made after shutdown will raise RuntimeError."

Option 2 sounds slightly more proper but still icky in that it's not directly asserting what we intended, but a side effect.

Lemme know which option sounds better and I can try to do a PR with it.
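
For reference, option 2 could look roughly like the sketch below. It uses only the standard library; is_shut_down is a hypothetical helper name, not part of concurrent.futures or requests-futures.

```python
from concurrent.futures import ThreadPoolExecutor


def is_shut_down(executor):
    """Hypothetical helper: True if submit() is rejected with RuntimeError."""
    try:
        future = executor.submit(lambda: None)
    except RuntimeError:
        # Documented behaviour: submit() after shutdown raises RuntimeError.
        return True
    future.result()  # let the no-op probe task finish
    return False


executor = ThreadPoolExecutor(max_workers=1)
before = is_shut_down(executor)
executor.shutdown()
after = is_shut_down(executor)
```

Note that the probe has a side effect: on a live executor it really does submit a no-op task, so it asserts the consequence of shutdown rather than the shutdown itself.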

ross · 2015-02-03T21:26:27Z

another option might be to monkey patch executor.shutdown in the unit test and replace it with something that sets a flag and calls the original.

or slightly cleaner, inherit from FuturesSession and override __exit__ and set a flag that can be checked there.

definitely a tough thing to test, that it shut down as designed. i guess the most important part to test is that the object functions in the with context correctly. that it calls __exit__ is nice to test, but not critical.
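
a rough sketch of the monkey patch idea, under some assumptions: StandInSession is a hypothetical stand-in for a FuturesSession that already implements the context manager protocol, so the example runs with the standard library only. recording_shutdown and shutdown_calls are likewise made-up names.

```python
from concurrent.futures import ThreadPoolExecutor


class StandInSession:
    """Hypothetical stand-in for FuturesSession; only the executor matters."""

    def __init__(self, max_workers=2):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.executor.shutdown()


session = StandInSession()
shutdown_calls = []
original_shutdown = session.executor.shutdown


def recording_shutdown(*args, **kwargs):
    # Record the call, then delegate to the real shutdown().
    shutdown_calls.append((args, kwargs))
    return original_shutdown(*args, **kwargs)


# Patch the instance attribute so only this executor is affected.
session.executor.shutdown = recording_shutdown

with session as s:
    # The session must still be usable inside the with block.
    result = s.executor.submit(pow, 2, 5).result()
```

after the with block, shutdown_calls holds one entry, which shows that __exit__ ran and delegated to the executor without touching any private attribute.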

ross closed this as completed in da19a46 Feb 20, 2015

KostyaEsmukov mentioned this issue Feb 11, 2016

Upload a new version to the pypi #30

Closed
