ThreadPoolExecutor resource cleanup? #20

Closed
boboli opened this issue Dec 11, 2014 · 7 comments

@boboli

boboli commented Dec 11, 2014

When using FuturesSession for a long-running web scraper script, I've noticed a memory leak due to the fact that I wasn't cleaning up the ThreadPoolExecutors that were created by the many FuturesSession(max_workers=blah) calls I was making.

I fixed the issue by writing a contextmanager that cleaned up my executor when exiting:

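from contextlib import contextmanager

from requests_futures.sessions import FuturesSession

@contextmanager
def clean_futures_session_when_done(session):
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions raised inside the with block.
        if session.executor:
            # shutdown() waits for pending futures by default (wait=True).
            session.executor.shutdown()

with clean_futures_session_when_done(FuturesSession(max_workers=2)):
    do_stuff()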

This feels a bit slimy since I'm using the internal(?) self.executor reference. I also realize that the shutdown() will block until all Futures are done, but I feel this is acceptable for many use cases.
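
To illustrate the blocking behavior mentioned above, here is a minimal sketch that uses only the standard library's ThreadPoolExecutor rather than FuturesSession. The helper slow_task is a made-up stand-in for a real request, and the sleep durations are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(seconds):
    """Hypothetical stand-in for a slow HTTP request."""
    time.sleep(seconds)
    return seconds


# Default shutdown(): wait=True, so the call returns only after the
# submitted task has finished.
executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(slow_task, 0.2)
executor.shutdown()
finished_after_shutdown = future.done()

# shutdown(wait=False) returns immediately; work that was already
# submitted still runs to completion in the background.
executor2 = ThreadPoolExecutor(max_workers=2)
future2 = executor2.submit(slow_task, 0.2)
executor2.shutdown(wait=False)
late_result = future2.result()  # blocks here instead, until the task is done
```

Passing wait=False may suit callers that cannot afford to block on exit, at the cost of worker threads lingering until their queued work is finished.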

An alternative I've considered is having FuturesSession implement the context manager protocol with __enter__() and __exit__() so we can directly use it in a with statement. This would be similar to how open() works:

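from requests_futures.sessions import FuturesSession

class FuturesSessionWithCleanup(FuturesSession):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Blocks until all pending futures are done, then frees the threads.
        self.executor.shutdown()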

with FuturesSessionWithCleanup(max_workers=2):
    do_stuff()
# block until all Futures are cleaned

Does this sound reasonable?

@ross
Owner

ross commented Dec 11, 2014

interesting. that's not a use-case i've come across myself so hadn't thought about addressing.

is there a specific reason you're creating a bunch of sessions and not a single long-lived FuturesSession to be shared across time? unless each one needs to be its own distinct session (cookies etc.) then you would probably be better off creating a single FuturesSession with a larger number of max_workers and just let it live for the life of the script.

i'm not completely opposed to FuturesSession implementing the context manager protocol, just want to make sure that it's needed first.

@boboli
Author

boboli commented Dec 12, 2014

I think cleaning up the thread pool resources is akin to calling .close() on file objects when we're done with them. And open() follows the context manager pattern to give you a convenient wrapper that automatically calls the .close(), so that's where I got the idea from.

I agree that for my script it's better to use just a single FuturesSession, but I feel it's good practice to clean up resources regardless.
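
As a point of comparison for the open()/close() analogy, the standard library's ThreadPoolExecutor can itself be used in a with statement, and leaving the block shuts the pool down and waits for pending work. The sketch below shows only that stdlib behavior; square is an illustrative helper and nothing here is FuturesSession code.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Illustrative task; any callable would do."""
    return x * x


# Leaving the with block calls executor.shutdown(wait=True), much like
# leaving "with open(...)" calls close() on the file.
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(square, n) for n in range(4)]

# All futures are complete once the block has exited.
results = [f.result() for f in futures]
```

A FuturesSession that implemented the same protocol could simply delegate to its executor in the same way.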

@ross
Owner

ross commented Dec 15, 2014

feel free to pr that change to FuturesSession and ideally provide an example in the README. i assume the example should catch and use the session.

with FuturesSession(max_workers=2) as session:
    session...

ideally there'd be some sort of unit testing of the functionality. perhaps there's a way to tell if the executor has been shut down correctly.

@perpetual-hydrofoil

+1. Seems useful when chunking large numbers of requests (I'm doing 5000 per FuturesSession)

@ross
Owner

ross commented Feb 2, 2015

happy to accept patches w/tests. otherwise i'll try and get to it in an upcoming weekend.

@boboli
Author

boboli commented Feb 3, 2015

Heh I was dragging my feet on the PR because of the difficulty of writing a proper unit test. I've investigated the concurrent.futures module, and there are only 2 ways I can think of to determine if the executor has been shut down:

  1. Inspect executor._shutdown which is a private field on ThreadPoolExecutor (https://hg.python.org/cpython/file/3.2/Lib/concurrent/futures/thread.py#l125). Feels really icky to rely on private API.
  2. Rely on the documented fact that a RuntimeError will be raised if we try to use the ThreadPoolExecutor again: (https://docs.python.org/3.2/library/concurrent.futures.html#concurrent.futures.Executor.shutdown): "Calls to Executor.submit() and Executor.map() made after shutdown will raise RuntimeError."

Option 2 sounds slightly more proper but still icky in that it's not directly asserting what we intended, but a side effect.

Lemme know which option sounds better and I can try to do a PR with it.
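
The two options could be sketched roughly as follows, against a bare ThreadPoolExecutor. The helper names is_shut_down_private and is_shut_down_public are made up for illustration, and option 1 assumes the private _shutdown attribute keeps its current CPython meaning.

```python
from concurrent.futures import ThreadPoolExecutor


def is_shut_down_private(executor):
    """Option 1: read the private flag (CPython implementation detail)."""
    return executor._shutdown


def is_shut_down_public(executor):
    """Option 2: rely on submit() raising RuntimeError after shutdown.

    Caveat: on a live executor this probe really does submit a no-op task.
    """
    try:
        executor.submit(lambda: None)
    except RuntimeError:
        return True
    return False


executor = ThreadPoolExecutor(max_workers=1)
before = (is_shut_down_private(executor), is_shut_down_public(executor))

executor.shutdown()
after = (is_shut_down_private(executor), is_shut_down_public(executor))
```

Option 2 stays within the documented API, but as noted it asserts a side effect and also has one of its own when the executor is still running.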

@ross
Owner

ross commented Feb 3, 2015

another option might be to monkey patch executor.shutdown in the unit test and replace it with something that sets a flag and calls the original.

or slightly cleaner, inherit from FuturesSession and override __exit__ and set a flag that can be checked there.

definitely a tough thing to test, that it shut down as designed. i guess the most important part to test is that the object functions in the with context correctly. that it calls __exit__ is nice to test, but not critical.
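
A rough sketch of both testing ideas follows. To stay self-contained it does not import requests-futures: StandInSession is a hypothetical stand-in that only assumes a session owns an executor attribute and shuts it down in __exit__, as proposed earlier in the thread. patch_shutdown and FlaggedSession are likewise illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


class StandInSession:
    """Hypothetical stand-in for a FuturesSession with context manager support."""

    def __init__(self, max_workers=2):
        self.executor = ThreadPoolExecutor(max_workers=max_workers)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.executor.shutdown()


def patch_shutdown(executor):
    """Idea 1: monkey patch shutdown to record calls, then call the original."""
    calls = []
    original = executor.shutdown

    def recording_shutdown(*args, **kwargs):
        calls.append((args, kwargs))
        return original(*args, **kwargs)

    executor.shutdown = recording_shutdown
    return calls


class FlaggedSession(StandInSession):
    """Idea 2: subclass, override __exit__, and set a flag to check later."""

    exited = False

    def __exit__(self, exc_type, exc_value, traceback):
        self.exited = True
        return super().__exit__(exc_type, exc_value, traceback)


with StandInSession() as session:
    shutdown_calls = patch_shutdown(session.executor)
    future = session.executor.submit(pow, 2, 10)

with FlaggedSession() as flagged:
    pass
```

After the first block, shutdown_calls holds one entry and the future has completed; after the second, flagged.exited is set. Either check confirms that the with statement reached the cleanup path without touching private executor state.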
