bpo-32309: Implement asyncio.ThreadPool #18410

Conversation

aeros
Contributor

@aeros aeros commented Feb 8, 2020

Implements asyncio.ThreadPool, a high-level asynchronous context manager for concurrently running IO-bound functions without blocking the event loop. The initial implementation relies significantly upon loop.run_in_executor(), but the long-term goal is to eventually use a fully native asyncio threadpool (which doesn't rely on concurrent.futures.ThreadPoolExecutor). See the bpo issue for more details.

https://bugs.python.org/issue32309

Note: Disregard the name of the branch. It has all of the intended changes, but it seems that I accidentally added the changes on top of the branch from my recently merged PR (#18057). This has no consequence other than a misleading branch name, so I'll leave it as is.

self.loop.run_until_complete(pool_cm())


class ThreadPoolTests(unittest.TestCase):
Member

Could the tests benefit from IsolatedAsyncioTestCase? I guess it would remove run_until_complete calls along with using async def main. Doc: https://docs.python.org/3/library/unittest.html#unittest.IsolatedAsyncioTestCase
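
For reference, a minimal sketch of the two test styles being compared. `blocking_add` and both test classes are hypothetical and are not taken from this PR's test suite; since asyncio.ThreadPool is not available in a released interpreter, loop.run_in_executor() stands in for it here.

```python
import asyncio
import unittest


def blocking_add(a, b):
    # Stand-in for a blocking function (hypothetical).
    return a + b


class LoopManagedTests(unittest.TestCase):
    # Existing style: the test owns the loop and drives it manually.
    def setUp(self):
        self.loop = asyncio.new_event_loop()
        self.addCleanup(self.loop.close)

    def test_add(self):
        async def main():
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(None, blocking_add, 1, 2)

        self.assertEqual(self.loop.run_until_complete(main()), 3)


class IsolatedTests(unittest.IsolatedAsyncioTestCase):
    # Suggested style: the test method itself is a coroutine, so there is
    # no inner main() wrapper and no run_until_complete() call.
    async def test_add(self):
        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(None, blocking_add, 1, 2)
        self.assertEqual(result, 3)
```

IsolatedAsyncioTestCase (Python 3.8+) creates and tears down a fresh event loop per test, which is where the boilerplate reduction comes from.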

Contributor Author

Yeah, that seems like it would be quite useful for reducing a decent chunk of the boilerplate. I'll test it locally when I get the chance later, assuming @1st1 and @asvetlov are on board with using IsolatedAsyncioTestCase for these tests.

Contributor Author

Based on my local experimentation so far, it seems to be working as intended and does provide a bit of a readability benefit in terms of reducing white noise. Just waiting on a +1 from @1st1 since this would be the first CPython regression test to use IsolatedAsyncioTestCase (outside of unittest of course).

Contributor Author

Update: I'd like to consider using IsolatedAsyncioTestCase in a separate PR, after the initial version is implemented since we're a bit short on time with the upcoming feature deadline for 3.9 beta.

Member

@pitrou pitrou left a comment

I'll be interested to see the non-wasteful native implementation when you'll get to it :-)

@aeros
Contributor Author

aeros commented Feb 27, 2020

I'll be interested to see the non-wasteful native implementation when you'll get to it :-)

I'll definitely let you know when there's a working native implementation then! That's probably the part that I'm most looking forward to, but I do think it's important to have an initial stable version to work with. :)

In the meantime, this has the benefit of being more convenient to use than loop.run_in_executor(), in addition to being self-contained (instead of associated with the lifespan of the event loop).
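
To make the comparison concrete, here is roughly what the existing low-level approach looks like when the executor is managed separately from the event loop's default one. `read_config` is a hypothetical blocking function; everything else is existing stdlib API. This is a sketch of the status quo, not of the API proposed in this PR.

```python
import asyncio
import concurrent.futures
import time


def read_config(name):
    # Hypothetical blocking call (stands in for file or network IO).
    time.sleep(0.01)
    return f"{name}: ok"


async def main():
    loop = asyncio.get_running_loop()
    # The caller has to fetch the loop, create the executor, and make sure
    # it gets shut down; the with-block handles the shutdown here.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = [
            loop.run_in_executor(executor, read_config, name)
            for name in ("a", "b", "c")
        ]
        # gather() returns results in the order the awaitables were passed.
        return await asyncio.gather(*futures)


results = asyncio.run(main())
```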

Use abc.ABC and abc.abstractmethod to prevent direct instantiation
of AbstractPool.
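
A minimal sketch of that pattern. The method names below are placeholders chosen for illustration and are not necessarily the ones AbstractPool defines in this PR; `DummyPool` is hypothetical.

```python
import abc


class AbstractPool(abc.ABC):
    """Sketch of an abstract base; method names are illustrative only."""

    @abc.abstractmethod
    async def astart(self):
        ...

    @abc.abstractmethod
    async def aclose(self):
        ...


class DummyPool(AbstractPool):
    # A concrete subclass must override every abstract method.
    async def astart(self):
        return "started"

    async def aclose(self):
        return "closed"


error = None
try:
    AbstractPool()  # abstract methods remain, so this raises TypeError
except TypeError as exc:
    error = exc

pool = DummyPool()  # all abstract methods overridden, so this works
```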
@tmewett

tmewett commented Apr 30, 2020

First, thanks a lot for this PR! I was surprised when I found run_in_executor was part of only the "low-level" API.

If I may make a suggestion, I would say that the use of "IO-bound" throughout the documentation here is not particularly appropriate; long-running, CPU-bound routines will also block the event loop and so are also well-suited to running in a separate thread. Perhaps "blocking" is a more suitable general term?

[Disclaimer: not familiar with any Python style, this is my opinion] More broadly, I think from a style POV, the documentation should avoid prescribing a use case for this feature, and instead describe its general functionality with reference to common use cases for it. E.g. the docs for ThreadPool say

An asynchronous threadpool that provides methods to concurrently run IO-bound functions, without blocking the event loop.

Even changed to "blocking," this still seems too prescriptive IMO. Maybe there are other reasons users would want to await threads? I don't know.

@tmewett

tmewett commented Apr 30, 2020

As soon as I commented that, I remembered the global interpreter lock. I have no idea how that interacts with async code or executors. Please ignore my comment where it is inaccurate!

@aeros
Contributor Author

aeros commented Apr 30, 2020

@tmewett

First, thanks a lot for this PR! I was surprised when I found run_in_executor was part of only the "low-level" API.

No problem. :-)

The distinction between "high-level" and "low-level" can be a bit subjective, but in our case the goal is for users to be able to accomplish most things in asyncio with a "high-level" API that doesn't require direct interaction with the event loop. It adds some unnecessary boilerplate and complexity that can be easily misused (especially with regards to the cleanup process and sharing objects between event loops).

The "low-level" API will remain in place for asyncio-based libraries, but eventually users should not have to use it much at all. It may take some time though for the ecosystem to adopt the model, and some parts are still a work in progress.

If you're specifically curious about which parts of asyncio are defined as high-level vs low-level, we have a couple of dedicated sections in the documentation:

high-level index
low-level index

If I may make a suggestion, I would say that the use of "IO-bound" throughout the documentation here is not particularly appropriate; long-running, CPU-bound routines will also block the event loop and so are also well-suited to running in a separate thread.

As soon as I commented that, I remembered the global interpreter lock. I have no idea how that interacts with async code or executors. Please ignore my comment where it is inaccurate!

Due to the GIL, a thread pool would not be especially suitable for CPU-bound operations; it would effectively block the event loop. Instead, in order to avoid blocking the event loop, one would have to use a process pool in order to circumvent the GIL, since each process has its own separate GIL (similar to how you could pass an instance of concurrent.futures.ProcessPoolExecutor to run_in_executor).
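
As a rough illustration of the process-pool route mentioned above, using only existing stdlib API. `count_primes` is a hypothetical CPU-bound function, and the workload size is arbitrary.

```python
import asyncio
import concurrent.futures


def count_primes(limit):
    # Pure-Python CPU-bound work: holds the GIL for its whole duration.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


async def main():
    loop = asyncio.get_running_loop()
    # Each worker process has its own interpreter and its own GIL, so the
    # event loop in this process stays responsive while the work runs.
    with concurrent.futures.ProcessPoolExecutor() as pool:
        return await loop.run_in_executor(pool, count_primes, 50_000)
```

To try it, call `asyncio.run(main())` under an `if __name__ == "__main__":` guard in a script, since worker processes may re-import the main module depending on the platform's start method.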

In the bpo issue, I mentioned that we might want to eventually consider an asyncio.ProcessPool. However, asyncio.ThreadPool has a stronger specific use case for asyncio, and a primary goal is to eventually build an entirely asyncio-native version with the same API, instead of relying upon concurrent.futures (which adds some overhead). It could be possible to do this for a process pool as well, but a thread pool would be significantly simpler to start with.

@tmewett

tmewett commented May 4, 2020

Of course, the high- / low-level distinction seems a good idea to me, no qualms about that.

I suppose it's still possible an external library routine could do some number crunching with the GIL unlocked. All I was saying was that I think we should slightly re-word the docs so as not to prescribe the thread pool for just IO-bound things. I can suggest some changes so you can see what I mean?

@aeros
Contributor Author

aeros commented May 5, 2020

All I was saying was that I think we should slightly re-word the docs so as not to prescribe the thread pool for just IO-bound things. I can suggest some changes so you can see what I mean?

The main reason why the current documentation for asyncio.ThreadPool is focused around IO-bound operations is because that's the current primary use case for thread pools in asyncio, and I want to clearly communicate that to users in a way that illuminates the real-world use cases as effectively as possible.

It also nicely paves the way for asyncio.ProcessPool in the future: for IO-bound operations, use asyncio.ThreadPool; for CPU-bound use asyncio.ProcessPool.

That being said, I would be glad to consider any changes to the wording in the docs to make it a bit less prescriptive, as long as it's still clear what the primary use cases are. To some degree, I think the examples can help communicate that, but I want to at least mention it in the docs. It's a fine balance between making the use cases obvious enough while not limiting its usefulness in other areas; I can see that.

@tmewett

tmewett commented May 6, 2020

I completely agree with your views on the use case. I think all I mean is that, technically the thread pool's typical use for only IO-bound work is a CPython implementation detail. It's a point of consistency: just like the 'threading' module discusses threads in general, and mentions the GIL as an implementation detail, I think these docs should too.

E.g. the ThreadPool docs could change from

   An asynchronous threadpool that provides methods to concurrently
   run IO-bound functions, without blocking the event loop.

to something like

   An asynchronous threadpool that provides methods to concurrently
   run functions in separate threads, preventing the event loop from being
   blocked.

   .. impl-detail::

      Due to the Global Interpreter Lock, only one thread can execute Python
      code at a time. This makes ThreadPool mainly suitable for I/O-bound
      functions or third-party library functions which release the GIL.

If you see what I mean? [I will leave the formatting decisions to you :)]

I will also leave some other thoughts now, let me know if they are not welcome.

@aeros
Contributor Author

aeros commented May 6, 2020

If you see what I mean? [I will leave the formatting decisions to you :)]

Yeah, I quite like the overall idea of the suggestion. In general, it's a good goal for the documentation to be less specific to CPython as much as reasonably possible, without sacrificing readability and understanding for the majority of the audience.

I'll have to go over the specific wording of it, but I like the separation of a more general Python language definition and a CPython implementation specific part that mentions the main use case in IO-bound operations. I'll likely include some links to other parts of the documentation when mentioning the GIL, so that readers can have somewhere to go for more information without going off-topic.

Thanks for the detailed feedback and suggestions.

@1st1 As the current primary author and maintainer of the asyncio docs, would something along the lines of what @tmewett suggested be appropriate, specifically the separation between language definition and CPython details? Although I'm authoring this specific section and +1 on the idea, I'd like to ensure it fits well with the overall theme of the existing documentation.

Co-authored-by: Tom M <tom@collider.in>
@njsmith
Contributor

njsmith commented May 14, 2020

Yes... I was active in that bpo. Was there something I missed?

No, I meant, I wasn't in the discussion and I tried to skim it but didn't end up with a clear picture of where things ended up.

I wanted to know if you have something like this in Trio.

Yeah. The closest equivalent is trio.to_thread.run_sync. The reason for the to_thread submodule is to allow for extensions like async for obj in trio.to_thread.iter(blocking_iter): ..., and for parallelism with the trio.from_thread.* functions. The from_thread functions are used to call async code from inside a thread, and there's some magic to make it so if you're in a thread spawned by trio.to_thread, then trio.from_thread can automatically find the originating event loop, and propagate cancellation.

Probably the most unusual design choice is that we don't have any explicit "thread pool" concept. Classically, thread pools are solving two separate problems at once: (1) reusing threads to amortize startup costs (i.e. basically acting like a cache for thread objects), (2) putting a policy on the total number of threads, to avoid overwhelming the system. We break these responsibilities apart, so there's a "thread cache" that amortizes startup + trio.to_thread.run_sync takes some kind of semaphore to define policy (defaulting to a global default semaphore).

This is nice because:

  • using a global semaphore by default means that the limit is applied globally, so if you have lots of independent modules submitting small bounded jobs (getaddrinfo calls, etc.), then they all share the same limit. If each module creates its own thread pool, then your actual thread limit ends up being (intended thread limit * number of modules).

  • if you need a custom thread limit policy (e.g. because you have long-running thread jobs and are worried that a global limit will cause deadlocks, or just because you want something fancy like a per-user limit), then you can write that easily using regular Trio tools, without having to write scary multi-thread synchronization code

  • All modules share the same thread cache, even if they're using different limiting policies

There's more detail here:

https://trio.readthedocs.io/en/stable/reference-core.html#threads-if-you-must

(Some limitations of the current implementation: (a) our thread cache is currently trivial and doesn't actually re-use threads; it turns out that re-using threads is actually a pretty minor optimization, but we'll implement this eventually. See python-trio/trio#6. (b) to_thread and from_thread don't have wrappers for context managers or iterators yet. (c) we don't actually propagate cancellation through to_thread/from_thread yet. But we'll get there :-))

@1st1
Member

1st1 commented May 15, 2020

Oh my, I actually like Trio design a lot. Huge thanks, Nathaniel, for sharing this.

@asvetlov @cjerdonek @elprans I'm interested to hear your opinion.

I think that a simple ThreadPool implementation is infinitely better than what we have now with loop.run_in_executor, and given that asyncio already has a bunch of low-level APIs, having this one in 3.9 wouldn't hurt. For some use cases, having fine-grained control over the size of the pool can be preferred.

What I like a lot about Trio's design is from_thread -- this is typically painful and requires non-trivial setup with the current run_in_executor and the proposed new asyncio.ThreadPool. Global threads autoscaling also makes sense, because picking the right pool size is hard and I bet 99% of the code either reserves too many or too few threads.
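
For context, this is a sketch of the kind of setup currently needed in asyncio to call async code from a worker thread, using the existing asyncio.run_coroutine_threadsafe(). `fetch_value` and `worker` are hypothetical names.

```python
import asyncio


async def fetch_value():
    # Coroutine that must run on the event loop (hypothetical).
    await asyncio.sleep(0.01)
    return 42


def worker(loop):
    # Runs in a worker thread. The loop has to be passed in explicitly;
    # nothing finds the originating loop automatically.
    future = asyncio.run_coroutine_threadsafe(fetch_value(), loop)
    # Blocks this worker thread (not the loop) until the coroutine is done.
    return future.result(timeout=5)


async def main():
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, worker, loop)


result = asyncio.run(main())
```

The explicit loop hand-off and the concurrent.futures.Future round trip are the parts that a from_thread-style API would hide.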

Should we go ahead with this PR and consider implementing an API similar to Trio's in 3.10?

@aeros
Contributor Author

aeros commented May 15, 2020

Global threads autoscaling also makes sense, because picking the right pool size is hard and I bet 99% of the code either reserves too many or too few threads.

For the eventual full native implementation of asyncio.ThreadPool, we may even want to consider providing an option for autoscaling the number of threads used as a constructor parameter. I've found that spawning threads on-demand (up to max_workers) is quite appealing for ThreadPoolExecutor and suits most general use cases, but I can imagine some where the user may desire more fine-grained control and want all of the threads to be spawned immediately.

That's still significantly different from the global autoscaling w/ threads in Trio, but I think that autoscaling in general is something we may want to consider. In the bpo issue, I believe we were considering that the native implementation of asyncio.ThreadPool would spawn all of the threads up to concurrency immediately.

@njsmith
Contributor

njsmith commented May 15, 2020

Thread spawning is very cheap. Pulling a pre-existing thread out of a cache is even cheaper, but the difference is like 100 µs versus 30 µs or something. It's still probably worth having a cache, but it's only important for code that's pushing lots and lots of cheap operations into threads (e.g. heavy disk I/O where most data is in the cache, but occasionally something isn't, so you have to use a thread just-in-case). And in that regime, your cache is always warm, so pre-spawning doesn't really make a difference.
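
A rough way to measure this on a given machine, as a sketch. The function names are hypothetical, a one-worker ThreadPoolExecutor stands in for a "warm cache", and the absolute numbers will vary with OS, hardware, and Python version, so no expected figures are given.

```python
import concurrent.futures
import threading
import time


def noop():
    pass


def time_fresh_threads(n):
    # Spawn and join a brand-new thread for every call.
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=noop)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n


def time_warm_pool(n):
    # Reuse a single already-started worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(noop).result()  # warm up: forces the worker to start
        start = time.perf_counter()
        for _ in range(n):
            pool.submit(noop).result()
        return (time.perf_counter() - start) / n


fresh = time_fresh_threads(200)
warm = time_warm_pool(200)
print(f"fresh thread: {fresh * 1e6:.1f} us/call")
print(f"warm pool:    {warm * 1e6:.1f} us/call")
```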

@cjerdonek
Member

@asvetlov @cjerdonek @elprans I'm interested to hear your opinion.

@1st1 I think the dual approach you described sounds great. Good work on this, @aeros. And as usual, always good to hear what @njsmith is thinking.

@methane
Member

methane commented May 15, 2020

I love trio's design. It is very common for there to be only one or two blocking operations.
Having thread support before a thread pool makes a lot of sense.

I am worried that the current asyncio.ThreadPool design is misleading for some users.
Users may use the pool only for running one blocking task:

async def called_very_often():
    async with ThreadPool(...) as pool:
        await pool.run(blocking_work, args)

Creating and shutting down a thread pool is more expensive than creating a single thread.

@tmewett

tmewett commented May 15, 2020

I love trio's design. It is very common for there to be only one or two blocking operations.
Having thread support before a thread pool makes a lot of sense.

I am worried that the current asyncio.ThreadPool design is misleading for some users.
Users may use the pool only for running one blocking task:

I thought this too - having threads only available inside the context of a managed pool doesn't seem consistent with Python's other APIs. Indeed, when I started learning asyncio I saw that you could await individual subprocesses, and naturally started looking for how I could await threads. So I think a more direct replacement of run_in_executor, for one function inside (effectively) a one-off thread, would be a great eventual feature. But I also see no reason it should delay this one.

@1st1
Member

1st1 commented May 15, 2020

I am worried that the current asyncio.ThreadPool design is misleading for some users.
Users may use the pool only for running one blocking task:

Yeah.

@aeros @cjerdonek @methane Maybe we should add asyncio.run_in_thread() function in 3.9:

async def run_in_thread(func, /, *args, **kwargs):
  loop = asyncio.get_running_loop()
  return await loop.run_in_executor(None, func, *args, **kwargs)

And for now we can get away without implementing threads auto scaling etc. As soon as this is merged we can start working on auto scaling for 3.10?

I have a feeling that auto-scalable and simple run_in_thread is what the majority of asyncio users actually need (including myself). The proposed (by me initially) asyncio.ThreadPool can live on pypi. Thoughts?

@njsmith
Contributor

njsmith commented May 15, 2020

Note that Trio's version was originally called run_sync_in_thread, but we had users confused about whether in_thread meant that this is a function that switches in to a thread, or a function to use when you're already in a thread.

If you want to read the bikeshedding before we settled on to_thread/from_thread, that's here: python-trio/trio#810

@aeros
Contributor Author

aeros commented May 16, 2020

@1st1

And for now we can get away without implementing threads auto scaling etc. As soon as this is merged we can start working on auto scaling for 3.10?

In order to implement auto-scaling eventually (or any significant differences from ThreadPoolExecutor for that matter), we are very likely going to need an asyncio ThreadPool of some form, whether it's public or under the hood. So while I do very much like the API of a run_in_thread(), if we want it to have any of those interesting features and no longer be dependent on the conversion to concurrent.futures.Future (within run_in_executor()), an asyncio.ThreadPool seems necessary to some degree, even if most users end up using run_in_thread(). The two are by no means mutually exclusive, so could we possibly consider including both in 3.9?

The current implementation in this PR does not provide that, but it does give us a baseline to work with for testing purposes as we build one that's fully native to asyncio. My primary goal has been to have this completed by 3.10.

Regarding Inada's point of it potentially being misunderstood and used in a way that causes the thread pool to be repeatedly created, I think that's certainly worth addressing. IMO, it would be best clarified with a note in the documentation, perhaps with a link to run_in_thread() if that's added.

Also, regarding the proposed implementation above for run_in_thread(), run_in_executor() doesn't accept kwargs, so it requires functools.partial(). But that's a very minor change:

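import asyncio
import functools

async def run_in_thread(func, /, *args, **kwargs):
  loop = asyncio.get_running_loop()
  # run_in_executor() only forwards positional args, so bind kwargs here.
  func_call = functools.partial(func, *args, **kwargs)
  return await loop.run_in_executor(None, func_call)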
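
A small sketch of why the functools.partial() step matters, assuming the default asyncio event loop. `greet` is a hypothetical function with a keyword-only argument; `run_in_thread` repeats the helper above so the snippet is self-contained.

```python
import asyncio
import functools


def greet(name, *, punctuation="!"):
    # Hypothetical blocking function with a keyword-only argument.
    return f"hello {name}{punctuation}"


async def run_in_thread(func, /, *args, **kwargs):
    loop = asyncio.get_running_loop()
    func_call = functools.partial(func, *args, **kwargs)
    return await loop.run_in_executor(None, func_call)


async def main():
    loop = asyncio.get_running_loop()
    try:
        # run_in_executor(executor, func, *args) has no **kwargs parameter.
        await loop.run_in_executor(None, greet, "world", punctuation="?")
    except TypeError:
        direct_call_failed = True
    else:
        direct_call_failed = False
    greeting = await run_in_thread(greet, "world", punctuation="?")
    return direct_call_failed, greeting


direct_call_failed, greeting = asyncio.run(main())
```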

@1st1
Member

1st1 commented May 16, 2020

In order to implement auto-scaling eventually (or any significant differences from ThreadPoolExecutor for that matter), we are very likely going to need an asyncio ThreadPool of some form, whether it's public or under the hood. So while I do very much like the API of a run_in_thread(), if we want it to have any of those interesting features and no longer be dependent on the conversion to concurrent.futures.Future (within run_in_executor()), an asyncio.ThreadPool seems necessary to some degree, even if most users end up using run_in_thread(). The two are by no means mutually exclusive, so could we possibly consider including both in 3.9?

asyncio almost has too many APIs now, so I'd prefer adding as few new APIs as possible.

The current implementation in this PR does not provide that, but it does give us a baseline to work with for testing purposes as we build one that's fully native to asyncio. My primary goal has been to have this completed by 3.10.

The implementation of auto scaling will be drastically different from the current code.


Right now I'm inclined to include asyncio.to_thread() in 3.9 and start working on auto-scaling asap. Kyle, I hope you can work on that. I hope you won't be too discouraged to continue working on this if we decide to scrap this PR :(

@aeros
Contributor Author

aeros commented May 16, 2020

@1st1

asyncio almost has too many APIs now, so I'd prefer adding as few new APIs as possible.

Yeah, I can certainly understand that. If we conclude that the current implementation of asyncio.ThreadPool will end up being too confusing and not provide as much practical benefit for users compared to a different approach, I'm content with closing the PR.

Right now I'm inclined to include asyncio.to_thread() in 3.9 and start working on auto-scaling asap. Kyle, I hope you can work on that.

I would certainly be interested in working on to_thread()/run_in_thread(). Although, I am somewhat concerned about the remaining time to engage in full discussion over the best name, considering that Monday (two days from now) is the cutoff for new features in 3.9. The issue linked by @njsmith certainly helps, but this seems important enough to warrant a more in-depth discussion on somewhere like discuss.python.org, in the Async-SIG category.

I hope you won't be too discouraged to continue working on this if we decide to scrap this PR :(

I'll admit that it would be somewhat disappointing with it being one of my most involved contributions so far, but I would definitely understand and not take it personally. Regardless of the eventual outcome of this PR, this is something that I very much want to continue working on. :-)

@1st1
Member

1st1 commented May 16, 2020

I would certainly be interested in working on to_thread()/run_in_thread(). Although, I am somewhat concerned about the remaining time to engage in full discussion over the best name, considering that Monday (two days from now) is the cutoff for new features in 3.9. The issue linked by @njsmith certainly helps, but this seems important enough to warrant a more in-depth discussion on somewhere like discuss.python.org, in the Async-SIG category.

Yeah, that's my usual problem of pushing something too close to the deadline.

In this case, the asyncio.ThreadPool api we're proposing here is simply way bigger than the alternative asyncio.to_thread() function that relies on run_in_executor(). And judging by my own experience and actual use cases in asyncio code base, using to_thread would be more straightforward.

Kyle, can you open an alternative PR adding asyncio.to_thread() function?

I'll admit that it would be somewhat disappointing with it being one of my most involved contributions so far, but I would definitely understand and not take it personally. Regardless of the eventual outcome of this PR, this is something that I very much want to continue working on. :-)

I understand. Sorry. OTOH, implementing auto scaling & efficient thread management is more challenging and ultimately will be way more rewarding.

@aeros
Contributor Author

aeros commented May 16, 2020

@1st1

Kyle, can you open an alternative PR adding asyncio.to_thread() function?

Sure, I'll work on that and close this PR. But I do have a question regarding the location:

At a glance, this seems like the function would be best placed in asyncio/tasks.py and documented in asyncio-task.rst, in its own separate sub-section just above run_coroutine_threadsafe(). Does this seem reasonable?

@aeros aeros closed this May 16, 2020
@1st1
Member

1st1 commented May 16, 2020

At a glance, this seems like the function would be best placed in asyncio/tasks.py and documented in asyncio-task.rst, in its own separate sub-section just above run_coroutine_threadsafe(). Does this seem reasonable?

It's fine; or you can add a new internal asyncio/threads.py module (as we will have more stuff there later).

@1st1
Member

1st1 commented May 16, 2020

Lastly, we also need to look at all active deprecation warnings and act on them before 3.9. Do you have time for that or should I work on that?

@aeros
Contributor Author

aeros commented May 16, 2020

Lastly, we also need to look at all active deprecation warnings and act on them before 3.9. Do you have time for that or should I work on that?

I'm going to be otherwise occupied tomorrow w/ college work, but I should have some time on Monday to help with that (assuming Monday isn't too late). Perhaps we could open a separate issue to group all of the deprecations that need to be transitioned into removals for 3.9.

miss-islington pushed a commit that referenced this pull request May 19, 2020
Implements `asyncio.to_thread`, a coroutine for asynchronously running IO-bound functions in a separate thread without blocking the event loop. See the discussion starting from [here](#18410 (comment)) in GH-18410 for context.

Automerge-Triggered-By: @aeros
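
For readers landing here from that commit, a minimal usage sketch of asyncio.to_thread() as it exists in Python 3.9+. `blocking_io` is a hypothetical function, and the delays are arbitrary.

```python
import asyncio
import time


def blocking_io(delay, label):
    # Hypothetical blocking call; time.sleep() releases the GIL.
    time.sleep(delay)
    return label


async def main():
    start = time.perf_counter()
    # Both calls run in worker threads, so their sleeps can overlap
    # instead of blocking the event loop one after the other.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.2, "first"),
        asyncio.to_thread(blocking_io, 0.2, "second"),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
```

Unlike loop.run_in_executor(), to_thread() accepts keyword arguments directly and does not require fetching the running loop first.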
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 19, 2020
Implements `asyncio.to_thread`, a coroutine for asynchronously running IO-bound functions in a separate thread without blocking the event loop. See the discussion starting from [here](https://github.com/python/cpython/pull/18410#issuecomment-628930973) in pythonGH-18410 for context.

Automerge-Triggered-By: @aeros
(cherry picked from commit cc2bbc2)

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
miss-islington added a commit that referenced this pull request May 19, 2020
Implements `asyncio.to_thread`, a coroutine for asynchronously running IO-bound functions in a separate thread without blocking the event loop. See the discussion starting from [here](https://github.com/python/cpython/pull/18410#issuecomment-628930973) in GH-18410 for context.

Automerge-Triggered-By: @aeros
(cherry picked from commit cc2bbc2)

Co-authored-by: Kyle Stanley <aeros167@gmail.com>
arturoescaip pushed a commit to arturoescaip/cpython that referenced this pull request May 24, 2020
Implements `asyncio.to_thread`, a coroutine for asynchronously running IO-bound functions in a separate thread without blocking the event loop. See the discussion starting from [here](python#18410 (comment)) in pythonGH-18410 for context.

Automerge-Triggered-By: @aeros