Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the existing runloop for sync calls in async handler #179

Closed
bennovakovic opened this issue Jul 31, 2019 · 21 comments · Fixed by #534
Closed

Use the existing runloop for sync calls in async handler #179

bennovakovic opened this issue Jul 31, 2019 · 21 comments · Fixed by #534
Labels
concurrency Issues related to concurrency and usage of async libraries

Comments

@bennovakovic
Copy link

Hi, this library looks awesome and I really want to use it; but I seem to have hit a blocker and can't seem to find a solution in the docs.

TL;DR: Is there any way to provide our own runloop when using the sync calls as looking through the source the sync calls are actually using the async request call.

Basically I'm writing a client api that provides 2 interfaces; an async and sync interface.

The issue that i am having is that if I try to use the sync methods with an existing runloop (provided by Sanic in this scenario), it complains with the exception: Cannot run the event loop while another loop is running.

Essentially I'm just calling:

import httpx
...
class SomeClass(object):
    def get_some_url(self):
        r = httpx.get('some url')

note that nothing here is async; apart from the fact that the .get_some_url call is made from within an async function like so:

import SomeClass

async def handler(request):
    o = SomeClass()
    o.get_some_url()

This works fine when using the async calls like this:

import httpx
...
class SomeClass(object):
    async def get_some_url(self):
        client = httpx.AsyncClient()
        r = await client.get('some url')
import SomeClass

async def handler(request):
    o = SomeClass()
    await o.get_some_url()

So I was just wondering if its possible to provide our own runloop when using the sync calls? Sanic uses uvloop internally, would be nice to be able to just pass that straight in somehow when constructing the httpx object, or setting it as a property after import.

Thanks!

@bennovakovic
Copy link
Author

FWIW; I tried the following in a vein hope that it would do what I wanted:

import httpx
client = httpx.Client(backend=<my app uvloop>)

But alas, no success.. this the end of the stacktrace.

   self.client = httpx.Client(backend=kwargs.get('runloop'))
  File "/usr/local/lib/python3.6/site-packages/httpx/client.py", line 85, in __init__
    backend=backend,
  File "/usr/local/lib/python3.6/site-packages/httpx/dispatch/connection_pool.py", line 97, in __init__
    self.max_connections = self.backend.get_semaphore(pool_limits)
AttributeError: 'Loop' object has no attribute 'get_semaphore'

@bennovakovic
Copy link
Author

In the meantime; I'll just use requests for the sync stuff, it feels odd passing in a runloop when I'm doing sync calls; purely because internally you're using async under the hood even for sync requests, or am I mistaken?

request = AsyncRequest(

@cansarigol
Copy link
Contributor

FWIW; I tried the following in a vein hope that it would do what I wanted:

import httpx
client = httpx.Client(backend=<my app uvloop>)

But alas, no success.. this the end of the stacktrace.

   self.client = httpx.Client(backend=kwargs.get('runloop'))
  File "/usr/local/lib/python3.6/site-packages/httpx/client.py", line 85, in __init__
    backend=backend,
  File "/usr/local/lib/python3.6/site-packages/httpx/dispatch/connection_pool.py", line 97, in __init__
    self.max_connections = self.backend.get_semaphore(pool_limits)
AttributeError: 'Loop' object has no attribute 'get_semaphore'

Hi to use backend parameter, it has to be derived by ConcurrencyBackend.

@bennovakovic
Copy link
Author

Yeah, I figured that after I got the exception. I figured it would handle a standard runloop.

@cansarigol
Copy link
Contributor

you can check sample class from

class MockHTTP2Backend(AsyncioBackend):

@bennovakovic
Copy link
Author

Yeah, unfortunately I can't modify my existing runloop to meet the requirements of the backend parameter; it is given to me by sanic directly.

@cansarigol
Copy link
Contributor

Maybe runloop would be added with a loop params rather backend. I'd want to :)

@yeraydiazdiaz
Copy link
Contributor

There is no API to provide the event loop yourself, however there is a loop property in the AsyncioConcurrencyBackend that checks an existing private _loop instance variable you can potentially set to the Sanic event loop.

We should consider adding this as a feature though.

@yeraydiazdiaz
Copy link
Contributor

yeraydiazdiaz commented Jul 31, 2019

I'm not sure about the context you're working with but there's some subtleties around using the sync API in an async application.

The sync API internally uses the loop's run_in_executor which will create a ThreadPoolExecutor and run your request there which might hinder performance or just be unnecessary. If possible I'd suggest staying in async land and use the AsyncClient, but, again, it depends on your context.

@tomchristie
Copy link
Member

The issue that i am having is that if I try to use the sync methods with an existing runloop (provided by Sanic in this scenario), it complains with the exception: Cannot run the event loop while another loop is running.

Okay, that's a problematic thing to be doing.

If you're running in the context of an event loop, but then making blocking style network calls you're going to block the event loop. You don't want to be doing that.

We've got a relevant issue here about how we describe that most clearly in any raised exception, but it's a fundamental property of the differing concurrency model, rather than a bug (at least as I read it here)

@tomchristie
Copy link
Member

I'd also echo what @yeraydiazdiaz's comment: if you're running in an async context, then you need to be using the async client.

(Tho it might be useful to understand why you're not doing that, in order to provide better support for other users who try the same thing)

@bennovakovic
Copy link
Author

@yeraydiazdiaz Yeah I figured; I was just trying to make the api client library idiot proof; and also give them the option to use a sync client even in an async application (for whatever reason that might be). I totally agree; staying in async land is definitely where I want to be too; but I can't assume that everyone is in async land; in fact my experience is that most aren't in the python world, unfortunately.

@bennovakovic
Copy link
Author

bennovakovic commented Jul 31, 2019

@tomchristie yeah I completely agree & understand; it was more a question of is it possible rather than what am I doing wrong. I understand how its working; but was seeing if the library could behave the way I was hoping based on the way I'm building out the api client library. No big deal though; I can just revert to the requests library for the sync client, and httpx for the async.
FWIW; this is how the user would create an instance of the client library:

client = MyApiClient.create('v1.async', apikey='foo')
# or 
client = MyApiClient.create('v1', apikey='foo')

Then any following calls would either need to be using await if its the async api version, or just regular method calls in the sync scenario.

@tomchristie
Copy link
Member

No big deal though; I can just revert to the requests library for the sync client, and httpx for the async.

That's not really the right resolution here.

httpx provides a sync client, for sync usage, and an async client, for async usage, so you have both cases covered.

The error you're seeing only occurs when you try to call into the sync interface from within an async context which a broken thing to do for your users, regardless of if that's implemented in requests, httpx or whatever. (*)

There are technical ways we could make it not error in httpx and carry on regardless, but it's far more sensible that we error loudly. I think it'd be worth us wrapping the Cannot run the event loop while another loop is running exception in our own exception class so that we can point to a relevant part in the async documentation explaining why that's not a valid thing to do.

(*) Unless you're switching over to sync portions of the codebase within a seperate thread, in which case httpx wouldn't raise the error you're seeing.

@tomchristie
Copy link
Member

I was just trying to make the api client library idiot proof; and also give them the option to use a sync client even in an async application (for whatever reason that might be)

I guess this is the core bit. Making the library robust here ought to mean raising a nice clean, explanatory error when it's used in a broken way, rather than allowing that to pass silently.

@pquentin
Copy link
Contributor

pquentin commented Aug 1, 2019

I guess this is the core bit. Making the library robust here ought to mean raising a nice clean, explanatory error when it's used in a broken way, rather than allowing that to pass silently.

Exactly! Trying to make an API both sync and async is an attractive nuisance. It does not even help end users who write applications.

On a related note, here's one way for httpx to avoid using asyncio and its event loop (and avoid this error):

  1. Support a synchronous backend that uses standard blocking sockets
  2. Use this trivial loop runner

But I don't know if it's worth it or not.

@florimondmanca
Copy link
Member

Hey, seems like this issue has drifted away. :) Is there anything actionable we could do about it?

  • Adding a warning in the docs against using the sync API in an async context?
  • Trying to detect when the user is doing that and raise an explanatory error?

I’m in an issue closing mood today, and I think this one could just be considered as a constraint and documented as such (ie option 1).

@tomchristie
Copy link
Member

I'd say documented constraint.

There's a million and one other broken blocking operations that a user could do, that won't autodetect. Unless there ends up being some backed in support in the stdlib guarding against this (unlikely) then this really just comes down to "if you're using async you need to be attentive to avoiding blocking operations".

@iluxonchik
Copy link

iluxonchik commented Nov 17, 2019

I've just encountered exactly the same issue. In my understanding, there are cases where it makes sense to make sync calls inside a coroutine. Let's say that you have a command-line tool that reads a JSON file from a url containing a website name: website url key-value pairs. In your command line tool you want to go through each one of the urls in that JSON file and parse some information from its HTML. Thus, you will have a code similar to this:

async def parse_info_from_urls(name, url, client):
    # this part of the code uses async httpx requests
    # the obtained HTML parsing is done here

async def main(url):
    # this part is sync and making it async would only add overhead
    json_res = https.get(url).json()
    name_url_list = [(name, url) for name, url in json_res.items()]
    async with httpx.AsyncClient() as client:
        res = await asyncio.gather(*(parse_info_from_urls(elem[0], elem[1], client) for elem in name_url_list)

if __name__ == '__main__':
    asyncio.run(main('http://some-url-to-the-json-file.com'))

Of course I can make the sync requests outside of main(), but that would make the code dirtier.
I would suggest doing one of the following:

  1. Add support to specify custom runloop
  2. Use a different runloop in the sync case
  3. Keeping everything as it is, but documenting the behaviour and providing more explicit error messages

@florimondmanca
Copy link
Member

@iluxonchik

# this part is sync and making it async would only add overhead

I think your comment is mistaken here, for two reasons:

  1. The only sensible thing we could do to support the sync call would be to spawn a new thread (since you can't just have multiple event loops running in the same thread). So, that would definitely be worse overhead than scheduling a coroutine on the current event loop.
  2. If there was a way to make the sync call without using the event loop (e.g. using a sync backend, which was rejected in Add SyncBackend and use it on Client #525), the sync call would block the event loop completely, which is almost always not what you want.

In your case, it seems you have to use async (as most users do when they're using HTTPX in an async context), e.g. by wrapping all the code inside the async with AsyncClient() statement. Running coroutines is way cheaper than you might think. :-)


I think this issue is a duplicate of #508 (or vice versa). I'll go ahead a push a PR that sync calls are not supported in an async environment; as discussed earlier this should at least address this PR.

@florimondmanca florimondmanca added the concurrency Issues related to concurrency and usage of async libraries label Nov 17, 2019
@iluxonchik
Copy link

iluxonchik commented Nov 17, 2019

@florimondmanca

the sync call would block the event loop completely, which is almost always not what you want

I completely agree, however, in the case of a simple script that needs to get some information from a url, before using that information to distribute it across multiple coroutines, blocking the event loop does no harm at all. In my example, the sync call happens when there is only one coroutine running on the event loop, so it makes no difference whether I make blocking calls or not.

Running coroutines is way cheaper than you might think. :-)

Oh trust me, I'm not worried about that overhead, which, when you account for the I/O stuff (network requests), is most likely will not going to be noticeable ever.

Just to be clear, I'm not alleging that you should break the philosophy of the library, because it makes sense so for a limited number of cases (although in web scraping that could be a common scenario). I'm just saying that it should bee made explicit 😄


For me, the issue was assuming that httpx.AsyncClient() is equivalent to aiohttp.Session(), which it is not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
concurrency Issues related to concurrency and usage of async libraries
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants