Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Supporting Sync. Done right. #572

Closed
tomchristie opened this issue Nov 30, 2019 · 36 comments · Fixed by #735
Closed

Supporting Sync. Done right. #572

tomchristie opened this issue Nov 30, 2019 · 36 comments · Fixed by #735
Labels
enhancement New feature or request

Comments

@tomchristie
Copy link
Member

tomchristie commented Nov 30, 2019

So, with the 0.8 release we dropped our sync interface completely.

The approach of bridging a sync interface onto an underlying async one, had some issues, in particular:

  • Have Sync and Async variant classes both subclassing Base classes was leading to poorer, less comprehensible code.
  • There were niggly issues, such as being unable to run in some environments where top-level async support was provided. (Eg. failing under Jupyter)

The good news is that I think the unasync approach that the hip team are working with is actually a much better tack onto this, and is something that I think httpx can adopt.

We'll need to decide:

  • How we name/scope our sync vs. async variants.
  • If we provide the top-level API for sync+async, or just sync.

(Essentially the same discussion as python-trio/hip#168)

One option here could be...

  • httpx.get(), httpx.post() and pals. Sync only - they're just a nice convenience. Good for the repl and casual scripting, but no good reason why async code shouldn't use a proper client.
  • httpx.sync.Client, httpx.async.Client and pals. Keep the names the same in each sync vs async case, but just scope them into two different submodules. Our __repr__'s for these clases could make sure to be distinct, eg. <async.Response [200 OK]>

So eg...

Sync:

client = httpx.sync.Client()
client.get(...)

Async:

client = httpx.async.Client()
await client.get(...)

Update

Here's how the naming/scoping could look...

Top level:

response = httpx.get(...)
response = httpx.post(...)
response = httpx.request(...)
with httpx.stream(...) as response:
    ...
response = await httpx.aget(...)
response = await httpx.apost(...)
response = await httpx.arequest(...)
async with httpx.astream(...) as response:
    ...

Client usage:

client = httpx.SyncClient()
client = httpx.AsyncClient()

(Unlike our previous approach there's no Client case there, and no subclassing. Just a pure async implementation, and a pure sync implementation.)

We can just have a single Request class, that'd accept whatever types on init and expose both streaming and async streaming interfaces to be used by the dispatcher, that raises errors in invalid cases eg. the sync streaming case is called, but it was passed an async iterable for body data.

Most operations would return a plain ol' Response class, with no I/O available on it. See #588 (comment)
When using .stream, either an AsyncResponse or a SyncResponse would be returned.

There'd be sync and async variants of the dispatcher interface and implementations, but that wouldn't be part of the 1.0 API spec. Neither would the backend API.

@tomchristie
Copy link
Member Author

Ah, actually I guess that might not quite be okay, given that async is a keyword?

Probably workarounds available, but could be problematic.

Anyway's there's:

  • scope via subpackage.
  • scope via two different pypi pacakges.
  • scope via different naming. (Eg. SyncResponse/AsyncResponse or Response/AsyncResponse)

@florimondmanca
Copy link
Member

florimondmanca commented Nov 30, 2019

IMO sync should be the second class citizen, not async. Not sticking to this means going backwards wrt discussions and conclusions we’ve drawn from #522.

So, another option: keep the same API, but have it live under httpx.sync.

In short...

>>> import httpx.sync as httpx

@phillmac
Copy link

phillmac commented Dec 1, 2019

I get that this is not stable software, but this was a bit of a rude shock when all my tests broke.

I completely disagree that "sync should be the second class citizen". Imho 90% of use cases will be fetch this page then fetch the next and so on...
Having to shoehorn strictly synchronous processes to async seems to be a bit of a pain in the backside to me.

@tomchristie
Copy link
Member Author

@phillmac I hear you competely. We've had a big pain point here in that our prior approach to sync support needed to be ripped out in order to change how we handle it.

Having cleared that out the way, I now absolutely want to support the sync and async cases both equally. We've got a bit of technical stuff to do to get that done, and we've also got the naming decisions here to resolve, but yes it'll be on the way back in.

I suggest pinning your httpx dependency to 0.7.x, and waiting a couple of weeks to see where we're at on this.

@phillmac
Copy link

phillmac commented Dec 1, 2019

Cool, my fault for not pinning it earlier, but I wanted bleeding edge stuff. Guess I got what I asked for

@tomchristie
Copy link
Member Author

I should be slightly more precise here - one thing that may well differ is that our sync API may not support HTTP/2. (I see you mention that in the thread in #522.)

@phillmac
Copy link

phillmac commented Dec 1, 2019

Yeah that's the whole point, I have a http/2 api server, there seems not to be any just bog standard http/2 libraries 😞

@tomchristie tomchristie added the enhancement New feature or request label Dec 3, 2019
@lundberg
Copy link
Contributor

lundberg commented Dec 5, 2019

I've been playing around with this a little now...took an other approach. Might be too magic though, but wanted to throw it out for discussion anyways ;-)

Idea is to use the same high-level api for both sync and async by reverting the api methods get(), post() etc back to regular sync variants, but instead returning the coroutine from request().

On top of that they could be decorated with a new decorator that evaluates if we're in an async context or not. If we're in async context, just return the coroutine which is awaitable by the user, but for sync context, help the user by running the coroutine in a new, or reusable?, asyncio loop.

Tested example:

# api.py

def awaitable(func: Callable) -> Callable:
    def wrapper(
        *args: Any, **kwargs: Any
    ) -> Union[Response, Coroutine[Any, Any, Response]]:
        coroutine = func(*args, **kwargs)
        try:
            # Detect if we're in async context
            sniffio.current_async_library()
        except sniffio.AsyncLibraryNotFoundError:
            # Not in async context, run coroutine in a new event loop.
            return asyncio.run(coroutine)
        else:
            # We're in async context, return the coroutine for await usage
            return coroutine

    return wrapper


async def request(...) -> Response:
    async with Client(...) as client:
        return await client.request(...)

@awaitable
def get(...) -> Union[Response, Coroutine[Any, Any, Response]]:
    return request(...)

# test_api.py
def test_sync_get():
    response = httpx.get(...)


@pytest.mark.asyncio
def test_async_get():
    response = await httpx.get(...)

Pros are that it doen't change the current api, meaning it's still awaitable coroutines returned that can be used with asyncio.gather etc. Cons are that it gives a sad extra context-check overhead, if not solved some way, cache?, and also maybe harder to document.

To get rid of the context check, a setting could be introduced instead, i.e. httpx.sync = True

Thoughts?

@StephenBrown2
Copy link
Contributor

Aside from asyncio.run() being Py3.7+ only, and my needing to run httpx on 3.6, that looks quite nifty.

@lundberg
Copy link
Contributor

lundberg commented Dec 5, 2019

Aside from asyncio.run() being Py3.7+ only, and my needing to run httpx on 3.6, that looks quite nifty.

True, the asyncio.run() should probably be more clever and also reuse the created helper loop.

@tomchristie
Copy link
Member Author

It's a neat approach. Potentially not great for folks using mypy or other type checkers against their codebase.

@lundberg
Copy link
Contributor

lundberg commented Dec 5, 2019

@tomchristie do you have any experience/thoughts on sniffio.current_async_library() giving us too much (unnecessary) overhead for the "primary" async use case?

@tomchristie
Copy link
Member Author

@lundberg Not something I've looked into, no. Any particular indication that might be an issue?

@tomchristie
Copy link
Member Author

I've updated the issue description with my thoughts on how the naming/scoping could look here.

@lundberg
Copy link
Contributor

lundberg commented Dec 5, 2019

Any particular indication that might be an issue?

No other than that it feels wrong that I had to do the check on every api call.

Potentially not great for folks using mypy or other type checkers against their codebase.

We could supply a typing shorthand for this, not as nice as Response, but still.

@florimondmanca
Copy link
Member

Potentially not great for folks using mypy or other type checkers against their codebase.

Can confirm: mypy would disallow awaiting the return value of @awaitable-decorated functions because all types in the Union aren't awaitable. Also, I had to go to great lengths to make the decorator please mypy… 😄

Full snippet:

import asyncio
import typing

import httpx
import sniffio

T = typing.TypeVar("T")


def awaitable(
    func: typing.Callable[..., typing.Union[T, typing.Awaitable[T]]]
) -> typing.Callable[..., typing.Union[T, typing.Awaitable[T]]]:
    def wrapper(*args: typing.Any, **kwargs: typing.Any) -> typing.Union[T, typing.Awaitable[T]]:
        coroutine = func(*args, **kwargs)
        try:
            # Detect if we're in async context
            sniffio.current_async_library()
        except sniffio.AsyncLibraryNotFoundError:
            # Not in async context, run coroutine in a new event loop.
            coroutine = typing.cast(typing.Awaitable[T], coroutine)
            return asyncio.run(coroutine)
        else:
            # We're in async context, return the coroutine for await usage
            return coroutine

    return wrapper


async def request(method: str, url: str) -> httpx.Response:
    async with httpx.Client() as client:
        return await client.request(method, url)


@awaitable
def get(url: str) -> typing.Union[httpx.Response, typing.Coroutine[typing.Any, typing.Any, httpx.Response]]:
    return request('GET', url)


async def main() -> None:
    _ = await get('https://example.org')
$ mypy example.py
_debug/app.py:42: error: Incompatible types in "await" (actual type "Union[Response, Awaitable[Response]]", expected type "Awaitable[Any]")

While this is smart, I think the implicitness induced by having to look at the context around the function call makes it a no-no for me. On the contrary, an API such as .get()/.aget() makes it easy to know at-a-glance whether a coroutine or a plain value is returned (while pleasing type checkers more easily).

@florimondmanca
Copy link
Member

florimondmanca commented Dec 5, 2019

@tomchristie A few questions regarding the update in the issue description.

A) Do we want AsyncClient to have .get(), .post(), etc, or to match the top-level API with .aget(), .apost(), etc? (I'm inclined towards the latter, for consistency and also knowing at a glance that we're using an async client without having to call it async_client or analyze the surrounding context.)

B) Re: responses, as I understand Response would be the default, won't-do-any-I/O, simple container response class, while SyncResponse would be for sync-streaming responses, and AsyncResponse would be for async-streaming responses, correct?

I'm thinking it might be possible to take advantage of the fact that with cmand async with acm are two distinct protocols, with their own pair of dunder methods. We could have:

class StreamingResponse:
    def __enter__(self):
        assert is_sync_iterator(self.content)
        ...

    def __exit__(self, *args):
        ...

    def __aenter__(self):
        assert is_async_iterator(self.content)
        ...

    def __aexit__(self, *args):
        ...

This way:

  • The async client would enforce usage as async with client.stream() as response: ....
  • The sync client would enforce usage as with client.stream() as response: ....

And we don't have to create sync/async classes just for the streaming case. :-)

(Caveat: we'd need to do the same kind of checks for the .stream_*()/.astream_*() methods.)

@lundberg
Copy link
Contributor

lundberg commented Dec 5, 2019

Also, I had to go to great lengths to make the decorator please mypy...

Nice try though @florimondmanca 😅

Also really agree that it's more clear with separate sync/async apis like get/aget etc. But even though my rough idea, read think outside the box, probably is not the way to go, it does keep the core code base untouched, in other words no need for sync/async implementations like before.

Still think it would be nice to keep the core async, and "add" sync support/help on top. Maybe just so, a sync-wrapper api, maybe even go all the way with opt-in install pip install httpx[sync] 😉

@florimondmanca
Copy link
Member

There's still the "code-gen the sync version from the async one" approach that we've been thinking about seeing what folks from hip are doing… It's not 100% obvious to me it's still the best approach, though. 💭

@tomchristie
Copy link
Member Author

tomchristie commented Dec 5, 2019

Do we want AsyncClient to have .get(), .post(), etc, or to match the top-level API with .aget(), .apost(), etc?

Ah, that’s a good point. I’d not considered that those two wouldn’t mirror each other. At that point I’d probably just say “let’s not have the top level API for async.” Different constraints - the top level API is a convenience case, and there’s really no good reason not to always use a client instance in the async case.

And yeah neat point about the StreamingContext being able to handle both the async and sync cases in the same implementation. 👍

@lundberg
Copy link
Contributor

lundberg commented Dec 6, 2019

Couldn't let go of my attempt of having one high-level api... 😄

New idea is to skip the sync/async context checking and instead wrap the coroutine from request() in a FutureResponse, allowing proper await, but also adds a sync send() helper.

Example:

# models.py
class FutureResponse(Awaitable[Response]):
    def __init__(self, coroutine):
        self.coroutine = coroutine

    def __await__(self):
        return self.coroutine.__await__()

    def send(self):
        # Alternative names; result, run, resolve, call, __call__, ?
        # Should be more clever here and reuse sync helper loop and support py3.6
        return asyncio.run(self.coroutine)
# api.py
async def request(...) -> Response:
    async with Client(...) as client:
        return await client.request(...)

def get(...) -> FutureResponse:
    return FutureResponse(request(...))
# test_api.py
@pytest.mark.asyncio
async def test_async_get():
    response = await httpx.get(...)

def test_sync_get():
    response = httpx.get(...).send()

Thoughts?

@florimondmanca
Copy link
Member

That is outrageously clever, @lundberg. 🤯

On trio you'd have to some a weird trio.run(lambda: self.coroutine) (since trio.run() expects a callable that returns a Coroutine), but yeah, it's a fun one!

@lundberg
Copy link
Contributor

lundberg commented Dec 6, 2019

On trio you'd have to some a weird trio.run(lambda: self.coroutine) (since trio.run() expects a callable that returns a Coroutine), but yeah, it's a fun one!

Do you mean that FutureResponse.send() also would need to support trio @florimondmanca ?

If so, I personally think that it doesn't need to be that clever, since if you have trio installed, you're probably already in an async context, or know what you're doing.

Additionally, about the parallel api namespace approach, .get() / .aget(). If thats the road, I think they should be named carefully to favour async, meaning if we drop, or stuff away, sync support in the future. Personally, I'd like to see httpx.get() to be the async one, to allow simpler drop of sync api's, and leave the async ones untouched, making upgrade HTTPX a lot easier for async users. In the future, everything will be async, right? 😉

@tomchristie
Copy link
Member Author

I'mma have to curtail the fun here I think. 😅

Let's not be clever.

@florimondmanca made a good point about aget() etc not mirroring the client methods, so I think we'll just keep things really simple. Standard ol' thread-concurrency API for the top level convienence case, great with the REPL etc.

The SyncClient and AsyncClient for when you're actually being precise about things. There's no good enough reason to support async without an explicit client instance.

@florimondmanca
Copy link
Member

@tomchristie So, about the SyncClient/AsyncClient distinction — as I understand, we're going towards dropping the code-gen approach here, correct? Or is this only a discussion about API for now, and we'll see how to bring that API to life later?

@florimondmanca
Copy link
Member

florimondmanca commented Dec 6, 2019

Also, I'm not too wild on enforcing the usage of a client in the async case.

One of the killer features of Requests is the ability to just import requests and .get() your way through things. This is far enough for most use cases. Yes, it sets up a connection pool and tears it down every time (doesn't it?), but most people don't care that much about performance (and when they do, they can switch to a client).

I'd really like to be able to just import httpx about await httpx.get() (or .aget() (?)).

In short, I'm in favor of an async top-level API with the rationale that the top-level API is not only convenient for the REPL use cases.

@imbolc
Copy link

imbolc commented Dec 8, 2019

import httpx.sync as httpx

Why should they be in the same package then? Won't it be easier to have a separate httpx_sync package and just keep compatibility on API level. This may reveal that most of the people use only one of them. E.g. if there would be a lack of resources to maintain both, it would be easier to decide which one to drop. The common staff could obviously go into a separate httpx_base package.

@imbolc
Copy link

imbolc commented Dec 8, 2019

I completely disagree that "sync should be the second class citizen". Imho 90% of use cases will be fetch this page then fetch the next and so on...

@phillmac I'd even say its 99% and await keyword here is exactly for this purpose :)

@imbolc
Copy link

imbolc commented Dec 8, 2019

@lundberg with the @awaitable api:

  1. Can't there be some edge cases you'd intentionally want to run stuff synchronously from an async context?
  2. Can't we go a bit further and resolve the coroutine inside awaitable instead of returning it? This way api would be completely the same, without even a need to add the await keyword. It will also eliminate these silent errors if we forget to add await. We still can get an error if we add an unnecessary await, but it won't be silent anymore.

@florimondmanca
Copy link
Member

florimondmanca commented Dec 8, 2019

So, regarding the implementation of a sync variant, I guess one thing we agree on at this point is that the sync version must be purely sync, right? So, no asyncio.run() calls, just an end-to-end, sync-only, socket-based impementation.

While there's the question of how to syncify the Request and Response interfaces (we've already gathered some hints about that in here), I'm thinking about what should happen regarding the dispatchers, esp. ConnectionPool and HTTPConnection, as well as the Client.

Option 1) is to re-implement everything in a sync variant, so SyncClient, SyncDispatcher base class, SyncConnectionPool, SyncHTTPConnection, etc. Obviously that's a lot of code, so that's when we get to talk about option 2): code generation; i.e. generating the sync variant from the async one.

I had some fun today putting together a sample package that uses unasync to code-gen a sync version of itself: hello-unasync.

The main goal was to see how the development experience looked like, and how to set things up to get mypy, flake8, and code coverage to work with code generation.

Getting code coverage to work locally and on CI was a bit involved, but overall besides the compilation step there isn't much that changes.

I must say the development experience is less pleasing/straight-forward than for a simple "it's all in the source tree" package, but having the sync version "maintain itself" is really handy.

Wondering what applying this approach to HTTPX would involve in practice. My thoughts are this would require the following…

  1. Move async code that should be code-gen'd to an _async sub-package. So this means Request, Response, Client, dispatchers, middleware, the high-level API, and the ConcurrencyBackend base class (so that we can code-gen a sync variant for SyncBackend to use).
  2. Add a SyncBackend (based on Add SyncBackend and use it on Client #525).
  3. Update auto-detection in lookup_backend to detect when we're in the _sync part of the code and automatically use SyncBackend in that case.
  4. Update our toolchain.
  5. Profit?

That's quite a lot of upfront work, but it's not clear to me yet which of 1) re-implementing sync from async or 2) setting up code generation would be the best. I guess both have their pro's and con's…

Is there anything we can do/try to see whether codegen would fit us well? Maybe only tackle syncifying HTTPConnection? (Which would already require syncifying HTTP11Connectionmaybe HTTP2Connection can stay as-is for now.)

@lundberg
Copy link
Contributor

lundberg commented Dec 9, 2019

  1. Can't there be some edge cases you'd intentionally want to run stuff synchronously from an async context?

I guess there might be, i.e. a 3rd-p-package limiting to a regular def @imbolc. The sync send() helper would need to handle catching existing loop and reuse that one.

  1. Can't we go a bit further and resolve the coroutine inside awaitable instead of returning it?

IMO, that would limit async features, i.e. gather etc, or passing the coroutine around before awaiting.

@lundberg
Copy link
Contributor

lundberg commented Dec 9, 2019

I had some fun today putting together a sample package that uses unasync to code-gen a sync version of itself: hello-unasync.

Nice to see this alternative as well @florimondmanca 😅

As we know, a lot of libraries and frameworks are struggling with the same stuff, and tackling it in different ways. Django released 3.0, ~ a week ago, with asgi- and some async support. Until Django is fully async, their solution is async_to_sync.

@florimondmanca
Copy link
Member

Until Django is fully async, their solution is async_to_sync.

Yup. Though I think this is the kind of "bridging" approach we had been trying to use here before (only via a custom implementation), i.e. running wrapping an async operation in an asyncio.run() call, with extra care not to run an event loop in a thread where there's one running. It's brought us a wealth of issues (see the beginning of this issue description), and I believe we don't want to go that path anymore.

@lundberg
Copy link
Contributor

lundberg commented Dec 9, 2019

... and I believe we don't want to go that path anymore.

Totally agree @florimondmanca. Just wanted to highlight the alternative.

But having said that, it's still a nice way of keeping the code base tight, either solution and alternative used, to keep the core async and "just" wrap it for sync use.

@imbolc
Copy link

imbolc commented Dec 9, 2019

one thing that may well differ is that our sync API may not support HTTP/2

Is there any reason for bringing back the sync version at all If it would be the case? How would it differ from the requests?

@tomchristie
Copy link
Member Author

Actually turns out that I think our Sync variant will support HTTP/2. (After our latest refactoring we don't actually need background tasks in the HTTP/2 case.)

How would it differ from the requests?

  • Fully type annotated.
  • 100% coverage. (Well, 99% ATM, but we'll be back there soon. and enforce it.)
  • API equivelent sync+async variants. (Even if you're not using async now, that'd make it easier to switch between the two or provide libraries with support for either.)
  • A more neatly constrained streaming API.
  • Ability to plug directly into WSGI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants