Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sans-I/O client middleware stack #783

Closed
wants to merge 5 commits into from
Closed

Conversation

florimondmanca
Copy link
Member

@florimondmanca florimondmanca commented Jan 20, 2020

So…

Implementing retries naturally involved adding a .send_handling_retries() method to clients. This has resulted in creating a chain of 3 method calls:

send_handling_retries() -> send_handling_redirects() -> send_handling_auth() -> send_single_request()

With #778, both retries and auth would be using a generator-based API. We don't use one for redirects, but we very much could.

So all of this looked an awful lot like a middleware stack to me, so…

This PR moves the redirects and auth logic to a dedicated httpx.middleware, which defines a generator-based middleware API. As a result, Client and AsyncClient are really just pretty type-hinted shells around a dispatcher and a stack of middleware.

A no-op middleware looks like this:

class NullMiddleware:
    def __init__(self, parent):
        self.parent = parent

    def __call__(self, request, context, timeout) -> Response:
        return yield from self.parent(request, context, timeout)

The question obviously is how can this API work both in the sync and async cases? For example, the redirect middleware needs to call response.read()/await response.aread()...

The solution is to defer the awaiting of yielded value right to the edge... This is what these two helper functions allow to do:

  • consume_generator()
  • consume_generator_of_awaitables()

They allow writing generators based on functions that may or may not return awaitables. For example, calling dispatcher.send() will return a coroutine in the async case (because we have an AsyncDispatcher), but a plain response in the sync case (because we have a SyncDispatcher). So if we yield dispatcher.send(), and make sure the generator is passed through consume_generator() in the sync case, and await consume_generator_of_awaitables() in the async case, then we'll have the coroutine awaited in the async case, and the plain value left as-is in the sync case. (Makes sense?)

For cases when the API isn't exactly the same in the sync and async cases, we introduce a special SyncOrAsync container that defines what we should call in both situations, and then the two helper functions do the unwrapping when the stumble upon an instance of this container. So this allows doing the switch between e.g. response.read() and response.aread().

And… tada. ✨


Yes, this is generator black magic pushed to the limit.

But, I think this middleware API could actually be a viable one, which is a pretty big deal.

The real killer thing IMO is that it allows all of the following:

  • Writing code for both the sync and the async clients.
  • Writing functionality that potentially requires sending multiple requests.
  • Dynamically expanding the control knobs of the client, i.e. not being restricted to what can be passed to client.send()[^0]. This is done with a context that works a bit like scope in ASGI.

([^0] This bit would just require adding a **kwargs everywhere on the client API…)

Eventually, we could aim for something like this…

class MyMiddleware:
    def __init__(self, parent, initial: dict):
        self.initial = initial

    def __call__(self, request, context, timeout):
        my_data = context.get("my_data", {})
        request.headers["X-My-Data"] = ", ".join(f"{key}={value}" for key, value in my_data.items())
        return yield from self.parent(request, context, timeout)


client = httpx.Client(
    middleware=[
        # v *Maybe* make these optional…
        httpx.Middleware(AuthMiddleware, auth=("user", "pass")),
        httpx.Middleware(RedirectMiddleware, max_redirects=3),
        # ^
        httpx.Middleware(MyMiddleware, initial={"message": "Hello, world!"}),
)

response = client.get("https://example.org", my_data={"foo": "bar"})

@florimondmanca florimondmanca added the refactor Issues and PRs related to code refactoring label Jan 20, 2020
@florimondmanca florimondmanca mentioned this pull request Jan 20, 2020
@florimondmanca
Copy link
Member Author

I think we should get rid of the context thingy on the middleware callable API, and replace it with a **kwargs. Then it’s even more natural to know « this middleware requires these parameters », because they’d be defined as actual parameters on the .__call__() implementation. And we wouldn’t need to manipulate a dict to extract them/set default values. Would also fit better how those parameters would be passed to the client — as that can’t be sensibly done any other way than via a **kwargs on client methods.

@tomchristie
Copy link
Member

So, probably no surprise that I'm a bit cautious about us introducing something here.

Last time around adding a middleware stack felt like an abstraction too far, that masked the actual workings of the the Client, when a little more flatness was actually sufficient, and more readable.

Something I'd suggest for any exploratory work here would be to add a middleware API without also moving around our existing implementation into middleware itself. That'd be a more minimal footprint to look over, and would let us focus more on "is this the right API to expose to users" vs. "is this feasible, and how does this affect our implementation".

At this point it's not even clear to me that we neccessarily need a middleware API.

For example, one keep-it-simple alternative here would be to make sure that we're documenting the build_request/send pair of functions as our lowest level of public API on the Client instance, and if folks want to wrap some behaviour around send, then well, great, subclass Client or AsyncClient, and implement a send() method that calls into the parent class.

Granted, those wouldn't be modular drop-me-in-for-reuse classes, but that might be okay.

Alternately, it could be that a more constrained "event hooks" API would be sufficient (again along with adding good documentation for send() for more advanced stuff.)

I guess my rough take at the moment on this would be to push back on it for as long as possible while gathering example use-cases. Documenting send() in the meantime could be a sufficient and conservative choice with less potential to bite us?

@florimondmanca
Copy link
Member Author

@tomchristie Cool, thanks for the feedback. Obviously I would have been very surprised that this would go through without push back or caution. :)

I can try to isolate the addition of the middleware API in a separate PR. Not using it for our internal layers might end up making it seem like it’s not thorough enough, when I actually think it’s a pretty meaningful API that’s just at the right level of abstraction this time...

Your point about « let’s just have users subclass and wrap .send() » is sensible, I just have the feeling we might be able to do better if this proves to be a right way to go.

But right, let’s do that and take it from there.

@florimondmanca
Copy link
Member Author

Closing, followed up with #800.

@florimondmanca florimondmanca deleted the sansio-middleware branch February 2, 2020 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Issues and PRs related to code refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants