Client middleware API #295

florimondmanca · 2019-08-29T20:59:57Z

There's been quite a few PRs trying to take a stab at making the Client/AsyncClient extensible, mostly with a middleware approach whose idea seems to have originated in #136 (comment) to help solve Digest Auth.

As mentioned in #268 (comment) it might be useful to define and discuss in which direction we'd like to go.

Problem statement

A few requirements we may want to satisfy, gathered from looking at the various discussions:

Support the simplest case: modify the request (before getting a response from the underlying middleware).
Support middleware that may return an early response (e.g. a cache middleware), i.e. allow to skip calling the underlying middleware.
Support middleware that need to make subsequent requests after getting the initial response (e.g. Redirect, Digest Auth).
Allow users to write their own middleware, e.g. for particular authentication schemes.
Allow users to assemble a middleware pipeline (with what level of control?).

None of the approaches attempt to solve 4) nor 5) — and that can definitely be dealt with in its own issue once this one is resolved. As a result, the scope of this issue is designing an internal API to solve 1), 2) and 3).

Existing implementation

A reminder of the features that HTTPX currently supports that may benefit from being refactored as as middleware:

Basic authentication: add an Authorization header based on a username and password.
"Custom authentication": process the request via a req -> req callable. (I personally think this might be too lose — we could just require the callable to return an Authorization header — but let's tighten the scope of this issue and keep things as they are.)
Redirect: automatically handle redirect responses by making request to the specified location.

These are currently implemented as part of the BaseClient itself. Them cluttering the BaseClient is, as far as I understand, the main reason for refactoring them into loosely-coupled middleware.

Proposed solutions

Here's a summary of the various APIs that were proposed up to now and their respective "no-op middleware" for the sake of example:

#176 : process_request / process_response

class Middleware:
    async def process_request(self, request: AsyncRequest) -> AsyncRequest:
        return request

    async def process_response(
        self, request: AsyncRequest, response: AsyncResponse
    ) -> AsyncResponse:
        return response

#267 : building on top of the dispatcher API

from httpx.base import AsyncDispatcher

class Middleware(AsyncDispatcher):
    def __init__(self, next_dispatcher, **kwargs):
        self.next_dispatcher = next_dispatcher

    async def send(
        self,
        request: AsyncRequest,
        verify: VerifyTypes = None,
        cert: CertTypes = None,
        timeout: TimeoutTypes = None,
    ) -> AsyncResponse:
        return await self.next_dispatcher.send(request, verify=verify, cert=cert, timeout=timeout)

#268: callable middleware

class Middleware:
    async def __call__(self, request: AsyncRequest, get_response: typing.Callable) -> AsyncResponse:
        return await get_response(request)

I think the various PRs proved that all APIs tick 1), 2), and 3) as defined above, with various degrees of simplicity and boilerplate.

The callable middleware API seems to be the simplest one while offering enough flexibility. Making extra requests based on the initial response can be achieved by recursively calling the middleware, as in the RedirectMiddleware. As another PoC, a rough Digest-Auth implementation based on this snippet:

class DigestAuthMiddleware:
    async def __call__(self, request, get_response):
        response = await get_response(request)

        # Did the server respond with 401 + WWW-Authenticate?
        if self.requires_authentication(response):
            authorization = self.build_authorization_header(request, response)
            request.headers["Authorization"] = authorization
            return await self(request, get_response)

        return response

cc @tomchristie @yeraydiazdiaz @sethmlarson

The text was updated successfully, but these errors were encountered:

florimondmanca · 2019-08-29T21:15:53Z

Trying to answer a few other questions I saw in the various discussions we've had…

FAQ

What about the BaseAuth interface?

It seems the proposed middleware APIs encompass the current auth interface, which can be trivially implemented as a middleware that modifies the request before sending it. So we can definitely consider removing BaseAuth entirely.

Shouldn't redirect stay part of the client?

Maybe!

We already have an allow_redirects flag (inherited from Requests?) that allows users to turn redirection handling on and off.

(More precisely, this flag controls whether we automatically handle redirections. We still do some processing to store the history of responses as well as set response.next if redirections aren't allowed and we receive a redirect response.)

Besides, if we think of middleware as a stack, I think we want redirection to always be handled first — e.g. we don't want auth to fail because it actually needed to call www.service.com instead of service.com. So it's another argument for keeping it closer to the client.

However, the various PRs have shown that it is possible to implement redirection handling as a middleware. So to strike a balance, we can make it so that the redirect middleware will always be added first by the client, and be wrapped by any extra (and potentially user-defined) middleware such as authentication.

So we'll need to setup a middleware stack per request?

It seems so: Client.get(), Client.post() etc accept parameters that affect what middleware should be called and how they should behave — e.g. redirect_middleware and auth.

florimondmanca · 2019-09-01T10:56:56Z

Pinging in here — anyone's got thoughts on this? Do we need the internal middleware at all? Is any of the proposed APIs potentially a good fit for a user-facing extensibility mechanism?

sethmlarson · 2019-09-01T16:26:12Z

Thanks for putting this together! :) You and @yeraydiazdiaz have been killing it with taking ownership of the middleware implementation.

The callable middleware has my vote for simplicity and how close it resembles WSGI / ASGI. The calling itself recursively with a new request as a method of making a different request is smart too and solves the cache validation, digest auth, and redirect requirement.

There's still the question of "what's the right way to access the Client from within a middleware call?" because we're going to want some sort of session storage on the Client object for cookies, cache, remembering Alt-Svc headers. It'd be good if that session storage could be serializable to in-memory, filesystem, etc so that users can have their sessions persist easily (separate PR).

As far as ordering I thought briefly about how we could take a page out of pyramid's book and allow saying "insert this middleware before X" on the register stage and then allow class-based middlewares to specify a default stage to be inserted at so that users don't have to think about it. I'm thinking only a few like:

import httpx
import enum

class MiddlewareStage(enum.IntEnum):
    AUTH = 1
    CACHE = 2
    REDIRECT = 3

client = httpx.Client()
client.add_middleware(MyCustomMiddleware(), before=MiddlewareStage.AUTH)

As far as defining custom middlewares providing the type-hint for MiddlewareType should work along with potentially providing a base-class?

florimondmanca · 2019-09-01T16:54:58Z

@sethmlarson Thanks for your input!

There's still the question of "what's the right way to access the Client from within a middleware call?"

At first I thought, "what's the problem with storing anything we need to keep track of directly in middleware?". But then I remembered that middleware are re-created on each request, so we can't really persist anything there.

But then, why should it be the middleware's responsibility to manage that kind of storage? I suppose the client can intercept responses that came back with cookies, and do its own thing to make sure those cookies are sent in subsequent requests to that host.

Personally, I don't feel very comfortable discussing this because I don't have a clear vision yet of how one of these use cases could work. I think tackling caching would give us some insight on how to handle a concept of "client session" and its associated storage.

Regarding middleware ordering — the enum idea is interesting and definitely worth digging into. Do you have any link to the Pyramid feature that gave you this idea? (I'm not very familiar with Pyramid.)

As far as defining custom middlewares providing the type-hint for MiddlewareType should work along with potentially providing a base-class?

#268 already provides a BaseMiddleware class. Sure, we can union it with a wisely-defined Callable type to create MiddlewareTypes — that would allow to support both class-based and function-based middleware as far as type checkers are concerned.

Anyway, are we OK with proceeding with and merging #268 then? I'll ask for your review there, please let me know there if the MiddlewareTypes is something we should consider adding. :-)

sethmlarson · 2019-09-01T18:16:46Z

I'm good with #268 proceeding. :)

But then I remembered that middleware are re-created on each request, so we can't really persist anything there.

I see that the redirect and auth middlewares are being recreated per request but would custom middlewares be restricted by this?

florimondmanca · 2019-09-01T21:41:27Z

I see that the redirect and auth middlewares are being recreated per request but would custom middlewares be restricted by this?

Currently, yes. I guess it depends on whether we make .add_middleware (or whatever it’s going to be) accept a middleware class (which can be instanciated on each request) or a middleware instance. For some applications like redirection, only per-request middleware make sense. Maybe we’ll end up introducing a notion of scope (I’m thinking of pytest fixtures here), to be defined on the middleware class?

yeraydiazdiaz · 2019-09-02T08:31:19Z

Great job on this @florimondmanca!

I think #268 is a great first implementation of this, chances are we'll start noticing pain points as we start implementing more middlewares but at this point in the project it's something we need to go through.

I'll repurpose the Digest auth and look forward to more features using middlewares 🎉

florimondmanca · 2019-09-02T08:38:55Z

Super @yeraydiazdiaz, I’m excited about Digest Auth!

florimondmanca mentioned this issue Sep 1, 2019

Dispatcher middlewares (refactored) #268

Merged

florimondmanca closed this as completed in #268 Sep 1, 2019

florimondmanca mentioned this issue Sep 1, 2019

Dispatcher middlewares #267

Closed

This was referenced Sep 15, 2019

request.close or AsyncClient.close? #342

Closed

Public middleware API #345

Closed

tomchristie mentioned this issue Sep 15, 2019

Blockers for 1.0 release. #137

Closed

12 tasks

florimondmanca added this to the v1.0 milestone Sep 15, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Client middleware API #295

Client middleware API #295

florimondmanca commented Aug 29, 2019 •

edited

florimondmanca commented Aug 29, 2019

florimondmanca commented Sep 1, 2019

sethmlarson commented Sep 1, 2019

florimondmanca commented Sep 1, 2019

sethmlarson commented Sep 1, 2019

florimondmanca commented Sep 1, 2019 •

edited

yeraydiazdiaz commented Sep 2, 2019

florimondmanca commented Sep 2, 2019

Client middleware API #295

Client middleware API #295

Comments

florimondmanca commented Aug 29, 2019 • edited

Problem statement

Existing implementation

Proposed solutions

#176 : process_request / process_response

#267 : building on top of the dispatcher API

#268: callable middleware

florimondmanca commented Aug 29, 2019

FAQ

florimondmanca commented Sep 1, 2019

sethmlarson commented Sep 1, 2019

florimondmanca commented Sep 1, 2019

sethmlarson commented Sep 1, 2019

florimondmanca commented Sep 1, 2019 • edited

yeraydiazdiaz commented Sep 2, 2019

florimondmanca commented Sep 2, 2019

florimondmanca commented Aug 29, 2019 •

edited

florimondmanca commented Sep 1, 2019 •

edited