Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client middleware API #295

Closed
florimondmanca opened this issue Aug 29, 2019 · 8 comments · Fixed by #268
Closed

Client middleware API #295

florimondmanca opened this issue Aug 29, 2019 · 8 comments · Fixed by #268
Milestone

Comments

@florimondmanca
Copy link
Member

florimondmanca commented Aug 29, 2019

There's been quite a few PRs trying to take a stab at making the Client/AsyncClient extensible, mostly with a middleware approach whose idea seems to have originated in #136 (comment) to help solve Digest Auth.

As mentioned in #268 (comment) it might be useful to define and discuss in which direction we'd like to go.

Problem statement

A few requirements we may want to satisfy, gathered from looking at the various discussions:

  1. Support the simplest case: modify the request (before getting a response from the underlying middleware).
  2. Support middleware that may return an early response (e.g. a cache middleware), i.e. allow to skip calling the underlying middleware.
  3. Support middleware that need to make subsequent requests after getting the initial response (e.g. Redirect, Digest Auth).
  4. Allow users to write their own middleware, e.g. for particular authentication schemes.
  5. Allow users to assemble a middleware pipeline (with what level of control?).

None of the approaches attempt to solve 4) nor 5) — and that can definitely be dealt with in its own issue once this one is resolved. As a result, the scope of this issue is designing an internal API to solve 1), 2) and 3).

Existing implementation

A reminder of the features that HTTPX currently supports that may benefit from being refactored as as middleware:

  • Basic authentication: add an Authorization header based on a username and password.
  • "Custom authentication": process the request via a req -> req callable. (I personally think this might be too lose — we could just require the callable to return an Authorization header — but let's tighten the scope of this issue and keep things as they are.)
  • Redirect: automatically handle redirect responses by making request to the specified location.

These are currently implemented as part of the BaseClient itself. Them cluttering the BaseClient is, as far as I understand, the main reason for refactoring them into loosely-coupled middleware.

Proposed solutions

Here's a summary of the various APIs that were proposed up to now and their respective "no-op middleware" for the sake of example:

#176 : process_request / process_response

class Middleware:
    async def process_request(self, request: AsyncRequest) -> AsyncRequest:
        return request

    async def process_response(
        self, request: AsyncRequest, response: AsyncResponse
    ) -> AsyncResponse:
        return response

#267 : building on top of the dispatcher API

from httpx.base import AsyncDispatcher

class Middleware(AsyncDispatcher):
    def __init__(self, next_dispatcher, **kwargs):
        self.next_dispatcher = next_dispatcher

    async def send(
        self,
        request: AsyncRequest,
        verify: VerifyTypes = None,
        cert: CertTypes = None,
        timeout: TimeoutTypes = None,
    ) -> AsyncResponse:
        return await self.next_dispatcher.send(request, verify=verify, cert=cert, timeout=timeout)

#268: callable middleware

class Middleware:
    async def __call__(self, request: AsyncRequest, get_response: typing.Callable) -> AsyncResponse:
        return await get_response(request)

I think the various PRs proved that all APIs tick 1), 2), and 3) as defined above, with various degrees of simplicity and boilerplate.

The callable middleware API seems to be the simplest one while offering enough flexibility. Making extra requests based on the initial response can be achieved by recursively calling the middleware, as in the RedirectMiddleware. As another PoC, a rough Digest-Auth implementation based on this snippet:

class DigestAuthMiddleware:
    async def __call__(self, request, get_response):
        response = await get_response(request)

        # Did the server respond with 401 + WWW-Authenticate?
        if self.requires_authentication(response):
            authorization = self.build_authorization_header(request, response)
            request.headers["Authorization"] = authorization
            return await self(request, get_response)

        return response

cc @tomchristie @yeraydiazdiaz @sethmlarson

@florimondmanca
Copy link
Member Author

Trying to answer a few other questions I saw in the various discussions we've had…

FAQ

What about the BaseAuth interface?

It seems the proposed middleware APIs encompass the current auth interface, which can be trivially implemented as a middleware that modifies the request before sending it. So we can definitely consider removing BaseAuth entirely.

Shouldn't redirect stay part of the client?

Maybe!

We already have an allow_redirects flag (inherited from Requests?) that allows users to turn redirection handling on and off.

(More precisely, this flag controls whether we automatically handle redirections. We still do some processing to store the history of responses as well as set response.next if redirections aren't allowed and we receive a redirect response.)

Besides, if we think of middleware as a stack, I think we want redirection to always be handled first — e.g. we don't want auth to fail because it actually needed to call www.service.com instead of service.com. So it's another argument for keeping it closer to the client.

However, the various PRs have shown that it is possible to implement redirection handling as a middleware. So to strike a balance, we can make it so that the redirect middleware will always be added first by the client, and be wrapped by any extra (and potentially user-defined) middleware such as authentication.

So we'll need to setup a middleware stack per request?

It seems so: Client.get(), Client.post() etc accept parameters that affect what middleware should be called and how they should behave — e.g. redirect_middleware and auth.

@florimondmanca
Copy link
Member Author

Pinging in here — anyone's got thoughts on this? Do we need the internal middleware at all? Is any of the proposed APIs potentially a good fit for a user-facing extensibility mechanism?

@sethmlarson
Copy link
Contributor

Thanks for putting this together! :) You and @yeraydiazdiaz have been killing it with taking ownership of the middleware implementation.

The callable middleware has my vote for simplicity and how close it resembles WSGI / ASGI. The calling itself recursively with a new request as a method of making a different request is smart too and solves the cache validation, digest auth, and redirect requirement.

There's still the question of "what's the right way to access the Client from within a middleware call?" because we're going to want some sort of session storage on the Client object for cookies, cache, remembering Alt-Svc headers. It'd be good if that session storage could be serializable to in-memory, filesystem, etc so that users can have their sessions persist easily (separate PR).

As far as ordering I thought briefly about how we could take a page out of pyramid's book and allow saying "insert this middleware before X" on the register stage and then allow class-based middlewares to specify a default stage to be inserted at so that users don't have to think about it. I'm thinking only a few like:

import httpx
import enum

class MiddlewareStage(enum.IntEnum):
    AUTH = 1
    CACHE = 2
    REDIRECT = 3

client = httpx.Client()
client.add_middleware(MyCustomMiddleware(), before=MiddlewareStage.AUTH)

As far as defining custom middlewares providing the type-hint for MiddlewareType should work along with potentially providing a base-class?

@florimondmanca
Copy link
Member Author

@sethmlarson Thanks for your input!

There's still the question of "what's the right way to access the Client from within a middleware call?"

At first I thought, "what's the problem with storing anything we need to keep track of directly in middleware?". But then I remembered that middleware are re-created on each request, so we can't really persist anything there.

But then, why should it be the middleware's responsibility to manage that kind of storage? I suppose the client can intercept responses that came back with cookies, and do its own thing to make sure those cookies are sent in subsequent requests to that host.

Personally, I don't feel very comfortable discussing this because I don't have a clear vision yet of how one of these use cases could work. I think tackling caching would give us some insight on how to handle a concept of "client session" and its associated storage.

Regarding middleware ordering — the enum idea is interesting and definitely worth digging into. Do you have any link to the Pyramid feature that gave you this idea? (I'm not very familiar with Pyramid.)

As far as defining custom middlewares providing the type-hint for MiddlewareType should work along with potentially providing a base-class?

#268 already provides a BaseMiddleware class. Sure, we can union it with a wisely-defined Callable type to create MiddlewareTypes — that would allow to support both class-based and function-based middleware as far as type checkers are concerned.

Anyway, are we OK with proceeding with and merging #268 then? I'll ask for your review there, please let me know there if the MiddlewareTypes is something we should consider adding. :-)

@sethmlarson
Copy link
Contributor

I'm good with #268 proceeding. :)

But then I remembered that middleware are re-created on each request, so we can't really persist anything there.

I see that the redirect and auth middlewares are being recreated per request but would custom middlewares be restricted by this?

@florimondmanca
Copy link
Member Author

florimondmanca commented Sep 1, 2019

I see that the redirect and auth middlewares are being recreated per request but would custom middlewares be restricted by this?

Currently, yes. I guess it depends on whether we make .add_middleware (or whatever it’s going to be) accept a middleware class (which can be instanciated on each request) or a middleware instance. For some applications like redirection, only per-request middleware make sense. Maybe we’ll end up introducing a notion of scope (I’m thinking of pytest fixtures here), to be defined on the middleware class?

@yeraydiazdiaz
Copy link
Contributor

Great job on this @florimondmanca!

I think #268 is a great first implementation of this, chances are we'll start noticing pain points as we start implementing more middlewares but at this point in the project it's something we need to go through.

I'll repurpose the Digest auth and look forward to more features using middlewares 🎉

@florimondmanca
Copy link
Member Author

Super @yeraydiazdiaz, I’m excited about Digest Auth!

@florimondmanca florimondmanca added this to the v1.0 milestone Sep 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants