
Middleware API #1134

Closed
wants to merge 2 commits into from

Conversation

florimondmanca
Member

@florimondmanca florimondmanca commented Aug 5, 2020

And here I am pitching the idea of a middleware API again… :-)

Definitely not for 1.0, and not pressing at all, but I wanted to give this a new shot to see what this looks like now that we're closer to 1.0 (compared to e.g. #345).


As I mentioned in #984 (comment), and more recently in #1110 (comment), there is a variety of use cases that don't fall into the "make a custom transport" domain, nor the "subclass the client without using private API" domain. In general these are transport-agnostic features that intercept the request/response cycle, for example...

  • Caching: look up the request key in a cache, and optionally return early with a response w/o hitting the transport.
  • Throttling: defer the sending of the request until a throttle policy would be satisfied.
  • HSTS (?): modify the request to use HTTPS if the requested website is on the HSTS Preload List.
  • Retries: re-send a request until it succeeds, or fails too many times.

After playing with ideas, this PR is all we would need to support a middleware API that would support the above use cases and, I believe, many more.

Core ideas are…

  • Requests are passed through a stack of middleware. The processing stack is:

    # Process request...
    middleware_1
        ... middleware_n
            # The client's own hidden middleware...
            # Ultimately we dispatch to the transport.
            send_handling_redirects()
        # Process the response...
        ... middleware_n
    middleware_1
  • The API is inspired by Starlette's BaseHTTPMiddleware: a base class with "dispatch hooks" that are given the Request and a call_next function for calling into the next middleware in the stack.
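The stack-building itself can be sketched with plain callables — no httpx dependency, and every name below (`build_stack`, `tracer`, the `Request`/`Response` stand-ins) is illustrative rather than proposed API:

```python
# Minimal sketch of the "call_next" stack with plain callables.
# Request/Response are string stand-ins, not httpx types.
from typing import Callable, List

Request = str
Response = str
Handler = Callable[[Request], Response]
Middleware = Callable[[Request, Handler], Response]


def build_stack(middleware: List[Middleware], send: Handler) -> Handler:
    # Wrap the innermost handler (the transport dispatch) with each
    # middleware, innermost-out, so middleware[0] sees the request
    # first and the response last.
    handler = send
    for m in reversed(middleware):
        def wrapped(request: Request, m: Middleware = m, call_next: Handler = handler) -> Response:
            return m(request, call_next)
        handler = wrapped
    return handler


def tracer(name: str) -> Middleware:
    # A toy middleware that tags the request on the way in and the
    # response on the way out, to make the ordering visible.
    def middleware(request: Request, call_next: Handler) -> Response:
        return call_next(f"{request} >{name}") + f" <{name}"
    return middleware


stack = build_stack([tracer("m1"), tracer("m2")], send=lambda r: f"{r} |send|")
print(stack("req"))  # req >m1 >m2 |send| <m2 <m1
```

The printed trace matches the processing-stack diagram above: the first middleware in the list touches the request first and the response last.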

For example, a ThrottleMiddleware could be implemented as follows:

import time
from typing import Callable, List, Iterator

import anyio

import httpx


class BaseThrottleMiddleware:
    def __init__(self, throttle: str) -> None:
        self._history: List[float] = []

        # Parse the throttle, which should be a string, like '100/minute'.
        count, _, duration = throttle.partition("/")
        self._max_in_history = int(count)
        self._cutoff = {"second": 1.0, "minute": 60.0, "hour": 3600.0}[duration]

    def _iter_throttled(self) -> Iterator[None]:
        now = time.time()

        while len(self._history) >= self._max_in_history:
            expiry = now - self._cutoff

            # Expire old entries in the history.
            self._history = [
                timestamp for timestamp in self._history if timestamp > expiry
            ]

            # Sleep for a bit if we've exceeded the throttle rate.
            if len(self._history) >= self._max_in_history:
                yield
                now = time.time()

        self._history.append(now)


class ThrottleMiddleware(BaseThrottleMiddleware, httpx.Middleware):
    """
    An HTTPX middleware that adds some basic rate-throttling functionality.

    client = httpx.Client(middleware=[ThrottleMiddleware('100/minute')])
    """

    def send(self, request: httpx.Request, call_next: Callable) -> httpx.Response:
        for _ in self._iter_throttled():
            time.sleep(0.1)
        return call_next(request)


class AsyncThrottleMiddleware(BaseThrottleMiddleware, httpx.AsyncMiddleware):
    """
    An HTTPX middleware that adds some basic rate-throttling functionality.

    client = httpx.AsyncClient(middleware=[AsyncThrottleMiddleware('100/minute')])
    """

    async def asend(
        self, request: httpx.Request, call_next: Callable
    ) -> httpx.Response:
        for _ in self._iter_throttled():
            await anyio.sleep(0.1)
        return await call_next(request)


async def main() -> None:
    from starlette.applications import Starlette
    from starlette.responses import PlainTextResponse
    from starlette.routing import Route

    async def home(request):  # type: ignore
        return PlainTextResponse("OK")

    app = Starlette(routes=[Route("/", home)])

    async with httpx.AsyncClient(
        app=app, middleware=[AsyncThrottleMiddleware("1/second")]
    ) as client:
        await client.get("http://testserver/")  # Not throttled
        await client.get("http://testserver/")  # Throttled, waits for about 1s.


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Likewise, a caching middleware could look like this:

import time
from typing import Callable, Dict, Tuple, Optional

import httpx


class CacheMiddleware(httpx.Middleware, httpx.AsyncMiddleware):
    """
    An HTTPX middleware that caches responses for a fixed amount of time.

    client = httpx.Client(middleware=[CacheMiddleware(ttl=3600)])
    """

    def __init__(self, ttl: float) -> None:
        self._cache = MemoryCache(ttl=ttl)

    def send(self, request: httpx.Request, call_next: Callable) -> httpx.Response:
        response = self._cache.get(request)
        if response is not None:
            return response
        response = call_next(request)
        self._cache.set(request, response)
        return response

    async def asend(
        self, request: httpx.Request, call_next: Callable
    ) -> httpx.Response:
        response = self._cache.get(request)
        if response is not None:
            return response
        response = await call_next(request)
        self._cache.set(request, response)
        return response


RequestKey = Tuple[str, httpx.URL, Tuple[Tuple[bytes, bytes], ...]]


class MemoryCache:
    def __init__(self, ttl: float) -> None:
        self._ttl = ttl
        self._data: Dict[RequestKey, Tuple[httpx.Response, float]] = {}

    def _build_key(self, request: httpx.Request) -> RequestKey:
        return (request.method, request.url, tuple(request.headers.raw))

    def get(self, request: httpx.Request) -> Optional[httpx.Response]:
        now = time.time()
        key = self._build_key(request)

        if key not in self._data:
            return None

        response, expiry_date = self._data[key]

        if now > expiry_date:
            del self._data[key]
            return None

        return response

    def set(self, request: httpx.Request, response: httpx.Response) -> None:
        now = time.time()
        expiry_date = now + self._ttl
        key = self._build_key(request)
        self._data[key] = (response, expiry_date)


async def main() -> None:
    from starlette.applications import Starlette
    from starlette.responses import PlainTextResponse
    from starlette.routing import Route

    async def home(request):  # type: ignore
        return PlainTextResponse("OK")

    app = Starlette(routes=[Route("/", home)])

    async with httpx.AsyncClient(
        app=app, middleware=[CacheMiddleware(ttl=60)]
    ) as client:
        cached_response = await client.get("http://testserver/")
        response = await client.get("http://testserver/")
        assert response is cached_response
        other_response = await client.get("http://testserver/", params={"foo": "bar"})
        assert other_response is not cached_response


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Lastly, here's hstspreload back in the game:

from typing import Callable

import hstspreload
import httpx


class HSTSMiddleware(httpx.Middleware, httpx.AsyncMiddleware):
    """
    An HTTPX middleware that enforces HTTPS on websites that are on the
    Chromium HSTS Preload list, mimicking the behavior of web browsers.

    client = httpx.Client(middleware=[HSTSMiddleware()])
    """

    def _get_url(self, url: httpx.URL) -> httpx.URL:
        if (
            url.scheme == "http"
            and hstspreload.in_hsts_preload(url.host)
            and len(url.host.split(".")) > 1
        ):
            port = None if url.port == 80 else url.port
            url = url.copy_with(scheme="https", port=port)

        return url

    def send(self, request: httpx.Request, call_next: Callable) -> httpx.Response:
        request.url = self._get_url(request.url)
        return call_next(request)

    async def asend(
        self, request: httpx.Request, call_next: Callable
    ) -> httpx.Response:
        request.url = self._get_url(request.url)
        return await call_next(request)


async def main() -> None:
    async with httpx.AsyncClient(middleware=[HSTSMiddleware()]) as client:
        response = await client.get("http://paypal.org")
        assert response.request.url.scheme == "https"


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Things I'm not sure about:

  • Right now we pass middleware instances on client init: middleware=[SomeMiddleware(arg=1, ...)]. I think this is okay. (We don't need an equivalent of Starlette's Middleware wrapping helper, since HTTPX middleware aren't given any "parent app" on init.)
  • Not super pleased with the naming of .send()/.asend().
    • One thing I'm pretty convinced of though is that we need the same property as for HTTPCore byte streams, i.e. "allow implementing both sync and async on the same class", so the method names must be different.
  • Not sure if I need to do for m in middleware or for m in reversed(middleware).
  • We're building the call_next function on each request, which could be a performance burden. OTOH Starlette is able to define .call_next() statically as a method of BaseHTTPMiddleware, since the middleware stack is built once and for all on app init. But HTTPX gets parameters that may differ on each request (auth, allow_redirects), so it seems hard to do differently. Not impossible, I guess - needs some more thinking.
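One possible direction for that last point, again sketched with plain callables (none of this is settled API): build the chain once with `functools.partial` at client init, and carry the per-request parameters on the request object itself rather than in per-request closures.

```python
import functools
from typing import Any, Callable, Dict, List

# String stand-ins; a real implementation would use httpx.Request/Response.
Handler = Callable[[Dict[str, Any]], str]
Middleware = Callable[..., str]


def make_chain(middleware: List[Middleware], send: Handler) -> Handler:
    # Built once (e.g. at client init); no per-request closure building.
    handler = send
    for m in reversed(middleware):
        handler = functools.partial(m, call_next=handler)
    return handler


def auth_middleware(request: Dict[str, Any], call_next: Handler) -> str:
    # Per-request options (auth, allow_redirects, ...) travel on the
    # request mapping rather than being captured when the chain is built.
    if "auth" in request:
        request.setdefault("headers", {})["authorization"] = request["auth"]
    return call_next(request)


chain = make_chain([auth_middleware], send=lambda r: f"sent: {r.get('headers')}")
print(chain({"auth": "token-abc"}))  # sent: {'authorization': 'token-abc'}
```

The trade-off is that anything request-scoped must live on the request (or a context object passed alongside it), which is a bigger change than it looks for the current Client internals.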

TODO:

  • Validate this idea for a multi-request middleware, such as a RetryMiddleware.
  • Needs tests.
  • Needs docs.
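On the first TODO item, here's a rough sketch of what a multi-request RetryMiddleware could look like in the shape proposed here — `.send()` receives the request plus `call_next` and may invoke `call_next` several times. It's written against plain callables so it runs standalone; the `httpx.Middleware` base, the exception policy, and the backoff (none here) are all assumptions, not settled API:

```python
from typing import Callable, Optional


class RetryMiddleware:
    # Sketch only: in the proposed API this would subclass
    # httpx.Middleware and receive httpx.Request / return httpx.Response.
    def __init__(self, max_attempts: int = 3) -> None:
        self._max_attempts = max_attempts

    def send(self, request: str, call_next: Callable[[str], str]) -> str:
        last_exc: Optional[Exception] = None
        for _ in range(self._max_attempts):
            try:
                return call_next(request)
            except ConnectionError as exc:  # stand-in for transport errors
                last_exc = exc
        assert last_exc is not None
        raise last_exc


# Demo: a "transport" that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_send(request: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("connection reset")
    return "ok"

print(RetryMiddleware(max_attempts=3).send("req", flaky_send))  # ok
```

The interesting validation question is whether `call_next` stays safe to invoke repeatedly once request bodies are streams, since a consumed stream can't be replayed without buffering.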

@florimondmanca florimondmanca marked this pull request as draft August 5, 2020 21:49
@florimondmanca florimondmanca mentioned this pull request Aug 5, 2020
@StephenBrown2 StephenBrown2 mentioned this pull request Aug 7, 2020
@johnanthonyowens

Another use case for middleware is tracing. We're using Datadog APM tracing in our services, and in order to get trace spans for all HTTP requests I'm currently monkeypatching send_single_request() to wrap a trace context manager around the request. This wouldn't be possible with the proposal as I understand it, because the middleware wouldn't be able to observe redirects. One could argue that the right way to handle this is by wrapping httpcore in a custom transport that adds the tracing and then supplying that to my HTTPX clients - fair enough, I just haven't tried that yet to see if it works out in practice, and the monkeypatching approach is pretty trivial and wasn't explicitly unsupported until the 0.14 release added the underscore. 😀

@tomchristie
Member

@johnanthonyowens It might be worth opening a separate issue to discuss that in more detail. Things that'd be useful reference points here would be...

  • What exactly were you tracking previously, and how?
  • How does Datadog handle this with requests, or how would you be handling it if you were working against the requests API?
  • Would request, redirect, and response event hooks be sufficient here for your use case?

@ionelmc

ionelmc commented Sep 2, 2020

Is there anything I can use right now to implement a response cache?

@florimondmanca
Member Author

@ionelmc Most likely a Client / AsyncClient subclass that overrides .send()?

@tomchristie
Member

@ionelmc A sensible first thing to do with any question like that is to start with "how would I do this with requests" - have a look around in their ecosystem, and see if there's any implementations that do the same thing there, and then think about which part of the API it's plugged into.

The main override points for httpx for stuff like that are either:

  • As @florimondmanca says, override .send() on the client instance, to wrap up some additional behaviour.
  • Create a custom transport implementation that wraps up some additional behaviour at that layer, calling into the connection pool as needed.

@florimondmanca
Member Author

Just described a lighter form of this "middleware API" idea in the form of "interceptors", here… #790 (comment)

It's basically the same as the API proposed in this draft PR, except it's callable-based (sync functions for Client, async functions for AsyncClient). We get the drawback of having the sync/async schism exposed to developers and users (two kinds of everything), but it's also more lightweight in that there's no requirement to deal with classes.

@florimondmanca
Member Author

florimondmanca commented Nov 20, 2020

Okay, going to close this off again. For "middleware" that just wraps a request between the client and the final transport, it's already perfectly doable using the transport API, even though the instantiation pattern is a bit quirky for now (though not terrible).

from typing import Union

import hstspreload
import httpcore
import httpx


class HSTSTransport(httpcore.SyncHTTPTransport, httpcore.AsyncHTTPTransport):
    """
    A transport wrapper that enforces HTTPS on websites that are on the
    Chromium HSTS Preload list, mimicking the behavior of web browsers.
    """
    def __init__(self, transport: Union[httpcore.SyncHTTPTransport, httpcore.AsyncHTTPTransport]) -> None:
        self._transport = transport

    def _maybe_https_url(self, url: tuple) -> tuple:
        scheme, host, port, path = url

        if (
            scheme == b"http"
            and hstspreload.in_hsts_preload(host.decode())
            and len(host.decode().split(".")) > 1
        ):
            port = None if port == 80 else port
            return b"https", host, port, path

        return url

    def request(self, method, url, headers, stream, ext):
        url = self._maybe_https_url(url)
        return self._transport.request(method, url, headers, stream, ext)

    async def arequest(self, method, url, headers, stream, ext):
        url = self._maybe_https_url(url)
        return await self._transport.arequest(method, url, headers, stream, ext)


transport = httpx.HTTPTransport()  # Soon
transport = HSTSTransport(transport)

with httpx.Client(transport=transport):
    ...

@florimondmanca florimondmanca deleted the middleware branch November 20, 2020 20:34
@johtso
Contributor

johtso commented Nov 21, 2020

@ionelmc also, regarding response caching.. this should be usable https://github.com/johtso/httpx-caching
