
Middleware API #1134

Closed
wants to merge 2 commits into from

Conversation

florimondmanca
Member

@florimondmanca florimondmanca commented Aug 5, 2020

And here I am pitching the idea of a middleware API again… :-)

Definitely not for 1.0, and not pressing at all, but I wanted to give this a new shot to see what this looks like now that we're closer to 1.0 (compared to e.g. #345).


As I mentioned in #984 (comment), and more recently in #1110 (comment), there is a variety of use cases that don't fall into the "make a custom transport" domain, nor the "subclass the client without using private API" domain. In general these are transport-agnostic features that intercept the request/response cycle, for example...

  • Caching: look up the request key in a cache, and optionally return early with a response w/o hitting the transport.
  • Throttling: defer the sending of the request until a throttle policy would be satisfied.
  • HSTS (?): modify the request to use HTTPS if the requested website is on the HSTS Preload List.
  • Retries: re-send a request until it succeeds, or fails too many times.

After playing with ideas, this PR is all we would need to support a middleware API that would support the above use cases and, I believe, many more.

Core ideas are…

  • Requests are passed through a stack of middleware. The processing stack is:

    # Process request...
    middleware_1
        ... middleware_n
            # The client's own hidden middleware...
            # Ultimately we dispatch to the transport.
            send_handling_redirects()
        # Process the response...
        ... middleware_n
    middleware_1
  • The API is inspired by Starlette's BaseHTTPMiddleware: a base class with "dispatch hooks" that are given the Request and a call_next function for calling into the next middleware in the stack.
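The stack-building itself can be sketched with plain callables — no httpx dependency, and every name below (`build_stack`, `tracer`, the `Request`/`Response` stand-ins) is illustrative rather than proposed API:

```python
# Minimal sketch of the "call_next" stack with plain callables.
# Request/Response are string stand-ins, not httpx types.
from typing import Callable, List

Request = str
Response = str
Handler = Callable[[Request], Response]
Middleware = Callable[[Request, Handler], Response]


def build_stack(middleware: List[Middleware], send: Handler) -> Handler:
    # Wrap the innermost handler (the transport dispatch) with each
    # middleware, innermost-out, so middleware[0] sees the request
    # first and the response last.
    handler = send
    for m in reversed(middleware):
        def wrapped(request: Request, m: Middleware = m, call_next: Handler = handler) -> Response:
            return m(request, call_next)
        handler = wrapped
    return handler


def tracer(name: str) -> Middleware:
    # A toy middleware that tags the request on the way in and the
    # response on the way out, to make the ordering visible.
    def middleware(request: Request, call_next: Handler) -> Response:
        return call_next(f"{request} >{name}") + f" <{name}"
    return middleware


stack = build_stack([tracer("m1"), tracer("m2")], send=lambda r: f"{r} |send|")
print(stack("req"))  # req >m1 >m2 |send| <m2 <m1
```

The printed trace matches the processing-stack diagram above: the first middleware in the list touches the request first and the response last.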

For example, a ThrottleMiddleware could be implemented as follows:

import time
from typing import Callable, List, Iterator

import anyio

import httpx


class BaseThrottleMiddleware:
    def __init__(self, throttle: str) -> None:
        self._history: List[float] = []

        # Parse the throttle, which should be a string, like '100/minute'.
        count, _, duration = throttle.partition("/")
        self._max_in_history = int(count)
        self._cutoff = {"second": 1.0, "minute": 60.0, "hour": 3600.0}[duration]

    def _iter_throttled(self) -> Iterator[None]:
        now = time.time()

        while len(self._history) >= self._max_in_history:
            expiry = now - self._cutoff

            # Expire old entries in the history.
            self._history = [
                timestamp for timestamp in self._history if timestamp > expiry
            ]

            # Sleep for a bit if we've exceeded the throttle rate.
            if len(self._history) >= self._max_in_history:
                yield
                now = time.time()

        self._history.append(now)


class ThrottleMiddleware(BaseThrottleMiddleware, httpx.Middleware):
    """
    An HTTPX middleware that adds some basic rate-throttling functionality.

    client = httpx.Client(middleware=[ThrottleMiddleware('100/minute')])
    """

    def send(self, request: httpx.Request, call_next: Callable) -> httpx.Response:
        for _ in self._iter_throttled():
            time.sleep(0.1)
        return call_next(request)


class AsyncThrottleMiddleware(BaseThrottleMiddleware, httpx.AsyncMiddleware):
    """
    An HTTPX middleware that adds some basic rate-throttling functionality.

    client = httpx.AsyncClient(middleware=[AsyncThrottleMiddleware('100/minute')])
    """

    async def asend(
        self, request: httpx.Request, call_next: Callable
    ) -> httpx.Response:
        for _ in self._iter_throttled():
            await anyio.sleep(0.1)
        return await call_next(request)


async def main() -> None:
    from starlette.applications import Starlette
    from starlette.responses import PlainTextResponse
    from starlette.routing import Route

    async def home(request):  # type: ignore
        return PlainTextResponse("OK")

    app = Starlette(routes=[Route("/", home)])

    async with httpx.AsyncClient(
        app=app, middleware=[AsyncThrottleMiddleware("1/second")]
    ) as client:
        await client.get("http://testserver/")  # Not throttled
        await client.get("http://testserver/")  # Throttled, waits for about 1s.


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Likewise, a caching middleware could look like this:

import time
from typing import Callable, Dict, Tuple, Optional

import httpx


class CacheMiddleware(httpx.Middleware, httpx.AsyncMiddleware):
    """
    An HTTPX middleware that caches responses for a fixed amount of time.

    client = httpx.Client(middleware=[CacheMiddleware(ttl=3600)])
    """

    def __init__(self, ttl: float) -> None:
        self._cache = MemoryCache(ttl=ttl)

    def send(self, request: httpx.Request, call_next: Callable) -> httpx.Response:
        response = self._cache.get(request)
        if response is not None:
            return response
        response = call_next(request)
        self._cache.set(request, response)
        return response

    async def asend(
        self, request: httpx.Request, call_next: Callable
    ) -> httpx.Response:
        response = self._cache.get(request)
        if response is not None:
            return response
        response = await call_next(request)
        self._cache.set(request, response)
        return response


RequestKey = Tuple[str, httpx.URL, Tuple[Tuple[bytes, bytes], ...]]


class MemoryCache:
    def __init__(self, ttl: float) -> None:
        self._ttl = ttl
        self._data: Dict[RequestKey, Tuple[httpx.Response, float]] = {}

    def _build_key(self, request: httpx.Request) -> RequestKey:
        return (request.method, request.url, tuple(request.headers.raw))

    def get(self, request: httpx.Request) -> Optional[httpx.Response]:
        now = time.time()
        key = self._build_key(request)

        if key not in self._data:
            return None

        response, expiry_date = self._data[key]

        if now > expiry_date:
            del self._data[key]
            return None

        return response

    def set(self, request: httpx.Request, response: httpx.Response) -> None:
        now = time.time()
        expiry_date = now + self._ttl
        key = self._build_key(request)
        self._data[key] = (response, expiry_date)


async def main() -> None:
    from starlette.applications import Starlette
    from starlette.responses import PlainTextResponse
    from starlette.routing import Route

    async def home(request):  # type: ignore
        return PlainTextResponse("OK")

    app = Starlette(routes=[Route("/", home)])

    async with httpx.AsyncClient(
        app=app, middleware=[CacheMiddleware(ttl=60)]
    ) as client:
        cached_response = await client.get("http://testserver/")
        response = await client.get("http://testserver/")
        assert response is cached_response
        other_response = await client.get("http://testserver/", params={"foo": "bar"})
        assert other_response is not cached_response


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Lastly, here's hstspreload back in the game:

from typing import Callable

import hstspreload
import httpx


class HSTSMiddleware(httpx.Middleware, httpx.AsyncMiddleware):
    """
    An HTTPX middleware that enforces HTTPS on websites that are on the
    Chromium HSTS Preload list, mimicking the behavior of web browsers.

    client = httpx.Client(middleware=[HSTSMiddleware()])
    """

    def _get_url(self, url: httpx.URL) -> httpx.URL:
        if (
            url.scheme == "http"
            and hstspreload.in_hsts_preload(url.host)
            and len(url.host.split(".")) > 1
        ):
            port = None if url.port == 80 else url.port
            url = url.copy_with(scheme="https", port=port)

        return url

    def send(self, request: httpx.Request, call_next: Callable) -> httpx.Response:
        request.url = self._get_url(request.url)
        return call_next(request)

    async def asend(
        self, request: httpx.Request, call_next: Callable
    ) -> httpx.Response:
        request.url = self._get_url(request.url)
        return await call_next(request)


async def main() -> None:
    async with httpx.AsyncClient(middleware=[HSTSMiddleware()]) as client:
        response = await client.get("http://paypal.org")
        assert response.request.url.scheme == "https"


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())

Things I'm not sure about:

  • Right now we pass middleware instances on client init: middleware=[SomeMiddleware(arg=1, ...)]. I think this is okay. (We don't need an equivalent of Starlette's Middleware wrapping helper, since HTTPX middleware aren't given any "parent app" on init.)
  • Not super pleased with the naming of .send()/.asend().
    • One thing I'm pretty convinced of though is that we need the same property as for HTTPCore byte streams, i.e. "allow implementing both sync and async on the same class", so the method names must be different.
  • Not sure if I need to do for m in middleware or for m in reversed(middleware).
  • We're building the call_next function on each request, which could be a performance burden. OTOH Starlette is able to define .call_next() statically as a method of BaseHTTPMiddleware, since the middleware stack is built once and for all on app init. But HTTPX gets parameters that may differ on each request (auth, allow_redirects), so it seems hard to do differently. Not impossible, I guess - needs some more thinking.
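One possible direction for that last point, again sketched with plain callables (none of this is settled API): build the chain once with `functools.partial` at client init, and carry the per-request parameters on the request object itself rather than in per-request closures.

```python
import functools
from typing import Any, Callable, Dict, List

# String stand-ins; a real implementation would use httpx.Request/Response.
Handler = Callable[[Dict[str, Any]], str]
Middleware = Callable[..., str]


def make_chain(middleware: List[Middleware], send: Handler) -> Handler:
    # Built once (e.g. at client init); no per-request closure building.
    handler = send
    for m in reversed(middleware):
        handler = functools.partial(m, call_next=handler)
    return handler


def auth_middleware(request: Dict[str, Any], call_next: Handler) -> str:
    # Per-request options (auth, allow_redirects, ...) travel on the
    # request mapping rather than being captured when the chain is built.
    if "auth" in request:
        request.setdefault("headers", {})["authorization"] = request["auth"]
    return call_next(request)


chain = make_chain([auth_middleware], send=lambda r: f"sent: {r.get('headers')}")
print(chain({"auth": "token-abc"}))  # sent: {'authorization': 'token-abc'}
```

The trade-off is that anything request-scoped must live on the request (or a context object passed alongside it), which is a bigger change than it looks for the current Client internals.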

TODO:

  • Validate this idea for a multi-request middleware, such as a RetryMiddleware.
  • Needs tests.
  • Needs docs.
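On the first TODO item, here's a rough sketch of what a multi-request RetryMiddleware could look like in the shape proposed here — `.send()` receives the request plus `call_next` and may invoke `call_next` several times. It's written against plain callables so it runs standalone; the `httpx.Middleware` base, the exception policy, and the backoff (none here) are all assumptions, not settled API:

```python
from typing import Callable, Optional


class RetryMiddleware:
    # Sketch only: in the proposed API this would subclass
    # httpx.Middleware and receive httpx.Request / return httpx.Response.
    def __init__(self, max_attempts: int = 3) -> None:
        self._max_attempts = max_attempts

    def send(self, request: str, call_next: Callable[[str], str]) -> str:
        last_exc: Optional[Exception] = None
        for _ in range(self._max_attempts):
            try:
                return call_next(request)
            except ConnectionError as exc:  # stand-in for transport errors
                last_exc = exc
        assert last_exc is not None
        raise last_exc


# Demo: a "transport" that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_send(request: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("connection reset")
    return "ok"

print(RetryMiddleware(max_attempts=3).send("req", flaky_send))  # ok
```

The interesting validation question is whether `call_next` stays safe to invoke repeatedly once request bodies are streams, since a consumed stream can't be replayed without buffering.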

@florimondmanca florimondmanca marked this pull request as draft August 5, 2020 21:49
@florimondmanca florimondmanca mentioned this pull request Aug 5, 2020
@StephenBrown2 StephenBrown2 mentioned this pull request Aug 7, 2020
@johnanthonyowens

Another use case for middleware is tracing. We're using Datadog APM tracing in our services, and in order to get trace spans for all HTTP requests I'm currently monkeypatching send_single_request() to wrap a trace context manager around the request. This wouldn't be possible with the proposal as I understand it, because the middleware wouldn't be able to observe redirects. One could argue that the right way to handle this is by wrapping httpcore in a custom transport that adds the tracing and then supplying that to my HTTPX clients - fair enough, I just haven't tried that yet to see if it works out in practice, and the monkeypatching approach is pretty trivial and wasn't explicitly unsupported until the 0.14 release added the underscore. 😀

@tomchristie
Member

@johnanthonyowens It might be worth opening a separate issue to discuss that in more detail. Things that'd be useful reference points here would be...

  • What exactly were you tracking previously, and how?
  • How does Datadog handle this with requests, or how would you be handling it if you were working against the requests API?
  • Would request, redirect, and response event hooks be sufficient here for your use case?

@ionelmc

ionelmc commented Sep 2, 2020

Is there anything I can use right now to implement a response cache?

@florimondmanca
Member Author

@ionelmc Most likely a Client / AsyncClient subclass that overrides .send()?

@tomchristie
Member

@ionelmc A sensible first thing to do with any question like that is to start with "how would I do this with requests" - have a look around in their ecosystem, and see if there's any implementations that do the same thing there, and then think about which part of the API it's plugged into.

The main override points for httpx for stuff like that are either:

  • As @florimondmanca says, override .send() on the client instance, to wrap up some additional behaviour.
  • Create a custom transport implementation that wraps up some additional behaviour at that layer, calling into the connection pool as needed.

@florimondmanca
Member Author

Just described a lighter form of this "middleware API" idea in the form of "interceptors", here… #790 (comment)

It's basically the same as the API proposed in this draft PR, except it's callable-based (sync functions for Client, async functions for AsyncClient). We get the drawback of having the sync/async schism exposed to developers and users (two kinds of everything), but it's also more lightweight in that there's no requirement to deal with classes.

@florimondmanca
Member Author

florimondmanca commented Nov 20, 2020

Okay, going to close this off again. For "middleware" that just wraps a request between the client and the final transport, it's already perfectly doable using the transport API, even though the instantiation pattern is a bit quirky for now (though not terrible).

from typing import Union

import hstspreload
import httpcore
import httpx


class HSTSTransport(httpcore.SyncHTTPTransport, httpcore.AsyncHTTPTransport):
    """
    A transport wrapper that enforces HTTPS on websites that are on the
    Chromium HSTS Preload list, mimicking the behavior of web browsers.
    """
    def __init__(self, transport: Union[httpcore.SyncHTTPTransport, httpcore.AsyncHTTPTransport]) -> None:
        self._transport = transport

    def _maybe_https_url(self, url: tuple) -> tuple:
        scheme, host, port, path = url

        if (
            scheme == b"http"
            and hstspreload.in_hsts_preload(host.decode())
            and len(host.decode().split(".")) > 1
        ):
            port = None if port == 80 else port
            return b"https", host, port, path

        return url

    def request(self, method, url, headers, stream, ext):
        url = self._maybe_https_url(url)
        return self._transport.request(method, url, headers, stream, ext)

    async def arequest(self, method, url, headers, stream, ext):
        url = self._maybe_https_url(url)
        return await self._transport.arequest(method, url, headers, stream, ext)


transport = httpx.HTTPTransport()  # Soon
transport = HSTSTransport(transport)

with httpx.Client(transport=transport):
    ...

@florimondmanca florimondmanca deleted the middleware branch November 20, 2020 20:34
@johtso
Contributor

johtso commented Nov 21, 2020

@ionelmc also, regarding response caching.. this should be usable https://github.com/johtso/httpx-caching
