Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context-managed client usage #422

Closed
florimondmanca opened this issue Oct 1, 2019 · 10 comments · Fixed by #487
Closed

Context-managed client usage #422

florimondmanca opened this issue Oct 1, 2019 · 10 comments · Fixed by #487

Comments

@florimondmanca
Copy link
Member

#389 has seen some interesting discussions on context-manager usage of Client and AsyncClient, esp. #389 (comment) and #389 (comment). Figured it would be worth porting the discussion to a dedicated issue.

Retransmission of the two comments mentioned above:

@tomchristie

Context managed client instantiation

I've not yet seen any concrete evidence that backgrounding is strictly neccessary for HTTP/2 / HTTP/3 (vs. the alternative of allowing any background work to occur during any other interactions) I'm open to it potentially being a requirement, but I've not seen that throughly demonstrated at this point in time.

A good point of reference here would be to see how browsers treat HTTP/2 connections to hosts with open tabs - Do they continually ping them in the background? Does the behavior change for inactive tabs?

It'd be a fairly reasonable policy to allow connections to lapse with ping timeouts in cases where we're not performing any network activity with the client at all.

Either ways around, something that I'd like to see addressed there is how the client instantitation pattern looks in a standard web app setup. To explain that it's easiest to just work from the basic threaded case, and a bog-standard WSGI app, and ask "how does client setup look if you want to use a context-managed client instantiation?"

Typically in a flask app or whatever, if you wanted to use a single requests session for all outgoing requests you can do...

client = requests.Session()  # or httpx.Client()

Somewhere in some global app config, and then just import and use client everywhere.

I'm not sure how the pattern would look instead if you want to use a context managed with requests.Session() as client: ... block? (or httpx.Client() block.)

It makes more sense to answer that for the standard client / WSGI case, since it's the simpler variant of the two, but still has the same requirement if you wanted strictly block managed backgrounding.

@sethmlarson

What are your thoughts on having the Client only allow HTTP/2 within async with blocks?
We're going to hit more issues with this when we try to implement HTTP/3 with retransmission alarms.

Let's continue discussion here?

@florimondmanca
Copy link
Member Author

florimondmanca commented Oct 1, 2019

@tomchristie I understand HostedAPI is going to help answer the questions you mentioned in your comment?

Here's my take on what a Flask setup would look like, though.

Long-lived client

I don't think this is possible with Flask — it doesn't seem to support app-wide teardown callbacks, but only per-request callbacks when the app context is torn down.

I'd argue a single long-lived client would lead to all sorts of issues related to network resources over time though, so not sure this is something users should even consider.

Per-request client

We could naively with httpx.Client() our way in each view, but considering we'll typically want to set up some defaults on the client, here's a more realistic setup for an imaginary app powered by the TMDb API:

# project/tmdb.py

import os
import typing

import httpx
from flask import Flask, g


TMDB_API_KEY = os.environ["TMDB_API_KEY"]


def get_tmdb() -> httpx.Client:
    if "tmdb" not in g:
        # Create a client.
        # Note that this hints at keeping `Client.__enter__()` in
        # its current state - not doing anything particular.
        # (This is similar to the `open()` built-in not requiring us to manually call `__enter__()`
        # to use it without the `with` statement.)
        g.tmdb = httpx.Client(
            base_url="https://api.themoviedb.org/3", headers={"x-api-key": TMDB_API_KEY}
        )
    return g.tmdb


def close_tmdb() -> None:
    # Ensure the client is closed if `get_tmdb()` was called.
    tmdb: httpx.Client = g.pop("tmdb", None)
    if tmdb is not None:
        tmdb.close()


def init_app(app: Flask) -> None:
    app.teardown_appcontext(close_tmdb)


# project/movies.py

from flask import Blueprint, jsonify, request
from .tmdb import get_tmdb

bp = Blueprint("movies", __name__, url_prefix="movies")


@bp.route("/search")
def search_movies():
    q = request.args.get("q")
    with get_tmdb() as tmdb:
        # If we made multiple requests to the TMDb API here,
        # and assuming TMDb supports HTTP/2,
        # then we'd be reusing the connection across requests.
        # Would it be desirable to share connections across views, anyway?
        r = tmdb.get("/search/tv", params={"query": q, "page": 1})
        r.raise_for_status()
        rows = r.json()
        shows = [{"id": row["id"], "title": row["name"]} for row in rows]
        return jsonify(shows)


# project/app.py

from flask import Flask


def create_app() -> Flask:
    from . import tmdb
    from . import movies

    app = Flask(__name__)
    tmdb.init_app(app)
    app.register_blueprint(movies.bp)

    return app


if __name__ == "__main__":
    app = create_app()
    app.run()

Scaling up

In bigger projects, users would probably want to factor API calls into some kind of service class acting as a wrapper around the Client:

# project/tmdb.py

import os
import typing

import httpx
from flask import Flask, g


TMDB_API_KEY = os.environ["TMDB_API_KEY"]


class TMDbService:
    def __init__(self, client: httpx.Client):
        self.client = client

    def search_movies(self, q: str) -> typing.List[dict]:
        r = self.client.get("/search/tv", params={"query": q, "page": 1})
        r.raise_for_status()
        rows = r.json()
        shows = [{"id": row["id"], "title": row["name"]} for row in rows]
        return shows

    def close(self) -> None:
        self.client.close()


def get_tmdb() -> TMDbService:
    if "tmdb" not in g:
        client = httpx.Client(
            base_url="https://api.themoviedb.org/3", headers={"x-api-key": TMDB_API_KEY}
        )
        g.tmdb = TMDbService(client=client)
    return g.tmdb


def close_tmdb(exc: typing.Optional[Exception]) -> None:
    tmdb: TMDbService = g.pop("tmdb", None)
    if tmdb is not None:
        tmdb.close()


def init_app(app: Flask) -> None:
    app.teardown_appcontext(close_tmdb)


# project/movies.py

from flask import Blueprint, jsonify, request
from .tmdb import get_tmdb

bp = Blueprint("movies", __name__, url_prefix="movies")


@bp.route("/search")
def search_movies():
    q = request.args.get("q")
    with get_tmdb() as tmdb:
        return jsonify(tmdb.search_movies(q=q))

Conclusion

I think my take-away from this is that in a WSGI web app context, context-managed usage with the with statement is not often possible.

In fact, it's clearer to me now that enforcing context-managed usage is not synonymous for enforcing with usage. Some frameworks allow to automatically execute cleanup code by registering callbacks, and that works well enough — as long as clients expose a .close() method, which they already do.

@tomchristie
Copy link
Member

I'd argue a single long-lived client would lead to all sorts of issues related to network resources over time though, so not sure this is something users should even consider.

You do want to share the connection pool across all outgoing requests, so a single Client instance is preferable.

I think my clear take home is that "yeah we want to support with-block-context-managed usage, but we don't want to hard enforce it, and should have an explicit .close() for other cases."

I don't think there's a significant enough difference between the HTTP/1.1 and HTTP/2 cases for us to say that we only support HTTP/2 in a with-block:

  • HTTP/1.1: "An open Client may have open connections on it, and those resources won't be freed unless you're using it nicely in a with-block, or explicitly calling .close()"
  • HTTP/2: "An open Client may have open connections on it, and can send pings on 'em and those resources won't be freed unless you're using it nicely in a with-block, or explicitly calling .close()"

Wrt. "structured concurrency" what that'd map to is "Prefer structured concurrency as the norm everywhere, but do allow for explicitly breaking out from that in some limited well-defined contexts."

It's actually totally OK to have a dangling thread of control that's just "send pings on any open connections, and time out after a determined period". It's just that should be an explicitly non-standard case vs. a standard construct for "please perform these two seperate bits of application logic concurrently".

@tomchristie
Copy link
Member

This thread python-trio/hip#125 from @njsmith is nicely timed. Pretty much in line with how I'd see it.

@sethmlarson
Copy link
Contributor

That thread couldn't be any better on the timing. I like the global pool with a system level watcher. :)

@florimondmanca
Copy link
Member Author

florimondmanca commented Oct 2, 2019

I think my clear take home is that "yeah we want to support with-block-context-managed usage, but we don't want to hard enforce it, and should have an explicit .close() for other cases."

To be clear, this is the way I now see it as well after the quick Flask experiment above. :) And that thread on urllib3 seems in line with that reasoning.

@florimondmanca
Copy link
Member Author

Since it seems we’ve reached a common understanding, are there any actions points to take away from this issue before closing? Documentation updates maybe?

@tomchristie
Copy link
Member

We ought to:

  • Document with Client() as client, which we support, but don't currently document.
  • Document the alternative client.close() for explcitly closing the connection pool without block-usage.

We could(?) consider switching our top-level API cases to use a lazily created singleton client instance, so that they get connection pooling, and HTTP/2 support. We'd need to switch off cookie persistence on the singleton client, which isn't currently something we support doing, so we'd want a preliminary issue for supporting something like Client(cookies=False). I think we kinda want something like that in any case.

@njsmith
Copy link

njsmith commented Oct 3, 2019

We'd need to switch off cookie persistence on the singleton client, which isn't currently something we support doing, so we'd want a preliminary issue for supporting something like Client(cookies=False). I think we kinda want something like that in any case.

I thought you had an ABC for cookie storage? Just make a NullCookieJar that's always empty and silently discards any added cookies.

@njsmith
Copy link

njsmith commented Oct 3, 2019

Or maybe I'm misremembering, but if you don't you probably should :-)

@sethmlarson
Copy link
Contributor

We have a cookiejar abstraction, I think setting it to an always-empty jar like you describe is best. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants