Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace background managers #526

Closed
wants to merge 2 commits into from
Closed

Conversation

florimondmanca
Copy link
Member

@florimondmanca florimondmanca commented Nov 13, 2019

Disclaimer: this PR is dense, and I'm not expecting it to be merged. If the idea seems sound, I'll cut it into 3 pieces (ASGI dispatcher, HTTP/1.1 dispatcher, HTTP/2 dispatcher) that can be reviewed and merged independently.


Currently, the way we send the request and receive the response, in both HTTP/1.1 and HTTP/2 cases, is:

  1. Send the request headers and body, and receive the response headers concurrently.
  2. Receive the response body.

Right now, 1) is achieved via a custom BackgroundManager API. This works, but it has its own quirks. In particular, we can't always adhere to the context-managed usage, meaning that we sometimes have to manually deal with .__aenter__() and __aexit__() (e.g. in the ASGI dispatcher), which is fiddly and bug-prone. The context manager implementation also makes propagation of exceptions difficult (currently I'm not sure we get that aspect right).

@tomchristie, I think we mentioned several times in the past that the current situation is not ideal. urllib3-async takes a different approach which this PR takes inspiration from (cc @pquentin).

This PR proposes to split the background manager API into several more narrowly-scoped primitives that address the two actual use cases we use BackgroundManager for:

  1. Run coroutines in parallel and wait for them to finish. (Used to send the request and receive the response concurrently, in the h11 and h2 dispatchers.)
  2. Start a coroutine in the background while we're doing other things in the foreground. (Used in the ASGI dispatcher.)

Point 1) is solved by two new APIs. First, a writer/reader pair:

  • Writer gets bytes from the client code (via a callback defined in the dispatcher), and writes them to the network (via the socket stream).
  • Reader reads bytes from the network (via the socket stream), and passes them to the client code (via a callback defined in the dispatcher).

The construction of these coroutines is provided by a single .build_writer_reader_pair() implementation in the base backend, which builds on top of .write() and .read().

Then we have a .run_concurrently() method (implemented by each backend) that runs those coroutines in parallel and waits for them to terminate.

We can then use this API to send the request (headers and body) and receive the response headers. This requires a bit of rejigging to the h11 and h2 dispatchers, as the corresponding methods must be turned into (async) iterators, whose yielded chunks of data are fed into the writer/reader pair. (Hopefully the code is more clear than I can be in prose.)

Finally, to solve 2), a new .start_in_background() method is introduced. It returns an opaque callback that we can call to properly terminate the coroutine.

Still to do:

  • Use the AsyncExitStack backport on 3.6. (Require users to install it, because it's only used to make the trio implementation more robust.)

I'll admit it's not immediately obvious why this is an interesting change, as the resulting code is not the most natural. So happy to read feedback on this!

@florimondmanca florimondmanca requested review from tomchristie and a team November 13, 2019 19:28
@florimondmanca florimondmanca added the refactor Issues and PRs related to code refactoring label Nov 13, 2019
@florimondmanca
Copy link
Member Author

@tomchristie Considering discussion happening in #522, does this have a chance of getting in? I guess there's a bit more investigation about why we even need to write the request/read the response concurrently, compared to what urllib3 is doing? I'll close it for now to keep things tidy, but I'm still happy to read feedback and thoughts.

@tomchristie
Copy link
Member

We almost certainly don’t need to be reading/writing concurrently.
I misunderstood the intent behind why the urllib3 fork takes that approach, and I don’t think it’s the right trade-off for us. I’m hoping to revisit it.

@pquentin
Copy link
Contributor

pquentin commented Nov 22, 2019

For what it's worth, I think the clarification occurred partly when discussing about this issue on the Trio Gitter: https://gitter.im/python-trio/general?at=5dceb2fdadd5717a88122484

Reiterating here so that it does not get lost: urllib3 first writes the request, including the body, and only reads the response after that. We took a different approach in hip to support early responses, a more advanced HTTP feature. (It's mentioned in RFC 2616, not sure about the current RFCs.) Early responses allow servers to answer before they receive the whole body, eg. with an authentication error. The send_and_receive_for_a_while function is a novel way to support such early responses with trio, twisted and the (sync) standard library: three really different ways to perform I/O which we'd like to support.

At least that what's I understood of it, I was not part of the project when this got implemented. :)

This does introduce complexity, so I understand why httpx would not want to include this. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Issues and PRs related to code refactoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants