
Add SyncBackend and use it on Client #525

Closed
wants to merge 2 commits

Conversation

florimondmanca
Member

@florimondmanca florimondmanca commented Nov 13, 2019

Okay so, related to #508 and considering the current discussion to drop sync support #522, I figured I'd give the idea of a sync backend suggested by @pquentin a shot.

This PR adds a SyncBackend that performs blocking I/O and synchronization operations using the socket and threading modules.
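
Roughly, the shape it takes is something like this (a hedged sketch with illustrative names, not the PR's exact API): plain blocking sockets for I/O, and threading primitives for synchronization.

```python
import socket
import threading

class SyncSocketStream:
    def __init__(self, sock):
        self.sock = sock
        self.write_lock = threading.Lock()  # serialize concurrent writers

    def read(self, n, timeout):
        self.sock.settimeout(timeout)
        return self.sock.recv(n)

    def write(self, data, timeout):
        with self.write_lock:
            self.sock.settimeout(timeout)
            self.sock.sendall(data)

class SyncBackend:
    def connect(self, hostname, port, timeout):
        sock = socket.create_connection((hostname, port), timeout=timeout)
        return SyncSocketStream(sock)
```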

The idea is that we keep the async-annotated ConcurrencyBackend interface, but require all async-annotated methods to actually be synchronous. This constraint is implemented by run_secretly_sync_async_function.
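
As a minimal sketch of how that constraint can be enforced (not the PR's exact code), the trick is to drive the coroutine a single step and require it to finish immediately:

```python
def run_secretly_sync_async_function(async_fn, *args, **kwargs):
    coro = async_fn(*args, **kwargs)
    try:
        # A truly synchronous coroutine finishes on its first step.
        coro.send(None)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("This async function tried to perform real async I/O")
```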

There's a bunch of code adapted from urllib3, in particular sync_backend.py, taking into account that we can use modern Python features.

There's still some polishing left, but what strikes me is how little existing code had to be modified. Only the Client was modified to use SyncBackend instead of AsyncioBackend. This is interesting w.r.t. the sync code generation idea: we could generate the sync client from the async one, and tweak the concurrency backend (as well as some other parts — I don't think dropping async annotations will be enough) and things should run smoothly.

Still, I think this PR has value on its own. We currently support sync, and it's definitely a good idea to have the sync client perform sync I/O at all times.

@florimondmanca florimondmanca requested a review from a team November 13, 2019 10:14
@florimondmanca florimondmanca force-pushed the feat/sync-backend branch 2 times, most recently from 39ebcc2 to 0f1a67a on November 13, 2019 10:15
@florimondmanca
Member Author

florimondmanca commented Nov 13, 2019

Oh and, here's proof that this allows the sync client to work seamlessly in async environments, which would fix #508:

```python
>>> import asyncio
>>> import httpx
>>> async def main():
...     r = httpx.get("https://example.org")
...     print(r)
...
>>> asyncio.run(main())
<Response [200 OK]>
```

I'd need to add tests for this as in #513, though. Edit: done.

@tomchristie
Member

Great work!

I'm not sure we'd want to merge this on its own? Granted, it resolves #508, but it's a lot of extra complexity in order to get there.

I think it's more likely we'd want this alongside dropping the async-to-sync bridging code, and have this plugged into a code-gen'ed sync variant of httpx.

A couple of initial thoughts here...

  • There are likely potential race conditions inside the client, connection pools, and connections when used in a threaded context. We'd need to think about that anyplace that multiple threads will be sharing a single client.
  • Our sync implementation here uses a thread pool for the "read and write concurrently" case (a rough sketch of what I mean is just below). I'm interested in how the existing urllib3 achieves read-and-write-for-a-bit, because I'm assuming that it's not by spawning a new thread?
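
For reference, here is roughly what that thread pool approach looks like (a hedged sketch over a raw socket, not the PR's actual backend code): the request body is sent from a worker thread while the calling thread reads the response, so a server that replies early can't deadlock the upload.

```python
from concurrent.futures import ThreadPoolExecutor

def send_while_reading(sock, request_body, bufsize=65536):
    # Sketch only: send the body from a worker thread while this thread reads.
    with ThreadPoolExecutor(max_workers=1) as pool:
        send_future = pool.submit(sock.sendall, request_body)
        chunks = []
        while True:
            data = sock.recv(bufsize)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
        send_future.result()  # re-raise any error from the sending thread
    return b"".join(chunks)
```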

@florimondmanca
Member Author

florimondmanca commented Nov 13, 2019

a code-gen'ed sync variant of httpx

To be fair, I'm not sure that path would lead us to a less complex setup. :-) Besides, I'm afraid we're putting too much hope in this generated sync version of HTTPX. There are aspects we'll need to deal with anyway — like, how do we actually talk to a socket at a high enough level if we don't use asyncio or trio? Those aspects are encapsulated in this backend, and even with code gen I think we won't be able to get around them. (Maybe this is what you envision too; I'm not sure from your comment.)

There are likely potential race conditions inside the client,

Hmm, probably, yes. Is this a general issue we'd need to address, or do you think this particular PR makes things worse?

I'm assuming that it's not by spawning a new thread?

Yup, urllib3's implementation is even named send_and_receive_for_a_while(). This allows urllib3 to ditch any notion of BackgroundManager as we currently have, and instead manually alternate between sending and receiving data. We could switch to this style, but it would have to be a global change, not something specific to this sync backend.
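
To illustrate that style (a rough sketch using the standard selectors module, purely illustrative and not the fork's actual code), the idea is to interleave sends and receives on one socket without any background thread:

```python
import selectors

def send_and_receive_for_a_while(sock, data, bufsize=65536):
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    # Only wait for writability while there is still data left to send.
    sel.register(sock, selectors.EVENT_READ | (selectors.EVENT_WRITE if data else 0))
    received = []
    try:
        while True:
            for _, events in sel.select():
                if events & selectors.EVENT_WRITE and data:
                    sent = sock.send(data)
                    data = data[sent:]
                    if not data:
                        sel.modify(sock, selectors.EVENT_READ)
                if events & selectors.EVENT_READ:
                    chunk = sock.recv(bufsize)
                    if not chunk:  # peer closed the connection
                        return b"".join(received)
                    received.append(chunk)
    finally:
        sel.close()
        sock.setblocking(True)
```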

@tomchristie
Member

Yup, urllib3's implementation is even named send_and_receive_for_a_while().

W.r.t. that question, I was referring specifically to the actual "urllib3" package, rather than to the trio fork. I'm interested in how the existing thread-based library handles that case.

@florimondmanca
Member Author

W.r.t. that question, I was referring specifically to the actual "urllib3" package, rather than to the trio fork. I'm interested in how the existing thread-based library handles that case.

Well, as of today, it seems urllib3 doesn't do any kind of concurrency at that point: it sends the request, then reads the response - source. Perhaps @pquentin can confirm?

@pquentin
Contributor

pquentin commented Nov 14, 2019

Well, as of today, it seems urllib3 doesn't do any kind of concurrency at that point: it sends the request, then reads the response - source. Perhaps @pquentin can confirm?

Right. urllib3 is thread-safe: if you want more concurrency, you should open a pool manager and use it from different threads, and it will reuse connections efficiently. For what it's worth, this is why I was surprised to see that httpx uses threads, even in the trio version, as it makes it hard to cancel requests correctly.
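
For example, something along these lines (illustrative only):

```python
from concurrent.futures import ThreadPoolExecutor
import urllib3

http = urllib3.PoolManager()  # one shared pool, safe to use from many threads

def fetch(url):
    return http.request("GET", url).status

with ThreadPoolExecutor(max_workers=4) as pool:
    statuses = list(pool.map(fetch, ["https://example.org"] * 4))
```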

@tomchristie
Member

"I was surprised to see that httpx uses threads, even in the trio version" - That might(?) be a misunderstanding. We don't use threads for the async cases. (With the exception of some very limited threadpooling stuff, such as "read the SSL context from a file" where we use a threadpool so as to not block the task.)

@tomchristie
Member

it sends the request, then reads the response

Okay, that's surprising. We'd been doing that in httpx, until it turned out that it doesn't actually cover everything correctly, and we were forced to switch to the more complicated "read and write concurrently", in the same way that trio's urllib3 fork has "read and write for a bit".

I don't recall the full details of that, but it's not at all clear to me why that's not necessary in urllib3's case. (I don't have the headspace to dive back into it, but perhaps it has to do with buffering and closed connections working differently in the sync case?)

In any case we're going off topic on this issue.

This PR is a "close" to me. There's significant extra complexity, and insufficient benefit.

The edge case of "you want to use httpx in a sync context, but there's also an existing async loop" only occurs in the slightly weird context of being in a REPL with a fuzzy "you're not really in sync or async mode" state. The simplest, most sensible way to deal with that is to have the sync client transparently run the request within a threadpool in that case.
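
Roughly, I mean something along these lines (a sketch of the idea, not a concrete proposal for the client internals; `make_request` stands in for whatever blocking call the sync client performs):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_fallback_pool = ThreadPoolExecutor(max_workers=1)

def request_blocking(make_request, *args, **kwargs):
    # If no event loop is running, just make the blocking call directly.
    # If one *is* running (e.g. an async REPL), run the call in a worker thread
    # so it can drive its own event loop without colliding with the running one.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return make_request(*args, **kwargs)
    return _fallback_pool.submit(make_request, *args, **kwargs).result()
```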
