
Proof of concept for parallel requests #52

Closed · wants to merge 2 commits
Conversation

tomchristie
Member

@tomchristie tomchristie commented May 10, 2019

To issue a bunch of requests together...

import asyncio
import httpcore

async def lets_go():
    client = httpcore.Client()
    with client.parallel() as p:
        for idx in range(10):
            p.request_soon('GET', 'http://example.com/')

        while p.has_pending_responses:
            r = await p.next_response()
            print(r)

asyncio.run(lets_go())

Alternatively, to issue requests in parallel and deal with each as if they'd been issued sequentially...

import asyncio
import httpcore

async def lets_go():
    client = httpcore.Client()
    with client.parallel() as p:
        pending1 = p.request_soon('GET', 'http://example.com/')
        pending2 = p.request_soon('GET', 'http://example.com/')

        r = await pending1.get_response()
        print(r)
        r = await pending2.get_response()
        print(r)

asyncio.run(lets_go())

This gets more exciting if we follow through on #51 and go sync first.
(Since we'll be able to provide a standard sync interface, on top of an async parallelization backend)

Very much open to API rejigs on this one. Considerations behind why it looks the way it does...

  • Strict cancellation once you leave the parallel block.
  • Exception handling just fits in with the standard flow. (I.e. use try/except around next_response or get_response if you want it.)
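
The second bullet can be illustrated with plain asyncio primitives. This is a minimal sketch, not the proposed httpcore API: fake_request stands in for a real HTTP call, and asyncio.as_completed plays the role of awaiting next_response in a loop, with failures handled by an ordinary try/except.

```python
import asyncio

async def fake_request(url: str) -> str:
    # Stand-in for an HTTP request; one URL fails to show the error path.
    if "bad" in url:
        raise ValueError(f"request to {url} failed")
    return f"200 OK from {url}"

async def demo() -> list:
    tasks = [asyncio.ensure_future(fake_request(u))
             for u in ("http://good.example/", "http://bad.example/")]
    results = []
    # The analogue of awaiting `next_response` in a loop: take whichever
    # task finishes first, and catch failures with a plain try/except.
    for fut in asyncio.as_completed(tasks):
        try:
            results.append(await fut)
        except ValueError as exc:
            results.append(f"error: {exc}")
    return results

results = asyncio.run(demo())
print(results)
```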

@taoufik07
Contributor

What about adding a bulk request that gathers the tasks and awaits the results? I can't think of a use case right now (maybe because I just woke up), but I'm sure there are quite a few.

@jordic

jordic commented May 14, 2019

perhaps a join() to gather all pending results..

@tomchristie
Member Author

Not really sold on it.

If you do that, then you need to consider how it interacts with exceptions, and what happens when multiple cases raise exceptions.

The only two types of primitives we need are get_response (Ask for a specific response) and next_response (Ask for whatever’s available next).

Once you’ve got those two, then building any wrapping behaviour around that is easy.
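
To illustrate how a join()-style wrapper falls out of those two primitives, here is a sketch built on asyncio tasks. The Parallel class below is a toy model, not the real httpcore implementation, and fake_get stands in for an actual request so the example runs without a network.

```python
import asyncio

class Parallel:
    # Toy model of the proposed parallel block, built on asyncio tasks.
    # This is NOT the real httpcore API, just its two primitives.
    def __init__(self):
        self._pending = set()

    def request_soon(self, coro):
        task = asyncio.ensure_future(coro)
        self._pending.add(task)
        return task

    @property
    def has_pending_responses(self):
        return bool(self._pending)

    async def next_response(self):
        # "Whatever's available next": wait for the first task to finish.
        done, pending = await asyncio.wait(
            self._pending, return_when=asyncio.FIRST_COMPLETED)
        first = done.pop()
        self._pending = pending | done  # keep other finished tasks queued
        return first.result()

async def join(p):
    # The suggested "gather everything" wrapper is a two-line loop.
    results = []
    while p.has_pending_responses:
        results.append(await p.next_response())
    return results

async def fake_get(url):
    await asyncio.sleep(0)
    return f"response for {url}"

async def demo():
    p = Parallel()
    for url in ("http://a/", "http://b/", "http://c/"):
        p.request_soon(fake_get(url))
    return await join(p)

results = asyncio.run(demo())
print(sorted(results))
```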

@jordic

jordic commented May 22, 2019

@tomchristie
Member Author

So I think TaskGroup is basically trio’s nursery concept. In which case, yeah the parallel blocks already adhere to that style.

@didip

didip commented Jun 17, 2019

What about having a similar API to asyncio.gather or asyncio.wait? Example: https://stackoverflow.com/questions/42231161/asyncio-gather-vs-asyncio-wait/42246632

This way, users can reuse the same code between async-awaiting one task vs many tasks.
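
For comparison, the asyncio.gather pattern suggested here looks like this; fake_get is a stand-in coroutine (not an httpx call) so the example runs without a network.

```python
import asyncio

async def fake_get(url):
    # Stand-in for an HTTP call; one host is "down" to show error handling.
    if "down" in url:
        raise ConnectionError(url)
    return f"200 {url}"

async def demo():
    urls = ["http://a.example/", "http://down.example/", "http://b.example/"]
    # gather() preserves input order; return_exceptions=True places raised
    # exceptions in the result list instead of cancelling the other tasks.
    return await asyncio.gather(*(fake_get(u) for u in urls),
                                return_exceptions=True)

results = asyncio.run(demo())
print(results)
```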

@StephenBrown2
Contributor

Just commented over on #50 (comment), but the requests-toolbelt has methods that allow it to join responses, handle exceptions, and whatnot: https://toolbelt.readthedocs.io/en/latest/threading.html

e.g. (cut-and-paste from the linked doc):

from requests_toolbelt.threaded import pool

urls = [
    # My list of URLs to get
]

p = pool.Pool.from_urls(urls)
p.join_all()

for response in p.responses():
    print('GET {0}. Returned {1}.'.format(response.request_kwargs['url'],
                                          response.status_code))
for exc in p.exceptions():
    print('GET {0}. Raised {1}.'.format(exc.request_kwargs['url'],
                                        exc.message))

new_pool = pool.Pool.from_exceptions(p.exceptions())
new_pool.join_all()

@inventionlabsSydney

Hi everyone,
Sorry if this comes across as snarky, I don't mean it to...
Why is it that this is listed in the official documentation as implemented (see: https://www.encode.io/http3/async/), when on 0.6.5 client = http3.AsyncClient() doesn't have any reference to parallel(), despite the documentation on that page suggesting otherwise?

I guess it might be worth prefixing the page with "this is in alpha/beta" so as not to mislead the user.

On that note, will this be merged soon / is there a schedule on this?
Thanks and love the project.

--Karl.

@tomchristie
Member Author

@inventionlabsSydney So - we've got a prominent note at the top of https://www.encode.io/http3/parallel/

I hadn't noticed that we also need one in the place that we link to it from the AsyncClient docs.

@inventionlabsSydney

inventionlabsSydney commented Jul 3, 2019 via email

@tomchristie tomchristie mentioned this pull request Jul 3, 2019
@tomchristie tomchristie mentioned this pull request Jul 23, 2019
@StephenBrown2
Contributor

I'd love to see this implemented, and I see it's just a wee bit out of date with master. Is there anything I can do to assist?

@tomchristie
Member Author

I guess you could start out by issuing a new pull request that brings this up to date with master.

@StephenBrown2
Contributor

Should I start with this branch, or parallel-2?

@tomchristie
Member Author

Don't mind either way, but a new branch name might be simpler?

@StephenBrown2
Contributor

Yeah, I'd start with a new branch from current master, just wanted to see if one or the other had better logic to copy over and update. I should have some time this weekend to bang something out.

@kokes

kokes commented Aug 7, 2019

I'm not terribly versed in async/await, so here's my 2c about the non-async suggestions:

  1. The has_pending and .next_response() don't feel very Pythonic to me. I think this could be remedied with an iterator with no explicit order - see e.g. imap_unordered in Python's multiprocessing package.
  2. There needs to be some throttling/threadcount limitation, because you don't want to overload a given site/API.

I combined these two observations in my own implementation of parallel requests a while ago; here's some pseudocode from memory:

with Pool(8) as p:
    for response in p.imap_unordered(requests.get, [site_a, site_b, site_c, ...]):
        process(response)

I'm not suggesting using this exact pattern, I'm just trying to influence the design to be a) more pythonic, b) more controllable in terms of traffic.
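
For reference, the pseudocode above maps onto the stdlib like so. This sketch uses multiprocessing.pool.ThreadPool with a stand-in fake_get in place of requests.get, so it runs without a network.

```python
from multiprocessing.pool import ThreadPool

def fake_get(url):
    # Stand-in for requests.get(url); no network involved.
    return f"fetched {url}"

urls = [f"http://site-{i}.example/" for i in range(8)]

# The pool size caps concurrency (the throttling concern), and
# imap_unordered yields results as they complete, in no fixed order.
with ThreadPool(4) as pool:
    fetched = list(pool.imap_unordered(fake_get, urls))

print(sorted(fetched))
```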

@pquentin
Contributor

pquentin commented Aug 7, 2019

Here's another possible API: https://github.com/python-trio/trimeter

@berislavlopac

A while ago I implemented essentially a pooled async client (along the lines of https://medium.com/@cgarciae/making-an-infinite-number-of-requests-with-python-aiohttp-pypeln-3a552b97dc95), and I had the idea (but not the time) that it might make sense to mimic the already familiar interface from concurrent.futures by introducing a TaskPoolExecutor. That way httpx could use any of the parallelism mechanisms by passing in the right executor type... Thoughts?
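
As a rough sketch of that idea using the existing concurrent.futures machinery (the proposed TaskPoolExecutor doesn't exist, and fake_get is a stand-in for a blocking HTTP call):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_get(url):
    # Stand-in for a blocking HTTP call; no network involved.
    return f"200 {url}"

urls = [f"http://host-{i}.example/" for i in range(5)]

# The familiar executor shape: submit() returns futures, and as_completed()
# yields each one as it finishes. A hypothetical TaskPoolExecutor could
# expose this same interface over an async backend.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fake_get, u) for u in urls]
    responses = [f.result() for f in as_completed(futures)]

print(sorted(responses))
```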

@cjw296

cjw296 commented Nov 22, 2019

Oh, I missed that big warning banner - shame, this is exactly what I'm looking for. Any idea on when it might land?

@tomchristie tomchristie deleted the parallel branch November 26, 2019 11:20
@cjw296

cjw296 commented Nov 26, 2019

Oh :'(

@tomchristie
Member Author

> Any idea on when it might land?

It's not really necessary if you're using httpx in async land - just use the existing concurrency primitives there.

Wrt. parallel requests on sync, that'd depend on us first coming back to how we want to approach the sync client. See #522, #544.
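
A sketch of what "use the existing concurrency primitives" looks like in practice: fetch fakes the HTTP call so the example runs without httpx or a network, but the shape is the same with httpx.AsyncClient and await client.get(url).

```python
import asyncio

async def fetch(client, url):
    # With httpx this would be `await client.get(url)`; faked here so the
    # sketch runs without the httpx package or a network.
    await asyncio.sleep(0)
    return f"response from {url}"

async def demo():
    client = None  # stand-in for `httpx.AsyncClient()`
    urls = ["http://a.example/", "http://b.example/"]
    # "Existing concurrency primitives": just gather the request coroutines.
    return await asyncio.gather(*(fetch(client, u) for u in urls))

responses = asyncio.run(demo())
print(responses)
```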
