
Proof of concept for parallel requests #52

Closed · wants to merge 2 commits
Conversation

tomchristie
Member

@tomchristie tomchristie commented May 10, 2019

To issue a bunch of requests together...

import asyncio
import httpcore

async def lets_go():
    client = httpcore.Client()
    with client.parallel() as p:
        for idx in range(10):
            p.request_soon('GET', 'http://example.com/')

        while p.has_pending_responses:
            r = await p.next_response()
            print(r)

asyncio.run(lets_go())

Alternatively, to issue requests in parallel and deal with each as if they'd been issued sequentially...

import asyncio
import httpcore

async def lets_go():
    client = httpcore.Client()
    with client.parallel() as p:
        pending1 = p.request_soon('GET', 'http://example.com/')
        pending2 = p.request_soon('GET', 'http://example.com/')

        r = await pending1.get_response()
        print(r)
        r = await pending2.get_response()
        print(r)

asyncio.run(lets_go())

This gets more exciting if we follow through on #51 and go sync first.
(Since we'll be able to provide a standard sync interface, on top of an async parallelization backend)

Very much open to API rejigs on this one. Considerations behind why it looks the way it does...

  • Strict cancellation once you leave the parallel block.
  • Exception handling just fits in with the standard flow. (I.e. use try/except around next_response or get_response if you want it.)
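
The second bullet can be illustrated with plain asyncio primitives. This is a minimal sketch, not the proposed httpcore API: fake_request stands in for a real HTTP call, and asyncio.as_completed plays the role of awaiting next_response in a loop, with failures handled by an ordinary try/except.

```python
import asyncio

async def fake_request(url: str) -> str:
    # Stand-in for an HTTP request; one URL fails to show the error path.
    if "bad" in url:
        raise ValueError(f"request to {url} failed")
    return f"200 OK from {url}"

async def demo() -> list:
    tasks = [asyncio.ensure_future(fake_request(u))
             for u in ("http://good.example/", "http://bad.example/")]
    results = []
    # The analogue of awaiting `next_response` in a loop: take whichever
    # task finishes first, and catch failures with a plain try/except.
    for fut in asyncio.as_completed(tasks):
        try:
            results.append(await fut)
        except ValueError as exc:
            results.append(f"error: {exc}")
    return results

results = asyncio.run(demo())
print(results)
```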

@taoufik07
Contributor

What about adding a bulk request that gathers the tasks and awaits the results? I can't think of a use case right now (maybe because I just woke up), but I'm sure there are quite a few.

@jordic

jordic commented May 14, 2019

perhaps a join() to gather all pending results..

@tomchristie
Member Author

Not really sold on it.

If you do that, then you need to consider how it interacts with exceptions, and what happens when multiple cases raise exceptions.

The only two types of primitives we need are get_response (Ask for a specific response) and next_response (Ask for whatever’s available next).

Once you’ve got those two, then building any wrapping behaviour around that is easy.
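
To illustrate how a join()-style wrapper falls out of those two primitives, here is a sketch built on asyncio tasks. The Parallel class below is a toy model, not the real httpcore implementation, and fake_get stands in for an actual request so the example runs without a network.

```python
import asyncio

class Parallel:
    # Toy model of the proposed parallel block, built on asyncio tasks.
    # This is NOT the real httpcore API, just its two primitives.
    def __init__(self):
        self._pending = set()

    def request_soon(self, coro):
        task = asyncio.ensure_future(coro)
        self._pending.add(task)
        return task

    @property
    def has_pending_responses(self):
        return bool(self._pending)

    async def next_response(self):
        # "Whatever's available next": wait for the first task to finish.
        done, pending = await asyncio.wait(
            self._pending, return_when=asyncio.FIRST_COMPLETED)
        first = done.pop()
        self._pending = pending | done  # keep other finished tasks queued
        return first.result()

async def join(p):
    # The suggested "gather everything" wrapper is a two-line loop.
    results = []
    while p.has_pending_responses:
        results.append(await p.next_response())
    return results

async def fake_get(url):
    await asyncio.sleep(0)
    return f"response for {url}"

async def demo():
    p = Parallel()
    for url in ("http://a/", "http://b/", "http://c/"):
        p.request_soon(fake_get(url))
    return await join(p)

results = asyncio.run(demo())
print(sorted(results))
```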

@jordic

jordic commented May 22, 2019

@tomchristie
Member Author

So I think TaskGroup is basically trio’s nursery concept. In which case, yeah the parallel blocks already adhere to that style.

@didip

didip commented Jun 17, 2019

What about having a similar API to asyncio.gather or asyncio.wait? Example: https://stackoverflow.com/questions/42231161/asyncio-gather-vs-asyncio-wait/42246632

This way, users can reuse the same code between async-awaiting one task vs many tasks.
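
For comparison, the asyncio.gather pattern suggested here looks like this; fake_get is a stand-in coroutine (not an httpx call) so the example runs without a network.

```python
import asyncio

async def fake_get(url):
    # Stand-in for an HTTP call; one host is "down" to show error handling.
    if "down" in url:
        raise ConnectionError(url)
    return f"200 {url}"

async def demo():
    urls = ["http://a.example/", "http://down.example/", "http://b.example/"]
    # gather() preserves input order; return_exceptions=True places raised
    # exceptions in the result list instead of cancelling the other tasks.
    return await asyncio.gather(*(fake_get(u) for u in urls),
                                return_exceptions=True)

results = asyncio.run(demo())
print(results)
```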

@StephenBrown2
Contributor

Just commented over on #50 (comment), but the requests-toolbelt has methods that allow it to join responses, handle exceptions, and whatnot: https://toolbelt.readthedocs.io/en/latest/threading.html

e.g. (cut-and-paste from the linked doc):

from requests_toolbelt.threaded import pool

urls = [
    # My list of URLs to get
]

p = pool.Pool.from_urls(urls)
p.join_all()

for response in p.responses():
    print('GET {0}. Returned {1}.'.format(response.request_kwargs['url'],
                                          response.status_code))
for exc in p.exceptions():
    print('GET {0}. Raised {1}.'.format(exc.request_kwargs['url'],
                                        exc.message))

new_pool = pool.Pool.from_exceptions(p.exceptions())
new_pool.join_all()

@inventionlabsSydney

Hi everyone,
Sorry if this comes across as snarky, I don't mean it to...
Why is it that this is listed in the official documentation as implemented (see: https://www.encode.io/http3/async/), when on 0.6.5 client = http3.AsyncClient() doesn't have any reference to parallel(), despite the documentation on that page suggesting otherwise?

I guess it might be worth prefixing the page with "this is in alpha/beta" so as not to mislead the user.

On that note, will this be merged soon / is there a schedule on this?
Thanks and love the project.

--Karl.

@tomchristie
Member Author

@inventionlabsSydney So - we've got a prominent note at the top of https://www.encode.io/http3/parallel/

I hadn't noticed that we also need one in the place that we link to it from the AsyncClient docs.

@inventionlabsSydney

inventionlabsSydney commented Jul 3, 2019 via email

@tomchristie tomchristie mentioned this pull request Jul 3, 2019
@tomchristie tomchristie mentioned this pull request Jul 23, 2019
@StephenBrown2
Contributor

I'd love to see this implemented, and I see it's just a wee bit out of date with master. Is there anything I can do to assist?

@tomchristie
Member Author

I guess you could start out by issuing a new pull request that brings this up to date with master.

@StephenBrown2
Contributor

Should I start with this branch, or parallel-2?

@tomchristie
Member Author

Don't mind either way, but a new branch name might be simpler?

@StephenBrown2
Contributor

Yeah, I'd start with a new branch from current master, just wanted to see if one or the other had better logic to copy over and update. I should have some time this weekend to bang something out.

@kokes

kokes commented Aug 7, 2019

I'm not terribly versed in async/await, so here's my 2c about the non-async suggestions:

  1. The has_pending and .next_response() don't feel very Pythonic to me. I think this could be remedied with an iterator with no explicit order - see e.g. imap_unordered in Python's multiprocessing package.
  2. There needs to be some throttling/threadcount limitation, because you don't want to overload a given site/API.

I combined these two observations in my own implementation of parallel requests a while ago; here's some pseudocode from memory:

with Pool(8) as p:
    for response in p.imap_unordered(requests.get, [site_a, site_b, site_c, ...]):
        process(response)

I'm not suggesting using this exact pattern, I'm just trying to influence the design to be a) more pythonic, b) more controllable in terms of traffic.
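
For reference, the pseudocode above maps onto the stdlib like so. This sketch uses multiprocessing.pool.ThreadPool with a stand-in fake_get in place of requests.get, so it runs without a network.

```python
from multiprocessing.pool import ThreadPool

def fake_get(url):
    # Stand-in for requests.get(url); no network involved.
    return f"fetched {url}"

urls = [f"http://site-{i}.example/" for i in range(8)]

# The pool size caps concurrency (the throttling concern), and
# imap_unordered yields results as they complete, in no fixed order.
with ThreadPool(4) as pool:
    fetched = list(pool.imap_unordered(fake_get, urls))

print(sorted(fetched))
```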

@pquentin
Contributor

pquentin commented Aug 7, 2019

Here's another possible API: https://github.com/python-trio/trimeter

@berislavlopac

A while ago I implemented essentially a pooled async client (along the lines of https://medium.com/@cgarciae/making-an-infinite-number-of-requests-with-python-aiohttp-pypeln-3a552b97dc95), and I had the idea (but not the time) that it might make sense to mimic the already familiar interface from concurrent.futures by introducing a TaskPoolExecutor. That way httpx could use any of the parallelism mechanisms by passing in the right executor type... Thoughts?
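
As a rough sketch of that idea using the existing concurrent.futures machinery (the proposed TaskPoolExecutor doesn't exist, and fake_get is a stand-in for a blocking HTTP call):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_get(url):
    # Stand-in for a blocking HTTP call; no network involved.
    return f"200 {url}"

urls = [f"http://host-{i}.example/" for i in range(5)]

# The familiar executor shape: submit() returns futures, and as_completed()
# yields each one as it finishes. A hypothetical TaskPoolExecutor could
# expose this same interface over an async backend.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fake_get, u) for u in urls]
    responses = [f.result() for f in as_completed(futures)]

print(sorted(responses))
```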

@cjw296

cjw296 commented Nov 22, 2019

Oh, I missed that big warning banner - shame, this is exactly what I'm looking for. Any idea on when it might land?

@tomchristie tomchristie deleted the parallel branch November 26, 2019 11:20
@cjw296

cjw296 commented Nov 26, 2019

Oh :'(

@tomchristie
Member Author

> Any idea on when it might land?

It's not really necessary if you're using httpx in async land - just use the existing concurrency primitives there.

Wrt. parallel requests on sync, that'd depend on us first coming back to how we want to approach the sync client. See #522, #544.
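
A sketch of what "use the existing concurrency primitives" looks like in practice: fetch fakes the HTTP call so the example runs without httpx or a network, but the shape is the same with httpx.AsyncClient and await client.get(url).

```python
import asyncio

async def fetch(client, url):
    # With httpx this would be `await client.get(url)`; faked here so the
    # sketch runs without the httpx package or a network.
    await asyncio.sleep(0)
    return f"response from {url}"

async def demo():
    client = None  # stand-in for `httpx.AsyncClient()`
    urls = ["http://a.example/", "http://b.example/"]
    # "Existing concurrency primitives": just gather the request coroutines.
    return await asyncio.gather(*(fetch(client, u) for u in urls))

responses = asyncio.run(demo())
print(responses)
```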
