
Providing a way to wrap blocking iterators #1344

Open
vxgmichel opened this issue Dec 20, 2019 · 3 comments

@vxgmichel
Contributor

vxgmichel commented Dec 20, 2019

It is currently non-trivial to integrate blocking iterators into trio; see issues #501 and #1308 about trio.Path.iterdir, for instance.

The main problem is that there are many kinds of iterators:

  • fast vs slow
  • small/finite vs infinite/large
  • ordered vs sporadic

Delegating to a thread is not free: in particular, there are two costly operations:

  • spawning a thread
  • switching contexts (i.e. giving control back to the trio event loop)

On my machine, for instance, the full round trip takes about 200 µs. This means that calling trio.to_thread.run_sync to produce each item of range(5000) would add roughly one second of overhead.
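To put a rough number on the spawn cost on your own machine, a stdlib-only micro-benchmark along these lines (the helper name is illustrative) measures the average thread spawn-and-join round trip; note that trio.to_thread.run_sync adds event-loop context switches on top of this:

```python
import threading
import time

def measure_round_trip(n=100):
    # Spawn and join `n` throwaway threads and return the average cost
    # per round trip, in seconds.  This only captures the raw threading
    # cost, not trio's scheduling overhead.
    start = time.monotonic()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.monotonic() - start) / n
```

Multiplying the measured per-item cost by the number of items gives a quick estimate of the naive one-round-trip-per-item overhead.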

I've written a small benchmark comparing several approaches, and I came up with one that (I think) comes close to ticking all the boxes:

import trio
import time
import outcome

async def to_thread_iter_sync(fn, *args, cancellable=False, limiter=None):
    """Convert a blocking iteration into an async iteration using a thread.

    In order to attenuate the overhead of spawning threads and switching
    contexts, values from the blocking iteration are batched for a time one
    order of magnitude greater than the spawn time of a thread.
    """

    def run_batch(items_iter, start_time):
        now = time.monotonic()
        spawn_time = now - start_time
        deadline = now + 10 * spawn_time

        if items_iter is None:
            items_iter = iter(fn(*args))

        # Pull items until the deadline; the terminating StopIteration
        # (or any other exception raised by the iterator) is captured
        # as the last entry of the batch.
        batch = []
        while True:

            try:
                item = next(items_iter)
            except Exception as exc:
                batch.append(outcome.Error(exc))
                break
            else:
                batch.append(outcome.Value(item))

            if time.monotonic() > deadline:
                break

        return items_iter, batch

    items_iter = None
    while True:

        items_iter, batch = await trio.to_thread.run_sync(
            run_batch,
            items_iter,
            time.monotonic(),
            cancellable=cancellable,
            limiter=limiter
        )

        for result in batch:
            try:
                yield result.unwrap()
            except StopIteration:
                # The captured StopIteration marks the end of the iteration.
                return

I'd like to submit a PR to add this function as trio.to_thread.iter_sync, if you think the idea is worth considering.

@parity3

parity3 commented Dec 25, 2019

If performance is valued a bit more than readability, you might take inspiration from a more functional take on the above code. Take it or leave it, but it's usually better not to run these inner loops as Python-level bytecode and to push them into the interpreter's C-level map/filter loops instead.

However, I haven't done any benchmarks here. If the gain isn't much, then readability may still "win". Your call!
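As a rough illustration of that suggestion (hypothetical names, count-based rather than time-based batching, and without the outcome-based error capture of the original), itertools can drive the inner loop at C level:

```python
import itertools

def run_batch_by_count(items_iter, n):
    # islice consumes up to `n` items inside the interpreter's C-level
    # loop instead of a Python-level while-loop.
    return list(itertools.islice(items_iter, n))

it = iter(range(10))
first = run_batch_by_count(it, 4)   # [0, 1, 2, 3]
second = run_batch_by_count(it, 4)  # [4, 5, 6, 7]
```

Each call resumes where the previous one stopped, since islice advances the shared iterator in place.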

@oremanj
Member

oremanj commented May 12, 2020

@vxgmichel Sorry for the late response, but I think this would be a useful tool if you're still interested in submitting a PR!

@vxgmichel
Contributor Author

@oremanj I'm sorry, I don't have time to work on this issue at the moment; I might get back to it later :)
