
Providing a way to wrap blocking iterators #1344

Open
vxgmichel opened this issue Dec 20, 2019 · 3 comments

@vxgmichel
Contributor

vxgmichel commented Dec 20, 2019

It is currently non-trivial to integrate blocking iterators into trio; see issues #501 and #1308 about trio.Path.iterdir, for instance.

The main problem is that there are many kinds of iterators:

  • fast vs slow
  • small/finite vs infinite/large
  • ordered vs sporadic

Delegating to a thread is not free: in particular, there are two costly operations:

  • spawning a thread
  • switching contexts (i.e. giving control back to the trio event loop)

On my machine, for instance, the full round trip takes about 200 µs. This means that calling trio.to_thread.run_sync to produce each item of range(5000) would add roughly one second of overhead.
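To put a rough number on the spawn cost on your own machine, a stdlib-only micro-benchmark along these lines (the helper name is illustrative) measures the average thread spawn-and-join round trip; note that trio.to_thread.run_sync adds event-loop context switches on top of this:

```python
import threading
import time

def measure_round_trip(n=100):
    # Spawn and join `n` throwaway threads and return the average cost
    # per round trip, in seconds.  This only captures the raw threading
    # cost, not trio's scheduling overhead.
    start = time.monotonic()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.monotonic() - start) / n
```

Multiplying the measured per-item cost by the number of items gives a quick estimate of the naive one-round-trip-per-item overhead.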

I've written a small benchmark comparing several approaches, and I came up with one that (I think) comes close to ticking all the boxes:

import trio
import time
import outcome

async def to_thread_iter_sync(fn, *args, cancellable=False, limiter=None):
    """Convert a blocking iteration into an async iteration using a thread.

    In order to attenuate the overhead of spawning threads and switching
    contexts, values from the blocking iteration are batched for a time one
    order of magnitude greater than the spawn time of a thread.
    """

    def run_batch(items_iter, start_time):
        now = time.monotonic()
        spawn_time = now - start_time
        deadline = now + 10 * spawn_time

        if items_iter is None:
            items_iter = iter(fn(*args))

        # Pull items until the deadline; the terminating StopIteration
        # (or any other exception raised by the iterator) is captured
        # as the last entry of the batch.
        batch = []
        while True:

            try:
                item = next(items_iter)
            except Exception as exc:
                batch.append(outcome.Error(exc))
                break
            else:
                batch.append(outcome.Value(item))

            if time.monotonic() > deadline:
                break

        return items_iter, batch

    items_iter = None
    while True:

        items_iter, batch = await trio.to_thread.run_sync(
            run_batch,
            items_iter,
            time.monotonic(),
            cancellable=cancellable,
            limiter=limiter
        )

        for result in batch:
            try:
                yield result.unwrap()
            except StopIteration:
                # The captured StopIteration marks the end of the iteration.
                return

I'd like to submit a PR to add this function as trio.to_thread.iter_sync, if you think the idea is worth considering.

@parity3

parity3 commented Dec 25, 2019

If performance is valued a bit more than readability, you might take inspiration from a more functional take on the above code. Take it or leave it, but it's usually better not to run these inner loops as Python-level bytecode and to push them into the interpreter's C-level map/filter loops instead.

However, I haven't done any benchmarks here. If the gain isn't much, then readability may still "win". Your call!
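As a rough illustration of that suggestion (hypothetical names, count-based rather than time-based batching, and without the outcome-based error capture of the original), itertools can drive the inner loop at C level:

```python
import itertools

def run_batch_by_count(items_iter, n):
    # islice consumes up to `n` items inside the interpreter's C-level
    # loop instead of a Python-level while-loop.
    return list(itertools.islice(items_iter, n))

it = iter(range(10))
first = run_batch_by_count(it, 4)   # [0, 1, 2, 3]
second = run_batch_by_count(it, 4)  # [4, 5, 6, 7]
```

Each call resumes where the previous one stopped, since islice advances the shared iterator in place.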

@oremanj
Member

oremanj commented May 12, 2020

@vxgmichel Sorry for the late response, but I think this would be a useful tool if you're still interested in submitting a PR!

@vxgmichel
Contributor Author

@oremanj I'm sorry, I don't have time to work on this issue at the moment; I might get back to it later :)
