It is currently non-trivial to integrate blocking iterators into trio; see for instance issues #501 and #1308 about `trio.Path.iterdir`.
The main problem is that there are many kinds of iterators:

- fast vs. slow
- small/finite vs. large/infinite
- ordered vs. sporadic
Delegating to a thread is not free: in particular, two operations are costly:

- spawning a thread
- switching contexts (i.e. giving control back to the trio event loop)
On my machine, for instance, the full round trip is about 200 µs. This means that calling `trio.to_thread.run_sync` once per item of `range(5000)` would add an overhead of about one second.
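As a quick sanity check of that estimate (the 200 µs figure is the measured round trip from the paragraph above, not a universal constant):

```python
# Back-of-the-envelope check: per-item thread-dispatch overhead adds up fast.
round_trip_s = 200e-6   # measured to_thread.run_sync round trip (~200 µs)
n_items = 5000
total_overhead_s = round_trip_s * n_items
print(total_overhead_s)  # about one full second of pure dispatch overhead
```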
I've written a small benchmark comparing several approaches, and I came up with one that (I think) comes close to ticking all the boxes:
```python
import time

import outcome
import trio


async def to_thread_iter_sync(fn, *args, cancellable=False, limiter=None):
    """Convert a blocking iteration into an async iteration using a thread.

    In order to attenuate the overhead of spawning threads and switching
    contexts, values from the blocking iteration are batched for a time one
    order of magnitude greater than the spawn time of a thread.
    """

    def run_batch(items_iter, start_time):
        now = time.monotonic()
        spawn_time = now - start_time
        deadline = now + 10 * spawn_time
        if items_iter is None:
            items_iter = iter(fn(*args))
        batch = []
        while True:
            try:
                item = next(items_iter)
            except Exception as exc:
                batch.append(outcome.Error(exc))
                break
            else:
                batch.append(outcome.Value(item))
            if time.monotonic() > deadline:
                break
        return items_iter, batch

    items_iter = None
    while True:
        items_iter, batch = await trio.to_thread.run_sync(
            run_batch,
            items_iter,
            time.monotonic(),
            cancellable=cancellable,
            limiter=limiter,
        )
        for result in batch:
            try:
                yield result.unwrap()
            except StopIteration:
                return
```
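To illustrate the batching idea in isolation, here is a minimal trio-free sketch (the tuple tagging stands in for the `outcome` library; all names here are mine, not part of the proposal). The key trick is that the terminating exception, `StopIteration` included, is captured inside the batch rather than escaping, so the consumer can replay it on its side:

```python
import time


def run_batch(items_iter, batch_duration):
    """Pull items from a blocking iterator for at most batch_duration
    seconds, capturing the terminating exception (StopIteration included)
    so the consumer can replay it instead of losing it."""
    deadline = time.monotonic() + batch_duration
    batch = []
    while True:
        try:
            item = next(items_iter)
        except Exception as exc:
            batch.append(("error", exc))  # stand-in for outcome.Error
            break
        else:
            batch.append(("value", item))  # stand-in for outcome.Value
        if time.monotonic() > deadline:
            break
    return batch


# Drain a small iterator batch by batch, as the async wrapper would.
batches = []
it = iter(range(7))
while True:
    batch = run_batch(it, batch_duration=0.01)
    batches.append([v for tag, v in batch if tag == "value"])
    if batch[-1][0] == "error":
        break
```

In the real function each `run_batch` call would run via `trio.to_thread.run_sync`, so one thread round trip amortizes over a whole batch of items rather than a single one.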
I'd like to submit a PR to add this function as `trio.to_thread.iter_sync`, if you think the idea is worth considering.
If performance is valued a bit more than readability, you might take inspiration from a more functional take on the code above. Take it or leave it: it's usually better not to run these inner loops in the Python interpreter's bytecode loop, but instead inside the C-level loops of map/filter and friends.
However, I haven't done any benchmarks here. If the gain is small, readability may still "win". Your call!
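One way to read that suggestion (a sketch of my own, under the assumption that a fixed batch size is acceptable in place of the time-based deadline) is to let `itertools.islice` drive the iterator at C speed:

```python
from itertools import islice


def run_batch_functional(items_iter, batch_size):
    # list(islice(...)) pulls up to batch_size items entirely inside the
    # interpreter's C-level loop: no Python-level while/try per item.
    batch = list(islice(items_iter, batch_size))
    # Fewer items than requested means the iterator is exhausted.
    return batch, len(batch) < batch_size
```

The trade-offs: the time-based deadline becomes a fixed batch size, and an exception raised mid-iteration propagates out of `list` instead of being recorded per item, so the calling side would need its own `try`/`except` around the thread call.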