Implement a strategy for handling (async) generator cleanup #265
There's a problem with generators and async generators: they don't necessarily iterate to the end. Sometimes they get suspended in the middle, and then garbage collected. When this happens, their cleanup code (`finally` blocks and the like) doesn't run in the caller that was iterating them; it runs whenever and wherever the GC happens to collect them.
So for example, this code:
```python
def generator_fn():
    try:
        yield
    finally:
        print(trio.current_task())

async def caller():
    for _ in generator_fn():
        break
```
Here the generator's `finally` block doesn't run when `caller` breaks out of the loop; it runs whenever the abandoned generator gets collected (on CPython that may happen to be right away, but there's no guarantee), so `trio.current_task()` can report whichever task happens to be running at that point, if any.
PEP 533 would be the real fix, but that seems to be stalled for now.
For regular generators, we just need to document this; I can't see what else we can do. There are some related issues in #264 that also need documentation; probably they should go in the same section of the manual.
For async generators, this problem is both better and worse. The thing about async generators is that the gc can't actually clean them up directly: it would like to do the same thing as it does for regular generators and have the cleanup code run right there, but an async generator's cleanup can contain `await`s, which the GC has no way to execute. That's why PEP 525 added `sys.set_asyncgen_hooks(firstiter=..., finalizer=...)`.
The intention was that these hooks are used to let the event loop take care of doing the cleanup itself: the finalizer hook hands the abandoned generator to the loop, which is supposed to schedule something equivalent to `await agen.aclose()` at some later point.
So we aren't going to use these hooks in the way they were originally intended. But, until PEP 533 becomes a thing, maybe we can at least use these hooks for harm reduction.
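For reference, the hook API itself is tiny. A minimal illustration of where the two callbacks fire (the print bodies are just placeholders, not what trio would do):

```python
import sys

def firstiter(agen):
    # Called synchronously from whatever code iterates the async generator
    # for the first time.
    print("now being iterated:", agen)

def finalizer(agen):
    # Called when the GC finds an abandoned async generator; an event loop
    # is expected to arrange for something like agen.aclose() to run.
    print("needs cleanup:", agen)

sys.set_asyncgen_hooks(firstiter=firstiter, finalizer=finalizer)
```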
The good thing about the finalizer hook is that it at least gives us a chance to intercept the cleanup and decide where it runs, instead of leaving that entirely up to the GC.
(If PEP 533 is too much for 3.7, then maybe a smaller ask would be a set_gen_hooks() that works the same way, but for synchronous generators?)
Here's an idea: what if we scope async generators to the task that first iterated them? That is: use the firstiter hook to record each new async generator against the current task, and then when that task exits, make sure any of its generators that are still alive get closed.
There's some subtlety here as far as treating the nested child sorta-task like a task for these purposes, and scoping asyncgens created there to the direct-parent nursery rather than to the lifetime of the overarching task that opened that nursery, but I think that's mostly details.
This is only a partial solution; it still doesn't deal correctly with asyncgens that create nurseries/cancel scopes.
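A minimal sketch of the bookkeeping that proposal needs; the registry and the close-on-exit helper here are hypothetical, not existing trio internals:

```python
import sys
import weakref

import trio

# Hypothetical registry: task -> async generators it first iterated.
_asyncgens_by_task = weakref.WeakKeyDictionary()

def install_firstiter_hook():
    def firstiter(agen):
        # Runs synchronously in the task that first iterates the generator.
        task = trio.current_task()
        _asyncgens_by_task.setdefault(task, weakref.WeakSet()).add(agen)

    sys.set_asyncgen_hooks(firstiter=firstiter)

async def close_surviving_asyncgens(task):
    # Hypothetical hook point: run this as the task exits, so any of its
    # still-alive generators get closed by the task that owns them.
    for agen in list(_asyncgens_by_task.get(task, ())):
        await agen.aclose()
```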
From reading the discussion in #264, it seems like the desired behavior here would be to limit each turn of agen() to 0.8 seconds, and the whole thing (including time spent by caller() when agen() is suspended) to 6 seconds. That gives you a shorter-lived cancel scope (in caller()) more deeply in the cancel scope stack than a longer-lived one (in agen()). I guess context variables are supposed to solve this? I feel vaguely like it should be solvable without them, but can't currently think of how. I guess cancel scopes could detect that they were created inside an asyncgen (via... stack introspection? ugh) and mark themselves accordingly, and cancel scopes marked that way could be taken off the stack while their generator is suspended and put back when it resumes.
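To make the shape concrete, here's roughly how that #264 example could be laid out; `produce_item()` is a stand-in, and the point is only where the two scopes live, not that this currently behaves the way we want:

```python
import trio

async def produce_item():
    # Stand-in for whatever work one "turn" of the generator does.
    await trio.sleep(0.1)
    return "item"

async def agen():
    # Longer-lived scope: intended to cover the generator's whole lifetime,
    # including time spent suspended at `yield` while caller() runs. Whether
    # it can actually fire while the generator is suspended is exactly the
    # open question in this thread.
    with trio.move_on_after(6):
        while True:
            yield await produce_item()

async def caller():
    aiter = agen()
    while True:
        # Shorter-lived scope: limits a single turn, i.e. one __anext__ call.
        # From the second turn onward it's entered while the 6-second scope
        # is already open, so the shorter-lived scope sits deeper in the
        # stack than the longer-lived one.
        with trio.move_on_after(0.8) as turn_limit:
            try:
                item = await aiter.__anext__()
            except StopAsyncIteration:
                return
        if turn_limit.cancelled_caught:
            return  # this turn took too long; give up
        print(item)

trio.run(caller)
```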
Thinking about this more:
Expanding on that last bullet point: we want to distinguish between two cases
(the second should interrupt the sleep while the first should not). Luckily, the firstiter hook is called when the coroutine returned by asend() is constructed, before the send(None) is performed and thus before any of the async generator code is executed. So if we add asyncgens to the owning task's bookkeeping from the firstiter hook, that information is in place before any of the generator's code has run.
I'm sure the devil will be in the details, but can you see any missing pieces in the overall idea?
None of this helps with regular non-async generators that create a cancel scope. But since there's not much point to making a cancel scope that you can't put any awaits in, I don't think there's any use for such a generator besides decorating it with @contextmanager, and we've established we like the existing behavior in that case.
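For context, this is the pattern that already works well for synchronous generators and that we'd like to leave alone; the names and the timeout value here are just for illustration:

```python
from contextlib import contextmanager

import trio

@contextmanager
def time_limit(seconds):
    # The cancel scope is opened and closed within a single task, wrapped
    # around the caller's `with` body, so none of the GC/suspension problems
    # discussed above apply.
    with trio.move_on_after(seconds) as scope:
        yield scope

async def use_it():
    with time_limit(1) as scope:
        await trio.sleep(10)
    assert scope.cancelled_caught

trio.run(use_it)
```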
Even more wrinkles: asyncgens can iterate over other asyncgens! I don't have time to think through the implications of this right now; mostly leaving a note here for my future self. I can't immediately come up with any reasons this would invalidate the approach I'm proposing, but it probably does at least mean we need to be careful about the order in which we apply the asyncgen cancel scopes to the cancel scope stack.
Also, ag_await is None is not enough to know an asyncgen is suspended at a yield -- it could just be exhausted. The condition we want for is-currently-suspended-via-yield-to-caller is more like agen.ag_frame is not None and agen.ag_await is None and not agen.ag_running (an exhausted asyncgen has ag_frame set to None).
Random technical notes
I was definitely going to ask about bpo-32526; that's a pretty obscure issue, so nice catch :-).
It's also legal to create an async generator in one task and then iterate it in another task. Actually I've even used this in an example... in the live-coded happy eyeballs, I cheat a little bit, because I claim that it's doing the same thing as the twisted happy eyeballs code, but twisted's version actually supports incremental getaddrinfo -- it can start making connection attempts as the first addresses come in -- while mine resolves everything up front. To fake that you'd want the targets to come from an async generator, something like:
```python
async def fake_incremental_getaddrinfo(*args, **kwargs):
    for target in await getaddrinfo(*args, **kwargs):
        yield target
```
Then in the happy eyeballs code, each child task, instead of pulling the next target out of a pre-resolved list, does:
```python
try:
    target = await targets_aiter.__anext__()
except StopAsyncIteration:
    return
```
You also need to do a bit of fiddling with the logic that decides when all the targets have been tried and the overall attempt has failed.
I suppose one could keep a global list of all living async generators, and scan them whenever a cancel scope is created or a task traps, but this would be way too expensive. (It might even be too expensive if we could restrict to a single task's generators... these are pretty performance-sensitive operations.)
Another potential way to map cancel scopes to async generators would be to use the frame objects: keep a weak mapping from each asyncgen's frame to the cancel scopes created while it was running, and consult it whenever the generator is suspended or resumed.
But let's step back a bit
The bigger question is what semantics we're even going for here. I think in this issue we should try to focus on just the GC issues, since they're simpler. (I added a link back to here from #264 though so the comments don't get forgotten.)
If you want to work on this, then probably the first thing that would be useful would be to figure out exactly what Python is doing right now, with no hooks defined :-). I wrote down a guess in the first post, but I haven't checked the details (which may require reading CPython's source and stuff).
There's also a simple thing we could do that might be better than what we do now, not sure: guarantee that all async generators will get closed before trio.run returns, instead of leaving whatever is still alive at that point to the GC.
Thanks for the happy-eyeballs example; that does seem like something we shouldn't gratuitously forbid.
python-trio/async_generator#14 has the GC hooks support, and AFAICT behaves identically to native async generators whether or not a finalizer hook is installed (and your hypothesis in the first post on this thread matches the results of my investigation).
GC: I like the idea of doing aclose_forcefully() to finalize a generator, if no one put an aclosing() block around it to make sure the potentially-blocking operations in its cleanup don't leak out of the cancel scope intended for them. As far as where to do this (for purposes of exception propagation), how do you feel about "the nearest enclosing task that's still around"? That is, we'd install a firstiter hook that adds a weakref to the async generator object to a list in the current Task, as above, but instead of closing async generators when their task completes, we'd just roll them up to be async generators of the parent task. And the finalizer hook would make it so aclose_forcefully() gets called in the context of the task in whose asyncgens list this generator currently lives (at the time of its GC, whatever that might be).
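A sketch of what the finalizer side of that could look like, heavily simplified; owning_task_for() and schedule_cleanup_in() don't exist, they just stand in for the "roll up to the nearest surviving task" machinery described above:

```python
import sys

import trio

def install_finalizer_hook():
    def finalizer(agen):
        # The GC found an abandoned async generator. Find the task whose
        # asyncgens list it currently lives in (its original task, or the
        # nearest surviving ancestor it was rolled up to)...
        task = owning_task_for(agen)  # hypothetical helper

        # ...and arrange for that task to run aclose_forcefully(), so any
        # blocking operations in the cleanup get cancelled at their first
        # checkpoint instead of leaking into whatever scope is active.
        schedule_cleanup_in(task, trio.aclose_forcefully, agen)  # hypothetical

    sys.set_asyncgen_hooks(finalizer=finalizer)
```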
Cancel scopes: Unfortunately, you can't weakly reference a frame object. I guess we could still keep a set of strong references to the relevant frames, as long as we remember to discard them when the corresponding generator finishes.
which might be implemented as a wrapper around yield_ that takes the generator's cancel scopes off the stack while it's suspended.
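One possible shape for that wrapper; CancelScope.suspend()/resume() are the hypothetical methods discussed just below, and scopes_belonging_to_current_asyncgen() is likewise made up:

```python
from async_generator import yield_

async def scope_preserving_yield(value=None):
    # Take this generator's cancel scopes off the stack while we're parked at
    # the yield, and put them back once the consumer resumes us.
    scopes = scopes_belonging_to_current_asyncgen()  # hypothetical helper
    for scope in scopes:
        scope.suspend()  # hypothetical, see below
    try:
        await yield_(value)
    finally:
        for scope in reversed(scopes):
            scope.resume()  # hypothetical, see below
```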
And then we just have to implement CancelScope.suspend().
Er, nix the yield_from_ bit; clearly we do want to keep our cancel scopes active until the delegate actually yields something. But we could make async_generator.yield_from_ take an optional yield_ parameter, so that only the actual yields back to the consumer go through the scope-preserving version.
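Something like this, assuming the hypothetical yield_= parameter and the scope_preserving_yield sketch above:

```python
from async_generator import async_generator, yield_from_

@async_generator
async def wrapper(delegate):
    # Hypothetical: yield_from_ grows a yield_= argument, so every value the
    # delegate produces is re-yielded through the scope-preserving yield,
    # while the delegation itself runs with our cancel scopes still active.
    await yield_from_(delegate, yield_=scope_preserving_yield)
```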
Another idea: still assuming we make async_generator.yield_from_ support a user-specified yield_ function, we could move the cancel scope "stack switching" to the side doing the iteration rather than the side doing the yielding, and then people can write native-syntax async generators if they don't care about 3.5 compatibility.
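One way that consumer-side wrapper might look; suspend_scopes_for() and resume_scopes_for() are invented stand-ins for the same suspend/resume machinery as above:

```python
class scope_switching_aiter:
    # Wraps a native async generator so that its cancel scopes are only
    # active while we're actually inside one of its __anext__ calls, and are
    # suspended whenever it's parked at a yield.
    def __init__(self, agen):
        self._agen = agen

    def __aiter__(self):
        return self

    async def __anext__(self):
        resume_scopes_for(self._agen)  # invented helper
        try:
            return await self._agen.__anext__()
        finally:
            suspend_scopes_for(self._agen)  # invented helper
```

The consumer then writes `async for item in scope_switching_aiter(my_agen()):` and the generator itself can be written with plain native `yield`.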
so interestingly in aiohttp we were getting a very strange issue where a task was abandoned, followed by an abandoned generator's except/finally running under a completely different task.
Note how the generator except/finally doesn't have the same task as when it was entered, and further that it's a completely different task that I did not create.
@devxpy We don't have a general workaround, and I don't think there's any prospect of PEP 533 making it into 3.8. But I think the discussion in #638 may be getting closer to an acceptable workaround – in particular starting here: #638 (comment)
The idea is instead of having a generator that you start and stop and then have to figure out how to handle it being abandoned in the "stop" state, you run the "generator" as a concurrent task that sends values to the loop.
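A minimal sketch of that pattern, using a nursery plus a memory channel (the names and the zero buffer size are arbitrary; older trio versions would use trio.Queue instead):

```python
import trio

async def produce(send_channel):
    # The body of the would-be generator runs as an ordinary task, so its
    # cancel scopes and its cleanup behave like any other task's.
    async with send_channel:
        for i in range(10):
            await send_channel.send(i)

async def main():
    async with trio.open_nursery() as nursery:
        send_channel, receive_channel = trio.open_memory_channel(0)
        nursery.start_soon(produce, send_channel)
        async with receive_channel:
            async for value in receive_channel:
                print(value)

trio.run(main)
```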