New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make yield_ and yield_from_ interoperate with native async generators again #16

Open
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@oremanj
Contributor

oremanj commented Apr 5, 2018

Instead of trying to explicitly construct AsyncGenValueWrapper objects, trick the interpreter into doing the wrapping and unwrapping for us. There's still some ctypes hackery involved, but it's much less finicky: just changing the type of an async generator object.

Joshua Oreman
Make yield_ and yield_from_ interoperate with native async generators…
… again

Instead of trying to explicitly construct AsyncGenValueWrapper objects, trick the interpreter into doing the wrapping and unwrapping for us. There's still some ctypes hackery involved, but it's much less finicky: just changing the type of an async generator object.
@codecov

This comment has been minimized.

Show comment
Hide comment
@codecov

codecov bot Apr 5, 2018

Codecov Report

Merging #16 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #16   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           7      7           
  Lines         972    992   +20     
  Branches       77     79    +2     
=====================================
+ Hits          972    992   +20
Impacted Files Coverage Δ
async_generator/_tests/test_async_generator.py 100% <100%> (ø) ⬆️
async_generator/_impl.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22eddc1...790155a. Read the comment docs.

codecov bot commented Apr 5, 2018

Codecov Report

Merging #16 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master    #16   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           7      7           
  Lines         972    992   +20     
  Branches       77     79    +2     
=====================================
+ Hits          972    992   +20
Impacted Files Coverage Δ
async_generator/_tests/test_async_generator.py 100% <100%> (ø) ⬆️
async_generator/_impl.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 22eddc1...790155a. Read the comment docs.

@oremanj oremanj requested a review from njsmith Apr 5, 2018

Show outdated Hide outdated async_generator/_impl.py Outdated
@oremanj

This comment has been minimized.

Show comment
Hide comment
@oremanj

oremanj May 2, 2018

Contributor

@njsmith ping? I'm interested in getting this merged and a new async_generator version released with it and #15 so I can depend on that version in other projects (e.g. make trio do something sane with the GC hooks). I'd be happy to get #17 in too, but that's a larger change and I don't want to tie this one's fate to that one. :-)

Contributor

oremanj commented May 2, 2018

@njsmith ping? I'm interested in getting this merged and a new async_generator version released with it and #15 so I can depend on that version in other projects (e.g. make trio do something sane with the GC hooks). I'd be happy to get #17 in too, but that's a larger change and I don't want to tie this one's fate to that one. :-)

@njsmith

This comment has been minimized.

Show comment
Hide comment
@njsmith

njsmith Aug 1, 2018

Member

@oremanj Sorry for being so slow to review this! This is going to be kind of a combined review of this (#16) and #17...

So first of all, I want to say again how incredibly impressive this work is on the technical level! I am in awe.

If I understand correctly, the pitch for this PR is "you can use yield_from_ without having to type @async_generator", and the pitch for #17 is "async generators get several times faster and their tracebacks stop being so incredibly terrible".

If I have the right, then I don't this PR is worth it. It's not that hard to write @async_generator, right? And whenever people skip writing it, it makes their code incompatible with PyPy, and also dependent on CPython implementation details in a way that could cause it to break in the future. That seems like a bad trade-off.

OTOH, #17 is somewhat attractive, if we can make it essentially transparent, so people don't write their code any differently but just magically get better runtime behavior. The tracebacks are really bad. I guess I am particularly annoyed at them because of how trio's nursery implementation uses @async_generator + @asynccontextmanager and that makes like, every trio traceback terrible. We shouldn't weight that too highly, since we need to rewrite the nursery to stop using @asynccontextmanager anyway (python-trio/trio#393), but still, the tracebacks are bad.

It's such an intrusive change that it does make me nervous – async_generator gets several thousand downloads a day now – but if it can be done in a seamless kind of way where it "just works" and no-one notices, then that'd be pretty cool actually. As written #17 isn't like that yet, with the deprecations and everything, and I'm not sure whether it's actually possible or not (combining bytecode introspection and rewriting code objects and ctypes leaves a lot of room for things to go subtly wrong!), but if you want to convince me I'm open to being convinced?

Member

njsmith commented Aug 1, 2018

@oremanj Sorry for being so slow to review this! This is going to be kind of a combined review of this (#16) and #17...

So first of all, I want to say again how incredibly impressive this work is on the technical level! I am in awe.

If I understand correctly, the pitch for this PR is "you can use yield_from_ without having to type @async_generator", and the pitch for #17 is "async generators get several times faster and their tracebacks stop being so incredibly terrible".

If I have the right, then I don't this PR is worth it. It's not that hard to write @async_generator, right? And whenever people skip writing it, it makes their code incompatible with PyPy, and also dependent on CPython implementation details in a way that could cause it to break in the future. That seems like a bad trade-off.

OTOH, #17 is somewhat attractive, if we can make it essentially transparent, so people don't write their code any differently but just magically get better runtime behavior. The tracebacks are really bad. I guess I am particularly annoyed at them because of how trio's nursery implementation uses @async_generator + @asynccontextmanager and that makes like, every trio traceback terrible. We shouldn't weight that too highly, since we need to rewrite the nursery to stop using @asynccontextmanager anyway (python-trio/trio#393), but still, the tracebacks are bad.

It's such an intrusive change that it does make me nervous – async_generator gets several thousand downloads a day now – but if it can be done in a seamless kind of way where it "just works" and no-one notices, then that'd be pretty cool actually. As written #17 isn't like that yet, with the deprecations and everything, and I'm not sure whether it's actually possible or not (combining bytecode introspection and rewriting code objects and ctypes leaves a lot of room for things to go subtly wrong!), but if you want to convince me I'm open to being convinced?

@oremanj

This comment has been minimized.

Show comment
Hide comment
@oremanj

oremanj Aug 2, 2018

Contributor

Thanks for the review!

I would say the pitch of #16 is "you can use await yield_from_ in a native async generator with native yield statements", like the async_generator package previously supported on Linux but not Windows. Being able to use await yield_ too is an unintended and mostly-harmless side effect.

(And if you leave off the @async_generator from an async function that uses only await yield_ and/or await yield_from_, and doesn't use any native yield statements, you get not an async generator but a normal async function, that will hopelessly confuse your event loop unless you happen to call it from another async generator -- i.e., you have written something that fits in a similar place as yield_() itself. Everything in this parenthetical is the same with or without #16, since there are no async generators involved.)


And, of course, the other half of the pitch for #16 is that it's required in order for #17 to work. I think I agree with you that we shouldn't commit #16 if we don't intend to commit some form of #17. I think that to the extent there's a case to be made for committing both of them, it goes like so:

If you're writing an async application that targets Python 3.6+ only, you're probably going to use native async generators, because they take less typing to write / are faster to execute / don't add lots of traceback frames / are arguably syntactically clearer. If your application uses an async library that wants to support 3.5, that library is probably going to use @async_generator functions, because it kinda has to. Now within your Python interpreter are two different beasts called "async generator", which behave in mostly but not entirely identical ways. That seems to me to be a recipe for confusion. My overall goal with this set of PRs (#15, #16, #17) has been to reduce the confusion by making @async_generator functions behave as much like native async generators as the constraints of the underlying implementation will permit. Making them identical under the hood seems to be about as much alike as we could possibly achieve. I think this is a good idea, though obviously it's allowed to compete with other good ideas like "don't depend on internal implementation details".


Regarding PyPy support: Apparently PyPy has reasonably solid 3.6 support on nightly these days, including async generators. I played around with its async generators and, via a similar type-punning scheme to the one used in this PR for CPython, got my hands on something that appears to be an AsyncGenValueWrapper object. Unfortunately, any attempt to use it beyond id() segfaults. My understanding of the PyPy object memory layout is based on a blog post from 2009, so it's not unreasonable that these difficulties might be resolved with a less cursory effort. (Very much not guaranteed to work out, though.)

I think #17 can be modified to make it transparent without much trouble. Currently it is very loud and conservative about the one tiny difference it's aware of (the ag_running thing), but I think it would be reasonable for us to just declare that if "ag_running is False while an asend is active but suspended" is good enough for CPython, it's good enough for us. As dicey as bytecode introspection can be in the general case, this one is looking for an extremely simple pattern with no potential for false positives -- maybe some wacky stack management could make us fail to detect an asyncgen that is truly safe, but we'll never think something is safe (returns only None) when it has non-None return paths. And the mechanics of the code object reconstruction is exactly the same thing that the stdlib types module does to set CO_ITERABLE_COROUTINE. Not that that inspires complete confidence, but at least it's not nothing. :-)

Any thoughts? I'm inclined to push a little more on seeing whether I can make this native integration strategy work with PyPy, and will feel less bullish about these diffs if I can't.

Contributor

oremanj commented Aug 2, 2018

Thanks for the review!

I would say the pitch of #16 is "you can use await yield_from_ in a native async generator with native yield statements", like the async_generator package previously supported on Linux but not Windows. Being able to use await yield_ too is an unintended and mostly-harmless side effect.

(And if you leave off the @async_generator from an async function that uses only await yield_ and/or await yield_from_, and doesn't use any native yield statements, you get not an async generator but a normal async function, that will hopelessly confuse your event loop unless you happen to call it from another async generator -- i.e., you have written something that fits in a similar place as yield_() itself. Everything in this parenthetical is the same with or without #16, since there are no async generators involved.)


And, of course, the other half of the pitch for #16 is that it's required in order for #17 to work. I think I agree with you that we shouldn't commit #16 if we don't intend to commit some form of #17. I think that to the extent there's a case to be made for committing both of them, it goes like so:

If you're writing an async application that targets Python 3.6+ only, you're probably going to use native async generators, because they take less typing to write / are faster to execute / don't add lots of traceback frames / are arguably syntactically clearer. If your application uses an async library that wants to support 3.5, that library is probably going to use @async_generator functions, because it kinda has to. Now within your Python interpreter are two different beasts called "async generator", which behave in mostly but not entirely identical ways. That seems to me to be a recipe for confusion. My overall goal with this set of PRs (#15, #16, #17) has been to reduce the confusion by making @async_generator functions behave as much like native async generators as the constraints of the underlying implementation will permit. Making them identical under the hood seems to be about as much alike as we could possibly achieve. I think this is a good idea, though obviously it's allowed to compete with other good ideas like "don't depend on internal implementation details".


Regarding PyPy support: Apparently PyPy has reasonably solid 3.6 support on nightly these days, including async generators. I played around with its async generators and, via a similar type-punning scheme to the one used in this PR for CPython, got my hands on something that appears to be an AsyncGenValueWrapper object. Unfortunately, any attempt to use it beyond id() segfaults. My understanding of the PyPy object memory layout is based on a blog post from 2009, so it's not unreasonable that these difficulties might be resolved with a less cursory effort. (Very much not guaranteed to work out, though.)

I think #17 can be modified to make it transparent without much trouble. Currently it is very loud and conservative about the one tiny difference it's aware of (the ag_running thing), but I think it would be reasonable for us to just declare that if "ag_running is False while an asend is active but suspended" is good enough for CPython, it's good enough for us. As dicey as bytecode introspection can be in the general case, this one is looking for an extremely simple pattern with no potential for false positives -- maybe some wacky stack management could make us fail to detect an asyncgen that is truly safe, but we'll never think something is safe (returns only None) when it has non-None return paths. And the mechanics of the code object reconstruction is exactly the same thing that the stdlib types module does to set CO_ITERABLE_COROUTINE. Not that that inspires complete confidence, but at least it's not nothing. :-)

Any thoughts? I'm inclined to push a little more on seeing whether I can make this native integration strategy work with PyPy, and will feel less bullish about these diffs if I can't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment