Support for async callbacks #4978
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #4978      +/-   ##
==========================================
+ Coverage   88.70%   88.85%   +0.14%
==========================================
  Files         162      162
  Lines       10787    10962     +175
  Branches     1851     1894      +43
==========================================
+ Hits         9569     9740     +171
  Misses        942      942
- Partials      276      280       +4
```
Not sure why some related tests fail on 3.6 asyncio-pinned; I cannot reproduce this locally at this time.
Now I can reproduce them on a freshly built 3.6.12, while it worked on my old 3.6.9, both from pyenv.
It's Twisted 17.9.0-specific again.
pytest-twisted installs the asyncio reactor with eventloop=None, so on Twisted 17.9.0 uses
This reverts commit 92f2c9e.
Thanks for the separate commits; they definitely made the review much more approachable.
I think it may be best to add tests for `parallel_async` that ensure complete code coverage for `_AsyncCooperatorAdapter`, especially given its complexity (I won't claim to fully understand it).
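Such tests could assert the core contract: every item produced by the async generator is consumed, with concurrency bounded. Below is a minimal plain-asyncio sketch of that contract — it does not use Scrapy's Twisted-based `parallel_async`; `parallel_consume` is a hypothetical stand-in written only to illustrate what a coverage test might check.

```python
import asyncio

async def parallel_consume(aiterable, concurrency, callback):
    # Drain the async iterable, running the callback on each item while
    # capping the number of concurrently running callbacks with a semaphore.
    sem = asyncio.Semaphore(concurrency)
    tasks = []

    async def run(item):
        async with sem:
            await callback(item)

    async for item in aiterable:
        tasks.append(asyncio.ensure_future(run(item)))
    await asyncio.gather(*tasks)

async def demo():
    async def gen():
        for i in range(10):
            yield i

    seen = []

    async def record(item):
        await asyncio.sleep(0)  # simulate asynchronous work
        seen.append(item)

    await parallel_consume(gen(), 3, record)
    return sorted(seen)

result = asyncio.run(demo())  # every yielded item should be processed
```

A real test for `_AsyncCooperatorAdapter` would additionally need to run under the Twisted reactor and check the concurrency cap, but the "nothing gets lost" property above is the baseline worth covering.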
I love the documentation entry on how to handle waiting on Deferreds now! When releasing 2.5, we should probably remember to add a brief mention of this to the release notes as a backward-incompatible change (on an experimental feature, but still), linking to the section for further details.
I am proposing some changes documentation-wise, but otherwise this looks great to me.
Looks great @wRAR! +1 to merge, after merging in a cleanup by @Gallaecio.
I only got a superficial understanding of how the code works, but what I understood makes total sense to me :) The tests also make sense, and the overall approach looks right.
Shouldn't we also update
@auxsvr On the one hand, we haven't thought about this, and doing it shouldn't be necessary to merge this feature, as existing CrawlSpider-based spiders should continue to work and this support can be added separately (if you suspect this is not true, please voice your suspicions). On the other hand, it should be possible to convert any CrawlSpider-based spider to a normal one if you want to use async callbacks.
Refactor the asynchronous `process_spider_output` documentation
A fix for the typing issues: wRAR#2 (and wRAR@56e2eea specifically)
Asyncio parse fixes
Overwhelmingly amazing!
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Tremendous work, thanks @wRAR!
This PR adds support for defining `async def parse_*` callbacks and using both `yield` and `await` inside them, with or without the asyncio reactor. There is one caveat: if the asyncio reactor is enabled, you cannot await on Deferreds directly and need to wrap them. I've added this to the docs.

This change was tested on two real-world projects; for one of them I also replaced the use of `inline_requests` for getting HTTP responses inside a callback with `treq` or `aiohttp` alternatives. I didn't test the performance, but as far as I can see the existing non-async callbacks shouldn't have any significant code path changes, so there should be no regressions, and if there are any performance problems with async callbacks we can solve them later.

This doesn't require or support `async def start_requests`.
My main concern with this is the large and non-intuitive `scrapy.utils.defer._AsyncCooperatorAdapter`. I would like to replace it with something better if that's possible, but for now it seems to work correctly. I've tried to document it as comprehensively as possible.

I've split the change into several commits with different features in case this helps with the review.