CloseSpider can be raised on spider_idle signal handler to set the closing reason #5191
Codecov Report
@@ Coverage Diff @@
## master #5191 +/- ##
==========================================
- Coverage 88.19% 84.11% -4.08%
==========================================
Files 162 162
Lines 10497 10502 +5
Branches 1517 1518 +1
==========================================
- Hits 9258 8834 -424
- Misses 965 1407 +442
+ Partials 274 261 -13
Example of a case where this would be useful: https://stackoverflow.com/questions/65688851/scrapy-change-closing-reason-from-finished-to-myreason
Hey! The feature makes total sense to me.
Hi @kmike. I've added a test case and also documented a possible use case in the signals documentation page. I have a problem with the test
This functionality makes a lot of sense to me too, nice contribution @ivanprado 👍
Looks great to me!
That’s an issue in
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Thank you @Gallaecio for your review 🍻. I've accepted all your suggestions.
Thank you everybody for the reviews and comments, and for merging it :-)
The execution of a spider might have been successful or not depending on many conditions. For example, you might have expected many items to be extracted, but none were. It would be useful to be able to evaluate at the end of the crawl whether the results are satisfactory and, if not, close the spider with an appropriate reason (e.g. "too_few_results") instead of the regular "finished".
The `spider_idle` signal handler looks like the perfect place to put such logic. At this stage, all the work has been done, so all the stats are ready to be queried to decide the final status. The problem I found is that it is not possible to set the closing reason in this handler.

In this PR I propose to reuse the exception `CloseSpider` to provide the closing reason. The required changes are small, and it seems to fit.

Let me know what you think and, if you agree on the functionality, what else would be required (tests, etc.).