Spider middleware: catch spider callback exceptions early #4272

Merged
merged 6 commits into from Feb 7, 2020

elacuesta
Member

@elacuesta elacuesta commented Jan 10, 2020

Fixes #4260.

Evaluates the output iterable right after the spider callback runs, as is already done in the process_spider_output chain.

(Plus some minor styling adjustments)
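
To illustrate why the output iterable has to be evaluated and not just obtained, here is a minimal sketch in plain Python. It does not use Scrapy; `callback`, `call_only`, and `call_and_evaluate` are hypothetical names chosen for this example. It assumes the callback is a generator function, so its body only runs once something iterates over the result.

```python
def callback(response):
    # A generator callback: nothing in the body runs until iteration starts.
    yield {"url": response}
    raise ValueError("boom")


def call_only(func, response):
    """Guard only the call; the generator is returned unevaluated."""
    try:
        return func(response)
    except ValueError:
        return iter(["recovered"])


def call_and_evaluate(func, response):
    """Guard the call and the iteration, so lazily raised errors are caught."""
    def evaluate(iterable):
        try:
            for item in iterable:
                yield item
        except ValueError:
            yield "recovered"

    try:
        result = func(response)
    except ValueError:
        return iter(["recovered"])
    return evaluate(result)


# Guarding only the call: the exception escapes when the consumer iterates.
try:
    list(call_only(callback, "http://example.com"))
    escaped = False
except ValueError:
    escaped = True

# Guarding the iteration: the exception is handled inside the wrapper.
items = list(call_and_evaluate(callback, "http://example.com"))
```

In the first case the `try`/`except` around the call never sees the error, because calling a generator function only creates the generator object. Wrapping the iteration itself is what lets the error be handled at the point closest to the callback.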

@elacuesta elacuesta added the bug label Jan 10, 2020
@codecov

codecov bot commented Jan 10, 2020

Codecov Report

Merging #4272 into master will decrease coverage by 0.31%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4272      +/-   ##
==========================================
- Coverage   84.06%   83.75%   -0.32%     
==========================================
  Files         166      166              
  Lines        9730     9880     +150     
  Branches     1454     1468      +14     
==========================================
+ Hits         8180     8275      +95     
- Misses       1296     1352      +56     
+ Partials      254      253       -1
Impacted Files                                Coverage Δ
scrapy/utils/python.py                        80.1% <100%> (ø) ⬆️
scrapy/core/spidermw.py                       100% <100%> (ø) ⬆️
scrapy/robotstxt.py                           75.3% <0%> (-22.23%) ⬇️
scrapy/utils/test.py                          49.35% <0%> (-8.99%) ⬇️
scrapy/utils/ftp.py                           23.8% <0%> (-6.2%) ⬇️
scrapy/pipelines/files.py                     61.66% <0%> (-3.99%) ⬇️
scrapy/core/downloader/handlers/datauri.py    93.33% <0%> (-0.79%) ⬇️
scrapy/core/downloader/handlers/http10.py     100% <0%> (ø) ⬆️
scrapy/crawler.py                             89.26% <0%> (-0.36%) ⬇️
scrapy/http/response/text.py                  100% <0%> (ø) ⬆️
... and 17 more

@elacuesta elacuesta added this to the v2.0 milestone Jan 22, 2020
scrapy/core/spidermw.py
@kmike kmike removed this from the v2.0 milestone Jan 30, 2020
         result = method(response=response, result=result, spider=spider)
     except Exception as ex:
         exception_result = process_spider_exception(Failure(ex), method_index+1)
         if isinstance(exception_result, Failure):
             raise
         return exception_result
     if _isiterable(result):
-        result = evaluate_iterable(result, method_index)
+        result = _evaluate_iterable(result, method_index+1, recovered)
Member

@kmike kmike Jan 31, 2020

I wonder if there is a better name for the method_index argument of the _evaluate_iterable function, because method_index+1 is what gets passed here. Something like process_exception_index?

Member Author

@elacuesta elacuesta Feb 5, 2020

Renamed to exception_processor_index, thanks!
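
As a rough illustration of what the renamed argument conveys, the sketch below models the idea in plain Python. It is a simplified model and not Scrapy's actual implementation: `evaluate_iterable`, `processors`, `first`, and `second` are hypothetical names, and the assumption is that exception handling should start at the middleware after the one whose output failed.

```python
def evaluate_iterable(iterable, exception_processor_index, processors, recovered):
    """Yield items from `iterable`; on error, let later processors recover."""
    try:
        for item in iterable:
            yield item
    except Exception as ex:
        # Start at the given index so earlier processors are skipped.
        for processor in processors[exception_processor_index:]:
            result = processor(ex)
            if result is not None:
                recovered.extend(result)
                return
        # No processor recovered: re-raise the original exception.
        raise


def failing_output():
    yield 1
    raise RuntimeError("middleware output failed")


seen = []


def first(ex):
    seen.append("first")
    return None


def second(ex):
    seen.append("second")
    return ["fallback"]


processors = [first, second]
recovered = []
# The iterable came from the processor at index 0, so handling starts at 0 + 1.
items = list(evaluate_iterable(failing_output(), 1, processors, recovered))
```

Because the value passed is the index where exception processing begins, and not the index of the method that produced the iterable, a name like `exception_processor_index` describes it more directly than `method_index`.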

@kmike
Member

kmike commented Feb 7, 2020

Thanks @elacuesta!

@kmike kmike merged commit 957681b into scrapy:master Feb 7, 2020
2 checks passed
@elacuesta elacuesta deleted the spider-middleware branch Feb 7, 2020
Successfully merging this pull request may close these issues.

First Spider Middleware does not process exception for generator callback
3 participants