
[MRG+1] Rule.process_request: access Response object #3682

Merged (10 commits) on Mar 27, 2019

Conversation

@elacuesta (Member) commented Mar 13, 2019

Allow the process_request callable passed to Rule objects to receive, as a second positional argument, the Response from which the Request was extracted.

This change grants access to previously collected information in the Response.meta dict, for instance, which is not currently possible as far as I can tell (at least not without overriding private methods 🤷‍♂️). It is implemented in a backward-compatible manner: request processing functions that take only one argument still work.

Updated docs, added tests for the existing and new functionality.
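To illustrate the new signature, here is a hedged sketch of a new-style callback. Minimal stand-in classes replace Scrapy's Request/Response so the snippet runs without Scrapy installed; in a real CrawlSpider the function would be passed as `Rule(LinkExtractor(...), process_request=process_request)`, and the `session_id` meta key is an invented example.

```python
# Stand-in objects so this sketch runs without Scrapy installed.
class StubResponse:
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}

class StubRequest:
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}

def process_request(request, response):
    """New-style callback: also receives the Response the link came from."""
    # Carry previously collected data forward -- the use case this PR enables.
    request.meta["session_id"] = response.meta.get("session_id")
    return request

source = StubResponse("https://example.com/page", meta={"session_id": "abc123"})
extracted = process_request(StubRequest("https://example.com/next"), source)
print(extracted.meta["session_id"])  # -> abc123
```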

@codecov bot commented Mar 13, 2019

Codecov Report

Merging #3682 into master will increase coverage by 0.2%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #3682     +/-   ##
=========================================
+ Coverage   84.54%   84.75%   +0.2%     
=========================================
  Files         167      168      +1     
  Lines        9420     9471     +51     
  Branches     1402     1407      +5     
=========================================
+ Hits         7964     8027     +63     
+ Misses       1199     1188     -11     
+ Partials      257      256      -1
Impacted Files Coverage Δ
scrapy/spiders/crawl.py 82.35% <100%> (+3.97%) ⬆️
scrapy/http/__init__.py 100% <0%> (ø) ⬆️
scrapy/http/request/json_request.py 93.75% <0%> (ø)
scrapy/settings/default_settings.py 98.64% <0%> (ø) ⬆️
scrapy/item.py 98.48% <0%> (+0.07%) ⬆️
scrapy/extensions/feedexport.py 84.9% <0%> (+6.43%) ⬆️

@elacuesta elacuesta force-pushed the rule_process_request_response_parameter branch from 382d01c to 92bbc52 Mar 16, 2019
Review thread on the following excerpt from scrapy/spiders/crawl.py:

```python
    Wrapper around the request processing function to maintain backward compatibility
    with functions that do not take a Response object as parameter.
    """
    argcount = self.process_request.__code__.co_argcount
```
@raphapassini (Contributor) commented Mar 19, 2019

Very elegant solution :)

@Gallaecio (Member) commented Mar 22, 2019

What about a deprecation warning when the specified function only expects one argument?

@elacuesta (Member, Author) commented Mar 22, 2019

That's a good idea, I'll get to it, thanks!

@kmike (Member) commented Mar 22, 2019

There are also some related utilities in scrapy.utils.python: get_func_args and get_spec. I wonder if we can use an existing function. If not, I think this code should be extracted to a new helper in scrapy.utils.python, with dedicated tests for it.

@elacuesta (Member, Author) commented Mar 22, 2019

Awesome, get_func_args did the trick.
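For readers unfamiliar with that helper: `scrapy.utils.python.get_func_args` returns a callable's argument names. A simplified, inspect-based equivalent (an assumption for illustration, not Scrapy's actual implementation):

```python
import inspect

def get_func_args_sketch(func):
    # Simplified stand-in for scrapy.utils.python.get_func_args:
    # return the names of the callable's positional parameters.
    return [
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]

def process_request(request, response):
    return request

print(get_func_args_sketch(process_request))  # -> ['request', 'response']
```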

@raphapassini (Contributor) commented Mar 19, 2019

LGTM

Review comment on docs/topics/spiders.rst (outdated, resolved)

@elacuesta (Member, Author) commented Mar 22, 2019

Updated: a ScrapyDeprecationWarning is now logged if the process_request function takes only one argument.
Currently, the warning is logged only when the function is "compiled" (I had to move the compiling code; the alternative was to check on every function call, which would produce a lot of warnings). It would arguably be better to warn when the Rule is instantiated, but rules need to be compiled from the CrawlSpider object they belong to, because process_request can be a string naming a spider method, in which case there is no way to analyze the function from the Rule's constructor.
However, as far as I understand, Rule objects are not really meant for standalone use and are not supposed to be ready until they are compiled, so the above is probably not an issue in practice.
Looking forward to reading your comments.
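A sketch of emitting the deprecation warning once at compile time, as described above. This uses the stdlib `warnings` and `inspect` modules for illustration; the actual PR uses ScrapyDeprecationWarning and Scrapy's own argument-inspection helper.

```python
import inspect
import warnings

def compile_process_request(func):
    """Sketch: warn once when the rule is compiled, not on every request."""
    if len(inspect.signature(func).parameters) == 1:
        warnings.warn(
            "process_request callbacks accepting only a request are "
            "deprecated; accept (request, response) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return lambda request, response: func(request)
    return func

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    wrapped = compile_process_request(lambda request: request)

print(len(caught))               # -> 1  (warned once, at compile time)
print(wrapped("req", "resp"))    # -> req
```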

@Gallaecio Gallaecio changed the title Rule.process_request: access Response object [MRG+1] Rule.process_request: access Response object Mar 26, 2019
@kmike (Member) commented Mar 27, 2019

Looks good, thanks @elacuesta!

@kmike kmike merged commit ec719f5 into scrapy:master Mar 27, 2019
3 checks passed
@kmike kmike added this to the v1.7 milestone Mar 27, 2019
@elacuesta elacuesta deleted the rule_process_request_response_parameter branch Mar 28, 2019
4 participants