-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Callback kwargs #3563
[MRG+1] Callback kwargs #3563
Conversation
8b9dee0
to
a67f1ce
Compare
Codecov Report
@@ Coverage Diff @@
## master #3563 +/- ##
=========================================
- Coverage 85.46% 85.4% -0.06%
=========================================
Files 169 169
Lines 9666 9682 +16
Branches 1440 1443 +3
=========================================
+ Hits 8261 8269 +8
- Misses 1157 1166 +9
+ Partials 248 247 -1
|
75c2285
to
bddfeab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elacuesta great work here 👏
Looks good to me, but I left some comments.
Besides those:
kwargs
does not sound good, it seems to be Request's kwargs and not the callback- I like
callback_kwargs
- The same kwargs for the callback should be used for the failure one (keep compatible with
meta
)
Maybe, the best solution would be using partial functions, but IMHO, they'd look verbose and not pythonic. Unless you know some library to help with partial functions 😄
Yeah I like |
I decided to go with |
f8c9e3b
to
6760bca
Compare
tests/spiders.py
Outdated
@@ -28,6 +28,45 @@ def closed(self, reason): | |||
self.meta['close_reason'] = reason | |||
|
|||
|
|||
class KeywordArgumentsSpider(MockServerSpider): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a test which checks how Scrapy behaves if there is a mismatch between parameters parse
accept and parameters passed via callback kwargs, e.g. some required argument is missing, or an extra argument is passed? Is exception raised? Does traceback look good (no need to write a test for it)? Is errback called?
Another case which could be nice to check explicitly is how default values are handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added tests for defaults and argument mismatch.
Errback is not called with this current implementation, it does if we add the callback and the errback to the Deferred in two separate steps, i.e.
dfd.addCallback(request.callback or spider.parse, **request.cb_kwargs)
dfd.addErrback(request.errback)
instead of
dfd.addCallbacks(
callback=request.callback or spider.parse,
errback=request.errback,
callbackKeywords=request.cb_kwargs)
In any case, the Request
object is not bounded to the Failure
received by the errback. Personally, I don't think calling the errback would be appropriate here, since it's not an error with the request/response itself, but with the code that handles it. The logged error is very descriptive, and similar to the one that currently appears when, for instance, a callback does not take a second positional argument (TypeError: parse() takes 1 positional argument but 2 were given
).
By searching our docs for "meta" it is possible to find a few other places where docs may need an update:
I could miss something - it'd be great to check all cases "meta" is mentioned somewhere. |
All other things equal, I'd prefer either |
0ccca1a
to
70a4d93
Compare
Updated/replaced several occurences of |
Thanks for the work @elacuesta, and thanks for a careful review @Gallaecio and @ejulio! The PR looks great. I think we can merge it without errback kwargs support, and discuss this feature separately. May I ask for one additional test though? It would be awesome if a middleware can change request.cb_kwargs, and changes have an effect; it is not clear if this is supported. This feature would enable some cool use cases, mostly related to dependency injection, similar to how pytest fixtures work. For example, it'd be possible to implement this: class MySpider(scrapy.Spider):
# ...
def parse(self, response):
# ... business as usual
def parse_page(self, response, cookiejar: scrapy.CookieJar):
# we ask for a current cookiejar object;
# cookie middleware inspects `callback.__annotations__`, figures out
# user wants to read / write current cookies, and provides
# (injects) cookiejar object. or similar features - e.g. it may be useful to integrate more tightly with browsers (ask for a browser in addition to response), etc. It is not clear that such API is the way to go, but it'd be awesome if we make sure it is possible to implement, that we're not closing the door for this. |
@kmike Added tests for downloader/spider middlewares. |
That's awesome it works @elacuesta, thanks for checking it and adding more tests! Regarding actually implementing cookiejar feature, it may need a bit of thought. For example, @pawelmhm raised a good point in #1878 (comment): cookiejar API is not adequate, it is hard to use. So probably if we make this feature built-in, we may need to have some wrapper to make working with cookies straightforward. We may also think if that's possible to fix some other issues using the same API: e.g. how to "fork" a cookiejar (copy current cookies, but update cookiejars separately afterwards). It seems this all needs a proposal and a separate discussion. Some starter implementation can aid discussion, but overall it looks more complex than making a PR out of gist. It'd probably be the easiest to release this middleware as a gist snippet, or a small Python package in the meantime, so that people can start using it, while waiting for a solution in Scrapy itself. |
Thanks! I thought about making a small package with that gist, I'll do that 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate of #
Fixes #1138
This is just a first approach. It's currently lacking docs and tests, I'll add those if the implementation is good.Update: added tests and docsI see (at least) two points for discussion:
errback_kwargs
or something like that. On the other hand, the request object is available in the failure received by the errback,failure.request.cb_kwargs
gives access to the arguments, so I think it shouldn't be necessary.I'm not a fan of theUpdate Renamed tokwargs
name, I think it could be easily confused with Python's own "kwargs" naming convention, i.e., people could understand that any remaining keyword argument passed to the Request constructor will be passed to the callbacks. Is "callback_kwargs" too verbose? Maybe it's not compatible with the previous point.cb_kwargs
Sample spider:
Output:
Tasks
Request
changesResponse.follow
changesScraper
changes/cc @kmike @dangra