[MRG+1] Create Request from curl command #3862
Conversation
Based on comments from #2985 (@kmike), I have renamed the function, added docs, added some tests, and implemented […]. I have also improved the function by adding […]. My idea is to add more tests covering special cases (I got some ideas from https://github.com/NickCarneiro/curlconverter/tree/master/fixtures/curl_commands) and, of course, to implement all the missing features. Any ideas or feedback? :)
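For context, a minimal sketch of the kind of usage this PR aims to enable; the curl command, URL, and headers below are made up for illustration, and only the Request.from_curl name and behaviour come from the diff discussed in this thread:

    from scrapy import Request

    # Build a Request directly from a copied curl command. The command is
    # purely illustrative; --data makes curl (and hence the converted
    # request) use the POST method.
    request = Request.from_curl(
        "curl 'https://example.org/api' "
        "-H 'Accept: application/json' "
        "--data 'q=scrapy'"
    )

    print(request.method)   # expected: POST
    print(request.url)      # expected: https://example.org/api
    print(request.headers)  # expected to include the Accept header
    print(request.body)     # expected: b'q=scrapy'

The exact set of supported curl options is what the rest of this thread discusses.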
Codecov Report
@@ Coverage Diff @@
## master #3862 +/- ##
==========================================
+ Coverage 85.56% 85.63% +0.07%
==========================================
Files 164 167 +3
Lines 9565 9742 +177
Branches 1435 1457 +22
==========================================
+ Hits 8184 8343 +159
- Misses 1133 1146 +13
- Partials 248 253 +5
Changes look great, thanks for picking this up.
I think there are a few pages of the documentation that would greatly benefit from covering this feature, if you are up to it:
@noviluni just a heads up: we want to make a Scrapy release next week; if you finish this PR this week, we'll get it in :)
@kmike That would be great, but I need feedback about what to do with unrecognized parameters (see my comment above). Apart from that, we could explain in the docs that support is limited and that certain parameters are not allowed.
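To make that concrete, here is a rough sketch of how the two behaviours under discussion could look from the user's side; the --foo flag is a deliberately fake option, the default of ignore_unknown_options is assumed, and the exact exception type raised in strict mode is left unspecified:

    from scrapy import Request

    # Lenient mode (ignore_unknown_options, as seen in the diff below,
    # assumed to default to True): the fake --foo option would simply be
    # skipped, possibly with a warning.
    request = Request.from_curl("curl 'https://example.org' --foo")

    # Strict mode: the same command could be rejected instead, making the
    # limited option support explicit.
    try:
        Request.from_curl("curl 'https://example.org' --foo",
                          ignore_unknown_options=False)
    except Exception as error:  # exact exception type not specified here
        print("unsupported curl option:", error)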
Nice work with the documentation topics. I’ve just added a few suggestions.
Yes. Maybe we should even list all the specific parameters that are supported. And, if we end up raising an error on unrecognized parameters, we should also list supported parameters that have no impact on the resulting request.
It looks good to me.
Further improvements (supporting more cURL options, allowing certain middlewares to be overridden, enforcing cURL’s default user agent when no User-Agent header is provided) can come later based on user feedback.
scrapy/http/request/__init__.py
Outdated
""" | ||
request_kwargs = curl_to_request_kwargs(curl_command, ignore_unknown_options) | ||
request_kwargs.update(kwargs) | ||
return Request(**request_kwargs) |
I wonder if it should use cls instead of Request; I’m not sure about this. Opinions?
Oh, that's right... It’s not as trivial as it seems. Suppose someone uses from_curl from JSONRequest: JSONRequest.from_curl('curl example.org').
Using Request, the result would be:
{'_encoding': 'utf-8',
'method': 'GET',
'_url': 'http://example.org',
'_body': b'',
'priority': 0,
'callback': None,
'errback': None,
'cookies': {},
'headers': {},
'dont_filter': False,
'_meta': None,
'_cb_kwargs': None,
'flags': []}
Using cls, the result would be:
{'_dumps_kwargs': {'sort_keys': True},
'_encoding': 'utf-8',
'method': 'GET',
'_url': 'http://example.org',
'_body': b'',
'priority': 0,
'callback': None,
'errback': None,
'cookies': {},
'headers': {b'Content-Type': b'application/json',
b'Accept': b'application/json, text/javascript, */*; q=0.01'},
'dont_filter': False,
'_meta': None,
'_cb_kwargs': None,
'flags': []}
(The same happens with XmlRpcRequest.)
I think the first approach could be better, because it respects the curl command, but I understand that for some users it could be counter-intuitive... it seems like a UX dilemma. In any case, we should probably add something to the docstring. What do you think? @Gallaecio any thoughts?
If we are not using cls, maybe we should reimplement the new class method in our built-in Request subclasses to raise a NotImplementedError exception.
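For reference, a sketch of what that alternative could look like; the JsonRequestExample class is illustrative only (Scrapy’s real JSON request class lives in scrapy.http), and the signature simply mirrors the diff hunk above:

    from scrapy import Request


    class JsonRequestExample(Request):
        """Illustrative subclass, not the code proposed in this PR."""

        @classmethod
        def from_curl(cls, curl_command, ignore_unknown_options=True, **kwargs):
            # Under this alternative, subclasses would refuse the shortcut
            # rather than silently building a request whose subclass-specific
            # defaults conflict with the curl command.
            raise NotImplementedError(
                "from_curl() is only supported on Request itself"
            )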
I think it's the same situation as with middlewares: Scrapy should try to create the request with the inputs specified, and if they are then overridden by the class itself or by the middlewares, you just need to be prepared for that to happen and not expect 100% conformance.
That seems like a valid argument. We could use cls and extend the "caution" message about the middlewares, adding that if the method is used from Request subclasses they could make some modifications to the request.
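In other words, the classmethod would instantiate the calling class, roughly as in this sketch; the wrapper class and the scrapy.utils.curl import path are assumptions made only to keep the example self-contained:

    from scrapy import Request
    from scrapy.utils.curl import curl_to_request_kwargs  # assumed import path


    class CurlAwareRequest(Request):
        """Illustrative wrapper; in the PR this method sits on Request itself."""

        @classmethod
        def from_curl(cls, curl_command, ignore_unknown_options=True, **kwargs):
            # Parse the curl command into Request keyword arguments, let
            # explicit keyword arguments win, and build the calling class so
            # that subclasses such as JSONRequest keep their own defaults.
            request_kwargs = curl_to_request_kwargs(curl_command, ignore_unknown_options)
            request_kwargs.update(kwargs)
            return cls(**request_kwargs)

The only change relative to the hunk reviewed above is cls in place of Request.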
I added a commit with that idea. @kmike @Gallaecio tell me what you think. :)
Thanks @noviluni! This looks really good. I left a late comment about which class the from_curl constructor should create, but I have no strong opinion on this.
LGTM, thanks @noviluni, @immerrr and @Gallaecio!
The Python 3.7 test failures look like Scrapy’s typical random test failures.
This is a WIP, but I am opening this PR to discuss how to implement this feature.
Based on: #2985 (abandoned)