Support transparent mapping of Scrapy requests to Zyte Data API requests #41
Are we OK with making the feature opt-out, or should it be opt-in?
I'm leaning towards having it opt-out (enabled by default) since it seems more natural to write the requests this way. +1 on the current setup.
Better suggestions for new settings? Are the two future-proofing-related settings worth adding?
Could you clarify the expected behaviors for ZYTE_API_UNSUPPORTED_HEADERS and ZYTE_API_BROWSER_HEADERS? I'm not entirely sure if I understood how they work.
Any additional considerations to make the plugin more future-proof? Does any of the approaches go too far?
I think the ZYTE_API_AUTOMAP setting captures most of the cases that we need. Great work!
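To make the opt-out idea concrete, here is a minimal sketch. It assumes ZYTE_API_AUTOMAP is a boolean setting that is enabled unless explicitly disabled; the helper name `automap_enabled` and the plain dict standing in for Scrapy settings are hypothetical and not taken from the plugin.

```python
# Hypothetical sketch: a plain dict stands in for Scrapy settings.
# Assumption: the feature is opt-out, so a missing setting means "enabled".
DEFAULT_AUTOMAP = True


def automap_enabled(settings: dict) -> bool:
    """Return whether transparent request mapping applies for these settings."""
    return bool(settings.get("ZYTE_API_AUTOMAP", DEFAULT_AUTOMAP))
```

With this shape, a project that never mentions the setting gets the mapping, and a project that sets it to False gets the previous behavior.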
Are we OK with the approach to warning, and the scenarios causing a warning? Should we be more lenient with headers about warnings? Is it OK to warn against request metadata usage for all keys that can be defined through request attributes?
The route of issuing a warning sounds great. Though I think we need to tweak it a bit so that, whenever a warning is issued, it is clear to users what is actually sent (i.e. which values take precedence).
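One way such a warning could spell out the winning value is sketched below. This is only an illustration: the function name `merge_params` is hypothetical, and the rule that explicitly set metadata values win over automatically mapped ones is an assumption, not something settled in this thread.

```python
import warnings


def merge_params(automapped: dict, explicit: dict) -> dict:
    """Merge two parameter dicts and warn about every conflicting key.

    Assumption: values from `explicit` (request metadata) take precedence
    over values in `automapped` (derived from request attributes).
    """
    merged = dict(automapped)
    for key, value in explicit.items():
        if key in automapped and automapped[key] != value:
            # Name both candidate values and state which one is sent.
            warnings.warn(
                f"Parameter {key!r} is set both through the request "
                f"({automapped[key]!r}) and through request metadata "
                f"({value!r}); sending {value!r}.",
                stacklevel=2,
            )
        merged[key] = value
    return merged
```

The message names both candidate values and the one that is sent, so the user does not have to know the precedence rule to understand the outcome.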
hey! I was thinking about being able to do the following 3 things in a spider, at the same time:
Use case: extract data from some pages, or use some browser action, while downloading everything else as usual. So it seems it's neither "opt-in" nor "opt-out"; it's almost like two separate features (1+2 vs 3), which exist in parallel. Does it make sense?
1 would be the default behavior with the proposed implementation, and could be disabled through a setting. 2 would also work with the proposed implementation. But you would get a warning if you try to control through Zyte API parameters something that you can control through Request parameters. For 3, what about defining a
First off: I am more than open to removing them altogether. The idea is for these parameters to allow flexibility to support some future Zyte API changes without needing to upgrade to a newer scrapy-zyte-api version. If tomorrow Zyte API starts allowing to set the Similar for
Codecov Report
@@            Coverage Diff             @@
##             main      #41      +/-   ##
==========================================
+ Coverage   99.41%   99.69%   +0.27%
==========================================
  Files           4        4
  Lines         172      323     +151
==========================================
+ Hits          171      322     +151
  Misses          1        1
Tests refactored to use the new _get_api_params instead of instantiating a mock server every time. They are also split into groups, but otherwise are basically the same as before. However, I also made the following changes as I was working on that refactoring:
The implementation is growing horrible, but I think we can make it clean once we figure out what behavior we want API-wise.
tests/test_api_requests.py
{
    "httpResponseBody": True,
    "httpResponseHeaders": True,
    "httpRequestBody": "YQ==",
Should we also ensure that the httpRequestMethod is not GET when the body is set?
I believe it is technically possible for it to be. If it was not technically possible, it should be Scrapy itself complaining, not this plugin.
I am not sure whether or not Zyte API itself allows it, but if it does not, I think it may be best to let Zyte API be the one that complains, similarly to how I think we should allow httpResponseBody and browserHtml to be combined, even though Zyte API does not currently support that.
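The position above (map what the request says and leave validation to Zyte API) can be sketched as follows. `FakeRequest` and `to_api_params` are hypothetical stand-ins, not the plugin's code. Omitting httpRequestMethod for GET is an assumption; the Base64 encoding matches the test scenario, where "YQ==" is the encoding of the single byte "a".

```python
import base64
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request (only the fields used here)."""

    url: str
    method: str = "GET"
    body: bytes = b""


def to_api_params(request: FakeRequest) -> dict:
    """Map request attributes to API parameters without cross-validating them."""
    params = {
        "url": request.url,
        "httpResponseBody": True,
        "httpResponseHeaders": True,
    }
    if request.method != "GET":
        # Assumption: GET is the API default, so it is left out.
        params["httpRequestMethod"] = request.method
    if request.body:
        # The body is sent as Base64 text; a GET with a body is not rejected here.
        params["httpRequestBody"] = base64.b64encode(request.body).decode("ascii")
    return params
```

A GET request with a body passes through unchanged in this sketch, so any rejection would come from the API rather than from the mapping layer.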
Co-authored-by: Kevin Lloyd Bernal <kevinoxy@gmail.com>
@kmike I backtracked on what we discussed elsewhere about marking
…nd do not enable httpResponseHeaders as a side effect of browserHtml
I have documented I also realized that there was no reason for
Looks awesome, thanks @Gallaecio!
OK, there are no further comments, so let's merge it :)
Built on top of #40, this pull request aims to address point 4 from #40 (comment).
Please, ignore the actual implementation at the time being, and instead have a look at the test scenarios introduced here. I think we should discuss changes, additions and removals around those before moving forward. Questions to discuss include:
To do:
Resolves #12, resolves #16, resolves #17, resolves #19.