Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Request subclass for json requests #3504 #3505

Merged
merged 13 commits into from Mar 22, 2019
Merged

Conversation

@kasun
Copy link
Contributor

@kasun kasun commented Nov 24, 2018

PR for #3504

This is a more convenient subclass of Request when dealing with json requests. This would allow something like this.

from scrapy.http import JSONRequest

data = {
    'name': 'value',
}
yield JSONRequest(url, data=data)

Unit tests are included. Will work on documentation depending on the feedback.

@lopuhin
Copy link
Member

@lopuhin lopuhin commented Nov 26, 2018

Thanks for PR @kasun ! I didn't check the code really carefully, but this sounds like a useful feature to me, since we already have FormRequest and XmlRpcRequest, and given that scraping often involves calling APIs directly which often accept JSON payload.
Any other opinions?

@codecov
Copy link

@codecov codecov bot commented Nov 27, 2018

Codecov Report

Merging #3505 into master will increase coverage by 0.03%.
The diff coverage is 93.93%.

@@            Coverage Diff             @@
##           master    #3505      +/-   ##
==========================================
+ Coverage   84.48%   84.52%   +0.03%     
==========================================
  Files         167      168       +1     
  Lines        9405     9438      +33     
  Branches     1397     1402       +5     
==========================================
+ Hits         7946     7977      +31     
- Misses       1201     1202       +1     
- Partials      258      259       +1
Impacted Files Coverage Δ
scrapy/http/__init__.py 100% <100%> (ø) ⬆️
scrapy/http/request/json_request.py 93.75% <93.75%> (ø)

Copy link
Member

@lopuhin lopuhin left a comment

@kasun thanks for the PR, left some comments below

scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
tests/test_http_request.py Outdated Show resolved Hide resolved
@lopuhin
Copy link
Member

@lopuhin lopuhin commented Nov 27, 2018

Will work on documentation depending on the feedback.

I think feedback is positive, so if you could also add the docs that would be awesome 👍

Copy link

@asadurski asadurski left a comment

I second @lopuhin, I think that's a very useful feature.

scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
tests/test_http_request.py Outdated Show resolved Hide resolved
scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
@lopuhin
Copy link
Member

@lopuhin lopuhin commented Dec 10, 2018

Thanks @kasun going to have a look now.

Jessie test failure here https://travis-ci.org/scrapy/scrapy/builds/465618944 (not linking directly as it produces a huge log) looks completely unrelated to the PR, has anyone seen it before?

Copy link
Member

@lopuhin lopuhin left a comment

@kasun thanks for moving this forward 👍
I think PR is almost ready, left some comments below.

docs/topics/request-response.rst Outdated Show resolved Hide resolved
docs/topics/request-response.rst Outdated Show resolved Hide resolved
docs/topics/request-response.rst Outdated Show resolved Hide resolved
docs/topics/request-response.rst Show resolved Hide resolved
scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
@kasun
Copy link
Contributor Author

@kasun kasun commented Dec 11, 2018

@lopuhin Suggested changes are done. Please check.

The only remaining thing is the jessie test failure.

@lopuhin
Copy link
Member

@lopuhin lopuhin commented Dec 11, 2018

Thanks @kasun ,

The only remaining thing is the jessie test failure.

Looks like it's failing on all builds, so please ignore it here.

scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
@lopuhin
Copy link
Member

@lopuhin lopuhin commented Dec 11, 2018

I don't have any other comments besides sort_keys=True question. @kmike do you have some extra comments or questions?

Copy link
Member

@lopuhin lopuhin left a comment

Looks good to me, thanks @kasun 👍
Our policy is to have at least two approvals, and I think @kmike wanted to have a look too.

@lopuhin lopuhin changed the title Request subclass for json requests #3504 [MRG+1] Request subclass for json requests #3504 Dec 12, 2018
docs/topics/request-response.rst Outdated Show resolved Hide resolved
warnings.warn('Both body and data passed. data will be ignored')

elif not body_passed and data_passed:
kwargs['body'] = json.dumps(data, sort_keys=True)
Copy link
Member

@kmike kmike Dec 14, 2018

There are other kwargs which can be passed to dumps function, most importantly - ensure_ascii=False; without it some request bodies can be 2x larger than needed. What do you think about adding dumps_kwargs parameter, which allows to override keyword arguments passed to json.dumps (including ensure_ascii and sort_keys)?

I think it can also be a good idea to move serialization to a method; it would make it easier to e.g. make a subclass which uses ujson instead of stdlib json library.

Copy link
Contributor Author

@kasun kasun Dec 17, 2018

Agree. Changed.

warnings.simplefilter("always")
super(JSONRequestTest, self).setUp()

def test_data(self):
Copy link
Member

@kmike kmike Dec 14, 2018

would you mind split this test method into several methods, for individual cases?

Copy link
Member

@kmike kmike Dec 14, 2018

Also, could you please add a test for .replace method (make sure replacing body works; does replacing data work?)

Copy link
Contributor Author

@kasun kasun Dec 17, 2018

Test for replacing data is done. Testing for replacing body is already in the base RequestTest class which JSONRequestTest class inherits from.


def dump(self, data, **kwargs):
"""Convert to JSON """
return json.dumps(data, sort_keys=True, **kwargs)
Copy link
Member

@kmike kmike Jan 14, 2019

I think sort_keys=True default should be set in self.dumps_kwargs, not here; otherwise users won't be able to disable sort_keys via dumps_kwargs.

Copy link
Contributor Author

@kasun kasun Jan 14, 2019

@kmike Agree. Btw do you think users should be allowed to disable sort_keys? Do you see a use case for that?

Copy link
Contributor Author

@kasun kasun Jan 14, 2019

Now sort_key=True is set in self._dumps_kwargs. Let me know if you think we should allow to override it. I think if we allow it, although very unlikely users have the chance to theoretically mess up the duplicate filter if they turn off sort_keys.


return super(JSONRequest, self).replace(*args, **kwargs)

def dump(self, data, **kwargs):
Copy link
Member

@kmike kmike Jan 14, 2019

I think it is better to call this method "dumps", not "dump", because json.dump does a different thing.

Copy link
Member

@kmike kmike Jan 14, 2019

I also think that either kwargs should be removed from this method, or it should be made private (_dumps). With the current code the following is not doing what's expected:

req = JsonRequest(url, data=data, dumps_kwargs={...})
foo = req.dump({'x': 5})  # this doesn't use dumps_kwargs

Copy link
Contributor Author

@kasun kasun Jan 14, 2019

This is made private while also removing **kwargs from the method.

scrapy/http/request/json_request.py Outdated Show resolved Hide resolved
@kmike kmike added this to the v1.7 milestone Feb 6, 2019
@codecov
Copy link

@codecov codecov bot commented Feb 6, 2019

Codecov Report

Merging #3505 into master will decrease coverage by 0.53%.
The diff coverage is 93.93%.

@@            Coverage Diff             @@
##           master    #3505      +/-   ##
==========================================
- Coverage   84.48%   83.95%   -0.54%     
==========================================
  Files         167      168       +1     
  Lines        9405     9453      +48     
  Branches     1397     1408      +11     
==========================================
- Hits         7946     7936      -10     
- Misses       1201     1256      +55     
- Partials      258      261       +3
Impacted Files Coverage Δ
scrapy/http/__init__.py 100% <100%> (ø) ⬆️
scrapy/http/request/json_request.py 93.75% <93.75%> (ø)
scrapy/core/downloader/handlers/s3.py 62.9% <0%> (-32.26%) ⬇️
scrapy/utils/boto.py 46.66% <0%> (-26.67%) ⬇️
scrapy/core/downloader/tls.py 77.5% <0%> (-12.5%) ⬇️
scrapy/extensions/feedexport.py 70.33% <0%> (-8.14%) ⬇️
scrapy/core/downloader/handlers/http11.py 89.92% <0%> (-2.62%) ⬇️
scrapy/core/scraper.py 86.48% <0%> (-2.03%) ⬇️
scrapy/pipelines/files.py 67.15% <0%> (-1.1%) ⬇️
... and 1 more

@kmike kmike changed the title [MRG+1] Request subclass for json requests #3504 [MRG+2] Request subclass for json requests #3504 Mar 22, 2019
@kmike
Copy link
Member

@kmike kmike commented Mar 22, 2019

Thanks @kasun, I think that's fine to merge it. @lopuhin do you want to re-check it, since there were new commits after your approval?

@lopuhin
Copy link
Member

@lopuhin lopuhin commented Mar 22, 2019

Looks great, thank you @kasun and thanks everyone for reviews 👍

@lopuhin lopuhin merged commit af2b666 into scrapy:master Mar 22, 2019
3 checks passed
@kasun
Copy link
Contributor Author

@kasun kasun commented Mar 23, 2019

Thanks everyone for the reviews.

@kmike kmike mentioned this pull request Jun 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants