Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Request subclass for json requests #3504 #3505

Merged
merged 13 commits into from Mar 22, 2019
32 changes: 32 additions & 0 deletions docs/topics/request-response.rst
Expand Up @@ -508,6 +508,38 @@ method for this job. Here's an example spider which uses it::

# continue scraping with authenticated session...

JSONRequest
-----------

The JSONRequest class extends the base :class:`Request` class with functionality for
dealing with JSON requests.

.. class:: JSONRequest(url, [data, ...])
kmike marked this conversation as resolved.
Show resolved Hide resolved

The :class:`JSONRequest` class adds a new argument to the constructor called data. The
remaining arguments are the same as for the :class:`Request` class and are
not documented here.

Using the :class:`JSONRequest` will set the ``Content-Type`` header to ``application/json``
and ``Accept`` header to ``application/json, text/javascript, */*; q=0.01``

:param data: is any JSON serializable object that needs to be JSON encoded and assigned to body.
if :attr:`Request.body` argument is provided this parameter will be ignored.
if :attr:`Request.body` argument is not provided and data argument is provided :attr:`Request.method` will be
set to ``'POST'`` automatically.
:type data: JSON serializable object

JSONRequest usage example
-------------------------

Sending a JSON POST request with a JSON payload::

data = {
'name1': 'value1',
'name2': 'value2',
}
yield JSONRequest(url='http://www.example.com/post/action', data=data)
lopuhin marked this conversation as resolved.
Show resolved Hide resolved


Response objects
================
Expand Down
1 change: 1 addition & 0 deletions scrapy/http/__init__.py
Expand Up @@ -10,6 +10,7 @@
from scrapy.http.request import Request
from scrapy.http.request.form import FormRequest
from scrapy.http.request.rpc import XmlRpcRequest
from scrapy.http.request.json_request import JSONRequest

from scrapy.http.response import Response
from scrapy.http.response.html import HtmlResponse
Expand Down
31 changes: 31 additions & 0 deletions scrapy/http/request/json_request.py
@@ -0,0 +1,31 @@
"""
This module implements the JSONRequest class which is a more convenient class
(than Request) to generate JSON Requests.

See documentation in docs/topics/request-response.rst
"""

import json
import warnings

from scrapy.http.request import Request


class JSONRequest(Request):
def __init__(self, *args, **kwargs):
body_passed = kwargs.get('body', None) is not None
data = kwargs.pop('data', None)
data_passed = data is not None

if body_passed and data_passed:
warnings.warn('Both body and data passed. data will be ignored')
kmike marked this conversation as resolved.
Show resolved Hide resolved

elif not body_passed and data_passed:
kwargs['body'] = json.dumps(data, sort_keys=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other kwargs which can be passed to dumps function, most importantly - ensure_ascii=False; without it some request bodies can be 2x larger than needed. What do you think about adding dumps_kwargs parameter, which allows to override keyword arguments passed to json.dumps (including ensure_ascii and sort_keys)?

I think it can also be a good idea to move serialization to a method; it would make it easier to e.g. make a subclass which uses ujson instead of stdlib json library.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Changed.


if 'method' not in kwargs:
kwargs['method'] = 'POST'

super(JSONRequest, self).__init__(*args, **kwargs)
self.headers.setdefault('Content-Type', 'application/json')
self.headers.setdefault('Accept', 'application/json, text/javascript, */*; q=0.01')
64 changes: 63 additions & 1 deletion tests/test_http_request.py
Expand Up @@ -2,14 +2,16 @@
import cgi
import unittest
import re
import json
import warnings

import six
from six.moves import xmlrpc_client as xmlrpclib
from six.moves.urllib.parse import urlparse, parse_qs, unquote
if six.PY3:
from urllib.parse import unquote_to_bytes

from scrapy.http import Request, FormRequest, XmlRpcRequest, Headers, HtmlResponse
from scrapy.http import Request, FormRequest, XmlRpcRequest, JSONRequest, Headers, HtmlResponse
from scrapy.utils.python import to_bytes, to_native_str


Expand Down Expand Up @@ -1147,5 +1149,65 @@ def test_latin1(self):
self._test_request(params=(u'pas£',), encoding='latin1')


class JSONRequestTest(RequestTest):
request_class = JSONRequest
default_method = 'GET'
default_headers = {b'Content-Type': [b'application/json'], b'Accept': [b'application/json, text/javascript, */*; q=0.01']}

def setUp(self):
warnings.simplefilter("always")
super(JSONRequestTest, self).setUp()

def test_data(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind split this test method into several methods, for individual cases?

Copy link
Member

@kmike kmike Dec 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could you please add a test for .replace method (make sure replacing body works; does replacing data work?)

Copy link
Contributor Author

@kasun kasun Dec 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test for replacing data is done. Testing for replacing body is already in the base RequestTest class which JSONRequestTest class inherits from.

r1 = self.request_class(url="http://www.example.com/")
self.assertEqual(r1.body, b'')
self.assertEqual(r1.method, 'GET')

body = b'body'
r2 = self.request_class(url="http://www.example.com/", body=body)
self.assertEqual(r2.body, body)
self.assertEqual(r2.method, 'GET')

data = {
'name': 'value',
}
r3 = self.request_class(url="http://www.example.com/", data=data)
self.assertEqual(r3.body, to_bytes(json.dumps(data)))
self.assertEqual(r3.method, 'POST')

r4 = self.request_class(url="http://www.example.com/", data=[])
self.assertEqual(r4.body, to_bytes(json.dumps([])))
self.assertEqual(r4.method, 'POST')

with warnings.catch_warnings(record=True) as _warnings:
r5 = self.request_class(url="http://www.example.com/", body=body, data=data)
self.assertEqual(r5.body, body)
self.assertEqual(r5.method, 'GET')
self.assertEqual(len(_warnings), 1)
self.assertIn('data will be ignored', str(_warnings[0].message))

with warnings.catch_warnings(record=True) as _warnings:
r6 = self.request_class(url="http://www.example.com/", body=b'', data=data)
self.assertEqual(r6.body, b'')
self.assertEqual(r6.method, 'GET')
self.assertEqual(len(_warnings), 1)
self.assertIn('data will be ignored', str(_warnings[0].message))

with warnings.catch_warnings(record=True) as _warnings:
r7 = self.request_class(url="http://www.example.com/", body=None, data=data)
self.assertEqual(r7.body, to_bytes(json.dumps(data)))
self.assertEqual(r7.method, 'POST')
self.assertEqual(len(_warnings), 0)

with warnings.catch_warnings(record=True) as _warnings:
r8 = self.request_class(url="http://www.example.com/", body=None, data=None)
self.assertEqual(r8.method, 'GET')
self.assertEqual(len(_warnings), 0)

def tearDown(self):
warnings.resetwarnings()
super(JSONRequestTest, self).tearDown()


if __name__ == "__main__":
unittest.main()