[MRG+1] Feed exports: beautify JSON and XML #2456
Changes from 4 commits
7e9153b
766b2c8
c7bb2fa
63b8caf
25535db
3a0a86e
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -140,7 +140,7 @@ output examples, which assume you're exporting these two items::
 BaseItemExporter
 ----------------

-.. class:: BaseItemExporter(fields_to_export=None, export_empty_fields=False, encoding='utf-8')
+.. class:: BaseItemExporter(fields_to_export=None, export_empty_fields=False, encoding='utf-8', indent=0)

    This is the (abstract) base class for all Item Exporters. It provides
    support for common features used by all (concrete) Item Exporters, such as
@@ -149,7 +149,7 @@ BaseItemExporter

    These features can be configured through the constructor arguments which
    populate their respective instance attributes: :attr:`fields_to_export`,
-   :attr:`export_empty_fields`, :attr:`encoding`.
+   :attr:`export_empty_fields`, :attr:`encoding`, :attr:`indent`.

    .. method:: export_item(item)

@@ -216,6 +216,15 @@ BaseItemExporter
       encoding). Other value types are passed unchanged to the specific
       serialization library.

+   .. attribute:: indent
+
+      Amount of spaces used to indent the output on each level. Defaults to ``0``.
+
+      * ``indent=None`` selects the most compact representation,
+        all items in the same line with no indentation
+      * ``indent<=0`` each item on it's own line, no indentation
+      * ``indent>0`` each item on it's own line, indentated with the provided numeric value
Review comment: indented
+
 .. highlight:: none

 XmlItemExporter
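To illustrate the three `indent` modes documented in this hunk, here is a minimal sketch that uses only the standard library. The helper name `export_json` is hypothetical and this is not Scrapy's implementation; it only mimics the documented behaviour, under the assumption that each item is serialized separately and the items are then joined inside a JSON array.

```python
import json


def export_json(items, indent=0):
    """Hypothetical helper mimicking the documented ``indent`` semantics."""
    # json.dumps(indent=0) would insert newlines *inside* each item, so a
    # non-positive indent is mapped to None to keep every item on one line.
    json_indent = indent if indent is not None and indent > 0 else None
    # indent=None means fully compact output: no newlines between items.
    newline = "" if indent is None else "\n"
    body = ("," + newline).join(
        json.dumps(item, indent=json_indent) for item in items
    )
    return "[" + newline + body + newline + "]"


items = [{"foo": ["bar"]}, {"key": "value"}]
print(export_json(items, None))  # everything on a single line
print(export_json(items, 0))     # one item per line, no indentation
print(export_json(items, 2))     # one item per line, members indented by 2
```

Mapping non-positive values to `None` before calling `json.dumps` is what separates "one item per line" from "pretty-print every member".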
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -209,6 +209,7 @@ These are the settings used for configuring the feed exports:
  * :setting:`FEED_STORE_EMPTY`
  * :setting:`FEED_EXPORT_ENCODING`
  * :setting:`FEED_EXPORT_FIELDS`
+ * :setting:`FEED_EXPORT_INDENT`

 .. currentmodule:: scrapy.extensions.feedexport

@@ -266,6 +267,21 @@ If an exporter requires a fixed set of fields (this is the case for
 is empty or None, then Scrapy tries to infer field names from the
 exported data - currently it uses field names from the first item.

+.. setting:: FEED_EXPORT_INDENT
+
+FEED_EXPORT_INDENT
+------------------
+
+Default: ``0``
+
+Amount of spaces used to indent the output on each level. If ``FEED_EXPORT_INDENT``
+is a non-negative integer, then array elements and object members will be pretty-printed
+with that indent level. An indent level of ``0``, or negative, will put each item on a new line.
+``None`` selects the most compact representation
+
+Currently used by :class:`~scrapy.exporters.JsonItemExporter`
+and :class:`~scrapy.exporters.XmlItemExporter`
Review comment: I think that'd be nice to mention something like "i.e. when you're exporting to .json or .xml".
Review comment: That was the spirit of the following comment ("Currently used by..."), perhaps I could emphasize that only those implement it for now.
Review comment: I mean that for user it may be unclear what these JsonItemExporter and XmlItemExporter mean - user almost never instantiates them directly, for user these classes are implementation details of how json or xml exports are implemented. User doesn't have to know that these classes are used when
Review comment: @elacuesta, @kmike, I took the liberty to push a commit with your proposal @kmike: 3a0a86e
Review comment: Thanks Paul. That's true, I'm sorry I missed it before.
Review comment: Hey @elacuesta, don't be sorry! Your work is great
+
 .. setting:: FEED_STORE_EMPTY

 FEED_STORE_EMPTY
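As a usage illustration for the new setting, the sketch below lists three settings dictionaries in the same shape the tests in this PR pass to the feed exporter. The constant names are hypothetical; only the `FEED_FORMAT` and `FEED_EXPORT_INDENT` keys and their meanings come from the diff above.

```python
# Hypothetical names; the keys and values follow the documentation in this PR.

# Most compact output: all items on one line, no indentation.
COMPACT_JSON = {'FEED_FORMAT': 'json', 'FEED_EXPORT_INDENT': None}

# The documented default (0): one item per line, no indentation.
ITEM_PER_LINE_JSON = {'FEED_FORMAT': 'json', 'FEED_EXPORT_INDENT': 0}

# Pretty-printed output: nested elements indented by 4 spaces per level.
PRETTY_XML = {'FEED_FORMAT': 'xml', 'FEED_EXPORT_INDENT': 4}
```

According to the documentation added here, the setting currently affects only the JSON and XML exporters; the tests in this PR show `jsonlines` and `csv` output unchanged when it is set to `None`.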
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -319,14 +319,14 @@ def test_export_no_items_not_store_empty(self):
     @defer.inlineCallbacks
     def test_export_no_items_store_empty(self):
         formats = (
-            ('json', b'[\n\n]'),
+            ('json', b'[]'),
             ('jsonlines', b''),
             ('xml', b'<?xml version="1.0" encoding="utf-8"?>\n<items></items>'),
             ('csv', b''),
         )

         for fmt, expctd in formats:
-            settings = {'FEED_FORMAT': fmt, 'FEED_STORE_EMPTY': True}
+            settings = {'FEED_FORMAT': fmt, 'FEED_STORE_EMPTY': True, 'FEED_EXPORT_INDENT': None}
             data = yield self.exported_no_data(settings)
             self.assertEqual(data, expctd)

@@ -425,25 +425,177 @@ def test_export_encoding(self):
         header = ['foo']

         formats = {
-            'json': u'[\n{"foo": "Test\\u00d6"}\n]'.encode('utf-8'),
+            'json': u'[{"foo": "Test\\u00d6"}]'.encode('utf-8'),
             'jsonlines': u'{"foo": "Test\\u00d6"}\n'.encode('utf-8'),
             'xml': u'<?xml version="1.0" encoding="utf-8"?>\n<items><item><foo>Test\xd6</foo></item></items>'.encode('utf-8'),
             'csv': u'foo\r\nTest\xd6\r\n'.encode('utf-8'),
         }

-        for format in formats:
-            settings = {'FEED_FORMAT': format}
+        for format, expected in formats.items():
+            settings = {'FEED_FORMAT': format, 'FEED_EXPORT_INDENT': None}
             data = yield self.exported_data(items, settings)
-            self.assertEqual(formats[format], data)
+            self.assertEqual(expected, data)

         formats = {
-            'json': u'[\n{"foo": "Test\xd6"}\n]'.encode('latin-1'),
+            'json': u'[{"foo": "Test\xd6"}]'.encode('latin-1'),
             'jsonlines': u'{"foo": "Test\xd6"}\n'.encode('latin-1'),
             'xml': u'<?xml version="1.0" encoding="latin-1"?>\n<items><item><foo>Test\xd6</foo></item></items>'.encode('latin-1'),
             'csv': u'foo\r\nTest\xd6\r\n'.encode('latin-1'),
         }

-        for format in formats:
-            settings = {'FEED_FORMAT': format, 'FEED_EXPORT_ENCODING': 'latin-1'}
+        settings = {'FEED_EXPORT_INDENT': None, 'FEED_EXPORT_ENCODING': 'latin-1'}
+        for format, expected in formats.items():
+            settings['FEED_FORMAT'] = format
             data = yield self.exported_data(items, settings)
-            self.assertEqual(formats[format], data)
+            self.assertEqual(expected, data)
+
+    @defer.inlineCallbacks
+    def test_export_indentation(self):
+        items = [
+            {'foo': ['bar']},
+            {'key': 'value'},
+        ]
+
+        test_cases = [
+            # JSON
+            {
+                'format': 'json',
+                'indent': None,
+                'expected': b'[{"foo": ["bar"]},{"key": "value"}]',
+            },
+            {
+                'format': 'json',
+                'indent': -1,
+                'expected': b"""[
+{"foo": ["bar"]},
+{"key": "value"}
+]""",
+            },
+            {
+                'format': 'json',
+                'indent': 0,
+                'expected': b"""[
+{"foo": ["bar"]},
+{"key": "value"}
+]""",
+            },
+            {
+                'format': 'json',
+                'indent': 2,
+                'expected': b"""[
+{
+  "foo": [
+    "bar"
+  ]
+},
+{
+  "key": "value"
+}
+]""",
Review comment: hm, is this correct that here there is an empty line in the beginning on the file instead of a newline at the end, and with indent=0 there are new lines both in the beginning and in the end? Wouldn't it be better not to have an empty line in the beginning and always have a new line in the end (unless indent is None)?
Review comment: That newline is there only to make the tests more pretty, it does not end up in the final expected result (note the
+            },
+            {
+                'format': 'json',
+                'indent': 4,
+                'expected': b"""[
+{
+    "foo": [
+        "bar"
+    ]
+},
+{
+    "key": "value"
+}
+]""",
+            },
+            {
+                'format': 'json',
+                'indent': 5,
+                'expected': b"""[
+{
+     "foo": [
+          "bar"
+     ]
+},
+{
+     "key": "value"
+}
+]""",
+            },
+
+            # XML
+            {
+                'format': 'xml',
+                'indent': None,
+                'expected': b"""<?xml version="1.0" encoding="utf-8"?>
+<items><item><foo><value>bar</value></foo></item><item><key>value</key></item></items>""",
+            },
+            {
+                'format': 'xml',
+                'indent': -1,
+                'expected': b"""<?xml version="1.0" encoding="utf-8"?>
+<items>
+<item><foo><value>bar</value></foo></item>
+<item><key>value</key></item>
+</items>""",
+            },
+            {
+                'format': 'xml',
+                'indent': 0,
+                'expected': b"""<?xml version="1.0" encoding="utf-8"?>
+<items>
+<item><foo><value>bar</value></foo></item>
+<item><key>value</key></item>
+</items>""",
+            },
+            {
+                'format': 'xml',
+                'indent': 2,
+                'expected': b"""<?xml version="1.0" encoding="utf-8"?>
+<items>
+  <item>
+    <foo>
+      <value>bar</value>
+    </foo>
+  </item>
+  <item>
+    <key>value</key>
+  </item>
+</items>""",
+            },
+            {
+                'format': 'xml',
+                'indent': 4,
+                'expected': b"""<?xml version="1.0" encoding="utf-8"?>
+<items>
+    <item>
+        <foo>
+            <value>bar</value>
+        </foo>
+    </item>
+    <item>
+        <key>value</key>
+    </item>
+</items>""",
+            },
+            {
+                'format': 'xml',
+                'indent': 5,
+                'expected': b"""<?xml version="1.0" encoding="utf-8"?>
+<items>
+     <item>
+          <foo>
+               <value>bar</value>
+          </foo>
+     </item>
+     <item>
+          <key>value</key>
+     </item>
+</items>""",
+            },
+        ]
+
+        for row in test_cases:
+            settings = {'FEED_FORMAT': row['format'], 'FEED_EXPORT_INDENT': row['indent']}
+            data = yield self.exported_data(items, settings)
+            print(row['format'], row['indent'])
+            self.assertEqual(row['expected'], data)
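The XML cases in this test can be read as three rules: `None` emits no whitespace at all, a non-positive value puts each `<item>` on its own line, and a positive value also indents every nested element. The sketch below reproduces those rules with plain string building. The helper name `export_xml` is hypothetical, the XML declaration is omitted, and nothing here is taken from Scrapy's exporter code; the element names `items`, `item` and `value` are copied from the expected strings in the test.

```python
from xml.sax.saxutils import escape


def export_xml(items, indent=0):
    """Hypothetical helper mimicking the indent rules exercised by the test."""
    pretty = indent is not None and indent > 0   # indent nested elements
    item_nl = "" if indent is None else "\n"     # newline between items
    nl = "\n" if pretty else ""                  # newline between fields

    def pad(depth):
        return " " * (indent * depth) if pretty else ""

    def field(name, value, depth):
        out = pad(depth) + "<%s>" % name
        if isinstance(value, dict):
            out += nl
            for key, sub in value.items():
                out += field(key, sub, depth + 1)
            out += pad(depth)
        elif isinstance(value, (list, tuple)):
            out += nl
            for sub in value:
                # List entries are wrapped in <value>, as in the test output.
                out += field("value", sub, depth + 1)
            out += pad(depth)
        else:
            out += escape(str(value))
        return out + "</%s>" % name + nl

    out = "<items>" + item_nl
    for item in items:
        out += pad(1) + "<item>" + nl
        for name, value in item.items():
            out += field(name, value, 2)
        out += pad(1) + "</item>" + item_nl
    return out + "</items>"


print(export_xml([{"foo": ["bar"]}, {"key": "value"}], indent=2))
```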
Review comment: its