process XML data before pretty-printing to trim whitespace #172

Merged
merged 2 commits into from

4 participants

@unsignedint

When the XML data received is already 'pretty', the current pretty-printing process adds extra newline characters. I've modified the code to strip whitespace from the source input before pretty-printing, which seems to resolve this.

Input:

>>> raw = '<a>\n  <b>blah</b>\n  <c>123</c>\n</a>\n'
>>> print raw
<a>
  <b>blah</b>
  <c>123</c>
</a>

Current:

>>> print parseString(raw).toprettyxml(indent=' '*4)
<?xml version="1.0" ?>
<a>


    <b>
        blah
    </b>


    <c>
        123
    </c>


</a>

With patch:

>>> processed = ''.join((x.strip() for x in raw.split('\n')))
>>> print parseString(processed).toprettyxml(indent=' '*4)
<?xml version="1.0" ?>
<a>
    <b>
        blah
    </b>
    <c>
        123
    </c>
</a>
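
For reference, the line-stripping step above can be sketched as a small Python 3 helper. The name `strip_line_whitespace` is hypothetical and not part of httpie. One caveat under this approach: whitespace at the start or end of any line inside a multi-line text node is removed as well.

```python
from xml.dom.minidom import parseString


def strip_line_whitespace(raw):
    """Drop leading/trailing whitespace from every line, then join the lines."""
    return ''.join(line.strip() for line in raw.split('\n'))


raw = '<a>\n  <b>blah</b>\n  <c>123</c>\n</a>\n'
processed = strip_line_whitespace(raw)
print(processed)

# With the inter-element whitespace gone, minidom has no whitespace-only
# text nodes left to surround with extra newlines.
pretty = parseString(processed).toprettyxml(indent=' ' * 4)
print(pretty)
```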
@unsignedint

OK, my GitHub skills are lacking, but commit bee10e5 is an alternative solution that uses ElementTree (and the indent function from http://effbot.org/zone/element-lib.htm#prettyprint).

The difference is that the output formatting then keeps each element's text on the same line as its tags, rather than adding line breaks around it. I.e. compare the version above with:

<a>
    <b>123</b>
    <c>456</c>
</a>
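
A minimal Python 3 sketch of that ElementTree approach, adapted from the effbot recipe linked above. The standalone `indent` function and the sample input are illustrative assumptions, not the code from the commit. It should produce the compact form shown above.

```python
from xml.etree import ElementTree


def indent(elem, indent_text='    ', level=0):
    """Rewrite whitespace-only text/tail in place so children are indented."""
    i = '\n' + level * indent_text
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + indent_text
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        child = None
        for child in elem:
            indent(child, indent_text, level + 1)
        # The last child's tail precedes the parent's closing tag,
        # so it is set back to the parent's own indentation level.
        if not child.tail or not child.tail.strip():
            child.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i


root = ElementTree.fromstring('<a>\n  <b>123</b>\n  <c>456</c>\n</a>\n')
indent(root)
result = ElementTree.tostring(root, encoding='unicode')
print(result)
```

Because only whitespace-only `text` and `tail` values are rewritten, element content such as `123` is left untouched.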
@jkbrzt
Owner

@unsignedint Thanks for the pull request! Would you mind adding a couple of test cases to showcase and verify the new behaviour?

@maedox

Am I right in assuming this did not get merged yet?
I am mostly working with XML input and output, and the two extra blank lines per line of XML are driving me nuts. :panda_face:

@skurfer

Am I right in assuming this did not get merged yet?

Looks like it’s waiting on tests, but you can try it now. (I am. Huge improvement.)

git clone https://github.com/jkbr/httpie.git
cd httpie
git checkout -b xml master
git pull https://github.com/unsignedint/httpie master
python setup.py install

That last step can be done in a virtualenv, etc. Since the version number in master matches the version on PyPI (for now), you should still see future updates.

@unsignedint, since your first commit is largely replaced, you can squash it with the second to make it look like you did it right the first time. :smiley: (Try something like git rebase -i HEAD~2.) But then you’d have to force-push to get the changes here, which may or may not be frowned upon, depending on the project’s policy.

@jkbrzt commented on the diff
httpie/output.py
((24 lines not shown))
     def process_body(self, content, content_type, subtype, encoding):
         if subtype == 'xml':
             try:
-                # Pretty print the XML
-                doc = xml.dom.minidom.parseString(content.encode(encoding))
-                content = doc.toprettyxml(indent=' ' * DEFAULT_INDENT)
-            except xml.parsers.expat.ExpatError:
+                root = ElementTree.fromstring(content.encode(encoding))
+                self.indent(root)
@jkbrzt Owner
jkbrzt added a note

Isn't XMLProcessor.indent() static?

@jkbrzt merged commit 733771f into jkbrzt:master
@jkbrzt
Owner

Merged. Tests are still welcome though :wink:

@unsignedint

Thanks for merging... TBH I started looking at tests but got lost somewhere with mocks... will try to revisit it in the near future :)

Oh you are right about the static thing, I probably didn't notice it. I could change that but guess it doesn't matter.
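
To illustrate why the call still works: in Python a `@staticmethod` can be invoked through an instance as well as through the class, and no `self` is passed in either case. The `Processor` class below is a hypothetical stand-in, not httpie's `XMLProcessor`.

```python
class Processor(object):
    """Hypothetical stand-in used only to compare the two call styles."""

    @staticmethod
    def indent(text, width=4):
        return ' ' * width + text

    def process(self, text):
        # Calling through the instance: indent() still receives only `text`.
        return self.indent(text)


via_instance = Processor().process('x')
via_class = Processor.indent('x')
print(via_instance == via_class)
```

Calling through the class makes the lack of instance state explicit; the behaviour is otherwise the same.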

Showing with 25 additions and 5 deletions.
  1. +25 −5 httpie/output.py
30 httpie/output.py
@@ -2,7 +2,7 @@
 """
 import json
-import xml.dom.minidom
+from xml.etree import ElementTree
 from functools import partial
 from itertools import chain
@@ -406,13 +406,33 @@ class XMLProcessor(BaseProcessor):
     """XML body processor."""
     # TODO: tests
+    # in-place prettyprint formatter
+    # c.f. http://effbot.org/zone/element-lib.htm#prettyprint
+    @staticmethod
+    def indent(elem, indent_text=' ' * DEFAULT_INDENT):
+        def _indent(elem, level=0):
+            i = "\n" + level * indent_text
+            if len(elem):
+                if not elem.text or not elem.text.strip():
+                    elem.text = i + indent_text
+                if not elem.tail or not elem.tail.strip():
+                    elem.tail = i
+                for elem in elem:
+                    _indent(elem, level + 1)
+                if not elem.tail or not elem.tail.strip():
+                    elem.tail = i
+            else:
+                if level and (not elem.tail or not elem.tail.strip()):
+                    elem.tail = i
+        return _indent(elem)
+
     def process_body(self, content, content_type, subtype, encoding):
         if subtype == 'xml':
             try:
-                # Pretty print the XML
-                doc = xml.dom.minidom.parseString(content.encode(encoding))
-                content = doc.toprettyxml(indent=' ' * DEFAULT_INDENT)
-            except xml.parsers.expat.ExpatError:
+                root = ElementTree.fromstring(content.encode(encoding))
+                self.indent(root)
+                content = ElementTree.tostring(root)
+            except ElementTree.ParseError:
                 # Ignore invalid XML errors (skips attempting to pretty print)
                 pass
         return content
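
The parse-or-fall-back behaviour of `process_body` can be sketched in isolation as follows. This is not the merged code: it swaps in the standard library's `ElementTree.indent` (available from Python 3.9) for the hand-written helper, `pretty_xml` is a hypothetical name, and the value of `DEFAULT_INDENT` is assumed because its definition is not shown in the diff.

```python
from xml.etree import ElementTree

DEFAULT_INDENT = 4  # assumed value; the constant's definition is not in the diff


def pretty_xml(content):
    """Return indented XML, or the input unchanged if it does not parse."""
    try:
        root = ElementTree.fromstring(content)
    except ElementTree.ParseError:
        # Invalid XML: skip pretty-printing and pass the body through as-is.
        return content
    ElementTree.indent(root, space=' ' * DEFAULT_INDENT)  # Python 3.9+
    return ElementTree.tostring(root, encoding='unicode')


good = pretty_xml('<a><b>1</b></a>')
bad = pretty_xml('<a><b></a>')
print(good)
print(bad)
```

Something along these lines could also serve as a starting point for the tests requested in the thread: one well-formed input and one malformed input cover both branches.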