-
-
Notifications
You must be signed in to change notification settings - Fork 32.7k
bpo-14465: xml.etree.ElementTree pretty printing #4016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
Generates a string representation of an XML element, including all | ||
subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is | ||
the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to | ||
generate a Unicode string (otherwise, a bytestring is generated). *method* | ||
is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). | ||
*short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. | ||
*pretty* enables pretty printing with default indent. Keyword-only *indent* | ||
allows to set custom indent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
allows to set → allows setting (or allows one to set, etc)
@@ -935,7 +946,8 @@ ElementTree Objects | |||
The keyword-only *short_empty_elements* parameter controls the formatting | |||
of elements that contain no content. If ``True`` (the default), they are | |||
emitted as a single self-closed tag, otherwise they are emitted as a pair | |||
of start/end tags. | |||
of start/end tags. *indent* enables and set indentation level for the | |||
human readable output. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enables and sets
human-readable
@vadmium Thanks for the doc review - I've fixed the mistakes. |
|
||
Generates a string representation of an XML element, including all | ||
subelements. *element* is an :class:`Element` instance. *encoding* [1]_ is | ||
the output encoding (default is US-ASCII). Use ``encoding="unicode"`` to | ||
generate a Unicode string (otherwise, a bytestring is generated). *method* | ||
is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). | ||
*short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. | ||
*pretty* enables pretty printing with default indent. Keyword-only *indent* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please name it pretty_print
instead for lxml compatibility.
Does this slow down the normal serialisation in any noticeable way? |
As for serialization performance - here are some simple tests. I've tested with current master and my branch with and without pretty printing. I did tostring serialization on 207KB xml file.
So the new version is 14% slower. Pretty printing is 17% slower than non pretty printing. Overall, pretty printing enabled code is 34% slower than original version from master. I'm not sure if it's really scary. Do we have such comparison for lxml? |
I suggest you to use perf timeit, timeit is not reliable. |
Here are the benchmarks using
This one shows that new version is 12% slower. Pretty printing adds 10%. I don't think that slowing down because of pretty printing is a big deal because if you need fast serialization you don't need a pretty output. I'm gonna see how lxml behaves with and without pretty printing, cause I think I'll see the same percentage. What I'm worried about is 12% slowing against master version. Should we really stress it? Because I think if you really need to serialize a lot of XML and need it fast you'll gonna use lxml anyway. |
97783fb
to
1e9149c
Compare
@dzeban I think it's unfair to assume users will always use
|
I don't see how a rare and "mostly for debugging" feature like pretty printing could possibly justify such a big slow-down for a time critical feature like serialisation.
"Many" seems highly exaggerated.
A surprising argument, given that almost all of the documentation (and tutorials and books and ...) for
You shouldn't believe everything that random people write on the Internet. Why do you think there is a dedicated subclassing API in lxml? |
@scoder Valid counterpoints. Though, on your last point:
I was referencing an answer I wrote myself based on the documentation for
Upon further review, it appears |
While I'm in favor of adding an option for pretty printing, I'm not sure it's worth a 12% decrease in performance. Could the performance impact be lessened if the |
@Stevoisiak do you mean to remove |
@Stevoisiak This feels like a serious ungrateful bikeshedding to me. Could we first add this feature, which will finally remove for me the last and only reason why to use |
That's true for many use cases but not for all. ElementTree is actually very memory efficient and very fast on certain API operations. Building trees is something that can be done very fast in ElementTree but is not quite as fast in lxml. If the serialisation then slows it down so much afterwards, that is a problem. I would rather like to see the serialisation performance of ElementTree improved, definitely not further slowed down. |
The point I wanted to make is, that the question of performance of the ElementTree as found in the standard library is quite orthogonal to whether I would still have to pepper my code with silliness like: # from http://infix.se/2007/02/06/gentlemen-indent-your-xml
def __xmlIndent(self,elem,level=0):
i = "\n" + level*" "
if len(elem):
if not elem.text or not elem.text.strip():
elem.text = i + " "
for e in elem:
self.__xmlIndent(e, level+1)
if not e.tail or not e.tail.strip():
e.tail = i + " "
if not e.tail or not e.tail.strip():
e.tail = i
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = i Performance is the never ending story (https://bugs.python.org/issue14465#msg323690 promises more holy wars over |
Well, you can't just drop a performance regression on everyone just to get a feature implemented that many people won't use, and that is particularly uninteresting for performance relevant cases (where the amount of data is large and you wouldn't want to inflate it further). This is simply the wrong tradeoff.
What about providing a pull request that takes that code and either a) adds it as an |
Well, of course, the
It wouldn't solve all problems. I would really prefer having compatible API (of course, again, defaulting to |
Ah, but the performance regression of this PR was measured to be 12% when it the option is disabled and another 10% when enabling the option. I wouldn't mind the pretty printing case being slower, but I am arguing against making the normal case so much slower. |
Rejecting this PR since it slows down the normal (non pretty-printed) serialisation. |
This PR adds pretty printing to the xml.etree.ElementTree module, namely to the
tostring
,tostringlist
functions andwrite
method of the ElementTree object.Pretty printing feature saves backward compatibility in a sense that old output won't be changed.
https://bugs.python.org/issue14465