Slight ElementTree serialization performance enhancement for trees with str tags #118687

danifus · 2024-05-07T06:52:44Z

Feature or enhancement

Proposal:

This proposal speeds up XML serialization for trees whose tag names are predominantly strings. This comes at the cost of slightly slower serialization for trees whose tags are predominantly QNames.

As far as I'm aware, using a str for the tag name is more common than using a QName, so we should optimise for that scenario. For example, parsing an XML document with ElementTree returns Elements whose tags are all strings.
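A quick way to check the claim about parsed trees is the sketch below. The sample document and the `urn:example` namespace are made up for illustration; the point is only that namespaced tags come back as `"{uri}local"` strings rather than QName objects.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one namespaced child, one plain child.
SAMPLE = '<root xmlns:x="urn:example"><x:child/><plain/></root>'

root = ET.fromstring(SAMPLE)

# Collect the type of every tag in the parsed tree.
tag_types = {type(elem.tag) for elem in root.iter()}

# Namespaced tags are expanded to "{uri}local" strings by the parser.
namespaced_tag = root[0].tag
```

Here `tag_types` contains only `str`, so a tree that comes straight from the parser always takes the str branch in the loop shown in the diff.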

Reordering the following if block so that the isinstance(tag, str) check comes first gives a performance improvement of 1 to 1.5% on a tree parsed from a file of about 300 kB:

--- a/Lib/xml/etree/ElementTree.py
+++ b/Lib/xml/etree/ElementTree.py
@@ -827,12 +827,12 @@ def add_qname(qname):
     # populate qname and namespaces table
     for elem in elem.iter():
         tag = elem.tag
-        if isinstance(tag, QName):
-            if tag.text not in qnames:
-                add_qname(tag.text)
-        elif isinstance(tag, str):
+        if isinstance(tag, str):
             if tag not in qnames:
                 add_qname(tag)
+        elif isinstance(tag, QName):
+            if tag.text not in qnames:
+                add_qname(tag.text)
         elif tag is not None and tag is not Comment and tag is not PI:
             _raise_serialization_error(tag)
         for key, value in elem.items():

Because this change sits inside a loop that traverses the entire XML document, the relative improvement grows with the size of the tree: the traversal accounts for a larger share of the total time than the fixed setup code.
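To get a feel for the effect in isolation, here is a minimal micro-benchmark sketch. It is not the real serializer code: `collect_qname_first`, `collect_str_first` and `time_orderings` are hypothetical stand-ins that only mimic the two orderings of the isinstance checks, and the synthetic tree is an assumption, so the numbers will not match the 1 to 1.5% measured on a real write.

```python
import timeit
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import QName


def collect_qname_first(tags):
    """Mimic the current ordering: QName check before str check."""
    seen = set()
    for tag in tags:
        if isinstance(tag, QName):
            seen.add(tag.text)
        elif isinstance(tag, str):
            seen.add(tag)
    return seen


def collect_str_first(tags):
    """Mimic the proposed ordering: str check before QName check."""
    seen = set()
    for tag in tags:
        if isinstance(tag, str):
            seen.add(tag)
        elif isinstance(tag, QName):
            seen.add(tag.text)
    return seen


def time_orderings(tags, number=50):
    """Return the total seconds taken by each ordering over `number` runs."""
    return {
        fn.__name__: timeit.timeit(lambda: fn(tags), number=number)
        for fn in (collect_qname_first, collect_str_first)
    }


# Synthetic tree with str tags only, as a parsed document would have.
root = ET.Element("root")
for i in range(10_000):
    ET.SubElement(root, f"item{i % 50}")
tags = [elem.tag for elem in root.iter()]

if __name__ == "__main__":
    for name, seconds in time_orderings(tags).items():
        print(f"{name}: {seconds:.4f}s")
```

Both functions return the same set for any mix of str and QName tags, so the reordering only changes which isinstance call runs first. For a realistic measurement, time ElementTree.write or ET.tostring on an actual document before and after applying the patch.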

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response


danifus added the type-feature A feature request or enhancement label May 7, 2024
