Slight ElementTree serialization performance enhancement for trees with str tags #118687

Open
danifus opened this issue May 7, 2024 · 0 comments
Labels
type-feature A feature request or enhancement

danifus commented May 7, 2024

Feature or enhancement

Proposal:

This proposal speeds up writing XML for trees whose tag names are predominantly strings. This comes at the cost of slightly slower serialization for trees whose tags are predominantly QNames.

As far as I'm aware, using a str for the tag name is more common than using a QName, so we should optimise for that scenario. For example, parsing an XML document with ElementTree returns Elements whose tags are all strings.
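A minimal sketch of that claim, using a made-up document and the placeholder namespace `urn:example`. It shows that parsed elements carry plain `str` tags (namespaced ones in `{uri}local` form), while a `QName` tag appears only when the tree is built that way explicitly:

```python
import xml.etree.ElementTree as ET

# Hypothetical input; "urn:example" is a placeholder namespace.
parsed = ET.fromstring('<root xmlns:a="urn:example"><a:child/><plain/></root>')

# Every tag produced by the parser is a str, including the namespaced one.
parsed_tags = [elem.tag for elem in parsed.iter()]
parsed_tag_types = {type(elem.tag) for elem in parsed.iter()}

# A QName tag only shows up when the caller constructs the element with one.
built = ET.Element(ET.QName("urn:example", "child"))
built_tag_is_qname = isinstance(built.tag, ET.QName)

if __name__ == "__main__":
    print(parsed_tags)
    print(parsed_tag_types)
    print(built_tag_is_qname, built.tag.text)
```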

Reordering the following if block so that the isinstance(tag, str) check comes first gives a performance improvement of 1 to 1.5% on a tree parsed from a file of about 300 KB:

--- a/Lib/xml/etree/ElementTree.py
+++ b/Lib/xml/etree/ElementTree.py
@@ -827,12 +827,12 @@ def add_qname(qname):
     # populate qname and namespaces table
     for elem in elem.iter():
         tag = elem.tag
-        if isinstance(tag, QName):
-            if tag.text not in qnames:
-                add_qname(tag.text)
-        elif isinstance(tag, str):
+        if isinstance(tag, str):
             if tag not in qnames:
                 add_qname(tag)
+        elif isinstance(tag, QName):
+            if tag.text not in qnames:
+                add_qname(tag.text)
         elif tag is not None and tag is not Comment and tag is not PI:
             _raise_serialization_error(tag)
         for key, value in elem.items():
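The two branch orders can be compared in isolation with a sketch like the one below. The helper names `qname_first`, `str_first` and `run` are hypothetical and only mimic the shape of the if block in the diff; they are not the actual ElementTree code. Since `QName` does not subclass `str`, both orders return the same result for every tag, so only the number of `isinstance` calls per tag differs.

```python
import timeit
from xml.etree.ElementTree import QName


def qname_first(tag):
    """Hypothetical stand-in for the branch order before the change."""
    if isinstance(tag, QName):
        return tag.text
    elif isinstance(tag, str):
        return tag
    return None


def str_first(tag):
    """Hypothetical stand-in for the branch order after the change."""
    if isinstance(tag, str):
        return tag
    elif isinstance(tag, QName):
        return tag.text
    return None


def run(func, tags):
    # Apply the classifier to every tag, as the serializer loop would.
    for tag in tags:
        func(tag)


if __name__ == "__main__":
    workloads = {
        "str tags": ["item"] * 1000,
        "QName tags": [QName("urn:example", "item")] * 1000,
    }
    for label, tags in workloads.items():
        for func in (qname_first, str_first):
            seconds = timeit.timeit(lambda: run(func, tags), number=200)
            print(f"{label:10s} {func.__name__:12s} {seconds:.4f}s")
```

Timings from a micro-benchmark like this vary by machine and Python version, so they indicate direction rather than reproduce the 1 to 1.5% figure.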

Because this check sits inside a loop that traverses the entire XML document, the improvement grows with the size of the tree: traversal accounts for a larger share of the total time than the fixed setup code does.
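One way to observe how serialization time scales with tree size is a sketch along these lines. The helpers `build_tree` and `time_serialization` are hypothetical, and the synthetic tree (flat, str tags only) is an assumption rather than the 300 KB file used for the measurement above:

```python
import timeit
import xml.etree.ElementTree as ET


def build_tree(n_children):
    """Build a flat synthetic tree whose tags are all plain strings."""
    root = ET.Element("root")
    for i in range(n_children):
        child = ET.SubElement(root, "item", {"id": str(i)})
        child.text = "value"
    return root


def time_serialization(n_children, repeats=5):
    """Return the best-of-`repeats` wall time for one ET.tostring call."""
    root = build_tree(n_children)
    return min(timeit.repeat(lambda: ET.tostring(root), number=1, repeat=repeats))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n:6d} children: {time_serialization(n):.6f}s")
```

Running this once against an unpatched interpreter and once against a patched one would give a rough before/after comparison at each size.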

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
