This proposal improves the performance of writing XML trees whose tag names are predominantly strings. This comes at the cost of performance for trees whose tags are predominantly QNames.
As far as I'm aware, using a str for the tag name is more common than using a QName, so we should optimise for that scenario (for example, parsing an XML document with ElementTree returns Elements whose tags are all strings).
Reordering the following if block so that the isinstance(tag, str) check comes first gives a performance improvement of 1-1.5% when writing a tree parsed from a file of about 300 KB:
--- a/Lib/xml/etree/ElementTree.py
+++ b/Lib/xml/etree/ElementTree.py
@@ -827,12 +827,12 @@ def add_qname(qname):
     # populate qname and namespaces table
     for elem in elem.iter():
         tag = elem.tag
-        if isinstance(tag, QName):
-            if tag.text not in qnames:
-                add_qname(tag.text)
-        elif isinstance(tag, str):
+        if isinstance(tag, str):
             if tag not in qnames:
                 add_qname(tag)
+        elif isinstance(tag, QName):
+            if tag.text not in qnames:
+                add_qname(tag.text)
         elif tag is not None and tag is not Comment and tag is not PI:
             _raise_serialization_error(tag)
         for key, value in elem.items():
As this change is inside a loop that traverses the entire XML document, the relative improvement grows with the size of the tree, because the traversal accounts for a larger share of the total time than the fixed setup code.
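For anyone who wants to reproduce the effect in isolation, here is a rough micro-benchmark sketch. It is not the code from ElementTree.py: the helpers build_tree, collect_qname_first and collect_str_first are hypothetical names made up for this illustration, the tree is synthetic rather than parsed from a file, and the loop only mimics the order of the isinstance checks. Timings will vary by machine and Python version, so no expected numbers are given.

```python
import timeit
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import QName


def build_tree(n_children=10_000):
    """Build a synthetic tree whose tags are all plain strings (an assumption)."""
    root = ET.Element("root")
    for i in range(n_children):
        child = ET.SubElement(root, "item", {"id": str(i)})
        child.text = "value"
    return root


def collect_qname_first(root):
    """Mimic the current order: test for QName before str."""
    seen = set()
    for elem in root.iter():
        tag = elem.tag
        if isinstance(tag, QName):
            seen.add(tag.text)
        elif isinstance(tag, str):
            seen.add(tag)
    return seen


def collect_str_first(root):
    """Mimic the proposed order: test for str before QName."""
    seen = set()
    for elem in root.iter():
        tag = elem.tag
        if isinstance(tag, str):
            seen.add(tag)
        elif isinstance(tag, QName):
            seen.add(tag.text)
    return seen


if __name__ == "__main__":
    root = build_tree()
    for fn in (collect_qname_first, collect_str_first):
        # Take the best of several runs to reduce noise.
        best = min(timeit.repeat(lambda: fn(root), number=20, repeat=5))
        print(f"{fn.__name__}: {best:.4f}s")
```

Since QName is not a subclass of str, both orderings collect the same tag names; only the number of isinstance calls per element differs.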
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response