In xml.etree.ElementTree findall() can't search all elements in a namespace #72425
In the example below there are two namespaces in one document, but it is impossible to search for all elements in just one of them:
>>> import xml.etree.ElementTree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element '{http://def}a' at 0xb73961bc>, <Element '{http://x}b' at 0xb7396c34>]
>>>
>>> root.findall('{http://def}*')
[]
>>>
The same attempt with the third-party lxml package works fine:
>>> import lxml.etree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element {http://def}a at 0xb70ab11c>, <Element {http://x}b at 0xb70ab144>]
>>>
>>> root.findall('{http://def}*')
[<Element {http://def}a at 0xb70ab11c>]
>>>
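Until the namespace wildcard is accepted by findall(), the same result can be obtained by filtering the children by hand. The following is only a sketch of that workaround; the helper name children_in_namespace is hypothetical and not part of ElementTree.

```python
import xml.etree.ElementTree as ET


def children_in_namespace(parent, namespace):
    """Return the direct children of `parent` whose tag is in `namespace`.

    Hypothetical helper for illustration; not part of the standard library.
    """
    prefix = "{%s}" % namespace
    return [
        child
        for child in parent
        # Comments and processing instructions carry a non-string tag.
        if isinstance(child.tag, str) and child.tag.startswith(prefix)
    ]


root = ET.fromstring(
    '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
)
matches = children_in_namespace(root, "http://def")
print([el.tag for el in matches])  # ['{http://def}a']
```

This relies only on the "{uri}local" tag form that ElementTree already uses, so it should behave the same on any Python version.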
lxml has a couple of nice features here:
"{*}*" is also accepted but is the same as "*". Note that "*" is actually allowed as an XML tag name by the spec, but rare enough to hijack it for this purpose. I've actually never seen it used anywhere in the wild.
lxml's implementation isn't applicable to ElementTree (searching has been subject to excessive optimisation), but it shouldn't be hard to extend the one in ET's ElementPath.py module, as well as Element.iter() in ElementTree.py, to support this kind of tag comparison. PR welcome.
lxml's tests are here (and in the following test methods):
Note that they actually test the deprecated .getiterator() method for historical reasons. They should probably call .iter() instead these days.
lxml's ElementPath implementation is under src/lxml/_elementpath.py, but the tag comparison itself is done elsewhere in Cython code (here, in case it matters:)
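For readers who want to try this kind of tag comparison, here is a small sketch. It assumes Python 3.8 or later, where the standard library documentation describes star-wildcards such as "{namespace}*", "{*}tag" and "{}*" for the find functions; on older versions the patterns are expected to match nothing. The extra un-namespaced child "c" is added only for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
)
# Illustration only: add one child that is in no namespace.
ET.SubElement(root, "c")

# All children in one namespace.
in_def = root.findall("{http://def}*")
# A local name in any (or no) namespace.
any_ns_b = root.findall("{*}b")
# All children that are in no namespace.
no_ns = root.findall("{}*")

print([el.tag for el in in_def])    # ['{http://def}a']
print([el.tag for el in any_ns_b])  # ['{http://x}b']
print([el.tag for el in no_ns])     # ['c']
```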
PR submitted, feedback welcome.
BTW, I found that lxml and ET differ in their behaviour when searching for '*'. ET takes it as meaning "any tree node", whereas lxml interprets it as "any Element". Since ET's parser does not create comments and processing instructions by default, this does not make a difference in most cases, but when the tree contains comments or PIs, they will be found by '*' in ET but not in lxml. At least for "{*}*", they now both return only Elements. Changing either behaviour for '*' is probably not a good idea at this point.
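The difference can be seen with a tree that contains a comment. This sketch again assumes Python 3.8 or later for the "{*}*" pattern, and appends the comment by hand because the default parser does not create one.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
)
# The default parser drops comments, so append one explicitly.
root.append(ET.Comment("a comment node"))

star = root.findall("*")              # any tree node, including the comment
elements_only = root.findall("{*}*")  # Elements only

print(len(star))                      # 3
print(len(elements_only))             # 2
print(star[-1].tag is ET.Comment)     # True
```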
Is this an issue with Python's |
This feature was added to the |