ElementTree.find doesn't use registered namespaces when .find is called, and the inconsistency between ElementTree and ElementPath in how namespaces are defined #140123

@BEKJ-wb

Description

example xml: xml/example.xml

<root>

<h:title xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:title>

<f:table xmlns:f="http://www.w3.org/TR/html4/">
  <f:name>African Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>

</root>

minimal example

import xml.etree.ElementTree as ET

ns = {"h": "http://www.w3.org/TR/html4/", "f": "http://www.w3.org/TR/html4/"}
for prefix, uri in ns.items():
    ET.register_namespace(prefix, uri)

tree = ET.parse("xml/example.xml")

print(tree.getroot()) # <Element 'root' at 0x000002F6570FD940>

print(tree.find(".//{http://www.w3.org/TR/html4/}title")) # <Element '{http://www.w3.org/TR/html4/}title' at 0x000002F6570FD9E0>

print(tree.find(".//h:title", namespaces=ns)) # <Element '{http://www.w3.org/TR/html4/}title' at 0x000002F6570FD9E0>

print(tree.find(".//f:table", namespaces=ns)) # <Element '{http://www.w3.org/TR/html4/}table' at 0x000002F6570FDC60>

print(tree.find(".//f:table")) # SyntaxError: prefix 'f' not found in prefix map

The docstring of ET.register_namespace says that the registry is global. However, when .find is called on an ElementTree instance, the registered namespaces are not passed on to the ElementPath code that performs the lookup.
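To make the mismatch easier to see in isolation, here is a minimal sketch that does not need the example file. The inline XML string and the `x` prefix are made up for illustration. It suggests that the registered prefix is picked up when serializing, but not when looking elements up:

```python
import xml.etree.ElementTree as ET

URI = "http://www.w3.org/TR/html4/"
ET.register_namespace("h", URI)

# Hypothetical inline document; it deliberately uses a different prefix ("x").
root = ET.fromstring(f'<root><x:title xmlns:x="{URI}">Apples</x:title></root>')

# Serialization uses the globally registered prefix "h" for this URI.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)

# Lookup does not consult the registry: without an explicit mapping,
# the prefix in the path cannot be resolved.
try:
    root.find(".//h:title")
    lookup_error = None
except SyntaxError as exc:
    lookup_error = str(exc)
print(lookup_error)

# Passing the prefix map explicitly works.
found = root.find(".//h:title", namespaces={"h": URI})
print(found.tag)
```

Passing `namespaces=` on every call is the workaround that already appears in the minimal example above.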

While digging into why the global registry had no effect, I also found an inconsistency in how namespaces are defined in ElementTree and in ElementPath:

At the end of the register_namespace function:
_namespace_map[uri] = prefix

https://github.com/python/cpython/blob/3490a99046078e4f9df7ac7570f62a0181bb3b89/Lib/xml/etree/ElementPath.py#L85C46-L85C64
At the end of the xpath_tokenizer function:
namespaces[prefix]

ElementTree also doesn't allow multiple prefixes to point to the same URI, whereas ElementPath has no issue with it. I assume this is a deliberate choice to keep serialization of the XML unambiguous?
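A small sketch of both points, assuming the current CPython implementation: `ET._namespace_map` is a private, undocumented dict, so relying on it is not a supported API. The inline XML string is made up for illustration. Registering a second prefix for the same URI replaces the first one, and the registry has to be inverted before it can be handed to `find`:

```python
import xml.etree.ElementTree as ET

URI = "http://www.w3.org/TR/html4/"

ET.register_namespace("h", URI)
ET.register_namespace("f", URI)  # replaces the "h" entry for the same URI

# Private implementation detail: the registry is keyed by URI (uri -> prefix).
registered_prefix = ET._namespace_map[URI]
print(registered_prefix)  # f

# find() expects the opposite direction (prefix -> uri), so invert the registry.
prefix_map = {prefix: uri for uri, prefix in ET._namespace_map.items()}

# Hypothetical inline document using a prefix that differs from the registered one.
root = ET.fromstring(f'<root><t:table xmlns:t="{URI}"><t:width>80</t:width></t:table></root>')

table = root.find(".//f:table", namespaces=prefix_map)
print(table.tag)

# Only one prefix per URI survives in the registry, so "h" is no longer available.
print("h" in prefix_map)  # False
```

A hand-written prefix map such as `ns` in the minimal example does not have this restriction, which is why both `h:` and `f:` resolve there.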

Labels: stdlib, topic-XML, type-bug
