Skip to content

Xpath with namespace and position#353

Closed
knit-bee wants to merge 3 commits into
lxml:masterfrom
knit-bee:xpath-ns
Closed

Xpath with namespace and position#353
knit-bee wants to merge 3 commits into
lxml:masterfrom
knit-bee:xpath-ns

Conversation

@knit-bee
Copy link
Copy Markdown

I noticed that it is not possible to use elem.find or elem.findall with an xpath that contains position indices if the method is called with the namespaces argument.
This behavior has also been reported in Bug #1873886.

It appears that during the tokenization of the xpath, the numbers are treated as tags, i.e. they are concatenated with the default namespace (during function calls with namespaces). This results in a wrong path imo.
For example:

>>> from lxml import etree
>>> doc = etree.XML("""
      <foo xmlns="http://example.com/foo">
        <bar>baz</bar>
      </foo>""")
>>> path = "./bar[1]"
>>> doc.find(path, namespaces={None:"http://example.com/foo"})
None

The target element is not found here because the path that is used is effectively:
./{http://example.com/foo}bar[{http://example.com/foo}1]

Changes:

  • I added a check during the tokenization of the xpath to determine whether the processed tag is a number to avoid concatenation with the namespace.

Copy link
Copy Markdown
Member

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think we should reduce the amount of code that we add here, though.

Comment on lines +377 to +380
self.assertEqual(
summarize_list(elem.findall("tag", namespaces=namespaces)),
["{nsnone}tag", "{nsnone}tag"],
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how many examples you add here, this seems worth its own custom assert method: .assertFindallEqual(element, path, expected, namespaces=None).

Also, do we actually need to add these tests? ISTM that we could get away with running the existing tests three times, once without namespaces dict, once with an empty one, and once with a non-empty one.

Comment thread src/lxml/_elementpath.py Outdated
@knit-bee
Copy link
Copy Markdown
Author

Thank you for the review, I will try to include your suggestions next week.

Comment thread src/lxml/_elementpath.py Outdated
@scoder scoder closed this in e950468 May 13, 2025
@scoder
Copy link
Copy Markdown
Member

scoder commented May 13, 2025

Thanks for this PR, I have integrated it in e950468 but implemented the tests differently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants