
Xpath doesn't support ancestor-or-self on child elements #1652

Loceka opened this issue Oct 14, 2021 · 2 comments · Fixed by sakibguy/jsoup#26


Loceka commented Oct 14, 2021


I was trying a really simple XPath expression to retrieve all the ancestors of a specified element, but it returned an empty Elements object:

```java
Elements ancestors = element.selectXpath("ancestor-or-self::*");
```

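Debugging it, I saw that the org.jsoup.nodes.NodeUtils.selectXpath(String, Element, Class) method does this:

```java
// Convert the jsoup element to a W3C DOM document...
org.w3c.dom.Document wDoc = w3c.fromJsoup(el);
// ...then evaluate the XPath expression against that document
NodeList nodeList = w3c.selectXpath(xpath, wDoc);
```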

The XPath expression is always evaluated at the document level, which pretty much defeats the purpose (or at least isn't at all intuitive when applying it to a specific element).

If there is a way to apply relative XPath expressions, it would be really welcome.


jhy commented Oct 16, 2021

The XPath is evaluated on the specific element; you can see an example in this test:

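```java
@Test public void supportsXpathFromElement() {
    String html = "<body><div><p>One</div><div><p>Two</div><div>Three</div>";
    Document doc = Jsoup.parse(html);
    Element div = doc.selectFirst("div");
    // The W3C document is built from this <div>, so it is the root and "/div/p" matches
    Elements els = div.selectXpath("/div/p");
    assertEquals(1, els.size());
    assertEquals("One", els.get(0).text());
    assertEquals("p", els.get(0).tagName());
    // <body> is outside the div-rooted W3C document, so it is not found from the div
    assertEquals(0, div.selectXpath("//body").size());
    // but it is found when querying from the whole document
    assertEquals(1, doc.selectXpath("//body").size());
}
```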

That works because the W3C document is constructed from the scoped input element. I implemented it that way as an optimization over constructing the W3C document around the entire jsoup document: I had anticipated that queries would generally select "down" (the current or lower elements) rather than up, so building only the portion of the tree in use would be more efficient.

So the issue you are seeing, with the ancestor-or-self axis not working, is due to that optimization: the parent structure is not present, so there are no ancestors.
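To illustrate the effect using only the JDK (this is not jsoup's own conversion code), the sketch below copies the first `<div>` of a small well-formed stand-in document into a fresh W3C document with `Document.importNode`, roughly mimicking a W3C document built around the scoped element, and then evaluates XPath against it with `javax.xml.xpath`. The class name, method names and sample markup are hypothetical, chosen for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: uses only the JDK's DOM and XPath APIs, not jsoup.
public class SubtreeXpathDemo {
    // Well-formed XML stand-in for the HTML used in the test above.
    static final String XML =
        "<html><body><div><p>One</p></div><div><p>Two</p></div><div>Three</div></body></html>";

    /** Copies only the first <div> into a new document; its parents are not copied. */
    static Document subtreeDocument() throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document full = builder.parse(new InputSource(new StringReader(XML)));
        Element firstDiv = (Element) full.getElementsByTagName("div").item(0);
        Document sub = builder.newDocument();
        sub.appendChild(sub.importNode(firstDiv, true)); // deep copy of the div and its children
        return sub;
    }

    /** Evaluates xpath with the copied <div> (now the root element) as context; returns the match count. */
    static int countInSubtree(String xpath) {
        try {
            Document sub = subtreeDocument();
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(xpath, sub.getDocumentElement(), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countInSubtree("/div/p"));              // 1: downward queries still work
        System.out.println(countInSubtree("//body"));              // 0: <body> was not copied
        System.out.println(countInSubtree("ancestor-or-self::*")); // 1: only the <div> itself
    }
}
```

In this sketch the ancestor axis of the copied `<div>` contains no elements at all; the only node above it is the document node, which `*` does not match.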

I think it would make sense to change the implementation to construct the W3C document around the entire jsoup document and then scope the query directly to the element, if the XPath evaluator supports that. That would enable this use case and other axes such as preceding-sibling.
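The JDK's `javax.xml.xpath.XPath.evaluate(String, Object, QName)` accepts a context item, so a standard evaluator can scope a query that way. The following is a minimal JDK-only sketch of that idea, assuming a hypothetical class and sample document: the whole tree is kept in the W3C DOM, and the selected element is passed as the context node. It is not presented as jsoup's actual fix.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: whole-document W3C DOM, query scoped by a context node.
public class ContextXpathDemo {
    static final String XML =
        "<html><body><div><p>One</p></div><div><p>Two</p></div><div>Three</div></body></html>";

    /** Evaluates xpath with the divIndex-th <div> of the FULL document as the context node. */
    static List<String> selectFrom(int divIndex, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            Node context = doc.getElementsByTagName("div").item(divIndex);
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xp.evaluate(xpath, context, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getNodeName()); // tag names of the matched elements
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // From the first <div>: the div, body and html elements, since all parents are present
        System.out.println(selectFrom(0, "ancestor-or-self::*"));
        // From the third <div>: the two <div> elements before it
        System.out.println(selectFrom(2, "preceding-sibling::div"));
        // Relative downward queries still work from the context node
        System.out.println(selectFrom(0, "p"));
    }
}
```

One consequence worth noting in this sketch: because the full tree is present, absolute expressions such as `//body` match from any context node, unlike in the subtree-only approach.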

jhy changed the title from "Xpath doesn't evaluate on the selected node" to "Xpath doesn't support ancestor-or-self on child elements" on Oct 16, 2021
jhy closed this as completed in 1e4d127 on Oct 19, 2021
jhy self-assigned this on Oct 19, 2021
jhy added this to the 1.15.1 milestone on Oct 19, 2021

jhy commented Oct 19, 2021

Thanks, fixed!
