Added support for XPath 1.0 #80

Closed
wants to merge 399 commits into
from


btd commented Mar 18, 2011

You can see the usage in the tests.

jhy and others added some commits Feb 5, 2010

Added val() and val(string) to Element and Elements.
Treat contents of textarea as text, not data.

Closes #14
Added :has(selector) pseudo-selector.
Added Element#parents() and Elements#parents() methods.

Fixes #20
Improved implicit close tag heuristic detection when parsing malformed HTML.


Fixes an issue where appending / prepending rows to a table (or to similar implicit
element structures) would create redundant wrapping elements.

Fixes #21

kzn and others added some commits Jan 14, 2011

Evaluator.match(Element test) ->
Evaluator.match(Element root, Element test)
change
added RootSelector
updated tree selectors wrt subtree matching
Added javadocs for Evaluators.
Updated tests.
Updated parser
Merge remote branch 'upstream/master'
Conflicts:
	src/main/java/org/jsoup/parser/TokenQueue.java
Reverted changes that only allow empty tags in pre-defined instances.
Markup like <tag /> needs to be parsed as an empty element.
Removed com.sun.xml.internal.ws.util.StringUtils to fix https://github.com/jhy/jsoup/issues/#issue/69

"jsoup/src/main/java/org/jsoup/select/selectors/AndSelector.java:[8,35] package com.sun.xml.internal.ws.util does not exist"
Ensure that Jsoup.Connect handles relative redirects in cases where the
underlying HTTP stack doesn't automatically follow them.

Fixes #73
Updated Jsoup.Connection so that cookies set on a redirect response will be included on the redirected request and response.
Moved .wrap, .before, and .after from Element to Node for flexibility. Overriding implementations in Element still return Element.
Added ability to change an element's tag with Element.tagName(String), and to change many at once with Elements.tagName(String).
Fixed issue with selector parser where some boolean AND + OR combined queries (e.g. "meta[http-equiv], meta[content]") were being parsed incorrectly as OR-only queries (e.g. the former as "meta, [http-equiv], meta[content]").

Fixed issue where a content-type specified in a meta tag may not be reliably detected, due to the above issue.

This should also support single quotes: eg Content-Type:text/html; charset='utf-8'

Site that has this: http://www.roundwoodpark.herts.sch.uk/
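
The quoted-charset case can be sketched with a small regular expression. This is only an illustration of the idea, not jsoup's actual pattern; the class name `CharsetSniffer` and the method `charsetOf` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharsetSniffer {
    // Hypothetical pattern (an assumption, not taken from jsoup):
    // "charset=", an optional single or double quote, then the charset name.
    private static final Pattern CHARSET =
            Pattern.compile("(?i)\\bcharset\\s*=\\s*[\"']?([^\\s;\"']+)");

    /** Returns the charset name found in a Content-Type value, or null if none is present. */
    static String charsetOf(String contentType) {
        if (contentType == null) return null;
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset='utf-8'"));
        System.out.println(charsetOf("text/html; charset=UTF-8"));
    }
}
```

Making the quote optional on the leading side and excluding quotes from the captured group lets unquoted, single-quoted, and double-quoted values go through the same path.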

riczhao commented Oct 23, 2013

No one working on xpath support?

CSS locators (selectors) are significantly faster than XPath.
It would also be easy to rewrite XPath expressions as CSS locators. :)

For your reference:
http://sauceio.com/index.php/2011/05/why-css-locators-are-the-way-to-go-vs-xpath/

My own test with HTMLUnit (using XPath) and Jsoup (using CSS locators) shows the same result!

[Testing Log]

Parsing E:/profiling.html using Xpath ...
Loading doc by HTMLUnit ...
Time spent: 8122.613623 milliseconds.

Searching doc by HTMLUnit ...
XPath: //div[@class="alonesort"]/div[@class="mc"]/dl[@class="fore"]/dd/em/span/a
Matched Element Count: 1260
Time spent: 35.006174 milliseconds.

Parsing E:/profiling.html using CSS Selector ...
Loading doc by Jsoup ...
Time spent: 151.277634 milliseconds.

Searching doc by Jsoup ...
CSS Locator: div.alonesort > div.mc > dl.fore > dd > em > span > a
Matched Element Count: 1260
Time spent: 13.975146 milliseconds.

I think Jsoup is already good enough by supporting CSS locator!
The rule is also shorter and simpler in contrast to Xpath.
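
The XPath and the CSS locator in the log correspond step by step, so a rewrite of that narrow shape can be sketched mechanically. The class `XPathToCss` and its method `toCss` below are hypothetical and handle only paths that start with `//`, use child steps, and carry at most one `[@class="name"]` predicate with a lowercase attribute name. Note that `[@class="x"]` is an exact attribute match while `.x` matches any element whose class list contains `x`, so the two agree only when the attribute holds that single class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XPathToCss {
    // One step: a tag name, optionally followed by [@class="single-token"].
    private static final Pattern STEP =
            Pattern.compile("([A-Za-z][A-Za-z0-9]*)(?:\\[@class=\"([A-Za-z0-9_-]+)\"\\])?");

    /** Rewrites a simple //a/b[@class="c"] style path as "a > b.c"; rejects anything else. */
    static String toCss(String xpath) {
        if (!xpath.startsWith("//")) {
            throw new IllegalArgumentException("expected a path starting with //");
        }
        List<String> parts = new ArrayList<>();
        for (String step : xpath.substring(2).split("/")) {
            Matcher m = STEP.matcher(step);
            if (!m.matches()) {
                throw new IllegalArgumentException("unsupported step: " + step);
            }
            String tag = m.group(1);
            String cls = m.group(2);
            parts.add(cls == null ? tag : tag + "." + cls);
        }
        // XPath "/" between steps is the child axis, which is ">" in CSS.
        return String.join(" > ", parts);
    }

    public static void main(String[] args) {
        System.out.println(toCss(
                "//div[@class=\"alonesort\"]/div[@class=\"mc\"]/dl[@class=\"fore\"]/dd/em/span/a"));
    }
}
```

Anything outside this subset (positional predicates, other axes, functions) is rejected rather than guessed at.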

riczhao commented Oct 24, 2013

Thanks. My requirement is simple: get an XPath or CSS selector in the browser,
paste it into code, and get the content. Using Chrome, I can now easily get the XPath in the dev
tools window.
When I try to translate the XPath to a CSS selector, one problem is that I don't
know how to handle tag[1], which means the first tag named 'tag'. :eq()
means the nth child.

Hi riczhao,

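The closest equivalent of "tag[1]" in CSS selector syntax is "tag:nth-of-type(1)", which counts only siblings named "tag". "tag:nth-child(1)" matches only when the element is also the first child of its parent.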
You can check the following links:
http://jsoup.org/apidocs/org/jsoup/select/Selector.html
http://stackoverflow.com/questions/16914980/parsing-htmlnot-well-formed-with-jsoup
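
To see what the positional predicate in "tag[1]" actually counts, here is a minimal sketch using the XPath engine that ships with the JDK (`javax.xml.xpath`) rather than jsoup. The sample markup and the class name `PositionDemo` are made up for illustration. The second expression emulates "first child of any type that happens to be a p", which is what a child-position test checks.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PositionDemo {
    // Hypothetical sample: the first <p> is the second child of <div>.
    static final String XML = "<div><h1>title</h1><p>first</p><p>second</p></div>";

    /** Evaluates an XPath expression against XML and joins the matched text contents with commas. */
    static String select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) out.append(",");
                out.append(nodes.item(i).getTextContent());
            }
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // p[1] counts only <p> siblings, so it finds the first <p> even though <h1> precedes it.
        System.out.println(select("/div/p[1]"));          // first
        // *[1][self::p] asks for the first child of any name, then requires it to be a <p>.
        System.out.println(select("/div/*[1][self::p]")); // (empty line: the first child is <h1>)
    }
}
```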

slorber commented Jun 11, 2014

Hello,

Please document that jsoup does not currently support XPath. This is not clear at first glance, though it should be.

@jhy jhy closed this Aug 2, 2015

ik-j commented Mar 22, 2017

I can get cssSelector using jSoup but am not able to get xpath:idrelative, xpath:attributes, xpath:location.
Though I am able to get xpath fixed with the code below:

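StringBuilder absPath = new StringBuilder();
Elements parents = e.parents(); // ancestors of e, closest first; e itself is not included

for (int j = parents.size() - 1; j >= 0; j--) {
    Element element = parents.get(j);
    // XPath positions are 1-based and count only same-named element siblings;
    // siblingIndex() is 0-based and counts every sibling node, text included.
    int position = 1;
    for (Element s = element.previousElementSibling(); s != null; s = s.previousElementSibling()) {
        if (s.tagName().equals(element.tagName())) position++;
    }
    absPath.append("/").append(element.tagName()).append("[").append(position).append("]");
}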

Help would be appreciated :)
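
One way to check a generated absolute path is to evaluate it and confirm that it selects the original element. The sketch below does this with the JDK's own `org.w3c.dom` and `javax.xml.xpath` classes instead of jsoup, so it runs without extra dependencies. The class `AbsoluteXPath` and its methods are hypothetical names, and the sample document is made up. The counting rule is the part that carries over: positions are 1-based and count only preceding element siblings with the same name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class AbsoluteXPath {
    /** Builds a path such as /html[1]/body[1]/p[2] for an element node, including the element itself. */
    static String pathOf(Node element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            int position = 1; // XPath positions start at 1
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                // Count only element siblings with the same name; skip text and other tags.
                if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(n.getNodeName())) {
                    position++;
                }
            }
            path.insert(0, "/" + n.getNodeName() + "[" + position + "]");
        }
        return path.toString();
    }

    /** Parses a well-formed XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Evaluates an XPath expression and returns the first matching node, or null. */
    static Node evaluate(Document doc, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><h1>t</h1><p>a</p><p>b</p></body></html>");
        Node second = doc.getElementsByTagName("p").item(1);
        String path = pathOf(second);
        System.out.println(path);                                    // /html[1]/body[1]/p[2]
        System.out.println(evaluate(doc, path).isSameNode(second));  // true
    }
}
```

Evaluating the path against the same document is a cheap round-trip check that the index arithmetic is right before relying on the path elsewhere.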
