find by xpath #18

Closed
abacha opened this issue Aug 12, 2013 · 5 comments

Comments

@abacha

abacha commented Aug 12, 2013

Is it possible to do something like this?

page = Upton::Scraper.new(url)
page.find_by_xpath("//body/div/a").value
@jeremybmerrill
Contributor

Hi @abacha,

Yes, Upton supports searching by XPath.

If you have an index page (that is, a page with links you want to scrape), you can do something like this:

scraper = Upton::Scraper.new(url, "//body/div/a")
scraper.scrape do |instance_html, instance_url, instance_index|
  puts "The title of the page at #{instance_url} is #{Nokogiri::HTML(instance_html).title}"
end

Thanks to #11, you can use XPath or CSS selectors interchangeably.

@abacha
Author

abacha commented Aug 12, 2013

I wish I could do it in a simple way, like the one I demonstrated. I need to run lots of searches with different XPath expressions against the same URL.

@jeremybmerrill
Contributor

Is the value of the content specified by the XPath expression another link to be scraped? Or just data you want to access?

And do you have lots of pages, or just one page to be scraped?

@jeremybmerrill
Contributor

If you just want to scrape lots of data from one page, just use Nokogiri. (Upton uses Nokogiri for HTML parsing.)

Nokogiri::HTML(Net::HTTP.get(URI(url))).xpath("//body/div/a").text

@jeremybmerrill
Contributor

Were you able to find a solution, @abacha?
