A web scraper using Webrat::MechanizeSession - does acceptance-based web scraping relying on the great webrat behavior description language.
Ruby
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
lib
spec
.document
.gitignore
LICENSE
README.rdoc
Rakefile
VERSION
webrat-scraper.gemspec

README.rdoc

webrat-scraper

A web scraper built on Webrat::Mechanize that traverses the web and allows access to the webrat session through Mechanize and Nokogiri objects

How to use

Install

gem sources -a http://gems.github.com
gem install jtzemp-webrat-scraper

Use

require 'webrat_scraper'

class MyScraper < WebratScraper
  def initialize
    @url = "http://www.google.com"
    super
  end

  def first_result_for(search_term)
    visit @url

    fill_in "q", :with => search_term
    click_button

    first_link = (doc/"li.g a.l").first

    link = {}
    link[:text] = first_link.inner_text
    link[:url]  = first_link.attributes["href"].to_s
    link
  end
end

m = MyScraper.new
result = m.first_result_for("webrat-mechanize")
puts result.inspect

For more info check out documentation for:

  • Webrat

  • Mechanize

  • Nokogiri

    • CSS Selectors

    • XPath Selectors

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add specs for it. This is important so I don't break it in a future version unintentionally.

  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)

  • Send me a pull request. Bonus points for topic branches.

Copyright

Copyright © 2009 JT Zemp. See LICENSE for details.