
grubby

Fail-fast web scraping. grubby adds a layer of utility and error-checking atop the marvelous Mechanize gem. See the API summary below, or browse the full documentation.
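The fail-fast idea can be shown without Mechanize at all. The stdlib-only sketch below uses hypothetical `search!` and `at!` helpers over an Array of Hashes standing in for parsed HTML rows. grubby's own bang methods operate on Mechanize/Nokogiri nodes and may differ in detail. The point is the contrast: `Array#find` returns `nil` on a miss, and that `nil` can travel a long way before anything breaks, whereas a bang version raises at the exact point of failure.

```ruby
# Hypothetical error raised when an expected match is missing.
class NoMatchError < StandardError; end

# Fail-fast counterpart of Array#select: returns every matching record,
# but raises instead of returning an empty Array.
def search!(records, description)
  matches = records.select { |record| yield record }
  raise NoMatchError, "no records matching #{description}" if matches.empty?
  matches
end

# Fail-fast counterpart of Array#find: returns the first matching record,
# but raises instead of returning nil.
def at!(records, description)
  match = records.find { |record| yield record }
  raise NoMatchError, "no record matching #{description}" if match.nil?
  match
end

# Hypothetical stand-in for the rows of a parsed page.
rows = [
  { class: "athing", title: "First story" },
  { class: "spacer" },
  { class: "athing", title: "Second story" },
]

titles = search!(rows, ".athing") { |row| row[:class] == "athing" }.map { |row| row[:title] }
puts titles.inspect # prints ["First story", "Second story"]

begin
  at!(rows, ".hnuser") { |row| row[:class] == "hnuser" }
rescue NoMatchError => e
  puts "Scrape aborted: #{e.message}" # prints Scrape aborted: no record matching .hnuser
end
```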

Examples

The following example scrapes the Hacker News front page:

require "grubby"

class HackerNews < Grubby::PageScraper

  scrapes(:items) do
    page.search!(".athing").map{|item| HackerNewsItem.new(item) }
  end

end

class HackerNewsItem < Grubby::Scraper

  scrapes(:title) { @row1.at!(".storylink").text }
  scrapes(:submitter) { @row2.at!(".hnuser").text }
  scrapes(:story_uri) { URI.join(@base_uri, @row1.at!(".storylink")["href"]) }
  scrapes(:comments_uri) { URI.join(@base_uri, @row2.at!(".age a")["href"]) }

  def initialize(source)
    @row1 = source
    @row2 = source.next_sibling
    @base_uri = source.document.url
    super
  end

end

grubby = Grubby.new

# The following line will raise an exception if anything goes wrong
# during the scraping process.  For example, if the structure of the
# HTML does not match expectations, either due to a bad assumption or
# due to a site-wide change, the script will terminate immediately with
# a relevant error message.  This prevents bad values from propagating
# and causing hard-to-trace errors.
hn = HackerNews.new(grubby.get("https://news.ycombinator.com/news"))

puts hn.items.take(10).map(&:title) # your scraping logic goes here
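To make the behaviour of `scrapes` more concrete, here is a stdlib-only sketch of how such a class macro could work. `MiniScraper`, `ScrapeError`, and `StoryItem` are hypothetical names, and this is not grubby's actual implementation: it evaluates every declared block when the object is constructed, treats a `nil` result or a raised exception as a failure, and raises a single error naming every failed field. As with `HackerNewsItem` above, the subclass sets up its own state and then calls `super`. The `story_uri` field mirrors the example's use of `URI.join` to resolve a relative `href`; the base URL is hardcoded here as an assumption, whereas the example reads it from the document.

```ruby
require "uri"

# Hypothetical error raised when one or more fields fail to scrape.
class ScrapeError < StandardError; end

# Hypothetical base class illustrating a `scrapes` class macro.
class MiniScraper
  # Per-class registry of field name => extraction block.
  # (A class-level instance variable, so subclasses do not inherit fields here.)
  def self.fields
    @fields ||= {}
  end

  # Class macro: register an extraction block and define a reader for it.
  def self.scrapes(name, &block)
    fields[name] = block
    attr_reader name
  end

  attr_reader :source

  # Eagerly run every extraction block, collect failures, then raise once.
  def initialize(source)
    @source = source
    errors = {}
    self.class.fields.each do |name, block|
      begin
        value = instance_exec(&block)
        if value.nil?
          errors[name] = "returned nil"
        else
          instance_variable_set("@#{name}", value)
        end
      rescue StandardError => e
        errors[name] = "#{e.class}: #{e.message}"
      end
    end
    return if errors.empty?

    details = errors.map { |field, message| "#{field} (#{message})" }.join(", ")
    raise ScrapeError, "#{self.class} failed: #{details}"
  end
end

# Hypothetical item scraper over a Hash instead of an HTML node.
class StoryItem < MiniScraper
  scrapes(:title) { source.fetch(:title) }
  scrapes(:story_uri) { URI.join(@base_uri, source.fetch(:href)) }

  def initialize(source)
    @base_uri = "https://news.ycombinator.com/news" # assumed page URL
    super
  end
end

item = StoryItem.new({ title: "Example", href: "item?id=1" })
puts item.title     # prints Example
puts item.story_uri # prints https://news.ycombinator.com/item?id=1

begin
  StoryItem.new({ href: "item?id=2" })
rescue ScrapeError => e
  puts e.message # prints a message naming the failed title field
end
```

Collecting all failures before raising means one run reports every broken selector at once, instead of stopping at the first.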

Core API

Supplemental API

grubby includes several gems that extend Ruby objects with convenience methods. Loading grubby automatically makes these methods available. The included gems are listed below, along with a few of the methods each provides. See each gem's documentation for a complete API listing.
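Whichever gems are involved, the mechanism behind such extensions is ordinary Ruby: a gem defines a module and mixes it into a core class when it is loaded. The sketch below uses a hypothetical `CollapseWhitespace` module and `collapse_whitespace` method; it does not reproduce any of the gems grubby actually bundles.

```ruby
# Hypothetical convenience module, as a gem might define it.
module CollapseWhitespace
  # Trim both ends and collapse internal runs of whitespace to one space.
  def collapse_whitespace
    strip.gsub(/\s+/, " ")
  end
end

# Mixing the module into String at load time makes the method
# available on every String in the process.
String.include(CollapseWhitespace)

puts "  Show HN:\n   a scraper  ".collapse_whitespace # prints Show HN: a scraper
```

Refinements (`refine String` plus `using`) are the scoped alternative, limiting such methods to files that opt in.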

Installation

Install from RubyGems:

$ gem install grubby

Then require in your Ruby script:

require "grubby"
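For projects managed with Bundler, the usual alternative to a global `gem install` is to declare the gem in the project's Gemfile and run `bundle install`. The fragment below is a minimal example of that standard Bundler convention, not something this README prescribes.

```ruby
# Gemfile
source "https://rubygems.org"

gem "grubby"
```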

Contributing

Run rake test to run the tests. You can also run rake irb for an interactive prompt that pre-loads the project code.

License

MIT License
