HappyScraper makes it easy to write and use large numbers of nibbler-based HTML scrapers.
It features a clean declarative DSL, the option to define scrapers in external files, and a capability-based approach to automatically selecting the correct scraper for a given input.
External definition of a sample scraper (e.g. in
    url "http://blog.example.org/"
    with "body.vlog"
    with "/html/head/meta[@property='og:type' and @content='blog']"

    element :title
    elements ".hentry" => :articles do
      element "h2" => :headline
      element "a/@href" => :url
    end
And here is a Ruby application that uses the scraper above.
    require 'happyscraper'
    require 'open-uri'

    # Load all scrapers from `scrapers` directory.
    happy = HappyScraper.load_scrapers( Dir.glob( 'scrapers/*' ) )

    url  = 'http://blog.example.org/'
    html = URI(url).read

    # Based on the 'url' and 'with' capabilities set in scraper definitions
    # the correct scraper is automatically selected for scraping.
    blog = happy.scrap( html, url )

    blog.title                  # => 'blog title'
    blog.articles.last.headline # => 'headline of last article'
    blog.articles.last.url      # => 'http://blog.example.org/entry/10'
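To make the idea of capability-based selection more concrete, here is a minimal sketch in plain Ruby. It is not HappyScraper's actual implementation: `ScraperSketch`, `capable?`, `select_scraper` and the second `:shop` entry are hypothetical names invented for illustration, and simple substring checks stand in for the CSS/XPath tests that a `with` capability would perform.

```ruby
# Hypothetical stand-in for a scraper definition: a name plus the
# capabilities it declares. This is NOT HappyScraper's real internals.
ScraperSketch = Struct.new(:name, :url_prefix, :markers) do
  # A scraper is considered capable when the URL starts with its prefix
  # and every marker string occurs in the HTML. The substring test is a
  # crude substitute (an assumption) for real CSS/XPath matching.
  def capable?(html, url)
    url.start_with?(url_prefix) && markers.all? { |m| html.include?(m) }
  end
end

# Return the first scraper whose capabilities all match, or nil.
def select_scraper(scrapers, html, url)
  scrapers.find { |s| s.capable?(html, url) }
end

SCRAPERS = [
  ScraperSketch.new(:blog, 'http://blog.example.org/', ['og:type']),
  ScraperSketch.new(:shop, 'http://shop.example.org/', ['class="product"'])
]

html   = '<html><head><meta property="og:type" content="blog"></head></html>'
chosen = select_scraper(SCRAPERS, html, 'http://blog.example.org/entry/10')
puts chosen.name
```

In this sketch the first matching definition wins and `nil` is returned when nothing matches; how HappyScraper itself resolves ties or misses is not stated here.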
Installation with bundler
Put this line in your Gemfile and run bundle install:
gem "happyscraper", :git => "git://github.com/pdg/happyscraper.git"
HappyScraper is built upon the excellent nibbler gem, which does the heavy lifting.
Released under the MIT License. Copyright © 2012, Patrick Das Gupta