renamed "Scraper" → "Nibbler"

mislav committed Aug 15, 2010
1 parent 9dad30b commit ae8abd5eb517d5d6a01534822fc39b2076dbf1ea
Showing with 21 additions and 21 deletions.
  1. +8 −8 README.md
  2. +2 −2 Rakefile
  3. +3 −3 examples/delicious.rb
  4. +3 −3 examples/tweetburner.rb
  5. +1 −1 examples/twitter.rb
  6. +4 −4 scraper.rb → lib/nibbler.rb
@@ -1,12 +1,12 @@
-Scraper
+Nibbler
=======
-*Scraper* is a cute HTML screen-scraping tool.
+*Nibbler* is a cute HTML screen-scraping tool.
- require 'scraper'
+ require 'nibbler'
require 'open-uri'
- class BlogScraper < Scraper
+ class BlogScraper < Nibbler
element :title
elements 'div.hentry' => :articles do
@@ -28,15 +28,15 @@ Scraper
There are sample scripts in the "examples/" directory; run them with:
- ruby -rubygems examples/<script>.rb
+ ruby -Ilib -rubygems examples/<script>.rb
-[See the wiki][wiki] for more on how to use *Scraper*.
+[See the wiki][wiki] for more on how to use *Nibbler*.
Requirements
------------
-*None*. Well, [Nokogiri][] is a requirement if you pass in HTML content that needs to be parsed, like in the example above. Otherwise you can initialize the scaper with an Hpricot document or anything else that implements `at(selector)` and `search(selector)` methods.
+*None*. Well, [Nokogiri][] is a requirement if you pass in HTML content that needs to be parsed, like in the example above. Otherwise you can initialize the scraper with an Hpricot document or anything else that implements `at(selector)` and `search(selector)` methods.
-[wiki]: http://wiki.github.com/mislav/scraper
+[wiki]: http://wiki.github.com/mislav/nibbler
[nokogiri]: http://nokogiri.rubyforge.org/nokogiri/
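Note on the Requirements paragraph above: "anything else that implements `at(selector)` and `search(selector)`" describes a two-method interface. The following is a minimal sketch of an object with that shape, assuming a hypothetical `HashDocument` class that is not part of Nibbler. Whether Nibbler calls further methods on the values these return is not shown in this diff, so treat it as an illustration of the interface only.

```ruby
# Hypothetical stand-in document (not part of Nibbler): wraps a Hash and
# exposes the two methods the README names, `at` and `search`.
class HashDocument
  def initialize(data)
    @data = data
  end

  # Returns the first match for the selector, or nil when there is none.
  def at(selector)
    search(selector).first
  end

  # Returns every match for the selector as an Array (possibly empty).
  def search(selector)
    Array(@data[selector.to_s])
  end
end

doc = HashDocument.new('title' => 'Hello', 'tags' => %w[ruby scraping])
doc.at('title')    # first match for the key
doc.search('tags') # all matches for the key
```

Here the "selector" is just a Hash key; a real document would interpret it as CSS or XPath.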
@@ -2,12 +2,12 @@ task :default => :spec
desc %(Run specs)
task :spec do
- exec %(ruby -rubygems scraper.rb --color)
+ exec %(ruby -Ilib -rubygems lib/nibbler.rb --color)
end
desc %(Count lines of code in implementation)
task :loc do
- File.open('scraper.rb') do |file|
+ File.open('lib/nibbler.rb') do |file|
loc, counting = 1, false
file.each_line do |line|
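Note on the `-Ilib` flag added in the README and Rakefile: with the implementation now at `lib/nibbler.rb`, `require 'nibbler'` only resolves if `lib/` is on Ruby's load path, and `-I` puts a directory there. The sketch below reproduces that effect in a temporary directory; the file name `load_path_demo.rb` and the constant it defines are hypothetical and chosen so the demo cannot clash with a real library.

```ruby
require 'tmpdir'
require 'fileutils'

Dir.mktmpdir do |root|
  # Build a throwaway lib/ directory containing one file (hypothetical name).
  lib = File.join(root, 'lib')
  FileUtils.mkdir_p(lib)
  File.write(File.join(lib, 'load_path_demo.rb'), "LOAD_PATH_DEMO = :loaded\n")

  # Roughly what `ruby -Ilib` does: put the directory on the load path.
  $LOAD_PATH.unshift(lib)

  # The bare name now resolves to lib/load_path_demo.rb.
  require 'load_path_demo'
end

LOAD_PATH_DEMO # the constant defined by the required file
```

Without the `$LOAD_PATH` line, the `require` call would raise `LoadError`, which is why the example scripts need the extra flag after the move.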
@@ -3,12 +3,12 @@
# Let's pretend that delicious.com doesn't have an API.
# This is a demonstration of the most common use-case.
-require 'scraper'
+require 'nibbler'
require 'open-uri'
require 'date'
# extracts data from a single bookmark
-class Bookmark < Scraper
+class Bookmark < Nibbler
element 'h4 a' => :title
element '.description' => :description
@@ -25,7 +25,7 @@ class Bookmark < Scraper
end
# finds all bookmarks on the page
-class Delicious < Scraper
+class Delicious < Nibbler
elements '#bookmarklist div.bookmark' => :bookmarks, :with => Bookmark
end
@@ -3,7 +3,7 @@
# I needed to dump my Tweetburner archive to CSV
# http://tweetburner.com/users/mislav/archive
-require 'scraper'
+require 'nibbler'
require 'uri'
require 'open-uri'
require 'date'
@@ -13,7 +13,7 @@
module Tweetburner
SITE = URI('http://tweetburner.com')
- class Scraper < ::Scraper
+ class Scraper < ::Nibbler
# add our behavior to convert_document; open web pages with UTF-8 encoding
def self.convert_document(url)
URI === url ? Nokogiri::HTML::Document.parse(open(url), url.to_s, 'UTF-8') : url
@@ -24,7 +24,7 @@ def self.convert_document(url)
end
# a single link (table row one the archive page)
- class Link < ::Scraper
+ class Link < ::Nibbler
element './/a[starts-with(@href, "/links/")]/@href' => :stats_url, :with => lambda { |href|
SITE + href.text
}
@@ -1,7 +1,7 @@
## JSON data extraction example
#
# This is an example how we're not limited to Nokogiri and HTML screen-scraping.
-# Here we use Scraper to extract tweets from a Twitter API JSON response.
+# Here we use Nibbler to extract tweets from a Twitter API JSON response.
#
# Requirements: a JSON library (tested with "json" gem)
@@ -1,6 +1,6 @@
## A minimalistic, declarative HTML scraper
-class Scraper
+class Nibbler
attr_reader :doc
# Accepts string, open file, or Nokogiri-like document
@@ -78,7 +78,7 @@ def self.parse_rule_declaration(*args, &block)
selector, property = name ? [name.to_s, name.to_sym] : options.to_a.flatten
raise ArgumentError, "invalid rule declaration: #{args.inspect}" unless property
# eval block in context of a new scraper subclass
- delegate = Class.new(delegate || Scraper, &block) if block_given?
+ delegate = Class.new(delegate || Nibbler, &block) if block_given?
return selector, property, delegate
end
@@ -105,7 +105,7 @@ def self.convert_document(doc)
require 'spec/autorun'
HTML = DATA.read
- class Article < Scraper
+ class Article < Nibbler
element 'h1' => :title
element 'a/@href' => :link
end
@@ -124,7 +124,7 @@ class SpecialArticle < Article
element 'span'
end
- class BlogScraper < Scraper
+ class BlogScraper < Nibbler
element :title
elements '#nav li' => :navigation_items
end
