write README

1 parent 1feed78 commit 2c7deeae2f5dd6eab2237f64d3f1188e8e9b7ede @mislav committed Oct 24, 2009
Showing with 38 additions and 18 deletions.
  1. +38 −0 README.md
  2. +0 −18 scraper.rb
38 README.md
@@ -0,0 +1,38 @@
+Scraper
+=======
+
+*Scraper* is a cute HTML screen-scraping tool.
+
+    require 'scraper'
+    require 'open-uri'
+
+    class BlogScraper < Scraper
+      element :title
+
+      elements 'div.hentry' => :articles do
+        element 'h2' => :title
+        element 'a/@href' => :url
+      end
+    end
+
+    blog = BlogScraper.parse open('http://example.com')
+
+    blog.title
+    #=> "My blog title"
+
+    blog.articles.first.title
+    #=> "First article title"
+
+    blog.articles.first.url
+    #=> "http://example.com/article"
+
+[See the wiki][wiki] for more on how to use *Scraper*.
+
+Requirements
+------------
+
+*None*. Well, [Nokogiri][] is a requirement if you pass in HTML content that needs to be parsed, as in the example above. Otherwise you can initialize the scraper with an Hpricot document or anything else that implements `at(selector)` and `search(selector)` methods.
+
+
+[wiki]: http://wiki.github.com/mislav/scraper
+[nokogiri]: http://nokogiri.rubyforge.org/nokogiri/
18 scraper.rb
@@ -1,22 +1,4 @@
## A minimalistic, declarative HTML scraper
-#
-# Example:
-#
-#   class ArticleScraper < Scraper
-#     element 'h1' => :title
-#     element 'a[@href]/@href' => :link
-#   end
-#
-#   class BlogScraper < Scraper
-#     element :title
-#     elements 'div.hentry' => :articles, :with => ArticleScraper
-#   end
-#
-#   blog = BlogScraper.parse(html)
-#
-#   blog.title # => "Some page title"
-#   blog.articles.first.link # => "http://example.com"
-#
class Scraper
attr_reader :doc

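The Requirements section of the new README says the scraper accepts any document that implements `at(selector)` and `search(selector)`. As a rough illustration of that duck-typed contract, here is a minimal sketch of such an object. `FakeDocument` and its selector-to-nodes hash are hypothetical and not part of Scraper. The sketch assumes the usual Nokogiri/Hpricot convention that `search` returns all matches and `at` returns the first match or `nil`. How such an object is handed to a scraper is not shown in this commit, so that step is left out.

```ruby
# Hypothetical stand-in for a parsed document; NOT part of Scraper.
# It answers selector lookups from a plain Hash instead of parsing HTML.
class FakeDocument
  def initialize(nodes_by_selector)
    @nodes_by_selector = nodes_by_selector
  end

  # Return every node registered for the selector (empty array if none).
  def search(selector)
    @nodes_by_selector.fetch(selector, [])
  end

  # Return the first matching node, or nil when nothing matches.
  def at(selector)
    search(selector).first
  end
end

doc = FakeDocument.new(
  'title'      => ['My blog title'],
  'div.hentry' => ['First article', 'Second article']
)

doc.at('title')          # first match for the selector
doc.search('div.hentry') # all matches for the selector
doc.at('h3')             # nil, since no node is registered for 'h3'
```

Any object with these two methods, such as a test double or a wrapper around another parser, would satisfy the interface the README describes.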