write README

commit 2c7deeae2f5dd6eab2237f64d3f1188e8e9b7ede 1 parent 1feed78
@mislav authored
Showing with 38 additions and 18 deletions.
  1. +38 −0 README.md
  2. +0 −18 scraper.rb
38 README.md
@@ -0,0 +1,38 @@
+Scraper
+=======
+
+*Scraper* is a cute HTML screen-scraping tool.
+
+    require 'scraper'
+    require 'open-uri'
+
+    class BlogScraper < Scraper
+      element :title
+
+      elements 'div.hentry' => :articles do
+        element 'h2' => :title
+        element 'a/@href' => :url
+      end
+    end
+
+    blog = BlogScraper.parse open('http://example.com')
+
+    blog.title
+    #=> "My blog title"
+
+    blog.articles.first.title
+    #=> "First article title"
+
+    blog.articles.first.url
+    #=> "http://example.com/article"
+
+[See the wiki][wiki] for more on how to use *Scraper*.
+
+Requirements
+------------
+
+*None*. Well, [Nokogiri][] is a requirement if you pass in HTML content that needs to be parsed, like in the example above. Otherwise you can initialize the scraper with an Hpricot document or anything else that implements `at(selector)` and `search(selector)` methods.
+
+
+[wiki]: http://wiki.github.com/mislav/scraper
+[nokogiri]: http://nokogiri.rubyforge.org/nokogiri/
18 scraper.rb
@@ -1,22 +1,4 @@
## A minimalistic, declarative HTML scraper
-#
-# Example:
-#
-#   class ArticleScraper < Scraper
-#     element 'h1' => :title
-#     element 'a[@href]/@href' => :link
-#   end
-#
-#   class BlogScraper < Scraper
-#     element :title
-#     elements 'div.hentry' => :articles, :with => ArticleScraper
-#   end
-#
-#   blog = BlogScraper.parse(html)
-#
-#   blog.title # => "Some page title"
-#   blog.articles.first.link # => "http://example.com"
-#
class Scraper
attr_reader :doc