
Tempfile and StringIO are also considered content and passed to Nokogiri

Now it's possible to pass the result of an open-uri request directly:
  MyScraper.parse(open('http://example.com'))
commit df425086cc0c75669e1b8a76fae3c83696550800 (1 parent: 579b3a5), @mislav committed Oct 24, 2009
Showing with 18 additions and 13 deletions.
  1. +18 −13 scraper.rb
@@ -21,20 +21,10 @@
 class Scraper
   attr_reader :doc
-  # Accepts string, open file, or Nokogiri document instance
+  # Accepts string, open file, or Nokogiri-like document
   def initialize(doc)
-    @doc = case doc
-    when String, IO
-      require 'nokogiri' unless defined? ::Nokogiri
-      Nokogiri::HTML(doc)
-    else
-      doc
-    end
-
-    # initialize plural accessor values
-    self.class.rules.each do |name, (s, k, plural)|
-      send("#{name}=", []) if plural
-    end
+    @doc = self.class.convert_document(doc)
+    initialize_plural_accessors
   end
   # Initialize a new scraper and process data
@@ -109,6 +99,21 @@ def self.rules
   def self.inherited(subclass)
     subclass.rules.update self.rules
   end
+
+  def initialize_plural_accessors
+    self.class.rules.each do |name, (s, k, plural)|
+      send("#{name}=", []) if plural
+    end
+  end
+
+  def self.convert_document(doc)
+    if String === doc or IO === doc or %w[Tempfile StringIO].include? doc.class.name
+      require 'nokogiri' unless defined? ::Nokogiri
+      Nokogiri::HTML(doc)
+    else
+      doc
+    end
+  end
 end
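
The class-name comparison in `convert_document` is presumably there because neither `StringIO` nor `Tempfile` is an `IO` subclass, so the old `when String, IO` branch would not match them. Comparing names also avoids having to require `stringio` or `tempfile` up front. The following is only a minimal sketch of that check in isolation: `content_like?` is a hypothetical helper name, not part of scraper.rb, and the Nokogiri call is left out so the snippet runs with the standard library alone.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper (not in scraper.rb) that mirrors the condition used
# by convert_document: true when the argument should be handed to a parser.
def content_like?(doc)
  String === doc || IO === doc || %w[Tempfile StringIO].include?(doc.class.name)
end

string_io = StringIO.new('<p>small response</p>')
tempfile  = Tempfile.new('page')

begin
  puts IO === string_io            # StringIO does not inherit from IO
  puts IO === tempfile             # Tempfile delegates to File, it is not an IO
  puts content_like?(string_io)    # matched by class name
  puts content_like?(tempfile)     # matched by class name
  puts content_like?(Object.new)   # anything else is assumed to be a parsed document
ensure
  tempfile.close!                  # close and delete the temporary file
end
```

Note that a name comparison does not match subclasses of `Tempfile` or `StringIO`; whether that matters depends on what callers pass in.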
