This repository has been archived by the owner on Apr 7, 2023. It is now read-only.

Tempfile, StringIO are also considered content and passed to Nokogiri
Now it's possible to pass the result of an open-uri request directly:
  MyScraper.parse(open('http://example.com'))
mislav committed Oct 24, 2009
1 parent 579b3a5 commit df42508
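The open-uri case from the commit message is the motivating one. open-uri buffers small responses in a `StringIO` and larger ones in a `Tempfile`, and neither object satisfies the old `when String, IO` branch in the diff below: `StringIO` does not inherit from `IO`, and `Tempfile` wraps a `File` through `DelegateClass` rather than subclassing `IO`. The sketch below compares the old and new conditions using only the standard library. `old_check` and `new_check` are hypothetical helpers that mirror the two conditions; they are not methods of `Scraper`. On Ruby 3.0 and later, `Kernel#open` no longer opens URLs, so `URI.open` would take the place of the `open(...)` call shown above.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper mirroring the old `case doc when String, IO` branch.
def old_check(doc)
  String === doc || IO === doc
end

# Hypothetical helper mirroring the condition in the new convert_document.
def new_check(doc)
  String === doc || IO === doc || %w[Tempfile StringIO].include?(doc.class.name)
end

sio = StringIO.new('<p>hi</p>')
tmp = Tempfile.new('page')
begin
  # StringIO is not an IO subclass, so only the new check accepts it.
  puts "StringIO: old=#{old_check(sio)} new=#{new_check(sio)}"
  # Tempfile delegates to a File; the class-name test is what catches it.
  puts "Tempfile: old=#{old_check(tmp)} new=#{new_check(tmp)}"
ensure
  tmp.close! # close and delete the temporary file
end
```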
Showing 1 changed file (scraper.rb) with 18 additions and 13 deletions.
@@ -21,20 +21,10 @@
 class Scraper
   attr_reader :doc

-  # Accepts string, open file, or Nokogiri document instance
+  # Accepts string, open file, or Nokogiri-like document
   def initialize(doc)
-    @doc = case doc
-      when String, IO
-        require 'nokogiri' unless defined? ::Nokogiri
-        Nokogiri::HTML(doc)
-      else
-        doc
-      end
-
-    # initialize plural accessor values
-    self.class.rules.each do |name, (s, k, plural)|
-      send("#{name}=", []) if plural
-    end
+    @doc = self.class.convert_document(doc)
+    initialize_plural_accessors
   end

   # Initialize a new scraper and process data
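The refactored `initialize` delegates to `initialize_plural_accessors`, which walks `self.class.rules`, destructures each value with `|name, (s, k, plural)|`, and gives every plural rule an empty array to start from. The diff doesn't show how rules are declared, so this is a minimal standalone sketch that assumes each rule is stored as `name => [selector, kind, plural]`. `TinyScraper` and its `rule` helper are hypothetical and not part of the real `Scraper` API; only the loop body is taken from the diff.

```ruby
# Hypothetical stand-in for Scraper: rules map name => [selector, kind, plural].
class TinyScraper
  def self.rules
    @rules ||= {}
  end

  # Hypothetical DSL helper (not the real Scraper API): records a rule
  # and defines a reader/writer pair for it.
  def self.rule(name, selector, kind, plural = false)
    rules[name] = [selector, kind, plural]
    attr_accessor name
  end

  rule :title, 'h1', :text
  rule :links, 'a', :attr, true

  def initialize
    initialize_plural_accessors
  end

  # Same loop as in the diff: (s, k, plural) destructures each rule array,
  # and only plural rules are preset to an empty array.
  def initialize_plural_accessors
    self.class.rules.each do |name, (s, k, plural)|
      send("#{name}=", []) if plural
    end
  end
end

scraper = TinyScraper.new
p scraper.links # => []
p scraper.title # => nil
```

Presetting plural accessors to `[]` means later code can append matches without first checking for `nil`.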
@@ -109,6 +99,21 @@ def self.rules
   def self.inherited(subclass)
     subclass.rules.update self.rules
   end
+
+  def initialize_plural_accessors
+    self.class.rules.each do |name, (s, k, plural)|
+      send("#{name}=", []) if plural
+    end
+  end
+
+  def self.convert_document(doc)
+    if String === doc or IO === doc or %w[Tempfile StringIO].include? doc.class.name
+      require 'nokogiri' unless defined? ::Nokogiri
+      Nokogiri::HTML(doc)
+    else
+      doc
+    end
+  end
 end
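The dispatch in `convert_document` can be exercised without Nokogiri installed by stubbing out the parser. In this sketch, `FakeDoc` and `fake_html` are hypothetical stand-ins for a Nokogiri document and `Nokogiri::HTML`; the `if` condition is copied from the diff, and the method is defined at top level rather than on `Scraper` to keep the example standalone.

```ruby
require 'stringio'

# Hypothetical stand-in for a parsed document: just keeps the source text.
FakeDoc = Struct.new(:source)

# Hypothetical stand-in for Nokogiri::HTML. Nokogiri also accepts
# IO-like objects; this stub simply calls #read on them.
def fake_html(input)
  text = input.respond_to?(:read) ? input.read : input
  FakeDoc.new(text)
end

# Same condition as Scraper.convert_document, with the parser stubbed out.
def convert_document(doc)
  if String === doc or IO === doc or %w[Tempfile StringIO].include? doc.class.name
    fake_html(doc)
  else
    doc # anything else is assumed to be an already parsed document
  end
end

from_string = convert_document('<p>a</p>')
from_sio    = convert_document(StringIO.new('<p>b</p>'))
parsed      = FakeDoc.new('<p>c</p>')
passthrough = convert_document(parsed)

p from_string.source         # => "<p>a</p>"
p from_sio.source            # => "<p>b</p>"
p passthrough.equal?(parsed) # => true
```

One side effect of comparing class names as strings is that the condition never references the `Tempfile` or `StringIO` constants, so it works even if those libraries were never required. A duck-typed test such as `doc.respond_to?(:read)` would be another option, though it is not what this commit does.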

