Commit

Fix http://github.com/fizx/robots/issues#issue/1; Tests don't rely on network.
fizx committed May 29, 2010
1 parent 16636bc commit ee76d0e
Showing 9 changed files with 795 additions and 24 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG
@@ -1,3 +1,6 @@
+0.9.0
+- Fix http://github.com/fizx/robots/issues#issue/1
+- Tests don't rely on network.
 0.8.0
 - Add multiple values from robots.txt (via joost)
 0.7.3
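The two 0.9.0 entries are connected: pulling the HTTP fetch into a single overridable class method is what lets the tests avoid the network. A sketch of that test technique (the fixture content and agent name are hypothetical; the stub's shape mirrors the `content_type`/`status` checks in `lib/robots.rb`):

```ruby
require "stringio"

class Robots; end  # stand-in for the gem's class

# In a test, redefine the class method to return a canned response
# shaped like the OpenURI object the parser inspects.
def Robots.get_robots_txt(uri, user_agent)
  io = StringIO.new("User-agent: *\nDisallow: /private\n")
  def io.content_type; "text/plain"; end
  def io.status; ["200", "OK"]; end
  io
end

io = Robots.get_robots_txt("http://example.com/", "test-agent")
io.status  # => ["200", "OK"]
```

Because the parser only calls `content_type`, `status`, and the IO read methods, a `StringIO` with two singleton methods is enough to stand in for a live response.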
21 changes: 12 additions & 9 deletions lib/robots.rb
@@ -12,15 +12,7 @@ class ParsedRobots
   def initialize(uri, user_agent)
     @last_accessed = Time.at(1)

-    io = nil
-    begin
-      Timeout::timeout(Robots.timeout) do
-        io = URI.join(uri.to_s, "/robots.txt").open("User-Agent" => user_agent) rescue nil
-      end
-    rescue Timeout::Error
-      STDERR.puts "robots.txt request timed out"
-    end
-
+    io = Robots.get_robots_txt(uri, user_agent)

     if !io || io.content_type != "text/plain" || io.status != ["200", "OK"]
       io = StringIO.new("User-agent: *\nAllow: /\n")
@@ -99,12 +91,23 @@ def other_values
   protected

   def to_regex(pattern)
+    return /should-not-match-anything-123456789/ if pattern.strip.empty?
     pattern = Regexp.escape(pattern)
     pattern.gsub!(Regexp.escape("*"), ".*")
     Regexp.compile("^#{pattern}")
   end
 end

+def self.get_robots_txt(uri, user_agent)
+  begin
+    Timeout::timeout(Robots.timeout) do
+      io = URI.join(uri.to_s, "/robots.txt").open("User-Agent" => user_agent) rescue nil
+    end
+  rescue Timeout::Error
+    STDERR.puts "robots.txt request timed out"
+  end
+end
+
 def self.timeout=(t)
   @timeout = t
 end
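The guard added to `to_regex` pairs with the new emptyish fixture: an empty pattern, once escaped and anchored, would compile to `/^/`, which matches every path, so a bare `Disallow:` line effectively blocked the whole site. A self-contained copy of the patched method showing the behavior:

```ruby
# Standalone copy of the patched to_regex: the early return keeps an
# empty pattern from compiling to /^/ (which matches every path).
def to_regex(pattern)
  return /should-not-match-anything-123456789/ if pattern.strip.empty?
  pattern = Regexp.escape(pattern)
  pattern.gsub!(Regexp.escape("*"), ".*")
  Regexp.compile("^#{pattern}")
end

to_regex("/private*") =~ "/private/data"  # => 0 (matches at index 0)
to_regex("") =~ "/any/path"               # => nil (matches nothing)
```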
2 changes: 2 additions & 0 deletions test/fixtures/emptyish.txt
@@ -0,0 +1,2 @@
+User-agent: *
+Disallow:
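This fixture exercises the empty-Disallow case: by robots.txt convention, a `Disallow:` line with no value disallows nothing, so every path stays crawlable. A minimal illustration of that reading (not the gem's actual parser):

```ruby
# Minimal reading of the fixture: an empty Disallow value yields no
# disallowed prefixes, so any URL path remains allowed.
fixture = "User-agent: *\nDisallow:\n"
disallowed = fixture.scan(/^Disallow:[ \t]*(\S*)/).flatten.reject(&:empty?)
allowed = disallowed.none? { |prefix| "/any/page".start_with?(prefix) }
disallowed  # => []
allowed     # => true
```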
