a non-threaded spider bot that spiders a site with response time stats. easily extendable.
Ruby
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
bin
lib
.gitignore
README.rdoc
Rakefile
spider_bot.gemspec

README.rdoc

Spider bot



a non-threaded spider bot that spiders a site with response time stats. easily extendable

Installation

sudo gem install ssoroka-spider_bot

Usage



Usable in code or from terminal:

$ spider_bot http://www.example.com
$ spider_bot http://0.0.0.0:3000

Code example of a custom spider script in script/spider:

#!/usr/bin/env ruby
require 'rubygems'
require 'spider_bot'

class MySpider < SpiderBot
  # override these for handling events
  def on_page(page)
  end

  def on_404(link)
  end

  def on_500(link)
  end

  # override these for changing how urls are classified as links
  def off_site?(url)
    url !~ /^\// # urls not starting with a /
  end

  def ignorable?(url)
    url =~ /\/.*\..+/ && # files with extensions
      url !~ /\.html$/ # but not html files
  end
end

spider = MySpider.new(:quiet => false)
spider.start(ARGV[1])

Gem Requirements

  • ssoroka-ansi

  • mechanize