
Spider bot



A non-threaded spider bot that crawls a site and reports response time stats. Easily extensible.

Installation

sudo gem install ssoroka-spider_bot

Usage



Use it from code or from the terminal:

$ spider_bot http://www.example.com
$ spider_bot http://0.0.0.0:3000

Example of a custom spider script, saved as script/spider:

#!/usr/bin/env ruby
require 'rubygems'
require 'spider_bot'

class MySpider < SpiderBot
  # override these for handling events
  def on_page(page)
  end

  def on_404(link)
  end

  def on_500(link)
  end

  # override these for changing how urls are classified as links
  def off_site?(url)
    url !~ /^\// # urls not starting with a /
  end

  def ignorable?(url)
    url =~ /\/.*\..+/ && # files with extensions
      url !~ /\.html$/ # but not html files
  end
end

spider = MySpider.new(:quiet => false)
# ARGV excludes the script name, so the start URL is the first element
spider.start(ARGV[0])
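
To see how the two classification overrides above treat different URLs, the predicates can be tried on their own, without the gem. This is only a sketch: the methods are standalone copies of the regexes from the example, and the sample URLs are made up for illustration.

```ruby
# Standalone copies of the predicates from the example above
# (not part of the gem; plain methods so they run without spider_bot).
def off_site?(url)
  url !~ /^\// # true for urls not starting with a /
end

def ignorable?(url)
  # !! turns the match result (index or nil) into true/false
  !!(url =~ /\/.*\..+/ && # files with extensions
     url !~ /\.html$/)    # but not html files
end

# Hypothetical sample URLs, chosen only to exercise each branch.
SAMPLE_URLS = [
  '/about',
  '/docs/index.html',
  '/images/logo.png',
  '/page.html?x=1',
  'http://other.example.org/'
]

SAMPLE_URLS.each do |url|
  puts format('%-28s off_site=%-5s ignorable=%s', url, off_site?(url), ignorable?(url))
end
```

With these regexes, a URL such as /page.html?x=1 counts as ignorable because it does not end in .html; if query strings matter for your site, it may be worth stripping them before matching.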

Gem Requirements

  • ssoroka-ansi

  • mechanize
