Magellan is a web testing tool that embraces the discoverable nature of the web.
$ [sudo] gem install magellan
In your Rakefile add:
  require 'magellan/rake/broken_link_task'

  Magellan::Rake::BrokenLinkTask.new("digg") do |t|
    t.origin_url = "http://digg.com/"
    t.explore_depth = 3
  end
This will crawl any links within the same domain as the origin_url to a depth of 3. Treating the origin_url as depth 1, this means the crawler will follow every link reachable within two pages of digg.com.
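The depth rule can be sketched as follows. This is a hand-rolled illustration, not Magellan's actual implementation: the links hash stands in for fetched pages, and the origin counts as depth 1.

  require 'set'

  # Breadth-first, depth-limited crawl sketch: with explore_depth = 3 we
  # visit the origin, the pages it links to, and the pages those link to.
  def crawl(origin, explore_depth, links)
    visited = Set.new
    queue = [[origin, 1]] # [url, depth] pairs
    until queue.empty?
      url, depth = queue.shift
      next if depth > explore_depth || visited.include?(url)
      visited << url
      (links[url] || []).each { |link| queue << [link, depth + 1] }
    end
    visited
  end

With links = { "a" => ["b"], "b" => ["c"], "c" => ["d"] } and explore_depth 3, crawl("a", 3, links) visits a, b, and c but stops before d, which sits at depth 4.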
The second rake task explores your site and ensures that the given links exist.
  require 'magellan/rake/expected_links_task'

  Magellan::Rake::ExpectedLinksTask.new("gap") do |t|
    t.origin_url = "http://www.gap.com/"
    t.explore_depth = 2
    t.patterns_and_expected_links = [
      [/.*/, 'http://www.oldnavy.com'],
      [/http:\/\/[^\/]*\/\z/, '/browse/division.do?cid=5643']
    ]
  end
patterns_and_expected_links is an array of [regex, string] tuples. If the current URL matches the regex, the task will look for the associated URL string in the document. By default this task only crawls a tags. If you are having trouble with regexes in Ruby, rubular.com is a great place to test them.
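The matching logic can be illustrated like this (a sketch, not Magellan's internals): for each crawled page whose URL matches a pattern, the paired expected link must appear among the links found on that page.

  # Returns the expected links that apply to this URL but are missing
  # from the page's extracted links.
  def missing_expected_links(url, page_links, patterns_and_expected_links)
    patterns_and_expected_links
      .select { |pattern, _| url =~ pattern }
      .map    { |_, expected| expected }
      .reject { |expected| page_links.include?(expected) }
  end

For the gap example above, a page at http://www.gap.com/ that links to http://www.oldnavy.com but not to /browse/division.do?cid=5643 would fail with that second link reported missing.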
An array of URLs not to crawl. Example:

  t.ignored_urls = ["http://www.google.com/foo.html"]
You can override which tags and attributes are crawled by setting the links_to_explore property. Example:

  t.links_to_explore = [['a','href'],['script','src']]
This sets the crawler to explore all a hrefs and script srcs.
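One way to picture the tag/attribute pairs: each [tag, attribute] entry corresponds to an XPath query against the parsed page. This is an illustration of the idea, not Magellan's actual code.

  # Turn links_to_explore pairs into the XPath queries they imply.
  def xpath_queries(links_to_explore)
    links_to_explore.map { |tag, attr| "//#{tag}/@#{attr}" }
  end

So [['a','href'],['script','src']] implies the queries //a/@href and //script/@src.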
If specified, the rake task will write every failure found while it runs to a log file that you can tail. Example:

  t.failure_log = "log.txt"
* ruby 1.8.6
* mechanize[http://mechanize.rubyforge.org/]
* activesupport[http://as.rubyonrails.org/]
For general help, contact:

nolane at gmail dot com